Friday 24 April 2009

What's in the name Midlands Historical Data?

The name Midlands Historical Data - of this blog and of my web site - was chosen to reflect my project aims.

Firstly, Midlands. I plan to include in the digital library only local history books and directories from the West Midlands, by which I mean the "old" counties of Staffordshire, Shropshire, Warwickshire and Worcestershire. I want to provide depth rather than skate across the surface, as many other projects are obliged to do for budgetary reasons.

Secondly, Historical. The books I have scanned so far are mostly history books which are out of copyright and relatively rare. Producing digital copies and putting them on the web allows them to be shared between locations.

Thirdly, Data. Finding what you want from the 650 books on the site requires that they be turned into useable data. We are used to the miracle of Google, but someone somewhere has to create the original data. What I have done is use existing technologies to convert images of books into searchable text.

The computer process that achieves that conversion is at best 99% accurate - and for old texts it can be as low as 70% accurate. However, I took the view that some index is better than none, and discovered that where a series of books can be scanned and indexed (for example, an annual series of local directories), the chances of finding a piece of research data - say, a person or a place - increase. Also, where the format of the data is known, as it is in an Electoral Register, software can be developed to improve the quality of the resulting index.
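
As a rough illustration of why a series helps, here is a minimal sketch in Python. It rests on two simplifying assumptions that are mine and not measured from the site: that the accuracy figure can be read as the chance of a given name being recognised correctly in one volume, and that errors in different volumes are independent of one another. The function name is hypothetical.

```python
def chance_of_at_least_one_hit(per_volume_accuracy: float, volumes: int) -> float:
    """Chance that a name is read correctly in at least one of several volumes.

    Assumes (for illustration only) that each volume is an independent
    attempt with the same chance of success.
    """
    if not 0.0 <= per_volume_accuracy <= 1.0:
        raise ValueError("per_volume_accuracy must be between 0 and 1")
    if volumes < 0:
        raise ValueError("volumes must not be negative")
    chance_of_miss = 1.0 - per_volume_accuracy
    # The name is lost only if every volume gets it wrong.
    return 1.0 - chance_of_miss ** volumes


# Compare the worst case (70%) with the best case (99%) for a growing series.
for volumes in (1, 2, 3, 5):
    poor = chance_of_at_least_one_hit(0.70, volumes)
    good = chance_of_at_least_one_hit(0.99, volumes)
    print(f"{volumes} volume(s): {poor:.2%} at 70% accuracy, {good:.2%} at 99% accuracy")
```

Under those assumptions, a person who appears in three consecutive directories scanned at 70% accuracy would be findable in at least one of them about 97% of the time. In practice errors are unlikely to be fully independent (the same awkward typeface may defeat the software every year), so the real gain is probably smaller.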

The project aims can thus be summarised as adding value to the original text by making it accessible and searchable - and by having enough books in the library to provide the detail that history researchers need.
