There are two types of document being converted from an analogue to a digital format for this project. The tide gauge charts are being digitised, and will produce actual data files, while the historical ledgers are being scanned and will be stored as image files.
It would be nice to extract the numbers from the ledgers but, at the moment, it isn’t possible to use optical character recognition (OCR) to digitise manuscript text easily. In the earlier ledgers, most of the text is written in a cursive script that’s very pretty but not remotely machine readable – it’s very difficult to distinguish between the number 4 and the number 7, among other problems.
We are looking into the possibility of running a citizen science project like oldWeather, a recently completed Zooniverse project. oldWeather let members of the public digitise pages from ships’ weather logs, eventually converting over 1 million logbook pages. Quality control was achieved by having several people digitise each page, then checking that the results matched.
Some of our ledgers have a very similar structure to the ships’ logbooks, and it would be great if we could use a system like the one used by the oldWeather project to generate much more historic sea level data.
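
To give a feel for how that kind of cross-checking might work, here is a rough Python sketch. It is not the actual oldWeather pipeline, just one simple way of comparing several volunteers’ transcriptions of the same page: a value is accepted only if a strict majority of volunteers typed the same thing, and anything else is flagged for a closer look. The function names and the sample readings are made up for illustration.

```python
from collections import Counter


def consensus(readings):
    """Return the value a strict majority of volunteers agree on, else None.

    `readings` is a list of transcriptions of one ledger cell, kept as
    strings so that "4.12" and "4.120" are treated as different entries.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    # Accept only if more than half of the volunteers entered this value.
    return value if count > len(readings) / 2 else None


def check_page(transcriptions):
    """Compare several volunteers' transcriptions of the same page.

    `transcriptions` holds one list per volunteer, each with the same
    number of cells in the same order. Returns one entry per cell:
    the agreed value, or None where the cell needs manual review.
    """
    return [consensus(list(cell)) for cell in zip(*transcriptions)]


# Hypothetical readings from three volunteers (illustrative values only).
page = [
    ["4.12", "3.98", "3.71"],
    ["4.12", "3.98", "3.11"],
    ["7.12", "3.98", "3.41"],
]
result = check_page(page)
print(result)
```

In this made-up example, the first cell is accepted because two of the three volunteers agree (the third has read the 4 as a 7), the second is unanimous, and the third is flagged because no two readings match.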