A potentially handy feature of Google Docs is its option to perform optical character recognition (OCR). As we've suggested in the past, that offers a convenient and free way to convert scanned images into text. But just how accurate is it?
I recently had to scan a dozen or so pages of introductions from various novels, which I wanted rendered as text. That represented a good opportunity to test just how well Google's OCR system actually worked. Scanning the images (at a high resolution) and uploading them was relatively quick, taking a total of perhaps 20 minutes.
However, even though I was working from a professionally printed book, the results weren't spectacularly impressive. I ended up with 2,896 words of text, but even a quick glance showed there were lots of text artefacts and misinterpretations. Here's a fairly representative sample paragraph:
One day, having lunch. at a C01"11.er Hnuse, ji was emaptured by a conversation un statistics going my at a tabie behind. me. I turned my he-ad auf? camght a vague giimpse of a bald head.,
Editing the text to remove all the errors took me a further 30 minutes. Some mistakes were fairly mundane and easily fixed: Google tended to interpret the letter 'i' in my original documents as the number '1', sometimes added extra spaces between words, and tended to overenthusiastically add punctuation, as you can see in the example above. Other misinterpretations would have been difficult to correct without the reference images, and on several occasions Google's OCR engine simply ignored entire words, a problem that would not always have been obvious without a careful comparison.
Given that the Docs OCR option is free, experimenting is not a costly exercise. However, with a total of 50 minutes spent on getting the text scanned and accurate, the time saved over typing up the material myself was smaller than I had originally expected.
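To put those 50 minutes in context, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (2,896 words, 20 minutes of scanning and uploading, 30 minutes of editing). The typing speeds in the loop are hypothetical values chosen purely for illustration, and the calculation ignores any time spent proofreading hand-typed text.

```python
def typing_minutes(word_count: int, wpm: float) -> float:
    """Minutes needed to type word_count words at a steady wpm."""
    return word_count / wpm


def break_even_wpm(word_count: int, ocr_minutes: float) -> float:
    """Typing speed at which retyping takes exactly as long as the OCR workflow."""
    return word_count / ocr_minutes


# Figures taken from the article.
WORDS = 2896
SCAN_MINUTES = 20
EDIT_MINUTES = 30
ocr_total = SCAN_MINUTES + EDIT_MINUTES

print(f"OCR workflow: {ocr_total} minutes")
print(f"Break-even typing speed: {break_even_wpm(WORDS, ocr_total):.1f} wpm")

# Hypothetical typing speeds (assumptions, not measurements).
for wpm in (40, 60, 80):
    print(f"{wpm} wpm: {typing_minutes(WORDS, wpm):.0f} minutes to type")
```

On these numbers, anyone who types faster than roughly 58 words per minute would finish sooner by typing, while a slower typist would still come out ahead with OCR.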
How useful have you found Google Docs' OCR capabilities? Share your own experience in the comments.