Free OCR Converts Your Scanned Documents To Text
Converting a scanned document to editable text is a handy trick that can save you hours of retyping. Free OCR converts your documents for free.
To use Free OCR, your document needs to be in PDF, JPG, GIF, TIFF or BMP format. You can upload documents up to 2MB in size, which will be sufficient for most small conversions. We tested the site with the same passage from the introduction of In Defence Of Food by Michael Pollan, submitting it both as a PDF and as a JPG screenshot of the PDF page we used. Free OCR seems to like PDF files better: the PDF rendered with fewer errors.
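If you'd rather not find out about the limits after uploading, a quick local check is easy to script. This is a sketch under our own assumptions: the helper names are hypothetical, accepting the .jpeg and .tif spellings is a guess, and "2MB" is read as 2 MiB.

```python
from pathlib import Path

# Limits as described in the post; how the site actually enforces them is an assumption.
ALLOWED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp"}
MAX_BYTES = 2 * 1024 * 1024  # "2MB" read as 2 MiB; the site might use 2,000,000 bytes


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons a file would probably be rejected (an empty list means it looks fine)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems


def check_file(path: Path) -> list[str]:
    """Convenience wrapper that reads the file size from disk."""
    return upload_problems(path.name, path.stat().st_size)
```

For example, `check_file(Path("pollan.pdf"))` returns an empty list for a small PDF, while a PNG screenshot would be flagged as an unsupported format.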
The rate of errors and quirks in the conversion wasn’t much higher than commercial OCR software and just like using a commercial solution you’ll need to go through and double check to make sure what you see and what the machine sees are the same thing. If you know of other free OCR services, share the wealth in the comments below.
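One common way to put a number on the "rate of errors" when you have a reference transcription (like the Pollan passage) is the character error rate: the edit distance between the OCR output and the reference, divided by the reference length. This is offered as an illustration; it isn't how the comparison above was scored, and the function names are our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)
```

Comparing "the food" with an OCR result of "tne f00d" gives three substitutions over eight characters, a rate of 0.375. A low rate still means proofreading, since every one of those errors lands in a real word.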
Free OCR [via One Tip a Day]
Comments
The Free OCR FAQ ends with: "Only the first page of my PDF file is processed!" ... "At the moment you have to live with this constraint. In the near future it will process the first 10 pages of a PDF file."
As I usually have more than one page to process, I use [www.ocrterminal.com], which offers 20 free pages per month and also works on PDF files.
Another option is [www.instantocr.com], which is good for files up to 50MB.
Tip: before you use any of the services that have size limits, convert your file to black and white, which makes it smaller.
Joe Duck
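The black-and-white tip above works because each pixel drops from 24 bits (colour) or 8 bits (greyscale) to a single bit. The arithmetic below is a sketch for uncompressed images on an assumed US Letter page at 300 dpi; the helper names and the fixed threshold are our own choices.

```python
def raster_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed image size, with each row padded to a whole number of bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


def to_black_and_white(gray_pixels: list[int], threshold: int = 128) -> list[int]:
    """Map 8-bit grey levels (0 = black, 255 = white) to 1 (white) or 0 (black)."""
    return [1 if level >= threshold else 0 for level in gray_pixels]


# A US Letter page (8.5 x 11 in) scanned at 300 dpi is 2550 x 3300 pixels.
WIDTH, HEIGHT = 2550, 3300
for label, bpp in [("colour (24-bit)", 24), ("greyscale (8-bit)", 8), ("black and white (1-bit)", 1)]:
    print(f"{label}: {raster_size_bytes(WIDTH, HEIGHT, bpp):,} bytes")
```

That works out to 25,245,000 bytes in colour, 8,415,000 in greyscale and 1,052,700 in black and white. Real files are compressed, so actual savings vary, but bilevel images also compress very well (TIFF's CCITT Group 4 compression is designed specifically for them).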
@Crazor: I want a bit more than that, but you're SO right!
Counterglow
@arctanb: That is bad. You'd think they'd bother to run the result through a spell checker, or at least a general rule that "tne" most likely means "the".
jokono
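The spell-checker idea in the comment above can be sketched with Python's standard-library `difflib.get_close_matches`, which replaces an unknown word with its closest match in a word list. This is not something the post says Free OCR does; the tiny `VOCABULARY` and the helper names are hypothetical.

```python
import difflib
import re

# Hypothetical miniature word list; a real corrector would load a full dictionary.
VOCABULARY = ["the", "food", "eat", "mostly", "plants", "not", "too", "much"]


def correct_word(word: str, cutoff: float = 0.6) -> str:
    """Swap an unknown word for its closest vocabulary entry, keeping an initial capital."""
    lowered = word.lower()
    if lowered in VOCABULARY:
        return word
    matches = difflib.get_close_matches(lowered, VOCABULARY, n=1, cutoff=cutoff)
    if not matches:
        return word  # nothing similar enough: leave it for a human to check
    fixed = matches[0]
    return fixed.capitalize() if word[0].isupper() else fixed


def correct_text(text: str) -> str:
    """Run correct_word over each run of letters, leaving spacing and punctuation alone."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)
```

`correct_text("Eat tne food.")` gives "Eat the food." The catch is the word list: with a small one, perfectly good words outside it get "corrected" too, which is one reason OCR tools tend to flag suspect words rather than silently replace them.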
@gover57: +1/2. Yes, only a half. I played around with it, and like many Microsoft products it does almost everything well... except for the dealbreaker.
Those TIFFs aren't your normal, standardized TIFFs; they're a Microsoft-specific proprietary version. I think the only other option is the equally proprietary MDI format.
If Microsoft would take that program and just add PDF support, unicorns would exist again, dogs would frolic with cats, cops would have afternoon tea with robbers, etc.
AndyMan1
"....The rate of errors and quirks in the conversion wasn't much higher than commercial OCR software and just like using a commercial solution you'll need to go through and double check to make sure what you see and what the machine sees are the same thing...."
So then, how is this any better than the crap that came with my Canon printer? I may as well stick with that if this is going to be a little worse at OCR.
paintbox
@s0crates82: Open Office doesn't.
peanut_butter
@arctanb: A lot of these programs work better with scanned images, which are generally at a much higher resolution (300 dpi, compared with 72 or 96 dpi for a screen). Serif fonts are also generally favored in printed works.
ABBYY FineReader, though, is one of the best professional apps.
peanut_butter
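To put the resolution numbers in the comment above in perspective, here is the arithmetic for a page and its text at each resolution. The US Letter page size and 12-point font are our assumptions, and the helper names are illustrative.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page captured at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Height of a font's em square in pixels; individual letters are smaller than this."""
    return font_size_pt * dpi / POINTS_PER_INCH


for dpi in (72, 96, 300):
    w, h = page_pixels(8.5, 11, dpi)  # assumed US Letter page
    print(f"{dpi:>3} dpi: page {w} x {h} px, 12 pt em square {em_height_px(12, dpi):.0f} px")
```

At 300 dpi the page is 2550 x 3300 pixels and a 12 pt em square is 50 pixels tall; at 96 dpi those drop to 816 x 1056 and 16 pixels. Lowercase letters are smaller still, so a screenshot gives the OCR engine only a handful of pixels per letter to work with.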
@Jordan10la: Evernote?
Ihaveasmartpuppy
@gover57: +1
@Crazor: +1 for sure
Are there any good programs for doing this with handwritten documents?
Jordan10la
For Mac there's VelOCRaptor. It's not free, but there's no limit on the demo, so you can use it as long as you want. And if you want a licence, it's only $29.
[velocraptor.com]
Philippe Mongeau
It's not free but ABBYY FineReader is the best I've seen for OCR.
mahumphrey
Adobe Acrobat includes OCR, MS Office includes OCR, heck, even Open Office might have an OCR function built into it.
s0crates82
@Counterglow: Exactly. I've been looking for a decent OCR solution, too. Basically, all I want is to scan a PDF and make the text in it searchable/selectable for document archival.
Crazor
@gover57: +1 for Microsoft Office Document Imaging. It is the best (read: most accurate) OCR software I have found. Only being able to use TIFs is a major drawback in my opinion, but for most small documents that is OK. It is rather useless for large books, though, unless you have a batch conversion utility to convert large PDFs to TIF. Small documents are just a screenshot away from conversion.
John Cleaver
It's not brilliant: I took a screenshot of a sentence on its own website, and the result wasn't great:
[i44.tinypic.com]
I've used ABBYY FineReader, and it handles machine-printed text perfectly (so far), even when the page isn't wonderfully scanned.
MICROSOFT OFFICE USERS: Microsoft Office Document Imaging is a built-in tool for MS Office (I have Office 2003 and it's there), under Start > Programs > MS Office > Tools > Microsoft Office Document Imaging. Take a screen grab, save it in TIF format (MS Paint can do this), and open it with that program. Once the TIF file is open, select the Tools menu and click "Recognize Text Using OCR"; when that is done, export the text to Word (also under the Tools menu). I have used it a few times and have only had one kind of error: O's converted to zeros. Other than that, no errors yet. I haven't tried it with handwriting, though...
gover57
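If zeros creeping into words is the only error you see, as in the comment above, a small post-processing pass can catch most of them. This is a hypothetical heuristic of ours, not a feature of Microsoft Office Document Imaging: a 0 inside a word that is otherwise all letters becomes o (or O in an all-caps word), while numbers such as 2008 and mixed tokens such as x10 are left alone.

```python
import re

# A "word" made only of letters and zeros that contains at least one letter.
MIXED_WORD = re.compile(r"\b(?=[A-Za-z0]*[A-Za-z])[A-Za-z0]+\b")


def fix_zeros_in_words(text: str) -> str:
    """Turn 0 into o (or O in all-caps words) inside words that are otherwise letters."""
    def repair(match: re.Match) -> str:
        word = match.group(0)
        letters = word.replace("0", "")
        letter_o = "O" if letters.isupper() else "o"
        return word.replace("0", letter_o)

    return MIXED_WORD.sub(repair, text)
```

`fix_zeros_in_words("Hell0 w0rld in 2008")` gives "Hello world in 2008". It will also "fix" legitimate codes such as part numbers written with letters and zeros, so it's worth a skim afterwards.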
@forpeterssake:
It does say there's a 2MB limit. Was your document within that?
jupiterthunder
@Counterglow: Are there five?
jupiterthunder
@Phoshi:
Handwriting is one of them, but there is some OCR software out there that can't even do the job on typeset text.
jupiterthunder
This sounds like a good Hive Five topic: top freeware OCR programs.
Counterglow
I just tried it out and it only seems to convert the first page of my PDF documents. Is it only good for one page?
forpeterssake
"Free OCR converts your documents for free."
Perfect! I was afraid there was a catch. ;)
In all seriousness, though, I'm excited to try this. When I got my printer for college, it came with a similar program, but it was so huge and bloated that I uninstalled it right away. I've always thought this technology was really cool, and there are many times I wish I'd known about it sooner.
@32ndnote: It's frustrating, true. I won't even begin to talk about my woes dealing with East Asian (Japanese, Chinese) OCRing for work...
Here are a few of the many reasons that letters of the Roman alphabet might be confused: poor copying technique (tilt, etc.), scan/copy resolution or quality (including specks, dirt, etc.), fonts (especially when italics, underline, etc. come into play), and the fact that letters which look distinct to humans are awfully similar mathematically (take "c e o", for example). This is the fundamentally fantastic thing about the human capacity for language: we thought we'd have talking robots easily by the turn of this century, but it turns out that it's a lot easier to make killer unmanned drones than a talking electronic maid... So the Army wins and the Jetsons lose.
(Another problem is document layout, but in any case this is certainly not an exhaustive list.)
celcinc
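The comment above says "c", "e" and "o" are awfully similar mathematically. One simple way to see that is to count differing pixels (the Hamming distance) between small letter bitmaps. The 5x5 glyphs below are hand-drawn for illustration, not taken from any real font or OCR engine, and real engines use far richer features than raw pixel counts.

```python
from itertools import combinations

# Hypothetical 5x5 bitmaps, hand-drawn for illustration ('#' = ink, '.' = paper).
GLYPHS = {
    "c": [".###.", "#....", "#....", "#....", ".###."],
    "e": [".###.", "#...#", "#####", "#....", ".###."],
    "o": [".###.", "#...#", "#...#", "#...#", ".###."],
    "x": ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"],
}


def hamming(a: list[str], b: list[str]) -> int:
    """Count pixel positions where two equally sized bitmaps differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


for first, second in combinations(GLYPHS, 2):
    print(first, second, hamming(GLYPHS[first], GLYPHS[second]))
```

With these bitmaps, c, e and o differ in only 3 to 5 of 25 pixels, while each differs from x in 18 or more. Three stray specks of dirt in the right places would be enough to turn this "c" into an "o".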
Wow. How nice this would have been when I was digging through packed-up boxes to locate my printer installation disk so I could install the OCR software. At least it worked as intended once I found and installed it. I'm going to try this one out, I think.
jupiterthunder
@32ndnote: My handwriting is atrocious.
@32ndnote:
I've had good results with ABBYY(?) FineReader. It may go by another name now, IIRC. For the most part, though, it seems you are right.
jupiterthunder
I legitimately do not understand why we haven't gotten this down yet. There are only 26 letters in the Latin alphabet and they are all distinct.
I haven't even run into a perfect pay-to-download one.
psiokinetic