This week we answer a burning question we’ve all asked ourselves at some point: ‘How do I sort through PDFs?’ The answer is simpler than you’d think, thanks to something called “Optical Character Recognition” (OCR).
I have a whole stack of things that I’d love to move from the physical world to the digital world, so I can then Marie Kondo the original documents and photos into oblivion. Stacks of paper do not bring me joy.
You have a few options. I’d start with the obvious one: Google Drive. Assuming you’re scanning your documents to PDF, upload the file(s) to Drive, right-click any individual PDF, hover over “Open with,” and select “Google Docs.”
Google will then attempt to run OCR on your PDF and open the recognized text as a new Google Docs document in your Drive. You can then search through this document (and any others you convert) from Drive itself.
The more I think about it, though, the more inelegant that solution seems for the number of files you have to work with. Instead, I might try a piece of software like TesseractStudio.Net, or, if you don’t fear the command line, Tesseract OCR itself.
You should be able to use either one to add a searchable text layer to your files, which your operating system’s built-in search on Windows or macOS should then be able to find. OCRmyPDF, which uses Tesseract under the hood, is another option, but, again, you’ll be typing commands to apply OCR to your files. There’s no GUI, nor is there (direct) Windows support.
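If you’d rather not run those commands one file at a time, a short script can loop over a folder for you. The Python sketch below assumes OCRmyPDF is installed and on your PATH, and that your scans sit in a single folder; the folder names and the helper functions (`is_pdf`, `ocr_command`, `run_batch`) are hypothetical, for illustration only. It calls OCRmyPDF’s basic `ocrmypdf input.pdf output.pdf` form with the `--skip-text` option, which leaves pages that already contain text untouched.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def is_pdf(path: Path) -> bool:
    """True for files ending in .pdf, regardless of letter case (.PDF, .Pdf...)."""
    return path.suffix.lower() == ".pdf"


def ocr_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the ocrmypdf command that writes a searchable copy of `pdf` into out_dir.

    --skip-text tells OCRmyPDF to skip OCR on pages that already have text.
    """
    return ["ocrmypdf", "--skip-text", str(pdf), str(out_dir / pdf.name)]


def run_batch(src_dir: Path, out_dir: Path) -> None:
    """Run OCRmyPDF on every PDF directly inside src_dir (not subfolders)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH; install it first.")
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(p for p in src_dir.iterdir() if is_pdf(p)):
        cmd = ocr_command(pdf, out_dir)
        print("Running:", " ".join(cmd))
        # check=True stops the batch if OCRmyPDF reports an error on a file.
        subprocess.run(cmd, check=True)
```

Calling `run_batch(Path("scans"), Path("searchable"))` would write a searchable copy of each PDF into the `searchable` folder and leave the originals alone, so you can check the results before shredding anything.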
There’s also Paperwork, an open-source document-cataloging tool with OCR built right in. Since it’s designed as an all-in-one program for archiving, sorting, and searching documents, I’d put it near the top of your list; it sounds like it might be just what you’re looking for.
I haven’t used PDF-XChange Viewer, but others have recommended it as an option. The free version will drop watermarks into your PDFs, but it can create PDFs from images and, if I’m correct, add OCR to these and any existing PDFs you have.
It’s worth exploring, even if it’s not the ideal (free) solution. Similarly, FreeOCR can take your images or PDFs, apply OCR, and export the results as plain text files or Word documents. If you don’t mind searching through your archives that way, it’s an option.
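If you do go the plain-text route, you don’t strictly need a dedicated app to search the results. Here’s a minimal Python sketch, assuming your OCR output is saved as UTF-8 .txt files somewhere under one folder; the function names and the folder in the usage note are hypothetical. It does a simple case-insensitive substring match and reports the file and line number of each hit.

```python
from __future__ import annotations

from pathlib import Path


def search_text(text: str, term: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose text contains term, ignoring case."""
    needle = term.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]


def search_archive(folder: Path, term: str) -> dict[str, list[tuple[int, str]]]:
    """Search every .txt file under folder (including subfolders) for term."""
    results: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(folder.rglob("*.txt")):
        # OCR output can contain stray bytes; replace anything that isn't valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = search_text(text, term)
        if hits:
            results[str(path)] = hits
    return results
```

For example, `search_archive(Path("ocr-output"), "invoice")` would return a dictionary mapping each matching file to the lines that mention “invoice.” OCR isn’t perfect, so a misread word won’t match; trying a shorter fragment of the term can help.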