Can OCR Handle My Handwriting in 2024?

A quick experiment using four readily available options.

I tend to be one of the archivists in my family: photos from the 70s, my notes from undergrad, last week’s car maintenance invoice…you name it, it’s somewhere on my drive.

Recently I was trying to find something based on the contents of a file (for example, show me PDFs that say “Formula” somewhere on one of the pages) and realized that my digitization habits from 10+ years ago were…inconsistent, at best. Many of the scans I did at that time were saved as images, so there’s no way to search these for text without opening each image one at a time.

At first, it seemed like a good time to convert a lot of these old images to PDFs…with recent advances in AI technology and amazing smartphone features like grabbing text from photos with the press of a finger, I figured that most OCR technology must be well past the point of being able to recognize my handwriting consistently. (Additionally, a weird flex: I’ve been told my entire life that my handwriting is quite good, which only boosted my confidence that there would be a useful solution here.)

But it turns out that in this space, results still vary dramatically…so let’s take a closer look.

A classic pangram

We’ve probably all seen it before. Here it is from my hand to your eyes:

The quick brown fox jumped over the lazy dog.

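Pangram purists will note that the classic version uses “jumps”, so my “jumped” technically drops the letter s, but it works just as well as a test. Let’s nitpick to give OCR the benefit of the doubt: the e in over could be mistaken for c, and the z in lazy is squeezed. Feel free to judge the quality in other ways before looking at the results below.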

I tested four programs that were readily available to me with very little effort (and if you know of others that would outperform all of the methods below, I would love to hear about them):

Epson ScanSmart (came with my scanner)
OCRmyPDF (open source, based on Tesseract OCR)
Adobe Acrobat
macOS Preview

For each program, I used its built-in method for adding OCR data to the file, then highlighted, copied, and pasted the resulting text below.

Results

Let’s see how each program did…

Epson ScanSmart

00}kck brovin-Gxi'velpe_ciovcr 4-k. 100/do3.

Comments: not close, pretty useless. A painful start to finding a solution!

OCRmyPDF

The Quick brown 40x _yerped over

Comments: “yerp!” Pretty funny misinterpretations here: you can see how fox could feasibly be fuzzed to 40x. The output also stops after “over”, dropping “the lazy dog” entirely. Still, noticeably better than Epson’s software.

Adobe Acrobat

-C-~e.<.\_v\ck brovri ..fuxjvr,pe....roJv.e...r ~ 'O{Jd..oI~.

Comments: no, seriously…that’s it. Adobe OCR is completely useless when it comes to my handwriting. I was so shocked that I even tried the Adobe Scan app that Adobe recommends in their own article on the topic, and the result was still total gibberish.

macOS Preview

The quick brown fox jumpedover the laz dog.

Comments: finally, something close! It’s quite reasonable to interpret no space between “jumped” and “over”, and we lost the “y” in lazy thanks to my lazy “z”. Otherwise, perfect…we have an early winner.

Bonus: Mathpix

Mathpix has a very limited free tier, but luckily a single PDF doesn’t go over the limit.

The quick brown fox jumped over the lazy dog.

Comments: absolutely perfect…but as far as I can tell, you can only download the original PDF back to your machine without the OCR data attached, so it’s not a candidate for us. (Looks incredibly useful for conversions to LaTeX or Markdown, however.)
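
To put rough numbers on the comparisons above, here is a minimal sketch that scores each pasted result against the intended sentence using Python's standard-library difflib. The similarity and rank helpers are illustrative names of my own choosing, not part of any of the tools tested, and a case-sensitive character-level ratio is only one crude way to measure OCR accuracy.

```python
from difflib import SequenceMatcher

# The sentence as written on the page.
TRUTH = "The quick brown fox jumped over the lazy dog."

# Pasted OCR output from each program, copied from the results above.
RESULTS = {
    "Epson ScanSmart": "00}kck brovin-Gxi'velpe_ciovcr 4-k. 100/do3.",
    "OCRmyPDF": "The Quick brown 40x _yerped over",
    "Adobe Acrobat": r"-C-~e.<.\_v\ck brovri ..fuxjvr,pe....roJv.e...r ~ 'O{Jd..oI~.",
    "macOS Preview": "The quick brown fox jumpedover the laz dog.",
    "Mathpix": "The quick brown fox jumped over the lazy dog.",
}


def similarity(expected: str, actual: str) -> float:
    """Return a 0.0-1.0 character-level similarity (1.0 means identical)."""
    return SequenceMatcher(None, expected, actual).ratio()


def rank(results: dict, truth: str) -> list:
    """Return (program, score) pairs sorted from best to worst match."""
    scored = [(name, similarity(truth, text)) for name, text in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(RESULTS, TRUTH):
        print(f"{name:16s} {score:.2f}")
```

A score like this is handy mostly for comparing tools on the same sample; it says nothing about how well a tool would do on a different page or a different hand.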

Conclusion

It’s painfully clear now that some OCR solutions are still only optimized for standardized computer fonts, so I’m continuing to research possibilities for bringing my old digitized, handwritten notes into 2024 and making them text-searchable. Right now the frontrunner is using macOS Preview to re-export each PDF with embedded text data, but it sure would be nice to find something scriptable and/or open source. If you have any tips, tricks, or suggestions for a workflow, feel free to share them.
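
On the scriptable front, OCRmyPDF from the test above is a command-line tool, so it can at least be wired into a batch job, even though its handwriting accuracy here was poor. Below is a minimal sketch, assuming the ocrmypdf command is installed and on the PATH. The folder names scans and searchable are hypothetical, the helper names are my own, and the --skip-text option is used on the assumption that pages which already contain text should be left alone. It only handles PDF inputs; image files may need extra options such as --image-dpi.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, out_dir: Path) -> list[str]:
    """Build the ocrmypdf invocation for one file (assumes ocrmypdf is on PATH)."""
    return ["ocrmypdf", "--skip-text", str(src), str(out_dir / src.name)]


def ocr_folder(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run ocrmypdf on every PDF in in_dir; return the files that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(in_dir.glob("*.pdf")):
        result = subprocess.run(build_command(src, out_dir))
        if result.returncode != 0:
            failed.append(src)
    return failed


if __name__ == "__main__":
    in_dir = Path("scans")        # hypothetical input folder
    out_dir = Path("searchable")  # hypothetical output folder
    if in_dir.is_dir():
        for path in ocr_folder(in_dir, out_dir):
            print(f"OCR failed for {path}")
```

The same loop shape would work for any other command-line OCR tool; only build_command would need to change.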