How to correct OCR errors using Adobe Acrobat

Optical character recognition, usually abbreviated to just OCR, is the process of converting image files containing letters and words (such as scans or photographs) into searchable, text-based documents.

Now that eFiling rules in many states, including California and Texas, require electronically filed submissions (including exhibits) to be text searchable, it’s important to understand the OCR process and how to spot and correct errors.

The biggest problem with the OCR process? It is very rarely perfect. While good quality originals (like screenshots or high-resolution scans of typed letters) may be recognized at 100% accuracy, poorer quality images will be less accurately recognized.

So, low-resolution scans or fuzzy faxes may not reproduce well. Similarly, handwriting will almost never be accurately recognized as text.
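
One way to put a number on recognition quality is to compare the OCR output for a page against a hand-typed transcript of the same page. The sketch below is only illustrative and is not part of Acrobat: the function name `character_accuracy` and the sample strings are assumptions, and it uses the similarity ratio from Python's standard `difflib` module as a rough stand-in for a formal character error rate.

```python
from difflib import SequenceMatcher


def character_accuracy(transcript: str, ocr_text: str) -> float:
    """Return a rough 0.0-1.0 similarity between a trusted transcript and OCR output."""
    # Collapse runs of whitespace so line-break differences are not counted as errors.
    expected = " ".join(transcript.split())
    actual = " ".join(ocr_text.split())
    if not expected and not actual:
        return 1.0
    # autojunk=False keeps difflib from skipping frequent characters in long texts.
    return SequenceMatcher(None, expected, actual, autojunk=False).ratio()


# Hypothetical samples: a clean scan versus a fuzzy fax of the same heading.
clean = character_accuracy("Exhibit A: Lease Agreement", "Exhibit A: Lease Agreement")
fuzzy = character_accuracy("Exhibit A: Lease Agreement", "Exh1bit A: Lea5e Agreernent")
print(f"clean scan: {clean:.0%}, fuzzy fax: {fuzzy:.0%}")
```

A score of 1.0 means the OCR text matches the transcript exactly. Anything lower suggests the page deserves a closer look.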

Therefore, when you scan images to include as exhibits in your court filing, it is very important to audit the OCR results and correct any significant errors before treating the document as final and ready to file with the court.

How to audit OCR quality

Acrobat offers a feature called “Preflight,” part of which lets you make the OCR text visible. The OCR text is the hidden text placed beneath the image, and it reflects the characters that the software recognized when OCR was applied.

Here’s how you make that hidden text layer visible so that you can review its accuracy:

Step one

Open your OCR’d document in Adobe Acrobat. Now, in the right-hand tools panel, enter “preflight” into the search field. Select Preflight beneath Optimize PDF.

Step two

The Preflight dialog box will open. In the search field, enter “Make OCR.” From the options that appear, select Make OCR text visible and then click Analyze and fix.

Acrobat will ask you at this stage to rename your file. We recommend adding a suffix such as “_review” to the file name so that you can easily identify this version later on.

Step three

Now, to review the quality of the OCR that has been applied, you need to view the OCR layer. To view this layer, open the Layers panel by clicking on the layers icon in the left-hand menu.

Step four

The Layers panel will open. You will see two available layers that can be viewed. By default, both the scanned image and the invisible text are displayed. To review the quality of the OCR, you need to disable the image so that you can only see the OCR layer. Uncheck the eyeball beside the Visible page content icon to turn it off.

Now, only the OCR text is visible on the screen. The quality of the OCR, especially for handwritten sections, is often rather low.
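
If you copy the visible OCR text out of the review file into a plain-text file, a simple script can help you spot words worth a closer look. The following is a heuristic sketch, not how Acrobat identifies suspects: the function name `suspect_tokens` and its two rules (a digit wedged between letters, or a symbol that rarely appears in running text) are assumptions chosen for illustration.

```python
import re

# Assumed heuristics: a digit wedged between letters (e.g. "Lea5e"),
# or a character that is unusual in ordinary prose.
DIGIT_INSIDE_WORD = re.compile(r"[A-Za-z]\d+[A-Za-z]")
ODD_SYMBOL = re.compile(r"[|~^{}\\<>]")


def suspect_tokens(ocr_text: str) -> list[str]:
    """Return whitespace-separated tokens that match either heuristic."""
    return [
        token
        for token in ocr_text.split()
        if DIGIT_INSIDE_WORD.search(token) or ODD_SYMBOL.search(token)
    ]


# Hypothetical line of OCR output containing two recognition errors.
sample = "The Lea5e Agreement was s|gned on 12 March 2019"
print(suspect_tokens(sample))  # ['Lea5e', 's|gned']
```

Expect false positives (a legitimate term such as "B2B" would be flagged) and misses (a wrong but plausible word passes unnoticed), so treat the output as a prompt for manual review rather than a verdict.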

How to correct OCR errors

If the text that has not been correctly OCR’d is particularly pertinent, then you may wish to improve the document’s search quality by correcting the invisible text manually. Doing so is a two-step process in Acrobat:

Step one

Open your OCR’d document in Acrobat. In the right-hand Tools panel, search for “Correct” and select the Correct Recognized Text option beneath Enhance Scans.

Step two

The Correct Text function will appear at the top of your screen. Check Review recognized text. Suspected errors will be highlighted in red. Simply select an error, type the correct text, and then click Accept.

As you can see, this is quite a time-consuming and laborious process. When deciding how much effort to put into auditing your PDF, weigh the importance of the document being filed against your best sense of the quality of the original to which you applied OCR.
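
One way to inform that judgment is to check whether the terms a reader is most likely to search for actually survive in the OCR text. The sketch below assumes you have already copied or exported the document's text into a string. The function name `missing_search_terms` and the sample terms are hypothetical.

```python
def missing_search_terms(ocr_text: str, key_terms: list[str]) -> list[str]:
    """Return the key terms that a case-insensitive search of the OCR text would not find."""
    # Normalize whitespace and case so a line break inside a phrase does not hide a match.
    haystack = " ".join(ocr_text.split()).casefold()
    return [
        term
        for term in key_terms
        if " ".join(term.split()).casefold() not in haystack
    ]


# Hypothetical OCR output in which the year was misread with a letter "O".
ocr_text = "LEASE AGREEMENT between Jane Doe and Acme\nHoldings LLC, dated 12 March 2O19"
terms = ["Jane Doe", "Acme Holdings", "2019"]
print(missing_search_terms(ocr_text, terms))  # ['2019']
```

Any term that comes back missing points to a passage worth fixing with Correct Recognized Text. An empty list suggests the text layer is already good enough for the searches you care about.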

5 Comments

  1. Craig Willford

    What I have done, using Wondershare PDF Editor software rather than Adobe Acrobat, is to have both non-OCR versions (well legible, including handwriting) AND an OCR version, pasted together. If the Court wants to search text, it can; if it wants to see the fully legible version, it can. The downside is that a three page document becomes a six page document (or if only the signature page needs this double treatment, then it becomes a four page document).

    To help the Court know that there are non-OCR and OCR versions available, I make that clear in the labels attached to the bookmarking.

  2. LAURA C KNOBEL-PIEHL

    Some versions of Acrobat Pro DC no longer have the Make OCR Visible preflight profile. I’m wondering if you’re willing to package and share yours, as I’ve read that profiles are sharable. Many thanks for your consideration of this request.

    1. Lindsey Dean

      Hi Laura,

      Thanks for your comment. You may need to upgrade your version to access additional features. Profiles are sharable within an office, as you get a certain number of users per subscription, but we can’t share outside of our department or business. You may be able to try out the upgrade for a short period of time to see if it is worth it to you.
      All the best,
      Lindsey

  3. Tina Gross

    When you’ve finished your OCR corrections, how do you get the document back into a usable state? I’m clearly doing something wrong, but I cannot get the “Invisible text” layer and the “Visible page content” layer to flatten or merge in a way that results in a final document that displays only the “Visible page content” and yet retains the corrected, searchable “Invisible text.” I’ve tried everything I can think of, and I keep getting either an image-only document with no OCR, or a two-layer document that insists on displaying *both* layers if you perform a text search.

    1. Lindsey Dean

      Hi Tina,

      It’s likely that after enabling the invisible layer in order to see the OCR, you need to turn it off again.

      Hope that helps!

      Thank you,
      Lindsey
