How to make a PDF text searchable

It’s long been best practice to ensure that the PDF documents you file electronically with the court are text searchable.

That’s because one can navigate searchable documents by looking up specific words and phrases, add markup (like comments) to specific sections, and copy and paste individual blocks of text. The result is a much more convenient experience for those reading and handling the documents.

While it has long been a good idea, from January 1, 2017, California courts will require you to submit text searchable documents when eFiling, including your exhibits, to the greatest extent technologically feasible.

Here’s what you need to know to ensure that you’re compliant and your documents are text searchable.

Understanding the different types of PDF

PDFs fall into three categories, depending on how the file was created. A file’s origin also determines whether its content can be searched, copied, and pasted, or whether it is “locked” in an image of the page.

  • Text-based or “true” PDFs: Digitally created PDFs, sometimes called “true” PDFs, are made by directly saving a document being drafted in a word processor (like Microsoft Word) as a PDF or by using the “print to PDF” function.
  • Image-based PDFs: Image-only PDFs are created through scanning, taking photographs, or taking screenshots. These documents are “locked” in a snapshot type image and are not searchable, cannot be copied and pasted, and cannot be marked up.
  • OCR’d or “made-searchable” PDFs: Image-based PDFs can be made text searchable via the application of optical character recognition (OCR). During the OCR process, characters and the document structure are “read.” As a result, a text layer is added to the image layer. Such documents become similar to “true” PDFs, though, depending on the quality of the image or the legibility of the writing, the recognized text may not be 100% accurate.
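
As a rough illustration of the three categories above, the Python sketch below sorts a document into one of them from two per-page facts: the text a PDF library managed to extract, and whether the page is essentially one scanned image. The `PageInfo` type, the `classify_pdf` function, and the simple "any text, any scan" rule are assumptions made for this example. Real files can mix types, so treat the result as a hint rather than a verdict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    """Hypothetical per-page summary, filled in by whatever PDF library you use."""
    extracted_text: str   # text the library could pull from the page ("" if none)
    has_page_image: bool  # True if the page is essentially one scanned image


def classify_pdf(pages: List[PageInfo]) -> str:
    """Guess which of the three PDF types a document is (heuristic only)."""
    if not pages:
        raise ValueError("the document has no pages")

    has_text = any(p.extracted_text.strip() for p in pages)
    has_scan = any(p.has_page_image for p in pages)

    if not has_text:
        return "image-based"  # nothing to search or copy
    if has_scan:
        return "OCR'd"        # a text layer sitting on top of page images
    return "text-based"       # text with no scanned page behind it


print(classify_pdf([PageInfo("", True)]))                  # image-based
print(classify_pdf([PageInfo("Exhibit A ...", True)]))     # OCR'd
print(classify_pdf([PageInfo("Motion to ...", False)]))    # text-based
```

A text-based PDF that happens to contain a full-page picture would be reported as OCR’d by this rule, which is one reason the output should only be used as a first check.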

Method 1: Publish directly from your word processing software (preferred)

If you want to guarantee that your final text searchable document exactly matches your original draft, then you must publish it directly as a PDF.

In the recent past, this was a complicated process, which led some people to resort to printing and scanning to create PDFs. Today it is quick and straightforward. It’s never necessary to print out and scan documents you’ve written yourself in a word processor to make them into a PDF.

To save a Microsoft Word document as a PDF, follow these steps:

Step one

Open your document in Microsoft Word. Next, click on File and then Save As.


Step two

The Save As dialog box will open. Click on the File Format drop-down menu and choose PDF, then click Save to save your document as a PDF.


Method 2: Apply optical character recognition in your PDF software

Note: This section describes how to apply OCR in the most recent version of Adobe Acrobat. Other PDF editing software is available. Check out our buyer’s guide for more information.

Sometimes it just isn’t possible to save directly to PDF. For example, you may have letters or other written exhibits that exist only in paper form or as photographs, or items that are hand-written. To make these items text searchable, you need to apply optical character recognition.

Step one

If your exhibits are not already in electronic form, you’ll need to create an image by either scanning or taking a photograph of the item. The result may be a PDF, but it is just as likely to be an image file such as a TIF, PNG, or JPG.

Step two

Open the image of your file as a PDF by using the Create PDF tool in Acrobat. Choose Create PDF in the Tools menu, select your file and click Create.


Step three

Your file will open as an image-based PDF. To apply OCR, choose Enhance Scans in the Tools menu. This will open the Enhance Scans menu at the top of the screen.


Step four

To apply OCR, select Recognize Text followed by In This File. A secondary menu will open. Make sure that you have the correct language selected and then click Recognize Text to begin the OCR process.
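
If you have many exhibits to process, the same idea can be scripted outside Acrobat. The sketch below is not the Acrobat workflow described in this section. It swaps in the open-source OCRmyPDF command-line tool as an assumed alternative and mirrors the "pick the right language, then recognize text" step. The helper names `build_ocr_command` and `run_ocr` and the file names are hypothetical; the language value is a Tesseract language code such as `eng`.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(input_pdf: str, output_pdf: str, language: str = "eng") -> List[str]:
    """Build the argument list for an OCRmyPDF run without executing it."""
    # -l sets the recognition language, like the language choice in Acrobat.
    return ["ocrmypdf", "-l", language, input_pdf, output_pdf]


def run_ocr(input_pdf: str, output_pdf: str, language: str = "eng") -> None:
    """Run OCRmyPDF if it is installed; raise a clear error otherwise."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH (assumed to be installed separately)")
    subprocess.run(build_ocr_command(input_pdf, output_pdf, language), check=True)


# Show the command that would be run for one hypothetical exhibit.
print(" ".join(build_ocr_command("exhibit_a.pdf", "exhibit_a_searchable.pdf")))
# ocrmypdf -l eng exhibit_a.pdf exhibit_a_searchable.pdf
```

Keeping the command builder separate from the function that runs it makes it easy to review exactly what will be executed before touching any files.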


Step five

Finally, search for text in your PDF to check that the process has worked successfully. Use the keyboard shortcut Ctrl+F to open the Find menu. Type a word or phrase you know to be in the document. The word or phrase should become highlighted.
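
The same spot check can be sketched in Python for cases where you want to test several phrases at once. The example assumes you already have the text of each page as a string (for instance, extracted by a PDF library of your choice). The function names `normalize` and `pages_containing` and the sample page text are made up for illustration.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def pages_containing(page_texts: List[str], phrase: str) -> List[int]:
    """Return the 1-based numbers of the pages whose text contains the phrase."""
    needle = normalize(phrase)
    if not needle:
        return []
    return [n for n, text in enumerate(page_texts, start=1) if needle in normalize(text)]


# Hypothetical text for a three-page document; the last page has no text layer.
pages = [
    "SUPERIOR COURT OF CALIFORNIA\nCounty of Orange",
    "Exhibit A: letter dated\nMarch 3",
    "",
]
print(pages_containing(pages, "county of orange"))    # [1]
print(pages_containing(pages, "letter dated March"))  # [2]
print(pages_containing(pages, "declaration"))         # []
```

An empty result for a phrase you can see on the page suggests that the page has no text layer or that the OCR misread it.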


The accuracy of the text recognition will vary depending on the quality of the image you start with. Higher resolution scans and images will be recognized more accurately. Recognition accuracy for hand-written documents can vary widely. You should audit your document carefully before considering it final.
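
To put a number on "higher resolution," you can estimate a scan’s dots per inch (DPI) by dividing its pixel dimensions by the size of the paper. The sketch below does that arithmetic for a US letter page. The 300 DPI threshold is a commonly cited rule of thumb for OCR and is an assumption here, not a figure from this article or from the court rules. The function and constant names are hypothetical.

```python
MIN_OCR_DPI = 300  # assumed rule-of-thumb threshold, not an official requirement


def scan_dpi(width_px: int, height_px: int,
             width_in: float = 8.5, height_in: float = 11.0) -> float:
    """Effective resolution: the lower of horizontal and vertical pixels per inch."""
    if min(width_px, height_px) <= 0 or min(width_in, height_in) <= 0:
        raise ValueError("dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def good_enough_for_ocr(width_px: int, height_px: int) -> bool:
    """True if a letter-size scan meets the assumed minimum resolution."""
    return scan_dpi(width_px, height_px) >= MIN_OCR_DPI


print(scan_dpi(2550, 3300))             # 300.0
print(good_enough_for_ocr(2550, 3300))  # True
print(scan_dpi(1275, 1650))             # 150.0
print(good_enough_for_ocr(1275, 1650))  # False
```

If a scan comes out below the threshold, rescanning at a higher setting is usually more effective than trying to clean up the recognized text afterward.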

***

Learn all of the PDF editing skills you need to successfully eFile in our free ebook: Core Adobe Acrobat skills for successful eFiling


5 Comments

  1. Reed James Reply

    I suggest using ABBYY FineReader for optical character recognition. Legal documents often have signatures and seals that obscure or obliterate text. With this software, you can mark regions as text, image or table and let the software read them accordingly. Many times, to get a usable format without weird section breaks and unwanted characters, it is a good idea to employ a DTP specialist to ensure optimum results.

  2. Keith Codron Reply

    Probate petitions generally must be verified, and therefore signed. Are digitally signed documents accepted by the Orange County Superior Court? If so, what is the best way to go about doing this?

    1. Richard Heinrich Reply

      Hi Keith,

      The rules for signatures on electronically filed documents are specified in the state rules of court. Rule 2.257(a) covers signatures on eFiled documents like a probate petition that are signed under penalty of perjury. Here’s what the rule says (it’s online here, too: http://www.courts.ca.gov/cms/rules/index.cfm?title=two&linkid=rule2_257):

      (a) Documents signed under penalty of perjury

      When a document to be filed electronically provides for a signature under penalty of perjury, the following applies:

      (1) The document is deemed signed by the declarant if, before filing, the declarant has signed a printed form of the document.

      (2) By electronically filing the document, the electronic filer certifies that (1) has been complied with and that the original, signed document is available for inspection and copying at the request of the court or any other party. Local child support agencies may maintain original, signed pleadings by way of an electronic copy in the statewide automated child support system and must maintain them only for the period of time stated in Government Code section 68152(a). If the local child support agency maintains an electronic copy of the original, signed pleading in the statewide automated child support system, it may destroy the paper original.

      (3) At any time after the document is filed, any other party may serve a demand for production of the original signed document. The demand must be served on all other parties but need not be filed with the court.

      (4) Within five days of service of the demand under (3), the party on whom the demand is made must make the original signed document available for inspection and copying by all other parties.

      (5) At any time after the document is filed, the court may order the filing party to produce the original signed document in court for inspection and copying by the court. The order must specify the date, time, and place for the production and must be served on all parties.

    1. Richard Heinrich Reply

      Hi, Cynthia. It’s a great question. The Judicial Council’s report on the rule change says: “This requirement would apply broadly to all electronically filed documents, including ‘papers,’ exhibits, and forms.” So forms do need to be made text searchable so far as is “technologically feasible.”
