OCR Unlocks Lost Content

OCR could be the best three-letter acronym in legal tech today, thanks to the powerful technology behind those seemingly inconspicuous letters. OCR stands for Optical Character Recognition, a type of software used to unlock content that a computer can't otherwise read, such as images or scanned materials, stored in a document management platform like NetDocuments.

Suppose you wanted to access the text in a contract, article, or letter for re-use. You could spend hours retyping and correcting misprints, or you could convert the source file into editable text in several minutes using the NetDocuments OCR module.

How Does Optical Character Recognition Work?

Built in conjunction with our partners in document processing, DocsCorp and ABBYY, NetDocuments OCR processes content in three steps:

  1. Content Grouping

    First, all customer content is evaluated and placed into two groups: readable and unreadable.

    • “Readable” content consists of digital files that a computer can read and index.
    • “Unreadable” content consists of files containing material a human can read as text, such as a written letter, but a computer cannot. This group goes through the OCR conversion process.
  2. Structural Analysis

    If a document requires OCR, NetDocuments analyzes the structure of the document image and divides each page into elements such as blocks of text, tables, and images.

    The document is then broken down further into words and characters, which are isolated and compared against a set of patterns. This comparison iterates through multiple hypotheses, weighing different variants based on each character's context and form.

  3. Character Matching

    The program selects the best match among multiple permutations and returns the corresponding text string for the image. It also draws on dictionaries in multiple languages, which enables a secondary, word-by-word analysis of the text.

    The combination of probabilistic analysis and dictionary matching produces a high-quality text conversion, frequently better than retyping and certainly less expensive and time-consuming.
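To make the three steps concrete, here is a toy Python sketch of the general idea. It is an illustration under stated assumptions, not NetDocuments' or ABBYY's implementation: the `needs_ocr` rule (a file is "unreadable" if it has no text layer), the 3x3 `TEMPLATES`, and the functions `match_glyph` and `recognize_word` are all hypothetical, and structural analysis is assumed to have already split the page into individual character images.

```python
from itertools import product

# Hypothetical 3x3 glyph templates (1 = ink, 0 = background), flattened row by row.
# Real engines compare far richer features; these only illustrate pattern matching.
TEMPLATES = {
    "o": (1, 1, 1, 1, 0, 1, 1, 1, 1),
    "c": (1, 1, 1, 1, 0, 0, 1, 1, 1),
    "l": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "i": (0, 1, 0, 0, 0, 0, 0, 1, 0),
}


def needs_ocr(doc: dict) -> bool:
    """Step 1 (assumed rule): a file without a text layer is 'unreadable'."""
    return not doc.get("text", "").strip()


def match_glyph(bitmap, templates, top_k=2):
    """Step 3: score each template by the fraction of agreeing pixels
    and keep the top_k hypotheses as (score, character) pairs."""
    scored = [
        (sum(a == b for a, b in zip(bitmap, tmpl)) / len(tmpl), char)
        for char, tmpl in templates.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


def recognize_word(glyphs, templates, dictionary):
    """Combine per-glyph hypotheses into candidate words, then prefer the
    highest-scoring candidate that appears in the dictionary."""
    candidates = []
    for combo in product(*(match_glyph(g, templates) for g in glyphs)):
        score = 1.0
        for s, _ in combo:
            score *= s  # joint score = product of per-character scores
        candidates.append((score, "".join(ch for _, ch in combo)))
    candidates.sort(reverse=True)
    for _, word in candidates:
        if word in dictionary:
            return word
    # No dictionary hit: fall back to the best raw pattern match.
    return candidates[0][1]


# The first glyph is a smudged "c" whose gap filled in, so it looks exactly like "o".
glyphs = [TEMPLATES["o"], TEMPLATES["o"], TEMPLATES["i"], TEMPLATES["l"]]
print(recognize_word(glyphs, TEMPLATES, dictionary=set()))             # ooil
print(recognize_word(glyphs, TEMPLATES, dictionary={"coil", "cool"}))  # coil
print(needs_ocr({"text": ""}), needs_ocr({"text": "Dear client"}))     # True False
```

Pattern matching alone reads the smudged word as "ooil". The word-level dictionary pass then walks down the ranked candidates and picks "coil", the best-scoring one that is an actual word, which is the kind of secondary analysis the third step describes.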

Unique Correction Attributes

After receiving images, the NetDocuments ABBYY engine performs two image-processing functions that improve the quality of document images before OCR takes place.
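The two pre-processing functions aren't named here. Purely to illustrate what image pre-processing can involve, the sketch below implements Otsu's thresholding, a standard technique that turns a grayscale scan into black-and-white pixels before recognition. Nothing here claims ABBYY uses this method; `otsu_threshold`, `binarize`, and the sample pixel row are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the between-class
    variance when pixels <= t are 'dark' and pixels > t are 'light'."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark, weight_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (scaled by total**2, which doesn't change the argmax).
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Dark pixels (<= threshold) become ink (1); lighter pixels become background (0)."""
    t = otsu_threshold(pixels)
    return [1 if p <= t else 0 for p in pixels]


scan_row = [10, 12, 15, 200, 210, 220, 205]  # hypothetical 8-bit grayscale values
print(otsu_threshold(scan_row))  # 15
print(binarize(scan_row))        # [1, 1, 1, 0, 0, 0, 0]
```

Choosing the threshold from the image's own histogram, rather than hard-coding a value, lets the same step cope with both faded and heavily inked scans.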

Getting Back Lost Value

As of May 2019, 25,000 users have used the NetDocuments OCR to unlock hidden content. So far, 30% of all customer documents — including PDFs, emails, and images — have been digitized using the NetDocuments OCR. In fact, our customer studies have shown that customers are saving time, improving client satisfaction, and even reducing firm risk.

Looking for a great perspective on the importance of return on investment (ROI) in legal tech? Check out “Dear Legal Tech, We Need to Talk about Money” by Richard Tromans in Artificial Lawyer. Or, if you’re ready to start getting true ROI from your legal technology, request a demo of the NetDocuments platform and OCR technology today.
