IDP and OCR: Delivering Complete Document Automation Solutions

It helps to understand OCR’s long history to appreciate the full capabilities of today’s intelligent document processing (IDP). If you aren’t familiar with the term, Optical Character Recognition (OCR) technology has been around since the 1970s, originally intended as an aid for the visually impaired. It interprets printed characters and transforms them into machine-readable text. Through the 80s and 90s, corporations quickly found value in scanning paper and recognizing the text with OCR software.

Documents became searchable, and pieces of data could be identified for search and extracted for use in business systems. Today, OCR has moved up the food chain and feeds data to AI engines, business workflows and Robotic Process Automation. The OCR engines of the past (and present) that were industry stalwarts, like Recostar or Nuance, have been augmented and surpassed by intelligent cloud OCR services like Google Vision, Amazon Textract and Microsoft’s Computer Vision. These cloud OCR services deliver high accuracy on both machine-printed and handwritten text.

But here lies the problem. As fancy and accurate as these cloud OCR engines can get, the result they provide is the same – raw text with some extended information. The black box still gives you generic text output, with slightly higher accuracy. Many companies assume that is all there is to do…

Necessary Ingredients for Enterprise Technology

Having sold and delivered these intelligent document processing solutions, I have found that OCR, or the process of converting images to text, is about 10% of the solution. At a high level, an enterprise-class IDP solution needs more than basic technology. It needs what I call the OCR “Oreo Cookie”. It’s not just the creamy filling (OCR) that makes the cookie; you can’t package filling. It takes support on both sides (pre- and post-processing) to deliver the whole package.

So what’s missing from intelligent cloud OCR solutions? Let’s look at the pre- and post-processing steps. Our Head Architect for the Ephesoft platform shared a list of the pre- and post-processing items needed to prepare documents for OCR and to ensure the accuracy, validity and proper delivery of its output.

An example of pre- and post-OCR services to get to the end result

While the list is generalized, each overall step requires granular microservices to perform its individual components. The example below is for AP invoice processing and the processes involved in interpreting invoice images.

Pre-OCR Processing      Post-OCR Processing
Import                  Prediction AI
Normalize               Interpretation
File Prep               Table AI
Asset Generation        Vendor AI
AI Image Prep           Table Reconcile
                        Vendor ERP Data
                        Data Normalization
                        UI (User Interface for Exceptions)

The magic is in what wraps the OCR process, not OCR itself, much like an Oreo cookie.
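
To make the wrapping concrete, here is a minimal Python sketch of the invoice example as a chain of stages around a single OCR call. It is an illustration only, not Ephesoft's implementation: the stage names come from the list above, but the ordering, the Document class, the stage() helper, the canned OCR text and the file name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container that carries one invoice through the pipeline."""
    source: str
    text: str = ""
    fields: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(doc: Document) -> Document:
        doc.history.append(name)
        return doc
    return run


def fake_ocr(doc: Document) -> Document:
    """Stand-in for a cloud OCR call: returns canned raw text (assumption)."""
    doc.history.append("OCR")
    doc.text = "ACME Corp\nInvoice No: 12345\nTotal: 99.50"
    return doc


def interpret(doc: Document) -> Document:
    """Toy interpretation step: turn 'Label: value' lines into named fields."""
    doc.history.append("Interpretation")
    for line in doc.text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            doc.fields[label.strip()] = value.strip()
    return doc


# Stage names follow the list above; running them in this order is an assumption.
PRE_OCR: List[Stage] = [
    stage(name)
    for name in ("Import", "Normalize", "File Prep", "Asset Generation", "AI Image Prep")
]
POST_OCR: List[Stage] = [
    stage("Prediction AI"),
    interpret,
    stage("Table AI"),
    stage("Vendor AI"),
    stage("Table Reconcile"),
    stage("Vendor ERP Data"),
    stage("Data Normalization"),
    stage("UI (Exceptions)"),
]


def process(source: str) -> Document:
    """Run every pre-OCR stage, then OCR, then every post-OCR stage."""
    doc = Document(source=source)
    for step in [*PRE_OCR, fake_ocr, *POST_OCR]:
        doc = step(doc)
    return doc


if __name__ == "__main__":
    result = process("invoice-001.tif")  # hypothetical file name
    print(result.history)
    print(result.fields)
```

In this sketch the OCR call is one step out of fourteen, and the raw text only becomes usable data once a post-OCR stage has interpreted it.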

The Choice: Build or Buy? 

For decades, the CIO has been faced with the eternal question: Do I buy it or build it myself? In 2021, the age of Robotic Process Automation (RPA) and no-code/low-code platforms, there are way too many opportunities to go down the rabbit hole and create “apps” that are inefficient, don’t meet user expectations and fall short. To avoid any potential pitfalls, the modern-day CIO must ask these 10 questions:

  1. What are the desired results from an OCR or data extraction perspective that will mean success for the business? 
  2. Are my documents “born digital” or are they copies of copies of physical documents?
  3. Is my team capable of building and supporting the pre- and post-OCR/extraction process components?
  4. Does my team have expertise or experience in writing document processing code and/or applications?
  5. Do I have clean and accurate data to build ML models that interpret the OCR results from <XYZ OCR engine>?
  6. Do I really want to be a software/RPA development house?
  7. Is 80% accuracy enough to deliver meaningful business value?
  8. Who will update my application and keep it current with subtle document changes and new documents that need to be interpreted, configured and understood?
  9. How will I manage app security and cloud OCR data?
  10. What is the fastest and most cost-effective path? 
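
To put question 7 in numbers, here is a back-of-the-envelope Python sketch. It assumes that 80% means per-field accuracy and that field errors are independent; neither assumption comes from the list above, and the function names and the 10-field, 1,000-invoice figures are hypothetical.

```python
def straight_through_rate(field_accuracy: float, fields_per_document: int) -> float:
    """Chance that every field on one document is extracted correctly.

    Assumes field errors are independent, which is a simplification.
    """
    if not 0.0 <= field_accuracy <= 1.0:
        raise ValueError("field_accuracy must be between 0 and 1")
    if fields_per_document < 0:
        raise ValueError("fields_per_document must not be negative")
    return field_accuracy ** fields_per_document


def expected_exceptions(documents: int, field_accuracy: float,
                        fields_per_document: int) -> float:
    """Expected number of documents that need a human to fix at least one field."""
    clean_rate = straight_through_rate(field_accuracy, fields_per_document)
    return documents * (1.0 - clean_rate)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 invoices with 10 extracted fields each.
    rate = straight_through_rate(0.80, 10)
    print(f"Straight-through rate: {rate:.1%}")
    print(f"Expected exceptions: {expected_exceptions(1000, 0.80, 10):.0f}")
```

Under these assumptions, only about one invoice in ten passes through without a human touching it, so the answer depends on how many fields each document carries.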

The Ephesoft Difference

I wrote this post because we routinely get pulled into the build vs. buy conversation, especially with larger prospects and customers. There is always a group, consultant or team that raises a hand internally and says: “That’s easy, we can build it with RPA and Textract (or Google or similar solutions).” Don’t be fooled by the lure of fancy OCR promises and remember the Oreo cookie – a strong foundation will only benefit your digital transformation initiative.

Ephesoft has leveraged its 10 years of deep capture experience to build the next-generation intelligent document processing platform, one that drives productivity through document context. Combining the best of OCR, AI, machine learning and rules, we achieve unmatched accuracy and straight-through document processing. Contact us today for a live demo and see how we can help you drive business results.