Automating Document Processing with Artificial Intelligence and Supervised Machine Learning

Organizations everywhere face the same problem: a steady influx of inbound documents, including paper, email attachments, faxes, web uploads and downloads, shared files and system-generated PDFs, to name a few. With this much data in circulation, it is clear that business runs on document exchange. Here are some common business processes involving documents:

  • Applications to facilitate loans and account openings
  • Invoicing to collect payments
  • Contract processing
  • Uploads or mobile captures to prove identity
  • Bank statements to present income
  • Utility bills to prove residence

Document processing takes time and effort when done manually, and at scale it becomes a burden on employee resources. This is a prevalent and ongoing challenge for many financial services companies because of the high document volume behind every mortgage. A single mortgage can require hundreds of different documents, totaling up to 1,000 pages, from a wide variety of sources and formats. Manual processing is no longer cost-effective, and most customers will not tolerate long processing times. What is the solution? Smart Capture®.

Smart Capture® Automation

What is Smart Capture® and how can it help financial services companies? At Ephesoft, we define Smart Capture® as an intelligent document capture tool that utilizes a form of AI or supervised machine learning technology to normalize, classify, separate, extract, validate and export metadata from documents. To illustrate how standard mortgage paperwork might benefit from a Smart Capture® system, let’s look at a common practice for processing loans.

Imagine you are an employee at a mortgage company with a large stack of paper documents in front of you, or possibly a large PDF file that contains the same content. Your first step is to sort the stack into individual document types (classify and separate). Once the documents are organized, you need to find the data you require on each one (data extraction). Just as you would train a new administrator to process these files manually, you can train a machine through a learning process to automate these tasks.

With Smart Capture®, the training is divided into two distinct steps:

  1. Classification and Separation – To learn how to recognize when one document ends and another begins, the machine (software platform) starts with samples. Through supervised machine learning, a knowledge worker drags sample notes, applications, statements and other document types into a learning folder for each document type. The samples are analyzed much as an employee would review them: words, positions, titles, sections, fonts and other dimensions combine into a cumulative fingerprint that identifies each document type with a high level of accuracy. Once training is complete, the machine has the artificial intelligence to “understand” these documents.

Machine Learning: Data of Interest on a W-2 Form

  2. Data Extraction – Once the machine understands the mortgage document types through the supervised learning process described above, it begins to learn the data of interest. We instruct the machine what to extract from the documents. As we point and click on the desired data fields, the system learns what data exists and also creates logic that can span all the documents. The system needs only three to five samples to learn what to extract. These learned rules can reduce the need for manual data entry or extraction, which speeds up the process.
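To make the classification and separation step more concrete, here is a minimal Python sketch of the general idea. It is not how Smart Capture® is implemented: it assumes the fingerprint is only a bag of word counts compared by cosine similarity, while the article also mentions positions, titles, sections and fonts. The function names, sample texts and document types are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(samples):
    """Build one word-count fingerprint per document type.

    `samples` maps a document type to its sample page texts, standing in
    for the learning folders described above (an assumption of this sketch).
    """
    return {
        doc_type: Counter(tok for page in pages for tok in tokenize(page))
        for doc_type, pages in samples.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(page_text, fingerprints):
    """Return the document type whose fingerprint is closest to the page."""
    page = Counter(tokenize(page_text))
    return max(fingerprints, key=lambda t: cosine(page, fingerprints[t]))


def separate(pages, fingerprints):
    """Group consecutive pages of the same predicted type into documents."""
    documents = []
    for page in pages:
        doc_type = classify(page, fingerprints)
        if documents and documents[-1][0] == doc_type:
            documents[-1][1].append(page)  # same type: continue the document
        else:
            documents.append((doc_type, [page]))  # type changed: new document
    return documents


# Hypothetical training samples and a three-page bundle.
samples = {
    "W-2": ["Form W-2 Wage and Tax Statement employer wages tips federal income tax withheld"],
    "Bank Statement": ["Account statement beginning balance deposits withdrawals ending balance"],
}
pages = [
    "W-2 Wage and Tax Statement wages 52,000.00 federal income tax withheld 6,100.00",
    "Statement period beginning balance 1,200.00 deposits 3,000.00",
    "withdrawals 500.00 ending balance 3,700.00",
]
fingerprints = train(samples)
documents = separate(pages, fingerprints)
for doc_type, doc_pages in documents:
    print(doc_type, len(doc_pages))
```

One limitation of this simplified separation rule is that two back-to-back documents of the same type would be merged into one. A real system would need an additional signal, such as recognizing first pages, to split them.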

The End Result

When we look at the end product in our mortgage example, the output has two elements: documents and data. The documents are produced by our supervised, machine-learned separation and classification. In our bundled PDF example, a 1,000-page PDF goes in and separate documents come out, each named by its document type. The data extracted through supervised training is exported into a database, a line-of-business system or a generic format for consumption (XML or CSV).
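As an illustration of the generic export formats mentioned above, the following Python sketch names a separated document by its type and serializes extracted fields to CSV and XML using only the standard library. The file-naming pattern, the XML element names and the sample record are assumptions made for this example. They do not represent the product's actual export schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def output_name(doc_type, index):
    """Build a file name from the document type (hypothetical naming pattern)."""
    slug = doc_type.lower().replace(" ", "_")
    return f"{index:03d}_{slug}.pdf"


def to_csv(records, fieldnames):
    """Serialize extracted records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records):
    """Serialize extracted records to XML (element names are assumptions)."""
    root = ET.Element("documents")
    for record in records:
        doc = ET.SubElement(root, "document", type=record["doc_type"])
        for name, value in record.items():
            if name != "doc_type":
                ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical extracted data for one W-2.
records = [{"doc_type": "W-2", "employer": "Acme Corp", "wages": "52000.00"}]

print(output_name("Bank Statement", 2))
print(to_csv(records, ["doc_type", "employer", "wages"]))
print(to_xml(records))
```

Keeping the exporters as separate functions over the same list of records means a database or line-of-business connector could be added alongside them without changing the extraction step.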

With Ephesoft Transact for Mortgage, artificial intelligence offloads processing to the machine, automates the time-intensive tasks and significantly reduces processing time. For true Smart Capture®, financial services companies and loan originators can also add a specialized cloud-based mortgage module with over 600 pre-trained mortgage document types, for fast, easy deployment and continued processing speed.

Learn more about AI and supervised machine learning in Mortgage and Financial Services.
