From a Content Management perspective, document capture has traditionally been a form of “on-ramping.” Capture was a pre-processing step in which you could scan documents, extract key index fields, and ingest the documents and data. In the past few years, however, capture has advanced to the stage where it can be an integral component of a Content Management strategy.

Capture is more than just a transaction-based technology. We’re entering a time when the cumulative effect of document-based transactions means more than simply automating or accelerating individual processes: there is an enormous amount of intelligence that can be harvested from unstructured content and from the cadence of transactions that documents represent. Data gleaned from documents can therefore be used for more than finding the document again when needed; it becomes an information resource for business analytics and decision making.

Document capture is now morphing into a third stage of maturity. Initially, documents were scanned to archive, meaning the primary purpose was to digitize documents as a record in case they were needed for future reference. For the past decade, the emphasis has been on scanning to process: capturing documents earlier in their lifecycle (using distributed scanners at the point of origin) and performing workflow processes with the document in an electronic format. The added benefits are reduced paper handling, faster business process throughput, and fewer errors, since digital workflows can be programmed to identify missing and inaccurate information and to automate rejection and correction re-routing.

The latest incarnation of document capture is scanning to insight. In this model, you achieve the collective benefits of archiving and process automation, but you also add the ability to analyze all of the information formerly trapped in an unstructured format. Instead of capturing a few index fields to either identify the document or perform a transaction, advanced capture can be used to extract relevance and meaning from all of the text contained in a document set. Moreover, the actual history and cadence of transactional documents (how and when they are submitted, the actions performed upon them, approvals, and database lookups) can be ingested by Content Management solutions for further analysis.

We are finding that more and more large organizations are looking at advanced capture technologies as a key component of content management. In fact, many of Ephesoft’s larger implementations have the document classification and data extraction functionality embedded in the organization’s Content Management platform. For these companies, capture is not merely a front-end attachment for a content repository. Documents are ingested throughout workflow processes, reflecting the fact that not all of the documents a business process needs exist or are available when the process begins.

For example, in a mortgage approval process, a loan originator may obtain a credit application, W-2, and tax records from the potential borrower to kick off the process. Later on, appraisals, property inspections, good faith estimates, and similar supporting documents are often required. Rather than wait for all of the corresponding documentation to be collected and batch captured, a lender can bring documents into an already active workflow. The added benefit is that by capturing all of the information contained in these documents (either at the point the document is introduced into the content management system, or later, after the workflow is complete), the organization has a wealth of additional data to use for decision making, including potential customer and prospect marketing, portfolio health analysis, investment decisions, compliance, and fraud detection.
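To make the mortgage example concrete, here is a minimal sketch of documents joining an already active workflow as they arrive. It is only an illustration under stated assumptions: the `LoanWorkflow` class, the `REQUIRED_DOCS` list, and the extracted field names are hypothetical and do not represent Ephesoft's product or any particular Content Management platform's API.

```python
from dataclasses import dataclass, field

# Hypothetical set of document types a lender might require (an assumption
# made for illustration, loosely following the example in the text).
REQUIRED_DOCS = {
    "credit_application", "w2", "tax_records",
    "appraisal", "property_inspection", "good_faith_estimate",
}


@dataclass
class LoanWorkflow:
    """A toy stand-in for an active loan workflow in a content repository."""
    loan_id: str
    documents: dict = field(default_factory=dict)  # doc type -> extracted data

    def ingest(self, doc_type: str, extracted: dict) -> None:
        """Attach a captured document (and its extracted data) to the workflow."""
        self.documents[doc_type] = extracted

    def missing(self) -> set:
        """Return the required document types that have not arrived yet."""
        return REQUIRED_DOCS - set(self.documents)


# The originator kicks off the process with the initial documents.
workflow = LoanWorkflow("loan-001")
workflow.ingest("credit_application", {"applicant": "J. Doe"})
workflow.ingest("w2", {"tax_year": 2012})
workflow.ingest("tax_records", {"tax_year": 2012})

# Later, an appraisal arrives and joins the workflow that is already running.
workflow.ingest("appraisal", {"appraised_value": 250000})

print(sorted(workflow.missing()))
# ['good_faith_estimate', 'property_inspection']
```

In this sketch, the data extracted from every document stays attached to the loan record, which is what would make it available for later analysis rather than only for completing the transaction.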

It is commonly accepted that less than 20% of corporate information is in a structured format. An even smaller share of available information is actually subject to analysis: a 2012 IDC study estimates less than 0.5%. By integrating advanced capture technology with content management, companies can dramatically increase the pool of available data for improving process cycle times and making better business decisions. In many cases this is data available only to them, which makes it valuable intellectual property.