As the information age rapidly evolves, Ephesoft offers this glossary of the terms you need to know to understand and use intelligent document automation.

Adaptive AI

Adaptive AI is a specific class of artificial intelligence solutions. These technologies can continuously learn and “self-update” by analyzing historical data and applying it to new data sets.


API

APIs, or application programming interfaces, are defined interfaces through which software programs exchange requests and data. APIs allow intelligent document processing software to share information with other applications so that information flows seamlessly between systems.

Artificial Intelligence (AI)

Artificial intelligence refers to technologies that can mimic human intelligence on a broad scale. IDP solutions use several types of AI software, such as natural language processing (NLP) and machine learning (ML).

Automation Hero

At Ephesoft, an automation hero is an individual who spearheads hyperautomation efforts within their organization. These individuals work with our experts to identify and implement the right automation solutions for the unique needs of their company.


Batch

A batch is a large grouping of records or documents that should be processed together, rather than having each record processed separately. At Ephesoft, a “batch class” refers to a project in Transact. A “batch instance” refers to a job in Transact.


Classification

Classification is a critical step when determining how to process a particular document. Classification within IDP refers to the process of identifying a document based on the characteristics or metadata it contains.

Cloud Hybrid

Also known as “hybrid cloud,” a cloud hybrid is a computing environment that leverages both on-premises resources and the public cloud. The combination offers peak performance, scalability and minimized upfront investment. Ephesoft launched the first cloud hybrid platform for document processing, called Cloud HyperExtender.

Cognitive Document Processing

The most advanced forms of IDP software, such as those offered by Ephesoft, perform cognitive document processing. This means that the technology mimics the cognitive abilities of humans by using AI, machine learning and NLP tools.

Computer Vision

Computer vision technology is a field of AI. Computer vision software allows computers to analyze and derive information from digital videos, images or other visual input sources.


Contextual AI

Certain types of AI software are described as being “contextual.” This means that the software can process documents by using additional information from historical data and other sources to improve its accuracy and, as the word suggests, provide context.

Data Extraction

Data extraction is the process of retrieving information from physical and digital documents. Once extracted, the data can facilitate the automation of various business processes.

Data of Interest (DOI)

Data of interest is the result of adaptive AI and includes all data pairs that a user or system needs in order to understand a document’s purpose, how to process it and how it is associated with the organization.

Day Zero Accuracy (DZA)

Day Zero Accuracy is a measure of a system’s accuracy on the first day and is helpful in understanding time to value and how quickly a system will begin returning value or ROI. IDP solutions with a high DZA enable companies to begin capturing results with high levels of accuracy as soon as the solution is implemented. A system that delivers incredibly accurate results when activated provides immediate benefits to an organization.

Digital Transformation

Digital transformation is the practice of integrating technologies into all business practices to increase efficiency and profitability.

Digitization vs. Digitalization

Digitization involves converting specific documents and images into a digital format. In contrast, digitalization is a broader term that refers to the act of transitioning away from analog systems.

Document Change Rate (DCR)

DCR is a measure of the number of fields that must be corrected or revised during manual review. DCR per document is the total number of fields in a given document that the model fails to predict correctly. This is a good indication of how accurate an IDP solution is and of the level of confidence that can be placed in the system.
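As a rough illustration of the per-document count, the sketch below compares the fields a model extracted with the values left after manual review and counts the fields that had to be corrected. The function name `document_change_count` and the sample field values are hypothetical, and this is only one possible reading of the metric, not Ephesoft's implementation.

```python
def document_change_count(predicted: dict, reviewed: dict) -> int:
    """Count fields whose value was changed or added during manual review."""
    changed = 0
    for field, correct_value in reviewed.items():
        # A missing prediction counts as a miss, just like a wrong value.
        if predicted.get(field) != correct_value:
            changed += 1
    return changed


# Hypothetical invoice: the model got one field wrong and missed another.
predicted = {"InvoiceNumber": "INV-1001", "TotalAmount": "150.00"}
reviewed = {"InvoiceNumber": "INV-1001", "TotalAmount": "180.00", "TaxRate": "8%"}

print(document_change_count(predicted, reviewed))  # 2
```

A lower count per document suggests that reviewers are spending less time on corrections.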


Extraction

Extraction is the process of identifying and capturing specific index fields (metadata) within documents. Examples of index fields are Company Name, Address, Total Amount and others.

Field Change Rate (FCR)

The Field Change Rate is the average number of times that a field is revised on a manually reviewed document. For each given key (e.g., InvoiceNumber, TaxRate), it looks at how often the values need to be added, changed or modified.
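One way to picture a per-key rate is to track, across a set of reviewed documents, how often each key needed a correction. The sketch below makes a simplifying assumption: it reports the fraction of reviewed documents in which a key's value was added or changed. The function name `field_change_rates` and the sample data are hypothetical.

```python
from collections import Counter


def field_change_rates(documents: list[tuple[dict, dict]]) -> dict[str, float]:
    """Return, per key, the fraction of reviewed documents where its value changed."""
    seen = Counter()     # documents in which the key appears after review
    changed = Counter()  # documents in which the reviewer added or changed it
    for predicted, reviewed in documents:
        for key, correct_value in reviewed.items():
            seen[key] += 1
            if predicted.get(key) != correct_value:
                changed[key] += 1
    return {key: changed[key] / seen[key] for key in seen}


# Two hypothetical reviewed invoices as (predicted, reviewed) pairs.
docs = [
    ({"InvoiceNumber": "A1", "TaxRate": "5%"}, {"InvoiceNumber": "A1", "TaxRate": "8%"}),
    ({"InvoiceNumber": "B2"}, {"InvoiceNumber": "B2", "TaxRate": "8%"}),
]
rates = field_change_rates(docs)
print(rates)  # {'InvoiceNumber': 0.0, 'TaxRate': 1.0}
```

Keys with a high rate are natural candidates for retraining or for closer manual review.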

Handwriting Recognition (HWR)

Handwriting recognition software is a valuable component of IDP technology. HWR tools allow the software to automatically recognize and extract information from handwritten documents.

Human-in-the-Loop (HITL)

Human-in-the-loop or HITL is a method of information processing that uses a combination of machine learning capabilities and human input in order to increase its efficiency. IDP solutions should provide efficient HITL support since AI-based IDP platforms are not 100% accurate and must accommodate exceptions.
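A common way to combine machine output with human input is to send only low-confidence results to a person. The sketch below assumes each extracted field comes with a confidence score between 0 and 1. The threshold value, the function name `route_fields` and the sample data are all assumptions for illustration, not a description of any specific product.

```python
REVIEW_THRESHOLD = 0.90  # assumed cutoff; real systems would tune this


def route_fields(extracted: dict[str, tuple[str, float]]) -> tuple[dict, dict]:
    """Split extracted fields into auto-accepted values and ones needing review."""
    accepted, needs_review = {}, {}
    for name, (value, confidence) in extracted.items():
        if confidence >= REVIEW_THRESHOLD:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


extracted = {
    "InvoiceNumber": ("INV-1001", 0.99),
    "TotalAmount": ("180.00", 0.62),  # low confidence, e.g. a smudged scan
}
accepted, needs_review = route_fields(extracted)
print(accepted)      # {'InvoiceNumber': 'INV-1001'}
print(needs_review)  # {'TotalAmount': '180.00'}
```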


Hyperautomation

Hyperautomation is a strategy that involves identifying and automating as many IT and business processes as possible. Hyperautomation uses multiple technologies, platforms and tools to facilitate this broad-scale automation.


Indexing

Indexing is a legacy approach that involves manually linking a file with a specific tag so that it can easily be located in the future. IDP solutions have automated much of this process and enable documents to be indexed with greater accuracy, efficiency, and speed.


Ingestion

Ingestion is the first phase of document processing. IDP ingestion is the process of accumulating documents for analysis and data extraction.


Integration

Integration is the process of interconnecting various applications, devices, APIs, etc. An IDP solution integrates with other digital resources to allow for process automation and the ability to share information between solutions.

Integration Engine

An integration engine receives data from multiple systems, modifies this information and relays it to other applications.

Intelligent Automation

Intelligent automation involves automating redundant processes using AI technologies. Cumulatively, these technologies allow businesses to streamline decision-making.

Intelligent Capture

IDP technology uses intelligent capture to identify and automatically extract critical information from electronic and paper documents without human intervention other than managing exceptions.

Intelligent Character Recognition (ICR)

ICR allows IDP solutions to learn various fonts and handwriting styles over time. This increases their processing accuracy and their ability to recognize handwriting.

Intelligent Document Processing (IDP)

IDP is a software service that extracts important data from digital and physical documents through data capture technology.

iPaaS (integration Platform as a Service)

iPaaS is a platform that standardizes the integration of applications across an entire organization. iPaaS is designed to help organizations integrate their various SaaS (software as a service) applications.


Key-Values

Key-values are a means of storing nonrelational data in groups or key-value pairs. This method allows organizations to store massive amounts of data.
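A minimal sketch of the idea, using a plain Python dictionary as a stand-in for a key-value store. The keys and record contents are invented for illustration.

```python
# A key-value store modeled with a plain dict: each key maps to one value,
# and records do not have to share a fixed schema.
store = {}
store["invoice:1001"] = {"vendor": "Acme Corp", "total": 180.00}
store["invoice:1002"] = {"vendor": "Globex", "total": 75.50, "po_number": "PO-77"}

# Lookups go straight to the key rather than through a relational query.
print(store["invoice:1002"]["po_number"])  # PO-77
print(store.get("invoice:9999"))           # None
```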

Knowledge Graphs

This term refers to a general-purpose knowledge base and is commonly used in areas such as knowledge representation, knowledge acquisition, natural language processing, ontology engineering and the semantic web. A knowledge graph represents a collection of entities, or data of interest, along with how those entities are related and information about them, known as metadata. Today, knowledge graphs are used extensively in anything from search engines and chatbots to product recommenders, cognitive automation and other AI-based services.

Line Item Matching

Line item matching is a function of IDP and hyperautomation solutions. These technologies can ensure that invoices and purchase orders, or other sets of corresponding documents, match.
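To make the matching step concrete, the sketch below checks each invoice line against the purchase order line with the same SKU and reports any differences. The field names (`sku`, `qty`, `unit_price`) and the function `match_line_items` are hypothetical; real solutions typically also handle tolerances, partial shipments and fuzzy descriptions.

```python
def match_line_items(invoice_lines: list[dict], po_lines: list[dict]) -> list[str]:
    """Return human-readable mismatches between an invoice and its purchase order."""
    po_by_sku = {line["sku"]: line for line in po_lines}
    problems = []
    for line in invoice_lines:
        po_line = po_by_sku.get(line["sku"])
        if po_line is None:
            problems.append(f"{line['sku']}: not on purchase order")
        elif (line["qty"], line["unit_price"]) != (po_line["qty"], po_line["unit_price"]):
            problems.append(f"{line['sku']}: quantity or price differs")
    return problems


po = [{"sku": "A-1", "qty": 10, "unit_price": 2.50}, {"sku": "B-2", "qty": 1, "unit_price": 99.00}]
invoice = [{"sku": "A-1", "qty": 10, "unit_price": 2.50}, {"sku": "B-2", "qty": 2, "unit_price": 99.00}]

print(match_line_items(invoice, po))  # ['B-2: quantity or price differs']
```

An empty result means every invoice line agrees with the purchase order.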

Machine Learning (ML)

Machine learning is a field of computer science and a subset of AI that involves using algorithms that can learn and adapt. Machine learning technologies evolve over time to become more accurate and efficient.


Metadata

Metadata is descriptive information about specific data. Examples of metadata include who authored a file, when or how it was created, what it is about and more.

Natural Language Processing (NLP)

Natural language processing technologies can process and analyze natural language, such as the text contained in physical and digital documents.

Normalization / Normalized / Normalize

Machine learning technologies must normalize data so that it is usable. The normalization process is sometimes referred to as “data cleaning.” Normalized data appears similar across various fields and records so that the software can adequately process it.
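As a small example of what normalization can look like in practice, the sketch below converts dates and currency amounts written in different styles into one consistent form. The list of accepted date formats and the function names are assumptions chosen for illustration.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")  # assumed input formats


def normalize_date(raw: str) -> str:
    """Convert a date written in any known format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> float:
    """Strip a dollar sign and thousands separators from an amount."""
    return float(raw.replace("$", "").replace(",", "").strip())


print(normalize_date("03/15/2023"))     # 2023-03-15
print(normalize_date("15 March 2023"))  # 2023-03-15
print(normalize_amount("$1,250.50"))    # 1250.5
```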

Optical Character Recognition (OCR)

OCR software or services convert raw images of text into machine-encoded text along with positional information so that these raw files can be further processed, e.g., analyzing the text to extract meaningful data.

Optical Mark Recognition (OMR)

Whereas OCR technology “reads” written text and images, OMR software reads marks, such as filled-in checkboxes, on tests, surveys and similar forms. OMR technology is used to process multiple-choice tests and similar documents.

Perfect Recall Documents (PRD)

Perfect recall documents are files that have been processed with perfect accuracy. In order to be considered PRD files, the documents must be able to be recalled or queried with 100% accuracy.


Post-Processing

Post-processing is the final phase of IDP. During this phase, extracted data is examined using a set of validation rules and AI-based processes.


Pre-Processing

Pre-processing is the first stage of IDP. During this stage, the quality of the documents is enhanced so that they can be more accurately classified and analyzed.


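Recall

Recall measures how well a model or system finds the items it should find. It is the number of true positives divided by the total number of actual positives (true positives plus false negatives).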
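Recall is conventionally computed as true positives divided by the sum of true positives and false negatives. The sketch below shows that arithmetic. The function name and the sample counts are illustrative only.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        return 0.0  # convention chosen here; recall is undefined with no positives
    return true_positives / actual_positives


# Hypothetical batch: 100 documents really are invoices, and 90 were identified as such.
print(recall(true_positives=90, false_negatives=10))  # 0.9
```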


Repository

A repository is one of the possible storage locations for documents.

Robotic Process Automation (RPA)

Robotic process automation is a technology that automates repetitive business processes using bots, also called digital workers or robots. This software can be programmed to perform a wide array of repetitive tasks.

Semantic Data

Semantic data is more than just a piece of data. Semantic data includes the meaning and intent surrounding the data. It is the deep, multi-dimensional understanding of data and the relationships between the entities. Understanding the semantic data can lead to accelerated insight through AI and machine learning.

Semantic Technologies

Semantic technologies are designed to help computer software better understand various data types. These technologies are integral to machine learning and automation and are often used to create and deploy knowledge graphs.

Semi-Structured Data

Semi-structured data is information that does not conform to specific data models but has some general structure, such as tags or other identifying markers. Semi-structured data has a structure that might be discernible to the human eye but isn’t easily processed by a computer system.

Straight-Through Processing (STP)

Straight-through processing occurs when an automated process can be performed with zero manual or human intervention. STP is a measure of how accurate and efficient an intelligent document processing system is versus how much manual validation and intervention is required.

Structured Data

Structured data is clearly defined groups of information that can easily be searched or queried because they adhere to repeatable patterns.

Unstructured Data

Unstructured data includes all information that does not follow a formal pattern or data model. This data cannot easily be searched and typically includes audio files, emails, physical documents, faxes, PDFs, etc.


Validation

Validation is the process of confirming that information gathered from documents is accurate and correct. Validation can be performed manually or through the use of AI technologies.
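Automated validation is often expressed as a set of rules applied to the extracted fields. The sketch below shows two such rules, a format check and a cross-field arithmetic check. The invoice number pattern, the field names and the function `validate_invoice` are assumptions made for this example.

```python
import re


def validate_invoice(fields: dict) -> list[str]:
    """Apply simple rules to extracted fields and return any failures."""
    errors = []
    # Format check: assumed pattern of "INV-" followed by digits.
    if not re.fullmatch(r"INV-\d+", fields.get("InvoiceNumber", "")):
        errors.append("InvoiceNumber has an unexpected format")
    # Cross-field check: subtotal plus tax should equal the stated total.
    expected_total = round(fields["Subtotal"] + fields["Tax"], 2)
    if expected_total != round(fields["Total"], 2):
        errors.append("Total does not equal Subtotal + Tax")
    return errors


good = {"InvoiceNumber": "INV-1001", "Subtotal": 100.00, "Tax": 8.00, "Total": 108.00}
bad = {"InvoiceNumber": "1001", "Subtotal": 100.00, "Tax": 8.00, "Total": 118.00}

print(validate_invoice(good))  # []
print(validate_invoice(bad))   # both rules fail for this record
```

Records that return an empty list can continue automatically, while the rest can be flagged for manual review.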

Workflow Automation

Workflow automation refers to the process of systematically designing and automating a specific series of tasks. Workflow automation can be used to increase productivity and efficiency. For instance, workflow automation allows businesses to auto-generate essential documents, assign tasks, sign these documents and distribute data across various systems.

