You may have already heard the buzz about Ephesoft’s latest release of its big data document analytics platform, Ephesoft Insight 3.1. Insight is a data exploration and interaction tool that surfaces hidden, unknown or hard-to-find connections among multiple data points. The new version focuses on features that help prevent fraudulent activity, build a comprehensive view of data sets and their relationships, and give users more tools and customization options.

Insight is a big data document analytics and business intelligence platform that leverages Ephesoft’s patented machine learning algorithms to extract meaningful and actionable information from an often-untapped resource: unstructured data on documents and images in content management systems, content repositories and other network storage. Government agencies, healthcare organizations, banks, and mortgage and insurance companies typically hold large volumes of unstructured data and are well positioned to gain a competitive advantage from the technology.

The latest release allows customers to easily find trends, patterns and data relationships in their documents so they can avoid risks and uncover opportunities. In addition to serving verticals and organizations that handle high volumes of documents, the technology is aimed at commercial Know Your Customer (KYC) programs, helping banks detect threats or “unhealthy” relationships that corporate customers may be involved in.

Similarly, Europe’s GDPR, which takes effect in May 2018, requires companies to know about and have access to all customer data for privacy protection. By some estimates, about 80% of a typical company’s data sits in unstructured documents and is therefore not easily accessible, which could put organizations in jeopardy of violating GDPR. Insight 3.1 can help reduce exposure and uncover potential risks using its filters and crawlers.

Let’s take a look at a few of the new features, including what they mean, their benefits and example use cases:

  1. Anomalies and Outlier Detection

Insight now provides an out-of-the-box outlier detection tool for data analytics. Using supervised machine learning to compute statistical parameters, Insight identifies values that fall outside the normal range, measured in standard deviations. With an easy-to-use interface for creating outlier reports, Insight supports the discovery of two types of anomalies: numeric and categorical (textual).

Example Application: A user can teach Insight how much a medical procedure costs within a set of insurance Explanation of Benefits (EOB) documents. If a claim is submitted and the cost falls well above or below the typical amount, it is automatically flagged. No data scientist is required to crunch the numbers. Another use case, for mortgage companies, would be to train the system to detect fraudulent loans using several examples of loans that defaulted, so that new loans with the same properties are flagged. The same approach could be used to detect fraudulent credit card applications.
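To make the idea concrete, here is a minimal sketch of both anomaly types in Python. It is not Insight’s implementation: the function names, the two-standard-deviation threshold, the 5% rarity cutoff and all sample values are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def flag_numeric_outliers(history, new_claims, k=2.0):
    """Flag claims whose cost is more than k standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    return [
        claim_id
        for claim_id, cost in new_claims
        if sigma > 0 and abs(cost - mu) > k * sigma
    ]


def flag_rare_categories(values, min_share=0.05):
    """Flag textual values that make up less than min_share of all observations."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < min_share)


# Hypothetical historical costs for one procedure, taken from past EOB documents.
history = [1200, 1150, 1300, 1250, 1180, 1220, 1275, 1190]
# Hypothetical incoming claims as (claim id, billed cost).
new_claims = [("EOB-001", 1210), ("EOB-002", 4800), ("EOB-003", 300)]
numeric_flags = flag_numeric_outliers(history, new_claims)

# Hypothetical textual field with one rare (possibly mistyped) value.
network_status = ["in-network"] * 30 + ["out-of-network"] * 9 + ["in-netwrok"]
category_flags = flag_rare_categories(network_status)

print(numeric_flags)
print(category_flags)
```

In this sketch, the claims billed at 4800 and 300 are flagged because they sit far from the historical average, while the 1210 claim passes. The categorical check flags only the value that appears once in forty records.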

  2. Multiple Data Relationships with Visualization

Insight shows visual relationships between documents and records within an organization’s pool of content, using mind maps, heat maps, graphs, charts and links, in a way that is relevant to the user and their organization. Users have a new data exploration tool that can dynamically examine tables for hidden relationships.

Example Application: The system can learn to extract social security numbers from a series of forms and use them as the unique value that identifies each person. Not all forms will carry a social security number, but the system can still connect documents to each other through other common factors. This type of data relationship can be valuable when locating all of a customer’s data to meet GDPR privacy requirements. A second example would be finding someone on a blacklist by cross-referencing a passport number (the unique value) with a secondary data point, such as an address or signature.
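One way to picture this kind of linking is as grouping documents that share any identifying value, even indirectly. The sketch below uses a simple union-find structure. The field names, document IDs and sample values are hypothetical, and this is an illustration of the general technique rather than how Insight builds its relationship view.

```python
from collections import defaultdict


def link_documents(docs, keys=("ssn", "passport", "address")):
    """Group documents that share at least one identifying value, directly or transitively."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first document that carried it
    for doc_id, fields in docs.items():
        for key in keys:
            value = fields.get(key)
            if not value:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                union(first_seen[token], doc_id)
            else:
                first_seen[token] = doc_id

    groups = defaultdict(list)
    for doc_id in docs:
        groups[find(doc_id)].append(doc_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical extracted fields; only form-A and form-D carry a social security number.
docs = {
    "form-A": {"ssn": "123-45-6789", "address": "12 Oak St"},
    "form-B": {"address": "12 oak st", "passport": "X1234567"},
    "form-C": {"passport": "X1234567"},
    "form-D": {"ssn": "987-65-4321"},
}
linked = link_documents(docs)
print(linked)
```

Here form-C has no social security number and no address, yet it still lands in the same group as form-A because form-B shares an address with one and a passport number with the other.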

  3. Custom Knowledge Bases

Administrators can now apply knowledge bases and language packs directly to the application to support improved extraction, including the ability to create custom data dictionaries. Users can tailor their data sources so that they relate to their industry lexicon, project type or other information they are searching for. Specifically, with custom knowledge bases, a user can create a list of values or text patterns to match.

Example Application: A knowledge base containing known Service Codes or eCodes could help companies in the healthcare industry with patient and claim-related data extraction and analysis to detect patient health patterns or claim trends.
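As a rough illustration of “a list of values or patterns of text,” the sketch below models a knowledge base as a plain Python dictionary and matches it against a line of claim text. The structure, the service names and the `SC-` code format are all made up for this example and do not reflect Insight’s actual knowledge base format or any real code set.

```python
import re

# Hypothetical knowledge base: each field has a list of known values and text patterns.
KNOWLEDGE_BASE = {
    "service_code": {
        "values": {"office visit", "x-ray", "physical therapy"},
        "patterns": [re.compile(r"\bSC-\d{4}\b")],  # invented code format
    },
}


def extract(text, knowledge_base):
    """Return (field, match) pairs for every known value or pattern found in the text."""
    hits = []
    lowered = text.lower()
    for field, entry in knowledge_base.items():
        # Known values are matched case-insensitively as plain substrings.
        for value in sorted(entry["values"]):
            if value in lowered:
                hits.append((field, value))
        # Patterns are matched against the original text.
        for pattern in entry["patterns"]:
            hits.extend((field, match) for match in pattern.findall(text))
    return hits


text = "Claim 88: X-ray billed under SC-2041, follow-up Office Visit under SC-0007."
hits = extract(text, KNOWLEDGE_BASE)
print(hits)
```

Swapping in a different dictionary, for example one keyed to mortgage or insurance terms, changes what is extracted without touching the matching code, which is the appeal of keeping industry vocabulary in a separate knowledge base.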

  4. Multi-Language Recognition and Extraction

In the new release, Insight supports the analysis of multiple languages within a single document at the extraction level. Global organizations no longer need to sort and separate multi-language documents, thanks to Insight’s new auto-language detection.

Example Application: Organizations that have invoices in multiple languages can use Insight to extract information to pinpoint possible money laundering or fraudulent activities that otherwise wouldn’t be detected.

While other features will further help organizations of all kinds, the ones discussed here are the most informative and the most broadly applicable across industries. The new release also improves scalability, security, speed, import tools and high availability. Organizations that use Insight to sift through their big data will ultimately gain a competitive advantage, with the ability to foresee issues and make the best decisions based on accurate analytics.

For questions or to schedule a personal demo of Insight 3.1, please email

Insight 3.1 Mind Map

Insight 3.1 Relationship View