What is GDPR?

The General Data Protection Regulation, or GDPR, goes into effect on May 25th of this year. It is an important and historic regulation designed, at its heart, to protect the privacy of EU citizens and their data stored in EU organizations.

But GDPR isn’t only the concern of European countries. Organizations all over the world will be affected by this regulation, provided there is any exchange of data with European companies. So even if your organization is not at direct risk of sanctions under GDPR, you could find yourself in a situation where European-based companies will not conduct business with you if you have not proven compliance with these strict data protection policies.

The Stick

In order to add weight to the data protection and policy mandate, GDPR outlines severe fines for non-compliance.

I’m certain you’ve heard about the potential sums threatened to ensure GDPR compliance. Depending on the type of violation, the fine can reach the greater of 10 million euros or two percent of the company’s annual turnover from the previous year, or, in the higher tier, the greater of 20 million euros or four percent. I’ve personally heard this threat repeated so often that it has lost a bit of its impact and significance. So let’s take a look at a real-world example.

This month, Facebook was in the news for a data breach that impacted millions of users of the social media site. The day the news broke, Facebook stock prices took a 7% hit, countless allegations were made and lawsuits were threatened. I imagine repercussions of this breach will play out in the courts and through mediators for years to come. But all of this came to light before the May 25th deadline. What would a GDPR response look like?

According to Facebook’s 2017 annual financial report, released in January, the company’s total annual revenue was $40.653 billion, or roughly 33 billion euros at the exchange rates of the time. Given the nature of this alleged data breach (which involves data transfer and sharing with a third party, as well as the rights of the site’s users), Facebook would be subject to the higher fine tier: the greater of 20 million euros or 4% of annual revenue. Four percent of about 40.7 billion dollars is nothing to sneeze at. In addition to all the social rebuke, stock price drop, and undoubtedly significant legal fees, Facebook could have a fine of about $1.6 billion levied upon it.
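The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration of the "greater of a flat cap or a percentage of turnover" rule as quoted above; the names `TIERS` and `max_fine` are made up for this example, and, like the estimate in the text, it applies the percentage directly to the dollar revenue figure without any currency conversion.

```python
# Illustrative only: flat caps and percentages are the figures quoted in the text.
TIERS = {
    "lower": (10_000_000, 2),   # (flat cap, percent of annual turnover)
    "higher": (20_000_000, 4),
}

def max_fine(annual_turnover: int, tier: str = "higher") -> int:
    """Return the greater of the tier's flat cap or its share of annual turnover."""
    flat_cap, percent = TIERS[tier]
    # Integer arithmetic avoids floating-point rounding on very large sums.
    return max(flat_cap, annual_turnover * percent // 100)

facebook_2017_revenue = 40_653_000_000  # revenue figure quoted in the text

print(max_fine(facebook_2017_revenue))   # 1626120000, about 1.6 billion
print(max_fine(100_000_000, "lower"))    # 10000000, the flat cap applies
```

For a company of Facebook’s size the percentage dominates; for a smaller company with 100 million in turnover, the flat cap is the larger number.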

The Carrot

In the Facebook scenario, we’re looking at a company whose value is derived from the monetization of consumer or customer personal data. This is not the case for most commercial organizations, but that doesn’t mean the data they hold or manage isn’t valuable.

GDPR was introduced not only to protect individual consumers, but also as a response to the increasing value of personal data. With the big data industry growing exponentially, countless studies have been conducted to calculate the worth of consumer data, including information like names, addresses, wages, employment and travel history, and educational background. Estimates of the value of an individual’s personal data range from approximately $3 to $480 per person. It’s the value of data that creates the need for both ethical and financial accountability.

On one side of the coin, we’re facing the threat of heavy fines, social censure, legal battles and additional internal (and external) oversight. But the other side of that same coin holds the potential for enormous gain. So what can we do in the immediate future to mitigate the risk of data exposure and GDPR sanctions while simultaneously getting the most out of the content we manage? And what challenges might we face while working toward compliance?

The Challenge

Perhaps the biggest challenge is the actual implementation of the GDPR. Compliance with the policies and regulations outlined by this historic data protection mandate will require comprehensive changes to business practices for all companies. Many organizations, especially those that don’t specifically focus on data mining or social media data collection, have never implemented comparable levels of privacy procedures. This holds especially true for non-European companies handling EU personal data.

If we consider that up to 80% of the information an organization manages is unstructured, stored in documents and flat files, the challenge grows. How can a company protect data if it doesn’t even know what kind of data it owns, let alone where all of that data is located? So, what can we do?

The Solution

Using the Ephesoft Smart Capture® platform, we can tackle an organization’s unstructured content, in the form of documents and records, to unlock the data stored within and meet the personal data identification and protection requirements outlined by GDPR.

Mitigating the risk of document data

The Ephesoft platform is designed to address perhaps the most challenging aspect of GDPR compliance: unstructured document data.

Searching for data within structured data sets like databases is fairly easy: locating instances of an individual’s personally identifiable information is as straightforward as a query. But think about cases where unstructured data formats are collected and stored. Consider onboarding processes for organizations where documents containing personal information must be collected and stored for records management purposes.


  • Credit card applications where bank statements need to be submitted.
  • Mortgage applications where utility bills or income statements are supplied.
  • Driver applications where images of individuals’ IDs and driver’s licenses are copied and kept on file.
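For contrast, here is what the structured case looks like in practice. This is a minimal sketch using Python’s built-in sqlite3 module; the `customers` table, its columns, the sample rows and the `find_person` helper are all hypothetical and exist only to show that, once data sits in a database, finding one person’s details is a single query. None of the document types listed above can be searched this way until their contents have been extracted.

```python
import sqlite3

# Hypothetical structured store; table name, columns and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers (full_name, email) VALUES (?, ?)",
    [("Ada Example", "ada@example.com"), ("Bob Sample", "bob@example.com")],
)

def find_person(conn, full_name):
    """Locate every stored record for one individual with a single query."""
    cursor = conn.execute(
        "SELECT id, full_name, email FROM customers WHERE full_name = ?",
        (full_name,),
    )
    return cursor.fetchall()

print(find_person(conn, "Ada Example"))  # [(1, 'Ada Example', 'ada@example.com')]
```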

Chaos. Order. Enlightenment.

The goal of the Ephesoft platform is to take that unstructured content and turn it into structured, searchable and correlated datasets. This unlocks the value of that data, PII or otherwise.

Avoid the stick (sanctions) and embrace the carrot: valuable data. Organizations that can extract data, and with it value and actionable information, from their unstructured content will be better positioned for success and better equipped for data-based decision making.


The Ephesoft product suite for intelligent content capture follows a document-specific workflow from content ingestion to document processing and through to data analytics.

First, we crawl the document and record repositories where PII or potentially risky customer, employee and consumer data may be stored. After document ingestion, the Ephesoft platform separates the documents, categorizes the records, and OCRs documents that are not text-searchable, utilizing machine learning and full-text analysis of the content on each page processed.

Upon document classification, multi-dimensional analysis is performed to identify and extract key values that fall under the purview of data protection as outlined by GDPR. Supervised machine learning algorithms seek patterns in text, compare values in a document against data dictionaries, look for spatial relationships between targeted extraction fields and anchor values, and analyze the content of blocks of text.

The result of these behind-the-scenes document analysis processes is a set of structured data, stored in a relational database. Data compliance officers and managers now have a breakdown of their content repositories, and of the instances of PII-related data within them, at their fingertips. Data is searchable and filterable. Document-level data relationships may be identified automatically using machine learning or manually defined by an end user.
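To make the idea of turning document text into structured, queryable records more concrete, here is a deliberately simplified Python sketch. It is not Ephesoft’s implementation or API: it swaps the supervised machine learning described above for a few hand-written rules, and every name in it (`EMAIL_PATTERN`, `ANCHORS`, `CITY_DICTIONARY`, `extract_pii`, `store`, the `pii_hits` table) is hypothetical. It imitates three of the techniques mentioned: pattern matching, dictionary lookup, and values located relative to an anchor label. The input is assumed to be text that has already been OCRed.

```python
import re
import sqlite3

# Pattern matching: a rough e-mail shape (not a full RFC-compliant validator).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Anchor values: a known label followed by the value we want to capture.
ANCHORS = {
    "date_of_birth": re.compile(r"Date of birth:\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "account_number": re.compile(r"Account (?:no|number)[.:]?\s*(\d{6,12})", re.IGNORECASE),
}

# Data dictionary: a tiny made-up list of known city names.
CITY_DICTIONARY = {"berlin", "madrid", "paris"}

def extract_pii(doc_id, text):
    """Return (doc_id, field, value) rows for each PII-like value found."""
    rows = [(doc_id, "email", m.group(0)) for m in EMAIL_PATTERN.finditer(text)]
    for field, pattern in ANCHORS.items():
        rows += [(doc_id, field, m.group(1)) for m in pattern.finditer(text)]
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() in CITY_DICTIONARY:
            rows.append((doc_id, "city", word))
    return rows

def store(conn, rows):
    """Persist extracted values so they become searchable and filterable."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pii_hits (doc_id INTEGER, field TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO pii_hits VALUES (?, ?, ?)", rows)

sample_text = (
    "Statement for Ada Example\n"
    "Account number: 12345678\n"
    "Date of birth: 01/02/1980\n"
    "Contact: ada@example.com\n"
    "Branch: Berlin\n"
)

conn = sqlite3.connect(":memory:")
store(conn, extract_pii(1, sample_text))

# A compliance officer's question: which documents contain an account number?
hits = conn.execute(
    "SELECT doc_id, value FROM pii_hits WHERE field = 'account_number'"
).fetchall()
print(hits)  # [(1, '12345678')]
```

Real documents are far messier than this sample, which is why trained models and spatial analysis are used in place of fixed rules; the sketch only shows the shape of the output, one row per detected value, that makes the content searchable.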

The fear of the unknown and risk of exposure as it relates to private and sensitive information protected under GDPR can be eliminated for this segment of an organization’s content.

Thank you again for your time and attention. Please feel free to reach out directly via info@ephesoft.com or to fill out a contact form on the Ephesoft website if you would like a follow-up call or demonstration.