
Just how valuable is document content? Most organizations cannot answer this question. Why? This type of content typically exists in an unstructured state. Organizations may know bits and pieces of what their documents hold, but most lack four distinct elements that make content truly valuable: Classification, Metadata, Searchability and Correlation.

Understanding and having access to documents is important because the information they contain can hold insights into customers, service, trends, risks and opportunities. To glean maximum value from any document, all four elements must be available to the organization, and there must be a single, unified catalyst technology to expose all of them. Below is a quick overview of each element.


Classification, or categorization, answers the most basic content value question: “What is this?” When I know the type of document, I know how to apply security, how to set retention schedules, what workflows might be applicable and its overall importance to the business. Even more importantly, I can now add further elements of value. For a contract, knowing the expiration date and the terms is extremely valuable to the business. This layering of the elements of value can help an organization maximize document data usage and increase the business impact of content.
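As a rough illustration of how a known document type can drive security, retention and workflow decisions, here is a minimal Python sketch. The `Policy` fields, the document types and every value in the `POLICIES` table are hypothetical; a real organization would define its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Handling rules attached to a document type (illustrative fields)."""
    security_level: str
    retention_years: int
    workflow: str


# Hypothetical policy table: all names and values are assumptions.
POLICIES = {
    "contract": Policy("restricted", 10, "legal_review"),
    "invoice": Policy("internal", 7, "accounts_payable"),
}

# Fallback for document types that have not been classified yet.
DEFAULT_POLICY = Policy("internal", 1, "manual_triage")


def policy_for(doc_type: str) -> Policy:
    """Return the handling policy for a classified document type."""
    return POLICIES.get(doc_type.lower(), DEFAULT_POLICY)


print(policy_for("Contract"))
print(policy_for("memo"))
```

The point of the sketch is that everything downstream becomes a simple lookup once the question “What is this?” has been answered.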


Attributes (metadata), the key data extracted from a document, define it at a deeper level and can be extremely valuable. Metadata that is specific to a type of document extends that value. There is a caveat: these attributes need to be complete and pervasive. For example, if you have vendor metadata on only half of the invoices, the data is almost useless. Even more importantly, the information needs to be accurate and validated. A wrong invoice number is worse than no invoice number at all. Accurate metadata about a document drives context and enables core content business functions:

  • Where to store the document
  • Security rules
  • Metadata search
  • Workflow
  • Business rules and actions
  • Approvals and routing
  • Compliance and governance
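The completeness and accuracy caveats above can be checked mechanically. The following Python sketch measures how many documents carry a given field and flags values that fail a format check. The field names, the sample invoices and the `INV-` number pattern are all assumptions made for illustration.

```python
import re


def field_coverage(documents, field):
    """Fraction of documents with a non-empty value for `field`."""
    if not documents:
        return 0.0
    filled = sum(1 for doc in documents if doc.get(field) not in (None, ""))
    return filled / len(documents)


def invalid_values(documents, field, pattern):
    """Values of `field` that are present but do not match `pattern`."""
    return [
        doc[field]
        for doc in documents
        if doc.get(field) and not pattern.fullmatch(doc[field])
    ]


# Hypothetical invoice metadata; half of the records lack a vendor.
invoices = [
    {"invoice_number": "INV-001", "vendor": "Acme"},
    {"invoice_number": "INV-002", "vendor": ""},
    {"invoice_number": "1NV-003"},
    {"invoice_number": "INV-004", "vendor": "Globex"},
]

# Assumed number format, used only for this example.
INVOICE_PATTERN = re.compile(r"INV-\d{3}")

print(field_coverage(invoices, "vendor"))
print(invalid_values(invoices, "invoice_number", INVOICE_PATTERN))
```

A coverage figure like this makes the "half of the invoices" problem visible before the metadata is relied on for routing or compliance.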


Ensuring that a document is completely searchable makes sure the right user can find it at the right time. Having 30,000 scanned PDFs from your copier in your content services repository may be a good first step, but if you cannot search the content, the value drops significantly. However, searchability is not the only consideration for content value. This element, combined with the others, creates a powerful value multiplier effect. One note here: we typically associate this element with pure text searchability, which ties to human usage. Modern platforms create a format that also provides enhanced capabilities to applications and developers. Once a document is converted with an advanced document capture engine, it goes beyond pure text and contains a dimensional analytic fingerprint for broad and extensive computer use.
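To make plain text searchability concrete, here is a minimal inverted-index sketch in Python. It assumes the text has already been extracted from the scanned documents (for example by OCR); the document ids and sample text are hypothetical, and this is not how any particular capture platform works internally.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(texts):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token in the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical extracted text keyed by document id.
texts = {
    "scan-001": "Master services contract, expires 2025",
    "scan-002": "Invoice for pump installation",
    "scan-003": "Safety check report for pump installation contract",
}

index = build_index(texts)
print(sorted(search(index, "pump installation")))
print(sorted(search(index, "contract")))
```

Without the extraction step that produces `texts`, none of the scanned files could be found this way, which is the value gap the paragraph above describes.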


Correlation sits at the highest value level of the content elements. Correlation is a linkage, through structured data, that ties document content together: the digital breadcrumbs. Without the correct technology, this can be the most difficult and manual process an organization will undertake. To facilitate correlation, several key steps need to occur:

  1. Definition of a classification and extraction model.
  2. A conversion from unstructured to structured.
  3. Document relationship mapping.
  4. Transfer into a proper visualization tool.

Correlation value therefore relies on all of the above elements. For example, a company doing GDPR discovery may want to see all the documents tied to an initial banking application. Alternatively, an oil and gas entity that has a well failure may want to see all documents relating to the installation and safety checks of the failed component.
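The relationship-mapping step can be sketched in a few lines of Python once documents have been classified and their metadata extracted into structured records. The `application_id` key, the document types and the ids below are hypothetical and simply mirror the banking application example.

```python
from collections import defaultdict


def correlate(documents, key):
    """Group document ids by the value of a shared structured field."""
    groups = defaultdict(list)
    for doc in documents:
        value = doc.get(key)
        if value is not None:  # documents lacking the key cannot be linked
            groups[value].append(doc["id"])
    return dict(groups)


# Hypothetical structured records produced by classification and extraction.
documents = [
    {"id": "doc-1", "type": "application", "application_id": "APP-100"},
    {"id": "doc-2", "type": "id_proof", "application_id": "APP-100"},
    {"id": "doc-3", "type": "statement", "application_id": "APP-200"},
    {"id": "doc-4", "type": "letter"},
]

related = correlate(documents, "application_id")
print(related["APP-100"])
```

Note that `doc-4` drops out of every group because it has no `application_id`, which shows why incomplete metadata directly limits correlation.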

Now that we know about these four elements of value, the question becomes “How do we build a strategy to unlock the maximum value of our document content?”