Auto Key-Value Learning Plugins

Certified for: Ephesoft Transact 2022.1.01 or above

Introduction

The Auto Key-Value (KV) set of plugins addresses data drift, a common problem in document extraction where rules that once handled every data variation cannot adapt as new, unexpected data arrives. The plugins automatically derive new rules from the corrections made during validation, so the new variants are processed correctly in the future.

The Auto KV plugins work with the existing key-value extraction feature in Ephesoft Transact and are not intended as a replacement. The plugins help prevent additional work in the key-value extraction workflow by automatically creating new rules for new invoices from the Validation screen.

The set of plugins includes the following:

  • Auto KV Export Plugin: This plugin automatically generates KV extraction rules based on operator changes to monitored values during validation. This plugin is used with the Auto KV Extraction plugin.
  • Auto KV Extraction Plugin: This plugin automatically generates key-value rules to extract data from a document based on values previously selected by operators on exported documents. This plugin is used with the Auto KV Export plugin.

Prerequisites

  • Your batch class must already be set up and configured before you use these plugins.
  • Your batch class must be configured with Fuzzy DB, which is used to find the vendor. This information is used to create extraction templates based on user input in Validation. These rules are vendor-specific.
  • The Auto KV plugins should be used with other Ephesoft extraction capabilities for the best results, such as one of the following:
    • Key Value Extraction
    • Invoice Accelerator Batch Class (paid add-on)

Use Cases

Here are some use cases that work well with the Auto KV plugins:

  • Invoices — where you have a lookup table for vendors
  • Shipping documents — where you have a lookup table for shipping companies
  • Insurance statements — where you have a lookup table for insurance providers
  • Purchase orders — where you have a lookup table of your existing customers

Best Practices

The best practices in this document fall into three categories: accuracy for data types, static vs dynamic data, and single vs multi-page documents.

Accuracy for Data Types

Key takeaway: You will receive the best accuracy if your data patterns stay consistent across documents, and you associate the training during Validation with a specific vendor using a filter in the Auto KV plugins.

The following breakdown shows how extraction works for different data types.

Data Type: Numbers
  • Examples: Invoice Numbers, PO Numbers
  • Notes: Numbers need fewer iterations to produce accurate results. The number of digits is not taken into consideration when the regex rules are created, which makes these rules very flexible.

Data Type: Numbers with special characters
  • Examples: Currency Amounts, Tax Percentage, Phone Numbers
  • Notes: Accuracy improves as more samples are processed, because these data types occur in a limited number of formats. For example, the regex learned from $123.00 (\$\d+\.\d+) does not match $1,234.50, so a new rule is generated for that format.

Data Type: Single or Multi-line Alphanumeric Strings
  • Examples: Payment Terms, Street Addresses, Email Addresses
  • Notes: You will receive the best accuracy if the pattern of characters remains consistent.
  • Important: An alphanumeric string may not be a good candidate for Auto KV if the data varies widely between documents.

Data Type: Dates
  • Examples: Invoice Dates, Shipping Dates, Delivery Dates
  • Notes:
    • You will receive the best accuracy if the number or pattern of the characters remains consistent.
    • Dates can only follow certain patterns, so you should receive good accuracy after a few iterations.
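
The regex behavior described for numbers and currency amounts can be checked with a short Python sketch. The patterns below are hypothetical illustrations written in the style of the example above, not rules exported from Transact, and the second amount pattern is only an assumption about what a rule learned from $1,234.50 might look like. The dollar sign is escaped so that it matches a literal "$" rather than acting as an end-of-line anchor.

```python
import re

# Hypothetical patterns for illustration only; Transact's generated rules may differ.
NUMBER_RULE = re.compile(r"\d+")                  # digit count is not fixed
AMOUNT_RULE = re.compile(r"\$\d+\.\d+")           # style of rule learned from "$123.00"
AMOUNT_RULE_V2 = re.compile(r"\$\d+,\d+\.\d+")    # assumed rule learned from "$1,234.50"


def matches(rule: re.Pattern, value: str) -> bool:
    """Return True only if the rule matches the whole value."""
    return rule.fullmatch(value) is not None


print(matches(NUMBER_RULE, "4711"))           # True
print(matches(NUMBER_RULE, "900012345"))      # True: more digits, same rule
print(matches(AMOUNT_RULE, "$123.00"))        # True
print(matches(AMOUNT_RULE, "$1,234.50"))      # False: the comma is a new format
print(matches(AMOUNT_RULE_V2, "$1,234.50"))   # True once a second rule exists
```

The number rule covers values of any length with a single pattern, while the amount needs one rule per format, which is why amounts take more samples to reach the same accuracy.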

Static vs Dynamic Data

Key takeaway: You will receive the best accuracy if the location and formatting of the surrounding data are consistent for each document being processed for the vendor.

The following breakdown shows how surrounding data patterns affect accuracy.

Pattern Type: Static
  • Pattern: Left and/or Top + Right and Bottom
  • Accuracy:
    • Fewer iterations are required to get good accuracy.
    • If the fields don’t fluctuate, you may get good accuracy and confidence after the first iteration.

Pattern Type: Dynamic
  • Pattern: Left, Top, Right, and Bottom
  • Accuracy:
    • More iterations are required to get good accuracy.
    • If the patterns surrounding the value fluctuate greatly (mainly character strings), the confidence will drop and you may receive false positives.

Pattern Type: Empty
  • Pattern: Right and Bottom; Left and Bottom; Right and Top; Left and Top
  • Accuracy:
    • Depending on the position of the fields, there may be up to two empty surrounding patterns.
      • For example, a field at the bottom right will not have a right and bottom pattern, which means only a top and left pattern will be used for extraction and confidence calculation, which may result in more false positives.
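
As a rough illustration of the empty-pattern case, the sketch below models the four surrounding patterns of a field as a dictionary and lists the sides that remain usable. The data structure, the function `usable_sides`, and the sample values are all hypothetical; this is not how Transact stores or scores patterns internally.

```python
from typing import Dict, List, Optional

SIDES = ("left", "top", "right", "bottom")


def usable_sides(surrounding: Dict[str, Optional[str]]) -> List[str]:
    """Return the sides that have a non-empty surrounding pattern."""
    return [side for side in SIDES if surrounding.get(side)]


# Hypothetical field in the bottom-right corner: nothing to its right or below it.
total_field = {"left": "Total Due:", "top": "Tax", "right": None, "bottom": None}
# Hypothetical field in the middle of the page: all four sides are populated.
po_field = {"left": "PO No.", "top": "Invoice", "right": "Date", "bottom": "Terms"}

print(usable_sides(total_field))  # ['left', 'top']
print(usable_sides(po_field))     # ['left', 'top', 'right', 'bottom']
```

With only two sides available, there is less context to distinguish the intended value from similar values elsewhere on the page, which is consistent with the higher risk of false positives noted above.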

Single vs Multi-page Documents

Key takeaway: You will receive the best accuracy if the field you are extracting is always located on the same page, in the same place.

For a multi-page invoice, some fields can be present on multiple pages (such as invoice numbers or PO numbers). If the training is done on a single-page invoice, the wrong value may be extracted from a multi-page invoice, and vice versa.

For example, if you train the index fields “Invoice Number” and “Amount” on a single-page document, Transact will associate these two rules with page “0”. If you then process a multi-page document using these two rules, the Auto KV plugins will try to extract the data only from the first page (page “0”).

Because invoice numbers don’t fluctuate, the “Invoice Number” field may be extracted correctly, but the “Amount” field may be wrong if it is extracted from the wrong page.
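
The page-association behavior described above can be sketched as follows. The `PageRule` class, the `extract` function, the regex patterns, and the sample page text are hypothetical and exist only to illustrate a rule that is tied to page “0”; they are not part of the Transact API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageRule:
    field: str
    pattern: str      # regex with one capture group for the value
    page_index: int   # zero-based page the rule was learned on


def extract(rule: PageRule, pages: List[str]) -> Optional[str]:
    """Search only the page the rule is tied to; all other pages are ignored."""
    if rule.page_index >= len(pages):
        return None
    match = re.search(rule.pattern, pages[rule.page_index])
    return match.group(1) if match else None


# Both rules were (hypothetically) trained on a single-page invoice, so page_index is 0.
invoice_rule = PageRule("Invoice Number", r"Invoice No:\s*(\d+)", page_index=0)
amount_rule = PageRule("Amount", r"Total:\s*(\$\d+\.\d+)", page_index=0)

# A two-page invoice: the invoice number repeats, but the final total is on the last page.
pages = [
    "Invoice No: 1001\nSubtotal carried forward\nTotal: $50.00",
    "Invoice No: 1001\nTotal: $175.00",
]

print(extract(invoice_rule, pages))  # 1001
print(extract(amount_rule, pages))   # $50.00, although the final total is on page 1
```

The invoice number comes out right because it is identical on every page, while the amount is taken from the first page even though the figure the operator wants sits on the last one.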

Limitations

  • The Auto KV plugins do not currently support table extraction.
