
Auto KV Extraction Plugin

Overview

Certified for: Ephesoft Transact 2022.1.01 or above

The Auto Extraction plugin automatically generates key-value rules to extract data from a document based on values previously selected by operators on exported documents. This plugin is used with the Auto Export plugin.

Benefits of the Auto Extraction plugin include:

  1. Overriding the value populated by other extraction plugins, based on the confidence score of the extracted value.
  2. Marking a field as “force review” when the match level is below a configured threshold.
  3. Using a single rules database per batch class or a shared, system-wide database.

Installation

  1. In Ephesoft Transact, go to Administrator > System Configuration.
  2. Select Workflow Management.
  3. Drag and drop the ZIP file into the Import Plugin section.

Figure 1. Import Plugin

  4. Restart Ephesoft Transact.

Note: The Infor IDM Export Plugins, Phase 6 and earlier, have a known issue: they remove an essential line from the applicationContext.xml file, located at [Ephesoft_Directory]\Application.

The following error may occur in the dcma-all.log file when you try to execute the plugin after restarting Transact:

Caused by: org.activiti.engine.ActivitiException: Unknown property used in expression: ${autokvextractionplugin.performAutoKVExtraction(batchInstanceID,key)}

To resolve this error, add the following line to the applicationContext.xml file, located at [Ephesoft_Directory]\Application:

<import resource="file:C:\Ephesoft\SharedFolders/customPluginJars/*.xml"/>

Note: The above line assumes Transact is installed in C:\Ephesoft\. If Transact is installed elsewhere, adjust the path accordingly.
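
To confirm whether the import line is present before restarting Transact, a small check along the following lines may help. This is a minimal sketch and not part of the plugin: the function names are hypothetical, and the required line is the one quoted above, which assumes an installation in C:\Ephesoft.

```python
from pathlib import Path

# The import line quoted in this guide (assumes Transact is installed in C:\Ephesoft).
REQUIRED_IMPORT = r'<import resource="file:C:\Ephesoft\SharedFolders/customPluginJars/*.xml"/>'


def _strip_whitespace(text: str) -> str:
    """Remove all whitespace so indentation and spacing differences are ignored."""
    return "".join(text.split())


def has_custom_plugin_import(context_text: str, required: str = REQUIRED_IMPORT) -> bool:
    """Return True if the required import line appears in the given file text."""
    return _strip_whitespace(required) in _strip_whitespace(context_text)


def check_file(path: Path) -> bool:
    """Read applicationContext.xml (hypothetical helper) and run the check."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return has_custom_plugin_import(text)
```

Under the same installation assumption, the check could be run as check_file(Path(r"C:\Ephesoft\Application\applicationContext.xml")). It is a plain text search, so a commented-out copy of the line would also count as present.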

Configuration

  1. From the Batch Class Management page, select your batch class and click Open.
  2. Go to Modules > Extraction.
  3. Add the AUTO_KV_EXTRACTION plugin to the list of Selected Plugins.

Figure 2. Selected Plugins

  4. Click Deploy to update the workflow.
  5. Go to Modules > Extraction > AUTO_KV_EXTRACTION.
  6. Configure the plugin according to your requirements. See the property descriptions below.

Figure 3. Configurable Properties

Configurable properties and their descriptions:

Auto Extraction Enabled
This switch enables the plugin. Set to True to enable.

Enabled DLF List (or Batch Class Relative path to list)
Use this field to configure the list of index fields that should be monitored during validation. There are two ways to list the index fields:

  • Enter the index fields separated by the pipe character ( | ).

Note: If only one field is listed, it must be terminated with a pipe character ( | ).

  • Enter the relative path to a text file that lists one index field per line. For example: scripts-config\mydlflist.txt

Rules Filter Value DLF Name
Use this field to filter rules by a specific value, such as the Vendor ID, GST ID, or IBAN.

To create a filtered rule, enter the name of the index field by which the rule should be filtered.

Note: Leaving this field blank creates rules without a filter.

DLF Value Overwrite Mode
Select Overwrite, Do Not Overwrite, or Threshold.

The Threshold option overwrites a value only if the confidence that Transact calculated for the field is lower than the DLF Overwrite Confidence Threshold.

DLF Overwrite Confidence Threshold
This property is used only when the DLF Value Overwrite Mode is Threshold. Otherwise, enter 0.

DLF Force Review Threshold
Auto rules are assigned a confidence based on the Auto Confidence Values Assignment Logic.

If the confidence is below the value entered here, the DLF is marked for operator review.

Rules Database Path
This property defines the location of the rules database. There are three ways to configure this property:

  • Leave the field empty to use the default location of [Batch_Class_Folder]\auto_kv_extraction\auto_kv_rules.db
  • Enter a path relative to the batch class folder. For example, \scripts-config\mydatabase.db
  • Enter an absolute path. For example, C:\Folder1\mydatabase.db
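
As an illustration of how these settings might be interpreted, the following sketch parses a pipe-separated DLF list, resolves the three forms of the Rules Database Path, and applies the overwrite modes. It is not the plugin's code. The function names, the mode strings, and the rule that any path without a drive letter is relative to the batch class folder are all assumptions made for the example.

```python
from pathlib import PureWindowsPath

# Default location relative to the batch class folder, as documented above.
DEFAULT_RULES_DB = PureWindowsPath("auto_kv_extraction") / "auto_kv_rules.db"


def parse_enabled_dlfs(setting: str) -> list[str]:
    """Split a pipe-separated Enabled DLF List value into index field names."""
    # Empty pieces, such as the one after a trailing pipe, are dropped.
    return [name.strip() for name in setting.split("|") if name.strip()]


def resolve_rules_db_path(setting: str, batch_class_folder: str) -> PureWindowsPath:
    """Resolve the Rules Database Path setting to a full path."""
    base = PureWindowsPath(batch_class_folder)
    value = setting.strip()
    if not value:
        return base / DEFAULT_RULES_DB  # empty field: default location
    candidate = PureWindowsPath(value)
    if candidate.drive:
        return candidate  # absolute path with a drive letter
    # Assumption: a value without a drive is relative to the batch class
    # folder, even when it starts with a backslash.
    return base / value.lstrip("\\/")


def should_overwrite(mode: str, existing_confidence: float, threshold: float) -> bool:
    """Decide whether an auto-extracted value replaces an existing DLF value."""
    if mode == "Overwrite":
        return True
    if mode == "Threshold":
        # Overwrite only when Transact's confidence is below the threshold.
        return existing_confidence < threshold
    return False  # "Do Not Overwrite" or any unrecognized mode
```

For example, parse_enabled_dlfs("InvoiceNumber|") yields a single field name, and resolve_rules_db_path("", batch_class_folder) falls back to the default auto_kv_extraction\auto_kv_rules.db location under that folder.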

Auto Confidence Values Assignment Logic

The following Auto Rule Confidence values are assigned to auto-populated fields:

Confidence Value: Criteria
100:
  • Value regular expression matches
  • Above, below, right, and left value regular expressions match
  • Value is found in the same zone as the learned value
90:
  • Value regular expression matches
  • Above and left value regular expressions match
  • Value is found in the same zone as the learned value
85:
  • Value regular expression matches
  • Left value regular expression matches
  • Value is found in the same zone as the learned value
85:
  • Above regular expression matches
  • Left value regular expression matches
  • Value is found in the same zone as the learned value
75:
  • Value regular expression matches
  • Above, below, right, and left value regular expressions match
  • Value is not found in the same zone as the learned value
50:
  • Value regular expression matches
  • Above and left value regular expressions match
  • Value is not found in the same zone as the learned value
20:
  • Value regular expression matches
  • Above or left value regular expressions match
  • Value is not found in the same zone as the learned value
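
The table above can be read as a decision procedure, sketched below. This is an interpretation, not the plugin's implementation. The class and function names are hypothetical, the two 85 rows are treated together as "value matches, and above or left matches, in the same zone" (the second 85 row is ambiguous as written), and any combination not listed in the table is assumed to score 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Which learned regular expressions matched for a candidate value."""
    value: bool      # the value regular expression matches
    above: bool      # the "above" neighbor expression matches
    below: bool      # the "below" neighbor expression matches
    left: bool       # the "left" neighbor expression matches
    right: bool      # the "right" neighbor expression matches
    same_zone: bool  # the value lies in the same zone as the learned value


def auto_rule_confidence(m: MatchResult) -> int:
    """Map a match result to a confidence value following the table above."""
    if not m.value:
        return 0  # assumption: no confidence without a value match
    all_four = m.above and m.below and m.left and m.right
    if m.same_zone:
        if all_four:
            return 100
        if m.above and m.left:
            return 90
        if m.above or m.left:
            return 85  # assumption: covers both 85 rows
    else:
        if all_four:
            return 75
        if m.above and m.left:
            return 50
        if m.above or m.left:
            return 20
    return 0  # assumption: combinations not listed in the table


def needs_force_review(confidence: int, force_review_threshold: int) -> bool:
    """A DLF is marked for operator review when confidence is below the threshold."""
    return confidence < force_review_threshold
```

With a DLF Force Review Threshold of 75, for instance, a value that matches with above and left context but outside the learned zone scores 50 under this reading and would be marked for operator review.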

Limitations

  1. In Ephesoft Transact 2019.2 or below, if you use Format Conversion or a custom script to modify a DLF value, consider upgrading to 2020.1 or later. A known issue in these versions removes the coordinates from DLF values after extraction, so the values no longer exactly match the HOCR SPAN entries and an extraction rule cannot be created automatically.
  2. A rule is tested only against the page on which its source value was found. For example, if a document has three pages and the rule for the total was created from a value on page 3, the Auto Extraction plugin tests only page 3 for that value. Additional rules can be created for other pages.
  3. Extraction rules are not automatically created if the document type is changed between extraction and export.
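
The page restriction in limitation 2 can be illustrated with a simplified model. In this sketch a rule is reduced to a page number and a single regular expression applied to page text. The real rules also use surrounding context and zones, and every name here is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageRule:
    """Simplified stand-in for a learned rule (hypothetical structure)."""
    field_name: str
    page_number: int    # 1-based page the rule was created from
    value_pattern: str  # regular expression for the value


def apply_rule(rule: PageRule, pages: list[str]) -> Optional[str]:
    """Test the rule only against the page it was created from."""
    index = rule.page_number - 1
    if not 0 <= index < len(pages):
        return None  # the document has no such page
    match = re.search(rule.value_pattern, pages[index])
    return match.group(0) if match else None
```

A rule learned on page 3 finds a total printed on page 3 of a new document, while a rule learned on page 1 with the same pattern returns nothing for that document, which is why additional rules per page may be needed.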