Machine Learning Custom Dictionary Support

Introduction

Dictionaries are part of the machine learning mechanism. They are used to extract types of values for which no specific regex can be defined. Dictionaries contain sets of predefined values such as US states, US cities, and personal names. At extraction time, one of these values is selected according to the system settings.

Previously, the dictionaries were provided at the application level and were stored in the META-INF folder (Ephesoft\Application\META-INF).

In Ephesoft Transact 4.5.0.0, the default dictionaries have been moved to the Batch Class level. The current path for the machine-learning-dictionaries folder is Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries (the folder structure is explained in detail below). You can also add your own custom dictionaries:

1. You can create the dictionaries at the time of DLF learning on the Validation screen. Whenever you create an overlay on the screen and click it, the Suggestion View window pops up with all the predefined and custom regex types as well as dictionaries. Here, you can create new custom dictionary types. A custom dictionary can contain any number of values. Once the dictionary values are added and saved, they are used during extraction.

2. You can import the dictionaries from the Batch Class Management screen. The main menu of each Batch Class now includes a new machine-learning-dictionaries tab. Here, you can use the Import Machine Learning Dictionary(s) section to upload your dictionaries into the system. The Export button allows you to export selected folders or files.

Machine-learning-dictionaries folder

All the dictionaries are provided at the Batch Class level. The path for the machine-learning-dictionaries folder is:

Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries

This folder has two subfolders: language-packs and knowledge-base.

  • language-packs: This folder contains language-specific text files with stop words (words such as “and” and “the” that machine learning filters out because they are not to be extracted). You can add, modify, or delete any file in this folder. By default, language-pack dictionaries are provided for English, German, French, Turkish, Spanish, and Dutch:

○ en_stopWords.txt – contains English stop words.

○ de_stopWords.txt – contains German stop words, etc.
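
As a rough illustration of how a stop-word file of this kind might be applied, the following Python sketch loads stop words and filters a list of tokens. The one-word-per-line layout, the case-insensitive comparison, and the function names are assumptions made for the example; they are not taken from the Transact implementation.

```python
from typing import Iterable, List, Set


def load_stop_words(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from the lines of a file such as en_stopWords.txt.

    Assumption: one stop word per line; blank lines are ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def filter_tokens(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not stop words (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]


# Hypothetical file content, inlined so the example is self-contained.
sample_lines = ["and\n", "the\n", "\n", "of\n"]
stop_words = load_stop_words(sample_lines)
kept = filter_tokens(["The", "Bank", "of", "America", "and", "Trust"], stop_words)
print(kept)  # ['Bank', 'America', 'Trust']
```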

  • knowledge-base: This folder contains two subfolders – regex and dictionaries.

○ regex: This folder contains regex specific text files.

          > regex.txt – contains simple predefined regex like Number, Date, SSN, Amount, Email, etc. as well as custom regex that will be created by the user.

          > composite.txt – contains information about the Composite types created by the user via the Suggestion View window on the Validation screen. The Composite type name (or custom block name) is mapped against the Composite type values (either created or predefined). Data is stored in the following format:

Custom_Block_Name=Custom_Regex_Name/Predefined_Regex_Name|Custom_Regex_Name/Predefined_Regex_Name

That is, the custom block name is followed by an equal sign, followed by a series of custom regex names or predefined regex/dictionary names separated by the pipe operator “|”.

For example, CustomBlock1=CustomId|SSN

This creates a custom block named “CustomBlock1” that contains the regex of “CustomId” (a custom regex type) followed by the regex of “SSN” (a predefined regex type).

Note: A composite block type cannot have other composite types as part of its definition.
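
The composite.txt format described above can be read with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the nesting check simply flags any member that is itself defined as a block in the same input, which is one possible way to detect the restriction in the note.

```python
from typing import Dict, Iterable, List, Tuple


def parse_composites(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of the form Custom_Block_Name=TypeA|TypeB into a dict."""
    blocks: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, _, members = line.partition("=")
        blocks[name.strip()] = [m.strip() for m in members.split("|") if m.strip()]
    return blocks


def nested_composites(blocks: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (block, member) pairs where a member is itself a composite block."""
    return [
        (name, member)
        for name, members in blocks.items()
        for member in members
        if member in blocks
    ]


# "BadBlock" is a made-up entry that breaks the no-nesting rule.
blocks = parse_composites(["CustomBlock1=CustomId|SSN", "BadBlock=CustomBlock1|Date"])
print(blocks["CustomBlock1"])     # ['CustomId', 'SSN']
print(nested_composites(blocks))  # [('BadBlock', 'CustomBlock1')]
```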

          > regex_mappings.properties – contains parent-child relation mappings for regex.

Child = Parent
Number = ALL
Date = ALL
Amount = Number
USA_Amount = Amount
NON_USA_Amount = Amount
DD_MM_YYYY = DATE
MM_DD_YYYY = DATE
(and so on)
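
To show how such child-to-parent mappings can be followed, here is a minimal Python sketch that reads Child = Parent lines and walks up the hierarchy from a given type. The helper names are hypothetical, names are matched exactly as written (an assumption), and the cycle guard is a defensive addition rather than documented behavior.

```python
from typing import Dict, Iterable, List


def parse_parent_mappings(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'Child = Parent' lines into a child -> parent dictionary."""
    parents: Dict[str, str] = {}
    for raw in lines:
        if "=" not in raw:
            continue
        child, _, parent = raw.partition("=")
        parents[child.strip()] = parent.strip()
    return parents


def ancestors(type_name: str, parents: Dict[str, str]) -> List[str]:
    """Return the chain of parents for type_name, nearest parent first."""
    chain: List[str] = []
    seen = {type_name}
    current = type_name
    while current in parents:
        current = parents[current]
        if current in seen:
            break  # guard against accidental cycles in the file
        chain.append(current)
        seen.add(current)
    return chain


parents = parse_parent_mappings(
    ["Number   = ALL", "Amount   = Number", "USA_Amount   = Amount"]
)
print(ancestors("USA_Amount", parents))  # ['Amount', 'Number', 'ALL']
```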

○ dictionaries: This folder contains the dictionary files and the dictionary_mappings.properties file.

          > dictionaries – both default dictionaries and custom dictionaries (created or imported by the user) in a .txt format.

          > dictionary_mappings.properties – contains dictionary types mapped against their corresponding .txt files. Here, you can also specify whether the dictionary is displayed in the list of Predefined Types in the Suggestion View window on the Validation screen. Each entry has the form Dictionary Type=Dictionary File=Display, where Display is -1, 0, or 1.

The following dictionaries are provided by default:

NAME=name.txt

PERSON_NAME_PREFIX=personNamePrefix.txt

PERSON_NAME_SUFFIX=personNameSuffix.txt

USA_CITY=usCity.txt

PARTIAL_CITY=partialUSCity.txt

USA_STATE=usState.txt

PARTIAL_STATE=partialUSState.txt

COMPANY_SUFFIX=companySuffix.txt

ORGANIZATION_NAME=organizationName.txt

 

Display options:

-1 = hidden and not loaded into memory (e.g. if the dictionary is a part of the composite block type, neither the dictionary nor the composite type will be displayed in the Suggestion View window)

0 = hidden and loaded into memory (e.g. if the dictionary is a part of the composite block type, the dictionary will not be displayed, however the composite type containing it will be shown in the Suggestion View window)

1 = displayed and loaded (both the dictionary as well as all composite types containing the dictionary will be displayed in the Suggestion View window)

Note: By default, the English language dictionary is used if the required dictionary file is not present.
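
The mapping format and display options above can be sketched in Python as follows. This is an illustration under assumptions: the class and function names are hypothetical, and because the default entries listed above omit the Display part, the parser returns None for it instead of guessing a default.

```python
from typing import NamedTuple, Optional


class DictionaryMapping(NamedTuple):
    type_name: str
    file_name: str
    display: Optional[int]  # None when the entry has no Display part


def parse_mapping(line: str) -> DictionaryMapping:
    """Parse 'Dictionary Type=Dictionary File=Display' (Display is optional)."""
    parts = [part.strip() for part in line.strip().split("=")]
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(f"Unexpected mapping line: {line!r}")
    display = int(parts[2]) if len(parts) == 3 else None
    if display not in (None, -1, 0, 1):
        raise ValueError(f"Display must be -1, 0, or 1: {line!r}")
    return DictionaryMapping(parts[0], parts[1], display)


def is_loaded(display: int) -> bool:
    """0 and 1 are loaded into memory; -1 is not."""
    return display in (0, 1)


def is_shown(display: int) -> bool:
    """Only 1 is displayed in the Suggestion View window."""
    return display == 1


mapping = parse_mapping("USA_STATE=usState.txt=0")
print(mapping)                                              # parsed entry
print(is_loaded(mapping.display), is_shown(mapping.display))  # True False
```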

 

Custom Dictionary Creation

There are two ways to add a custom dictionary – you can create it on the Validation screen during DLF training or you can import it from the Batch Class Management screen.

To create a custom dictionary on the Validation screen:

  • Place your cursor in the text box of the index field to be learned in the middle pane of the Validation screen.
  • On the image view pane of the Validation screen, click on the area of the image where the index field is located (right-click to draw an overlay on multiple values). An overlay appears on the image and the text box is populated with the index field value.
  • Click on the overlay to open the Suggestion View window.
  • Select the Create Type option and, in the Type dropdown list, select Dictionary.

  • Define the Type Name and add as many values to the dictionary as required using the add button. Use the delete button to remove any value.

In this example, we create a dictionary for the streets of the city of Irvine.

  • Click OK to save the custom dictionary.

After saving the dictionary, a new .txt file is created in the dictionaries folder (Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries). This custom dictionary file has the same name as given in the Type Name field and contains all the values added on the Validation screen.
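
For scripting purposes, the location of a custom dictionary file can be composed from the Batch Class identifier and the Type Name. The Python sketch below assumes the folder components are separated as Ephesoft, SharedFolders, BC{Id}, machine-learning-dictionaries, knowledge-base, and dictionaries; the base directory "Ephesoft", the identifier "1", and the function name are placeholders for illustration.

```python
from pathlib import PureWindowsPath


def dictionary_file_path(base_dir: str, batch_class_id: str, type_name: str) -> PureWindowsPath:
    """Compose the path of a custom dictionary .txt file for one Batch Class.

    Assumption: the file is named after the Type Name with a .txt extension.
    """
    return (
        PureWindowsPath(base_dir)
        / "SharedFolders"
        / f"BC{batch_class_id}"
        / "machine-learning-dictionaries"
        / "knowledge-base"
        / "dictionaries"
        / f"{type_name}.txt"
    )


# Placeholder values; substitute the real installation folder and Batch Class ID.
path = dictionary_file_path("Ephesoft", "1", "Irvine_streets")
print(path)       # Windows-style path ending in Irvine_streets.txt
print(path.name)  # Irvine_streets.txt
```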

From then on, the newly created dictionary is included in the Predefined Types list on the Validation screen and is used to extract values on the basis of its predefined value set.

Note: If two users try to create a new custom dictionary with the same name for the same Batch Class, the dictionary entries will be merged.

 

Custom Dictionary Modification

Whenever required, you can modify the custom dictionary you’ve created. This can be done in several ways:

  1. As an operator, you can add values to your dictionary on the Validation screen,
  2. As an admin, you can modify your dictionary in the Folder Management section of Ephesoft Transact, or
  3. You can make changes directly in the dictionary .txt file on the Ephesoft Transact server.

Note: Default dictionaries can also be modified in the Folder Management section or on the Ephesoft Transact server.

Let’s review each option in detail.

1. To add values to your dictionary on the Validation screen:

  • Click on the overlay to open the Suggestion View window.
  • Select the Create Type option and, in the Type dropdown list, select Dictionary.
  • In the Type Name dropdown, find and select your dictionary name. All values contained in the dictionary will be displayed in the Suggestion View window.

  • Use the add button to add values to the dictionary.

In our example, we’ve added two more values.

  • Click OK to save the changes.

The custom dictionary is now updated according to the changes made on the Validation screen.

2. To modify your custom dictionary in the Folder Management section of Ephesoft Transact:

  • On the left menu panel, select Folder Management and double-click on the selected Batch Class.

  • Navigate to the dictionaries folder (SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries) and find your dictionary.

  • Select the dictionary and click Edit.

  • Make the required changes to your dictionary; the field is editable.

In our example, we’ll add three more streets to the list.

  • Click Save to save the changes.

The custom dictionary is now updated according to the changes made in the Folder Management section.

3. To make changes directly in the dictionary .txt file on the Ephesoft server:

  • Navigate to the dictionaries folder (Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries) and open the text file containing your dictionary.
  • Add, remove, or change values as you would in an ordinary text editor.
  • Save the changes.

Dictionary Export

Dictionaries can be exported so that you can use the same dictionaries in other Batch Classes. An exported dictionary is downloaded as a .zip file containing the .txt file with the associated dictionary values.

You can export the dictionaries from the Batch Class Management section as well as from the Folder Management section.

To export the dictionary from the Batch Class Management section:

  • Navigate to the Batch Class Management screen and select your Batch Class.
  • Go to the machine-learning-dictionaries tab > knowledge-base > dictionaries folder.
  • Select your dictionary and click Export.

  • Specify the destination folder and click Save.

The .zip file saved on your local machine contains your dictionary in .txt format along with all associated values.
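
If you want to inspect an exported archive outside Transact, a short Python sketch such as the following can list the dictionary values it contains. It assumes UTF-8 text with one value per line, which the description above does not confirm; the function name and the sample street names are invented for the example, and the archive is built in memory so the snippet runs on its own.

```python
import io
import zipfile
from typing import Dict, List


def read_exported_dictionaries(zip_bytes: bytes) -> Dict[str, List[str]]:
    """Return {file name: values} for every .txt file in an exported .zip."""
    result: Dict[str, List[str]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            text = archive.read(info).decode("utf-8")  # assumed encoding
            result[info.filename] = [
                line.strip() for line in text.splitlines() if line.strip()
            ]
    return result


# Build a stand-in for an exported archive with made-up values.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Irvine_streets.txt", "Main Street\nCulver Drive\n")

exported = read_exported_dictionaries(buffer.getvalue())
print(exported)  # {'Irvine_streets.txt': ['Main Street', 'Culver Drive']}
```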

To export the dictionary from the Folder Management section:

  • Select your Batch Class.
  • Go to machine-learning-dictionaries > knowledge-base > dictionaries, select your dictionary, and right-click it.
  • Select the Download option.
  • In the dialog window, specify the destination folder and click Save.

Note: If you export the dictionary_mappings.properties file and modify it before importing it again, the system will pick up the changes, and the updated file will be used to perform machine learning.

 

Dictionary Import

To import the dictionary in the Batch Class Management section:

  • Navigate to the Batch Class Management screen and select your Batch Class.
  • Go to the machine-learning-dictionaries tab > knowledge-base > dictionaries.
  • In the Upload Machine Learning Dictionary(s) section, click Select Files or drag and drop the file containing the dictionary into the specified area.

The dictionary is imported successfully. Since we are importing the dictionary manually, the following message is displayed: “Please make corresponding changes in the mapping files manually”.

To make changes in the mappings file:

  • Navigate to the Folder Management section and select your Batch Class.
  • Go to the dictionaries folder (machine-learning-dictionaries\knowledge-base\dictionaries) and select the dictionary_mappings.properties file.
  • Click Edit.

  • Provide the following information to perform the dictionary mapping:

Key: Define the dictionary name (e.g. Irvine_streets). This name will appear in the Predefined Types list in the Suggestion View window on the Validation screen.

Value: Define the dictionary text file (e.g. Irvine_streets.txt) and provide the Display value:

0 = do not display the Dictionary Type in the Suggestion View window on the Validation screen

1 = display the Dictionary Type in the Suggestion View window on the Validation screen

For release 4500, set the Display value to 1.

  • Click Save to save your changes.

Note: Dictionary mapping can also be done directly in the dictionary_mappings.properties file on the Ephesoft Transact server. To do so, navigate to the dictionaries folder (Ephesoft\SharedFolders\BC{Id}\machine-learning-dictionaries\knowledge-base\dictionaries), open the dictionary_mappings.properties file, and perform the mapping as described above.
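
When the mapping is edited by script instead of by hand, the new entry can be generated in the Dictionary Type=Dictionary File=Display form. The Python sketch below works on a list of lines and leaves file handling to the caller; the function names are hypothetical, and replacing an existing entry that has the same key is an assumption about the desired behavior, not documented Transact behavior.

```python
from typing import List


def mapping_entry(type_name: str, file_name: str, display: int = 1) -> str:
    """Format one entry as Dictionary Type=Dictionary File=Display."""
    if display not in (-1, 0, 1):
        raise ValueError("display must be -1, 0, or 1")
    return f"{type_name}={file_name}={display}"


def add_mapping(lines: List[str], type_name: str, file_name: str, display: int = 1) -> List[str]:
    """Return a new list of lines with the entry added, or replaced if the key exists."""
    entry = mapping_entry(type_name, file_name, display)
    result: List[str] = []
    replaced = False
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key == type_name:
            result.append(entry)
            replaced = True
        else:
            result.append(line)
    if not replaced:
        result.append(entry)
    return result


updated = add_mapping(["NAME=name.txt=1"], "Irvine_streets", "Irvine_streets.txt", 1)
print(updated)  # ['NAME=name.txt=1', 'Irvine_streets=Irvine_streets.txt=1']
```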

Whenever you try to import a dictionary that already exists within the Batch Class, a pop-up is displayed containing the list of dictionaries that are already present. You can choose either to override or to merge the dictionary files.
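
The difference between the two choices can be pictured with a small Python sketch. The semantics shown here are assumptions: "override" is taken to mean that the imported values replace the existing ones, and "merge" that new values are appended to the existing ones without duplicates. The function name and the sample street names are invented for illustration.

```python
from typing import List


def resolve_import(existing: List[str], imported: List[str], mode: str) -> List[str]:
    """Combine existing and imported dictionary values according to mode."""
    if mode == "override":
        return list(imported)  # imported file replaces the existing one
    if mode == "merge":
        merged = list(existing)
        seen = set(existing)
        for value in imported:
            if value not in seen:  # keep only values not already present
                merged.append(value)
                seen.add(value)
        return merged
    raise ValueError("mode must be 'override' or 'merge'")


existing = ["Main Street", "Culver Drive"]
imported = ["Culver Drive", "Jamboree Road"]
print(resolve_import(existing, imported, "merge"))
# ['Main Street', 'Culver Drive', 'Jamboree Road']
print(resolve_import(existing, imported, "override"))
# ['Culver Drive', 'Jamboree Road']
```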
