Fuzzy DB Extraction

Available: on-premises, cloud

This page provides an overview of the FuzzyDB plugin and its configuration in Ephesoft Transact.

Prerequisites

FuzzyDB must be added to the Extraction module and set to ON.

Overview

The FuzzyDB plugin uses a fuzzy database lookup to link your internal database to Ephesoft Transact. This allows you to pair data extracted by Ephesoft Transact with data from your internal database.

For example, refer to the following excerpt from a sample invoice.


Figure 1. Sample Invoice

From this invoice, Ephesoft Transact can extract the vendor name: “Office Depot”. However, the system cannot determine from the document alone the internal vendor number that you have assigned to this vendor. To match this internal information (the vendor number) with the information extracted from the invoice, the system can learn a database table that contains your list of vendors and perform automated lookups against it, applying fuzzy matching logic to the extracted information.
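
The lookup described above can be illustrated with a minimal Python sketch. It is not how Ephesoft Transact implements FuzzyDB (the plugin builds Lucene indexes); it swaps in the standard library's difflib.SequenceMatcher as a stand-in similarity measure. The vendors table, its column names, the vendor numbers, and the threshold of 60 are all hypothetical values chosen for illustration.

```python
import sqlite3
from difflib import SequenceMatcher


def build_vendor_table():
    """Create a hypothetical in-memory vendor table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, vendor_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO vendors VALUES (?, ?)",
        [
            ("V-1001", "Office Depot, Inc."),
            ("V-1002", "Staples"),
            ("V-1003", "Home Depot"),
        ],
    )
    return conn


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (stand-in for fuzzy scoring)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(conn, extracted_name, threshold=60.0):
    """Return (score, vendor_id, vendor_name) for the best row, or None."""
    rows = conn.execute("SELECT vendor_id, vendor_name FROM vendors").fetchall()
    scored = [(similarity(extracted_name, name), vid, name) for vid, name in rows]
    best = max(scored, key=lambda item: item[0])
    # Reject the best row if it does not reach the minimum confidence.
    return best if best[0] >= threshold else None


conn = build_vendor_table()
match = fuzzy_lookup(conn, "Office Depot")
```

Here the extracted text "Office Depot" does not exactly equal any stored vendor name, yet the lookup still resolves it to the row holding the internal vendor number, which is the behavior an exact-match query would miss.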

Note: FuzzyDB can be used for extraction or validation.

Configuration

FuzzyDB must be set up in two places:

  1. Connection Manager
  2. Fuzzy DB Extraction Configuration

Connection Manager

Before configuring the FuzzyDB plugin, you must first connect your database to Ephesoft Transact.

  1. Go to Administrator > System Configuration.
  2. Open the Connection Manager and click Add.


Figure 2. Add Connection

  3. Provide the required information.


Figure 3. Connection Details

  4. Click Test Connection to ensure the connection is successful. Then click Save.

Note: The following connection (database) types are available:

  • MYSQL
  • MSSQL
  • MSSQL Windows Authentication
  • Oracle
  • MariaDB

Fuzzy DB Extraction Configuration

Once you have set up your connection in the Connection Manager, you can select that connection when adding FuzzyDB extraction to a document type.

Note: This is a Document Level Field (DLF), which means it is configured at the document type level in a batch class.

  1. Select and open a batch class.
  2. Open a document type.
  3. Go to Index Fields > Fuzzy DB Extraction Configuration. This will open the Document Fuzzy page by default.

The following options are available for mapping database data to a document:

Document Fuzzy

In Document Fuzzy, only one database can be configured per document type.


Figure 4. Document Fuzzy

  1. From the Document Fuzzy page, provide the following details from their respective drop-downs:
    • Connection
    • Table Name
    • Primary Key


Figure 5. Map Document Fuzzy Connection

  2. Click Add to map index fields with database columns.
    • Select the Is Searchable checkbox if you want to limit the search to the specified table column, not the entire database.
  3. Review the Additional Parameters Mapping section and perform any additional configurations.


Figure 6. Additional Parameters Mapping

Basic Configuration

  • Enabled: Select this checkbox to enable basic configuration.
  • Confidence Threshold: A value from 1-100. The minimum confidence defined for the search.
  • Weight: Value from 0-1. Acts as a multiplying factor for the computed confidence.
  • Ignore Word List: Enter values separated by a semi-colon ( ; ) which you want to exclude during search.
  • Include Pages: Define the pages in a document type in which you want to search.
    • FIRSTPAGE
    • LASTPAGE
    • ALLPAGES
  • Max Search Results: Define the maximum number of results that will be returned for the search.

Field Based Search

  • HOCR Search Switch: Select this checkbox to enable searching based on HOCR content after index fields.
  • Search Column List: Select the index field to search with.

  4. Click Apply to save any changes. Then click Learn DB to generate the Lucene indexes.
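
To make the roles of the Basic Configuration options more concrete, the following Python sketch shows one way they could interact. It is an illustration under stated assumptions, not the plugin's actual logic: FuzzySettings, remove_ignored_words, and filter_results are hypothetical names, and the sketch assumes that the Confidence Threshold is compared against the unweighted confidence, which the documentation does not specify.

```python
from dataclasses import dataclass


@dataclass
class FuzzySettings:
    """Hypothetical container mirroring the Basic Configuration options."""
    confidence_threshold: float = 70.0  # documented range: 1-100
    weight: float = 1.0                 # documented range: 0-1
    ignore_word_list: str = ""          # values separated by semicolons
    max_search_results: int = 5


def remove_ignored_words(text, ignore_word_list):
    """Drop every word listed in the semicolon-separated ignore list."""
    ignored = {w.strip().lower() for w in ignore_word_list.split(";") if w.strip()}
    return " ".join(tok for tok in text.split() if tok.lower() not in ignored)


def filter_results(candidates, settings):
    """candidates is a list of (record, confidence) pairs, confidence 0-100.

    Assumption: the threshold applies to the raw confidence, and the weight
    is then applied as a multiplying factor for ranking.
    """
    kept = [
        (record, confidence, confidence * settings.weight)
        for record, confidence in candidates
        if confidence >= settings.confidence_threshold
    ]
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept[: settings.max_search_results]


settings = FuzzySettings(
    confidence_threshold=70.0,
    weight=0.5,
    ignore_word_list="inc; llc",
    max_search_results=1,
)
query = remove_ignored_words("Office Depot Inc", settings.ignore_word_list)
results = filter_results(
    [("V-1001", 90.0), ("V-1002", 60.0), ("V-1003", 75.0)], settings
)
```

In this example the word "Inc" is removed from the search text, the candidate scoring 60 is discarded by the threshold, and only the single highest-ranked record is returned because Max Search Results is 1.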

Field Fuzzy

In Field Fuzzy, a field can be mapped with multiple databases. This is done through groups.

Note: While one field can be mapped to multiple databases, you cannot have more than one field mapped to the same table and index.

  1. Under Fuzzy DB Extraction Configuration, click on Field Fuzzy.


Figure 7. Field Fuzzy

  2. From the Field Fuzzy page, provide the following information:
    • Group Name
    • Connection
    • Table Name
    • Primary/Unique Key


Figure 8. Map Field Fuzzy Connection

  3. Click Add to map index fields with database columns.
    • Select the Is Searchable checkbox if you want to limit the search to the specified table column, not the entire database.
  4. Review the Additional Parameters Mapping section and perform any additional configurations.


Figure 9. Additional Parameters Mapping

Basic Configuration

  • Enabled: Select this checkbox to enable basic configuration.
  • Confidence Threshold: A value from 1-100. The minimum confidence defined for the search.
  • Weight: Value from 0-1. Acts as a multiplying factor for the computed confidence.
  • Ignore Word List: Enter values separated by a semi-colon ( ; ) which you want to exclude during search.
  • Include Pages: Define the pages in a document type in which you want to search.
    • FIRSTPAGE
    • LASTPAGE
    • ALLPAGES
  • Max Search Results: Define the maximum number of results that will be returned for the search.

  5. Click Apply to save any changes. Then click Learn DB to generate the Lucene indexes.

Test Extraction

You can test the extraction results from the Document Types folder. The supported file formats are PDF and TIFF.

  1. Select the document type and click Test Extraction.


Figure 10. Test Extraction

  2. Click Extract. The Extraction Type column shows how the data was extracted. If a FuzzyDB mapping matches the mapped fields, the column lists FuzzyDB as the extraction type.

Figure 11. View Test Extraction

Perform Fuzzy Search and Extraction

Once you execute a batch, the document will appear on the Validation screen.

  1. In the Fuzzy Search field, enter a mapped index field (see Document Fuzzy step 2 or Field Fuzzy step 3).


Figure 12. Fuzzy Search Field

  2. A popup will display the fuzzy search results. Select the record that you want to fetch from the table. You can enter an asterisk ( * ) into the search bar to fetch all records from the table.

Note: You cannot select two rows from the same group.


Figure 13. Fuzzy Search Results

Considerations for Fuzzy Extraction

Multiple Rows:

  • If multiple rows are selected, the option with the highest confidence score will be populated in the field.
  • If multiple rows are selected that have the same confidence, the option with the higher weight will be populated.
  • If multiple rows are selected that have the same confidence and weight, the first record on the extraction list will be populated.
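
The three rules above amount to an ordering by confidence, then by weight, then by position in the extraction list. A minimal Python sketch of that ordering follows; Candidate and pick_value are hypothetical names used only for illustration, and the values are made up.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one selected row."""
    value: str
    confidence: float  # 1-100
    weight: float      # 0-1


def pick_value(candidates):
    """Pick the row to populate: highest confidence, then higher weight.

    Python's max() returns the first maximal element it encounters, so rows
    that tie on both confidence and weight resolve to the first one in the
    list, matching the third rule.
    """
    return max(candidates, key=lambda c: (c.confidence, c.weight))


rows = [
    Candidate("V-1001", 85.0, 0.5),
    Candidate("V-1002", 85.0, 0.8),
    Candidate("V-1003", 85.0, 0.8),
]
chosen = pick_value(rows)
```

All three rows share the same confidence, so the weight decides between them, and the remaining tie between the second and third rows goes to the one listed first.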

Multiple Fuzzy Extraction Methods:

  • If both Document Fuzzy and Field Fuzzy are configured and the same index field exists in both, the option with the higher weighted confidence (confidence × weight) will be populated in the field.
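
As a worked example of the weighted confidence comparison, the Python sketch below resolves one index field that both methods have populated. The function names and the sample numbers are hypothetical. The documentation does not say what happens when the two weighted confidences are equal; the sketch arbitrarily keeps the Document Fuzzy value in that case.

```python
def weighted_confidence(confidence, weight):
    """Weighted confidence as described above: confidence multiplied by weight."""
    return confidence * weight


def resolve_field(document_fuzzy, field_fuzzy):
    """Each argument is a (value, confidence, weight) tuple for the same index field.

    Returns the value with the higher weighted confidence. On an exact tie the
    Document Fuzzy value is kept, which is an assumption of this sketch.
    """
    doc_score = weighted_confidence(document_fuzzy[1], document_fuzzy[2])
    field_score = weighted_confidence(field_fuzzy[1], field_fuzzy[2])
    return field_fuzzy[0] if field_score > doc_score else document_fuzzy[0]


# Document Fuzzy: 90 x 0.5 = 45; Field Fuzzy: 80 x 0.8 = 64.
winner = resolve_field(("V-1001", 90.0, 0.5), ("V-2002", 80.0, 0.8))
```

Although Document Fuzzy reports the higher raw confidence here, its lower weight reduces its weighted confidence below that of Field Fuzzy, so the Field Fuzzy value is the one populated.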

Conclusion

This completes the overview of the FuzzyDB plugin and its configuration in Ephesoft Transact.
