Available: on-premises, cloud
This page provides an overview of the FuzzyDB plugin and its configuration in Ephesoft Transact.
Prerequisites
FuzzyDB must be added to the Extraction module and set to ON.
Overview
The FuzzyDB plugin uses a fuzzy database lookup to link your internal database to Ephesoft Transact. This allows you to pair data extracted by Ephesoft Transact with data from your internal database.
For example, refer to the following excerpt from a sample invoice.
Figure 1. Sample Invoice
From this invoice, Ephesoft Transact can extract the vendor name: “Office Depot”. However, the document alone does not tell the system which internal vendor number you have assigned to this vendor. To match this internal information (vendor number) with the extracted information (vendor name), the system can learn a database table containing your list of vendors and perform automated lookups against it, applying fuzzy matching logic to the extracted information.
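To make the idea of a fuzzy lookup concrete, here is a minimal sketch in Python. It does not reproduce how Ephesoft Transact implements FuzzyDB (the plugin builds Lucene indexes from the learned table); it uses the standard-library `difflib` similarity ratio as a stand-in. The `VENDORS` table, its vendor numbers, and the `fuzzy_lookup` function are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical vendor table: internal vendor number -> vendor name.
VENDORS = {
    "V-1001": "Office Depot, Inc.",
    "V-1002": "Office Max",
    "V-1003": "Home Depot",
}


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(extracted: str, table: dict[str, str]) -> tuple[str, str, float]:
    """Return (key, name, score) for the row most similar to the extracted text."""
    key = max(table, key=lambda k: similarity(extracted, table[k]))
    return key, table[key], similarity(extracted, table[key])


# The extracted vendor name does not match any row exactly,
# but the closest row still yields the internal vendor number.
best = fuzzy_lookup("Office Depot", VENDORS)
print(best)
```

In this sketch the extracted text "Office Depot" resolves to the row for "Office Depot, Inc." even though the strings differ, which is the behavior an exact-match lookup would miss.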
Note: FuzzyDB can be used for extraction or validation.
Configuration
FuzzyDB must be set up in two places:
Connection Manager
Before configuring the FuzzyDB plugin, you must first connect your database to Ephesoft Transact.
- Go to Administrator > System Configuration.
- Open the Connection Manager and click Add.
Figure 2. Add Connection
- Provide the required information.
Figure 3. Connection Details
- Click Test Connection to ensure the connection is successful. Then click Save.
Note: The following connection (database) types are available:
- MYSQL
- MSSQL
- MSSQL Windows Authentication
- Oracle
- MariaDB
Fuzzy DB Extraction Configuration
Once you have set up your connection in the Connection Manager, you can select that connection when adding FuzzyDB extraction to a document type.
Note: This is a Document Level Field (DLF), which means it is configured at the document type level in a batch class.
- Select and open a batch class.
- Open a document type.
- Go to Index Fields > Fuzzy DB Extraction Configuration. This will open the Document Fuzzy page by default.
The following options are available for mapping database data to a document:
Document Fuzzy
In Document Fuzzy, only one database can be configured per document type.
Figure 4. Document Fuzzy
- From the Document Fuzzy page, provide the following details from their respective drop-downs:
- Connection
- Table Name
- Primary Key
Figure 5. Map Document Fuzzy Connection
- Click Add to map index fields with database columns.
- Select the Is Searchable checkbox if you want to limit the search to the specified table column rather than the entire database.
- Review the Additional Parameters Mapping section and perform any additional configurations.
Figure 6. Additional Parameters Mapping
Basic Configuration | |
Enabled | Select this checkbox to enable basic configuration. |
Confidence Threshold | A value from 1 to 100 that sets the minimum confidence for the search. |
Weight | A value from 0 to 1. Acts as a multiplying factor for the computed confidence. |
Ignore Word List | Enter the values you want to exclude during the search, separated by semicolons ( ; ). |
Include Pages | Define the pages in a document type in which you want to search. |
Max Search Results | Define the maximum number of results that will be returned for the search. |
Field Based Search | |
HOCR Search Switch | Select this checkbox to enable searching based on HOCR content after index fields. |
Search Column List | Select the index field to search with. |
- Click Apply to save any changes. Then click Learn DB to generate the Lucene indexes.
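The Basic Configuration options can be read as a small filtering pipeline. The Python sketch below is one plausible interpretation, not Ephesoft Transact's actual implementation: the order of the steps (threshold first, then weight), the `FuzzyParams` class, the `strip_ignored_words` and `apply_params` functions, and the sample candidates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FuzzyParams:
    """Hypothetical container mirroring the Basic Configuration options."""
    confidence_threshold: float = 70.0  # 1-100, minimum confidence to keep a result
    weight: float = 1.0                 # 0-1, multiplies the computed confidence
    ignore_word_list: str = ""          # semicolon-separated words to exclude
    max_search_results: int = 5         # cap on the number of returned results


def strip_ignored_words(text: str, ignore_word_list: str) -> str:
    """Remove ignored words (case-insensitive) from the search text."""
    ignored = {w.strip().lower() for w in ignore_word_list.split(";") if w.strip()}
    return " ".join(w for w in text.split() if w.lower() not in ignored)


def apply_params(candidates: list[tuple[str, float]],
                 params: FuzzyParams) -> list[tuple[str, float]]:
    """Drop candidates below the threshold, apply the weight, sort, and cap."""
    # Assumption: the threshold is checked against the unweighted confidence.
    kept = [(row, conf * params.weight) for row, conf in candidates
            if conf >= params.confidence_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:params.max_search_results]


params = FuzzyParams(confidence_threshold=60, weight=0.5,
                     ignore_word_list="inc;llc", max_search_results=2)
query = strip_ignored_words("Office Depot Inc", params.ignore_word_list)
results = apply_params([("V-1001", 90.0), ("V-1002", 55.0), ("V-1003", 72.0)], params)
print(query, results)
```

Here the candidate with confidence 55 falls below the threshold of 60 and is dropped, and the remaining confidences are halved by the weight of 0.5.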
Field Fuzzy
In Field Fuzzy, a field can be mapped with multiple databases. This is done through groups.
Note: While one field can be mapped to multiple databases, you cannot have more than one field mapped to the same table and index.
- Under Fuzzy DB Extraction Configuration, click on Field Fuzzy.
Figure 7. Field Fuzzy
- From the Field Fuzzy page, provide the following information:
- Group Name
- Connection
- Table Name
- Primary/Unique Key
Figure 8. Map Field Fuzzy Connection
- Click Add to map index fields with database columns.
- Select the Is Searchable checkbox if you want to limit the search to the specified table column rather than the entire database.
- Review the Additional Parameters Mapping section and perform any additional configurations.
Figure 9. Additional Parameters Mapping
Basic Configuration | |
Enabled | Select this checkbox to enable basic configuration. |
Confidence Threshold | A value from 1 to 100 that sets the minimum confidence for the search. |
Weight | A value from 0 to 1. Acts as a multiplying factor for the computed confidence. |
Ignore Word List | Enter the values you want to exclude during the search, separated by semicolons ( ; ). |
Include Pages | Define the pages in a document type in which you want to search. |
Max Search Results | Define the maximum number of results that will be returned for the search. |
- Click Apply to save any changes. Then click Learn DB to generate the Lucene indexes.
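The note about Field Fuzzy mappings can be expressed as a simple check. The following Python sketch assumes that "the same table and index" means the same table and column pair. That reading, along with the `FieldMapping` class, the `find_conflicts` function, and the sample group, field, and table names, is hypothetical and only meant to illustrate the rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical record of one index field mapped to one database column."""
    group: str
    index_field: str
    table: str
    column: str


def find_conflicts(mappings):
    """Return {(table, column): sorted index fields} where several fields share a target."""
    targets = defaultdict(set)
    for m in mappings:
        targets[(m.table, m.column)].add(m.index_field)
    return {target: sorted(fields) for target, fields in targets.items()
            if len(fields) > 1}


mappings = [
    FieldMapping("Vendors", "VendorName", "vendors", "name"),
    # Same field mapped to a second database through another group: allowed.
    FieldMapping("Suppliers", "VendorName", "suppliers", "name"),
    # A second field pointing at the same table and column: flagged.
    FieldMapping("Vendors", "RemitTo", "vendors", "name"),
]
conflicts = find_conflicts(mappings)
print(conflicts)
```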
Test Extraction
You can test the extraction results from the Document Types folder. Supported file formats are PDF and TIFF.
- Select the document type and click Test Extraction.
Figure 10. Test Extraction
- Click Extract. The Extraction Type column shows how the data was extracted. If a FuzzyDB mapping matches the mapped fields, it will list FuzzyDB as the extraction type.
Figure 11. View Test Extraction
Perform Fuzzy Search and Extraction
Once you execute a batch, the document will appear on the Validation screen.
- In the Fuzzy Search field, enter a mapped index field (see Document Fuzzy step 2 or Field Fuzzy step 3).
Figure 12. Fuzzy Search Field
- A popup will display the fuzzy search results. Select the record that you want to fetch from the table. To fetch all records from the table, enter an asterisk ( * ) into the search bar.
Note: You cannot select two rows from the same group.
Figure 13. Fuzzy Search Results
Considerations for Fuzzy Extraction
Multiple Rows:
- If multiple rows are selected, the option with the highest confidence score will be populated in the field.
- If multiple rows are selected that have the same confidence, the option with the higher weight will be populated.
- If multiple rows are selected that have the same confidence and weight, the first record on the extraction list will be populated.
Multiple Fuzzy Extraction Methods:
- If both Document Fuzzy and Field Fuzzy are configured and the same index field exists in both, the option with the higher weighted confidence (confidence × weight) will be populated in the field.
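The selection rules above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not Ephesoft Transact code: the `Candidate` class, the `pick_row` and `pick_method` functions, and the sample values are hypothetical, and the behavior when both methods have an equal weighted confidence is an assumption, since the rules above do not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical fuzzy match result."""
    value: str
    confidence: float  # 1-100
    weight: float      # 0-1
    source: str = "Document Fuzzy"


def pick_row(rows):
    """Highest confidence wins; ties go to the higher weight, then to the first row."""
    # max() returns the first maximal element, so list order breaks full ties.
    return max(rows, key=lambda r: (r.confidence, r.weight))


def pick_method(document_fuzzy, field_fuzzy):
    """Choose between the two methods by weighted confidence (confidence x weight)."""
    def score(c):
        return c.confidence * c.weight
    # Assumption: on an exact tie, the Document Fuzzy result is kept.
    return document_fuzzy if score(document_fuzzy) >= score(field_fuzzy) else field_fuzzy


rows = [
    Candidate("V-1002", 80, 0.5),
    Candidate("V-1001", 90, 0.5),
    Candidate("V-1003", 90, 0.9),
    Candidate("V-1004", 90, 0.9),  # same confidence and weight as V-1003, but listed later
]
chosen_row = pick_row(rows)

chosen_method = pick_method(
    Candidate("A", 90, 0.5, "Document Fuzzy"),  # weighted confidence 45
    Candidate("B", 80, 0.9, "Field Fuzzy"),     # weighted confidence 72
)
print(chosen_row.value, chosen_method.source)
```

Note that the Field Fuzzy candidate wins the second comparison despite its lower raw confidence, because its weight is higher.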
Conclusion
This completes the overview of the FuzzyDB plugin and its configuration in Ephesoft Transact.