Fuzzy DB Extraction

Overview

The Fuzzy DB plugin extracts the document level fields (DLFs) of a document from database records, based on a match against the document's HOCR content or against the value of a previously extracted document level field. The plugin builds a search-engine index of the database records and extracts DLF values by fuzzy-matching the HOCR content against that index. For example, the user can configure a vendor database to capture the vendor name, vendor ID, or any other field from incoming invoices, simply by mapping the document type to the vendor database table and the document's index fields to the table's columns. The plugin then finds the matching vendor in the database and updates the fields in the document. Different databases can be configured for different document types of a batch class.
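The plugin's own matching is Lucene-based and internal to the product. The following Python sketch only illustrates the general idea with a swapped-in technique (token-level fuzzy matching from `difflib`): each database record is scored by the fraction of its tokens that have a close match in the HOCR page text, and the best record above a threshold fills the mapped DLFs. The vendor rows, column names, DLF names, and both thresholds are hypothetical.

```python
import difflib
import re

# Hypothetical rows standing in for the mapped vendor table.
VENDORS = [
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply"},
    {"vendor_id": "V002", "vendor_name": "Globex Corporation"},
    {"vendor_id": "V003", "vendor_name": "Initech Software"},
]


def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def record_score(record_text, page_tokens, cutoff=0.8):
    """Fraction of the record's tokens that have a close match among the page tokens."""
    rec_tokens = tokens(record_text)
    if not rec_tokens:
        return 0.0
    hits = sum(
        1 for t in rec_tokens
        if difflib.get_close_matches(t, page_tokens, n=1, cutoff=cutoff)
    )
    return hits / len(rec_tokens)


def best_match(page_text, records, columns, min_score=0.6):
    """Return (record, score) for the best record, or (None, score) if below min_score."""
    page_tokens = tokens(page_text)
    best, best_score = None, 0.0
    for rec in records:
        # Combine the mapped columns into one string per record.
        combined = " ".join(str(rec[c]) for c in columns)
        score = record_score(combined, page_tokens)
        if score > best_score:
            best, best_score = rec, score
    if best_score < min_score:
        return None, best_score
    return best, best_score


def populate_dlfs(record, dlf_to_column):
    """Copy mapped column values into document level fields."""
    return {dlf: record[col] for dlf, col in dlf_to_column.items()}


# OCR-style page text with a recognition error ("Globcx" for "Globex").
page = "INVOICE  Globcx Corporation  Vendor V002  Total 120.00"
match, score = best_match(page, VENDORS, ["vendor_id", "vendor_name"])
print(populate_dlfs(match, {"VendorID": "vendor_id", "VendorName": "vendor_name"}))
# {'VendorID': 'V002', 'VendorName': 'Globex Corporation'}
```

Scoring by the fraction of a record's own tokens, rather than comparing whole strings, keeps a short vendor name from being swamped by the rest of the page text.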

Mapping Configuration

The user can map the document level fields to the columns of a database table on the Fuzzy DB Extraction Configuration screen, under the document type of a batch class. Click the Fuzzy DB Extraction Configuration node to display the following screen:

[Screenshot: Fuzzy DB Extraction Configuration screen]

DLF Mapping

To map the DLFs to the columns of a database table, follow these steps:

  • Select a database connection.
  • Select a table.
  • Select a Row ID. The Row ID drop-down lists only columns that have a unique constraint.
  • Click the ‘Add’ button. The following UI is displayed for mapping the DLFs to the columns of the database table:

[Screenshot: DLF-to-column mapping dialog]

  • Map a DLF to a column of the database table.

[Screenshot: DLF mapped to a database table column]
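The Row ID drop-down offers only columns that are unique on their own. As an illustration of how such candidates can be found, the following Python sketch reads SQLite's catalog pragmas (`index_list`, `index_info`, `table_info`). The product works against whatever database the configured connection points to; SQLite and the `vendor` table here are assumptions chosen only so the example runs stand-alone.

```python
import sqlite3


def unique_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Columns that are unique on their own: candidates for a Row ID.

    PRAGMA statements do not accept bound parameters, so `table` is
    interpolated and must be a trusted identifier.
    """
    cols = set()
    for row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        # row = (seq, name, unique[, origin, partial]); older SQLite returns 3 fields
        index_name, is_unique = row[1], row[2]
        is_partial = row[4] if len(row) > 4 else 0
        if not is_unique or is_partial:
            continue
        info = conn.execute(f'PRAGMA index_info("{index_name}")').fetchall()
        # Keep single-column indexes only; info row = (seqno, cid, name)
        if len(info) == 1 and info[0][2] is not None:
            cols.add(info[0][2])
    # A single-column INTEGER PRIMARY KEY aliases rowid and has no index entry.
    pk_cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")') if r[5] > 0]
    if len(pk_cols) == 1:
        cols.add(pk_cols[0])
    return sorted(cols)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vendor (
    vendor_id   TEXT PRIMARY KEY,
    vendor_name TEXT,
    tax_no      TEXT UNIQUE,
    city        TEXT)""")
print(unique_columns(conn, "vendor"))  # ['tax_no', 'vendor_id']
```

Composite unique keys are deliberately skipped, because a Row ID must identify a record with a single column value.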

Search Specific Column

The Fuzzy DB Extraction Configuration screen has an Is Searchable column. When Is Searchable is checked for a DLF, indexes are created for that DLF when Learn DB is clicked. The DLF is then used for searching and for generating results in the fuzzy text search on the RV screen.
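One mapping row on the configuration screen pairs a DLF with a column and an Is Searchable flag. The following Python sketch models such rows and shows how the flag could restrict a fuzzy text search to searchable columns while still returning every mapped DLF value. It is an illustration only: `DlfMapping`, `rv_search`, the cutoff, and the sample rows are hypothetical, and `difflib.SequenceMatcher` stands in for the product's Lucene-based search.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class DlfMapping:
    dlf_name: str        # document level field, e.g. "VendorName" (hypothetical)
    column: str          # database column the DLF is mapped to
    is_searchable: bool  # state of the Is Searchable checkbox


def rv_search(query, rows, mappings, cutoff=0.6):
    """Rank rows whose searchable columns fuzzily match the query, best first."""
    searchable = [m.column for m in mappings if m.is_searchable]
    q = query.lower()
    scored = []
    for row in rows:
        # Best similarity of the query against any searchable column value.
        score = max(
            (difflib.SequenceMatcher(None, q, str(row[c]).lower()).ratio() for c in searchable),
            default=0.0,
        )
        if score >= cutoff:
            scored.append((score, row))
    # Sort on the score only, since dict rows are not comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Return the values of all mapped DLFs for each hit.
    return [{m.dlf_name: row[m.column] for m in mappings} for _, row in scored]


mappings = [
    DlfMapping("VendorID", "vendor_id", is_searchable=False),
    DlfMapping("VendorName", "vendor_name", is_searchable=True),
]
rows = [
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply"},
    {"vendor_id": "V002", "vendor_name": "Globex Corporation"},
]
print(rv_search("globex corp", rows, mappings))
# [{'VendorID': 'V002', 'VendorName': 'Globex Corporation'}]
```

Searching for `"V002"` returns no rows here, because the `vendor_id` column is not flagged as searchable.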

Learn Database

Once the mapping is defined, the user can click “Learn DB” to create indexes of all the records in the database.

  • A Lucene index is generated for all database records of every document type mapped in the current batch class. Only mapped columns are indexed.
  • Each record is indexed as a single string that combines the text of all fields mapped to columns of the database table.
  • A separate index directory is created for each document type of each batch class. Index files are stored in the following hierarchy: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Document-Type>\<Database-Name>\<Table-Name>.
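The indexing steps above can be sketched in plain Python as a toy inverted index plus the directory layout. This is not Lucene and not the plugin's code: `build_index`, `index_dir`, the sample rows, and the `D:\shared` folder are hypothetical, and which columns are passed as `indexed_cols` (all mapped columns, or only those whose DLF has Is Searchable checked) is an assumption left to the caller.

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath


def index_dir(shared_folder, batch_class, doc_type, db_name, table):
    # <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Document-Type>\<Database-Name>\<Table-Name>
    return PureWindowsPath(shared_folder, batch_class, "fuzzydb-index", doc_type, db_name, table)


def build_index(rows, row_id_col, indexed_cols):
    """Toy inverted index mapping each token to the Row IDs of records containing it.

    Each record contributes one string that combines only its indexed columns.
    """
    index = defaultdict(set)
    for row in rows:
        combined = " ".join(str(row[c]) for c in indexed_cols)
        for token in re.findall(r"[a-z0-9]+", combined.lower()):
            index[token].add(row[row_id_col])
    return index


rows = [
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply", "city": "Austin", "phone": "555-0100"},
    {"vendor_id": "V002", "vendor_name": "Globex Corporation", "city": "Springfield", "phone": "555-0199"},
]
idx = build_index(rows, "vendor_id", ["vendor_name", "city"])
print(sorted(idx["globex"]))  # ['V002']
print("555" in idx)           # False: the unmapped phone column is not indexed
print(index_dir(r"D:\shared", "BC1", "Invoice", "vendors", "vendor"))
# D:\shared\BC1\fuzzydb-index\Invoice\vendors\vendor
```

Keying the index by Row ID is what lets a token hit be traced back to exactly one database record.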

Dependency

  • The Fuzzy DB Extraction configuration depends on a database connection, which is configured on the System Configuration screen.

The user can configure the database connection at the Connection Manager node:

[Screenshot: Connection Manager node]