Fuzzy DB Extraction


The Fuzzy DB plugin extracts the document level fields (DLFs) of a document from records in a database, matching on either the HOCR content of the document or the previously extracted value of a document level field. The plugin builds a search-engine index over the database records and extracts document level field values by fuzzy-matching the HOCR content against that index. Users can configure any vendor database to capture the vendor name, vendor ID, or any other field from incoming invoices. This is done by mapping the document type to the vendor database table and the index fields of the document to the columns of that table. The plugin then finds the matching vendor in the database and updates the fields in the document. A different database can be configured for each document type of a batch class.
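
The matching idea described above can be illustrated with a minimal sketch. It is not the plugin's implementation: the plugin uses a search-engine index, whereas this example substitutes Python's standard `difflib.SequenceMatcher` for the fuzzy comparison. The vendor records, the function name `best_vendor_match`, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical vendor records standing in for rows of a vendor table.
VENDORS = [
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply"},
    {"vendor_id": "V002", "vendor_name": "Northwind Traders"},
    {"vendor_id": "V003", "vendor_name": "Globex Corporation"},
]


def best_vendor_match(ocr_text, records, threshold=0.6):
    """Return (record, score) for the closest vendor name.

    Returns (None, score) when no candidate reaches the threshold.
    The threshold value is an assumption, not a plugin setting.
    """
    words = ocr_text.lower().split()
    best_record, best_score = None, 0.0
    for record in records:
        name = record["vendor_name"].lower()
        size = len(name.split())
        # Compare the name against every window of the same word count,
        # so surrounding invoice text does not dilute the score.
        for start in range(max(1, len(words) - size + 1)):
            window = " ".join(words[start:start + size])
            score = SequenceMatcher(None, name, window).ratio()
            if score > best_score:
                best_record, best_score = record, score
    if best_score < threshold:
        return None, best_score
    return best_record, best_score


# OCR output often confuses characters, e.g. "m" read as "rn" or "I" as "l".
record, score = best_vendor_match(
    "Invoice from Acrne lndustrial Supply total 120.00", VENDORS
)
```

Once a record is selected, its mapped columns (here `vendor_id` and `vendor_name`) would supply the values for the corresponding document level fields.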

Mapping Configuration

The user can map document level fields to the columns of a database table on the Fuzzy DB Extraction Configuration screen, found under a document type of a batch class. Click the Fuzzy DB Extraction Configuration node. The following screen displays:

[Screenshot: Fuzzy DB Extraction Configuration screen]

DLF Mapping

Follow these steps to map the DLFs to the columns of a database table:

  1. Select a database connection.
  2. Select a Table.
  3. Select a Row ID. The Row ID drop-down lists only columns that have a unique constraint.
  4. Click the ‘Add’ button. The following UI is presented for mapping the DLFs to columns of the database table:

[Screenshot: DLF mapping UI]

  • The user can map each DLF to a column of the database table.
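
The Row ID rule and the DLF mapping can be sketched as follows. This is an illustration under stated assumptions, not the product's logic: SQLite stands in for the configured database, and the table `vendors`, the helper `single_column_unique_keys`, and the DLF names are hypothetical. How the product itself detects unique constraints is not described in this document.

```python
import sqlite3


def single_column_unique_keys(conn, table):
    """List columns that are unique on their own (primary key or UNIQUE)."""
    columns = []
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    pk_columns = [
        row[1]
        for row in conn.execute(f'PRAGMA table_info("{table}")')
        if row[5] > 0
    ]
    if len(pk_columns) == 1:
        columns.append(pk_columns[0])
    # index_list rows: (seq, name, unique, origin, partial)
    for index in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        if not index[2]:
            continue
        # index_info rows: (seqno, cid, name)
        indexed = [
            row[2] for row in conn.execute(f'PRAGMA index_info("{index[1]}")')
        ]
        if len(indexed) == 1 and indexed[0] not in columns:
            columns.append(indexed[0])
    return columns


# Hypothetical vendor table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendors ("
    "vendor_id TEXT PRIMARY KEY, vendor_name TEXT, "
    "tax_no TEXT UNIQUE, city TEXT)"
)

# Candidates a Row ID drop-down could offer for this table.
row_id_candidates = single_column_unique_keys(conn, "vendors")

# Hypothetical DLF-to-column mapping, checked against the real columns.
dlf_to_column = {"VendorName": "vendor_name", "VendorID": "vendor_id", "City": "city"}
table_columns = {row[1] for row in conn.execute('PRAGMA table_info("vendors")')}
unmapped = [col for col in dlf_to_column.values() if col not in table_columns]
```

Checking the mapping against the table's actual columns catches a misspelled column name before any index is built.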

Search Specific Column

The Fuzzy DB Extraction Configuration screen has an Is Searchable column. When Learn DB is clicked, indexes are created only for the DLFs that have Is Searchable checked. These DLFs are then used for searching and for generating output on the RV screen for fuzzy text search.
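
As a rough sketch of the Is Searchable flag, the mapping can be modelled as a list of entries from which only the checked ones are selected for indexing. The `FieldMapping` class, its field names, and the sample mappings are hypothetical and only mirror the columns of the configuration screen described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One row of the mapping grid (hypothetical model)."""
    dlf_name: str
    column_name: str
    is_searchable: bool = False


# Sample mappings, invented for illustration.
MAPPINGS = [
    FieldMapping("VendorName", "vendor_name", is_searchable=True),
    FieldMapping("VendorID", "vendor_id", is_searchable=True),
    FieldMapping("City", "city"),
]


def searchable_columns(mappings):
    """Return the database columns whose DLFs are marked Is Searchable."""
    return [m.column_name for m in mappings if m.is_searchable]


columns_to_index = searchable_columns(MAPPINGS)
```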

Learn Database

Once the mapping is defined, click Learn DB to create indexes of all the records present in the database.

    • Lucene indexes are generated for all database records belonging to every document type that has been mapped for the current batch class. Only mapped columns are indexed.
    • Indexes are built on a single string: the combined text of all the fields mapped to columns of the database table.
    • Separate index directories are created to store the indexes per document type per batch class. The hierarchy used to store the index files for each document level field is: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Document-Type>\<Database-Name>\<Table-Name>.
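
The directory hierarchy and the combined index string described above can be sketched as follows. The helper names, the sample values, and the choice of a single space as the separator between field values are assumptions; the document does not state how the plugin joins the field text.

```python
from pathlib import PureWindowsPath


def index_directory(shared_folder, batch_class, document_type, database, table):
    """Build the index path following the hierarchy given in the text."""
    return PureWindowsPath(
        shared_folder, batch_class, "fuzzydb-index", document_type, database, table
    )


def combined_text(row, mapped_columns):
    """Join the values of the mapped columns into one string to index.

    Empty and missing values are skipped. The space separator is an
    assumption made for this sketch.
    """
    return " ".join(
        str(row[col]) for col in mapped_columns if row.get(col) not in (None, "")
    )


# Hypothetical values for illustration only.
path = index_directory(r"D:\SharedFolders", "BC1", "Invoice", "vendordb", "vendors")
text = combined_text(
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply", "city": None},
    ["vendor_id", "vendor_name", "city"],
)
```

Keeping one directory per document type, database, and table means that relearning one mapping does not require touching the indexes of another.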


  • Fuzzy DB Extraction configuration depends on a database connection, which is configured from the System Configuration screen.

The user can configure the database connection in the Connection Manager.
