Fuzzy DB Extraction


The FuzzyDB plugin extracts the document level fields (DLFs) of a document from database records, based on a match against either the document's HOCR content or the previously extracted value of a document level field. The plugin builds a search-engine index of the database records and then extracts field values by fuzzy-matching the HOCR content against that index. For example, the user can configure a vendor database to capture the vendor name, vendor ID, or any other field from incoming invoices. To do this, map the document type to the vendor database table and the index fields of the document to the columns of that table. The plugin then finds the matching vendor in the database and updates the fields in the document. A different database can be configured for each document type of a batch class.
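
The matching idea can be illustrated with a minimal sketch. It is not the plugin's implementation: the plugin matches against a search-engine index, whereas this sketch substitutes Python's standard difflib similarity ratio. The vendor rows, the field names and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical vendor rows; in practice these would come from the mapped database table.
VENDOR_ROWS = [
    {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply"},
    {"vendor_id": "V002", "vendor_name": "Globex Corporation"},
    {"vendor_id": "V003", "vendor_name": "Initech Software"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], using difflib's ratio (a stand-in for index scoring)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_vendor(ocr_value: str, rows, min_score: float = 0.6):
    """Return the best-matching row, or None if no row reaches min_score (an assumed threshold)."""
    best_row, best_score = None, 0.0
    for row in rows:
        score = similarity(ocr_value, row["vendor_name"])
        if score > best_score:
            best_row, best_score = row, score
    return best_row if best_score >= min_score else None


# OCR output often confuses characters such as "I"/"l" and "l"/"1".
match = find_vendor("Acme lndustrial Supp1y", VENDOR_ROWS)
if match is not None:
    # Fill the (hypothetical) document level fields from the matched record.
    document_fields = {"VendorName": match["vendor_name"], "VendorID": match["vendor_id"]}
    print(document_fields)
```

Even though the OCR value contains two misread characters, it still resolves to a single vendor record, from which the remaining fields are filled.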

Mapping Configuration

The user can map document level fields to the columns of a database table on the Fuzzy DB Extraction Configuration screen, under the document type of a batch class. Click the Fuzzy DB Extraction Configuration node to display the following screen:

[Screenshot: Fuzzy DB Extraction Configuration screen]

DLF Mapping

Follow the steps below to map the DLFs to the columns of a database table.

  1. Select a database connection.
  2. Select a table.
  3. Select a Row ID. The Row ID drop-down lists only columns that have a unique constraint.
  4. Click Add. The following screen is displayed for mapping the DLFs to the columns of the database table:

[Screenshot: DLF mapping screen]

  • The user can map each DLF to a column of the database table.

[Screenshot: DLF mapped to a database table column]
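
To illustrate which columns qualify as a Row ID, the sketch below lists the columns of a table that are unique on their own. It uses SQLite from the Python standard library purely as a stand-in. How the product itself discovers unique columns, and for which databases, is not stated here, so the approach is an assumption, and the vendor table is hypothetical.

```python
import sqlite3


def unique_columns(conn: sqlite3.Connection, table: str):
    """Return the sorted names of columns that are unique on their own in an SQLite table."""
    found = set()

    # A single-column primary key is unique by definition.
    # table_info rows are: cid, name, type, notnull, dflt_value, pk
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    pk_columns = [row[1] for row in info if row[5] > 0]
    if len(pk_columns) == 1:
        found.add(pk_columns[0])

    # Unique indexes (including those created by UNIQUE constraints) that cover one column.
    # index_list rows are: seq, name, unique, origin, partial
    for index_row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        if index_row[2]:
            # index_info rows are: seqno, cid, name
            members = conn.execute(f'PRAGMA index_info("{index_row[1]}")').fetchall()
            if len(members) == 1:
                found.add(members[0][2])

    return sorted(found)


# Hypothetical vendor table: "id" and "tax_no" are unique, "name" and "city" are not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor (id INTEGER PRIMARY KEY, tax_no TEXT UNIQUE, name TEXT, city TEXT)"
)
print(unique_columns(conn, "vendor"))
```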

Search Specific Column

The Fuzzy DB Extraction Configuration screen has an Is Searchable column. When Learn DB is clicked, indexes are created only for the DLFs for which Is Searchable is checked. These DLFs are then used for searching and for generating output on the RV screen for fuzzy text search.
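
The effect of the Is Searchable flag can be pictured with a small sketch. The FieldMapping class, the DLF names and the column names are hypothetical and only model the rows of the configuration screen. They are not the product's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One row of the mapping screen: a DLF, its database column, and the Is Searchable flag."""
    dlf_name: str
    column_name: str
    is_searchable: bool


# Hypothetical mappings for an invoice document type.
MAPPINGS = [
    FieldMapping("VendorName", "vendor_name", True),
    FieldMapping("VendorID", "vendor_id", False),
    FieldMapping("City", "city", True),
]


def searchable_columns(mappings):
    """Return the columns that would take part in fuzzy text search."""
    return [m.column_name for m in mappings if m.is_searchable]


print(searchable_columns(MAPPINGS))
```

In this sketch VendorID is still mapped, so it can be filled from a matched record, but it is left out of the searchable set.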

Learn Database

Once the mapping is defined, the user can click “Learn DB” to create indexes of all the records present in the database.

    • Lucene indexes are generated for all database records belonging to all document types that have been mapped for the current batch class. Only mapped columns are indexed.
    • Each index is built on a single string that combines the text of all the fields mapped to columns of the database table.
    • Separate index directories are created to store the indexes per document type per batch class. The hierarchy used for storing index files is: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Document-Type>\<Database-Name>\<Table-Name>.
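
The points above can be sketched as follows. This is only an illustration of the directory hierarchy and of the combined text for one record. It does not create a Lucene index, and the function names, the space separator and the sample values are assumptions.

```python
from pathlib import PurePath


def index_directory(shared_folder, batch_class, document_type, database, table):
    """Build the index location following the hierarchy described above."""
    return PurePath(shared_folder, batch_class, "fuzzydb-index", document_type, database, table)


def combined_text(row, mapped_columns):
    """Join the values of the mapped columns into the single string that would be indexed.

    Unmapped columns are ignored and missing values are skipped; the space
    separator is an assumption of this sketch.
    """
    return " ".join(str(row[c]) for c in mapped_columns if row.get(c) is not None)


# Hypothetical record and mapping.
row = {"vendor_id": "V001", "vendor_name": "Acme Industrial Supply", "notes": "not mapped"}
print(combined_text(row, ["vendor_name", "vendor_id"]))
print(index_directory("SharedFolders", "BC1", "Invoice", "vendor_db", "vendor"))
```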


  • Fuzzy DB Extraction configuration depends on a database connection, which can be configured from the System Config screen.

The user can configure the database connection in the Connection Manager:

[Screenshot: Connection Manager]