Fuzzy DB Extraction


Fuzzy DB extraction extracts the document level fields (DLFs) of a document from database records. A record is selected by matching it against the HOCR content of the document or against the previously extracted value of a document level field. The plug-in builds a search-engine index of the database records and extracts DLF values by fuzzy matching the HOCR content against that index. For example, the user can configure any vendor database to capture the vendor name, vendor ID, or any other field from incoming invoices. This is done by mapping the document type to the vendor database table and mapping the index fields of the document to the columns of that table. The plug-in finds the matching vendor in the database and updates the fields in the document. A different database can be configured for each document type of a batch class.
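
The matching idea can be illustrated with a short sketch. It is not the plug-in's implementation: the plug-in uses Lucene indexing, whereas this example swaps in Python's standard difflib.SequenceMatcher to score OCR text against a hypothetical vendor table. The VENDORS rows, the column names, the best_fuzzy_match helper, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical vendor table: each row maps column names to values.
VENDORS = [
    {"vendor_id": "V001", "vendor_name": "Acme Office Supplies"},
    {"vendor_id": "V002", "vendor_name": "Globex Industrial"},
    {"vendor_id": "V003", "vendor_name": "Initech Software"},
]


def best_fuzzy_match(ocr_text, rows, column, threshold=0.6):
    """Return (row, score) for the row whose `column` best matches part of ocr_text, or None."""
    words = ocr_text.lower().split()
    best = None
    for row in rows:
        target = row[column].lower()
        n = len(target.split())
        # Slide a window with the same word count as the target over the OCR words.
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            score = SequenceMatcher(None, window, target).ratio()
            if best is None or score > best[1]:
                best = (row, score)
    # Reject weak matches so an unrelated document does not pick up a vendor.
    if best is None or best[1] < threshold:
        return None
    return best


row, score = best_fuzzy_match("INVOICE Acme 0ffice Supp1ies 123 Main St", VENDORS, "vendor_name")
print(row["vendor_id"], round(score, 2))
```

Because the score is a character-level similarity ratio, a vendor name with OCR errors (here "0" read for "o" and "1" for "l") still matches the correct row with a high score.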

Mapping Configuration

The user can map document level fields to the columns of a database table on the Fuzzy DB Extraction Configuration screen, under the document type of a batch class. Clicking the Fuzzy DB Extraction Configuration node presents the following UI:

[Screenshot: Fuzzy DB Extraction Configuration screen]

DLF Mapping

To map DLFs to the columns of a database table, follow these steps:

  • Select a database connection.
  • Select a table.
  • Select a Row ID. The Row ID drop-down lists only columns that have a unique constraint.
  • Click 'Add'. The following UI is presented for mapping DLFs to the columns of the database table:

[Screenshot: DLF-to-column mapping UI]

  • Map each DLF to a column of the database table.

[Screenshot: DLF mapped to a database column]
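
The Row ID rule in the steps above (only columns with a unique constraint are offered) can be sketched against SQLite with Python's standard sqlite3 module. The vendor table, its columns, and the unique_columns and validate_mapping helpers are hypothetical; the plug-in works against whichever database connection is configured, and its actual checks may differ.

```python
import sqlite3


def unique_columns(conn, table):
    """Columns eligible as Row ID: a single-column PRIMARY KEY or a full single-column UNIQUE index."""
    found = set()
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    pk_cols = [row[1] for row in info if row[5] > 0]
    if len(pk_cols) == 1:  # a composite key does not make any single column unique
        found.add(pk_cols[0])
    # PRAGMA index_list rows: (seq, name, unique, origin, partial)
    for idx in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        idx_name, is_unique = idx[1], idx[2]
        partial = idx[4] if len(idx) > 4 else 0  # a partial index is not unique table-wide
        # PRAGMA index_info rows: (seqno, cid, name); name is None for expressions
        cols = conn.execute(f'PRAGMA index_info("{idx_name}")').fetchall()
        if is_unique and not partial and len(cols) == 1 and cols[0][2] is not None:
            found.add(cols[0][2])
    return sorted(found)


def validate_mapping(conn, table, row_id, mapping):
    """Check a hypothetical {DLF name: column name} mapping before it is saved."""
    columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if row_id not in unique_columns(conn, table):
        raise ValueError(f"Row ID {row_id!r} has no unique constraint")
    for dlf, column in mapping.items():
        if column not in columns:
            raise ValueError(f"DLF {dlf!r} is mapped to unknown column {column!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor (vendor_id TEXT PRIMARY KEY, tax_no TEXT UNIQUE, vendor_name TEXT, city TEXT)"
)
print(unique_columns(conn, "vendor"))  # ['tax_no', 'vendor_id']
validate_mapping(conn, "vendor", "vendor_id", {"VendorName": "vendor_name", "VendorID": "vendor_id"})
```

Restricting the Row ID to unique columns guarantees that a matched index entry points back to exactly one database record.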

Search Specific Column

The Fuzzy DB Extraction Configuration screen has an 'Is Searchable' column. When Learn DB is clicked, indexes are created only for the DLFs whose 'Is Searchable' box is checked. These DLFs are used for searching and for generating the fuzzy text search output on the RV screen.
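
A minimal sketch of how the 'Is Searchable' flag could decide what gets indexed, assuming each mapping row holds the DLF name, its column, and the checkbox state. FieldMapping and searchable_text are hypothetical names, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    dlf: str          # document level field name
    column: str       # database column the DLF is mapped to
    searchable: bool  # state of the 'Is Searchable' checkbox


def searchable_text(row, mappings):
    """Combine the values of the searchable mapped columns of one record into a single string."""
    values = (row.get(m.column) for m in mappings if m.searchable)
    return " ".join(str(v) for v in values if v is not None)


mappings = [
    FieldMapping("VendorName", "vendor_name", True),
    FieldMapping("VendorID", "vendor_id", False),
    FieldMapping("City", "city", True),
]
row = {"vendor_id": "V001", "vendor_name": "Acme Office Supplies", "city": "Springfield"}
print(searchable_text(row, mappings))  # Acme Office Supplies Springfield
```

In this sketch VendorID stays mapped, so a hypothetical extractor could still fill it from the matched record, but its value does not take part in the search.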

Learn Database

Once the mapping is defined, the user can click on “Learn DB” to create indexes of all the records present in the database.

  • Lucene indexes are generated for all database records belonging to every document type mapped in the current batch class. Only mapped columns are indexed.
  • Each record is indexed on a single string that combines the text of all fields mapped to the columns of the database table.
  • Indexes are stored in a separate directory for each document type of each batch class, using the hierarchy: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Document-Type>\<Database-Name>\<Table-Name>.
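
The Learn DB steps above can be sketched as an inverted index over each record's combined text, plus the directory layout they describe. This is a simplified stand-in for Lucene built from the Python standard library: learn_db, lookup, tokenize, and index_dir are hypothetical names, tokenization is a plain \w+ split, and Windows-style paths (PureWindowsPath) are an assumption. lookup ranks records by exact token overlap; unlike Lucene's fuzzy queries, it does not tolerate misspelled tokens.

```python
import re
from collections import Counter, defaultdict
from pathlib import PureWindowsPath


def index_dir(shared_folder, batch_class, document_type, database, table):
    """Index directory for one document type, following the documented hierarchy."""
    return PureWindowsPath(shared_folder, batch_class, "fuzzydb-index", document_type, database, table)


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def learn_db(rows, row_id, mapped_columns):
    """Map each token to the set of Row IDs whose combined mapped text contains it."""
    index = defaultdict(set)
    for row in rows:
        # One combined string per record, built only from the mapped columns.
        combined = " ".join(str(row[c]) for c in mapped_columns if row.get(c) is not None)
        for token in tokenize(combined):
            index[token].add(row[row_id])
    return dict(index)


def lookup(index, text):
    """Rank Row IDs by how many distinct tokens of `text` they share with the index."""
    scores = Counter()
    for token in set(tokenize(text)):
        for rid in index.get(token, ()):
            scores[rid] += 1
    return scores.most_common()


rows = [
    {"vendor_id": "V001", "vendor_name": "Acme Office Supplies", "city": "Springfield"},
    {"vendor_id": "V002", "vendor_name": "Globex Industrial", "city": "Springfield"},
]
index = learn_db(rows, "vendor_id", ["vendor_name", "city"])
print(lookup(index, "Invoice from Acme, Springfield"))  # [('V001', 2), ('V002', 1)]
print(index_dir(r"C:\SharedFolders", "BC1", "Invoice", "VendorDB", "vendor"))
# C:\SharedFolders\BC1\fuzzydb-index\Invoice\VendorDB\vendor
```

Because vendor_id is not in mapped_columns, its values are not indexed, mirroring the rule that only mapped columns are indexed.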


  • Fuzzy DB Extraction configuration depends on a database connection, which is configured from the System Config screen.

The user can configure the database connection on the following UI:

[Screenshot: database connection configuration UI]

Refresh button in Fuzzy DB & DB Export

A Refresh button is provided next to the connection drop-down in both Fuzzy DB and DB Export. Clicking Refresh makes newly added connections appear in the connection drop-down.