Available: on-premises, cloud
Overview
This plugin extracts index field values based on the pattern defined for each field. The pattern is a semicolon-separated list of one or more words followed by a regular expression. The system searches each page for text that matches the regular expression. When it finds a match, it looks to the left of the match to check whether all of the pattern's words appear there. If all of the words are found, in order, the value is extracted. If only some of the words are found, or none of them, the value is not extracted.
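The matching logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the plugin's implementation. It assumes that the regular expression is the last semicolon-separated segment and contains no semicolons itself, that the page text is available as a plain string, and that the anchor words must be the whitespace-separated tokens immediately preceding the match, compared case-sensitively. The function name extract_value is hypothetical.

```python
import re
from typing import Optional


def extract_value(pattern: str, page_text: str) -> Optional[str]:
    """Return the first regex match preceded, in order, by the pattern's anchor words.

    Hypothetical sketch, not the plugin's actual code.
    """
    # Assumption: the last semicolon-separated segment is the regular expression
    # (and contains no semicolons); every earlier segment is an anchor word.
    *words, regex = pattern.split(";")

    for match in re.finditer(regex, page_text):
        # Whitespace-separated tokens that appear before (to the left of) the match.
        left_tokens = page_text[: match.start()].split()
        # Assumption: the anchor words must be the tokens immediately preceding
        # the match, in the same order, compared case-sensitively.
        if len(left_tokens) >= len(words) and left_tokens[len(left_tokens) - len(words):] == words:
            return match.group(0)

    # No regex match had all of the anchor words to its left: nothing is extracted.
    return None


PATTERN = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"

print(extract_value(PATTERN, "Invoice Date 21/03/2012"))  # 21/03/2012
print(extract_value(PATTERN, "Date 21/03/2012"))          # None
```

Under this strict token comparison, text such as "Invoice Date: 21/03/2012" would not match, because the token "Date:" differs from "Date". The documentation does not say how the plugin treats punctuation or intervening words, so treat that behavior as an assumption of the sketch.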
Examples
Consider the following pattern defined for the InvoiceDate index field: Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}
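To see how this pattern breaks down, the Python snippet below splits it into its anchor words and its regular expression, then checks which candidate strings the regular expression accepts on its own. It assumes the final semicolon-separated segment is the regular expression, and it uses Python's re module, whose syntax may differ in details from the regex engine the plugin itself uses.

```python
import re

# The InvoiceDate pattern, written as a Python raw string so the backslashes in \d survive.
PATTERN = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"

# Assumption: every segment before the last semicolon is an anchor word,
# and the final segment is the regular expression.
*anchor_words, date_regex = PATTERN.split(";")

print(anchor_words)  # ['Invoice', 'Date']
print(date_regex)    # \d{1,2}[/]\d{1,2}[/]\d{2,4}

# Which whole strings does the regular expression accept on its own?
for candidate in ["21/03/2012", "1/2/12", "21-03-2012", "2012/03/21"]:
    # Prints True, True, False, False for these four candidates.
    print(candidate, bool(re.fullmatch(date_regex, candidate)))
```

re.fullmatch is used here only to show which whole strings the expression accepts. An unanchored search such as re.search would still find "12/03/21" inside "2012/03/21", so boundaries like \b may be worth considering if partial matches are a concern and the plugin's regex engine supports them.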
Example 1
Text string in document: Invoice Date 21/03/2012
Result: “21/03/2012” is extracted for the InvoiceDate index field, because “21/03/2012” matches the regular expression, “Date” is found to its left, and “Invoice” is found to the left of “Date.”
Example 2
Text string in document: Date 21/03/2012
Result: Nothing is extracted for this index field. Although “21/03/2012” matches the regular expression and “Date” is found to its left, the word “Invoice” is not found to the left of “Date.”
Plugin Configuration
The REGULAR_REGEX_EXTRACTION plugin can be configured with the following properties.
Properties Description
Configurable property | Type of value | Value options | Description |
Regular Regex Extraction Switch | List of Values | ON, OFF | Determines whether the plugin runs. Default value is ON. |
Regular Regex Confidence Score | Integer | 0 – 100 | Acts as a multiplier for the confidence score calculated from the regex match. |
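The documentation states only that the confidence score property acts as a multiplier. One plausible reading, assumed here purely for illustration, is that the 0 to 100 value is treated as a percentage of the raw regex confidence score. The function apply_confidence_multiplier below is hypothetical and does not reflect a documented formula.

```python
def apply_confidence_multiplier(raw_score: float, configured_value: int) -> float:
    """Scale a raw regex confidence score by the Regular Regex Confidence Score property.

    Hypothetical model: the configured 0-100 value is treated as a percentage.
    """
    # The property accepts integers from 0 to 100.
    if not 0 <= configured_value <= 100:
        raise ValueError("configured_value must be between 0 and 100")
    return raw_score * configured_value / 100


print(apply_confidence_multiplier(90.0, 50))   # 45.0
print(apply_confidence_multiplier(90.0, 100))  # 90.0
```

Under this assumed model, 100 leaves the raw score unchanged and lower values reduce it proportionally.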
The semicolon-separated words and regular expression are entered in the Pattern column for each index field.
Troubleshooting
The following table lists error messages that may appear and their possible root causes.
Error message | Possible root cause |
Invalid input pattern sequence. | The pattern is not a valid regular expression, or it does not follow the required format (one or more words followed by a regular expression, all separated by semicolons). |
No FieldType data found from data base for document type | The FieldType column doesn’t contain a valid value. |
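Because the first error usually comes from a malformed pattern, it can help to check patterns before entering them. The following hypothetical Python pre-check, validate_pattern, splits a pattern, requires at least one non-empty word before the regular expression, and confirms that the expression compiles. It is an assumption-based sketch: Python's re syntax is not identical to every regex dialect, so a pattern that passes here may still be rejected by the plugin, and vice versa.

```python
import re
from typing import List, Tuple

ERROR_MESSAGE = "Invalid input pattern sequence."


def validate_pattern(pattern: str) -> Tuple[List[str], str]:
    """Split a 'word;word;regex' pattern and check that it looks well formed.

    Hypothetical pre-check: raises ValueError with the plugin's message text
    when the pattern is likely to be rejected.
    """
    # Assumption: the last segment is the regular expression; earlier segments are words.
    *words, regex = pattern.split(";")

    # The pattern needs one or more non-empty words followed by a non-empty regex.
    if not words or any(not word.strip() for word in words) or not regex:
        raise ValueError(ERROR_MESSAGE)

    # The regex segment must compile (checked with Python's re dialect).
    try:
        re.compile(regex)
    except re.error as exc:
        raise ValueError(ERROR_MESSAGE) from exc

    return words, regex


print(validate_pattern(r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"))
```

A pattern such as "Invoice;Date;[0-9" fails the compile step, and a bare regular expression with no preceding words fails the format check.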