Paragraph Extraction

  • Get values trapped inside a paragraph.
  • Define the paragraph boundary using one or more keywords.
  • Use a pattern (regular expression) to search for desired data.

Important: The Start pattern for the regular expression cannot exceed 31 characters in length.
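As a minimal sketch of checking this limit before saving a rule: the helper name `is_valid_start_pattern` and the constant below are hypothetical and not part of Ephesoft. The sketch assumes the limit applies to the pattern string itself, and it uses Python's regex syntax, which may differ from the regex flavor Ephesoft uses.

```python
import re

# Limit stated in the note above; the constant name is hypothetical.
MAX_START_PATTERN_LENGTH = 31


def is_valid_start_pattern(pattern: str) -> bool:
    """Return True if the pattern is non-empty, compiles, and fits the length limit."""
    if not pattern or len(pattern) > MAX_START_PATTERN_LENGTH:
        return False
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

A pattern that is too long can often be shortened by matching only the first few words of the paragraph title.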


Extraction happens in two steps: first a paragraph is identified, and then a value matching the given regex pattern is extracted from within that paragraph.
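The two steps can be illustrated with a simplified sketch. This is not Ephesoft's implementation: the function name `extract_from_paragraph` is hypothetical, the input is assumed to be a plain list of text lines, and a blank line is assumed to mark the end of the paragraph.

```python
import re


def extract_from_paragraph(lines, start_pattern, value_pattern):
    """Find the paragraph that begins with start_pattern, then return the
    first match of value_pattern inside it (or None if nothing matches)."""
    start_re = re.compile(start_pattern)
    value_re = re.compile(value_pattern)

    for i, line in enumerate(lines):
        # Step 1: the paragraph starts on a line that begins with the start pattern.
        if not start_re.match(line.lstrip()):
            continue
        paragraph = []
        for candidate in lines[i:]:
            if not candidate.strip():
                break  # assumption: a blank line closes the paragraph
            paragraph.append(candidate.strip())
        # Join wrapped lines so a value split across a line break can still match.
        text = " ".join(paragraph)
        # Step 2: search for the value only inside the identified paragraph.
        match = value_re.search(text)
        return match.group(0) if match else None
    return None
```

For example, with the lines `"Payment Terms: payment is due"` and `"within 30 days of receipt."`, the start pattern `Payment Terms` and the value pattern `\d+ days`, the sketch returns `30 days`. A number elsewhere on the page is ignored because it lies outside the paragraph.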

Paragraphs are identified on the basis of the following conditions:

  • A regex match for the start pattern is treated as the start of the paragraph only if no span (word) is present to the left of the match.
  • Ephesoft computes the average white space between lines and splits the text body wherever the space between two lines is larger than that average.
  • If an end pattern is defined and a line ends with it, that line takes priority over the line-spacing mechanism: the paragraph ends on that line even if the following lines satisfy the spacing condition.
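The three conditions above can be sketched as follows. This is an illustrative approximation only: the function `find_paragraph` and the `(text, top, bottom)` line representation are assumptions, and Ephesoft's actual spacing calculation may differ.

```python
import re
from statistics import mean


def find_paragraph(lines, start_pattern, end_pattern=None):
    """lines: list of (text, top, bottom) tuples, ordered top to bottom,
    with top/bottom as vertical coordinates (hypothetical representation).
    Returns the text lines of the first paragraph found, or []."""
    start_re = re.compile(start_pattern)
    # Anchor the end pattern to the end of the line.
    end_re = re.compile(f"(?:{end_pattern})\\s*$") if end_pattern else None

    # Vertical white space between each pair of consecutive lines.
    gaps = [lines[i + 1][1] - lines[i][2] for i in range(len(lines) - 1)]
    avg_gap = mean(gaps) if gaps else 0

    for i, (text, _, _) in enumerate(lines):
        # Condition 1: the match must begin the line (no word to its left).
        if not start_re.match(text.lstrip()):
            continue
        paragraph = []
        for j in range(i, len(lines)):
            line_text = lines[j][0]
            paragraph.append(line_text)
            # Condition 3: an end-pattern match closes the paragraph first.
            if end_re and end_re.search(line_text):
                break
            # Condition 2: a gap larger than the average closes the paragraph.
            if j < len(gaps) and gaps[j] > avg_gap:
                break
        return paragraph
    return []
```

In this sketch the end-pattern check runs before the spacing check, which mirrors the stated priority: a line ending with the end pattern closes the paragraph even when the next line is closely spaced.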

The start pattern for a paragraph can be the title of the paragraph or its opening words. You can configure the extraction rule accordingly. During extraction, the Paragraph Extraction Rule handles paragraph wrapping by default.

This functionality enables you, as an administrator of batch classes, to configure extraction rules for index fields.
