Create Fixed-Form Projects with RecoStar Design Studio

Ephesoft Transact supports fixed-form extraction to capture handprint, constrained text, and checkboxes, and to detect signatures. This page explains how to use RecoStar Design Studio to create a fixed-form project for use with an Ephesoft Transact batch class and document type. The general outline is as follows:

  • Create the RecoStar Design Studio project.
  • Set up individual fields for extraction.
  • Create an Ephesoft Transact batch class and document type.
  • Integrate the RecoStar Design Studio project file into Ephesoft Transact.

Steps of Execution

If you need assistance using RecoStar Design Studio, please download the user manual from the Customer Support Portal Downloads page.

  1. Navigate to the following folder: [Ephesoft_Directory]\Application\native\RecostarPlugin\RecoStarDesignStudio.
  2. Double-click RecoStarDesignStudio.exe to launch the RecoStar Design Studio application.

Figure 1. RecoStar Design Studio Application

  1. Click File > New Project.

Figure 2. Create New Project

  1. Select Single Form. Click Next.

Figure 3. Select Single Form

The following screen displays.

Figure 4. New Project File Name

  1. Enter a name for the project and click Browse to select the location where the project will be saved. The system automatically creates the project directory based on the project name. This tutorial uses the 2018 IRS Form 1040 Schedule 1 tax form.

Figure 5. Creating New Project File

  1. Click Next. The Working Image Files screen displays.

Figure 6. Working Image Files

  1. Right-click the panel on the right and choose the Add Files command, or click the Add Files button. Select a good-quality sample file that is populated with data. Click Next.

Figure 7. Add Files

  1. On the next screen, verify that USA is selected as the country. Click Next.

Figure 8. Verify Country

  1. Verify the displayed information is correct.  Click Finish.

Figure 9. Verify Declarations

  1. The project will open inside the application with your sample file centered in the top panel.

Figure 10. Setup Image

  1. Right-click the form name and click Rename to rename the form.

Figure 11. Rename Form

  1. View the renamed form.

Figure 12. Renamed Form View

  1. Right-click IcrField and rename this operator “SSN”. This name MUST match the name of the index field in your Transact batch class that will display the extracted value.

Figure 13. Rename IcrField

  1. Resize the dotted box on the preview so that it roughly surrounds the SSN field.

Figure 14. Focus SSN Field

  1. Press Ctrl + W or right-click the image and choose Fit Width to zoom in on the document.

Figure 15. Fit Document to Screen Width

  1. Now that you can see the details of the document more clearly, resize the dotted box so that it more closely surrounds the SSN value.

Figure 16. Fix Box to SSN Value

  1. Right-click BinaryImageSequence and select Add > Remove Line System.

Figure 17. BinaryImageSequence > Add > Remove Line System

  1. Confirm that Remove Line System has been added under BinaryImageSequence.

Figure 18. View BinaryImageSequence

  1. Select File > Save Project to save your work.

Figure 19. Save Project

  1. Click Run Selected Image (►).

Figure 20. Run Selected Image

  1. Verify that the SSN value has been extracted correctly.

Figure 21. Verify SSN Value

  1. The next steps create a batch class in Ephesoft Transact and configure it to use this RecoStar Design Studio project file. Launch Ephesoft Transact and log in as a batch class administrator.

Figure 22. Launch Ephesoft Transact as Administrator

  1. Click Add to create a new batch class. The batch class name does not need to match the RecoStar Design Studio project.

Figure 23. Create New Batch Class

  1. Confirm that the new batch class has been created in the Batch Class Management screen.
  2. Select the new batch class and click Open to begin editing.

Figure 24. Open New Batch Class

  1. Click Doc Type and select Create New.

Figure 25. Create New Doc Type

  1. Select the Description field and edit the description of the new document type accordingly.

Figure 26. Edit Document Type Description

  1. Select the Minimum Confidence field and reduce the Minimum Confidence value to 8. Click Apply.

Figure 27. Reduce Minimum Confidence Value

  1. Select the new document type and click Upload Learn Files.

Figure 28. Upload Learn Files

  1. In the Open window, choose a blank template of the document and click Open. This will train Ephesoft Transact to recognize this document type.

Figure 29. Open Blank Template

  1. From the Document Types folder, navigate to <New Document Type> > Index Fields.

Figure 30. Index Fields of New Document Type

  1. Select Add to create a new index field. Name the index field SSN.  This name must match the name of the field you defined in the RecoStar project.  Click Apply.

Figure 31. Create SSN Index Field

  1. In the Additional Configuration column, click the down arrow ( ▼ ) to expand the drop-down list. Select the Force Review check box. This will force all batches to stop in the Validation module so we can verify the results of our extraction rules.

Note: In Ephesoft Transact 2020.1 and above, this check box is labeled Force Validation.

Figure 32. Select Force Review

  1. Click Apply to save your changes.

Figure 33. Save Changes

  1. Navigate to Modules > Extraction > RECOSTAR_EXTRACTION and set RecoStar Extraction Switch to ON. Click Apply to save your changes.

Figure 34. Enable RecoStar Extraction Switch

  1. Open File Explorer (previously Windows Explorer) and navigate to the location where you saved your RecoStar project when you created it. Copy (CTRL + C) the RecoStar (.rsp) file from that folder.

Figure 35. Copy the RecoStar Project File

  1. Navigate to the location of your batch class in Ephesoft Transact. Continue to fixed-form-extraction and open the folder for your new document type.

Figure 36. Navigate to Document Type

  1. Paste (CTRL + V) the .rsp file into the document type folder.

Figure 37. Add .rsp File

  1. In Ephesoft Transact, navigate to <New Document Type> > Index Fields > Fixed Form Extraction. This is where you can map specific pages of a document type to a RecoStar project file.

Figure 38. Navigate to Fixed Form Extraction

  1. Click Add. Set the Page Number field to “1”.  The File Name field should auto-populate because you only have a single .rsp file in that folder. Click Apply to save your changes.

Figure 39. Edit Fixed Form Extraction Fields

  1. Navigate to Operator > Upload Batch.

Figure 40. Operator > Upload Batch

  1. Click the Batch Class drop-down ( ▼ ). Select your new batch class from the provided list.

Figure 41. Select New Batch Class

  1. Under Upload Files, click Select Files. This opens the following screen. Choose a populated version of your document to be processed.

Figure 42. Upload Populated File

  1. The file will display in the Upload Batch screen.

Figure 43. File Displays

  1. Click Start Batch.

Figure 44. Start Batch

  1. Navigate to the Batch Instance Management screen and wait for the batch to stop in Validation.

Figure 45. Batch Instance Management

  1. When the batch stops in Validation, open it for editing. In the provided example, the SSN value has been extracted correctly. The highlighting on the preview shows where the value was extracted.

Figure 46. Validation Module

  1. Repeat the steps above to add new index fields in RecoStar Design Studio, along with matching index fields in Ephesoft Transact.

Note: Each time you change the RecoStar project file, you must copy the updated .rsp file from the RecoStar project folder into the document type folder under your batch class’s fixed-form-extraction folder (repeat the copy and paste steps above) for Ephesoft Transact to recognize the changes.
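Because this copy has to be repeated after every change, it can help to script it. The following is a minimal sketch only, not an Ephesoft tool: the function name `deploy_rsp` is hypothetical, and both folder paths are placeholders that you must replace with the locations used in your own installation.

```python
import shutil
from pathlib import Path


def deploy_rsp(project_dir: Path, doc_type_dir: Path) -> Path:
    """Copy the single .rsp file found in project_dir into doc_type_dir.

    project_dir:  folder where RecoStar Design Studio saved the project.
    doc_type_dir: document type folder under the batch class's
                  fixed-form-extraction folder (must already exist).
    """
    rsp_files = sorted(project_dir.glob("*.rsp"))
    if len(rsp_files) != 1:
        raise ValueError(
            f"expected exactly one .rsp file in {project_dir}, found {len(rsp_files)}"
        )
    if not doc_type_dir.is_dir():
        raise FileNotFoundError(f"document type folder not found: {doc_type_dir}")
    # copy2 overwrites an older copy with the same name and preserves timestamps.
    return Path(shutil.copy2(rsp_files[0], doc_type_dir / rsp_files[0].name))
```

Call it with your own two folders, for example `deploy_rsp(Path(r"<RecoStar project folder>"), Path(r"<document type folder>"))`. The sketch refuses to run when the project folder holds more than one .rsp file, so that the wrong project is never deployed by accident.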

RecoStar Design Studio Checkbox Configuration

Follow the instructions below to process checkboxes using RecoStar Design Studio and Ephesoft Transact. The following figure is an example of a form with checkboxes.

Figure 47. Sample Form for Checkbox Extraction

  1. Create a subform to help locate the desired text on the page for data extraction. Right-click the RecoOperators node in the form. Select Add > Subform. Keep the default name.

Figure 48. Add Subform

  1. Click the plus sign to expand the Subform node.
  2. Right-click FieldRegistration and select Insert > Regular Expression Search Field.

Figure 49. Insert Regular Expression Search Field

A blue overlay is drawn over the document.

Figure 50. Overlay

  1. Right-click an area inside the dashed blue overlay. Select Draw Geometry.

Figure 51. Draw Geometry Command

  1. Left-click and drag to draw a box around a word near the fields you want to work with. In the figure below, a box is drawn around Divorced to extract the Sex and Marital Status checkboxes.

Figure 52. Draw Geometry

  1. Right-click outside of the drawn box. Select Draw Zone.

Figure 53. Draw Zone Command

  1. Left-click and draw a larger box surrounding the general area around the blue box. This will create a red dashed box with a green line connecting it to the upper-left corner of the document.

Figure 54. Draw Zone

  1. Resize the blue box around the text to reduce the amount of white space as shown in the following figure. For better results, match the size of the blue box to the text size in the pattern.

Figure 55. Resize Geometry Box

  1. Click the FieldRegistration node. Enter Divorced into the Pattern field.

Figure 56. Add Pattern

When RecoStar processes the document, it will use the top-left corner of the document as its first reference point, then look to the exact location of the red dashed area (relative to the top-left corner) and try to find something with a specific pattern inside the blue box. The red dashed area needs to be larger to allow the system to better account for shrinking or stretching when paper documents are printed, faxed, or scanned.
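The idea described above can be sketched in a few lines of Python. This is an illustration of the concept only and does not reproduce RecoStar's internal algorithm: the `Box` class, the `find_anchor` function, and the representation of recognized words as (text, box) pairs are all assumptions made for this example. Coordinates are measured from the top-left corner of the page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        """True if other lies completely inside this box."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


def find_anchor(words, zone, pattern):
    """Return the box of the first word inside zone whose text matches pattern.

    words:   iterable of (text, Box) pairs, e.g. from a recognition pass.
    zone:    the larger search area (the red dashed box in the figures).
    pattern: regular expression the word must match in full.
    """
    regex = re.compile(pattern)
    for text, box in words:
        if zone.contains(box) and regex.fullmatch(text):
            return box
    return None
```

Because the zone is larger than the word itself, a page that was shifted slightly during printing or scanning still places the word inside the zone, and the search still succeeds. A word with the same text elsewhere on the page is ignored.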

  1. Click the green arrow to run this project. RecoStar locates the value in the document.

Figure 57. Test Results

  1. Create the checkbox definitions. Right-click the RecoOperators node inside the Subform node and click Add > Check Box Field.

Figure 58. Add Checkbox Field

  1. Change the name of the field to Sex. This field name needs to match the name of the index field in Ephesoft Transact.

Figure 59. Checkbox Operator

  1. Resize the red dashed overlay to surround the Female and Male checkboxes.

Figure 60. Resize Checkbox Operator

  1. Expand the Sex node. Right-click HorizontalDistances. Select Add > Checkbox Distance Description.

Figure 61. Add Checkbox Distance Description

  1. Click in the preview pane. Type M to open the measuring tool.
  2. Left-click the left edge of the Female checkbox, then drag to the left edge of the Male checkbox. The distance between the two left edges will be displayed at the top of the screen.

Figure 62. Measurement Value

  1. Note the distance in millimeters. Click the CheckBoxDistanceDescription field and enter that value in the Distance field.

Note: Since we’re only looking for one additional checkbox (after the first one), leave the Count field set to 1.

Figure 63. Distance Value

  1. Run the project again and view the results. In the figure below, the result OX indicates that the first checkbox is unchecked, and the second checkbox is checked.

Figure 64. Test Results

Note: Checked checkboxes are represented with a capital X, and unchecked checkboxes are represented with a capital O. This is the value that will be passed to Ephesoft Transact once this project is integrated into your batch class.
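If you process these values in a script, the X/O string can be decoded position by position. The following is a minimal sketch; the function name `decode_checkboxes` is hypothetical and is not part of RecoStar or Ephesoft Transact.

```python
def decode_checkboxes(result: str) -> list[bool]:
    """Turn an X/O result string into one boolean per checkbox.

    "X" means checked and "O" (capital letter O) means unchecked,
    in the order in which the checkboxes were read.
    """
    states = []
    for ch in result:
        if ch == "X":
            states.append(True)
        elif ch == "O":
            states.append(False)
        else:
            raise ValueError(f"unexpected character {ch!r} in result {result!r}")
    return states
```

For example, `decode_checkboxes("OX")` returns `[False, True]`: the first checkbox is unchecked and the second is checked.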

When RecoStar processes this document, it will first locate the subform based on the reference point in the top-left corner and the FieldRegistration RegularExpressionSearchField operator, then it will look for the first checkbox.

If that first checkbox is found, it will then look for the second checkbox 20.57 mm to the right.

To extract a group of checkboxes with the same distance apart, use the Count field to indicate how many additional checkboxes should be read after the first one.

If the checkboxes aren’t spaced evenly, use multiple CheckBoxDistanceDescription operators to define the distances between the different checkboxes. For example, the following screenshot shows the Sex checkbox group we just defined, plus the Marital Status checkbox group.

Figure 65. Larger Checkbox Group Results

Note how four CheckBoxDistanceDescription fields were needed to capture those checkboxes because the distances between the checkboxes were inconsistent.

This document has focused only on horizontal checkboxes so far, but vertical checkbox groups can be processed in the same way using the CheckBoxDistanceDescription field under the VerticalDistances node. When measuring distances between vertical checkboxes, measure from the top border of the top checkbox to the top border of the bottom checkbox.
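To see how the Distance and Count values combine, the following sketch computes where each checkbox is expected relative to the first one. It is an illustration under stated assumptions, not RecoStar's implementation: the function name `checkbox_offsets` is hypothetical, and it assumes that each distance description is measured from the previously located checkbox. The same arithmetic applies to horizontal distances (left edge to left edge) and vertical distances (top border to top border).

```python
from itertools import accumulate


def checkbox_offsets(descriptions):
    """Return the offset in millimeters of every checkbox from the first one.

    descriptions: ordered list of (distance_mm, count) pairs, one pair per
    distance description. count is the number of additional checkboxes
    that follow at that spacing.
    """
    steps = [distance for distance, count in descriptions for _ in range(count)]
    # The first checkbox is the origin; each later one adds its step.
    return [0.0] + list(accumulate(steps))
```

With the single description from this tutorial, `checkbox_offsets([(20.57, 1)])` returns `[0.0, 20.57]`. An evenly spaced group needs one description with a higher count, while an unevenly spaced group needs one description per distinct gap.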

Deploy the New Project File to Ephesoft Transact

  1. Save your changes and copy the *.rsp project file to your batch class and document type folder as described earlier in this document.
  2. Ensure that the RecoStar Extraction Switch option is set to ON inside the RECOSTAR_EXTRACTION plugin inside the Extraction module.

Figure 66. RECOSTAR_EXTRACTION Plugin

  1. To apply a RecoStar project to this document type for the first time, map the new project to the document type inside the Document Types > Document Type Name > Fixed Form Extraction interface, as shown below:

Figure 67. Fixed-Form Extraction Project Assignment

Ephesoft Transact Checkbox Configuration

  1. Create new index fields in Ephesoft Transact with the same names (including capitalization) as the fields you created in your RecoStar Design Studio project.

Note: Checkboxes are often best represented in Ephesoft Transact using the COMBO field type. Use semicolons to separate the different values in the Field Option Values List column.

Figure 68. Create New Index Fields
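Because a field name that differs only in capitalization will not match, a quick comparison of the two name lists can catch mistakes before you run a batch. The following is a minimal sketch; the function name `find_name_mismatches` is hypothetical, and both lists are typed in by hand here rather than read from RecoStar or Ephesoft Transact.

```python
def find_name_mismatches(recostar_fields, transact_fields):
    """Report RecoStar field names that have no exact Transact counterpart.

    Returns a dict that maps each problem name to a short explanation.
    Names that match exactly (including capitalization) are not reported.
    """
    transact_exact = set(transact_fields)
    transact_by_lower = {name.lower(): name for name in transact_fields}
    problems = {}
    for name in recostar_fields:
        if name in transact_exact:
            continue
        near = transact_by_lower.get(name.lower())
        if near is not None:
            problems[name] = f"case differs from {near!r}"
        else:
            problems[name] = "no matching index field"
    return problems
```

For example, comparing `["SSN", "Sex"]` against `["SSN", "sex"]` reports that `Sex` differs in case from `'sex'`.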

  1. Create a batch instance with sample documents.

Figure 69. Initial Checkbox Extraction Results

  1. The XO and OXOOO values represent the checked/unchecked values coming from RecoStar. Use the Format Conversion feature or write an extraction script to convert X/O combinations into real values.
  2. Ensure the FORMAT_CONVERSION_PLUGIN plugin is added to the Extraction module and turned on.

Figure 70. FORMAT_CONVERSION_PLUGIN plugin

  1. Navigate into your document type and into the Format Conversion interface.

Figure 71. Format Conversion Interface

  1. Check the Replace checkbox.
  2. Enter XO in the Replace text field and Female in the With field.
  3. Click the Validate Regex button.

Figure 72. One Replace Rule

  1. Click the plus sign button to add a second row. Repeat the steps above to add OX and Male.

Figure 73. Format Conversion Rules for Sex Checkboxes

  1. Click the Apply button to save your changes. Repeat the above instructions for any other checkbox fields that you added in your RecoStar project.

Figure 74. Format Conversion Rules for Marital Status Checkboxes

  1. Restart your test batch at the Extraction module to view the new results.

Figure 75. Updated Extraction Results

Note: The RecoStar project also transfers the coordinates of the checkbox group so that the checkboxes are highlighted when the user moves through the fields in the Validation screen.
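If you convert the values in a script instead of using the Format Conversion feature, the replace rules for the Sex field can be expressed as a small lookup. The following sketch is not the Ephesoft Transact scripting API: the names `SEX_RULES` and `convert` are hypothetical, and matching the whole value against each rule is an assumption of this example.

```python
import re

# Replace rules for the Sex field, taken from the steps above.
SEX_RULES = [
    ("XO", "Female"),
    ("OX", "Male"),
]


def convert(value, rules):
    """Return the replacement of the first rule whose pattern matches the
    whole value; return the value unchanged if no rule matches."""
    for pattern, replacement in rules:
        if re.fullmatch(pattern, value):
            return replacement
    return value
```

Values that match no rule, such as `OO` (nothing checked) or `XX` (both checked), are returned unchanged so that an operator can review them in Validation.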

Signature Detection

Signature detection is handled very similarly. A pixel count field is used in RecoStar Design Studio to measure the percentage of non-white pixels in a given area. A configurable threshold is used to determine what percentage is needed to result in a positive response.
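The measurement can be illustrated with a short sketch. This is not RecoStar's implementation: the function names are hypothetical, the region is assumed to be a list of rows of grayscale values from 0 (black) to 255 (white), and treating a result equal to the threshold as positive is also an assumption.

```python
def non_white_percentage(region, white_level=255):
    """Return the percentage of pixels in region that are darker than white_level."""
    total = sum(len(row) for row in region)
    if total == 0:
        raise ValueError("region is empty")
    dark = sum(1 for row in region for pixel in row if pixel < white_level)
    return 100.0 * dark / total


def signature_present(region, threshold_percent):
    """True if the non-white percentage reaches the configured threshold."""
    return non_white_percentage(region) >= threshold_percent
```

A 10 by 10 region with three dark pixels yields 3.0 percent, which is positive for a threshold of 2 and negative for a threshold of 5.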

Signature Detection in RecoStar Design Studio

The sample patient registration form used in the previous steps of this tutorial has a signature field at the bottom. The first step to determine if a signature is present is to create a new subform to help locate the signature area.

  1. Follow the steps listed above to create a new subform with a Regular Expression Search Field operator for the Field Registration field.

Figure 76. Subform for Signature Area

  1. Click Set Reference Corner to select the location of the subform. In the figure below, the subform is located at the bottom of the document, so the selected reference corner is Lower Left.

Figure 77. Set Reference Corner Lower Left

  1. Note how the thin green line now connects the red and blue overlay to the bottom left corner of the document.

Figure 78. Reference Corner Set to Lower Left

  1. Run the project and ensure that your new search field is found.

Figure 79. Test Results

  1. Right-click the RecoOperators node in your new subform node. Select Add > Pixel Count Field.

Figure 80. Add Pixel Count Field

  1. Resize the red dashed box to surround the signature area.

Figure 81. Resized Pixel Count Field

  1. Rename the Pixel Count field with the same name that you want to use for the index field in Ephesoft Transact. Change the PixelMinRatio to a small value such as 2. With this value, a signature is reported as present if 2% of the pixels in the area are black.
  2. Run the project and see what percentage is given for the new field.

Figure 82. Test Results

  1. Run tests with a variety of sample documents. Adjust the PixelMinRatio property accordingly so that a positive value is returned when a signature is present.

Note: Larger search areas result in lower percentages. It’s not uncommon to use a value of 2% or 3% to indicate that a signature is present. Fuzzy or dirty scans may introduce noise into signature areas, which can result in false positives.
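The effect of the search area size is simple arithmetic, shown in the sketch below. The pixel counts and box sizes are made-up numbers for illustration only, and the function name `ink_percentage` is hypothetical.

```python
def ink_percentage(ink_pixels, width, height):
    """Percentage of the search area that is covered by ink."""
    return 100.0 * ink_pixels / (width * height)


# The same signature (6,000 dark pixels) measured in two search areas.
tight_box = ink_percentage(6000, 600, 200)    # box drawn close to the signature line
loose_box = ink_percentage(6000, 1200, 400)   # box twice as wide and twice as tall
```

Here the tight box gives 5.0 percent, while the loose box gives only 1.25 percent for the same signature. A threshold of 2 would report a signature in the first case and miss it in the second, which is why the threshold should be tuned together with the size of the box.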

  1. Depending on your use case, you may want to adjust the following additional settings for best extraction results:
    • SyntaxMode: Specifies the expected content in the extraction zone, such as Alphanumeric, Numeric, or Amounts.
    • Font: Specifies whether the content in the extraction zone will be Machine Print or Handprint.
    • HandprintHeight: The expected height of the handwritten characters.
    • HandprintPitch: The angle (slant) that a person would write at.
    • LogicalContext: A post-processing feature on the recognition results. It helps improve the extracted values by looking at the preceding values and making decisions based on context. For example, SMITH instead of SM1TH.
    • Patterns: The ability to use regular expressions to help extract the correct values.

Figure 83. Additional Settings
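As an illustration of how a pattern can help with a field such as the SSN field from the first part of this tutorial, the following sketch checks a value against a regular expression. It uses Python's `re` syntax, which may differ from the syntax that the Patterns setting accepts. The pattern itself and the function name `is_plausible_ssn` are assumptions made for this example.

```python
import re

# Three digits, two digits, four digits, optionally separated by a
# hyphen or a space.
SSN_PATTERN = re.compile(r"\d{3}[- ]?\d{2}[- ]?\d{4}")


def is_plausible_ssn(value):
    """True if value has the shape of a U.S. Social Security number."""
    return SSN_PATTERN.fullmatch(value.strip()) is not None
```

A value in which a digit was misread as a letter, such as `12E-45-6789`, fails the check and can be flagged for review.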

  1. Save your changes and copy the updated *.rsp file into your batch class.

Configuring the Signature Detection Field in Ephesoft Transact

  1. Create a new index field with the same name as the pixel count operator in RecoStar Design Studio. Then, create a sample batch instance.
  2. When you run a test batch instance with the new pixel count field, the signature field displays as True or False in Ephesoft Transact.

Note: The highlighted area shows where the signature was detected in the document.

Figure 84. Signature Result

  1. To edit how the fields display, change the signature field to a drop-down list with the COMBO index field control. Refer to the Format Conversion steps in the Ephesoft Transact Checkbox Configuration section to use the Format Conversion feature.