Recommended Settings for Better OCR Results

Applies to: All versions of Ephesoft Transact

This page lists recommended settings that may improve optical character recognition (OCR) results in Transact. These settings may also resolve common issues with the Recostar_HOCR_Plugin, such as color images causing:

  • Errors or failure when creating HOCR.xml files.
  • Transact to crash or hang during RecoStar batch image processing.

The following factors can affect OCR quality:

  • Image quality
  • Compression parameters
  • Other plugins

The following settings are optimal when the PDF to TIFF Conversion Process is set to Ghostscript and the Image Conversion Process is set to ImageMagick.* If you are processing both color and black and white images, use the settings recommended in Color Images. If your use case is exclusively greyscale, use the settings for Black and White Images.

* Since the publication of this article, other combinations of engines for PDF to TIFF Conversion Process and Image Conversion Process may result in better performance and quality depending on your use case.

Color Images

For color images, Ephesoft recommends the following configurations:

  • Documents should have a minimum resolution of 200 DPI.
  • In the IMPORT_MULTIPAGE_FILES plugin:
    • Ensure -limit area 100MB is added to the IM Convert Input Image Parameters.

Note: This field is case-sensitive; “MB” must be capitalized.

    • Ensure -compress LZW is added to the IM Convert Output Image Parameters.
    • Ensure -sCompression=lzw is added to the GhostScript Image Parameters.

Figure 1. Import Multipage Files Plugin

  • Ensure the CREATE_OCR_INPUT plugin exists in the Page Process module.
  • Turn the Recostar color switch to ON in the RECOSTAR_HOCR plugin, located in the Page Process module.

Figure 2. RecoStar HOCR Plugin
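The three IMPORT_MULTIPAGE_FILES values listed above are easy to mistype, especially the capitalized “MB”. The following is a minimal sketch of how the values could be checked before they are entered in the plugin configuration. It is not part of Transact: the function names and the example dictionary are hypothetical, and applying exact-case matching to all three fields is a conservative assumption, since this article only states the case requirement for “MB”.

```python
import shlex

# Required option strings, keyed by the plugin field names used in this article.
REQUIRED = {
    "IM Convert Input Image Parameters": "-limit area 100MB",
    "IM Convert Output Image Parameters": "-compress LZW",
    "GhostScript Image Parameters": "-sCompression=lzw",
}


def contains_tokens(params, required):
    """Return True if the tokens of `required` appear consecutively in `params`.

    The comparison is case-sensitive, so "100mb" does not satisfy "100MB".
    """
    tokens = shlex.split(params)
    want = shlex.split(required)
    n = len(want)
    return any(tokens[i:i + n] == want for i in range(len(tokens) - n + 1))


def missing_settings(configured):
    """List the field names whose configured value lacks the required option."""
    return [
        field
        for field, required in REQUIRED.items()
        if not contains_tokens(configured.get(field, ""), required)
    ]


# Hypothetical field values, typed in by hand for illustration only.
example = {
    "IM Convert Input Image Parameters": "-limit area 100mb",  # wrong case
    "IM Convert Output Image Parameters": "-compress LZW",
    "GhostScript Image Parameters": "-sCompression=lzw",
}
print(missing_settings(example))
```

In this example only the first field is reported, because “100mb” does not match the required “100MB”.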

Black and White Images

For black and white images, Ephesoft recommends the following configurations:

  • Documents should have a minimum resolution of 200 DPI.
  • In the IMPORT_MULTIPAGE_FILES plugin:
    • Ensure -limit area 100MB is added to the IM Convert Input Image Parameters.

Note: This field is case-sensitive; “MB” must be capitalized.

    • Ensure -compress LZW is added to the IM Convert Output Image Parameters.
    • Ensure -sCompression=lzw is added to the GhostScript Image Parameters.

Figure 3. Import Multipage Files Plugin

  • Turn the Recostar color switch to OFF in the RECOSTAR_HOCR plugin, located in the Page Process module.

Figure 4. RecoStar HOCR Plugin

  • Remove the CREATE_OCR_INPUT plugin from the Page Process module.
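
The two sets of recommendations share the same IMPORT_MULTIPAGE_FILES parameters and differ only in the Recostar color switch and the CREATE_OCR_INPUT plugin. As a rough summary, the sketch below records those two differences and the selection rule described earlier on this page (use the color settings when a batch class handles any color images). The class and function names are hypothetical and do not correspond to any Transact API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrProfile:
    name: str
    recostar_color_switch: str     # "ON" or "OFF" in the RECOSTAR_HOCR plugin
    create_ocr_input_plugin: bool  # True if CREATE_OCR_INPUT stays in Page Process


COLOR = OcrProfile("Color Images", "ON", True)
BLACK_AND_WHITE = OcrProfile("Black and White Images", "OFF", False)


def choose_profile(any_color_pages):
    """Pick the color profile if any input is color, otherwise black and white."""
    return COLOR if any_color_pages else BLACK_AND_WHITE


# A mixed workload (color plus black and white) uses the color profile.
print(choose_profile(True))
```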
