OCR Languages Selection from the UI

Ephesoft Transact supports the OCR engines Recostar (for Windows systems) and Nuance (for Linux systems). Transact also offers Tesseract as an option. The user can select any one of them depending on preference and system requirements.

In Transact versions prior to Release 4.5.x.x or 2019.1, defining OCR languages for Recostar or Nuance required the user to locate the relevant backend folders on the server and edit the OCR input file manually. The Tesseract OCR language could be specified from the UI, but the language name had to be typed manually in the corresponding field.

Starting from Ephesoft Transact 4.5.0.0 and continuing with 2019.x releases, a new multi-select-suggestion widget has been added to the Plugin Configuration screen for all three OCR engines under the Page Process module. Using this widget, the user can select the language(s) and update the OCR engine input file automatically from the UI rather than having to make this change manually.

The name of the new Plugin Configuration field for Nuance and Tesseract OCR engines is OCR Language. The Recostar OCR engine, on the other hand, takes only the country name as the language input; therefore, to make it compatible with other definitions, the same field in the Recostar HOCR plugin is called OCR Country/Language.

NUANCE_HOCR Plugin Configuration screen


RECOSTAR_HOCR Plugin Configuration screen


TESSERACT_HOCR Plugin Configuration screen


When you select or type a language name, the widget helps by offering suggestions. To open the complete suggestion list, enter the suggestion token, which is a semicolon (;), or, if no language is selected yet, click in the field. After the suggestion token, the widget lists languages automatically based on what you type.


When you start typing the first letters of the required language name, the widget will suggest languages according to the letters already entered.
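The suggestion behavior described above can be sketched in a few lines of Python. This is only an illustration, not the widget's actual implementation: the `LANGUAGES` list and the `suggest` function are hypothetical, and the sketch assumes that the semicolon also separates entries that have already been chosen and that chosen entries are not suggested again.

```python
# Hypothetical language list; the real widget's list depends on the OCR engine.
LANGUAGES = ["English", "French", "German", "Greek", "Spanish"]
TOKEN = ";"  # the suggestion token described in the text


def suggest(field_text, languages=LANGUAGES):
    """Return suggestions for the entry currently being typed.

    Assumption: entries already chosen are separated by the token, and
    the last piece of the field text is the one still being typed.
    """
    parts = [p.strip() for p in field_text.split(TOKEN)]
    chosen = {p.lower() for p in parts[:-1] if p}
    prefix = parts[-1].lower()
    # An empty prefix matches every language, so an empty field or a
    # trailing token yields the complete remaining list.
    return [
        lang for lang in languages
        if lang.lower().startswith(prefix) and lang.lower() not in chosen
    ]
```

For example, `suggest("G")` narrows the list to the languages starting with "G", while `suggest("English;")` returns every language except the one already chosen.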


The multi-select-suggestion widget has several icons associated with it:

– Help icon is used to provide suggestions (for example, it will remind you to use the suggestion token (;) to view the language suggestions list).

– Error icon is used to indicate that you have provided or selected wrong input (for example, if you leave the field empty or enter an invalid value).

Note: The error icon will also be shown if you select/use a non-licensed language for Nuance (Arabic and Asian (Chinese_Simplified, Chinese_Traditional, Japanese, Korean) languages) or Recostar (Chinese, Japanese, Korean, Thai).

– Warning icon is used to warn and provide information (for example, it will remind you that the Tesseract Test-Data folder should contain Test Data for the selected Tesseract languages).

Notes:

  • If you do not specify the language in the HOCR plugin, English will be used by default.
  • During the OCR process with the Recostar or Nuance OCR engine, the system checks whether all selected languages are licensed. If any of them is not, an empty HOCR file is generated for all pages and an error is written to the log file.
  • If you need to OCR documents in Asian languages using the Recostar OCR engine, you’ll have to purchase an additional Ephesoft OCR language license for Asian languages (Chinese, Japanese, Korean, Thai). Similarly, when using Nuance, separate licenses have to be purchased for the Arabic language and for Asian languages (Chinese_Simplified, Chinese_Traditional, Japanese, Korean).
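The default-language and licensing rules in these notes can be summarized as a small Python sketch. It is not Ephesoft code: the `resolve_languages` function, its `purchased` parameter, and the lowercase engine keys are assumptions made for illustration; only the language names come from the notes above.

```python
# Languages that require a separate license, as listed in the notes.
LICENSED_ONLY = {
    "nuance": {"Arabic", "Chinese_Simplified", "Chinese_Traditional",
               "Japanese", "Korean"},
    "recostar": {"Chinese", "Japanese", "Korean", "Thai"},
}
DEFAULT_LANGUAGE = "English"


def resolve_languages(engine, selected, purchased=()):
    """Return (languages_to_use, unlicensed_languages).

    `purchased` is a hypothetical collection of separately licensed
    languages available on this installation.
    """
    # With nothing selected, English is used by default.
    languages = list(selected) or [DEFAULT_LANGUAGE]
    # Engines without an entry (e.g. Tesseract) have no restricted set here.
    restricted = LICENSED_ONLY.get(engine.lower(), set())
    unlicensed = [
        lang for lang in languages
        if lang in restricted and lang not in purchased
    ]
    return languages, unlicensed
```

A non-empty second element corresponds to the case in which the notes say an empty HOCR file is generated and an error is logged.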

Information about the selected language(s) is now also included in the HOCR.xml file. The file contains the <LanguageCode> tag with the code of the OCR language(s) specified in the RECOSTAR_HOCR, NUANCE_HOCR, or TESSERACT_HOCR plugin.
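If you need to read the language code back programmatically, a minimal Python sketch could look like the following. Only the `<LanguageCode>` tag name comes from the text above; the sample fragment, its other element names, the `eng` value, and the `language_codes` function are hypothetical, and the sketch assumes the tag is not in an XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration; a real HOCR.xml file will have
# a different and larger structure around the <LanguageCode> tag.
SAMPLE = (
    "<HocrPages><HocrPage>"
    "<LanguageCode>eng</LanguageCode>"
    "</HocrPage></HocrPages>"
)


def language_codes(xml_text):
    """Collect the text of every <LanguageCode> element in the document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the tag's depth does not matter.
    return [el.text.strip() for el in root.iter("LanguageCode") if el.text]
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.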
