Question: How do I use the Tesseract plugin to configure two languages in a batch class?


The following are known issues when using the Tesseract plugin in a batch class:

  1. Only two languages can be used at a time in a batch class with the Tesseract plugin.
  2. Arabic is not recognised properly by Tesseract and usually produces an error when run from the command line.
  3. OCR results may vary, and some text may be recognised as a different character.
  4. Tesseract seems to have trouble classifying characters correctly when a large number of languages are used.
  5. Mandatory steps to perform OCR on Chinese text:
    1. Add chi_sim.traineddata (Simplified Chinese) and chi_tra.traineddata (Traditional Chinese) to the tessdata folder.
    2. Set the Tesseract language to chi_sim or chi_tra.
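To illustrate the two-language limit and the Chinese setup steps outside the batch class, here is a minimal Python sketch that assembles a plain Tesseract command line. It is not the Tesseract plugin's own configuration mechanism. The helper names (`build_lang_arg`, `missing_traineddata`, `build_command`, `run_ocr`) and the `MAX_LANGUAGES` constant are hypothetical. The two-language cap is copied from known issue 1 rather than enforced by Tesseract itself. The sketch also assumes a Tesseract version that accepts `--tessdata-dir PATH` and combined languages in the form `-l lang1+lang2`.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Hypothetical cap mirroring the plugin's known two-language limit;
# the tesseract CLI itself does not impose this limit.
MAX_LANGUAGES = 2


def build_lang_arg(languages: Sequence[str]) -> str:
    """Join language codes into Tesseract's -l form, e.g. 'eng+chi_sim'."""
    if not languages:
        raise ValueError("at least one language is required")
    if len(languages) > MAX_LANGUAGES:
        raise ValueError(
            f"only {MAX_LANGUAGES} languages can be used at a time, "
            f"got {len(languages)}"
        )
    return "+".join(languages)


def missing_traineddata(tessdata_dir: Path, languages: Sequence[str]) -> List[str]:
    """Return the <lang>.traineddata file names not found in tessdata_dir."""
    return [
        f"{lang}.traineddata"
        for lang in languages
        if not (tessdata_dir / f"{lang}.traineddata").is_file()
    ]


def build_command(
    image: Path, output_base: Path, languages: Sequence[str], tessdata_dir: Path
) -> List[str]:
    """Build a tesseract argument list, failing early on bad language setup."""
    lang_arg = build_lang_arg(languages)  # enforces the two-language cap first
    missing = missing_traineddata(tessdata_dir, languages)
    if missing:
        raise FileNotFoundError(f"missing in {tessdata_dir}: {', '.join(missing)}")
    return [
        "tesseract", str(image), str(output_base),
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang_arg,
    ]


def run_ocr(
    image: Path, output_base: Path, languages: Sequence[str], tessdata_dir: Path
) -> None:
    """Run tesseract; it writes the recognised text to <output_base>.txt."""
    cmd = build_command(image, output_base, languages, tessdata_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `build_command(Path("page.tif"), Path("page"), ["eng", "chi_sim"], Path("/path/to/tessdata"))` (all paths are placeholders) returns an argument list ending in `-l eng+chi_sim`, provided both `eng.traineddata` and `chi_sim.traineddata` are present. Checking for the `.traineddata` files up front surfaces a missing Chinese data file as a clear error instead of an OCR failure partway through a run.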