KB00022161: Machine Learning does not work with multiple languages

Machine Learning doesn’t work when multiple languages are used.

KB Article # 22161

Topic/Category: Machine Learning

Applies to: 4.1 onwards


If a Batch Class is configured for multiple languages (within either RecoStar or Nuance), the Machine Learning plugin won’t work correctly.

Root Cause:

The machine-learning-based extraction plugin expects a dictionary file (containing stop words) for each language of the documents being processed. In one case, with English and Italian documents, it needs two files. Because the Italian dictionary file is not shipped with the product, a “Dictionary not found exception” is thrown at learning time, and so no extraction happens for subsequent batches.
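To see which languages would trigger this exception, the dictionary folder can be checked before learning is run. The following is a minimal sketch, not part of the product: the function name missing_dictionaries is hypothetical, and the file-name pattern <language code>_stopWords.txt is an assumption inferred from the it_stopWords.txt name used in this article (the "en" code for English is likewise assumed).

```python
from pathlib import Path


def missing_dictionaries(dictionary_dir, language_codes):
    """Return the language codes that have no stop-word file in dictionary_dir.

    Assumption: files are named "<code>_stopWords.txt", as suggested by
    it_stopWords.txt; verify the pattern against an existing installation.
    """
    folder = Path(dictionary_dir)
    missing = []
    for code in language_codes:
        if not (folder / f"{code}_stopWords.txt").is_file():
            missing.append(code)
    return missing
```

Calling missing_dictionaries(folder, ["en", "it"]) on the batch class's language-packs folder would return the codes whose dictionary still has to be created.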


The customer can manually create an Italian dictionary file (it_stopWords.txt) under the opt/Ephesoft/SharedFolders/BCXX/machine-learning-dictionaries/language-packs/* folder; learning and extraction will then work. However, if the file contains no stop words, or the wrong ones, incorrect anchors may be learned, which lowers extraction accuracy.
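Creating the file can be scripted. The sketch below is illustrative only: write_stop_words is a hypothetical helper, and the one-word-per-line layout and UTF-8 encoding are assumptions that should be checked against a dictionary file shipped with the product. The target folder is passed in by the caller, because the batch class identifier and the final path segment depend on the installation. The helper refuses to write an empty list, since an empty dictionary leads to the low-accuracy case described above.

```python
from pathlib import Path


def write_stop_words(dictionary_dir, words, file_name="it_stopWords.txt"):
    """Write a stop-word dictionary file and return its path.

    Assumptions (not confirmed by the product documentation):
    one word per line, UTF-8 encoded.
    """
    folder = Path(dictionary_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"dictionary folder not found: {folder}")

    # Normalise: trim whitespace, lower-case, drop blanks and duplicates.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()})
    if not cleaned:
        raise ValueError("refusing to write an empty stop-word list")

    target = folder / file_name
    target.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return target
```

A call such as write_stop_words(folder, ["il", "la", "di", "e", "che"]) shows the shape of the input only; a real dictionary needs a complete Italian stop-word list.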
