{"id":31866,"date":"2015-03-09T12:53:26","date_gmt":"2015-03-09T20:53:26","guid":{"rendered":"https:\/\/ephesoft.com\/docs\/2019-1-2\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/"},"modified":"2022-03-09T12:00:26","modified_gmt":"2022-03-09T19:00:26","slug":"tesseract-hocr-plugin-3","status":"publish","type":"docs","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/","title":{"rendered":"Tesseract HOCR Plugin"},"content":{"rendered":"

Available:\u00a0<\/strong>on-premises, cloud<\/p>\n

Overview<\/h2>\n

The TESSERACT_HOCR plugin is commonly used in the Page Process module. This plugin reads the image files listed in the batch.xml file for a batch, generates an HOCR.xml file for each image, and updates the batch.xml file accordingly.<\/p>\n

Configuration<\/h2>\n

The following properties of the TESSERACT_HOCR plugin can be configured from the UI:<\/p>\n

\"\"<\/p>\n\n\n\n\n\n\n\n\n
Configurable property<\/strong><\/td>\nType of value<\/strong><\/td>\nValue options<\/strong><\/td>\nDescription<\/strong><\/td>\n<\/tr>\n
Tesseract Switch<\/td>\nList of values<\/td>\nON, OFF<\/td>\nTurns the plugin ON or OFF. When this switch is OFF, the plugin does nothing.<\/td>\n<\/tr>\n
Tesseract color switch<\/td>\nList of values<\/td>\nON, OFF<\/td>\nTesseract cannot read color TIFFs. When the color switch is ON, the plugin sends PNG images for OCR instead. Turn this switch ON for batch classes that are expected to contain color TIFF images.<\/td>\n<\/tr>\n
Tesseract Language<\/td>\nString<\/td>\nN\/A<\/td>\nThe language to use for OCR. At present, Tesseract supports only one language per image file. For example, specify \u2018eng\u2019 <\/strong>for English or \u2018tur\u2019<\/strong> for Turkish.<\/td>\n<\/tr>\n
Tesseract Version<\/td>\nString<\/td>\nN\/A<\/td>\nThe Tesseract version installed on the system. For example, specify \u2018tesseract_version_3\u2019 <\/strong>for Tesseract 3.0 or \u2018tesseract_version_2\u2019<\/strong> for Tesseract 2.0.<\/td>\n<\/tr>\n
Tesseract Valid Extensions<\/td>\nMulti-select<\/td>\ntif, gif, png<\/td>\nThe file extensions that this plugin will support.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

Steps of execution<\/h2>\n

This plugin runs in the Page Process phase of Transact, after import processing is complete. It performs OCR on every input image. When OCR is complete, it writes the name of each HOCR file to the batch.xml file and generates HOCR output as HTML and HOCR.xml files.<\/p>\n
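As a rough illustration of the OCR step, the sketch below builds the kind of command line that the standalone `tesseract` program accepts for hOCR output in Tesseract 3 and later. It is an assumption that Transact invokes Tesseract in this way. The helper name `build_hocr_command` and the sample file name are hypothetical; only the general form (image, output base name, `-l` language, `hocr` config) comes from the Tesseract command line itself.

```python
from pathlib import Path


def build_hocr_command(image_path, language='eng'):
    '''Return an argument list asking Tesseract for hOCR output.

    Hypothetical helper: Transact may call Tesseract differently.
    '''
    image = Path(image_path)
    # Tesseract appends its own extension to the output base name.
    output_base = image.with_suffix('')
    return ['tesseract', str(image), str(output_base), '-l', language, 'hocr']


command = build_hocr_command('BI1_0.png', language='eng')
print(' '.join(command))
```

The language argument corresponds to the Tesseract Language property, so a value such as `tur` would be passed through unchanged.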

Dependency<\/h2>\n

This plugin requires only an image as input: a PNG if the color switch is ON, or a TIFF if the color switch is OFF. Therefore, either the “Create OCR Input Plugin” or the “Create Display Image Plugin” must run before this plugin.<\/p>\n

Troubleshooting<\/h2>\n

The following table lists error messages that this plugin can produce and the possible root cause of each.<\/p>\n\n\n\n\n\n\n\n
Error message<\/strong><\/td>\nPossible root cause<\/strong><\/td>\n<\/tr>\n
Tesseract Base path not configured.<\/td>\nEither the environment variable for Tesseract is not set, or its path is configured incorrectly.<\/td>\n<\/tr>\n
Space found in the name of image: xyz.png. So it cannot be processed<\/td>\nOne or more spaces were found in the file name.\u00a0 Remove the spaces from the image name and restart the batch from the Page Process module.<\/td>\n<\/tr>\n
No valid extensions are specified in resources<\/td>\nNo extensions were specified for this plugin.<\/td>\n<\/tr>\n
Image Processing or XML updation failed for image: xyz<\/td>\nThe image file being processed has a file extension that isn’t included in the list of valid extensions for the plugin.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n
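Two of the errors above (a space in the file name and an extension outside the valid list) can be caught before a batch reaches this plugin. The following is a minimal sketch of such a pre-check and is not part of Transact. The function name `find_problem_images`, the sample file names, and the default extension set (taken from the Tesseract Valid Extensions options) are assumptions for illustration.

```python
from pathlib import Path

# Assumed to mirror the Tesseract Valid Extensions setting of the batch class.
VALID_EXTENSIONS = {'tif', 'gif', 'png'}


def find_problem_images(file_names, valid_extensions=VALID_EXTENSIONS):
    '''Map each problematic file name to a short reason.'''
    problems = {}
    for name in file_names:
        extension = Path(name).suffix.lstrip('.').lower()
        if ' ' in name:
            problems[name] = 'space in file name'
        elif extension not in valid_extensions:
            problems[name] = 'extension not in valid list'
    return problems


print(find_problem_images(['page 1.png', 'page2.bmp', 'page3.tif']))
```

Files reported by a check like this would need to be renamed or converted before the batch is restarted from the Page Process module.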

 <\/p>\n","protected":false},"featured_media":0,"parent":31858,"menu_order":6,"comment_status":"closed","ping_status":"open","template":"","doc_tag":[],"yoast_head":"\nTesseract HOCR Plugin | Ephesoft Docs<\/title>\n<meta name=\"robots\" content=\"noindex, follow\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tesseract HOCR Plugin\" \/>\n<meta property=\"og:description\" content=\"Available:\u00a0on-premises, cloud Overview The TESSERACT_HOCR plugin is commonly used in the Page Processing module.\u00a0This plugin reads the image files listed […]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/\" \/>\n<meta property=\"og:site_name\" content=\"Ephesoft Docs\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-09T19:00:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ephesoft.com\/docs\/wp-content\/uploads\/2015\/03\/word-image314.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/\",\"url\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/\",\"name\":\"Tesseract HOCR Plugin | Ephesoft Docs\",\"isPartOf\":{\"@id\":\"https:\/\/ephesoft.com\/docs\/#website\"},\"datePublished\":\"2015-03-09T20:53:26+00:00\",\"dateModified\":\"2022-03-09T19:00:26+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ephesoft.com\/docs\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Transact\",\"item\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Features and Functions\",\"item\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Administrator Role and 
Features\",\"item\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/\"},{\"@type\":\"ListItem\",\"position\":5,\"name\":\"Modules and Plugins\",\"item\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/\"},{\"@type\":\"ListItem\",\"position\":6,\"name\":\"Page Process Module\",\"item\":\"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/\"},{\"@type\":\"ListItem\",\"position\":7,\"name\":\"Tesseract HOCR Plugin\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ephesoft.com\/docs\/#website\",\"url\":\"https:\/\/ephesoft.com\/docs\/\",\"name\":\"Ephesoft Docs\",\"description\":\"Intelligent Document Processing Made Easy\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ephesoft.com\/docs\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Tesseract HOCR Plugin | Ephesoft Docs","robots":{"index":"noindex","follow":"follow"},"og_locale":"en_US","og_type":"article","og_title":"Tesseract HOCR Plugin","og_description":"Available:\u00a0on-premises, cloud Overview The TESSERACT_HOCR plugin is commonly used in the Page Processing module.\u00a0This plugin reads the image files listed […]","og_url":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/","og_site_name":"Ephesoft Docs","article_modified_time":"2022-03-09T19:00:26+00:00","og_image":[{"url":"https:\/\/ephesoft.com\/docs\/wp-content\/uploads\/2015\/03\/word-image314.png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/","url":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/","name":"Tesseract HOCR Plugin | Ephesoft Docs","isPartOf":{"@id":"https:\/\/ephesoft.com\/docs\/#website"},"datePublished":"2015-03-09T20:53:26+00:00","dateModified":"2022-03-09T19:00:26+00:00","breadcrumb":{"@id":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/tesseract-hocr-plugin-3\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ephesoft.com\/docs\/"},{"@type":"ListItem","position":2,"name":"Transact","item":"https:\/\/ephesoft.com\/docs\/products\/transact\/"},{"@type":"ListItem","position":3,"name":"Features and Functions","item":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/"},{"@type":"ListItem","position":4,"name":"Administrator Role and Features","item":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/"},{"@type":"ListItem","position":5,"name":"Modules and 
Plugins","item":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/"},{"@type":"ListItem","position":6,"name":"Page Process Module","item":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/"},{"@type":"ListItem","position":7,"name":"Tesseract HOCR Plugin"}]},{"@type":"WebSite","@id":"https:\/\/ephesoft.com\/docs\/#website","url":"https:\/\/ephesoft.com\/docs\/","name":"Ephesoft Docs","description":"Intelligent Document Processing Made Easy","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ephesoft.com\/docs\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"}]}},"comment_count":0,"_links":{"self":[{"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/31866"}],"collection":[{"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs"}],"about":[{"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/types\/docs"}],"replies":[{"embeddable":true,"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/comments?post=31866"}],"version-history":[{"count":2,"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/31866\/revisions"}],"predecessor-version":[{"id":50525,"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/31866\/revisions\/50525"}],"up":[{"embeddable":true,"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/31858"}],"next":[{"title":"Advanced Barcode Reader Plugin","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/advanced-barcode-reader-plugin\/","href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/49211"}],"prev":[{"title":"Image Enhancement 
Support","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/support-for-image-enhancement-features-like-despeckle-line-removal-in-nuance\/","href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/docs\/31865"}],"wp:attachment":[{"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/media?parent=31866"}],"wp:term":[{"taxonomy":"doc_tag","embeddable":true,"href":"https:\/\/ephesoft.com\/docs\/wp-json\/wp\/v2\/doc_tag?post=31866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}