{"id":30147,"date":"2019-01-30T11:54:57","date_gmt":"2019-01-30T19:54:57","guid":{"rendered":"https:\/\/ephesoft.com\/docs\/?page_id=30147"},"modified":"2020-05-19T12:24:18","modified_gmt":"2020-05-19T19:24:18","slug":"etext-support-in-pdf","status":"publish","type":"docs","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/release-notes\/release-notes-2019-1\/etext-support-in-pdf\/","title":{"rendered":"EText Support – Leveraging Existing Text Layer in PDF Documents"},"content":{"rendered":"
Introduction<\/a><\/p>\n EText Functionality<\/a><\/p>\n Page Process Module<\/a><\/p>\n Learn File(s) \/ Test Classification \/ Test Extraction<\/a><\/p>\n Extraction Configuration Screens<\/a><\/p>\n Batch.xml File<\/a><\/p>\n Export Module<\/a><\/p>\n Web Services<\/a><\/p>\n Use Cases<\/a><\/p>\n EText Mode \u2013 Automatically<\/a><\/p>\n EText Mode \u2013 Always<\/a><\/p>\n EText Mode \u2013 Never<\/a><\/p>\n Ephesoft Transact fully leverages the text embedded in computer-generated PDFs (also referred to by RecoStar as EText). This improves the accuracy of extracted text and greatly reduces the effort required for extraction on projects that process electronically generated documents.<\/p>\n Rather than being OCRed as images, documents with an EText layer are processed by a dedicated mechanism that extracts the data directly. Depending on the configuration, the EText support feature can be used:<\/p>\n This feature also allows the exported PDF artefact(s) to be created from the original file rather than from an image with a text layer attached.<\/p>\n The following languages are supported by the EText feature:<\/p>\n The feature is implemented on the following levels:<\/p>\n Page Process Module<\/strong><\/a><\/p>\n Learn File(s) \/ Test Classification \/ Test Extraction<\/strong><\/a><\/p>\n Extraction Configuration Screens<\/strong><\/a><\/p>\n Batch.xml File<\/strong><\/a><\/p>\n Export Module<\/strong><\/a><\/p>\n Web Services<\/strong><\/a><\/p>\n Three new web services have been added and 11 existing web services have been updated to support the new functionality.<\/p>\n The implementation of the feature on each level is described in detail below.<\/p>\n <\/p>\n Two fields, \u201cUse EText Recostar Project File\u201d and \u201cPDF EText Recostar Project File Name\u201d, have been added to the plugin. Both new fields are mandatory.<\/p>\n <\/p>\n<\/a>Introduction<\/strong><\/h2>\n
\n
\n2. French<\/div>\n<\/a>EText Functionality <\/strong><\/h2>\n
\n
\n
\n
\n
<\/a><\/a>Page Process Module<\/strong><\/h3>\n
RECOSTAR_HOCR plugin (Windows)<\/strong><\/h4>\n