{"id":31859,"date":"2018-03-26T11:53:13","date_gmt":"2018-03-26T19:53:13","guid":{"rendered":"https:\/\/ephesoft.com\/docs\/2019-1-2\/moduleplugin-configuration\/page-process-module\/fraud-detection-using-ocr-font-switch\/"},"modified":"2020-07-22T13:25:05","modified_gmt":"2020-07-22T20:25:05","slug":"fraud-detection-using-ocr-font-switch","status":"publish","type":"docs","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/recostar-hocr-plugin\/fraud-detection-using-ocr-font-switch\/","title":{"rendered":"Fraud Detection Using OCR Font Switch"},"content":{"rendered":"
The Font Recognition switch<\/strong> helps detect potential fraud and tampering in processed documents. When the Font switch is turned ON in the RECOSTAR_HOCR or NUANCE_HOCR plugin, the HOCR file records the font style (Bold, Italics, and Underline) and the font size of the recognized text. This allows the user to detect data that has been manually altered or added to a document. By default, the Font switch is set to OFF.<\/p>\n For example, suppose the original amount in a document field is \u201c1000\u201d and its font size is 11. The value is then manually changed to \u201c41000\u201d, with the \u201c4\u201d written in a size 12 font. Because the HOCR file records the font size and style, the mismatched size of the added \u201c4\u201d helps the user identify that the document has been tampered with.<\/p>\n Note: <\/em><\/strong>Tesseract does not provide any font information. This feature is available only with the Recostar and Nuance OCR engines.<\/em><\/p>\n The following changes have been made to implement this feature:<\/strong><\/p>\n <\/p>\n RECOSTAR_HOCR Plugin<\/p>\n <\/p>\n <\/p>\n The newly generated HOCR schema includes the font size of each character in the span. A \u201cUnicodeCharacters\u201d tag has been added to the HOCR file; it contains the value and size of each character. A \u201cStyle\u201d tag has also been added; it contains the style (Bold, Italics, and Underline) of the span. If no style information is fetched, the \u201cStyle\u201d value is \u201cNone\u201d.<\/p>\n <\/p>\n <\/p>\n The screenshot below shows the difference in the HOCR schema when the Font switch is turned OFF<\/strong>.<\/p>\n The information about font family and size is not fetched when the switch is turned OFF.<\/p>\n <\/p>\n Note:<\/em><\/strong> The Recostar OCR engine does not recognize combinations of font styles. 
For example, the style value is \u201cNone\u201d if a character string is both bold and underlined.<\/em><\/p>\n <\/p>\n <\/p>\n NUANCE_HOCR Plugin<\/p>\n <\/p>\n <\/p>\n The newly generated HOCR schema includes the font size of each character in the span. A \u201cUnicodeCharacters\u201d tag has been added to the HOCR file; it contains the value and size of each character. A \u201cStyle\u201d tag has also been added; it contains the style (Bold, Italics, and Underline) of the span. If no style information is fetched, the \u201cStyle\u201d value is \u201cNone\u201d.<\/p>\n <\/p>\n <\/p>\n The screenshot below shows the difference in the HOCR schema when the Font switch is turned OFF<\/strong>.<\/p>\n The information about font family and size is not fetched when the switch is turned OFF.<\/p>\n <\/p>\n Note:<\/em><\/strong> The Nuance OCR engine recognizes combinations of font styles and returns comma-separated values when multiple styles are detected. However, it does not recognize the size of individual characters: all characters in a word are always reported as having the same size, even when some letters are capitalized.<\/em><\/p>\n","protected":false,"featured_media":0,"parent":31869,"menu_order":2,"comment_status":"closed","ping_status":"open","template":"","doc_tag":[],"yoast_head":"\n\n
To fetch the font information via the RECOSTAR_HOCR plugin:<\/strong><\/h4>\n
To fetch font information via the NUANCE_HOCR plugin:<\/strong><\/h4>\n