{"id":31867,"date":"2015-03-09T12:51:47","date_gmt":"2015-03-09T20:51:47","guid":{"rendered":"https:\/\/ephesoft.com\/docs\/2019-1-2\/moduleplugin-configuration\/page-process-module\/search-classification-plugin-2\/"},"modified":"2022-03-09T11:55:10","modified_gmt":"2022-03-09T18:55:10","slug":"search-classification-plugin-2","status":"publish","type":"docs","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/page-process-module\/search-classification-plugin-2\/","title":{"rendered":"Search Classification Plugin"},"content":{"rendered":"
Available<\/strong>: on-premises, cloud<\/p>\n This document describes how to configure and use the Search Classification plugin. The plugin classifies documents in the Page Process<\/strong> module of the workflow using Lucene-based indexing. Classification is how Ephesoft Transact associates each document with a Document Type. This document applies to Ephesoft Transact 2019.1 and above.<\/p>\n Perform the following steps to configure the SEARCH_CLASSIFICATION plugin in the Page Process<\/strong> module. You must have administrator rights to complete these steps.<\/p>\n <\/p>\n Navigation to SEARCH_CLASSIFICATION Plugin<\/em><\/span><\/p>\n The SEARCH_CLASSIFICATION plugin works independently of the MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN<\/strong> in the Page Process<\/strong> module. Both plugins can be present in the module.<\/p>\n 3. Select the SEARCH_CLASSIFICATION plugin to set up the configuration. The Plugin Configuration<\/strong> screen for the SEARCH_CLASSIFICATION<\/strong> plugin displays.<\/p>\n <\/p>\n SEARCH_CLASSIFICATION Plugin Configuration Screen<\/em><\/p>\n The following table lists and defines the configurable properties for the Search Classification plugin:<\/p>\n html<\/td>\n summary<\/td>\n name<\/td>\n OFF<\/td>\n The default value for this field is 5, which limits the overall size of the batch.xml file.<\/td>\n<\/tr>\n 4. Define the settings, then click Deploy<\/strong> to save and enable the changes.<\/p>\n This plugin operates in the Page Process<\/strong> module after all batch-level import processes are complete.<\/p>\n Ephesoft recommends completing document learning for the batch class before using this plugin. This plugin classifies incoming document images using Lucene-based indexing. 
This plugin functions in two stages when classifying documents:<\/p>\n The plugin generates HOCR content similar to the RecoStar HOCR and Tesseract HOCR plugins.<\/p>\n The following table lists the error messages that may occur with this plugin, along with the possible root cause of each.<\/p>\nIntroduction<\/h2>\n
Configuring the Search Classification Plugin<\/h2>\n
\n
\nThe following figure illustrates the SEARCH_CLASSIFICATION plugin in a typical batch class configuration.<\/li>\n<\/ol>\nConfigurable Properties<\/h2>\n
\n\n
 \nConfigurable Property<\/th>\n Type of Value<\/th>\n Value Options<\/th>\n Description<\/th>\n<\/tr>\n<\/thead>\n \n Lucene Valid Extensions<\/td>\n List of Values<\/td>\n xml<\/p>\n This field defines the valid extension of the input file and is applied when classifying document types for the specified file format.<\/td>\n<\/tr>\n \n Lucene Min Term Frequency<\/td>\n Integer<\/td>\n NA<\/td>\n This field sets the frequency below which terms are ignored in the source document.<\/td>\n<\/tr>\n \n Lucene Min Document Frequency<\/td>\n Integer<\/td>\n NA<\/td>\n This field sets the minimum document frequency. Words that occur in fewer documents than the value of this field are ignored.<\/td>\n<\/tr>\n \n Lucene Min Word Length<\/td>\n Integer<\/td>\n NA<\/td>\n This field sets the minimum word length. Words shorter than this setting are ignored in the HOCR content.<\/td>\n<\/tr>\n \n Lucene Min Query Terms<\/td>\n Integer<\/td>\n NA<\/td>\n This field sets the minimum number of query terms that will be included in any generated query.<\/td>\n<\/tr>\n \n Lucene Top Level Field<\/td>\n String<\/td>\n NA<\/td>\n This property configures the default field for query terms.<\/td>\n<\/tr>\n \n Lucene No Of Pages<\/td>\n Integer<\/td>\n NA<\/td>\n This property specifies the number of documents to be returned in a query search.<\/td>\n<\/tr>\n \n Lucene Index Fields<\/td>\n List of Values<\/td>\n title<\/p>\n This property is used as an index field for searching the document type using Lucene.<\/td>\n<\/tr>\n \n Lucene Stop Words<\/td>\n List of Values<\/td>\n title<\/p>\n This property sets the words to be ignored when classifying a document.<\/td>\n<\/tr>\n \n Search Classification Switch<\/td>\n List of Values<\/td>\n ON<\/p>\n This property enables or disables the SEARCH_CLASSIFICATION plugin for the batch class.<\/td>\n<\/tr>\n \n Search Classification Max Results<\/td>\n Integer<\/td>\n NA<\/td>\n This field defines 
the maximum number of alternate value results that will be generated in the batch.xml.<\/p>\n \n First Page Confidence Score Value<\/td>\n Integer<\/td>\n NA<\/td>\n This property is used to update the confidence score based on the first page type.<\/td>\n<\/tr>\n \n Middle Page Confidence Score Value<\/td>\n Integer<\/td>\n NA<\/td>\n This property is used to update the confidence score based on the middle page type.<\/td>\n<\/tr>\n \n Last Page Confidence Score Value<\/td>\n Integer<\/td>\n NA<\/td>\n This property is used to update the confidence score based on the last page type.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n Search Classification Execution Process<\/h2>\n
\n
\n
Troubleshooting<\/h2>\n