{"id":31850,"date":"2015-03-09T12:43:31","date_gmt":"2015-03-09T20:43:31","guid":{"rendered":"https:\/\/ephesoft.com\/docs\/2019-1-2\/moduleplugin-configuration\/document-assembly-module\/document-assembler-plugin-2\/"},"modified":"2021-01-26T15:11:02","modified_gmt":"2021-01-26T22:11:02","slug":"document-assembler-plugin-2","status":"publish","type":"docs","link":"https:\/\/ephesoft.com\/docs\/products\/transact\/features-and-functions\/administrator\/moduleplugin-configuration\/document-assembly-module\/document-assembler-plugin-2\/","title":{"rendered":"Document Assembler Plugin"},"content":{"rendered":"
Available<\/b>: on-premises, cloud<\/p>\nOverview<\/h1>\n
The DOCUMENT_ASSEMBLER plugin is responsible for forming multi-page documents from single pages. It reads all the pages in the “Unknown” document type and creates new documents based on their page level fields. The plugin reviews the page level field results and uses the page_level_index fields to determine the document type and which page is the first page of each document.<\/p>\n
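The grouping idea in the paragraph above can be illustrated with a short Python sketch. It is not Ephesoft's implementation. It assumes that each page's page level field has already been reduced to a document type and a first-page flag (the hypothetical Page record). It also ignores the first, middle, and last page confidence rules that the plugin applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical page record: the document type taken from the page level
    # field and a flag saying whether that field marks a first page.
    page_id: str
    doc_type: str
    is_first_page: bool


@dataclass
class Document:
    doc_type: str
    pages: List[str] = field(default_factory=list)


def assemble(pages: List[Page]) -> List[Document]:
    # Walk the pages in order. Start a new document when a page is flagged
    # as a first page or its type differs from the open document's type;
    # otherwise append the page to the open document.
    documents: List[Document] = []
    current: Optional[Document] = None
    for page in pages:
        if current is None or page.is_first_page or page.doc_type != current.doc_type:
            current = Document(page.doc_type)
            documents.append(current)
        current.pages.append(page.page_id)
    return documents


batch = [
    Page('PG0', 'Invoice', True),
    Page('PG1', 'Invoice', False),
    Page('PG2', 'Invoice', True),
    Page('PG3', 'Application', False),
]
for doc in assemble(batch):
    print(doc.doc_type, doc.pages)
# Invoice ['PG0', 'PG1']
# Invoice ['PG2']
# Application ['PG3']
```

Starting a new document on either a first-page flag or a type change keeps two consecutive documents of the same type (PG0 to PG1 and PG2 above) separate.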
Ephesoft Transact supports 7 types of classification, which are explained in further detail below.<\/p>\n
Only one type of classification can be set at a time for a batch. However, if the classification type is set to Automatic<\/strong>, the results of several classification types (Barcode, Image, Lucene, Keyword, and Multidimensional) are all taken into consideration, and the top results among them are used to classify pages into documents.<\/p>\n 1.\u00a0 DA Merge Unknown Document Switch<\/strong><\/p>\n This feature is used to merge UNKNOWN documents into a classified document.<\/p>\n Example: Suppose that, after the algorithm runs, the documents are classified as listed below:<\/p>\n After executing the DA Merge Unknown Document Switch<\/strong>, the results would be:<\/p>\n DOC4 and DOC5 were merged into DOC3, and DOC7 was merged into DOC6.<\/p>\n 2.\u00a0 Predefined Document Change <\/strong><\/p>\n This feature is used to set a particular document type for any documents whose confidence value is less than the configured threshold value. UNKNOWN documents are not affected.<\/p>\n Example: Suppose that, after the algorithm runs, the documents are classified with the confidence values shown below. The threshold value is 50 and the document type to be set is DOC_TYPE_DEFAULT:<\/p>\n After executing the Predefined Document Change<\/strong>, the resulting classification would be:<\/p>\n Any documents below the configured confidence threshold of 50 were set to DOC_TYPE_DEFAULT.<\/p>\n 3.\u00a0 Change Unknown Document Type<\/strong><\/p>\n This feature is used to set a particular document type for all UNKNOWN documents. You can provide the document type for each batch class in the DOCUMENT_ASSEMBLER plugin under Modules<\/strong>. 
All UNKNOWN documents will be classified as the provided document type.<\/p>\n Example: Suppose that, after the algorithm runs, the documents are classified as listed below:<\/p>\n After setting the Change Unknown Document Type<\/strong> to DOC_TYPE_DEFAULT, the classification is as follows:<\/p>\n Documents previously classified as UNKNOWN are now classified as DOC_TYPE_DEFAULT.<\/p>\n 4.\u00a0 DA Delete Document First Page<\/strong><\/p>\n This feature is used to delete the first page of all the classified documents. It is useful for removing separator sheets or barcode pages, if the batch contains any. If the DA Delete Document First Page <\/strong>switch is ON<\/strong>, the first page of every document is removed. No page is removed from a document that has only one page.<\/p>\n 5.\u00a0 Regex Classification<\/strong><\/p>\n When classification is performed using Barcode Classification<\/strong> or the results of the KV_PAGE_PROCESS plugin, Ephesoft Transact checks all page level field values of the pages in the UNKNOWN document type against the regular expression (regex). For each page that has a page level field value matching the regular expression, the DOCUMENT_ASSEMBLER plugin creates a new document whose type is the one provided in the Regex Classification Default Document Type<\/strong> property. 
In this way, the DOCUMENT_ASSEMBLER plugin creates a new document for each page that has a value matching the provided regex.<\/p>\n Example:<\/p>\n Documents are classified as listed below before execution of this feature:<\/p>\n In this scenario, the DOCUMENT_ASSEMBLER plugin will only work on DOC3, as it is the only UNKNOWN document.<\/p>\n Assume that DOC1 and DOC2 together contain pages PG0, PG1, and PG2, and DOC3 contains four pages with the following page level field values:<\/p>\n Because PG3 and PG4 have no page level field that matches the provided regex ( a* ), these pages will remain in DOC3 as UNKNOWN.<\/p>\n In PG5, the third page level field (Aaa) matches the regex, so this page will be put into a new document (DOC4) with the configured default document type: Regex Doc Type.<\/p>\n In PG6, the second page level field (Application) also matches the regex, so this page will also be put into a new document (DOC5) with the configured default document type: Regex Doc Type.<\/p>\n The final classified documents will be:<\/p>\n 6.\u00a0 Advanced Document Assembler Algorithm<\/strong><\/p>\n Scenario 1:<\/strong><\/p>\n If the algorithm has three pages as input and they are all classified as follows:<\/p>\n Three individual documents are created.<\/p>\n Scenario 2:<\/strong><\/p>\n If the algorithm has three pages as input and they are classified as follows:<\/p>\n Scenario 1:<\/strong><\/p>\n If the algorithm has three pages as input and they are all classified as follows:<\/p>\n Scenario 2:<\/strong><\/p>\n If the algorithm has three pages as input and they are all classified as follows:<\/p>\n 7.\u00a0 Confidence Threshold Matching Algorithm<\/strong><\/p>\n The Confidence Threshold Matching Algorithm decides whether pages must be merged.<\/p>\n Let the confidence of a page be “X”. If the page the algorithm is trying to match is a middle page with confidence “Y”, the algorithm checks whether the following holds, where M_P_C_T is the DA Middle Page Confidence Threshold:<\/p>\n (X-Y) < M_P_C_T<\/p>\n Then the 
algorithm will merge the pages into one document. If the confidence threshold is not met, the pages will not be merged, and a new document will be created containing this page.<\/p>\n When the algorithm is matching the last page and the confidence of the last page from the alternate values is “Y”, the algorithm checks whether the following holds, where L_P_C_T is the DA Last Page Confidence Threshold:<\/p>\n (X-Y) < L_P_C_T<\/p>\n If this is true, the algorithm merges them. If the confidence threshold is not met, the pages are not merged.<\/p>\n Assumptions<\/strong><\/p>\n The following assumptions and requirements apply to this algorithm:<\/p>\n The DOCUMENT_ASSEMBLER plugin can be configured in the\u00a0Plugin Configuration\u00a0<\/strong>screen. Open the batch class you want to configure and select\u00a0Modules\u00a0<\/strong>>\u00a0Document Assembly\u00a0<\/strong>> DOCUMENT_ASSEMBLER<\/strong>.<\/p>\n <\/p>\n Figure 1. DOCUMENT_ASSEMBLER Plugin<\/em><\/p>\n The following table shows the configurable properties and their related values.<\/p>\n The plugin assumes that the page processing plugins for the respective classification types have been executed and that the page level fields for each image are populated. The DOCUMENT_ASSEMBLER plugin works on the page level field values of each page and classifies pages into documents.<\/p>\n Here are a few common error messages and their root causes:<\/p>\n <\/p>\n","protected":false,"featured_media":0,"parent":31849,"menu_order":0,"comment_status":"closed","ping_status":"open","template":"","doc_tag":[],"yoast_head":"\nDocument Assembler Plugin Properties<\/h1>\n
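The four post-processing switches described in the overview can each be pictured as a simple transformation of a list of documents. These are DA Merge Unknown Document Switch, Predefined Document Change, Change Unknown Document Type, and DA Delete Document First Page. The Python sketch below is illustrative only. The Doc record and the function names are assumptions, not Ephesoft APIs. So is the rule that an UNKNOWN document merges into the closest preceding classified document, which is inferred from the merge example.

```python
from dataclasses import dataclass, replace
from typing import List

UNKNOWN = 'UNKNOWN'


@dataclass
class Doc:
    # Hypothetical document record; not an Ephesoft class.
    name: str
    doc_type: str
    confidence: float
    pages: List[str]


def merge_unknown(docs: List[Doc]) -> List[Doc]:
    # DA Merge Unknown Document Switch: append the pages of each UNKNOWN
    # document to the closest preceding classified document. An UNKNOWN
    # document with no classified predecessor is kept as-is (assumption).
    result: List[Doc] = []
    for doc in docs:
        if doc.doc_type == UNKNOWN and result and result[-1].doc_type != UNKNOWN:
            result[-1] = replace(result[-1], pages=result[-1].pages + doc.pages)
        else:
            result.append(doc)
    return result


def predefined_change(docs: List[Doc], threshold: float, default_type: str) -> List[Doc]:
    # Predefined Document Change: classified documents whose confidence is
    # below the threshold get the predefined type; UNKNOWN ones are skipped.
    return [replace(d, doc_type=default_type)
            if d.doc_type != UNKNOWN and d.confidence < threshold else d
            for d in docs]


def change_unknown(docs: List[Doc], new_type: str) -> List[Doc]:
    # Change Unknown Document Type: every UNKNOWN document gets new_type.
    return [replace(d, doc_type=new_type) if d.doc_type == UNKNOWN else d
            for d in docs]


def delete_first_page(docs: List[Doc]) -> List[Doc]:
    # DA Delete Document First Page: drop the first page (for example, a
    # separator sheet) unless the document has only one page.
    return [replace(d, pages=d.pages[1:]) if len(d.pages) > 1 else d
            for d in docs]


# Hypothetical input modeled on the merge example (DOC4, DOC5 into DOC3; DOC7 into DOC6).
docs = [
    Doc('DOC3', 'Invoice', 80, ['PG1', 'PG2']),
    Doc('DOC4', UNKNOWN, 0, ['PG3']),
    Doc('DOC5', UNKNOWN, 0, ['PG4']),
    Doc('DOC6', 'Application', 40, ['PG5']),
    Doc('DOC7', UNKNOWN, 0, ['PG6']),
]
print([(d.name, d.pages) for d in merge_unknown(docs)])
# [('DOC3', ['PG1', 'PG2', 'PG3', 'PG4']), ('DOC6', ['PG5', 'PG6'])]
```

Each function returns new Doc objects through dataclasses.replace instead of mutating its input. This lets the switches be applied in any order during experimentation.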
\n Page<\/strong><\/td>\n Page level field values<\/strong><\/td>\n<\/tr>\n \n PG3<\/td>\n i. 123; ii. Invoice; iii. US<\/td>\n<\/tr>\n \n PG4<\/td>\n i. 990; ii. Invoice; iii. Checklist<\/td>\n<\/tr>\n \n PG5<\/td>\n i. 789; ii. Document; iii. Aaa<\/td>\n<\/tr>\n \n PG6<\/td>\n i. 456; ii. Application; iii. Checklist<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n \n
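The Regex Classification example can be reproduced with a small Python sketch. The example gives the pattern as a*, but the matching semantics are not stated, and a* can match an empty string. This sketch therefore assumes full-match semantics and substitutes the hypothetical pattern A.*. That pattern matches Aaa (PG5) and Application (PG6) but none of the values of PG3 or PG4. The function name regex_classify is also hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Page level field values from the example table (PG3 to PG6, all in the
# UNKNOWN document DOC3).
UNKNOWN_PAGES: Dict[str, List[str]] = {
    'PG3': ['123', 'Invoice', 'US'],
    'PG4': ['990', 'Invoice', 'Checklist'],
    'PG5': ['789', 'Document', 'Aaa'],
    'PG6': ['456', 'Application', 'Checklist'],
}


def regex_classify(pages: Dict[str, List[str]], pattern: str,
                   regex_doc_type: str) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    # Returns (pages left in the UNKNOWN document, new documents). A page with
    # at least one field value that fully matches the pattern is moved into a
    # new single-page document of the default regex document type.
    compiled = re.compile(pattern)
    remaining: List[str] = []
    new_docs: List[Tuple[str, List[str]]] = []
    for page_id, values in pages.items():
        if any(compiled.fullmatch(value) for value in values):
            new_docs.append((regex_doc_type, [page_id]))
        else:
            remaining.append(page_id)
    return remaining, new_docs


# Hypothetical pattern A.*; the documented example shows a*.
remaining, new_docs = regex_classify(UNKNOWN_PAGES, 'A.*', 'Regex Doc Type')
print(remaining)  # ['PG3', 'PG4']
print(new_docs)   # [('Regex Doc Type', ['PG5']), ('Regex Doc Type', ['PG6'])]
```

With search semantics instead of fullmatch, a pattern that can match an empty string would move every page. The choice between the two therefore matters when writing the Regex Classification Pattern.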
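The Confidence Threshold Matching checks can be sketched in Python as follows: (X-Y) < M_P_C_T for middle pages and (X-Y) < L_P_C_T for the last page. The sketch makes three assumptions. X is the confidence of the page that opened the current document, pages are tested in order, and every page after the first is a middle page except the final one. The names group_pages, mpct, and lpct are hypothetical, and the threshold values in the example are made up.

```python
from typing import List


def gap_below_threshold(x: float, y: float, threshold: float) -> bool:
    # The documented check: merge only if (X - Y) < threshold.
    return (x - y) < threshold


def group_pages(confidences: List[float], mpct: float, lpct: float) -> List[List[int]]:
    # Returns documents as lists of page indexes. Assumptions: X is the
    # confidence of the page that opened the current document, pages are
    # tested in order, and only the final page uses the last page threshold.
    if not confidences:
        return []
    documents: List[List[int]] = [[0]]
    x = confidences[0]
    last_index = len(confidences) - 1
    for i in range(1, len(confidences)):
        y = confidences[i]
        threshold = lpct if i == last_index else mpct
        if gap_below_threshold(x, y, threshold):
            documents[-1].append(i)  # threshold met: merge into open document
        else:
            documents.append([i])    # threshold not met: new document
            x = y                    # this page now opens the document
    return documents


# Hypothetical thresholds: M_P_C_T = 20, L_P_C_T = 10.
print(group_pages([90, 80, 40, 35], mpct=20, lpct=10))  # [[0, 1], [2, 3]]
print(group_pages([90, 80, 70], mpct=20, lpct=10))      # [[0, 1], [2]]
```

When a page fails its check, it opens a new document and its confidence becomes the new X. This is one reading of the documented rule that a new document is created with this page in it.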
UI Configuration<\/h2>\n
\n Configurable property<\/strong><\/td>\n Value options<\/strong><\/td>\n Description<\/strong><\/td>\n<\/tr>\n \n DA Barcode confidence<\/td>\n 0-100<\/td>\n This field is used to specify the barcode confidence. This value is assigned as the confidence of the document type classified by Barcode classification.<\/td>\n<\/tr>\n \n DA Rule First-middle-last Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the first, middle, and last page document.<\/td>\n<\/tr>\n \n DA Rule First Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the first page document.<\/td>\n<\/tr>\n \n DA Rule Middle Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the middle page document.<\/td>\n<\/tr>\n \n DA Rule Last Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the last page document.<\/td>\n<\/tr>\n \n DA Rule First-last Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the first and last page documents.<\/td>\n<\/tr>\n \n DA Rule First-middle Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the first and middle page documents.<\/td>\n<\/tr>\n \n DA Rule Middle-last Page<\/td>\n 0-100<\/td>\n This field is used to specify the confidence for the middle and last page documents.<\/td>\n<\/tr>\n \n DA Classification Type<\/td>\n \n \n
This value determines which classification type is used to classify documents.<\/td>\n<\/tr>\n \n DA Merge Unknown Document Switch<\/td>\n \n \n
This value decides whether UNKNOWN documents will be merged into the previously classified document.<\/td>\n<\/tr>\n \n DA Delete Document First Page Switch<\/td>\n \n \n
This value decides whether the first page of the document will be deleted if the document has more than one page.<\/td>\n<\/tr>\n \n Advanced DA Switch<\/td>\n \n \n
This value decides whether to run the ADVANCED_DOCUMENT_ASSEMBLER algorithm.<\/td>\n<\/tr>\n \n DA First Page Confidence Threshold<\/td>\n 0-100<\/td>\n This field is used in Advanced DA to specify the confidence threshold of the first page for classification into a document.<\/td>\n<\/tr>\n \n DA Middle Page Confidence Threshold<\/td>\n 0-100<\/td>\n This field is used in Advanced DA to specify the confidence threshold of the middle page for classification into a document.<\/td>\n<\/tr>\n \n DA Last Page Confidence Threshold<\/td>\n 0-100<\/td>\n This field is used in Advanced DA to specify the confidence threshold of the last page for classification into a document.<\/td>\n<\/tr>\n \n Predefined Document Type<\/td>\n Pre-defined Document Type<\/td>\n This value specifies the document type that a document will be changed to based on the Predefined Document Confidence Threshold.<\/td>\n<\/tr>\n \n Predefined Document Confidence Threshold<\/td>\n 0-100<\/td>\n This field is used to specify the confidence threshold below which a document is reclassified as the Predefined Document Type.<\/td>\n<\/tr>\n \n Change Unknown Document Type Switch<\/td>\n \n \n
This value decides whether documents classified as UNKNOWN will be changed to the document type specified in Change Unknown Document To Document Type.<\/td>\n<\/tr>\n \n Change Unknown Document To Document Type<\/td>\n Pre-defined Document Type<\/td>\n This value specifies the document type that a document will be changed to based on the Change Unknown Document Type switch.<\/td>\n<\/tr>\n \n Regex Classification Switch<\/td>\n \n \n
This value decides whether KV Page Process and Barcode Reader results will be compared with the Regex Classification Pattern. If there is a match, the document type is changed to the one specified in Regex Classification Default Document Type.<\/td>\n<\/tr>\n \n Regex Classification Pattern<\/td>\n Regex pattern<\/td>\n This value specifies the regex pattern that will be compared to the KV Page Process or Barcode Reader values.<\/td>\n<\/tr>\n \n Regex Classification Default Document Type<\/td>\n Pre-defined Document Type<\/td>\n This value specifies the document type the document will be changed to based on the Regex Classification switch.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n Steps of execution<\/h2>\n
Dependency<\/h2>\n
Troubleshooting<\/h2>\n
\n Error message<\/td>\n Possible root cause<\/td>\n<\/tr>\n \n Invalid format of page level fields. DocFieldType found for {Document Assembler Classification Type} classification is null.<\/td>\n \n \n
\n DocumentType name was not found in the database for the page type name<\/td>\n The decoded barcode value does not match any document type in the Ephesoft application database.<\/td>\n<\/tr>\n \n No Document type defined for batch instance<\/td>\n The batch class does not have any document type defined for classification.<\/td>\n<\/tr>\n \n Invalid integer for barcode confidence score in the properties file.<\/td>\n An invalid value was entered for \u201cDA Barcode confidence\u201d on the Ephesoft Admin configuration screen.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n