Differences: Document Assembler vs Advanced Document Assembler
* Page Processing
For search classification, the calculated page confidence values are weighted by the First Page Confidence Score Value, Middle Page Confidence Score Value, and Last Page Confidence Score Value configured in the Search_Classification plugin. This weighting happens before Document Assembly.
* Regular Document Assembly
1. Separation
For the batch:
- Look at the highest confidence value for each page
- If that value is a first page, start a new document
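The separation pass above can be sketched as follows. This is a minimal illustration, not the product's code: `PageValue`, `separate`, and the page-kind strings are hypothetical names, and each page is assumed to carry at least one candidate value (doc type, page kind, confidence). Pages that appear before any first page are assumed to form a leading document of their own.

```python
from typing import NamedTuple


class PageValue(NamedTuple):
    """One candidate classification for a page (hypothetical structure)."""
    doc_type: str
    page_kind: str      # "first", "middle" or "last"
    confidence: float


def separate(pages: list[list[PageValue]]) -> list[list[int]]:
    """Group page indexes into documents.

    A new document starts whenever a page's highest-confidence value
    is a first page.
    """
    documents: list[list[int]] = []
    for index, candidates in enumerate(pages):
        best = max(candidates, key=lambda v: v.confidence)
        if best.page_kind == "first" or not documents:
            documents.append([index])    # start a new document
        else:
            documents[-1].append(index)  # continue the current document
    return documents


batch = [
    [PageValue("Invoice", "first", 92.0), PageValue("Letter", "middle", 40.0)],
    [PageValue("Invoice", "middle", 71.0)],
    [PageValue("Invoice", "last", 65.0), PageValue("Letter", "first", 30.0)],
    [PageValue("Letter", "first", 88.0)],
]
print(separate(batch))
```

In this sample batch the third page has a lower-confidence "first" alternative, but only the highest value per page is considered, so the page stays with the preceding document.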
2. Classification
For each document in the batch:
- Look at all the page confidence values for all pages
– Calculate the weighted confidence for each doc type found within the document
– Weighting factors:
– if first, middle & last pages are found in order for the same doc type, apply the first-middle-last page weight
– else if first & last pages are found in order for the same doc type, apply the first-last page weight
– else if first & middle pages are found in order for the same doc type, apply the first-middle page weight
– else if middle & last pages are found in order for the same doc type, apply the middle-last page weight
– else if just a first page is found, apply the first-page weight
– else if just a middle page is found, apply the middle-page weight
– else if just a last page is found, apply the last-page weight
** What weighting factors apply?
The following rule weights are not used for separation. They are applied only to generate the document classification confidence score:
DA Rule First-middle-last Page: 100
DA Rule First Page: 50
DA Rule Middle Page: 25
DA Rule Last Page: 50
DA Rule First-last Page: 75
DA Rule First-middle Page: 50
DA Rule Middle-last Page: 50
– The doc type with the highest weighted value is assigned to the document
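The rule selection and scoring described above can be sketched like this. The rule order and the weights come from the list above; everything else is an assumption made for illustration: the function names, the tuple layout `(doc_type, page_kind, confidence)`, keeping only the best value per doc type on each page, and in particular the formula (mean page confidence multiplied by the rule weight, divided by 100). The text does not state how the weight is combined with the page confidences.

```python
# Rule weights as listed above (the "DA Rule ... Page" settings).
RULE_WEIGHTS = {
    "first-middle-last": 100,
    "first-last": 75,
    "first-middle": 50,
    "middle-last": 50,
    "first": 50,
    "middle": 25,
    "last": 50,
}

# Rules in the order they are checked, with the page kinds each one needs.
RULES = [
    ("first-middle-last", ["first", "middle", "last"]),
    ("first-last", ["first", "last"]),
    ("first-middle", ["first", "middle"]),
    ("middle-last", ["middle", "last"]),
    ("first", ["first"]),
    ("middle", ["middle"]),
    ("last", ["last"]),
]


def found_in_order(kinds, wanted):
    """True if `wanted` occurs as an in-order subsequence of `kinds`."""
    remaining = iter(kinds)
    return all(kind in remaining for kind in wanted)


def select_rule(kinds):
    """Return the first rule whose page kinds are found in order, else None."""
    for rule, wanted in RULES:
        if found_in_order(kinds, wanted):
            return rule
    return None


def classify(document):
    """Return (best doc type, scores) for a non-empty list of pages.

    ASSUMPTION: weighted score = mean confidence * rule weight / 100.
    """
    kinds, confidences = {}, {}
    for page in document:
        best = {}  # highest-confidence value per doc type on this page
        for doc_type, page_kind, confidence in page:
            if doc_type not in best or confidence > best[doc_type][1]:
                best[doc_type] = (page_kind, confidence)
        for doc_type, (page_kind, confidence) in best.items():
            kinds.setdefault(doc_type, []).append(page_kind)
            confidences.setdefault(doc_type, []).append(confidence)
    scores = {}
    for doc_type, sequence in kinds.items():
        weight = RULE_WEIGHTS.get(select_rule(sequence), 0)
        mean = sum(confidences[doc_type]) / len(confidences[doc_type])
        scores[doc_type] = mean * weight / 100
    return max(scores, key=scores.get), scores


document = [
    [("Invoice", "first", 90.0), ("Letter", "middle", 40.0)],
    [("Invoice", "middle", 70.0)],
    [("Invoice", "last", 80.0), ("Letter", "last", 60.0)],
]
doc_type, scores = classify(document)
print(doc_type, scores)
```

Here "Invoice" matches the first-middle-last rule (weight 100) and "Letter" matches only middle-last (weight 50), so "Invoice" wins under the assumed formula.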
* Advanced Document Assembly
1. Separation
For the batch:
- A proprietary algorithm performs forward and reverse page-level look-aheads and look-behinds across all alternate values. Decisions are based on every permutation of pages and the alternate-value information in the XML.
The algorithm relies on the following threshold settings from DOCUMENT_ASSEMBLER:
DA First Page Confidence Threshold: 50
DA Middle Page Confidence Threshold: 15
DA Last Page Confidence Threshold: 10
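The separation algorithm itself is proprietary and is not described here, so it cannot be reproduced. As an illustration only, the sketch below shows one way thresholds like these could be used: as a pre-filter that discards candidate values whose confidence falls below the threshold for their page kind. The function names, the tuple layout, and the filtering behaviour (including the inclusive `>=` comparison) are assumptions, not documented product behaviour.

```python
# Threshold values as listed above.
THRESHOLDS = {"first": 50, "middle": 15, "last": 10}


def passes_threshold(page_kind, confidence):
    """ASSUMPTION: a value is usable only if it meets its page-kind threshold."""
    return confidence >= THRESHOLDS[page_kind]


def filter_candidates(candidates):
    """Keep (doc_type, page_kind, confidence) tuples that meet their threshold."""
    return [c for c in candidates if passes_threshold(c[1], c[2])]


candidates = [
    ("Invoice", "first", 49.0),   # below the first-page threshold of 50
    ("Invoice", "middle", 15.0),  # exactly at the middle-page threshold
    ("Letter", "last", 9.9),      # below the last-page threshold of 10
]
print(filter_candidates(candidates))
```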
2. Classification
For each document in the batch:
- Look at all the page confidence values for all pages
– Calculate the weighted confidence for each doc type found within the document
– Weighting factors:
– if first, middle & last pages are found in order for the same doc type, apply the first-middle-last page weight
– else if first & last pages are found in order for the same doc type, apply the first-last page weight
– else if first & middle pages are found in order for the same doc type, apply the first-middle page weight
– else if middle & last pages are found in order for the same doc type, apply the middle-last page weight
– else if just a first page is found, apply the first-page weight
– else if just a middle page is found, apply the middle-page weight
– else if just a last page is found, apply the last-page weight
** What weighting factors apply?
The following rule weights are not used for separation. They are applied only to generate the document classification confidence score:
DA Rule First-middle-last Page: 100
DA Rule First Page: 50
DA Rule Middle Page: 25
DA Rule Last Page: 50
DA Rule First-last Page: 75
DA Rule First-middle Page: 50
DA Rule Middle-last Page: 50
– The doc type with the highest weighted value is assigned to the document