Overview:
This feature lets you define how the Document Assembler (DA) plugin adjusts its separation results at input file boundaries. The property DA File Name Boundary Classification selects the file boundary classification strategy, which is applied to the separation results generated by the DA algorithm.
File Boundary Classification Strategies
The three File Boundary Classification strategies are:
- UseDAGeneratedDocument – The classification results generated by the DA algorithm are used without any changes.
- MergeDocumentsBelongingToSameFile – All pages belonging to a single multipage input file must be part of the same document. If the DA algorithm produces a document that contains pages from two different input files, a new document is started at the file boundary. If pages from a single input file are classified into two or more documents, those documents are merged so that all pages from the source file end up in one document.
Example:
Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
File1.tiff | File1-0002.tiff | PG1 | DOC1 | DOC1 |
File1.tiff | File1-0003.tiff | PG2 | DOC2 | DOC1 |
File1.tiff | File1-0004.tiff | PG3 | DOC2 | DOC1 |
File2.tiff | File2.tiff | PG4 | DOC2 | DOC2 |
File3.tiff | File3-0001.tiff | PG5 | DOC3 | DOC3 |
File3.tiff | File3-0002.tiff | PG6 | DOC3 | DOC3 |
Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
File1.tiff | File1-0002.tiff | PG1 | DOC2 | DOC1 |
File1.tiff | File1-0003.tiff | PG2 | DOC3 | DOC1 |
File1.tiff | File1-0004.tiff | PG3 | DOC4 | DOC1 |
File2.tiff | File2.tiff | PG4 | DOC5 | DOC2 |
File3.tiff | File3-0001.tiff | PG5 | DOC6 | DOC3 |
File3.tiff | File3-0002.tiff | PG6 | DOC7 | DOC3 |
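The merge behavior in the tables above can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual implementation: the function name `merge_documents_by_file`, the tuple layout, and the sequential renumbering of result documents are assumptions chosen to reproduce the Result column. It also assumes that all pages of one input file arrive consecutively.

```python
from itertools import groupby

# Hypothetical page records: (original_file, page_id, da_doc_id), in page order.
# Values are taken from the first example table above.
PAGES = [
    ("File1.tiff", "PG0", "DOC1"),
    ("File1.tiff", "PG1", "DOC1"),
    ("File1.tiff", "PG2", "DOC2"),
    ("File1.tiff", "PG3", "DOC2"),
    ("File2.tiff", "PG4", "DOC2"),
    ("File3.tiff", "PG5", "DOC3"),
    ("File3.tiff", "PG6", "DOC3"),
]


def merge_documents_by_file(pages):
    """Return (page_id, result_doc_id) pairs with one document per input file.

    Assumption: pages of the same file are consecutive, so grouping
    consecutive records by file name yields exactly one group per file.
    """
    result = []
    doc_number = 0
    for _file, group in groupby(pages, key=lambda page: page[0]):
        doc_number += 1  # each input file becomes exactly one document
        for _original_file, page_id, _da_doc_id in group:
            result.append((page_id, f"DOC{doc_number}"))
    return result


for page_id, doc_id in merge_documents_by_file(PAGES):
    print(page_id, doc_id)
```

Under this strategy the DA document IDs do not influence the outcome, which is why both example tables end with the same Result column.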
- CreateNewDocumentForDifferentFile – If a document contains pages belonging to two different input files, a new document is created at the file boundary. Pages from one source file may still be separated into multiple documents, but no document may span more than one input file.
Example:
Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
File1.tiff | File1-0002.tiff | PG1 | DOC1 | DOC1 |
File1.tiff | File1-0003.tiff | PG2 | DOC2 | DOC2 |
File1.tiff | File1-0004.tiff | PG3 | DOC2 | DOC2 |
File2.tiff | File2.tiff | PG4 | DOC2 | DOC3 |
File3.tiff | File3-0001.tiff | PG5 | DOC3 | DOC4 |
File3.tiff | File3-0002.tiff | PG6 | DOC3 | DOC4 |
Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
File1.tiff | File1-0002.tiff | PG1 | DOC2 | DOC2 |
File1.tiff | File1-0003.tiff | PG2 | DOC3 | DOC3 |
File1.tiff | File1-0004.tiff | PG3 | DOC4 | DOC4 |
File2.tiff | File2.tiff | PG4 | DOC5 | DOC5 |
File3.tiff | File3-0001.tiff | PG5 | DOC6 | DOC6 |
File3.tiff | File3-0002.tiff | PG6 | DOC7 | DOC7 |
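The split-at-boundary behavior can be sketched in the same way. Again, this is a minimal illustration under assumptions rather than the plugin's code: the function name `split_at_file_boundaries`, the tuple layout, and the sequential renumbering of result documents are hypothetical and were chosen to match the Result column of the tables above.

```python
# Hypothetical page records: (original_file, page_id, da_doc_id), in page order.
# Values are taken from the first example table of this strategy.
PAGES = [
    ("File1.tiff", "PG0", "DOC1"),
    ("File1.tiff", "PG1", "DOC1"),
    ("File1.tiff", "PG2", "DOC2"),
    ("File1.tiff", "PG3", "DOC2"),
    ("File2.tiff", "PG4", "DOC2"),
    ("File3.tiff", "PG5", "DOC3"),
    ("File3.tiff", "PG6", "DOC3"),
]


def split_at_file_boundaries(pages):
    """Return (page_id, result_doc_id) pairs.

    A new result document starts whenever the DA document changes
    (the DA separation is kept) or the input file changes
    (a document is never allowed to cross a file boundary).
    """
    result = []
    doc_number = 0
    previous_key = None  # (original_file, da_doc_id) of the previous page
    for original_file, page_id, da_doc_id in pages:
        key = (original_file, da_doc_id)
        if key != previous_key:
            doc_number += 1
            previous_key = key
        result.append((page_id, f"DOC{doc_number}"))
    return result


for page_id, doc_id in split_at_file_boundaries(PAGES):
    print(page_id, doc_id)
```

In the sample data, PG4 shares DA document DOC2 with PG2 and PG3 but comes from File2.tiff, so it starts a new document and every later document number shifts by one.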