Document Assembly: File Boundary Classification

Overview:

This feature controls how the Document Assembler (DA) plugin adjusts its separation results based on input file boundaries. The property DA File Name Boundary Classification selects the file boundary classification strategy, which is applied to the separation results generated by the DA algorithm.

 

File Boundary Classification Strategies

The three File Boundary Classification strategies are:

  1. UseDAGeneratedDocument – The classification results generated by the DA algorithm are kept unchanged.
  2. MergeDocumentsBelongingToSameFile – All pages belonging to a single multipage input file must be part of the same document. If the results generated by the DA algorithm contain a document whose pages belong to two different input files, a new document is started at the file boundary. If pages belonging to a single input file are classified into two or more documents, those documents are merged so that all pages from the source file are part of one document only.

Example:

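Original file name  Broken file name  Page ID  Doc ID as generated by DA  Result
------------------  ----------------  -------  -------------------------  ------
File1.tiff          File1-0001.tiff   PG0      DOC1                       DOC1
File1.tiff          File1-0002.tiff   PG1      DOC1                       DOC1
File1.tiff          File1-0003.tiff   PG2      DOC2                       DOC1
File1.tiff          File1-0004.tiff   PG3      DOC2                       DOC1
File2.tiff          File2.tiff        PG4      DOC2                       DOC2
File3.tiff          File3-0001.tiff   PG5      DOC3                       DOC3
File3.tiff          File3-0002.tiff   PG6      DOC3                       DOC3

Original file name  Broken file name  Page ID  Doc ID as generated by DA  Result
------------------  ----------------  -------  -------------------------  ------
File1.tiff          File1-0001.tiff   PG0      DOC1                       DOC1
File1.tiff          File1-0002.tiff   PG1      DOC2                       DOC1
File1.tiff          File1-0003.tiff   PG2      DOC3                       DOC1
File1.tiff          File1-0004.tiff   PG3      DOC4                       DOC1
File2.tiff          File2.tiff        PG4      DOC5                       DOC2
File3.tiff          File3-0001.tiff   PG5      DOC6                       DOC3
File3.tiff          File3-0002.tiff   PG6      DOC7                       DOC3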
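
The MergeDocumentsBelongingToSameFile behaviour shown in the examples above can be sketched in a few lines of Python. This is only an illustration of the rule, not the plugin's implementation: the function name, the (source_file, page_id, da_doc_id) tuple layout, and the sequential DOC numbering are assumptions made for this sketch, and it assumes that all pages of one input file are adjacent in the batch.

```python
from itertools import groupby

# Hypothetical page record: (source_file, page_id, da_doc_id), in batch order.
def merge_documents_belonging_to_same_file(pages):
    """Return {page_id: result_doc_id} with exactly one document per source file."""
    result = {}
    # groupby groups *consecutive* pages that share a source file, so this
    # relies on the assumption that pages of one file are adjacent.
    runs = groupby(pages, key=lambda page: page[0])
    for doc_number, (_source_file, run) in enumerate(runs, start=1):
        for _file, page_id, _da_doc_id in run:
            result[page_id] = f"DOC{doc_number}"
    return result

# Data from the first example: DA splits File1 in two and runs DOC2 into File2.
merge_example = [
    ("File1.tiff", "PG0", "DOC1"),
    ("File1.tiff", "PG1", "DOC1"),
    ("File1.tiff", "PG2", "DOC2"),
    ("File1.tiff", "PG3", "DOC2"),
    ("File2.tiff", "PG4", "DOC2"),
    ("File3.tiff", "PG5", "DOC3"),
    ("File3.tiff", "PG6", "DOC3"),
]
print(merge_documents_belonging_to_same_file(merge_example))
```

Note that the sketch never reads the DA-generated document ID. Under this strategy the result depends only on the file boundaries, which is why both examples end with the same Result column.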

  3. CreateNewDocumentForDifferentFile – If a document consists of pages belonging to two different input files, a new document is created at the file boundary. Pages belonging to one source file may still be separated into multiple documents, but no document may span more than one input file.

Example:

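Original file name  Broken file name  Page ID  Doc ID as generated by DA  Result
------------------  ----------------  -------  -------------------------  ------
File1.tiff          File1-0001.tiff   PG0      DOC1                       DOC1
File1.tiff          File1-0002.tiff   PG1      DOC1                       DOC1
File1.tiff          File1-0003.tiff   PG2      DOC2                       DOC2
File1.tiff          File1-0004.tiff   PG3      DOC2                       DOC2
File2.tiff          File2.tiff        PG4      DOC2                       DOC3
File3.tiff          File3-0001.tiff   PG5      DOC3                       DOC4
File3.tiff          File3-0002.tiff   PG6      DOC3                       DOC4

Original file name  Broken file name  Page ID  Doc ID as generated by DA  Result
------------------  ----------------  -------  -------------------------  ------
File1.tiff          File1-0001.tiff   PG0      DOC1                       DOC1
File1.tiff          File1-0002.tiff   PG1      DOC2                       DOC2
File1.tiff          File1-0003.tiff   PG2      DOC3                       DOC3
File1.tiff          File1-0004.tiff   PG3      DOC4                       DOC4
File2.tiff          File2.tiff        PG4      DOC5                       DOC5
File3.tiff          File3-0001.tiff   PG5      DOC6                       DOC6
File3.tiff          File3-0002.tiff   PG6      DOC7                       DOC7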
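
The CreateNewDocumentForDifferentFile behaviour in the examples above can be sketched as a single pass over the pages. As before, this is an illustrative sketch rather than the plugin's code: the function name, the (source_file, page_id, da_doc_id) tuple layout, and the renumbering of result documents are assumptions chosen to reproduce the Result column.

```python
# Hypothetical page record: (source_file, page_id, da_doc_id), in batch order.
def create_new_document_for_different_file(pages):
    """Return {page_id: result_doc_id}, splitting DA documents at file boundaries."""
    result = {}
    doc_number = 0
    previous_key = None  # (source_file, da_doc_id) of the previous page
    for source_file, page_id, da_doc_id in pages:
        key = (source_file, da_doc_id)
        # Start a new result document when either the DA document changes
        # (a DA split is preserved) or the source file changes (a split is forced).
        if key != previous_key:
            doc_number += 1
            previous_key = key
        result[page_id] = f"DOC{doc_number}"
    return result

# Data from the first example: DA's DOC2 spans File1 and File2.
split_example = [
    ("File1.tiff", "PG0", "DOC1"),
    ("File1.tiff", "PG1", "DOC1"),
    ("File1.tiff", "PG2", "DOC2"),
    ("File1.tiff", "PG3", "DOC2"),
    ("File2.tiff", "PG4", "DOC2"),
    ("File3.tiff", "PG5", "DOC3"),
    ("File3.tiff", "PG6", "DOC3"),
]
print(create_new_document_for_different_file(split_example))
```

Unlike the merge strategy, this one keeps every split the DA algorithm made and only adds splits where a DA document would otherwise cross a file boundary, so the documents after the split point are renumbered (DOC3 becomes DOC4 in the first example).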