Introduction
This reference document provides information for developers who want to add functionality or customize their solution beyond a standard installation of Ephesoft Transact. This document applies to Ephesoft Transact 2019.1 and above.
- Developer tasks could entail web services, scripting, interoperation between Ephesoft Transact and external applications, or other custom integrations of Ephesoft Transact.
- This document describes the layout and content of the batch.xml file so that this metadata can be consumed somewhere else.
- With the information contained in this document, a developer can write a workflow script that can manipulate this batch-level information.
Overview of Batch.xml Schema and XSD
The batch.xml file schema and matching XSD contain metadata and multi-level information for every batch processed in the Ephesoft Transact workflow. The batch.xml file contains metadata for each batch instance at the batch, document, and page levels.
The batch.xml file and XSD support the following field levels. This hierarchy of fields applies to the batch.xml schema for each batch instance that has begun the workflow process:
- Batch-level Fields — The fields on this level apply to the entire batch instance as a whole.
- Document Fields — The fields on this level apply to all documents in the batch instance.
- Document-level Fields — The fields on this level apply to individual documents within the batch instance.
- Page Fields — The fields on this level apply to all pages within the batch instance.
- Page-Level Fields — The fields on this level apply to each individual page within the batch instance.
- Email Metadata in the Batch.xml Schema — Email heading metadata is available on multiple levels of the batch.xml schema for any batch instance that uses email import.
Batch-Level Fields
Refer to the following table for the batch-level fields in the Ephesoft Transact batch.xml schema and XSD.
Note: For information on email metadata, refer to the Email Metadata in the Batch.xml Schema section of this document.
Batch-Level Field Name | Description | Module | Plugin |
BatchInstanceIdentifier | This value is the Identifier column in the batch_instance table. Each batch in Ephesoft Transact has a unique batch identifier. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchClassIdentifier | This value is the Identifier column in the batch_class table. Each batch in Ephesoft Transact is run under a batch class that is a single unit for all configurations and workflow definitions. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchClassName | This value is the batch_class_name column in the batch_class table. Each batch in Ephesoft Transact is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the ID column of the batch_class table and the batch_class_id column in the batch_instance table. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
Signature | This value is added only when batch encryption is enabled. This is used to ensure that the batch.xml file can only be read and updated by Ephesoft Transact. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchClassDescription | This is the value of the batch_class_description column in the batch_class table. Each batch in Ephesoft Transact is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the ID column of the batch_class table and the batch_class_id column in the batch_instance table. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchClassVersion | This is the version number of the batch class under which the batch was processed. This is the value of the batch_class_version column in the batch_class table. Each batch in Ephesoft Transact is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the ID column of the batch_class table and the batch_class_id column in the batch_instance table. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchName | This is the value of the batch_name column in the batch_instance table. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchDescription | This is the batch instance description that is provided at the time of batch creation. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchPriority | This is the value of the batch_priority column in the batch_instance table. Priority can be a value from 1 to 100, with a lower number indicating higher priority. If no priority is assigned using custom code, the batch inherits the priority of the batch class, which is assigned when the batch class is created or imported. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchStatus | This value specifies the current batch status. Possible values for this field are:
|
All modules can modify this field | All plugins can modify this field |
BatchReviewedBy | This field cites users who performed batch review and enables administrators to audit users who are active in the batch instance. This node applies at the batch level and the individual page level.
|
Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
BatchValidatedBy | This field cites users who performed validation and enables administrators to audit all users who are active in the batch instance. This node applies at the batch level and the individual page level, with these parameters.
|
Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
BatchCreationDate | This is the value of the creation_date column in the batch_class table. This is the date and time of when the batch was created in Ephesoft Transact. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchLocalPath | This is the Transact system folder path where the batch instance folder will be available. This value will be the same across all batches in the system. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
BatchSource | This value states which ingestion mechanism was used to import the batch into the system. Possible values are:
|
Folder Import Module | IMPORT_BATCH_FOLDER |
UNCFolderPath | This is the path where the source file for the batch is available. This is a unique path for each batch in the system. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
ETextMode | The batch.xml will have an additional tag, ETextMode, at the root level to define the EText mode. Values may be Automatic, Always, and Never. The value will be populated based on the plugin setting for the following:
|
Page Process | RECOSTAR_HOCR
NUANCE_HOCR |
DocumentClassificationTypes | This value specifies what classification type was used for the document assembly. | Assembly | DOCUMENT_ASSEMBLER |
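As a rough illustration of consuming the batch-level fields outside Ephesoft Transact, the following Python sketch reads a few of them with the standard library. The element names come from the table above, but the `<Batch>` root element, the flat nesting, the absence of an XML namespace, and all sample values are assumptions made for this example. A batch.xml written with batch encryption enabled (see the Signature field) is not expected to be readable this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed batch.xml. Element names are taken from the
# batch-level table; the <Batch> root and the sample values are assumptions.
SAMPLE_BATCH_XML = """\
<Batch>
  <BatchInstanceIdentifier>BI1</BatchInstanceIdentifier>
  <BatchClassIdentifier>BC1</BatchClassIdentifier>
  <BatchClassName>SampleBatchClass</BatchClassName>
  <BatchName>sample-batch</BatchName>
  <BatchPriority>50</BatchPriority>
</Batch>
"""

BATCH_LEVEL_FIELDS = (
    "BatchInstanceIdentifier",
    "BatchClassIdentifier",
    "BatchClassName",
    "BatchName",
    "BatchPriority",
    "BatchStatus",
)


def read_batch_fields(xml_text):
    """Return the batch-level fields found as direct children of the root."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in BATCH_LEVEL_FIELDS:
        value = root.findtext(name)
        if value is not None:  # skip fields that are absent from this file
            fields[name] = value.strip()
    return fields


def has_valid_priority(fields):
    """Check BatchPriority against the documented range of 1 to 100."""
    priority = fields.get("BatchPriority", "")
    return priority.isdigit() and 1 <= int(priority) <= 100


batch_fields = read_batch_fields(SAMPLE_BATCH_XML)
print(batch_fields["BatchInstanceIdentifier"], batch_fields["BatchPriority"])
```

Fields that are missing from a given file, such as BatchStatus in this sample, are simply left out of the returned dictionary rather than reported as errors.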
Document Fields
The document fields in the batch.xml schema apply to all documents in the batch instance. Document fields exist at a higher level than document-level fields.
Note: For information on email document fields in the batch.xml schema, refer to Email Metadata in the Batch.xml Schema section of this document.
The following example illustrates the docField for a batch instance:
<xs:complexType name="docField">
  <xs:complexContent>
    <xs:extension base="field">
      <xs:sequence>
        <xs:element name="AlternateValues" minOccurs="0" maxOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" maxOccurs="unbounded" name="AlternateValue" type="field" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="PreviousValue" type="docField" minOccurs="0" maxOccurs="1" />
        <xs:element name="Category" type="xs:string" minOccurs="0" maxOccurs="1" />
        <xs:element name="hidden" type="xs:boolean" minOccurs="0" maxOccurs="1" />
        <xs:element name="widgetType" type="xs:string" minOccurs="0" maxOccurs="1" />
        <xs:element name="scriptEnabled" type="xs:boolean" minOccurs="0" maxOccurs="1" />
        <xs:element name="Message" type="xs:string" minOccurs="0" maxOccurs="1" />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
The following table lists and defines the fields contained in the docField section of the batch.xml schema.
Document Field Name | Description | Module | Plugin |
AlternateValues | This field contains alternate values for a page-level field. It stores alternative classification information with confidence levels.
In Ephesoft Transact 4.5.X.X and prior releases, a page can be classified into up to 10 different types during classification. The type with the highest confidence value is set in the page-level field value, and all other possible types for the page are listed as alternate values. This tag contains the LearnedFileName tag for each alternate value. In Ephesoft Transact 2019.1 and later releases, the default number of types is 5. |
Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
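The relationship between a field's chosen value and its AlternateValues can be sketched in Python as follows. The `AlternateValues` and `AlternateValue` names come from the XSD excerpt above. The `field` base type is not shown in this document, so the `Value` and `Confidence` children, the `<PageLevelField>` wrapper, and every sample value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical field instance. AlternateValues/AlternateValue follow the XSD
# excerpt; Value, Confidence, and the wrapper element name are assumptions.
SAMPLE_FIELD_XML = """\
<PageLevelField>
  <Value>Invoice_First_Page</Value>
  <Confidence>82.5</Confidence>
  <AlternateValues>
    <AlternateValue>
      <Value>Invoice_Middle_Page</Value>
      <Confidence>40.0</Confidence>
    </AlternateValue>
    <AlternateValue>
      <Value>Receipt_First_Page</Value>
      <Confidence>61.25</Confidence>
    </AlternateValue>
  </AlternateValues>
</PageLevelField>
"""


def ranked_candidates(field_xml):
    """Return (value, confidence) pairs for the chosen value and its alternates, best first."""
    field = ET.fromstring(field_xml)
    # The chosen value lives directly on the field.
    candidates = [(field.findtext("Value"), float(field.findtext("Confidence")))]
    # Every other candidate sits under AlternateValues.
    for alternate in field.findall("AlternateValues/AlternateValue"):
        candidates.append(
            (alternate.findtext("Value"), float(alternate.findtext("Confidence")))
        )
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


ranked = ranked_candidates(SAMPLE_FIELD_XML)
for value, confidence in ranked:
    print(value, confidence)
```

Sorting all candidates together makes it easy to see how close the runner-up was to the value that was actually selected.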
Document-Level Fields
Refer to the following table for the document-level fields in the Ephesoft batch.xml schema and XSD.
Note: For email document-level fields in the batch.xml schema, refer to Email Metadata in the Batch.xml Schema.
DocumentLevel Field Name | Description | Module(s) | Plugin(s) |
Identifier | This is the Document Identifier for a document. The sequence for document numbering is DOC0, DOC1…DOCn
Note: In current versions of Transact, each file becomes a separate batch, so there is only one document in the XML after Folder Import, and all pages belong to this one document. The page-to-document grouping changes after the DOCUMENT_ASSEMBLER_PLUGIN within the Document Assembly module is executed, at which point the pages may be grouped into multiple documents. |
Folder Import
Document Assembly |
IMPORT_BATCH_FOLDER_PLUGIN
DOCUMENT_ASSEMBLER_PLUGIN |
Type | This is the document type assigned to the document.
Note: In current versions of Transact, each file becomes a separate batch, so there is only one document in the XML after Folder Import. All pages belong to this one document, which is named Unknown. The page-to-document grouping changes after the DOCUMENT_ASSEMBLER_PLUGIN within the Document Assembly module is executed, at which point the pages may be grouped into multiple documents. The document that Transact used to determine classification for all pages is assigned this tag. The document types that belong to the batch class assigned to the batch are available in the database table document_type (column document_type_name). This table has a foreign key reference to the ID column of the batch_class table, which associates document types with the batch class. |
Folder Import
Document Assembly |
IMPORT_BATCH_FOLDER_PLUGIN
DOCUMENT_ASSEMBLER_PLUGIN |
ExtractionType | Note: This is applicable to Ephesoft Transact 2020.1 or above.
The possible values for the element are:
|
||
Description | This tag contains the corresponding document_type_description of the assigned document_type_name above. It is picked up from the database table document_type. | Folder Import
Document Assembly |
IMPORT_BATCH_FOLDER_PLUGIN
DOCUMENT_ASSEMBLER_PLUGIN |
Size | This element contains the document’s multipage PDF file size in bytes. This field is only populated by the IBM_CM_PLUGIN. | Export Module | IBM_CM_PLUGIN |
Confidence | This value is the confidence with which the document was assembled. If the confidence is greater than the minimum confidence threshold assigned to the document, then the document is not marked for operator review. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN
DOCUMENT_ASSEMBLER_PLUGIN |
OcrConfidenceThreshold | This field helps Ephesoft Transact decide whether the document should skip document review automatically when the classification score is higher than the threshold. The document confidence threshold is available in the min_confidence_threshold column of the document_type table. The best practice is to set the threshold so that false positives are minimized. | Document Assembly | DOCUMENT_ASSEMBLER_PLUGIN |
OcrConfidence | This is the confidence value returned from the OCR engine used with Ephesoft Transact. | ||
CoordinatesList | Note: This is applicable to Ephesoft Transact 2020.1 or above.
This tag includes the <ExtractionType> element. |
||
ExtractionRuleID | The Rule ID allows batch class designers to identify which rule has been applied for the extraction of each index field. The Rule ID can be found as an attribute listed for a key-value extraction rule.
Note: Applies to 2022.1.00 and above. |
Extraction Module | KEY_VALUE_EXTRACTION |
Valid | This tag determines if the document would stop for Document Field Validation review. This applies only when data extraction is part of the batch class. The value of False indicates that the document has fields that need to stop for Document Field Validation review. The value of True indicates that all fields in the document were extracted with high confidence and need not stop for Document Field Validation review. The value is set to True after execution of REVIEW_DOCUMENT_PLUGIN if the extraction module is not configured in the batch class. | Document Assembly
Review Document |
DOCUMENT_ASSEMBLER_PLUGIN
REVIEW_DOCUMENT_PLUGIN |
Reviewed | This tag indicates that the document has passed through the REVIEW_DOCUMENT_PLUGIN. The value of False indicates that the document was assembled/classified with low confidence and needs to stop for document classification Review. The value of True indicates that the document was assembled/classified with high confidence and does not need to stop for document classification Review. The value is set to True after execution of REVIEW_DOCUMENT_PLUGIN.
Note: Setting the value to False will not force the batch to stop in Review. To stop the batch, set a confidence score that is lower than the threshold. |
Document Assembly
Review Document |
DOCUMENT_ASSEMBLER_PLUGIN
REVIEW_DOCUMENT_PLUGIN |
ReviewedBy | This field lists users who performed reviews in the batch. This field enables administrators to audit users who are active in the batch instance. This node applies at the batch level and the individual page level.
Refer to Case Study 3: ReviewedBy and ValidatedBy Nodes in Batch.xml. |
Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
ValidatedBy | This field lists users who performed validation for the batch and enables administrators to audit all users who are active in the batch instance. This node applies at the batch level and the individual page level, with these parameters:
|
Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
ErrorMessage | A string that contains an error message to be displayed on the Review and Validate screen corresponding to a document. The value of this tag can be set using a scripting plugin. Ephesoft Transact does not set a value for this field. | Review/Validation | Review/Validation |
DocumentDisplayInfo | This field can be used to provide customized names to documents on the Review and Validate screen. The value of this tag can be set using a scripting plugin. Ephesoft Transact does not set a value for this field. | Review/Validation | Review/Validation |
DocumentLevelFields | This field contains the parent node for all index fields inside the documents. | Extraction Module | All plugins inside extraction module can modify this field. |
Pages | This field contains the parent node for all pages inside the document. | Document Assembly | DOCUMENT_ASSEMBLER |
DataTables | This is the root node for all the extracted table information in the document. | Extraction Module | TABLE_EXTRACTION |
AutoSuggestedDataTables | All the auto extraction from documents is present under this node. | Extraction Module | AUTO_TABLE_EXTRACTION_PLUGIN |
MultiPageTiffFile | This value contains the name of the multipage TIFF file created. The COPY_BATCH_XML plugin changes this name to the exact location while exporting the file to a user-defined location. Note that the batch.xml in the ephesoft-system-folder in the shared folders still contains only the file name. | Export Module | CREATEMULTIPAGE_FILES
COPY_BATCH_XML |
MultiPagePdfFile | This value contains the name of the multipage PDF file created. The COPY_BATCH_XML plugin changes this name to the exact location while exporting the file to a user-defined location. Note that the batch.xml in the ephesoft-system-folder in the shared folders still contains only the file name. | Export Module | CREATEMULTIPAGE_FILES
COPY_BATCH_XML |
FinalMultiPagePdfFilePath | This element is present in the batch.xml in the ephesoft-system-folder and contains the absolute path of the multipage PDF document exported by the COPY_BATCH_XML plugin. This value is different from MultiPagePdfFile, which contains the file name only. | Export Module | COPY_BATCH_XML |
FinalMultiPageTiffFilePath | This element is present in the batch.xml in the ephesoft-system-folder and contains the absolute path of the multipage TIFF document exported by the COPY_BATCH_XML plugin. This value is different from MultiPageTiffFile, which contains the file name only. | Export Module | COPY_BATCH_XML |
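To show how the document-level flags might be read together, here is a minimal Python sketch that summarizes each document and lists the ones that would still stop for an operator, based on the Reviewed and Valid descriptions in the table above. The element names are taken from the table. The `<Batch>` and `<Documents>` wrappers, the lowercase `true`/`false` text, and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt. Identifier, Type, Confidence, Reviewed, and Valid are
# names from the document-level table; the wrappers and values are assumed.
SAMPLE_DOCUMENTS_XML = """\
<Batch>
  <Documents>
    <Document>
      <Identifier>DOC0</Identifier>
      <Type>Invoice</Type>
      <Confidence>91.0</Confidence>
      <Reviewed>true</Reviewed>
      <Valid>false</Valid>
    </Document>
    <Document>
      <Identifier>DOC1</Identifier>
      <Type>Unknown</Type>
      <Confidence>12.0</Confidence>
      <Reviewed>false</Reviewed>
      <Valid>false</Valid>
    </Document>
    <Document>
      <Identifier>DOC2</Identifier>
      <Type>Receipt</Type>
      <Confidence>88.0</Confidence>
      <Reviewed>true</Reviewed>
      <Valid>true</Valid>
    </Document>
  </Documents>
</Batch>
"""


def as_bool(text):
    """Interpret 'true' (any case) as True; anything else, including a missing tag, as False."""
    return (text or "").strip().lower() == "true"


def summarize_documents(xml_text):
    """Return one dict per <Document> with its identifier, type, confidence, and flags."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("Document"):
        rows.append({
            "id": doc.findtext("Identifier"),
            "type": doc.findtext("Type"),
            "confidence": float(doc.findtext("Confidence", default="0")),
            "reviewed": as_bool(doc.findtext("Reviewed")),
            "valid": as_bool(doc.findtext("Valid")),
        })
    return rows


def pending_operator_work(rows):
    """List identifiers of documents whose Reviewed or Valid flag is still False."""
    return [row["id"] for row in rows if not (row["reviewed"] and row["valid"])]


document_rows = summarize_documents(SAMPLE_DOCUMENTS_XML)
print(pending_operator_work(document_rows))
```

This sketch only reads the flags. As the note on the Reviewed field states, writing False into the flag does not by itself force a batch to stop in Review.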
Page Fields
Page fields in the batch.xml schema and XSD apply to all pages in the document. Refer to the following table for the page fields in the Ephesoft batch.xml schema and XSD.
Note: For email page fields in the batch.xml schema, refer to Email Metadata in the Batch.xml Schema.
Page Field Name | Description | Module | Plugin |
Identifier | This is the page identifier for a page. The sequence for page numbering is PG0, PG1…PGn.
Note: The IMPORT_BATCH_FOLDER_PLUGIN breaks up each page of the source PDF into individual TIFF files. Each TIFF file is a page in the XML file. The pages can be grouped as documents. |
Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
OldFileName | This tag contains the name of the mapped individual TIFF file within the input folder for the batch. The input folder path is available in the tag UNCFolderPath under batch-level fields. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
NewFileName | This tag contains the name of the mapped individual TIFF file within the Ephesoft system folder. The Ephesoft system folder path is available in the tag BatchLocalPath. The path to the batch instance folder is <BatchLocalPath>\<BatchInstanceIdentifier>. The name of the associated file to this page is a combination of the batch instance identifier and the page sequence. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
SourceEmailID | This element links the page to the source email from which the page originated. It contains only the identification (ID) of the source email. This ID can be searched for in a separate section of the batch.xml which contains more details. | Folder Import Module | IMPORT_BATCH_FOLDER |
SourceFileID | This element links the page to the source file that was originally placed in the watch folder. It contains only the ID of the source file. This ID can be looked up in a separate section of the batch.xml, which contains more details. | Folder Import Module | IMPORT_BATCH_FOLDER |
PageLevelFields | This value contains the classification information from different configured plugins. The values in this section are used while assembling pages into documents. | Document Assembly | DOCUMENT_ASSEMBLER |
HocrFileName | The RECOSTAR_HOCR_GENERATION_PLUGIN (for Windows) or the NUANCE_HOCR plugin (for Linux) extracts the contents of each page (individual TIFF). The contents are stored in an XML file located in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). This tag stores the name of the HOCR XML file for the corresponding page. |
Page Process | RECOSTAR_HOCR_GENERATION_PLUGIN |
ThumbnailFileName | This tag stores the name of the corresponding thumbnail for the page.
The IMAGE_PROCESS_CREATE_THUMBNAILS_PLUGIN is used to create thumbnail images of the batch images. These thumbnails are displayed in the Review and Validate screen, where pages in the documents are shown as thumbnails under the document name. The thumbnails are stored in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). |
Page Process | IMAGE_PROCESS_CREATE_THUMBNAILS_PLUGIN |
ComparisonThumbnailFileName | This value contains the name of the thumbnail file which can be used by the CLASSIFY_IMAGES plugin. This element will be present only when the Create Compare Thumbnail Switch is on in the CREATE_THUMBNAILS plugin. | Page Process | CREATE_THUMBNAILS |
DisplayFileName | The IMAGE_PROCESS_CREATE_DISPLAY_IMAGE_PLUGIN creates the display PNG files for the images being processed. This plugin takes all the images and creates PNG files for the corresponding pages, which are displayed on the Review and Validate screens. The display images are stored in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). This tag stores the name of the corresponding display image for the page. | Page Process | IMAGE_PROCESS_CREATE_DISPLAY_IMAGE_PLUGIN |
OCRInputFileName | This tag stores the name of the file that was used by the RECOSTAR_HOCR_GENERATION_PLUGIN to extract the contents of the page. The image will be the corresponding individual TIFF for the page available in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). | Page Process | RECOSTAR_HOCR_GENERATION_PLUGIN |
Direction | This field indicates the direction of a rotated document. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
IsRotated | This field indicates whether a document is rotated on the Review/Validate screen. | Folder Import | IMPORT_BATCH_FOLDER_PLUGIN |
TreatAsEText | For a PDF, the batch.xml is updated for each page that is EText compatible or not EText compatible. The element at the page level is <TreatAsEText> with values of true or false. The <TreatAsEText> element will not display for the pages having an input file other than PDF in UNC. | Page Process | RECOSTAR_HOCR
NUANCE_HOCR |
ImprintedString | This tag holds the imprinted string that is read from the serialized file for each image. If the imprinter was enabled during scanning, the string is added as a page-level field in batch.xml. It is used by the rescan and insert functionality on the Review and Validate screen. | Folder Import | IMPORT_BATCH_FOLDER |
IsBlank | IsBlank under the Page tag is used if there is no HOCR content associated with the page (i.e., if there is a blank HOCR XML associated with the page or image). | Page Process | RECOSTAR_HOCR
NUANCE_HOCR |
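Several page fields store only a file name, and the table above gives the containing folder as <BatchLocalPath>\<BatchInstanceIdentifier>. A minimal Python sketch of that path rule follows. The sample folder and file names are hypothetical, and `PureWindowsPath` is used only because the documented path uses backslashes; a Linux deployment would presumably call for `PurePosixPath` instead.

```python
from pathlib import PureWindowsPath


def page_file_path(batch_local_path, batch_instance_identifier, file_name):
    """Join <BatchLocalPath>\\<BatchInstanceIdentifier> with a page-level file name."""
    return PureWindowsPath(batch_local_path) / batch_instance_identifier / file_name


# Hypothetical values standing in for BatchLocalPath, BatchInstanceIdentifier,
# NewFileName, and HocrFileName read from a batch.xml.
batch_local_path = r"D:\SharedFolders\ephesoft-system-folder"
batch_instance_identifier = "BI1"

tiff_path = page_file_path(batch_local_path, batch_instance_identifier, "BI1_0.tif")
hocr_path = page_file_path(batch_local_path, batch_instance_identifier, "BI1_0_HOCR.xml")

print(tiff_path)
print(hocr_path)
```

The same helper applies to ThumbnailFileName, DisplayFileName, and OCRInputFileName, since the table places all of those files in the batch instance folder.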
Page-Level Fields
Page-level fields apply to each document page in the batch.xml file. Refer to the following table for the page-level fields in the Ephesoft batch.xml schema and XSD.
Note: For information on email page-level fields in the batch.xml schema, refer to Email Metadata in the Batch.xml Schema.
Page-Level Field Name | Description | Module | Plugins |
Name | This tag contains the name of the classification method used to classify this page. | Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
Value | Each document type within the batch class is subdivided into pages (first, middle, and last). This tag holds the document page for which this page was classified. | Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
Type | This tag is used in barcode classification only where it keeps the information about the barcode type. | Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
Confidence | This tag holds the confidence score with which the page was classified. The DOCUMENT_ASSEMBLER plugin uses this confidence score during document assembly once the workflow enters the Document Assembly module. | Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
LearnedFileName | This tag holds the name of the lucene-search-classification-sample matched against this page/image. | Page Process | SEARCH_CLASSIFICATION
MULTIDIMENSIONAL_CLASSIFICATION_PLUGIN |
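The page-level fields can be collected per page with a short Python sketch such as the one below. Name, Value, Confidence, and LearnedFileName come from the table above, and PageLevelFields comes from the Page Fields table. The `<Pages>`, `<Page>`, and `<PageLevelField>` element names, along with every sample value, are assumptions about the instance layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt. The field names are documented above; the element
# nesting and the sample values are assumptions for this sketch.
SAMPLE_PAGES_XML = """\
<Pages>
  <Page>
    <Identifier>PG0</Identifier>
    <PageLevelFields>
      <PageLevelField>
        <Name>Search_Engine_Classification</Name>
        <Value>Invoice_First_Page</Value>
        <Confidence>82.5</Confidence>
        <LearnedFileName>invoice_sample_1.html</LearnedFileName>
      </PageLevelField>
    </PageLevelFields>
  </Page>
  <Page>
    <Identifier>PG1</Identifier>
    <PageLevelFields>
      <PageLevelField>
        <Name>Search_Engine_Classification</Name>
        <Value>Invoice_Last_Page</Value>
        <Confidence>47.0</Confidence>
      </PageLevelField>
    </PageLevelFields>
  </Page>
</Pages>
"""


def page_classifications(xml_text):
    """Map each page identifier to the list of classification results recorded for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iter("Page"):
        entries = []
        for field in page.iter("PageLevelField"):
            entries.append({
                "method": field.findtext("Name"),
                "value": field.findtext("Value"),
                "confidence": float(field.findtext("Confidence", default="0")),
                # LearnedFileName is optional here, so None means "not recorded".
                "learned_file": field.findtext("LearnedFileName"),
            })
        result[page.findtext("Identifier")] = entries
    return result


classifications = page_classifications(SAMPLE_PAGES_XML)
for page_id, entries in classifications.items():
    print(page_id, [(entry["value"], entry["confidence"]) for entry in entries])
```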
Email Metadata in the Batch.xml Schema
Ephesoft Transact 2019.1 and later releases support email-specific metadata passing through to the batch.xml schema. For additional information about accessing and ingesting email header information for batch instances and how that metadata is passed through to the batch.xml, refer to the Accessing Email Headers in the Batch.xml Schema article.
Case Studies for the Batch.xml File
The following case studies describe configuration steps that may be helpful to you when customizing your Transact environment. The first case study describes the steps for accessing the batch.xml file. The second is a batch-level case study that illustrates the BatchReviewedBy and BatchValidatedBy fields in which there are three active users. The third is a document-level case study that shows the ReviewedBy and ValidatedBy Nodes in a sample batch.xml.
Case Study 1: Accessing the Batch.xml File
Perform the following steps to access the batch.xml file for your deployment of Ephesoft Transact.
- Open Ephesoft Transact, log in as Administrator and open the Batch Class Management screen.
- Open a batch class that contains a fully configured Export module, and in which at least one batch instance has been processed.
- Within the Export module, select the COPY_BATCH_XML plugin. The Plugin Configuration screen for this plugin appears on the right.
Plugin Configuration for COPY_BATCH_XML Plugin
- This screen displays the path to the batch.xml file in the Batch XML Export Folder Location field.
- To access this batch.xml file, navigate to this path on the Ephesoft Transact server. Within the sub-folder named final-drop-folder, there is another sub-folder that is named for the batch instance. The following screenshot illustrates one such sub-folder for the batch instance.
Sample Batch Instance Subfolder that contains the batch.xml file for that batch instance
The batch.xml file contained in this sub-folder is also named according to the batch instance in which it was created. The following snapshot illustrates one example of a batch.xml filename:
Sample batch.xml filename
- Open the batch.xml file with Notepad++ or a similar text editor. Right-click the file, select Open With, and choose an application:
Right-click on the batch.xml file and open it with a text editor
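The manual navigation in the steps above can also be scripted. The following Python sketch searches the per-batch sub-folders of a final-drop-folder for exported batch.xml files. The `*batch.xml` file name pattern and the sample names are assumptions, since this document says only that the file is named according to its batch instance. The demonstration runs against a temporary directory rather than a real export location.

```python
import tempfile
from pathlib import Path


def find_batch_xml_files(final_drop_folder):
    """Return batch.xml candidates found one level down, in the per-batch sub-folders."""
    # Assumed naming: the exported file name ends with "batch.xml".
    return sorted(Path(final_drop_folder).glob("*/*batch.xml"))


# Build a throwaway folder layout that mimics the structure described above.
with tempfile.TemporaryDirectory() as tmp:
    drop_folder = Path(tmp) / "final-drop-folder"
    batch_folder = drop_folder / "BI1"  # hypothetical batch instance name
    batch_folder.mkdir(parents=True)
    (batch_folder / "BI1_batch.xml").write_text("<Batch/>", encoding="utf-8")
    (batch_folder / "BI1_DOC0.pdf").write_bytes(b"")  # unrelated export file
    found_names = [path.name for path in find_batch_xml_files(drop_folder)]

print(found_names)
```

On a real server, the argument would be the final-drop-folder under the location shown in the COPY_BATCH_XML plugin configuration.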
Case Study 2: BatchReviewedBy and BatchValidatedBy Nodes
This is a batch-level case study. This example illustrates the BatchReviewedBy and BatchValidatedBy fields in which there are three active users:
- User1 — reviews some documents in the batch
- User2 — reviews some documents in the batch
- Ephesoft — reviews some documents in the batch, and then validates the batch
On the batch instance level, note the following activity from each user:
- Between User1 and User2, it was User2 who was active last in the review module, so this is the user that is cited in the BatchReviewedBy field.
- The BatchValidatedBy field indicates the Ephesoft user validated this batch instance.
Batch.xml file example with batch instance information
Case Study 3: ReviewedBy and ValidatedBy Nodes in Batch.xml
This is a document-level case study. Note that the users shown in this case study each performed the following tasks:
ReviewedBy
- The ephesoft user, user1, and user2 performed a classification review for the documents in this batch instance.
ValidatedBy
- The ephesoft user performed an extraction validation for the documents in this batch instance.
Batch.xml example with user information for Document 1
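For auditing purposes, the user nodes from both case studies could be gathered with a Python sketch like the one below. The node names BatchReviewedBy, BatchValidatedBy, ReviewedBy, and ValidatedBy are taken from this document. Their inner structure is not reproduced here, so the `<User>` child elements and the surrounding layout are assumptions. The helper collects any non-empty text inside a node, so it does not depend on a particular child element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt mirroring the users named in the case studies. The
# <User> children and the <Documents> wrapper are assumptions.
SAMPLE_AUDIT_XML = """\
<Batch>
  <BatchReviewedBy>user2</BatchReviewedBy>
  <BatchValidatedBy>ephesoft</BatchValidatedBy>
  <Documents>
    <Document>
      <Identifier>DOC0</Identifier>
      <ReviewedBy><User>ephesoft</User><User>user1</User><User>user2</User></ReviewedBy>
      <ValidatedBy><User>ephesoft</User></ValidatedBy>
    </Document>
  </Documents>
</Batch>
"""


def users_in(node):
    """Collect every non-empty text fragment inside a node, whatever its child layout."""
    if node is None:
        return []
    return [text.strip() for text in node.itertext() if text.strip()]


def audit_trail(xml_text):
    """Gather reviewer and validator names at the batch level and per document."""
    root = ET.fromstring(xml_text)
    trail = {
        "batch": {
            "reviewed_by": users_in(root.find("BatchReviewedBy")),
            "validated_by": users_in(root.find("BatchValidatedBy")),
        },
        "documents": {},
    }
    for doc in root.iter("Document"):
        trail["documents"][doc.findtext("Identifier")] = {
            "reviewed_by": users_in(doc.find("ReviewedBy")),
            "validated_by": users_in(doc.find("ValidatedBy")),
        }
    return trail


trail = audit_trail(SAMPLE_AUDIT_XML)
print(trail["batch"])
print(trail["documents"]["DOC0"])
```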