Batch Class Import/Export

Overview

This feature allows a user to export an existing batch class and import it into another Ephesoft instance. Transferring the exact configuration of a batch class to an Ephesoft application running on a remote system saves the time otherwise needed to reconfigure the batch class there to get identical processing behavior.

Export Batch Class

Exporting a batch class transfers its exact environment and configuration from one system to another. This is useful when a system fault occurs, since there is otherwise no immediate way to migrate the environment needed to run batches. It also helps when testing and debugging issues in a configuration-dependent environment.

Steps for exporting a Batch Class

  • On the ‘Batch Class Management’ UI, select the batch class to be exported (BC1 in the screenshot) and click the ‘Export’ button.

ExportBatchClass.jpg

 

  • An application pop-up is displayed, as shown below:

ExportBatchClassPopup.jpg

Checkboxes that are already selected and disabled mark the data that is always exported.
The user can check or uncheck ‘image-classification-sample’ and ‘lucene-search-classification-sample’ to control whether the already learnt files and the sample HTML, TIFF and XML files are exported.

Note: If the batch class is exported with ‘image-classification-sample’ and ‘lucene-search-classification-sample’ unchecked, the user must first put sample images in the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ folders under

‘//Shared Folders/BC1 (Batch Class ID)/’

and learn them by clicking the ‘Learn Files’ button before running a batch in that batch class. Otherwise the batch will go into error.

 

  • Click the ‘Save’ button. The batch class is exported to the desired location in ‘zip’ format.

Clicking ‘Cancel’ closes the pop-up without performing any operation.
The zipped batch class file can now be transferred to any other system and imported there.

Data Exported with Batch Class:

When a batch class is exported, the following data is exported with it:

  • The batch class specific folder, including script files, properties files, sample images etc.
  • Document types, page types and regular expressions (the complete batch class hierarchy) defined in the database.
  • Optional data selected in the Export Batch Class pop-up.

Import Batch Class

Importing a batch class recreates the exact environment and configuration of a batch class that was exported from another, remote system. The import covers batch class configurations, document types and their related HTML, XML and TIFF files, learnt indexes etc.

Steps for importing a batch class

Prerequisites:

An exported, zipped batch class file.

  • On the ‘Batch Class Management’ UI, click the ‘Import’ button. The following “Import Batch Class” pop-up is displayed:

ImportBatchClass.jpg

 

  • Browse to the zipped batch class file and click the ‘Attach’ button. The pop-up will expand as shown below:

ImportBatchClassPopup.jpg

The following options are provided:

  • Browse and Attach buttons.

Use these to replace the attached zipped batch class file with a new one. No action is needed if the attached file is already correct.

 

  • UNC folder textbox/dropdown and ‘Use Existing’ checkbox.

Here the user can either override an existing batch class or create a completely new batch class from the attached zipped batch class file.

Create a new Batch Class:

To create a new batch class from the exported zip file, uncheck the ‘Use Existing’ checkbox. As soon as the checkbox is unchecked, the UNC Folder dropdown turns into an empty textbox, and the user must enter the exact path (like D:\\Shared Folders\\new-public-unc-folder) at which the UNC folder for the new batch class should be created.

Please note that the UNC folder must be unique and must not already exist; otherwise the application will prompt the user to enter the folder path again.

Override an existing Batch Class:

To override an existing batch class from the exported zip file, check the ‘Use Existing’ checkbox. As soon as the checkbox is checked, the UNC Folder textbox turns into a dropdown listing all existing UNC folders. Select the UNC folder belonging to the batch class that is to be overridden.

 

  • Name, Description, Priority textboxes:

The Name textbox contains the batch class type or workflow name (RecostarMailRoom, TesseractMailRoom etc.) of the exported batch class.

It must always be unique and must not contain spaces or hyphens.

In the Description textbox, the user can enter a description of the batch class.

Note: The Priority textbox accepts integers only and should contain a value from 1 to 100, where:

1-25 = Urgent priority

26-50 = High priority

51-75 = Medium priority

76-100 = Low priority.
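
The priority bands can be expressed as a small lookup. The sketch below is illustrative only: the class and method names are hypothetical and are not part of the Ephesoft code base, and it assumes the bands do not overlap, so a value of 75 is treated as Medium.

```java
// Illustrative sketch; PriorityBands and bandFor are hypothetical names.
final class PriorityBands {

    /** Maps a batch class priority (1-100) to the name of its band. */
    static String bandFor(int priority) {
        if (priority < 1 || priority > 100) {
            throw new IllegalArgumentException("Priority must be between 1 and 100: " + priority);
        }
        if (priority <= 25) {
            return "Urgent";
        }
        if (priority <= 50) {
            return "High";
        }
        if (priority <= 75) {
            return "Medium"; // assumption: 75 belongs to the Medium band
        }
        return "Low";
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 25, 26, 50, 51, 75, 76, 100}) {
            System.out.println(p + " -> " + bandFor(p));
        }
    }
}
```

Values outside 1-100 are rejected here with an exception; how the application itself reacts to such input is not described in this document.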

 

  • Roles, Email Accounts and Batch Class Definition checkboxes.

Roles: If checked, roles are taken from the zipped batch class file. Otherwise, roles are left blank or, in case of an override, stay the same as in the existing batch class.
Email Accounts: If checked, email accounts are taken from the zipped batch class file. Otherwise, email accounts are left blank or, in case of an override, stay the same as in the existing batch class.
Batch Class Definition: If checked, the batch class definition is taken from the zipped batch class file. Otherwise, it stays the same as in the existing batch class (the one being overridden). It contains three checkboxes: Scripts, Folder List and Batch Class Modules.

 

  • Scripts: By default, the Scripts checkbox is selected and disabled. It contains the list of all scripts of the imported batch class.

It is enabled when the ‘Use Existing’ checkbox is checked.

ImportBatchClass scripts.jpg

 

  • Folder List: This contains the list of all folders included in the batch class being imported. All mandatory folders are selected and disabled. For non-mandatory folders, the user can check or uncheck the option to include or discard the corresponding folder.

ImportBatchClass FolderList.jpg

 

  • Batch Class Modules: By default, the Batch Class Modules checkbox is selected and disabled. It contains the list of all modules of the imported batch class.

ImportBatchClass Modules.jpg

 

  • Finally, clicking the ‘Save’ button creates a new batch class with a new Batch Class Identifier.

FAQ

  • Batches run in an imported batch class are not visible on the Batch Instance Management and RV screens.

Answer: This can happen when no roles are assigned, or when the ‘Roles’ checkbox was left unchecked while importing the batch class. Assigning roles to the batch class solves the issue.

Steps: Go to the Batch Class Management screen, edit the corresponding batch class, and assign roles to it.

 

  • Batches run in an imported batch class go into error in the Page Processing module.

Answer: There are two possible reasons for this:

  • The exported zipped batch class file doesn’t contain the Lucene and image classification samples.
  • After the import, sample images were put into these folders but were not learnt.

To solve the issue, either export the batch class again with the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ checkboxes checked, or put sample images in the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ folders present in ‘Ephesoft-data’ and then click ‘Learn Files’ for the corresponding batch class.

 

Batch Instance Search

Overview

This feature allows a user to search for batch instances by batch instance id or name. A batch instance is listed in the search results if the search text is contained in either its id or its name.
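
The matching rule can be sketched as a simple "contains" filter over both fields. This is only an illustration of the rule described above: the BatchInstance holder and the search method are hypothetical, the sample data is invented, and case-insensitive matching is an assumption that the document does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; not the application's actual search implementation.
final class BatchInstanceSearch {

    /** Hypothetical holder for the two searchable fields. */
    static final class BatchInstance {
        final String id;
        final String name;

        BatchInstance(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Returns every instance whose id or name contains the search text. */
    static List<BatchInstance> search(List<BatchInstance> all, String text) {
        String needle = text.toLowerCase(); // assumption: matching ignores case
        List<BatchInstance> hits = new ArrayList<>();
        for (BatchInstance instance : all) {
            boolean inId = instance.id.toLowerCase().contains(needle);
            boolean inName = instance.name.toLowerCase().contains(needle);
            if (inId || inName) {
                hits.add(instance);
            }
        }
        return hits;
    }
}
```

With instances ("BI14", "Invoices"), ("BI2", "Mail-2014") and ("BI3", "Claims"), searching for "14" returns the first two: one matches on the id and the other on the name.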

Usability

Search feature is available on the following screens:

Batch Instance Management screen

BatchInstanceManagementScreen.jpg

The above example searches for the string “14” in the batch instance id and batch instance name and returns the corresponding results.

 

Batch List screen

BatchListScreen.jpg

The above example searches for the string “14” in the batch instance id and batch instance name and returns the corresponding results.

 

Batch Instance Status

Overview

This document defines the batch instance statuses used to specify the current state of a batch instance.

Batch Instance Status list

The following statuses are used during the processing of a batch instance:

  • NEW

This status is for a newly created batch. It shows that the batch has been created but has not yet been picked up for processing. It can occur only once in the lifetime of a batch instance, before processing begins.

  • LOCKED

This status is for a batch instance on which an executing server has taken a lock so that no other server can start processing it. Locking is the first thing done to a batch instance after it has been picked up. Once the batch has been locked, the server holding the lock can continue processing it.

  • READY

This status is for a batch instance which has been either restarted or moved out of Review/validation state and is waiting for the pick-up service to start its processing.

  • ERROR

This status is for a batch whose processing has failed, i.e. some business rule was violated during processing and the batch was therefore sent to error.

  • FINISHED

This status is for a batch whose processing has been finished and the desired output has been acquired.

  • RUNNING

This status is for a batch which is currently in one of the automatic processing stages. All the stages except for Review and Validation are counted as the automatic processing stages.

  • READY_FOR_REVIEW

This status is for a batch which has reached the “Review_Document” plugin during its processing and now needs to be reviewed manually by the user. During this stage, the user can perform various operations on the batch, such as classifying, splitting, copying or deleting documents.

  • READY_FOR_VALIDATION

This status is for a batch which has reached the “Validate_Document” plugin during its processing and now needs to be validated manually by the user. During this stage, the user can perform all the operations available during the review stage and can also change the values of the extracted document level fields.

  • RESTARTED

This status is for a batch which was restarted in earlier versions of the Ephesoft application (before version 2.4). It is no longer used in the latest version and is retained only for backward compatibility.

  • DELETED

This status is for a batch which has been deleted during the course of its processing.

  • TRANSFERRED

This status applies only to a batch belonging to a batch class which implements “grid computing”. A batch is displayed in ‘Transferred’ status on the source machine when it has been received completely at the remote destination machine but its processing has not yet started there.

  • RESTART_IN_PROGRESS

This status is for a batch which has been asked to restart while the system is still working out how and from what point the batch will be restarted.

  • REMOTE

This status applies only to a batch belonging to a batch class which implements “grid computing”. A batch is displayed in ‘Remote’ status on the destination machine when it has been received completely from the source machine but its processing has not yet started there, i.e. the batch is still being prepared for processing on the destination machine.
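
For scripts or tooling that work with these statuses, the list can be modelled as an enum. The sketch below is an illustrative model only: the enum name and the two helper methods are hypothetical and are not claimed to match any class in the Ephesoft code base; only the status names come from the list above.

```java
// Illustrative model of the statuses listed above; BatchStatus and its helpers are hypothetical.
enum BatchStatus {
    NEW,
    LOCKED,
    READY,
    ERROR,
    FINISHED,
    RUNNING,
    READY_FOR_REVIEW,
    READY_FOR_VALIDATION,
    RESTARTED,
    DELETED,
    TRANSFERRED,
    RESTART_IN_PROGRESS,
    REMOTE;

    /** True for the two statuses in which the batch waits for manual work by a user. */
    boolean needsUserAction() {
        return this == READY_FOR_REVIEW || this == READY_FOR_VALIDATION;
    }

    /** True for the statuses that only occur with grid computing. */
    boolean isGridComputingOnly() {
        return this == TRANSFERRED || this == REMOTE;
    }
}
```

For example, `BatchStatus.valueOf("READY_FOR_REVIEW").needsUserAction()` is true, while `BatchStatus.RUNNING.needsUserAction()` is false.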

CMIS Import

Overview

The CMIS Import feature downloads files from a CMIS server and processes them as batches in the Ephesoft application. A cron job monitors the CMIS server by checking the specified folder for new files at the specified interval. Along with each document, its properties are downloaded in XML format. Users can write custom scripts to access these properties in the batch being executed.

A batch is created for every file downloaded from the CMIS server and is executed in the Ephesoft application.

FORMAT FOR DOWNLOADED XML (containing document properties)

<CmisImport>
  <Properties>
    <Property>
      <Name>Description</Name>
      <Value/>
    </Property>
    <Property>
      <Name>Title</Name>
      <Value>BI1E_documentDOC2.pdf</Value>
    </Property>
  </Properties>
</CmisImport>

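
A custom script could read this property XML with the standard Java DOM parser. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the XML is passed in as a string rather than read from the batch folder, and every Property element is assumed to contain exactly one Name and one Value element as in the sample above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative sketch; CmisPropertiesReader is a hypothetical helper.
final class CmisPropertiesReader {

    /** Parses the downloaded property XML into a name-to-value map, in document order. */
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("Property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("Name").item(0).getTextContent();
                // An empty <Value/> element yields an empty string.
                String value = property.getElementsByTagName("Value").item(0).getTextContent();
                properties.put(name, value);
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse CMIS properties XML", e);
        }
    }
}
```

For the sample above, the resulting map holds an empty string for "Description" and "BI1E_documentDOC2.pdf" for "Title".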
The CMIS Import feature downloads only files that have a valid file extension and whose CMIS property, configured in the Property column, has the value given in the Value column. After downloading a file from the CMIS server, the application updates that property to the value configured in the New Value column.

For example, suppose the CMIS server contains 15 documents, of which 10 have a valid extension according to the configured file extensions. If the property is configured as “cm:author” and the value as “Ephesoft”, then of those 10 documents only the ones whose “cm:author” property equals “Ephesoft” are downloaded, and the “cm:author” property of each downloaded document is updated to the configured New Value.
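
The selection rule can be sketched with an in-memory model. This is not a CMIS client: RemoteDoc, select and the sample values are hypothetical and only illustrate the filter (valid extension, matching property value) and the update to the new value that prevents a second download.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative in-memory sketch of the selection rule; no CMIS server is contacted.
final class CmisSelection {

    /** Hypothetical stand-in for a document on the CMIS server. */
    static final class RemoteDoc {
        final String fileName;
        final Map<String, String> properties; // must be mutable

        RemoteDoc(String fileName, Map<String, String> properties) {
            this.fileName = fileName;
            this.properties = properties;
        }
    }

    static boolean hasValidExtension(String fileName, List<String> extensions) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : extensions) {
            if (lower.endsWith("." + extension.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Selects matching documents and marks each one with the new property value. */
    static List<RemoteDoc> select(List<RemoteDoc> docs, List<String> extensions,
                                  String property, String value, String newValue) {
        List<RemoteDoc> selected = new ArrayList<>();
        for (RemoteDoc doc : docs) {
            if (hasValidExtension(doc.fileName, extensions)
                    && value.equals(doc.properties.get(property))) {
                selected.add(doc);
                // Updating the property keeps the same document from being selected again.
                doc.properties.put(property, newValue);
            }
        }
        return selected;
    }
}
```

Running select twice with the same arguments returns the matching documents the first time and an empty list the second time, because the property no longer holds the configured value (assuming the new value differs from it).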

 

Configuration

User can specify the CMIS server configuration in the batch class.

CMIS Import.jpg

 

Configurable property | Type of value | Value options | Description
Server URL | String | NA | URL for connecting to the CMIS server, e.g. http://localhost:8090/alfresco/service/cmis
Username | String | NA | User name for authentication to the specified CMIS server.
Password | String | NA | Password for authentication to the specified CMIS server.
Repository ID | String | NA | CMIS server repository ID.
File Extension | String | Read only | Supported file extensions which will get downloaded. In version 3.0, the application supports only PDF and TIFF files.
Folder | String | NA | Name of the folder on the CMIS server from which files are downloaded.
Property | String | NA | The CMIS property used to select files for download from the CMIS server. Valid documents that have this property set to the value specified below are marked for selection, e.g. cmis:name, cm:description, cm:title, cm:author
Value | String | NA | The value for the property mentioned above. This key-value pair decides which documents get downloaded.
New Value | String | NA | The new value written to the specified CMIS property after the file is downloaded from the CMIS server. This ensures that the same document is not downloaded again.

 

Cron job expression

For the cron job scheduler, update the following property in the file {Application}\WEB-INF\classes\META-INF\dcma-cmis-import\cmis-import.properties:

cmisImport.cronxpression=0 0/15 * ? * *

By default, this property is set to run every 15 minutes.

 

Disabling/Enabling CMIS import functionality

To enable or disable the CMIS import functionality, uncomment or comment the following line in {Application}\applicationContext.xml:

<import resource="classpath:/META-INF/applicationContext-dcma-cmis-import.xml" />

Default: CMIS import is disabled.

 

Screenshots for Configuration

Screenshot for CMIS folder:

CMIS Folder.jpg
Screenshot for CMIS document:

CMIS Document.jpg
Screenshot for CMIS properties:

CMIS Properties.jpg

Screenshot for CMIS repository information:

CMIS Repository.jpg

URL for fetching repository information in alfresco:

http://{server}:{port}/alfresco/service/cmis/index.html

Troubleshooting

S no. | Error message | Possible root cause
1 | Unable to connect to the server | Invalid configuration used for connecting to the CMIS server.
2 | Error while generating cmis properties xml | Either {Ephesoft Application} does not have permission to write the properties to disk, or the network path could not be reached while writing the file.

Document Info Display

Overview

The document tree information on the Review-Validation screen is now configurable through a new tag introduced in the batch XML. This tag is named “Document Display Info”.

Whatever value is set in this tag is displayed in the document tree. The value can be the confidence score, confidence threshold, document name, document description or any customer specific data.

DocumnetDisplayInfoTag.jpg

DocumentDisplayInfoScreen.jpg

If no value is provided for ‘<DocumentDisplayInfo>’, then the default value is shown, which is currently the ‘Document Type Name’, i.e. “Unknown” in the above example.

 

Advantages

  • Users can customize the display information by manipulating the batch xml using custom scripts.
  • Customer specific information can also be displayed.

Dynamic Workflow

Overview

This feature allows the user to create a customized workflow dynamically, i.e. the user can add, remove and reorder any module or plugin in the workflow. After altering the workflow, the user can deploy the changes only once the workflow has been validated, which requires the dependencies of every individual plugin to be fulfilled.

Configuration

Configure Modules

To configure the modules of a particular batch class, follow these steps:

  • On the “Batch Class Management” screen, choose the batch class whose workflow is to be changed and open its edit options.

BatchClassManagement ModuleTab.jpg

  • Under the “Modules” tab in the Edit view of that batch class, there is a button “Configure” on the top right corner. Clicking on that button takes the user to a screen where they can add/delete/re-order any module.

ConfigureModules.jpg

  • In this view, the user can see the following:
    • “Selected Modules”: the list of selected modules for the workflow.
    • “Available Modules”: the list of available modules.
    • “Add New Module” button: to add a new module to the available modules list.
    • “Remove” button: to remove any of the selected modules from the selected modules list.
    • “Add” button: to add any selected module from the available modules list to the selected modules list. By default the added module is placed at the bottom of the selected modules list.
    • “Up” button: to move any selected module up in the selected modules list. The user can select multiple modules at once, and each module is moved one place up each time the button is clicked.
    • “Down” button: to move any selected module down in the selected modules list. The user can select multiple modules at once, and each module is moved one place down each time the button is clicked.
    • “Ok” button: to apply changes locally.
    • “Reset” button: to reset the state of the selected modules list.
    • “Cancel” button: to cancel the module configuring action and move to the previous screen.
  • Any newly added module would initially be empty.
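
The behaviour of the “Up” and “Down” buttons with several modules selected can be sketched as follows. This is an illustration rather than the screen's actual code: the class and method names are hypothetical, and the handling of a selection that already sits at the top or bottom of the list (it stays in place) is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of moving a multi-selection by one place per click.
final class ModuleReorder {

    /** Moves each selected index one place up; returns the new selected indices. */
    static <T> List<Integer> moveUp(List<T> items, List<Integer> selectedAscending) {
        List<Integer> moved = new ArrayList<>();
        for (int index : selectedAscending) {
            // Assumption: an item at the top, or directly below a blocked item, stays put.
            boolean blocked = index == 0 || moved.contains(index - 1);
            if (!blocked) {
                Collections.swap(items, index, index - 1);
                index--;
            }
            moved.add(index);
        }
        return moved;
    }

    /** Moves each selected index one place down; returns the new selected indices. */
    static <T> List<Integer> moveDown(List<T> items, List<Integer> selectedAscending) {
        List<Integer> moved = new ArrayList<>();
        for (int i = selectedAscending.size() - 1; i >= 0; i--) {
            int index = selectedAscending.get(i);
            boolean blocked = index == items.size() - 1 || moved.contains(index + 1);
            if (!blocked) {
                Collections.swap(items, index, index + 1);
                index++;
            }
            moved.add(0, index); // keep the result in ascending order
        }
        return moved;
    }
}
```

For a list A, B, C, D with B and D selected, one click on “Up” gives B, A, D, C, and the selection follows the moved items.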

Configure plugins

Like the module configuration functionality, there is a plugin configuration functionality. It allows the user to add, delete and reorder the plugins in a particular module of a batch class.

To configure plugins for a particular module of a batch class, follow these steps:

  • Select any particular module of a batch class.

PluginListingTab.jpg

  • Under the “Plugins Listing” tab in the Edit view of that Module, there is a button “Configure” on the top right corner. Clicking on that button takes the user to a screen where they can add/delete/re-order any plugin.

ConfigurePlugins.jpg

 

  • The functionality of the above view is similar to the “Configure Modules” view: the “Add”, “Remove”, “Up”, “Down”, “Ok”, “Reset” and “Cancel” buttons behave the same way as in the “Configure Modules” view.
  • Apart from the common functionality, the “Configure Plugins” view provides the following additional functionality:
  • Dependency highlight
    • Whenever a plug-in is selected in the available list (CMIS EXPORT in the screenshot), all its dependencies are highlighted in the same list (CREATE MULTIPAGE FILES in the screenshot).

ConfigurePluginsDependency.jpg

  • Warning on plugin addition
    • While adding the plug-in to the selected plugins list using the add button, if all the dependencies of the plugin are NOT already present in selected plugins list, following pop up will be displayed.

WarningPluginAddition.jpg

    • In the above pop-up:
      • Yes: pressing this button will add all the dependencies of the plugin along with it to the selected plugins list.
      • No: pressing this button will just add the selected plugin to the selected plugins list ignoring the dependencies.
      • Cancel: pressing this button will cancel the operation.

Validate and Deploy workflow

Validate

This button is located at the bottom center of the batch class edit view. Pressing it checks all the rules that apply to the selected plug-ins. If no violations are found, a pop-up says “Dependencies Validated Successfully” and the “Deploy Workflow” button is enabled.

ValidateDependency.jpg

If any dependency among the plugins is violated, the first violation is reported in the pop-up and the “Deploy Workflow” button remains disabled.

DependencyViolated.jpg
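
The validation step can be pictured as a single pass over the ordered plugin list that stops at the first unmet dependency. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and it assumes that a dependency is fulfilled only when the required plugin appears earlier in the workflow, which the document does not state explicitly.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch; not the application's actual validation rules.
final class WorkflowValidator {

    /**
     * Returns a description of the first violated dependency, or an empty Optional
     * when every plugin finds all of its dependencies earlier in the list.
     */
    static Optional<String> firstViolation(List<String> orderedPlugins,
                                           Map<String, List<String>> dependencies) {
        Set<String> seen = new HashSet<>();
        for (String plugin : orderedPlugins) {
            List<String> required =
                    dependencies.getOrDefault(plugin, Collections.<String>emptyList());
            for (String dependency : required) {
                if (!seen.contains(dependency)) {
                    return Optional.of(plugin + " requires " + dependency
                            + " earlier in the workflow");
                }
            }
            seen.add(plugin);
        }
        return Optional.empty();
    }
}
```

In this model, the “Deploy Workflow” button would be enabled only when firstViolation returns an empty result. For example, with CMIS EXPORT depending on CREATE MULTIPAGE FILES, a workflow that lists CMIS EXPORT alone reports a violation, while one that lists CREATE MULTIPAGE FILES first does not.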

Deploy Workflow

This button is located at the bottom center of the batch class edit view. It is initially disabled and is enabled once the Validate button has confirmed that the complete workflow is valid. On successful deployment of the batch class, the pop-up below is shown.

DeployWorkflow.jpg

Notes

  • The user needs to deploy the batch class each time they change the workflow by configuring modules or plugins.
  • The Validate and Deploy Workflow buttons are disabled while the user is on either the configure modules or the configure plugins view.
  • The user can only deploy a validated batch class.
  • Saving a batch class using the “Save” or “Apply” button does not deploy the batch class. Deploying a batch class using the “Deploy Workflow” button first saves the batch class and then deploys it.

E-mail Import

Overview

This plug-in is responsible for importing documents in a defined form from the user’s mail account. The user can configure any mail account as well as the types of documents which the plug-in will support. This configuration is done per batch class. Multiple email accounts can be set up for each batch class.

Configuration

Mail configuration

EmailConfiguration.jpg

The following mail account properties are configurable:

Configurable property | Type of value | Value options | Description
Username | String | A valid email account username. | The user account name to be configured with Ephesoft, which the Email Import service will watch.
Password | String | Corresponding password for the configured username | Password for the configured user account.
Server name | String | A valid mail server name | The name of the mail server to which the configured user account belongs.
Server type | String | A valid mail server type | The type of the mail server to which the configured user account belongs.
Folder Name | String | A valid and existing mail folder name | The name of the mail folder that the Ephesoft Email Import will check.
Is SSL | Check Box | Checked / Unchecked | Defines whether the application connects to the mail server using SSL or non-SSL settings.
Port number | Integer | A valid port number | The port number on which the configured mail server type will work.

Configurable Properties file

  • <Ephesoft installation directory>\Application\WEB-INF\classes\META-INF\dcma-mail-import\mail-import.properties:

 

Configurable property | Type of value | Value options | Description
dcma.importMail.cronExpression | String | A valid cron expression | The cron expression defining the look-up time for the plug-in, i.e. at what times the plug-in looks for updates in the configured mail account.
dcma.supported.attachment.extension | String | List of valid file extensions | Defines the document types supported by the plug-in. Multiple entries are separated by a “;”.

  • <Ephesoft installation directory>\Application\WEB-INF\classes\META-INF\dcma-mail-import\open-office.properties:

 

Configurable property | Type of value | Value options | Description
openoffice.serverUrl | List of values | ON / OFF | Server used for connecting to the remote open office server instance. Used when connecting to an external/remote service.
openoffice.serverPort | Integer | A valid and available port number. | Port number used for connecting to the open office server instance. Default port is 8100.
openoffice.autoStart | Boolean | True / False | Whether the open office server should be started/connected when XE starts. Default value is false.
openoffice.homePath | String | NA | Path to the open office installation. If no path is provided, a default value is calculated based on the operating environment.
openoffice.maxTasksPerProcess | Integer | Any valid integer value. | Maximum number of simultaneous conversion tasks to be handled by a single open office process. Default value for optimized performance is 50.
openoffice.taskExecutionTimeout | Integer | | Timeout for conversion tasks (in milliseconds). Default value for optimized performance is 30 seconds.
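
The “;”-separated format of dcma.supported.attachment.extension can be illustrated with a short parser. The sketch below is hypothetical: the class, its methods and the sample value "pdf;tif;tiff" are not taken from the product, and case-insensitive matching of extensions is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch; AttachmentFilter is a hypothetical helper.
final class AttachmentFilter {

    /** Splits a ";"-separated property value into a set of lower-case extensions. */
    static Set<String> parseExtensions(String propertyValue) {
        Set<String> extensions = new LinkedHashSet<>();
        for (String part : propertyValue.split(";")) {
            String extension = part.trim().toLowerCase(Locale.ROOT);
            if (!extension.isEmpty()) {
                extensions.add(extension);
            }
        }
        return extensions;
    }

    /** Checks whether the file name ends in one of the supported extensions. */
    static boolean isSupported(String fileName, Set<String> extensions) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(extension);
    }
}
```

With the hypothetical value "pdf;tif;tiff", an attachment named scan.TIF would be accepted and notes.txt would be skipped.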

Characteristics

  • The functionality/service allows the user to set up any number of mail accounts for gathering data.
  • The user is allowed to configure the account via UI.
  • The functionality/service can support multiple document formats.
  • The functionality/service makes use of the open-office to convert the received data files into application usable formats.
  • The functionality/service is capable of downloading and saving the attachments of a mail.

Steps of execution/working

  • When the plug-in properties have been set up properly, Ephesoft moves ahead with mail downloading by accessing the mail account.
  • Email import service reads the user’s mail configuration from the database, and tries to access the user’s mail account using the configured settings.
  • If the service is able to connect to the user account, it reads all the mails contained in the configured folder.
  • After the service has read the mails, it starts processing multiple mails at a time.
  • Each mail that is read goes through a three-step procedure: downloading, converting, and creating a batch for the mail.
  • If an error occurs while processing a mail, the service sends a notification mail to the mail accounts configured for notification.

Troubleshooting

The following are a few common error messages caused by malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Unable to convert Email into PDF file. | Open office service is either not running or has not been configured correctly.
2 | Error in Server Type Configuration, only imap/pop3 is allowed. | Plug-in only supports the imap or pop3 server type. Check the user’s account configuration.
3 | Not able to establish connection. | Connection could not be established with the current user’s account configuration.
4 | Could not find port number. Trying with default value of 995. | Port number specified in the user’s configuration is invalid, hence the plug-in tries to connect on the default pop3 port.
5 | Could not find port number. Trying with default value of 993. | Port number specified in the user’s configuration is invalid, hence the plug-in tries to connect on the default imap port.
6 | Error while reading mail contents | Either the email body or other attachments could not be read and converted.
7 | Not able to process the mail reading. | An error occurred while reading the contents of the mail. Open-office could not convert the source file into the desired format.
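
Error messages 2, 4 and 5 together describe a fallback rule for the port number. The sketch below shows one way such a rule could look; it is not the plug-in's code. The class and method names are hypothetical, and treating the server type case-insensitively and rejecting ports outside 1-65535 are assumptions.

```java
// Illustrative sketch of the port fallback described by messages 2, 4 and 5.
final class MailPortResolver {

    static final int DEFAULT_POP3_PORT = 995;
    static final int DEFAULT_IMAP_PORT = 993;

    /** Returns the configured port, or the default for the server type if it is unusable. */
    static int resolvePort(String serverType, String configuredPort) {
        int fallback;
        if ("pop3".equalsIgnoreCase(serverType)) {
            fallback = DEFAULT_POP3_PORT;
        } else if ("imap".equalsIgnoreCase(serverType)) {
            fallback = DEFAULT_IMAP_PORT;
        } else {
            throw new IllegalArgumentException(
                    "Error in Server Type Configuration, only imap/pop3 is allowed.");
        }
        if (configuredPort == null) {
            return fallback;
        }
        try {
            int port = Integer.parseInt(configuredPort.trim());
            return (port >= 1 && port <= 65535) ? port : fallback;
        } catch (NumberFormatException e) {
            return fallback; // not a number: fall back to the default port
        }
    }
}
```

For instance, resolvePort("pop3", "abc") falls back to 995, while resolvePort("imap", "143") keeps the configured value 143.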

Multi-Server Deployment

Overview

This feature allows the user to set up a multi-server environment for Ephesoft. In a multi-server environment, two or more servers run at the same time and share the same database and shared folders.

This increases throughput, because batches are processed on multiple servers.

Steps to setup Multi-Server environment for Ephesoft

Install Ephesoft on all the machines using the installer, following the steps below:

At the time of installing Ephesoft on the first machine

  • Enter the information as shown in the screenshots below.
  • On the database configuration screen, enter the information in the format shown below:

The same information entered here must be used while installing Ephesoft on the other machines.

MSSQL Configuration.jpg

 

  • Select “No” option on shared folder configuration screen.

SharedFolderConfiguration.jpg

 

  • The shared folders path must be shared over the network so that the other machines can access it.

For example: [\\server_name\path_to_shared_folders]

DestinationFolder.jpg

 

  • Now complete the installation by following the standard steps.

Note: The installation path should not contain white spaces.

At the time of installing Ephesoft on the other machines

  • User needs to enter the information as described in screenshots below:
  • The database information must be the same as for the first installation.

MSSQL Configuration Multiserver.jpg

 

  • Select “Yes” option on shared folder configuration screen.

SharedFolderConfiguration Multiserver.jpg

 

  • On the destination folder configuration screen, enter the same shared folder path that was entered while installing on the first machine.

DestinationFolder Multiserver.jpg

 

  • Now complete the installation by following the standard steps.

Other Configurations

  • The Folder Monitor Service should be enabled on only one of the Ephesoft servers. To achieve this, go to <Ephesoft Installation Directory>\Application on each server where the folder monitor service should not run and, in the applicationContext.xml file, comment out the following line:

<import resource="classpath:/META-INF/applicationContext-folder-monitor.xml" />

  • The following cron jobs must have different cron expression values on different machines. They are configured in

{Application}\WEB-INF\classes\META-INF\dcma-workflows\dcma-workflows.properties

dcma.pickup.cronjob.expression=15 0/1 * ? * *

dcma.resume.cronjob.expression=15 0/1 * ? * *

e.g.

dcma.pickup.cronjob.expression1=15 0/1 * ? * *

dcma.resume.cronjob.expression1=15 0/1 * ? * *

dcma.pickup.cronjob.expression2=45 0/1 * ? * *

dcma.resume.cronjob.expression2=45 0/1 * ? * *

 

  • All the machines running in a multi-server environment should be verified by a single Ephesoft license installed on one Ephesoft server. To do so, follow these steps:
  • The license server should be commented out only on the machines where the license server is not running. This is done in applicationContext.xml:

<import resource="classpath:/META-INF/applicationContext-license-server.xml" />

  • The license server host configuration should be changed on the machines where the license server is not running, i.e. they should all refer to the machine on which the Ephesoft license is installed and the license server is running. This is done by changing the ephesoft.license.server.host property in the license-client.properties file to the IP address of the machine on which the license server is running.

Sample properties file:

Location: META-INF\ephesoft-license-client\license-client.properties

How to Change Ephesoft’s Port Number

Sometimes it is necessary to change the port that Ephesoft operates on, for example to prevent conflicts with other programs that use port 8080.

Shut down Ephesoft if it is currently running.

  • Navigate to the web.xml file found at <Ephesoft Installation Directory >\Application\WEB-INF\web.xml

Change this value from:

<context-param>

<param-name>port</param-name>

<param-value>8080</param-value>

</context-param>

To this (or the desired port number):

<context-param>

<param-name>port</param-name>

<param-value>8090</param-value>

</context-param>

  • Navigate to dcma-batch.properties

Found at <Ephesoft Installation Directory >\Application\WEB-INF\classes\META-INF\dcma-batch\dcma-batch.properties
Then proceed to change this from:

batch.base_http_url=http://localhost:8080/dcma-batches

to this:

batch.base_http_url=http://localhost:8090/dcma-batches

  • Navigate to server.xml file

Found at <Ephesoft Installation Directory >\Ephesoft\JavaAppServer\conf\server.xml

Change the port value of the HTTP Connector below to match the port number set in the previous files (8090 or the desired port number):

<Connector port="8080" protocol="HTTP/1.1"

connectionTimeout="20000"

redirectPort="8443" />
Note: The easiest way to do this is to do a find/replace for “8080” and replace all cases of 8080 with 8090 (or the desired port number).

Change this:

<Server port="8005" shutdown="SHUTDOWN">

To this:

<Server port="8006" shutdown="SHUTDOWN">

Also change this:

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
To this (or the desired port):

<Connector port="8019" protocol="AJP/1.3" redirectPort="8443" />

Finally, restart Ephesoft.
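
The batch.base_http_url change in the steps above can also be sketched in code. The following example uses only java.net.URI from the JDK to replace the port of a URL while leaving the rest of it untouched. The class and method names are illustrative assumptions and not part of Ephesoft.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PortRewriter {
    // Returns the same URL with its port replaced by newPort.
    static String withPort(String url, int newPort) {
        URI u = URI.create(url);
        try {
            return new URI(u.getScheme(), u.getUserInfo(), u.getHost(), newPort,
                    u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Value of batch.base_http_url before the change.
        System.out.println(withPort("http://localhost:8080/dcma-batches", 8090));
    }
}
```

Parsing the URL rather than doing a plain text replacement avoids touching an "8080" that happens to appear elsewhere in the value.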

Ephesoft Web Service

Overview

This document gives a detailed explanation of the web services exposed by the Ephesoft application.

Authenticated client calls code sample

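The following code makes authenticated client calls to the Ephesoft Web Services:

// Requires Apache Commons HttpClient 3.x (org.apache.commons.httpclient.*
// and org.apache.commons.httpclient.auth.AuthScope).
HttpClient client = new HttpClient();
Credentials defaultcreds = new UsernamePasswordCredentials("username", "password");
client.getState().setCredentials(new AuthScope("serverName", 8080), defaultcreds);
client.getParams().setAuthenticationPreemptive(true);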
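
For clients that do not use Commons HttpClient, a comparable request header can be sketched with the JDK alone. The example below assumes the server accepts HTTP Basic authentication, which is the scheme Commons HttpClient 3.x sends preemptively for username/password credentials. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {
    // Builds the value of the HTTP "Authorization" header for Basic authentication.
    static String build(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // With java.net.HttpURLConnection the header would be set through
        // connection.setRequestProperty("Authorization", build(user, pass)).
        System.out.println(build("username", "password"));
    }
}
```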

List of APIs exposed in the Ephesoft Product

Image Processing Web Service

createSearchablePDF

This API generates a searchable PDF. It takes tif/tiff files and an RSP file as input for processing. The input parameters specify whether the output PDF is searchable and whether it is in color.

Web Service URL : http://{serverName}:{port}/dcma/rest/createSearchablePDF

Input Parameter | Values | Description
isColorImage | Either “true” or “false” | Generates a color PDF if the input image is in color and the value is “true”.
isSearchableImage | Either “true” or “false” | Generates a searchable PDF if the value is “true”.
outputPDFFileName | String value ending with the .pdf extension | Name of the output PDF file generated by the API.
projectFile | String value ending with the .rsp extension | RSP file used for RecoStar processing.

Checklist:

  1. Input only tiff, tif files for generating searchable pdf.
  2. RSP file is mandatory for generating the searchable pdf.

Sample Input Used:

ephesoft-web-services\create-searchable-pdf.zip
Sample client code using apache commons http client:-

// Requires Apache Commons HttpClient 3.x (org.apache.commons.httpclient.*,
// ...methods.PostMethod, ...methods.multipart.*) and java.io.*.
private static void createSearchablePDF() {
    HttpClient client = new HttpClient();
    // URL for webservice of create searchable pdf
    String url = "http://localhost:8080/dcma/rest/createSearchablePDF";
    PostMethod mPost = new PostMethod(url);
    // Adding tif images for processing
    File file1 = new File("C:\\sample\\sample1.tif");
    File file2 = new File("C:\\sample\\sample2.tif");
    File file3 = new File("C:\\sample\\sample3.tif");
    File file4 = new File("C:\\sample\\sample4.tif");
    // Adding rsp file for recostar processing
    File file5 = new File("C:\\sample\\Fpr.rsp");
    Part[] parts = new Part[9];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);
        parts[3] = new FilePart(file4.getName(), file4);
        parts[4] = new FilePart(file5.getName(), file5);
        // adding parameter for color switch
        parts[5] = new StringPart("isColorImage", "false");
        // adding parameter for searchable switch
        parts[6] = new StringPart("isSearchableImage", "true");
        // adding parameter for outputPDFFileName
        parts[7] = new StringPart("outputPDFFileName", "OutputPDF.pdf");
        // adding parameter for projectFile
        parts[8] = new StringPart("projectFile", "Fpr.rsp");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            InputStream inputStream = mPost.getResponseBodyAsStream();
            // output file path for saving result
            String outputFilePath = "C:\\sample\\serverOutput.zip";
            // retrieving the searchable pdf file
            File file = new File(outputFilePath);
            FileOutputStream fileOutputStream = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = inputStream.read(buf);
                while (len > 0) {
                    fileOutputStream.write(buf, 0, len);
                    len = inputStream.read(buf);
                }
            } finally {
                if (fileOutputStream != null) {
                    fileOutputStream.close();
                }
            }
            System.out.println("Web service executed successfully.");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
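
The loop that saves the response body in the sample above can also be written with try-with-resources, which closes the output stream automatically. The following is a minimal sketch using only the JDK (Java 7 or later); the class and method names are illustrative and not part of the Ephesoft API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ResponseSaver {
    // Copies the whole response stream to the target file and returns the byte count.
    static long save(InputStream in, Path target) throws IOException {
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for mPost.getResponseBodyAsStream().
        InputStream in = new ByteArrayInputStream(new byte[3000]);
        Path target = Files.createTempFile("serverOutput", ".zip");
        System.out.println(save(in, target) + " bytes written to " + target);
    }
}
```

The same helper would apply to every sample in this document that writes the response to serverOutput.zip.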

convertTiffToPdf

This API generates a PDF for each input tiff. If 5 input tiffs are provided, 5 PDFs are returned. The API has the following configuration parameters.
Web Service URL : http://{serverName}:{port}/dcma/rest/convertTiffToPdf

Input Parameter | Values | Description
inputParams | This value can be empty. For the available ImageMagick parameters, see http://www.imagemagick.org/script/command-line-options.php | ImageMagick input parameters used for processing the input and output files.
outputParams | This value can be empty. For the available ImageMagick parameters, see http://www.imagemagick.org/script/command-line-options.php | ImageMagick output parameters used for optimizing the output file.
pdfGeneratorEngine | Either “IMAGE_MAGICK” or “ITEXT” | Selects the PDF generator engine.

Checklist:

  1. Provide only tif/tiff files as input for generating PDFs.
  2. inputParams and outputParams take effect only if pdfGeneratorEngine is “IMAGE_MAGICK”.
  3. If the input tiff is a multipage tiff, then a single multipage PDF is generated as output.

Sample Input Used:

ephesoft-web-services\convert-tiff-to-pdf.zip

Sample client code using apache commons http client:-

private static void convertTiffToPdf() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/convertTiffToPdf";
    PostMethod mPost = new PostMethod(url);
    // adding image file for processing.
    File file1 = new File("C:\\sample\\sample1.tif");
    File file2 = new File("C:\\sample\\sample2.tif");
    Part[] parts = new Part[5];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        // adding parameter for input params
        parts[2] = new StringPart("inputParams", "");
        // adding parameter for output params
        parts[3] = new StringPart("outputParams", "");
        // adding parameter for pdfGeneratorEngine
        parts[4] = new StringPart("pdfGeneratorEngine", "IMAGE_MAGICK");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            InputStream in = mPost.getResponseBodyAsStream();
            // output file path for saving results.
            String outputFilePath = "C:\\sample\\serverOutput.zip";
            // retrieving the generated pdf files
            File f = new File(outputFilePath);
            FileOutputStream fos = new FileOutputStream(f);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

splitMultipageFile

This API breaks a PDF or multipage tiff into single-page tiffs. It uses ImageMagick and Ghostscript for splitting the input file. The API has the following configuration parameters.
Web Service URL : http://{serverName}:{port}/dcma/rest/splitMultipageFile

Input Parameter | Values | Description
inputParams | For ImageMagick: this value can be empty. For the available ImageMagick parameters, see http://www.imagemagick.org/script/command-line-options.php. For Ghostscript: this value should not be empty. For the available Ghostscript input parameters, see http://ghostscript.com/doc/8.54/Use.htm#Output_device | This parameter is used for both ImageMagick and Ghostscript.
outputParams | For ImageMagick: this value can be empty. For the available ImageMagick parameters, see http://www.imagemagick.org/script/command-line-options.php | ImageMagick output parameters used for optimizing the output file.
isGhostscript | Either “true” or “false” | Specifies whether Ghostscript is used for breaking the PDF/multipage tiff into single-page tiffs.

Checklist:

  1. Provide only tiff and PDF files as input.
  2. If “isGhostscript” is “true”, then only inputParams takes effect and only PDF files are split.
  3. If “isGhostscript” is “false”, then both inputParams and outputParams take effect.

Sample Input Used:

ephesoft-web-services\split-multipage-file.zip

Sample client code using apache commons http client:-

private static void splitMultiPageFile() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/splitMultipageFile";
    PostMethod mPost = new PostMethod(url);
    File file1 = new File("C:\\sample\\sample.pdf");
    File file2 = new File("C:\\sample\\sample.tif");
    Part[] parts = new Part[5];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new StringPart("inputParams", "gswin32c.exe -dNOPAUSE -r300 -sDEVICE=tiff12nc -dBATCH");
        parts[3] = new StringPart("isGhostscript", "true");
        parts[4] = new StringPart("outputParams", "");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            InputStream in = mPost.getResponseBodyAsStream();
            File file = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
            System.out.println("Web service executed successfully..");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

createMultipageFile

This API creates a multipage tif/pdf using “Image MagicK”, “IText” or “GhostScript”. It works only with tif/tiff files, together with an XML file that supplies the input parameters. The API has the following configuration parameters.
Web Service URL :

http://{serverName}:{port}/dcma/rest/createMultiPageFile

Input Parameter | Values | Description
imageProcessingAPI | Either “IMAGE_MAGICK”, “GHOSTSCRIPT” or “ITEXT” | Selects whether the PDF is generated using ImageMagick, iText or Ghostscript.
pdfOptimizationParams | This value should not be empty. For the available Ghostscript parameters, see http://ghostscript.com/doc/8.54/Use.htm#Output_device | Ghostscript output parameters used for optimizing the output file.
multipageTifSwitch | Either “ON” or “OFF” | Generates a multipage tif along with the multipage PDF.
pdfOptimizationSwitch | Either “ON” or “OFF” | Generates an optimized PDF.
ghostscriptPdfParameters | This value should not be empty. For the available Ghostscript parameters, see http://ghostscript.com/doc/8.54/Use.htm#Output_device | Ghostscript parameters used for creating the multipage PDF.

Checklist:

  1. Provide only tiff files for processing, plus an XML file for the input parameters.
  2. ghostscriptPdfParameters takes effect only if “imageProcessingAPI” is “GHOSTSCRIPT”.
  3. pdfOptimizationParams takes effect only if “pdfOptimizationSwitch” is “ON”.

Sample Input Used:

ephesoft-web-services\create-multipage-file.zip

Format for XML:
<WebServiceParams>

<Params>

<Param>

<Name>imageProcessingAPI</Name>

<Value>GHOSTSCRIPT</Value>

</Param>
<Param>

<Name>pdfOptimizationSwitch</Name>

<Value>on</Value>

</Param>

<Param>

<Name>pdfOptimizationParams</Name>

<Value>-q -dNODISPLAY -P- -dSAFER -dDELAYSAFER -- pdfopt.ps</Value>

</Param>
<Param>

<Name>multipageTifSwitch</Name>

<Value>on</Value>

</Param>

<Param>

<Name>ghostscriptPdfParameters</Name>

<Value>-dQUIET -dNOPAUSE -r300 -sDEVICE=pdfwrite -dBATCH</Value>

</Param>

</Params>

</WebServiceParams>
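
The parameter file above can also be generated programmatically. The following sketch builds the same WebServiceParams structure from a map of names and values. The class and method names are hypothetical; only the element names come from the format shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WebServiceParamsXml {
    // Escapes the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders the name/value pairs as a WebServiceParams document.
    static String build(Map<String, String> params) {
        StringBuilder xml = new StringBuilder("<WebServiceParams>\n<Params>\n");
        for (Map.Entry<String, String> p : params.entrySet()) {
            xml.append("<Param>\n")
               .append("<Name>").append(escape(p.getKey())).append("</Name>\n")
               .append("<Value>").append(escape(p.getValue())).append("</Value>\n")
               .append("</Param>\n");
        }
        return xml.append("</Params>\n</WebServiceParams>\n").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("imageProcessingAPI", "GHOSTSCRIPT");
        params.put("ghostscriptPdfParameters", "-dQUIET -dNOPAUSE -r300 -sDEVICE=pdfwrite -dBATCH");
        System.out.print(build(params));
    }
}
```

The resulting string would be written to a file such as WebServiceParams.xml and sent as a file part, as in the sample client code.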
Sample client code using apache commons http client:-

private static void createMultiPage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/createMultiPageFile";
    PostMethod mPost = new PostMethod(url);
    // Adding XML file for parameters
    File file1 = new File("C:\\sample\\WebServiceParams.xml");
    // Adding tif file for processing
    File file2 = new File("C:\\sample\\sample1.tif");
    File file3 = new File("C:\\sample\\sample2.tif");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            InputStream inputStream = mPost.getResponseBodyAsStream();
            // Retrieving file from result
            File file = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = inputStream.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = inputStream.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
            System.out.println("Web service executed successfully..");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(statusCode + " *** " + mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Classification Web Service

classifyImage

This API classifies the input image as per the batch class identifier provided. It depends on three plugins: “CREATE_THUMBNAILS_PLUGIN”, “CLASSIFY_IMAGES_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”. If the batch class does not have these plugins, then the classify image API will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyImage

Input Parameter | Values | Description
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which the image classification is performed.

Sample Input Used:

ephesoft-web-services\classify-image.zip

Checklist:

  1. Input file should be single page tif/tiff file only.
  2. batchClassId should be valid batch class identifier and must have the “CREATE_THUMBNAILS_PLUGIN”, “CLASSIFY_IMAGES_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample client code using apache commons http client:-

private static void classifyImage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyImage";
    PostMethod mPost = new PostMethod(url);
    // Adding tif file for processing
    File file1 = new File("C:\\sample\\US-Invoice.tif");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassId
        parts[1] = new StringPart("batchClassId", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

classifyHOCR

This API classifies the input HOCR as per the batch class identifier provided. It depends on the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN” plugins and on the learning done on the batch class. If the batch class does not have these plugins, then classify HOCR will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyHOCR

Input Parameter | Values | Description
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which the HOCR classification is performed.

Checklist:

  1. Input file should be html file only.
  2. batchClassId should be valid batch class identifier and must have the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample Input Used:

ephesoft-web-services\classify-hocr.zip

Sample client code using apache commons http client:-

private static void classifyHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyHocr";
    PostMethod mPost = new PostMethod(url);
    // Adding HTML file for processing
    File file1 = new File("C:\\sample\\US-Invoice.html");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassId
        parts[1] = new StringPart("batchClassId", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

classifyMultiPageHOCR

This API classifies the input HOCR files, supplied as a zip file of HTML files, as per the batch class identifier provided. It depends on the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN” plugins and on the learning done on the batch class. If the batch class does not have these plugins, then classify HOCR will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyMultiPageHocr

 

Input Parameter | Values | Description
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which the HOCR classification is performed.

Checklist:

  1. The input file should be a zip file containing HTML files.
  2. batchClassId should be valid batch class identifier and must have the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample client code using apache commons http client:-

private static void classifyMultiPageHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyMultiPageHocr";
    PostMethod mPost = new PostMethod(url);
    // Adding ZIP file for processing
    File file1 = new File("D:\\sample\\New folder.zip");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassId
        parts[1] = new StringPart("batchClassId", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        String responseBody = mPost.getResponseBodyAsString();
        System.out.println(statusCode + "***" + responseBody);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Release the connection even if the call fails.
        mPost.releaseConnection();
    }
}

classifyBarcodeImage

This API classifies the input image as per the specified batch class. The image file should contain a barcode, and the barcode value should be a document type that is present in the batch class.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyBarcodeImage

Input Parameter | Values | Description
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which the barcode classification is performed.

Checklist:

  1. Input file should be tif/tiff file only.
  2. batchClassId should be valid batch class identifier and must have the “BARCODE_READER_PLUGIN” .

Sample client code using apache commons http client:-

private static void classifyBarcodeImage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyBarcodeImage";
    PostMethod mPost = new PostMethod(url);
    // Adding image file for processing the barcode classification
    File file1 = new File("C:\\sample\\US-Invoice.tif");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding batchClassId for which barcode classification is to be performed.
        parts[1] = new StringPart("batchClassId", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Extraction Web Service

extractKV

This API extracts the document level fields for the Key Value pattern provided in the input XML. It takes the HOCR file as input. If the Key Value pattern is not found in the HOCR file, then it creates empty document level fields.

Web Service URL : http://{serverName}:{port}/dcma/rest/extractKV

Batch Class List >>Recostar Mail Room [BC1] >>Application-Checklist >>Invoice Date >>New KV Extraction
KV Extraction.jpg

 

Input Parameter | Values | Description
AdvancedKV | Either “true” or “false” | Specifies whether the Key Value extraction is performed using advanced key value extraction.
LocationType | One of the following: TOP, RIGHT, LEFT, BOTTOM, TOP_RIGHT, TOP_LEFT, BOTTOM_LEFT, BOTTOM_RIGHT | Location, relative to the key pattern, at which the value pattern is fetched.
NoOfWords | Integer | Applies only when AdvancedKV is false. Specifies the number of words at the RIGHT location to add to the result of the value pattern found in the HOCR.
KeyPattern | This value should not be empty and should be a valid regular expression. | Used to find the key pattern in the given HOCR.
ValuePattern | This value should not be empty and should be a valid regular expression. | Used to find the value pattern in the given HOCR for that particular key pattern.
KVFetchValue | One of the following: ALL, FIRST, LAST | Specifies whether the application fetches all, the first or the last value pattern found.
Multiplier | Float between 0 and 1 | Multiplied with the confidence to update the confidence of the fields extracted using advanced KV.
Length | Integer | To get the length value, use the Ephesoft Admin Screen as displayed in the screenshot above.
Width | Integer | To get the width value, use the Ephesoft Admin Screen as displayed in the screenshot above.
Xoffset | Integer | To get the xoffset value, use the Ephesoft Admin Screen as displayed in the screenshot above.
Yoffset | Integer | To get the yoffset value, use the Ephesoft Admin Screen as displayed in the screenshot above.
hocrFileName | String | Name of the HOCR file, in XML format, passed for processing.

Check List:

  1. To use AdvancedKV, the user should have admin access in order to fetch accurate values of Length, Width, Xoffset and Yoffset. Before using AdvancedKV, test the image with the Ephesoft Admin Screen and note the values of Length, Width, Xoffset, Yoffset and LocationType for the particular Key Value pattern.
  2. If AdvancedKV is true, then NoOfWords is not used and all other parameters are used.
  3. If AdvancedKV is false, then only NoOfWords, KeyPattern, ValuePattern and LocationType are used.
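
The value constraints listed in the parameter table can be checked on the client before the XML is sent. The following is a minimal sketch of such a check. The class and method names are hypothetical, the rules are only those stated in the table, and two points are assumptions: that the patterns are Java regular expressions, and that the Multiplier bounds 0 and 1 are inclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ExtractKVParamCheck {
    static final List<String> LOCATION_TYPES = Arrays.asList("TOP", "RIGHT", "LEFT", "BOTTOM",
            "TOP_RIGHT", "TOP_LEFT", "BOTTOM_LEFT", "BOTTOM_RIGHT");
    static final List<String> FETCH_VALUES = Arrays.asList("ALL", "FIRST", "LAST");

    // True if the string is non-empty and compiles as a Java regular expression.
    static boolean isValidRegex(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        try {
            Pattern.compile(s);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns null if all values pass, otherwise the name of the first failing parameter.
    static String firstProblem(String locationType, String keyPattern, String valuePattern,
            String kvFetchValue, float multiplier) {
        if (!LOCATION_TYPES.contains(locationType)) return "LocationType";
        if (!isValidRegex(keyPattern)) return "KeyPattern";
        if (!isValidRegex(valuePattern)) return "ValuePattern";
        if (!FETCH_VALUES.contains(kvFetchValue)) return "KVFetchValue";
        if (multiplier < 0f || multiplier > 1f) return "Multiplier";
        return null;
    }

    public static void main(String[] args) {
        // Values taken from the sample XML in this section.
        System.out.println(firstProblem("BOTTOM_LEFT", "APPLICATION", "[a-zA-Z]{10,15}", "ALL", 1f));
    }
}
```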

Sample Input Used:

ephesoft-web-services\extractkv.zip
Format for XML:

<ExtractKVParams>

<Params>

<AdvancedKV>true</AdvancedKV>

<LocationType>BOTTOM_LEFT</LocationType>

<NoOfWords>0</NoOfWords>

<KeyPattern>APPLICATION</KeyPattern>

<ValuePattern>[a-zA-Z]{10,15}</ValuePattern>

<KVFetchValue>ALL</KVFetchValue>

<Multiplier>1</Multiplier>

<Length>384</Length>

<Width>251</Width>

<Xoffset>284</Xoffset>

<Yoffset>105</Yoffset>

</Params>

</ExtractKVParams>
Sample client code using apache commons http client:-

private static void extractKV() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractKV";
    PostMethod mPost = new PostMethod(url);
    // Adding XML for the input.
    File f1 = new File("C:\\sample\\extractKV.xml");
    // Adding HOCR for processing.
    File f2 = new File("C:\\sample\\Application-Checklist.xml");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(f1.getName(), f1);
        parts[1] = new FilePart(f2.getName(), f2);
        parts[2] = new StringPart("hocrFileName", f2.getName());
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            // Generating result as responseBody.
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

 

extractFixedForm

This API extracts the document level fields from the given RSP file and the image provided. This image should be tif/png.

Web Service URL :

http://{serverName}:{port}/dcma/rest/extractFixedForm

 

Input Parameter | Values | Description
colorSwitch | Either “ON” or “OFF” | Specifies whether the data is extracted from a color image or a black and white image.
projectFile | This value should not be empty and should be a valid RecoStar project file name. | Project file used for HOCRing the image file.

Format for project file :

<_Project MajorRevision="6" MinorRevision="0" Timeout="180000">
<_Collection Name="Libraries">
<_Library Type="Dll" BaseName="ImageProcess"/>
<_Library Type="Dll" BaseName="ImageProcess2"/>
<_Library Type="Dll" BaseName="FormIdent"/>
<_Library Type="Dll" BaseName="Recognition"/>
</_Collection>
<FormOperator Name="Operator" SetupImageFileName="" ProjectID="0" DefaultFormType="Voting_Pharmacy" ExternalFormType="" Country="USA" FormRegistration="Off" FormReading="true" FormGeometry="0 0 0 0 0 0" ResultCoordinates="OriginalImage" ResultImage="Off" ResultGraphicalObjects="false" PassThroughID="Ignore" DiagnosticsMode="OnError" DiagnosticsFileName="">
<ImageSequence2Operator Name="ImageProcessing" SetupImageFileName="" RegisterImage="false" DiagnosticsMode="OnError" DiagnosticsFileName="" ConfigurationFileName="" Geometry="0 0 1488 0 0 1019">
<LoadImageOperator Name="ImageSourceOperator" FileName="" FileFormat="Unknown" Resolution="ReadFromFile" UnifyResolution="false" RepairResolution="false" AutoRotate="false" IgnorePalette="true" ScaleToGray="0"/>
<ExtractGrayFromRgbOperator Name="ColorFilterOperator" LumaRed="0.299" LumaGreen="0.587" LumaBlue="0.114"/>
<BinarizeEdgeAdaptiveOperator Name="BinarizeOperator" EdgeThreshold="80" DoubleResolution="false"/>
<_Collection Name="BinaryImageSequence">
<DetectPaperAreaOperator Name="DetectPaperArea" KeepBlackFrame="false" SafetyClass="Medium" DetectTextSkew="false"/>
</_Collection>
</ImageSequence2Operator>
<_Collection Name="Forms">
<FormRecoOperator Name="Voting_Pharmacy" SetupImageFileName="" SetupImageWidth="125.98" SetupImageHeight="86.27">
<_Collection Name="RecoOperators">
<IcrField Name="Field1" Zone="440 1151 11269 796 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Alphanumerical" Font="Unknown" NumberOfLines="1" HandprintHeight="5.50" HandprintPitch="5.00" HandprintMinConfidence="100" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="^[$%*+,\-.0-9:;&lt;=>?A-Z\\a-z]*$" LeftBoundaryHandling="On" TopBoundaryHandling="On" RightBoundaryHandling="On" BottomBoundaryHandling="On" Classifiers="" PassThroughID="None">
<_Collection Name="IgnoreAreas"/>
</IcrField>
<IcrField Name="Field2" Zone="593 1930 11803 796 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Numerical" Font="Unknown" NumberOfLines="1" HandprintHeight="5.50" HandprintPitch="5.00" HandprintMinConfidence="100" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="^[$*+,\-.0-9\\]*$" LeftBoundaryHandling="On" TopBoundaryHandling="On" RightBoundaryHandling="On" BottomBoundaryHandling="On" Classifiers="" PassThroughID="None">
<_Collection Name="IgnoreAreas"/>
</IcrField>
</_Collection>
</FormRecoOperator>
</_Collection>
<FormGenerator Name="Generator"/>
</FormOperator>
</_Project>
Sample for XML:

<WebServiceParams>
    <Params>
        <Param>
            <Name>colorSwitch</Name>
            <Value>off</Value>
        </Param>
        <Param>
            <Name>projectFile</Name>
            <Value>Fpr.rsp</Value>
        </Param>
    </Params>
</WebServiceParams>
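
The parameter file above is a flat list of name/value pairs, so it can also be generated rather than written by hand. The following is a minimal sketch using only the Java standard library; the class name WebServiceParamsXml and its methods are hypothetical helpers and not part of Ephesoft, and only the element names are taken from the sample above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Ephesoft): renders name/value pairs
// in the WebServiceParams layout shown in the sample above.
final class WebServiceParamsXml {

    // Escapes the characters that must not appear verbatim in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the XML document; a LinkedHashMap keeps the parameters in insertion order.
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("<WebServiceParams>\n    <Params>\n");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append("        <Param>\n")
              .append("            <Name>").append(escape(e.getKey())).append("</Name>\n")
              .append("            <Value>").append(escape(e.getValue())).append("</Value>\n")
              .append("        </Param>\n");
        }
        return sb.append("    </Params>\n</WebServiceParams>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("colorSwitch", "off");
        params.put("projectFile", "Fpr.rsp");
        System.out.print(build(params));
    }
}
```

The generated text can be written to a file and attached to the request in the same way as WebServiceParams.xml in the client samples.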
Checklist:

  1. projectFile should define fields like those highlighted in the RSP sample above.
  2. If colorSwitch is ON, then the image should be png.
  3. If colorSwitch is OFF, then the image should be tif/tiff.

Sample Input Used:

ephesoft-web-services\extract-fixed-form.zip

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;

private static void extractFixedForm() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFixedForm";
    PostMethod mPost = new PostMethod(url);
    // Files to send: the XML parameter file and the RSP project file.
    File file1 = new File("C:\\sample\\WebServiceParams.xml");
    File file2 = new File("C:\\sample\\Voting_Pharmacy.rsp");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The result is returned in the response body.
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

extractFieldFromHocr

This API will extract the KV pattern for the given word in the given HOCR.

Web Service URL :

http://{serverName}:{port}/dcma/rest/extractFieldFromHocr

 

Input Parameter | Values | Description
fieldValue | This should not be empty. | This parameter is used for extracting the Key Value pattern for the word provided.

Checklist:

  1. fieldValue must contain the word for which the Key Value pattern is to be found.

Sample Input Used:

ephesoft-web-services\extract-field-from-hocr.zip

Sample client code using Apache Commons HttpClient:

private static void extractFieldFromHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFieldFromHocr";
    PostMethod mPost = new PostMethod(url);
    // Adding HTML for extracting field
    File file1 = new File("C:\\sample\\Application-Checklist.html");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding field value for extracting Key Value Pattern.
        parts[1] = new StringPart("fieldValue", "APPLICATION");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

extractFuzzyDB

This API creates the document level fields for the given document type of the specified batch class, using the HOCR file passed to it.

Web Service URL:

http://{serverName}:{port}/dcma/rest/extractFuzzyDB

 

Input Parameter | Values | Description
documentType | This should not be empty and should be a valid document type for that batch class. | This parameter is used for generating document level fields for the defined document type.
batchClassIdentifier | This should not be empty and should be a valid batch class identifier. | This parameter is used for fetching the information of the document for the defined document type.
hocrFile | This value should not be empty and should have the same name as the HOCR file attached for processing. | This parameter is used for verifying the HOCR file name.

Checklist:

  1. hocrFile should have the same name as the HOCR file passed for processing.
  2. The batch class with that batchClassIdentifier should have the fuzzyDB plugin configured for processing.
  3. The specified document type should have document level fields defined.

Sample Input Used:

ephesoft-web-services\extract-fuzzy-db.zip

Sample client code using Apache Commons HttpClient:

private static void extractFuzzyDB() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFuzzyDB";
    PostMethod mPost = new PostMethod(url);
    // Adding HOCR file for processing
    File file = new File("C:\\sample\\Application-Checklist_000.html");
    Part[] parts = new Part[4];
    try {
        parts[0] = new FilePart(file.getName(), file);
        // Adding parameter for document type.
        parts[1] = new StringPart("documentType", "Application-Checklist");
        // Adding parameter for batch class.
        parts[2] = new StringPart("batchClassIdentifier", "BC1");
        // hocrFile must match the name of the attached HOCR file (see checklist item 1).
        parts[3] = new StringPart("hocrFile", file.getName());
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The result is returned in the response body.
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

barcodeExtraction

This API creates the document level fields for the given document type of the specified batch class, using the barcodes in the tiff file passed to it.

Web Service URL :

http://{serverName}:{port}/dcma/rest/barcodeExtraction

 

Input Parameter | Values | Description
documentType | This should not be empty and should be a valid document type for that batch class. | This parameter is used for generating document level fields for the defined document type.
batchClassIdentifier | This should not be empty and should be a valid batch class identifier. | This parameter is used for fetching the information of the document for the defined document type.
imageName | This value should not be empty. | The extraction operation is performed on this file.

Checklist:

  1. The batch class with that batchClassIdentifier should have the Barcode Extraction plugin configured for processing.
  2. The specified document type should have document level fields defined.
  3. The image name should have a valid extension, i.e. TIF/TIFF.

Sample Input Used:

ephesoft-web-services\barcodeExtraction.zip

Sample client code using Apache Commons HttpClient:

private static void barcodeExtraction() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/barcodeExtraction";
    PostMethod mPost = new PostMethod(url);
    File file1 = new File("C:\\sample\\sample.tif");
    // adding xml file for taking input
    File file2 = new File("C:\\sample\\WebServiceParams-barcodeExtraction.xml");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

regularRegexExtraction

This API will extract the document level fields for the document type for the specified batch class.

Web Service URL : http://{serverName}:{port}/dcma/rest/extractFieldsUsingRegex

 

Input Parameter | Values | Description
documentType | This should not be empty and should be a valid document type for that batch class. | This parameter is used for generating document level fields for the defined document type.
batchClassIdentifier | This should not be empty and should be a valid batch class identifier. | This parameter is used for fetching the information of the document for the defined document type.
hocrFileName | This value should not be empty. | XML file name for which document level fields will be extracted.

Checklist:

  1. The specified batch class should have the Regular Regex plugin configured.
  2. The specified document type should have document level fields defined.
  3. The HOCR file name should have a valid extension, i.e., XML.

Sample Input Used:

ephesoft-web-services/regularRegexExtraction.zip

Sample client code using Apache Commons HttpClient:

private static void extractFieldsUsingRegex() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFieldsUsingRegex";
    PostMethod mPost = new PostMethod(url);
    File file1 = new File("C:\\sample\\sample1.xml");
    // adding xml file for taking input
    File file2 = new File("C:\\sample\\WebServiceParams.xml");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new StringPart("hocrFileName", file1.getName());
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

commonAPIForExtraction

This API requires a header on the client request that names the extraction to be performed. The rest of the information is as described for the individual APIs above.

Input for Extraction Type:

Pass the name of the extraction API to use in the client header, as shown in the following example. The supported names are:

  • BARCODE_EXTARCTION
  • RECOSTAR_EXTARCTION
  • REGULAR_REGEX_EXTRACTION
  • KV_EXTRACTION
  • FUZZY_DB

Here’s the sample client code using Regular Regex Extraction:
private static void extractFields() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFields";
    PostMethod mPost = new PostMethod(url);
    File file1 = new File("C:\\sample\\input\\sample1.html");
    // adding xml file for taking input
    File file2 = new File("C:\\sample\\input\\WebServiceParams.xml");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        /* Pass the name of extraction api that is to use:
           BARCODE_EXTARCTION
           RECOSTAR_EXTARCTION
           REGULAR_REGEX_EXTRACTION
           KV_EXTRACTION
           FUZZY_DB */
        Header header = new Header("extractionAPI", "REGULAR_REGEX_EXTRACTION");
        mPost.addRequestHeader(header);
        mPost.setRequestEntity(entity);
        // Execute the request; the status code is checked below.
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
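
Because the extraction type is passed as a free-form header string, a client may want to reject unknown names before sending the request. The sketch below is one way to do that with the Java standard library; the class ExtractionApiHeader is a hypothetical helper, and the header name and values are copied exactly as spelled in the list above (including EXTARCTION), on the assumption that the server expects those literal strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical client-side helper (not part of Ephesoft).
final class ExtractionApiHeader {

    // Header name used in the sample client code.
    static final String HEADER_NAME = "extractionAPI";

    // Values copied verbatim from the documentation; spelling is intentionally unchanged.
    static final Set<String> ALLOWED_VALUES = Collections.unmodifiableSet(
            new LinkedHashSet<String>(Arrays.asList(
                    "BARCODE_EXTARCTION",
                    "RECOSTAR_EXTARCTION",
                    "REGULAR_REGEX_EXTRACTION",
                    "KV_EXTRACTION",
                    "FUZZY_DB")));

    // Returns the value unchanged if it is one of the documented names, otherwise throws.
    static String requireValid(String value) {
        if (value == null || !ALLOWED_VALUES.contains(value)) {
            throw new IllegalArgumentException("Unsupported " + HEADER_NAME + " value: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(HEADER_NAME + ": " + requireValid("REGULAR_REGEX_EXTRACTION"));
    }
}
```

The validated string can then be passed to new Header("extractionAPI", value) as in the client sample.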

OCR Web Service

createOCR

This API generates the OCR result for the specified sample image file. It works with tif and png files: for color images the application accepts a png or tif file as input, and for black and white images it processes a tif file.

Web Service URL :

http://{serverName}:{port}/dcma/rest/createOCR

 

Input Parameter | Values | Description
ocrEngine | Either "recostar" or "tesseract" | This parameter is used for configuring the ocrEngine to be used.
colorSwitch | Either "ON" or "OFF" | This parameter is used
tesseractVersion | Currently the application supports "tesseract_version_3" | This parameter specifies the tesseract version to be used.
cmdLanguage | Either "tha" or "eng" | This parameter is used to configure the language that has been learnt by tesseract.
projectFile | This can be empty in case of the tesseract ocrEngine | This parameter is used to validate the RSP file used for OCR in case of recostar.

Checklist:

  1. If ocrEngine is recostar, then colorSwitch and projectFile are mandatory parameters.
  2. If ocrEngine is tesseract, then colorSwitch, tesseractVersion and cmdLanguage are mandatory parameters.
  3. If colorSwitch is ON, the input image can be tif/png.
  4. If colorSwitch is OFF, then the input image should be TIFF.
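
The checklist above can be applied on the client before the request is sent. The following is a minimal sketch of such a pre-flight check; the class CreateOcrParamCheck is hypothetical, and it assumes that parameter values are compared case-insensitively and that the .tif and .tiff extensions are treated alike, neither of which is stated above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical pre-flight check (not part of Ephesoft) for the createOCR checklist.
final class CreateOcrParamCheck {

    // Returns a list of problems; an empty list means the checklist is satisfied.
    static List<String> validate(Map<String, String> params, String imageName) {
        List<String> problems = new ArrayList<String>();
        String engine = params.get("ocrEngine");
        if ("recostar".equalsIgnoreCase(engine)) {
            require(params, problems, "colorSwitch", "projectFile");
        } else if ("tesseract".equalsIgnoreCase(engine)) {
            require(params, problems, "colorSwitch", "tesseractVersion", "cmdLanguage");
        } else {
            problems.add("ocrEngine must be recostar or tesseract");
        }
        // Assumption: .tif and .tiff are equivalent; png is accepted only when colorSwitch is ON.
        String name = imageName.toLowerCase(Locale.ROOT);
        boolean tiff = name.endsWith(".tif") || name.endsWith(".tiff");
        boolean colorOn = "on".equalsIgnoreCase(params.get("colorSwitch"));
        if (!(tiff || (colorOn && name.endsWith(".png")))) {
            problems.add("image type not allowed for colorSwitch=" + params.get("colorSwitch"));
        }
        return problems;
    }

    // Records a problem for every listed parameter that is missing or blank.
    private static void require(Map<String, String> params, List<String> problems, String... names) {
        for (String n : names) {
            String v = params.get(n);
            if (v == null || v.trim().isEmpty()) {
                problems.add(n + " is mandatory");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("ocrEngine", "recostar");
        params.put("colorSwitch", "off");
        params.put("projectFile", "Fpr.rsp");
        System.out.println(validate(params, "sample1.tif"));
    }
}
```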

Sample Input Used:

ephesoft-web-services\create-ocr.zip

File format for XML file:

<WebServiceParams>
    <Params>
        <Param>
            <Name>ocrEngine</Name>
            <Value>recostar</Value>
        </Param>
        <Param>
            <Name>colorSwitch</Name>
            <Value>off</Value>
        </Param>
        <Param>
            <Name>tesseractVersion</Name>
            <Value>tesseract_version_3</Value>
        </Param>
        <Param>
            <Name>cmdLanguage</Name>
            <Value>eng</Value>
        </Param>
        <Param>
            <Name>projectFile</Name>
            <Value>Fpr.rsp</Value>
        </Param>
    </Params>
</WebServiceParams>
Format for RSP file:

<_Project MajorRevision="1" MinorRevision="0" Timeout="180000">
    <_Collection Name="Libraries">
        <_Library Type="Dll" BaseName="ImageProcess"/>
        <_Library Type="Dll" BaseName="ImageProcess2"/>
        <_Library Type="Dll" BaseName="Recognition"/>
    </_Collection>
    <FullPageOperator Name="Operator" SetupImageFileName="" Country="USA" TextReading="true" ResultCoordinates="OriginalImage" ResultImage="RecoImage" ResultGraphicalObjects="false" DiagnosticsMode="OnError" DiagnosticsFileName="">
        <ImageSequence2Operator Name="ImageProcessing" SetupImageFileName="" RegisterImage="true" DiagnosticsMode="OnError" DiagnosticsFileName="" ConfigurationFileName="" Geometry="-1 0 1683 0 1 2190">
            <LoadImageOperator Name="ImageSourceOperator" FileName="" FileFormat="Unknown" Resolution="200" UnifyResolution="true" RepairResolution="true" AutoRotate="false" IgnorePalette="false" ScaleToGray="0"/>
            <ExtractGrayFromRgbOperator Name="ColorFilterOperator" LumaRed="0.299" LumaGreen="0.587" LumaBlue="0.114"/>
            <BinarizeEdgeAdaptiveOperator Name="BinarizeOperator" EdgeThreshold="80" DoubleResolution="false"/>
            <_Collection Name="BinaryImageSequence">
                <RemoveShadingOperator Name="RemoveShading" MinRegionWidth="10.00" MinRegionHeight="3.00"/>
                <DetectPaperAreaOperator Name="DetectPaperArea" KeepBlackFrame="false" SafetyClass="Medium" DetectTextSkew="true"/>
                <BinaryAutoRotateOperator Name="AutoRotate" DocumentOrientation="Unknown" InputOrientation="MostlyCorrect"/>
                <ProtectBarCodesOperator Name="ProtectBarCodes" SafetyClass="Medium" SearchRegion=""/>
                <RemoveLineSystemOperator Name="RemoveLineSystem" HorizontalLineLength="10.00" VerticalLineLength="12.00" DashedLineLength="30.00" MaxLineWidth="1.50" MaxGapWidth="1.00" BoxSeparatorHeight="4.00" InvertedRegionWidth="12.00" InvertedRegionHeight="4.00" LineQuality="Medium"/>
            </_Collection>
        </ImageSequence2Operator>
        <LayoutOperator Name="LayoutOperator" FindTextBlocks="true"/>
        <FullPageField Name="TextField" Zone="0 0 0 0 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Alphanumerical" NumberOfLines="TextLineSegments" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="" PassThroughID="None"/>
        <FormGenerator Name="Generator"/>
    </FullPageOperator>
</_Project>
Sample client code using Apache Commons HttpClient:

private static void createOCR() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/createOCR";
    PostMethod mPost = new PostMethod(url);
    // adding image file for processing
    File file1 = new File("C:\\sample\\sample1.tif");
    // adding xml file for taking input
    File file2 = new File("C:\\sample\\WebServiceParams.xml");
    // adding rsp file used for creating OCR in case of recostar
    File file3 = new File("C:\\sample\\Fpr.rsp");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            InputStream in = mPost.getResponseBodyAsStream();
            // saving result generated.
            File outputFile = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(outputFile);
            try {
                // Copy the response stream to the output file in 1 KB chunks.
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Users and Groups Web Service

getBatchInstanceForRole

This API fetches the list of all batch instances accessible to the specified role. It is a GET API and works both from a web URL and from client code.
Web Service URL :

http://{serverName}:{port}/dcma/rest/getBatchInstanceForRole/{role}

 

Input Parameter | Values | Description
role | This value should not be empty. | This parameter specifies the role name for which the batch instance list is to be fetched.

Sample client code using Apache Commons HttpClient:

private static void getBatchInstanceForRole() {
    HttpClient client = new HttpClient();
    // URL to hit for getting the list of batch instances accessible to the specified role.
    String url = "http://localhost:8080/dcma/rest/getBatchInstanceForRole/admin";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

getBatchClassForRole

This API fetches the list of all batch classes accessible to the specified role. It is a GET API and works both from a web URL and from client code.

Web Service URL :

http://{serverName}:{port}/dcma/rest/getBatchClassForRole/{role}

 

Input Parameter | Values | Description
role | This value should not be empty. | This parameter specifies the role name for which the batch class list is to be fetched.

Sample client code using Apache Commons HttpClient:

private static void getBatchClassForRole() {
    HttpClient client = new HttpClient();
    // URL to hit for getting the list of batch classes accessible to the specified role.
    String url = "http://localhost:8080/dcma/rest/getBatchClassForRole/admin";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Reporting Web Service

runReporting

This API is used to run reporting through web services. It takes the server side installer path as input and synchronizes the report database.
Web Service URL :
http://{serverName}:{port}/dcma/rest/runReporting

 

Input Parameter | Values | Description
installerPath | This value should be a valid path. | This parameter specifies the path of the build.xml file for reporting present on the server side.

Checklist:
1. The path should be a valid file path and must be the server path of the build.xml file.
Sample Input Used:
ephesoft-web-services\run-reporting.zip
Format for inputXML file :
<ReportingOptions>
<installerPath>C:\\testing</installerPath>
</ReportingOptions>
Sample client code using Apache Commons HttpClient:
private static void runReporting() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/runReporting";
    PostMethod mPost = new PostMethod(url);
    File file1 = new File("C:\\sample\\reporting.xml");
    Part[] parts = new Part[1];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Batch Instance Management Web Service

restartBatchInstance

This API restarts a batch instance from the specified module. Users can restart only those batch instances that are accessible to their role. It is a GET API and works both from client code and from a web URL.

Web Service URL :

http://{serverName}:{port}/dcma/rest/restartBatchInstance/{batchInstanceIdentifier}/{restartAtModuleName}

 

Input Parameter | Values | Description
batchInstanceIdentifier | This value should be a valid batch instance identifier. | This parameter specifies the identifier of the batch instance to be restarted.
restartAtModuleName | This value should not be empty. | This parameter specifies the module name from which the batch is to be restarted.

Checklist:

  1. The batch instance identifier should be valid and accessible to the user authenticating against the web service.
  2. restartAtModuleName should be a valid module name; valid names can differ between batch classes.
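
Since both inputs travel as path segments of the URL, any character that is not URL-safe has to be percent-encoded before the request is made. The sketch below builds the request URL with the Java standard library; the class RestartBatchInstanceUrl is a hypothetical helper, and the assumption that the server decodes percent-encoded path segments is not confirmed above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Ephesoft) that builds the restartBatchInstance URL.
final class RestartBatchInstanceUrl {

    // URLEncoder targets form encoding and writes a space as '+';
    // inside a path segment a space has to be %20, so it is rewritten here.
    static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Fills the {batchInstanceIdentifier} and {restartAtModuleName} placeholders of the documented URL.
    static String build(String serverName, int port, String batchInstanceIdentifier, String restartAtModuleName) {
        if (batchInstanceIdentifier == null || batchInstanceIdentifier.isEmpty()
                || restartAtModuleName == null || restartAtModuleName.isEmpty()) {
            throw new IllegalArgumentException("Both path parameters must be non-empty");
        }
        return "http://" + serverName + ":" + port + "/dcma/rest/restartBatchInstance/"
                + encodeSegment(batchInstanceIdentifier) + "/" + encodeSegment(restartAtModuleName);
    }

    public static void main(String[] args) {
        System.out.println(build("localhost", 8080, "BI1", "Folder_Import_Module"));
    }
}
```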

Sample client code using Apache Commons HttpClient:

private static void restartBatchInstance() {
    HttpClient client = new HttpClient();
    // URL to hit for restarting the batch instance from the specified module.
    // Only batch instances with status "ERROR", "READY_FOR_REVIEW", "READY_FOR_VALIDATION" or "RUNNING" can be restarted.
    String url = "http://localhost:8080/dcma/rest/restartBatchInstance/BI1/Folder_Import_Module";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

deleteBatchInstance

This API deletes the batch instance with the specified batch instance identifier. Only batch instances accessible to the authenticated user can be deleted.

Web Service URL :

http://{serverName}:{port}/dcma/rest/deleteBatchInstance/{identifier}

 

Input Parameter | Values | Description
batchInstanceIdentifier | This value should be a valid batch instance identifier. | This parameter specifies the identifier of the batch instance to be deleted.

Sample client code using Apache Commons HttpClient:

private static void deleteBatchInstance() {
    HttpClient client = new HttpClient();
    // URL to hit for deleting a batch instance accessible to the authenticated user.
    // Only batch instances with status "ERROR", "READY_FOR_REVIEW", "READY_FOR_VALIDATION" or "RUNNING" can be deleted.
    String url = "http://localhost:8080/dcma/rest/deleteBatchInstance/BI1";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

restartAllBatchInstance

This API restarts all batch instances that have status READY_FOR_REVIEW or READY_FOR_VALIDATION and are accessible to the authenticated user. It is a GET API and works both from client code and from a web URL.

Web Service URL : http://{serverName}:{port}/dcma/rest/restartAllBatchInstance

Checklist:

  1. Only batch instances with status READY_FOR_REVIEW or READY_FOR_VALIDATION that are accessible to the authenticated user will be restarted.

Sample client code using Apache Commons HttpClient:

private static void restartAllBatchInstance() {
    HttpClient client = new HttpClient();
    // URL to hit for restarting the batch instances accessible to the authenticated user.
    // Only batch instances with status "READY_FOR_REVIEW" or "READY_FOR_VALIDATION" are restarted.
    String url = "http://localhost:8080/dcma/rest/restartAllBatchInstance";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

addUserRolesToBatchInstance

This API adds roles to a batch instance. It takes the batch instance identifier and the role name as input and stores the assignment in the database. It is a GET API and works both from client code and from a web browser.
Web Service URL :

http://{serverName}:{port}/dcma/rest/addUserRolesToBatchInstance/{batchInstanceIdentifier}/{userRole}

 

Input Parameter | Values | Description
batchInstanceIdentifier | This value should be a valid batch instance identifier. | This parameter specifies the identifier of the batch instance to which roles are to be added.
userRole | This value should not be empty. | This parameter specifies the role to be added to the specified batch instance.

Sample client code using Apache Commons HttpClient:

private static void addUserRolesToBatchInstance() {
    HttpClient client = new HttpClient();
    // URL to hit for adding user roles to the batch instance identifier.
    String url = "http://localhost:8080/dcma/rest/addUserRolesToBatchInstance/BI45/admin";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Batch Class Management Web Service

importBatchClass

This API imports a batch class into Ephesoft. It takes as input an XML file with the input parameters and the exported batch class data. The exported batch class is in zip format, as exported by Ephesoft.
Web Service URL :

http://{serverName}:{port}/dcma/rest/importBatchClass

 

Input Parameter | Values | Description
RolesImported | Either "true" or "false" | This value specifies whether roles are imported with the batch class.
EmailAccounts | Either "true" or "false" | This value specifies whether email accounts are imported with the batch class.
UseSource | Either "true" or "false" | This value specifies whether the information of the source batch class is kept for the imported batch class.
Name | This value should not be empty. | This value configures the name of the imported batch class.
Description | This value should not be empty. | This value configures the description of the imported batch class.
Priority | This value should lie between 1 and 100. | This value indicates the priority of the batch class.
UseExisting | Either "true" or "false" | This value specifies whether the existing batch class is overwritten with the new batch class.
UncFolder | This value should not be empty and should be a string specifying a directory path. | This value specifies the UNC folder path for the batch class to be imported.
Script | This tag is configured for the script file to be imported. | This tag specifies which script file is to be imported.
Folder | This tag is configured for the folder to be imported. | This tag specifies which folder is to be imported along with the batch class.

Checklist:

  1. If UseExisting is “true”, the existing batch class is overwritten, including the Folders, Scripts and other parameters.
  2. If UseExisting is “false”, a new batch class is created and the Folders and Scripts selections are treated as false.
  3. If UseSource is “true”, the new batch class has the same Name, Description and Priority as the source batch class.
  4. If UseSource is “false”, the new batch class uses the Name, Description and Priority configured in the input XML.
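
The value constraints from the parameter table can be checked on the client before the request is sent. The following is only a minimal sketch: the class `ImportOptionsCheck` and its method are hypothetical helpers, not part of the Ephesoft API, and the sketch assumes that Name, Description and Priority only need to be checked when UseSource is “false”.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side check of importBatchClass options; not part of Ephesoft. */
final class ImportOptionsCheck {

    /** Returns a list of problems; an empty list means the options look valid. */
    static List<String> validate(boolean useSource, String name, String description,
                                 int priority, String uncFolder) {
        List<String> problems = new ArrayList<>();
        // Assumption: Name, Description and Priority are only read when UseSource is false.
        if (!useSource) {
            if (name == null || name.trim().isEmpty()) {
                problems.add("Name must not be empty");
            }
            if (description == null || description.trim().isEmpty()) {
                problems.add("Description must not be empty");
            }
            if (priority < 1 || priority > 100) {
                problems.add("Priority must lie between 1 and 100");
            }
        }
        if (uncFolder == null || uncFolder.trim().isEmpty()) {
            problems.add("UncFolder must not be empty");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Valid options: prints an empty list.
        System.out.println(validate(false, "BatchClassName", "Description", 10,
                "C:\\ephesoft-data\\Test-UNC"));
        // Empty name, priority out of range, empty UNC folder: prints three problems.
        System.out.println(validate(false, "", "Description", 150, ""));
    }
}
```

Running such a check first gives the caller a readable message instead of a server-side error response.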

Sample input XML:

<ImportBatchClassOptions>
    <RolesImported>false</RolesImported>
    <EmailAccounts>true</EmailAccounts>
    <UseSource>false</UseSource>
    <Name>BatchClassName</Name>
    <Description>Description</Description>
    <Priority>10</Priority>
    <UseExisting>true</UseExisting>
    <UncFolder>C:\ephesoft-data\Test-UNC</UncFolder>
    <BatchClassDefinition>
        <Scripts>
            <Script>
                <FileName>ScriptDocumentAssembler.java</FileName>
                <Selected>true</Selected>
            </Script>
            <Script>
                <FileName>ScriptPageProcessing.java</FileName>
                <Selected>true</Selected>
            </Script>
        </Scripts>
        <Folders>
            <Folder>
                <FileName>image-classification-sample</FileName>
                <Selected>false</Selected>
            </Folder>
        </Folders>
        <BatchClassModules>
            <BatchClassModule>
                <ModuleName></ModuleName>
                <PluginConfiguration>true</PluginConfiguration>
            </BatchClassModule>
        </BatchClassModules>
    </BatchClassDefinition>
</ImportBatchClassOptions>

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;

private static void importBatchClass() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/importBatchClass";
    PostMethod mPost = new PostMethod(url);
    mPost.setDoAuthentication(true);
    // Input XML that carries the import parameters.
    File file1 = new File("C:\\sample\\importbatchclass.xml");
    // Exported batch class zip file to be imported.
    File file2 = new File("C:\\sample\\BC1_050712_1714.zip");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Batch class imported successfully");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always release the connection, whether or not the call succeeded.
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

exportBatchClass

This API exports an existing batch class. It takes the batch class identifier and flags that specify which learnt samples are exported with the batch class.
Web Service URL:

http://{serverName}:{port}/dcma/rest/exportBatchClass

 

Input Parameter | Values | Description
identifier | Must not be empty and must be a valid batch class identifier | Identifies the batch class to be exported.
lucene-search-classification-sample | Either “true” or “false” | Whether the Lucene learnt sample is exported with the batch class.
image-classification-sample | Either “true” or “false” | Whether the image classification sample is exported with the batch class.

Checklist:

  1. identifier must be a batch class identifier.

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;

private static void exportBatchClass() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/exportBatchClass";
    PostMethod mPost = new PostMethod(url);
    mPost.addParameter("identifier", "BC1");
    mPost.addParameter("lucene-search-classification-sample", "true");
    mPost.addParameter("image-classification-sample", "false");
    int statusCode;
    try {
        statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Batch class exported successfully");
            // Save the zip stream returned by the server to a local file.
            InputStream in = mPost.getResponseBodyAsStream();
            File f = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(f);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

getBatchClassList

This API returns all batch classes that are accessible to the authenticated user. It is a GET API and can be called from client code or directly through the web URL.
Web Service URL:

http://{serverName}:{port}/dcma/rest/getBatchClassList

Sample client code using Apache Commons HttpClient:

import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

private static void getBatchClassList() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/getBatchClassList";
    GetMethod mGet = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(mGet);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mGet.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mGet.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mGet != null) {
            mGet.releaseConnection();
        }
    }
}

 

getRoles

This API returns the roles of the specified batch class. It is a GET API and can be called from client code or directly through the web URL.

Web Service URL:

http://{serverName}:{port}/dcma/rest/getRoles/{batchClassIdentifier}

 

Input Parameter | Values | Description
identifier | Must not be empty and must be a valid batch class identifier | Identifies the batch class whose roles are fetched.

Checklist:

  1. identifier must be a batch class identifier.

Sample client code using Apache Commons HttpClient:

import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

private static void getRoles() {
    HttpClient client = new HttpClient();
    // BC1 is the batch class identifier whose roles are fetched.
    String url = "http://localhost:8080/dcma/rest/getRoles/BC1";
    GetMethod mGet = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(mGet);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mGet.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mGet.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mGet != null) {
            mGet.releaseConnection();
        }
    }
}

getAllModulesWorkflowNameByBatchClass

This API returns the module workflow names and the module names of the batch class with the specified identifier. It is a GET API and can be called from client code or directly through the web URL.

Web Service URL:

http://{serverName}:{port}/dcma/rest/getAllModulesWorkflowNameByBatchClass/{batchClassIdentifier}

 

Input Parameter | Values | Description
batchClassIdentifier | Must not be empty | Identifier of the batch class whose module names are fetched.

Sample client code using Apache Commons HttpClient:

import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

private static void getAllModulesWorkflowNameByBatchClass() {
    HttpClient client = new HttpClient();
    // URL for getting the module workflow names of the specified batch class identifier.
    String url = "http://localhost:8080/dcma/rest/getAllModulesWorkflowNameByBatchClass/BC1";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Uploading a Batch through a Web Service

uploadBatch

This API uploads a new batch to the watch folder of a given batch class and executes the new batch with the supplied tif, tiff or pdf files. The user must be authorized to execute batches for that batch class; otherwise an error message is returned. All files are copied to the UNC folder of the requested batch class, into a folder with the name supplied by the user as input.

Web Service URL:

http://{serverName}:{port}/dcma/rest/uploadBatch/{batchClassIdentifier}/{batchInstanceName}

 

Input Parameter | Description
batchClassIdentifier | Identifier of the batch class into which the user wishes to upload the batch.
batchInstanceName | Name under which the user wishes to upload the batch.

Checklist:

  1. The value for batchClassIdentifier is mandatory and must be valid, and the user must have permission to run batches on that batch class.
  2. The value for batchInstanceName is mandatory; if it is left empty, an error is returned.
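
Both path parameters can be checked while the request URL is assembled. The sketch below is illustrative only: `UploadBatchUrl` is a hypothetical helper, and percent-encoding the two path segments is an assumption about what the server accepts rather than documented Ephesoft behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that assembles the uploadBatch URL; not part of Ephesoft. */
final class UploadBatchUrl {

    /** Builds {baseUrl}/uploadBatch/{batchClassIdentifier}/{batchInstanceName}. */
    static String build(String baseUrl, String batchClassIdentifier, String batchInstanceName) {
        if (batchClassIdentifier == null || batchClassIdentifier.trim().isEmpty()) {
            throw new IllegalArgumentException("batchClassIdentifier is mandatory");
        }
        if (batchInstanceName == null || batchInstanceName.trim().isEmpty()) {
            throw new IllegalArgumentException("batchInstanceName is mandatory");
        }
        return baseUrl + "/uploadBatch/" + encodeSegment(batchClassIdentifier.trim())
                + "/" + encodeSegment(batchInstanceName.trim());
    }

    /** Assumption: path segments are percent-encoded, with spaces sent as %20. */
    private static String encodeSegment(String segment) {
        try {
            // URLEncoder produces form encoding ("+" for a space), so convert "+" to "%20".
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/dcma/rest", "BC1", "My Batch"));
    }
}
```

The resulting string can be passed to the `PostMethod` constructor in place of a hand-written URL.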

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;

private static void uploadBatch() {
    HttpClient client = new HttpClient();
    // Replace {BatchClassIdentifier} and {BatchInstanceName} with actual values.
    String url = "http://localhost:8080/dcma/rest/uploadBatch/{BatchClassIdentifier}/{BatchInstanceName}";
    PostMethod mPost = new PostMethod(url);
    // Image file to be processed.
    File file1 = new File("C:\\sample\\sample1.tif");
    Part[] parts = new Part[1];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        String responseBody = mPost.getResponseBodyAsString();
        // Print the status code together with the response body.
        System.out.println(statusCode + "***" + responseBody);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Ephesoft UI Theme Configuration

Overview

The Ephesoft UI can be configured to use a color theme of the user’s choice. The user chooses three base colors, and these are applied across the whole application once the page is refreshed. Images for a particular theme need to be modified and placed in the respective theme folder.

Configuration:

Folder Structure

A ‘themes’ folder is added inside the application. It contains the following:

  • theme.less
  • common.css
  • folders of respective themes

UI ThemeConfiguration.jpg

theme.less is the file in which all color and image path settings are made. It contains the following properties, which need to be configured:

  • @base: hexadecimal color codes for the tabs and sub-tabs of the application.

TabsBaseColour.jpg

  • @baseDark: hexadecimal color codes for the buttons and links in the application.

ButtonsBaseColour.jpg

 

  • @baseLight: hexadecimal color codes for backgrounds, selected document or folder and grid tables.

BackgroundColour.jpg

 

  • @gridRowSelectionColor: hexadecimal color code or color name for grid row selection.

GridRowSelectionColour.jpg

 

  • @overlayColor, @overlaySecondColor, @overlayThirdColor, @overlayFourthColor, @overlayBorderColor: hexadecimal color code or color name for the overlays shown on the RV screen and the Advanced KV Extraction screen.

To be specific:

@overlayColor – background color of overlay on RV screen, overlay on Table Extraction and overlay generated on selecting a key/value while defining an advanced KV pair

@overlaySecondColor – background color of overlay generated on capturing a key while defining an advanced KV pair

@overlayThirdColor – background color of overlay generated on capturing a value while defining an advanced KV pair

@overlayFourthColor – background color of overlay generated on captured value on Table Extracted on RV Screen

@overlayBorderColor – Color of all overlay boundaries
OverlayBoundaryColour.jpg

 

  • @themeImagePath: path of the image folder for the respective theme. For example:

Default_theme: default_theme/images/

After making these changes, the user needs to clear the browser cache and refresh the page to see the changes.

 

Function Keys

Overview

This functionality gives application users (mainly review operators) the flexibility to customize the RV screen by adding shortcuts that perform specific operations. A user can run a script method as needed simply by pressing a key.

 

  • Parameters involved:
    • Method name: defines the name of the method in the script which should be executed upon usage.
    • Key: the function key associated with the method. Can be used as a shortcut.
    • Description: contains the user’s description for the method.

AddFunctionKey.jpg

  • Sample values:

FunctionKeySampleValues.jpg

Characteristics

  • The functionality allows the RV screen to be customized with shortcut keys that perform user-defined functions.
  • The user can associate a function key with a particular method specified in the script ‘ScriptFieldValueChange.java’, which is present at the location ‘{Ephesoft-Home}\SharedFolders\BCID\scripts’.
  • The user can run the script either by pressing the function key or by clicking the function key button available on the Review Validate UI.
  • Only one method can be associated with a particular key, but the same method can be assigned to multiple keys.
  • Any key from F1 to F11 except F5 can be chosen.
  • The user can add a description of the script method associated with a key.
  • Function keys are document type specific and are only displayed on the RV screen if the selected document type has function keys defined for the batch class.
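
The key assignment rules above can be summarized in a few lines of code. This is only a sketch of the rules, not Ephesoft’s implementation; `FunctionKeyBindings` is a hypothetical class, and one instance is assumed to hold the bindings of a single document type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the function key rules for one document type. */
final class FunctionKeyBindings {

    private final Map<Integer, String> methodByKey = new LinkedHashMap<>();

    /** Binds function key F{keyNumber} to a script method name. */
    void bind(int keyNumber, String methodName) {
        // Rule: any key from F1 to F11 may be used, except F5.
        if (keyNumber < 1 || keyNumber > 11 || keyNumber == 5) {
            throw new IllegalArgumentException("Only F1 to F11 except F5 can be used");
        }
        // Rule: a key can be associated with only one method.
        if (methodByKey.containsKey(keyNumber)) {
            throw new IllegalStateException("F" + keyNumber + " is already bound");
        }
        // Rule: the same method may be bound to several keys, so no check on methodName.
        methodByKey.put(keyNumber, methodName);
    }

    /** Returns the method bound to the key, or null if the key is unbound. */
    String methodFor(int keyNumber) {
        return methodByKey.get(keyNumber);
    }

    public static void main(String[] args) {
        FunctionKeyBindings bindings = new FunctionKeyBindings();
        bindings.bind(1, "clearField");   // method names here are made up for illustration
        bindings.bind(2, "clearField");   // same method on a second key is allowed
        System.out.println(bindings.methodFor(2));
    }
}
```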

Working

  • When a batch reaches the Review/Validation stage, the user can either press the function key to run a particular method or click the function key button displayed in the 2nd panel.

FunctionKey RVScreen.jpg

 

  • A dialog box saying ‘Executing script’ will appear. By the time it goes off, user’s script has been executed.

FunctionKey ExecutingScript.jpg

Grid Computing

Overview

Ephesoft supports distributed computing through Grid Computing enabled workflows. Batch classes configured this way can transfer batches from one independent Ephesoft server to another over the network (internet or intranet), on the assumption that the batch class is the same on both systems. The Ephesoft servers transfer the data using web services and FTP.

Configuration

Configurable Properties

The configurable properties for this setup are listed below:

  • FTP Configuration

Creates the FTP connection used to transfer files to and retrieve files from the FTP server.

The following properties are used for data transmission:

(META-INF\dcma-ftp\dcma-ftp.properties)

 

Configurable property | Type of value | Value options | Description
ftp.server.url | String | Any valid URL. Default value: ftp.yourFTPserver.com | The URL of the FTP server.
ftp.server.username | String | Any string. Default value: ephesoft | Username used for accessing the server.
ftp.server.password | String | Any string. Default value: ******** | Password required for the user’s authentication.
ftp.number_of_retries | Integer | Any integer value. Default value: 3 | Number of retries a client makes if an exception occurs while transferring a file to or retrieving a file from FTP.
ftp.upload_base_dir | String | Any valid directory path. Default value: test | Folder location on the FTP server to which data is uploaded or from which data is downloaded.
ftp.data_timeout | Integer | Any integer value. Default value: 600000 | Maximum time allowed for the data transfer, in milliseconds. If data is not uploaded or downloaded within this interval, the transfer is stopped.
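
As an illustration of how these keys might be consumed, the sketch below reads them with `java.util.Properties`. `FtpSettings` is a hypothetical class, and falling back to the default values from the table when a key is missing is an assumption, not documented Ephesoft behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical holder for a few values from dcma-ftp.properties. */
final class FtpSettings {

    final String serverUrl;
    final int numberOfRetries;
    final int dataTimeoutMillis;

    private FtpSettings(String serverUrl, int numberOfRetries, int dataTimeoutMillis) {
        this.serverUrl = serverUrl;
        this.numberOfRetries = numberOfRetries;
        this.dataTimeoutMillis = dataTimeoutMillis;
    }

    /** Parses properties text; missing keys fall back to the table defaults (assumption). */
    static FtpSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        return new FtpSettings(
                props.getProperty("ftp.server.url", "ftp.yourFTPserver.com").trim(),
                Integer.parseInt(props.getProperty("ftp.number_of_retries", "3").trim()),
                Integer.parseInt(props.getProperty("ftp.data_timeout", "600000").trim()));
    }

    public static void main(String[] args) {
        FtpSettings s = parse("ftp.server.url=ftp.example.com\nftp.number_of_retries=5\n");
        System.out.println(s.serverUrl + " retries=" + s.numberOfRetries
                + " timeout=" + s.dataTimeoutMillis + " ms");
    }
}
```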

Assumptions:

  • The directory named in the upload_base_dir property of the properties file must exist on the FTP server.
  • The username and password are valid.
  • The same batch class must be available on all Ephesoft server instances.

  • dcma-workflow.properties Configuration

Web service related configuration needs to be provided in the dcma-workflow.properties file.

 

Configurable property | Type of value | Value options | Description
dcma.batch.status.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * | Cron job for pulling the status of remote batches. Defines the interval after which a source machine checks the remote location for the result or batch instance status.
wb.hostURL | String | Any valid HTTP URL. Default value: http://172.16.1.68:8080/dcma/rest | Host URL in the format http://LocalHostAddress:port/dcma/rest.
wb.folderPath | String | Any valid folder path. Default value: test | Server folder path.

Assumptions:

  • wb.hostURL should look like http://172.16.1.68:8080/dcma/rest.
  • The hostURL link should contain a unique IP address or domain name.
  • wb.folderPath should be the same as ftp.upload_base_dir.

  • Web based Configuration

Batch Class Management -> Batch Class -> Module -> Edit

EditBatchClassModule.jpg

Clicking the Edit button opens the following screen, where the user can configure the grid computing properties:
ConfigureGridComputingProperties.jpg

For each module that should execute on a remote server, map the remote URL and the remote batch class identifier. This mapping does not apply to the Folder Import module.

Constraints

  • A user can restart a batch from a module that is executing on the local system.
  • A user cannot restart batches that are executing remotely.
  • A batch instance that has been transferred from one system to another cannot be deleted by any user.

Troubleshooting

The following are a few common error messages that appear when the feature is not working correctly:

 

S no. | Issue | Possible root cause
1 | Source directory is null | wb.folderPath and ftp.upload_base_dir: both paths should be the same and valid.
2 | Destination directory is null | wb.folderPath and ftp.upload_base_dir: both paths should be the same and valid.
3 | Invalid Connection to FTP server | Invalid attempt to make an FTP connection.
4 | Error in generating output Stream for file | Invalid output file name.
5 | TargetServerURL is null | Check the remote URL entry for the batch.
6 | BatchInstanceId is null | Check the database connection or network.
7 | BatchClassId is null | Check the remote batch class ID entry for the batch.
8 | SourceServerURL is null | hostURL is not mapped in the property files.
9 | FolderPath is null | folderPath is not mapped in the property files.
10 | moduleName is null | Check the database connection or network.
11 | batchName is null | Batch name is not found in the batch XML.
12 | Exception in transferring batch to remote location | One of the errors 5 to 11 must have caused this.
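
Several of these errors trace back to missing or mismatched property values, so a quick check of the configuration can save a failed transfer. The following is a hedged sketch of such a check; `GridConfigPreflight` is a hypothetical helper that is not shipped with Ephesoft, and the caller is assumed to have read the three values from the property files already.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical preflight check for the grid computing properties. */
final class GridConfigPreflight {

    /** Returns a list of problems; an empty list means no problem was detected. */
    static List<String> check(String hostUrl, String folderPath, String uploadBaseDir) {
        List<String> problems = new ArrayList<>();
        String host = hostUrl == null ? "" : hostUrl.trim();
        String folder = folderPath == null ? "" : folderPath.trim();
        String baseDir = uploadBaseDir == null ? "" : uploadBaseDir.trim();

        if (host.isEmpty()) {
            problems.add("wb.hostURL is not mapped");
        } else if (!host.startsWith("http://") || !host.endsWith("/dcma/rest")) {
            problems.add("wb.hostURL should look like http://host:port/dcma/rest");
        }
        if (folder.isEmpty()) {
            problems.add("wb.folderPath is not mapped");
        }
        if (baseDir.isEmpty()) {
            problems.add("ftp.upload_base_dir is not mapped");
        }
        // The two paths are documented as having to be the same.
        if (!folder.isEmpty() && !baseDir.isEmpty() && !folder.equals(baseDir)) {
            problems.add("wb.folderPath and ftp.upload_base_dir differ");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("http://172.16.1.68:8080/dcma/rest", "test", "test"));
        System.out.println(check("", "test", "other"));
    }
}
```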

Installer Upgrade

Overview

This document describes the step-by-step procedure for upgrading Ephesoft on a machine. Refer to it when upgrading the Ephesoft application through the Ephesoft installer setup.

 

Steps of execution

Follow these steps to upgrade an existing installation with installer version 3.0.3.0 or later:

  • Points to check before upgrade:

a. If Ephesoft is running through JavaAppServer (Apache Tomcat), stop the server before upgrading.
b. Any file or folder inside the Ephesoft install directory (the path where the previous installation exists), such as dcma-all.log, must be closed before upgrading.
c. Make sure that the Windows installation drive (the C drive in most cases) has enough free space for the installer setup file to be extracted properly. The same applies to the drive where the previous Ephesoft application is installed.

 

  • First step:

Run the command prompt as administrator and execute the command shown in the screenshot below:
Installer upgrade cmd.jpg
In this command, “D:\Ephesoft_3.0.3.2.msi” is the path of the Ephesoft installer setup on the user’s system. The command starts the Ephesoft installer setup, and the following screen is displayed:
Installer upgrade welcome screen.jpg
Click the ‘Next’ button. The following screen is displayed:
Installer upgrade license agreement.jpg
Accept the Ephesoft end user agreement by selecting the check box on the UI, then click the ‘Next’ button.

 

  • Second step:

After the license agreement UI, the following screen is displayed:
Installer upgrade upgrade.jpg
If this checkbox is checked before clicking the ‘Next’ button, the database patch is executed on the database when the user starts Ephesoft after a successful upgrade. If the checkbox is unchecked, the database patch is not executed.
Uncheck this checkbox when upgrading an Ephesoft installation in a multi-server environment where the database patch has already been executed on the common database by another Ephesoft installation.
For a standalone Ephesoft application, this checkbox must be checked before clicking the ‘Next’ button.

 

  • Third step:

After clicking the ‘Next’ button on the Ephesoft Upgrade Installation screen, the following screen is displayed if some files or folders are in use or if the installer setup does not have sufficient privileges to perform Windows service operations:
Installer upgrade popup.jpg
Close all such files and folders and re-run the installer setup with admin privileges.

 

  • Fourth step:

After completing all these steps, the following screen is displayed:
Installer upgrade ready install.jpg
Click the ‘Install’ button and the installer will do the rest of the work.

 

  • Fifth step:

After the installation is complete, install the latest Ephesoft license and restart the machine.

 

  • Sixth step:

Start the EphesoftEnterprise and EphesoftWebService services manually, as the installer cannot start these services. The upgrade procedure through the installer setup is now complete.

Integrating External Application with Ephesoft Validation

Overview

Ephesoft allows its customers to develop external modules or applications and integrate them with Ephesoft. This document describes in detail how to integrate external modules with the Ephesoft Validation module.

External modules/applications are technology independent and can be written in any technology, for example HTML/JavaScript, GWT, JSP/Servlet, or a combination of these.

Review Validation Screen

AppsOnRV.jpg

Shortcut keys as well as buttons are defined on the Review Validate UI to launch an external application for a batch (App1, App2, App3 and App4 in the screenshot above).

When the shortcut key or the App button is pressed, the integrated application or module is displayed on the Review Validation screen as a modal window.

In the image below, the right-hand side shows the image of the documents, whereas the left-hand side shows the integrated application/module.
External Application

ExternalApplication.jpg
IMPORTANT: batch.xml must be updated by the external application. Ephesoft simply loads the batch.xml.

Configuration

Follow the steps below to integrate an external application with Ephesoft validation:

  • Assume the external application is available at http://localhost:8080/dcma/ExternalApp.html.
  • Log in to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Review/Validate Document module -> Validate Document plugin.

ReviewDocumentPluginConfiguration.jpg

Closing the external application modal window:

As the screenshot of the external application on the review-validate screen shows, earlier versions provided two buttons for closing the external application.

The OK button refreshed the review-validate screen and displayed the updated content of the batch that had been modified through the external application.

The CLOSE button simply returned to the review-validate screen without refreshing the content of the batch, assuming that the external application had made no changes to the batch.xml.

The extra clicks that the user needed for refreshing the screen or closing the pop-up dialog window have now been removed.

Both functions (refreshing the screen after batch.xml updates, or closing the pop-up window without refreshing the Review Validate screen) are now implemented by the third-party application. Ephesoft provides a handle to externally integrated applications through which both applications can communicate.

The OK and CLOSE buttons have been removed. External applications need to copy the method shown below into their code and invoke it from the respective button (OK or close) that they implement themselves. The external application signals Ephesoft to perform the respective operation by passing the appropriate operation string as the method argument. The accepted operation strings are listed in the table below.

Method code for GWT based applications:
private native void fireEvent(String operation) /*-{
    window.top.postMessage(operation, "*");
}-*/;
Method code for JavaScript based applications:
function fireEvent(operation) {
    window.top.postMessage(operation, "*");
}

The action that Ephesoft performs depends on the argument that the external application passes to this method:

Argument passed by external application | Result in Ephesoft
“Save” | The dialog box containing the external application on the review-validate screen closes and the changes made in batch.xml are reflected on the screen. (The functionality previously provided by the OK button on the dialog box.)
“Cancel” | The dialog box containing the external application on the review-validate screen closes without refreshing the RV screen. (The functionality previously provided by the CLOSE button on the dialog box.)
Any other string | No change (the dialog box does not disappear).

How to use the External Application:

The external application is expected to work with the data of the documents (i.e. the batch) currently displayed on the review validate screen, for which the external application was launched.

Ephesoft therefore appends the following two parameters to the external application’s URL, which the external application can use to fetch, modify or delete the contents of the batch:

  • Path of the batch.xml for the current batch: The batch.xml contains the information about the batch. Ephesoft provides the batch.xml path, and the external application can parse and modify the file as needed.

The parameter is specified in the URL as “batch_xml_path”.
Encoding of the batch XML path parameter:

The batch XML path is encoded using java.net.URLEncoder with UTF-8 encoding.

 

  • Document Identifier: The identifier of the document in focus is also passed to the external application. The parameter is specified in the URL as “document_id”.

Sample URL fired for an external app by Ephesoft:

{Ext. App URL}&document_id={Document Identifier}&batch_xml_path={Path of batch.xml}&ticket={Security Token}

Or

{Ext. App URL}?document_id={Document Identifier}&batch_xml_path={Path of batch.xml}&ticket={Security Token}
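
On the receiving side, an external application written in Java could read these parameters roughly as follows. This is a minimal sketch: `ExternalAppParams` is a hypothetical helper, the sample path is made up, and in a servlet container the same values would normally be available through the request object instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reads the parameters Ephesoft appends to the URL. */
final class ExternalAppParams {

    /** Splits the query part of the URL into raw (still encoded) name/value pairs. */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return params;
        }
        for (String pair : url.substring(queryStart + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Decodes batch_xml_path, which is encoded with URLEncoder and UTF-8. */
    static String batchXmlPath(Map<String, String> params) {
        String raw = params.get("batch_xml_path");
        if (raw == null) {
            return null;
        }
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative URL; the identifier, path and ticket values are invented.
        String url = "http://localhost:8080/dcma/ExternalApp.html?document_id=DOC1"
                + "&batch_xml_path=C%3A%5Cdata%5CBI1%5CBI1_batch.xml&ticket=abc123";
        Map<String, String> params = parse(url);
        System.out.println(params.get("document_id"));
        System.out.println(batchXmlPath(params));
    }
}
```

The ticket value is deliberately left undecoded here so that it can be sent back exactly as it was received.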

 

External Application and Security:

Ephesoft generates a dynamic token for every external application window that is opened from the Ephesoft application. The token is sent to the external application as an additional “ticket” parameter in the external application’s URL. Once the external application has received the token, it can call the URL below to check the authenticity of the token. (Note: send the token exactly as received by the application.)

http://{EphesoftServerIP}:{port}/dcma/authenticate?ticket={ticket}

In response, the Ephesoft server sends a status code that indicates whether the ticket is valid:

Status code 200: Authorized

Status code 401: Unauthorized

The token is issued as soon as the user opens the external application window. A valid token becomes invalid when:

  • The token has already been sent to the Ephesoft server for authentication.
  • One hour has passed since the token was issued.
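
A Java-based external application could perform this check with the JDK’s `HttpURLConnection`, as sketched below. `TicketCheck` is a hypothetical class, and the use of an HTTP GET request is an assumption, since the documentation only gives the URL. Because a token becomes invalid after its first authentication, the result should be checked once and remembered by the caller.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper that validates a ticket against the Ephesoft server. */
final class TicketCheck {

    /** Builds the authentication URL; the ticket is appended exactly as received. */
    static String authUrl(String server, int port, String ticket) {
        return "http://" + server + ":" + port + "/dcma/authenticate?ticket=" + ticket;
    }

    /** 200 means authorized; 401 (or any other code) is treated as not authorized. */
    static boolean isAuthorized(int statusCode) {
        return statusCode == 200;
    }

    /** Calls the server once. Assumption: the endpoint accepts a GET request. */
    static boolean verify(String server, int port, String ticket) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(authUrl(server, port, ticket)).openConnection();
        try {
            connection.setRequestMethod("GET");
            return isAuthorized(connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only prints the URL; verify(...) needs a running Ephesoft server.
        System.out.println(authUrl("localhost", 8080, "abc123"));
    }
}
```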

Configuring the Title of External Applications through the admin UI

Ephesoft no longer shows the application URL in the title of the external application window. The title is now configurable through the Admin UI.

 

Learning

Overview

A well-formed set of HOCR XML files, placed in a hierarchical structure (Batch Class > Document type > Page type), is used to register a few standard HOCR XML documents with the Lucene search engine. This process is called learning because the XML files are fed into Lucene so that, when a batch instance arrives, it can be compared with these memorized documents to find the best match. Learning is a one-time process for all batch instances and helps speed up processing.

Steps of learning

  • First create the document type that Ephesoft has to recognize. Suppose the user has created the HUD-1 document type in batch class BC1.
  • Edit BC1 and click the ‘Generate Folder’ button.

LearningGenerateFolder.jpg

  • Browse to the “Ephesoft-install-dir\SharedFolders\BC1\lucene-search-classification-sample” folder. It contains the following three subfolders:
    • HUD-1_First_Page
    • HUD-1_Last_Page
    • HUD-1_Middle_Page

LearningSharedFolders.jpg
The first and last pages of the document go in HUD-1_First_Page and HUD-1_Last_Page respectively, and all other pages of the document go in HUD-1_Middle_Page.

In the provided sample, image 000001 is the first page and image 000002 is the last page of the document type HUD-1. All other pages belong to different document types. The sample does not have middle pages for the HUD-1 document type.

 

  • Click the Learn Files button.

Learning LearnFiles.jpg

The Ephesoft software is now ready and has learned the document type HUD-1.
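
The folder convention used in the steps above can be expressed as a small function, which may help when sample pages are sorted into the folders by a script. `LearningFolders` is a hypothetical helper, and placing a single-page document in the First_Page folder is an assumption that the documentation does not confirm.

```java
/** Hypothetical helper that picks the learning subfolder for a sample page. */
final class LearningFolders {

    /**
     * Returns the subfolder name for the page at pageIndex (0-based) of a sample
     * document with pageCount pages.
     */
    static String folderFor(String documentType, int pageIndex, int pageCount) {
        if (pageCount < 1 || pageIndex < 0 || pageIndex >= pageCount) {
            throw new IllegalArgumentException("pageIndex must be within the document");
        }
        if (pageIndex == 0) {
            // Assumption: a single-page document is treated as a first page.
            return documentType + "_First_Page";
        }
        if (pageIndex == pageCount - 1) {
            return documentType + "_Last_Page";
        }
        return documentType + "_Middle_Page";
    }

    public static void main(String[] args) {
        // The two-page HUD-1 sample: first page, then last page.
        System.out.println(folderFor("HUD-1", 0, 2));
        System.out.println(folderFor("HUD-1", 1, 2));
    }
}
```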

Troubleshooting

The following are a few common error messages that appear when learning is not working correctly:

 

S no. | Error message | Possible root cause
1 | Problem occurred while learning/Problem learning files. | Network connection failure; multiple networks connected to the system (e.g. LAN and WLAN connected at the same time); license is not installed or is invalid; Tomcat is not up.

 

Multi word data population in a DLF

Overview

This functionality allows a user to select multiple words from the image on the Validation screen and populate them into the selected DLF. It works like common word processing applications, where Control is used to select multiple values at once and Shift is used to select a range of values.

Characteristics

  • Control functionality allows the user to select multiple words from distant places in the image.
  • Shift functionality allows the user to select the entire text occurring between two selected words.
  • The user can also define a rectangular area over the 3rd panel image using the right mouse button; the entire text inside or overlapping the defined area is populated in the selected DLF.
  • This functionality is helpful when the data for a particular field is split across several parts of the image.
  • The functionality works like common word processing applications.

Steps of execution

  • When a user opens a batch on the Validate screen, any of the methods defined above can be used to populate multiple words into the selected DLF.
  • To do this, the user first selects the DLF (Document Level Field) into which the data is to be populated. After that, the user can use any of the 3 options:
    • Control functionality:

The user presses the Ctrl key first and then selects multiple values from distant places in the image. The selected values are populated in the selected DLF (see screenshot).

MultiwordDataInDLF.jpg

In the above example, the user first presses the Ctrl key and then clicks ‘Field1’ and then ‘456’ with the mouse. The resulting data is populated in the selected DLF.

Note: Only data selected while the Ctrl key is held down is populated in the DLF. Once the user releases the Ctrl key, data selected later (even after pressing the Ctrl key again) is not concatenated to the existing data.

    • Shift functionality:

The user presses the Shift key first and then selects 2 values from distant places in the image. All data occurring between the selected values is populated in the selected DLF (see screenshot).

MultiwordInDLF ShiftFunctionality.jpg

In the above example, the user first presses the Shift key and then clicks ‘Field1’ and then ‘456’ with the mouse. The resulting data is populated in the selected DLF.

    • Area selection:

To populate all the data occurring inside a particular area, the user can draw a rectangle over the 3rd panel image. All overlapping data will be populated in the selected DLF, as displayed in the screenshot.

MultiwordInDLF AreaSelection.jpg
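
The three selection modes can be compared with a small sketch. Everything here is illustrative and not Ephesoft's implementation: the `Word` class, the coordinates, the intermediate word ‘123’, the space used to join words, and the assumption that Shift includes both clicked words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def ctrl_select(words, clicked):
    """Ctrl: join only the clicked words, in click order."""
    return " ".join(words[i].text for i in clicked)


def shift_select(words, first, second):
    """Shift: join every word between the two clicked words (inclusive here)."""
    lo, hi = sorted((first, second))
    return " ".join(w.text for w in words[lo:hi + 1])


def area_select(words, x0, y0, x1, y1):
    """Area: join every word lying inside or overlapping the rectangle."""
    def overlaps(w):
        return w.x0 <= x1 and w.x1 >= x0 and w.y0 <= y1 and w.y1 >= y0
    return " ".join(w.text for w in words if overlaps(w))


words = [
    Word("Field1", 0, 0, 50, 10),
    Word("123", 60, 0, 90, 10),
    Word("456", 100, 0, 130, 10),
]
print(ctrl_select(words, [0, 2]))         # Field1 456
print(shift_select(words, 0, 2))          # Field1 123 456
print(area_select(words, 55, 0, 95, 10))  # 123
```

Clicking ‘Field1’ and ‘456’ gives different results depending on the key held: Ctrl keeps only the two clicked words, while Shift also picks up what lies between them.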

Performance Reporting

Overview

This standalone module enables the user to generate execution reports on the basis of batch class, user, etc. Reports can be calculated per module, per plugin or per user. There are two ways of generating reports:

  • Through UI
  • Manually through scripts

The module aggregates the report data on the basis of the parameters chosen by the user.

The admin has the option of generating reports for a module, a plugin or all users.

Admin can:

  • Get reports per page for a Workflow Type for a specified time.
  • Get reports per page for a User for a specified time.
  • Get total records for a Workflow Type for a specified time.
  • Get total records for a User for a specified time.

Configuration

Property File Configurations

Property file: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-data-access/dcma-db.properties’:

Configurable property | Type of value | Value options | Description
hibernate.connection.username | String | Root | Database’s username
hibernate.connection.password | String | **** | Database’s password
hibernate.connection.url | String | jdbc:mysql://localhost:3306/report | DBMS-specific connection URL
hibernate.connection.driver_class | String | com.mysql.jdbc.Driver | DBMS-specific driver class
hibernate.show_sql | Multi select | True/False | Whether SQL commands are written to the log file
hibernate.dialect | String | Hibernate dialect class name | DBMS-specific query dialect

Property File: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-performance-reporting/dcma-report-db.properties’

This property file needs to be configured to connect to the reports database. By default, it points to the reports database created by Ephesoft. If the user wants to use a different database, this property file needs to be configured accordingly:

Configurable property | Type of value | Value options | Description
hibernate.connection.password | String | NA | Password of reports database.
hibernate.connection.username | String | NA | Username of reports database.
hibernate.connection.url | String | NA | Connection string for reports database. Example: jdbc:mysql://localhost:3306/reports
hibernate.connection.driver_class | String | For MySQL: com.mysql.jdbc.Driver. For mssql: jdbc:jtds:sqlserver://localhost;databaseName=reports;user=ephesoft;password=Password## | Driver required for connecting to the database. Example: for MySQL, it should be set to com.mysql.jdbc.Driver
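
Before restarting the application, it can help to confirm that all four properties from the table are present. The following is a minimal sketch and not an Ephesoft tool: the parser handles only plain `key=value` lines, and the sample values (including the username `ephesoft`) are placeholders.

```python
REQUIRED_KEYS = (
    "hibernate.connection.username",
    "hibernate.connection.password",
    "hibernate.connection.url",
    "hibernate.connection.driver_class",
)


def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not props.get(k)]


# Placeholder content standing in for dcma-report-db.properties.
sample = """
hibernate.connection.username=ephesoft
hibernate.connection.url=jdbc:mysql://localhost:3306/reports
hibernate.connection.driver_class=com.mysql.jdbc.Driver
"""
print(missing_keys(parse_properties(sample)))
```

For the sample above, the check reports that the password entry is missing.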

Property file: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/application.properties’:

Configurable property | Type of value | Value options | Description
report.ant.buildfile.path | String | {Report}\ephesoft-reporting\build.xml | This property defines the absolute path of the build.xml file that is bundled with the standalone Java program for reports.
enable.reporting | String | True / False | Whether the reports UI is displayed to the user. If set to True, the reports UI is displayed; otherwise it is not.

Reports generation

Reports can be generated either through the user interface provided for reporting or by executing the reporting scripts manually.

Reports generation from UI

To run the Ephesoft reporting UI, the user needs to click the syncDB button.

Select an option from modules, plugin and user, choose the start and end dates, and then click GO.

Reporting data will be displayed in tabular format.
ReportsGeneration.jpg

Manual reports generation through scripts

To run the standalone Java program that loads the report data, perform the following steps for a first-time installation:

  • The application needs a new database to load the report data. Run the “init-data.sql” script found at “{Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\init-data.sql”.
  • Change the database username and password in these files:
    • {Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\hibernate.cfg.xml. This will point to the new database “reports” just created.
    • {Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\hibernate-dcma.cfg.xml. This will point to the existing Ephesoft database.
  • Set the environment variable ANT_HOME and the corresponding CLASSPATH. This is not required if the user runs ant from the {ANT_HOME}/bin/* directory.

Perform the following steps every time the scripts need to be run:

  • Navigate to the installation directory. Run “ANT” with one of these targets:
    • ANT start-report-generator
      • This starts the scheduler-based service. This is the default behavior of the script.
    • ANT stop-report-generator
      • This stops the scheduler-based service if it is already started.
    • ANT manual-report-generator
      • This runs the specified service just once and exits.
  • The ANT command window must not be closed if the scheduled service needs to run in the background. If it is closed, the scheduler service is stopped. To start it again, delete the directory “C:\ephesoft-data\report-data\lock” and then invoke the ant start target.
  • The scheduler service is scheduled to run at 1 a.m. every day by default.

The user can change this configuration in the file “{Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\dcma-report-scheduling.properties”.
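
If the targets need to be launched from another script rather than typed by hand, the command line can be assembled as sketched below. The wrapper is hypothetical: the function name, the short action names and the use of Ant's standard `-f` option (instead of changing into the installation directory first) are assumptions; only the three target names come from the steps above.

```python
import os

# Target names as listed in the steps above.
TARGETS = {
    "start": "start-report-generator",
    "stop": "stop-report-generator",
    "manual": "manual-report-generator",
}


def ant_command(action, report_dir):
    """Build the argument list for running one reporting target.

    report_dir stands for the ephesoft-reporting directory under {Report};
    its real location depends on the installation.
    """
    if action not in TARGETS:
        raise ValueError(f"unknown action: {action!r}")
    buildfile = os.path.join(report_dir, "build.xml")
    return ["ant", "-f", buildfile, TARGETS[action]]


print(ant_command("manual", "ephesoft-reporting"))
```

The returned list can be passed to `subprocess.run(..., check=True)`. Note that a scheduler started this way still stops when the launching process or window is closed, as described above.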

Read Only Document Level Fields

Overview

When a batch class owner needs to extract a document level field but does not want its extracted value to change at any stage, and does not want any user (review/validation user) to change its value, the owner can make that field ‘read-only’.

  • By making a field read-only, the batch class admin makes its value non-modifiable at every stage.
  • When a field is set as read-only, no regular expression can be applied to that field. KV-Extraction and advanced KV-Extraction rules can be applied to read-only fields just as for any regular field.
  • This feature is useful for document level fields that do not require user intervention at the validation stage, i.e. fields whose values are obvious and do not need to be changed.

Setting the read-only flag

The user can make a field read-only by selecting the ‘isReadonly’ checkbox, which is displayed when editing a document level field, as shown in the screenshot below:

ReadOnlyDLF.jpg
After the read-only attribute is selected, the selected fields are non-editable on the Review Validate screen, as shown in the screenshot below (Invoice Date and State are non-editable):

ReadOnlyDLF RVScreen.jpg

 

Recostar Design Studio

Overview

Ephesoft allows extraction of fixed form documents that contain zonal Barcode, OCR, ICR and OMR fields. Fixed forms can be configured with the following steps:

  • Create/Edit Batch class.
  • Create/Edit Document Type. Please note that the fixed form naming convention does not allow spaces in the form (Document Type) name. Form names should not start with numeric values either.
  • Create/Edit Index fields. Please note that fixed form processing does not allow spaces in the field name. Field names should not start with numeric values.
  • Design a Recostar Project file, .RSP.
  • Copy the RSP file into the batch class folder, \\SharedFolders\{Batch Class}\recostar-extraction.
  • Assign the RSP file to the document type using the Ephesoft Admin Module (Edit Document Type).
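
The two naming rules for forms and index fields (no spaces, no leading digit) can be checked up front with a short sketch. The function name and the sample names are hypothetical, and treating any whitespace character as a "space" is an assumption.

```python
import re

# No whitespace anywhere, and the first character must not be a digit.
_NAME_RULE = re.compile(r"(?!\d)\S+")


def is_valid_fixed_form_name(name):
    """Check a document type or index field name against the stated rules."""
    return _NAME_RULE.fullmatch(name) is not None


for candidate in ("BOE", "Year", "Tax Form", "1099Form"):
    print(candidate, is_valid_fixed_form_name(candidate))
```

Here ‘BOE’ and ‘Year’ pass, while ‘Tax Form’ (contains a space) and ‘1099Form’ (starts with a digit) are rejected.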

Recostar Design Studio

RecoStarDesignStudio.exe can be found in “{Ephesoft-Home}\native\RecostarPlugin\RecoStarDesignStudio” in the default configuration. This tool allows the Ephesoft admin to define the zonal areas for OCR, ICR or OMR field extraction.

Steps of execution

Please follow the steps below. Here, a tax return form called BOE is used. The same techniques can be applied to any fixed form application.

Sample files covered here can be downloaded.

  • Batch Class
  • Recostar Project File definition
  • Launch RecoStarDesignStudio.exe

RecostarDesignStudio Run.jpg

  • Select New Project

RecostarDesignStudio SelectNewProject.jpg

 

  • Select Single Form

RecostarDesignStudio SelectSingleForm.jpg

  • Give the project a name and select a project location. In this example, BOE is used as the project name and the project is saved in the C:\Fixed Form Projects folder. The user can save it to any location. Later, these files will be moved to the batch class folder.

RecostarDesignStudio NewProjectFileName.jpg

  • Project name as BOE is seen below.

RecostarDesignStudio ProjectName.jpg

  • Select some sample images so that zones can be drawn.

RecostarDesignStudio DrawZones.jpg

 

  • Right click on the image area and select the desired images by using “Add Files”

RecostarDesignStudio AddFiles.jpg

  • In this example, one image is selected.

RecostarDesignStudio SelectFile.jpg

 

  • Once the image is selected, click “Next”.

RecostarDesignStudio WorkingImageFiles.jpg

  • In this step, select the country the user is operating in. Multiple countries can be selected. Once the user clicks on USA, which is the default country, a menu option (…) appears where more countries can be selected.

RecostarDesignStudio MandatoryParameter.jpg

 

  • Project file can now be created with one form and one ICR field. Click Finish to proceed.

RecostarDesignStudio ReadyToInstall.jpg

  • Initial Project has been created.

RecostarDesignStudio InitialProject.jpg

 

  • Rename the form to the doc type name, BOE. This has to match the document type in the Ephesoft Admin module.

RecostarDesignStudio RenameForm.jpg

  • Rename index field to Year.

RecostarDesignStudio RenameIndexField.jpg

 

  • Re-Arrange zone

RecostarDesignStudio RearrangeZone.jpg

  • Field is now renamed to Year

RecostarDesignStudio FieldRenamed.jpg

 

  • Zoom into the image

RecostarDesignStudio ZoomImage.jpg

  • After zooming in, arrange the zone so it covers only the value 2006

RecostarDesignStudio ArrangeZones.jpg

 

  • Add “Remove Lines” options

RecostarDesignStudio AddRemoveLines.jpg

  • Remove Lines option is added

RecostarDesignStudio RemoveLinesAdded.jpg

 

  • Add new field called Account Number

RecostarDesignStudio AddFieldAccountNumber.jpg

  • New field has been added

RecostarDesignStudio AccountNumberFieldAdded.jpg

 

  • Zoom into the new field

RecostarDesignStudio ZoomIntoAccountNumberField.jpg

  • Adjust field Location/Zone

RecostarDesignStudio AddLocationField.jpg

 

  • Set field properties, e.g. Font should be Machine Type.

RecostarDesignStudio SetFieldProperties.jpg

  • Run selected images to see the results

RecostarDesignStudio RunSelectedImages.jpg

 

  • View Results.

RecostarDesignStudio ViewResults.jpg

  • Copy the RSP file into the batch class folder, <Ephesoft-Shared-Folder>\{Batch Class}\recostar-extraction. Please note that the BOE.rsp file needs to be directly in the <Ephesoft-Shared-Folder>\{Batch Class}\recostar-extraction folder. There should not be another folder layer.
  • Map the RSP file to the Document Type. Log on to the Ephesoft Admin Module, Batch Class Management. Navigate to the Document Type BOE and select the BOE.rsp file from the Fixed Form Project File drop-down menu. If the BOE RSP file cannot be seen, check that an extra folder layer was not copied in the previous step.

RecostarDesignStudio MapRSPToDocumentType.jpg

  • Save the batch class and run documents through the system.

Review Document

Overview

This document defines the operations that can be performed on a batch in review state. During this stage, the user can perform various operations on the batch such as classifying, splitting, copying and deleting documents. The document also explains the plugin properties that can be set for batches in review state. Whenever a batch comes to review state, its status is changed to “Ready for Review” and it needs to be reviewed manually by the user if it is not reviewed automatically (i.e. its confidence score is less than the specified threshold). After the review, batch processing continues until it reaches the validation stage.

To access a batch in review state, the user needs to open the URL http://localhost:8080/dcma/BatchList.html, click on the Review sub tab and then click on a batch displayed in the grid.

BatchList ReviewSubTab.jpg

This will open the batch on the Review screen (see the screenshot below).

BatchList BatchDetailTab.jpg

Configuration

Please follow the below steps to set Review plugin properties:

  • Login to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Review Document module -> Review Document plugin.

BatchClassManagement ReviewDocumentPlugin.jpg

Properties:

 

Configurable property | Type of value | Value options | Description
External Application Switch | List of values | ON / OFF | This field is used to develop external modules or applications and integrate them to work together with Ephesoft.
X Dimension | Integer | Integer value | To specify the x-dimension of the external application in pixels.
Y Dimension | Integer | Integer value | To specify the y-dimension of the external application in pixels.
URL1 Title, URL2 Title, URL3 Title and URL4 Title | String | N-A | These properties hold titles for the external application.
URL1 (Ctrl+4), URL2 (Ctrl+7), URL3 (Ctrl+8) and URL4 (Ctrl+9) | String | N-A | To fire the specified External Application for a batch on the Review Validate UI. The URL of the external application is specified here; it can be accessed via shortcut keys (Ctrl+4, etc.) as well as by pressing the defined buttons (App 1, App4, App2, App 3, as can be seen in the UI below).

Review screen with External Application switch ON will look something like this:

BatchDetail ExternalAppSwitchON.jpg

Features List

There are three panels in the review screen.

  • Left-most panel or 1st panel: shows the document tree containing all classified and unclassified Ephesoft documents in a batch.
  • Middle panel or 2nd panel: contains the review panel and the thumbnail images of the next and previous document pages. The review panel contains the list of document types and the list of documents available for merging. Thumbnail images of the previous and following documents, relative to the currently selected document, are shown.
  • Right-most panel or 3rd panel: shows the enlarged image of the selected document.

BatchDetail ThreePanels.jpg
In the document tree, there are classified as well as unclassified documents. Classified documents are marked by a green tick at the top right. Unclassified documents are marked by a red question mark at the top right.

Clicking on shortcuts will open a table of shortcuts for operations like saving, splitting, merging, deleting the document etc.

BatchDetail KeyboardShortcuts.jpg
The top-most-panel contains the buttons/shortcuts for splitting, deleting, rotating the document, etc.

BatchList TopPanel.jpg

 

Fresh Installation Steps

Overview

This document describes the step-by-step procedure for installing Ephesoft on a machine. It should be referenced when the user is installing Ephesoft through the Ephesoft installer setup for the very first time.

 

Steps of execution

Following are the steps for fresh installation with Installer version 3.0.3.4 or later:

 

  • Run the command prompt as administrator and execute the command shown in the screenshot below:

Fresh install cmd.png
In this command, “D:\Ephesoft_3.0.3.4.msi” is the path of the Ephesoft installer setup on the user’s system.

 

  • The above command will start the Ephesoft installer setup on the machine, and the screen below will be displayed:

Fresh install welcome screen.png

 

  • On clicking ‘Next’ button, following screen will be displayed to user:

Fresh install license agreement.png

 

  • Accept Ephesoft end user agreement by clicking check box on UI and then click ‘Next’ button. Following screen will be displayed:

Fresh install dotnet.png
If .NET Framework 4.0 is not installed on the machine, the ‘Next’ button remains disabled and a ‘Download’ button appears on the UI. Clicking this button opens the web link from which .NET Framework 4.0 can be downloaded. Download and install .NET Framework 4.0 and then re-run the Ephesoft installer setup.
If .NET Framework is already installed on the machine, the above screen appears with the ‘Next’ button enabled. Simply click ‘Next’ in this case.

 

  • After clicking the ‘Next’ button on the .NET Framework installation screen, the following screen will be displayed:

Fresh install prerequisites.png
If C++ redistributables are not installed on the system, the installer setup will first install them and then enable the ‘Next’ button. Now click the ‘Next’ button.

 

  • After clicking ‘Next’ button on Ephesoft Prerequisites Installation screen, following screen will be displayed to user:

Fresh install select database.png
    • Select radio button 1 to either install a new instance of MySQL server or configure an existing MySQL installation.
The installer will just update the properties file with the MySQL server configuration information but will not create the Application and report databases on the MySQL server. Run {Ephesoft-install-directory}\Dependencies\MySQLSetup\ephesoft-mysql-config.sql manually on the remote or local MySQL server.
Fresh install select database mysql.png
Following are the screens through which the user can install a new MySQL server instance on the machine:
Fresh install select install mysql.png
Fresh install install mysql config.png
Following are the screens through which the user can configure an existing MySQL server:
Fresh install select configure mysql.png
Fresh install configure mysql config.png
Please enter all server configuration information correctly, as the installer will use this information in the properties files. Make sure the DB names are unique.
    • Select radio button 2 to either install a new instance of MS SQL server or configure an existing MS SQL installation (local/remote).
If MS SQL server is installed on the local machine, the installer can configure a local or remote MS SQL server. If MS SQL server is not installed on the local machine and the user wants to configure a remote MS SQL server instance, the user has to create the Application and report databases manually. Run {Ephesoft-Home}\Dependencies\MsSQLSetup\ephesoft-mssql-config.sql manually on the remote MS SQL server.
Fresh install select database mssql.png

 

  • Following are the screens through which the user can install a new MS SQL server instance on the machine:

Fresh install install or configure mssql.png
Fresh install install mssql config.png
When the user clicks the ‘Next’ button, the Ephesoft installer setup will start the MS SQL installation in silent mode.
Following are the screens through which the user can configure an existing MS SQL server:
Fresh install configure mssql.png
Fresh install configure mssql config.png
Please enter all server configuration information correctly, as the installer will use this information in the properties files. Make sure the DB names are unique.

 

  • After database configuration or installation, the following screen will be displayed:

Fresh install registration info.png
Fill all the information and then click on ‘Next’ button.

 

  • After the Ephesoft Registration Information screen, the following screen will be displayed:

Fresh install shared folder config.png
    • Select radio button 1 if the user is not creating a multi-server environment or does not have an existing shared folder. Selecting this radio button will also install the shared folder along with the application setup.
Fresh install shared folder no.png
Fresh install destination folder.png
    • Select radio button 2 if the user is creating a multi-server environment or has an existing shared folder. Selecting this radio button will not install the shared folder. In this case, the shared folder path is the path of the parent directory of the SharedFolders directory. For example, if the existing SharedFolders directory is inside a folder named share, and this folder is shared on a system named EPHESOFT, then the shared folder path will be \\EPHESOFT\share, not \\EPHESOFT\share\SharedFolders.
Fresh install shared folder yes.png
Fresh install shared destination folder.png

 

  • After completing all these steps, the screen below will be displayed:

Fresh install ready to install.png
Click the Install button and the installer will do the rest of the work.
Fresh install read me.png
Fresh install finish.png

 

  • After complete installation, please restart the machine.

User Management

Overview

This module is responsible for handling the user’s connectivity to the application. It handles authentication as well as authorization process for the user.

Configuration

Login configuration

For a user to log in to Ephesoft, the “server.xml” file located in the {Ephesoft-Home}\JavaAppServer\conf folder needs to be configured.

The admin configures a tag named “Realm” in server.xml. The tag is located within the following structure:

<Server>
  <Service>
    <Engine>
      <Host>
        <Context>
          <Realm />
        </Context>
      </Host>
    </Engine>
  </Service>
</Server>
The realm tag has many configurable parameters. The use and need of these parameters depends upon the type of authentication server used by the user.
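
To confirm that a Realm sits at the expected place in server.xml, the nesting can be checked with a few lines of standard-library XML parsing. This is a sketch under assumptions: the embedded XML is a trimmed stand-in for a real server.xml, and the `className` value is just an example.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for server.xml; a real file has many more attributes.
SERVER_XML = """
<Server>
  <Service>
    <Engine>
      <Host>
        <Context>
          <Realm className="org.apache.catalina.realm.JNDIRealm"/>
        </Context>
      </Host>
    </Engine>
  </Service>
</Server>
"""

root = ET.fromstring(SERVER_XML)
# The root element is <Server>, so the path starts at its child <Service>.
realm = root.find("Service/Engine/Host/Context/Realm")
print(realm.get("className") if realm is not None else "no Realm found")
```

For a real installation, `ET.parse(path).getroot()` can replace `ET.fromstring(...)`, with `path` pointing at the server.xml file mentioned above.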

Various implementations can be configured at once. Please refer to the Tomcat Realms documentation (Standard Realm Implementations) to configure the Realms according to your requirements.

The commonly used realm configurations are:

When a user tries to log in to the application, the username and password are verified against the specified authentication server using the configured properties.

Ephesoft user roles handling

Ephesoft, on the basis of the roles of the user logged in to the application, decides the following:

  • Batch classes the user will be allowed to view on the batch class management view.
  • Batch instances the user will be allowed to view on the batch instance management view.
  • Folders the user is allowed to view on the folder management view.
  • Scanner profiles and other configurations on the web scanner view.

The user roles for the logged-in user will be verified against the authentication server configured in the property file {Ephesoft-Home}\Application\WEB-INF\classes\META-INF\dcma-user-connectivity\user-connectivity.properties.

The following is the list of configurable properties for this properties file:

 

  • LDAP configurable properties

 

Configurable property | Type of value | Value options | Description
user.ldap_url | String | A valid URL to connect to LDAP server. | The connection URL for LDAP type configuration in the “ldap://<server_address>:<port_number>” format.
user.ldap_config | String | N-A | Class name for the LDAP context factory.
user.ldap_domain_component_name | String | N-A | The domain component name for the LDAP configuration.
user.ldap_domain_component_organization | String | N-A | The domain component organization name for the LDAP configuration.
user.ldap_username | String | A valid username to connect and access LDAP server. | The username of the user responsible for interacting with the server. Only required if LDAP is configured.
user.ldap_password | String | A valid password to connect and access LDAP server. | The password of the user responsible for interacting with the server. Only required if LDAP is configured.
user.ldap_user_base | String | N-A | The relative path under which all the users information will be located. This path will be relative to the domain components specified by the user.
user.ldap_group_base | String | N-A | The relative path under which all the groups/roles information will be located. This path will be relative to the domain components specified by the user.
  • MS-Active Directory configuration

 

Configurable property | Type of value | Value options | Description
user.msactivedirectory_url | String | A valid URL to connect to Active directory server. | The connection URL for msactivedirectory type configuration in the “ldap://<server_address>:<port_number>” format.
user.msactivedirectory_config | String | N-A | Class name for the user-connectivity configuration.
user.msactivedirectory_context_path | String | N-A | The directory path where the intended user resides.
user.msactivedirectory_domain_component_name | String | N-A | The domain component name for the msactivedirectory type configuration.
user.msactivedirectory_domain_component_organization | String | N-A | The domain component organization name for the msactivedirectory type configuration.
user.msactivedirectory_user_name | String | A valid username to connect and access Active directory server. | The username of the user responsible for interacting with the server. Only required if Active Directory is configured.
user.msactivedirectory_password | String | A valid password to connect and access Active directory server. | The password of the user responsible for interacting with the server. Only required if Active Directory is configured.
user.msactivedirectory_group_search_filter | String | N-A | The group search filter. It can use the operators |(OR), &(AND) and !(NOT), e.g. ((!(cn=a*))(|(cn=ephesoft*)(&(cn=b*)))
  • Tomcat specific configuration

 

Configurable property | Type of value | Value options | Description
user.tomcatUserXmlPath | String | N-A | The directory path where the Tomcat configuration XML file resides.
  • Connection choosing configuration

 

Configurable property | Type of value | Value options | Description
user.connection | List of values | 0 / 1 / 2 | The type of connection the user wants for the application: 0 for LDAP, 1 for MS Active Directory, 2 for Tomcat.
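
The meaning of `user.connection` can be summarized as a lookup. This is an illustrative sketch only: values 0 (LDAP) and 1 (MS Active Directory) match the example property files later in this section, while 2 for Tomcat is inferred from the order of the options; the function name is hypothetical.

```python
# 0 and 1 match the example property files; 2 is inferred from the option order.
CONNECTION_TYPES = {"0": "LDAP", "1": "MS Active Directory", "2": "Tomcat"}


def connection_type(properties_text):
    """Return the connection type named by user.connection."""
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "user.connection":
            return CONNECTION_TYPES[value.strip()]
    raise KeyError("user.connection is not set")


print(connection_type("user.connection=0"))  # LDAP
```

Only the property group matching the selected value (LDAP, Active Directory or Tomcat) then needs to be filled in.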

 

Examples

LDAP

Realm

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://localhost:389"
       connectionName="cn=Manager,dc=ephesoft,dc=com"
       connectionPassword="********"
       userPattern="cn={0},ou=people,dc=ephesoft,dc=com"
       roleBase="ou=groups,dc=ephesoft,dc=com" roleName="cn"
       roleSearch="uniqueMember={0}"/>

user-connectivity.properties

  • user.ldap_url=ldap://localhost:389
  • user.ldap_config=com.sun.jndi.ldap.LdapCtxFactory
  • user.ldap_domain_component_name=ephesoft
  • user.ldap_domain_component_organization=com
  • user.ldap_username=cn=Manager,dc=ephesoft,dc=com
  • user.ldap_password=*******
  • user.ldap_user_base=ou=people
  • user.ldap_group_base=ou=groups
  • user.connection=0
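
The relationship between the relative bases and the domain components can be illustrated as follows. How Ephesoft combines these values internally is an assumption here; the sketch only shows that joining them reproduces the distinguished names used in the Realm example (`ou=people,dc=ephesoft,dc=com` and `ou=groups,dc=ephesoft,dc=com`). The function names are hypothetical.

```python
def base_dn(props):
    """Join the two domain components, e.g. dc=ephesoft,dc=com."""
    return "dc={},dc={}".format(
        props["user.ldap_domain_component_name"],
        props["user.ldap_domain_component_organization"],
    )


def full_dn(relative_base, props):
    """Resolve a relative base such as ou=people against the domain components."""
    return f"{relative_base},{base_dn(props)}"


props = {
    "user.ldap_domain_component_name": "ephesoft",
    "user.ldap_domain_component_organization": "com",
    "user.ldap_user_base": "ou=people",
    "user.ldap_group_base": "ou=groups",
}
print(full_dn(props["user.ldap_user_base"], props))   # ou=people,dc=ephesoft,dc=com
print(full_dn(props["user.ldap_group_base"], props))  # ou=groups,dc=ephesoft,dc=com
```

Keeping the properties file and the Realm attributes consistent in this way helps ensure that authentication and role lookup address the same directory entries.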

MS-Active Directory

Realm

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://localhost:389"
       connectionName="administrator@ephesoft.com"
       connectionPassword="********"
       userBase="cn=Users,DC=ephesoft,DC=com"
       userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
       userSubtree="true"
       roleBase="cn=Users,DC=ephesoft,DC=com"
       roleName="cn"
       roleSubtree="true"
       roleSearch="member={0}" referrals="follow" />

user-connectivity.properties

  • user.msactivedirectory_url=ldap://172.16.0.191:389
  • user.msactivedirectory_config=com.sun.jndi.ldap.LdapCtxFactory
  • user.msactivedirectory_context_path=CN=Users
  • user.msactivedirectory_domain_component_name=ephesoft
  • user.msactivedirectory_domain_component_organization=com
  • user.msactivedirectory_user_name=CN=Administrator,CN=Users,DC=ephesoft,DC=com
  • user.msactivedirectory_password=*******
  • user.connection=1 (for fetching group and user from active directory)

Multiple realm example

<Realm className="org.apache.catalina.realm.CombinedRealm">
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="cn=Users,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true"
         roleBase="cn=Users,DC=ephesoft,DC=com"
         roleName="cn" roleSubtree="true"
         roleSearch="member={0}" referrals="follow" />
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="ou=test1,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true"
         roleBase="ou=test1,DC=ephesoft,DC=com" roleName="cn"
         roleSubtree="true" roleSearch="member={0}" referrals="follow" />
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="ou=test,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true" roleBase="ou=test,DC=ephesoft,DC=com"
         roleName="cn" roleSubtree="true"
         roleSearch="member={0}" referrals="follow" />
</Realm>

 

Global realm example

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://172.16.1.68:3268"
       connectionName="administrator@ephesoft.com"
       connectionPassword="********"
       userBase="DC=ephesoft,DC=com"
       userSearch="(sAMAccountName={0})"
       userSubtree="true" roleBase="ou=test,DC=ephesoft,DC=com"
       roleName="cn" roleSubtree="true"
       roleSearch="member={0}" referrals="follow" />

 

Validate Document

Overview

This document defines the operations that can be performed on a batch in validation state. During this stage, the user can perform various operations on the batch such as classifying, splitting, copying and deleting documents, and can also change the values of the extracted document level fields. The document also explains the plug-in properties that should be set for batches in validation state. With these properties, Ephesoft provides a fuzzy search option, a suggestion box facility, and the ability to develop external modules or applications and integrate them with Ephesoft. Whenever a batch comes to validate state, its status is changed to “Ready for Validation” and it needs to be validated manually by the user if it is not validated automatically.

Below is a screenshot of the BatchList page, which contains a tab listing all the batches in the “READY_FOR_VALIDATION” state.

BatchList ValidationSubTab.jpg

Configuration

Please follow the below steps to set the validation plug-in properties:

  • Login to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Validate Document module -> Validate Document plugin.

BatchClassManagement ValidateDocumentPlugin scrollup.jpg

BatchClassManagement ValidateDocumentPlugin scrolldown.jpg
Properties:

 

Configurable property | Type of value | Value options | Description
Field Value Change Script Switch | List of values | ON, OFF | If the switch is enabled, the field value change script runs every time the field values are changed. Default OFF.
Fuzzy Search Switch | List of values | ON, OFF | If the switch is enabled, the fuzzy search facility is enabled. Default ON.
Suggestion box Switch | List of values | ON, OFF | If the switch is enabled, suggestions for alternate values for document level fields are available. Default OFF.
External Application Switch | List of values | ON, OFF | This field is used to develop external applications and integrate them to work together with Ephesoft. Default OFF.
Fuzzy Pop Up X Dimension (in px) | Integer | Integer value | Specifies the x-dimension of the fuzzy search result pop-up in pixels.
Fuzzy Pop Up Y Dimension (in px) | Integer | Integer value | Specifies the y-dimension of the fuzzy search result pop-up in pixels.
Validation Script Switch | List of values | ON, OFF | If the switch is enabled, the specified script runs whenever the batch in validation state is saved. Default OFF.
External Application X Dimension (in px) | Integer | Integer value | Specifies the x-dimension of the external application in pixels.
External Application Y Dimension (in px) | Integer | Integer value | Specifies the y-dimension of the external application in pixels.
URL1 Title, URL2 Title, URL3 Title and URL4 Title | String | N-A | These properties hold titles for the external application.
URL1 (Ctrl+4), URL2 (Ctrl+7), URL3 (Ctrl+8) and URL4 (Ctrl+9) | String | N-A | Fires the specified external application for a batch on the Review Validate UI. The URL of the external application is specified here; it can be accessed via the shortcut keys (Ctrl+4, etc.) as well as by pressing the defined buttons (App1, App4, App2, App3, as can be seen in the UI below).

External application on Review Validate Screen

BatchDetail ExternalAppOnRVScreen.jpg

Features List

There are three panels in this screen.

  • Left-most-panel or 1st panel – contains a document tree for the classified and unclassified Ephesoft documents.
  • Middle-panel or 2nd panel – contains the review panel and provides the fuzzy search option. The review panel contains the list of document types and the list of documents for merging. Below the review panel are the fuzzy search textbox and the document level fields (with their extracted values) of the corresponding document.
  • Right-most-panel or 3rd panel – shows the enlarged image of the selected document.

BatchDetailTab ThreePanels.jpg

Left-most-panel

The document tree contains classified as well as unclassified documents. A classified document is marked by a green tick at its top right. An unclassified document is marked by a red question mark at its top right.

Middle-panel

Document level fields with their extracted values are displayed in the middle panel. In the UI below, the document level field is Invoice and the extracted value is 5432000. The value of a document level field can also be populated by selecting an overlay from the image in the right-most-panel.

BatchDetail MiddlePanel.jpg

Clicking on the table view button opens another panel that contains a table corresponding to the selected document. This option only comes when there is some table configuration given for the batch class. If there is no table configured then this option doesn’t appear. The table should contain valid data. If any cell in the table contains any invalid data, then the batch is not validated.

Values in table can also be populated by selecting overlay from Right-most-panel.
Table on Review Validate Screen

BatchList TableOnRVScreen.jpg

The fuzzy search option returns table data that approximately matches a pattern. Every document is mapped to a table in the database. Data from that table is returned for the pattern specified in the fuzzy search textbox. A particular row from the result can be selected to populate the document level fields.
BatchDetail FuzzyDBSearchResult.jpg
BatchList DLFFilledOnRVScreen.jpg

Right-most-panel

The right-most-panel contains the buttons for splitting, deleting and rotating the document, etc. These buttons perform some of the functions listed in the shortcuts tab. The user can select any page from any document and use these buttons to perform the functions shown in the screenshot below:

BatchDetail RightmostPanel.jpg

Clicking on shortcuts opens a table of shortcuts for operations like saving, splitting, merging and deleting the document. The shortcuts are explained at: http://www.ephesoft.com/wiki/index.php?title=User_Manual#Keyboard_Shortcuts

BatchDetailTab KeyboardShortcuts.jpg

 

Web Based Folder Management

Overview

This feature allows web users to maintain the ‘ephesoft shared folder’, including batch class folders, java script files and other configuration files. It has the following features:

  • Super Admin users can execute a batch using the UNC folder of the required batch class.
  • New files can be uploaded as samples.
  • Old samples can be deleted.
  • New folders can be created.
  • Admins no longer need to access the server folder structure every time they want to make changes.
  • Super Admin users can also view batch backup xml files and analyze the output of the various executed plugins.
  • Super Admin users can view the final output PDF/TIFF generated by the application.
  • Users can configure some batch level property configurations.

To access it, the user can open the URL http://localhost:8080/dcma/FolderManager.html or click on the newly added tab displayed in the following image:

FolderManagementTab.jpg

Features list

The structure of this folder management feature has been designed to make the job of Admins and Super Admins simpler.

  • FOLDER SELECTION WIDGET: This is a list box containing a list of batch class folders available for selection.

FolderManagement FolderSelectionWidget.jpg

The options appearing in the list of available folders depend upon the role assigned to the logged-in user:

  • Super Admin Users: The shared folder appears in the dropdown list. Also all the batch class folders present in the shared folders location will appear in the dropdown.
  • Admin Users: Only those Batch class folders (from the shared folder) appear in the list for which the user has permissions to access.

The first option in the list is selected by default. Sub folders are displayed in the Tree hierarchy on the left hand side of the screen.

  • Folder Structure Tree: All the subfolders are listed in the tree format as shown in the image below. User can click on the node to expand and select a sub folder as well.
  • Folder Content Table: The sub folders as well as the files contained in the selected folder are listed here.

FolderManagement FolderStructure&ContentTable.jpg

 

  • OPTIONS PANEL: There are a number of options available in the Options Panel above the table displaying the folder content:

Folder Options Panel:

  • New Folder: This creates a new folder under the currently selected folder. Each time a new folder is created, it is automatically assigned a name “New Folder” with an index appended at the end.
  • Up: This allows a user to go one level up in the folder structure.
  • Refresh: This refreshes the content of the folder currently selected in the folder tree.

Upload Options:

  • Browse: This allows user to browse through folder structure and select multiple files to upload.
  • Upload: Attached multiple files are uploaded to the selected folder on the click of this button.
  • View attached files: Just next to the Upload button is a link to view (and even remove) files attached using the browse button.

FolderManagement ViewOrRemoveAttachedFile.jpg

Multi-select Options: These options operate on the files selected using the checkboxes available beside each file in the folder content table.

The options available are: Cut, Copy, Paste and Delete

Below the options panel is the table showing the folder content. The table has the following sortable headers: Name, Modified on and Type.

FolderManagement ColumnSorting.jpg

For each entry in the table, the user is provided with the following options:

  • Double click on File/Folder: Double clicking any file/folder opens it. In case of a folder, the folder opens in the folder tree-table structure shown here itself. In case of a file, the file opens up in either the browser itself (if the browser supports it) or it prompts the user to open/save the file.
  • Right click on file/folder: Right clicking on a file/folder in the folder content table presents the user with various options:
    • Open: Open selected file or folder.
    • Cut: Cut the selected files or folders.
    • Copy: Copy the selected files or folders.
    • Rename: Opens a dialog box so that the file/folder can be renamed.
    • Delete: Delete the selected files or folders.
    • Download (Option only for files): Opens the browser dialog box to open/save the file.

FolderManagement RightClickOnFilleOrFolder.jpg
Please Note: Each of the above right click options applies to the individual file on which the right click has been performed, and not to the selected folder or files.

Web Scanner Configuration

Overview

The purpose of this document is to show how to configure and run Ephesoft Web Scanner for the first time on any browser. Supported browsers are Firefox, Chrome and IE.

Configuration

Configuration steps are as follows:

  • Enter the Ephesoft Web Scanner URL in the address bar:

For example: http://localhost:8080/dcma/WebScanner.html

 

  • If the login page appears, then enter valid credentials and log in to the application.

LoginScreen.jpg

 

  • Please check ‘Always trust content from this publisher’ on the security popup appearing on the screen (see below screenshot) when running the web scanner for the first time on any browser. After that, click ‘Run’ button.

WebScanner SecurityInformation.jpg

 

  • Refresh the browser now to use the Ephesoft Web Scanner.

Troubleshooting

  • If ‘Start’ button is not visible even after refreshing, please restart the browser.
  • If the error message ‘Unable to perform action: INITIALZE on the Web Scanner Applet’ appears, please update the browser’s Java plugin.

Workflow Configuration

Overview

This property file is used to configure the workflow i.e. how different services will be executed. This file consists of following four types of service configurations:

  • Pickup Service Configuration.
  • Resume Service Configuration.
  • Workflow Configuration.
  • Web Service Configuration.

Configuration

dcma-workflows.properties

Following is the list of configurable properties:

  • PickUpService Configuration

Pick up service runs as a scheduler service which keeps a watch on BATCH_INSTANCE table. Every time a batch is ready to be picked and its status is NEW/READY/LOCKED, this service takes a lock on that batch and triggers the workflow for that batch. Priority of batches to be picked up by pick up service is READY > LOCKED > NEW.

 

Configurable property | Type of value | Value options | Description
dcma.pickup.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * (for this value, the pick up service is invoked every minute). | This parameter specifies the schedule on which the pick up service runs. It is specified as a cron job expression.
server.instance.max.process.capacity | Integer | Any integer value. Default value: 5 | This parameter specifies the maximum number of RUNNING batch instances that a server instance can process at a given instant of time.
server.instance.pick.capacity | Integer | Any integer value. Default value: 3 | This parameter specifies the maximum number of batches that the pick up service picks in one round of execution, such that ‘batches picked up <= (max process capacity – RUNNING state batches)’.
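
The interaction between the two capacity properties and the status priority can be sketched as follows. This is only an illustration in Python, assuming the rule stated above; the function name, the tuple layout and the batch identifiers are hypothetical and do not come from the Ephesoft code base.

```python
# Priority order taken from the text: READY > LOCKED > NEW.
STATUS_PRIORITY = {"READY": 0, "LOCKED": 1, "NEW": 2}


def batches_to_pick(candidates, running_count, max_process_capacity=5, pick_capacity=3):
    """Select the batches for one round of the pick up service.

    candidates: list of (batch_identifier, status) tuples.
    The defaults mirror the documented defaults of
    server.instance.max.process.capacity and server.instance.pick.capacity.
    """
    # Never exceed the free processing slots of this server instance.
    free_slots = max(0, max_process_capacity - running_count)
    limit = min(pick_capacity, free_slots)
    # sorted() is stable, so batches with equal status keep their original order.
    ordered = sorted(candidates, key=lambda batch: STATUS_PRIORITY[batch[1]])
    return ordered[:limit]


candidates = [("BI1", "NEW"), ("BI2", "READY"), ("BI3", "LOCKED"), ("BI4", "NEW")]
# Three batches are already RUNNING, so only two free slots remain.
picked = batches_to_pick(candidates, running_count=3)
print(picked)
```

With the default values, a server that already runs three batches picks at most two more in the next round, even though the pick capacity is three.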
  • Resume Service Configuration

Resume service, like pick up service, also runs as a scheduler service. This service also keeps an eye on BATCH_INSTANCE table. It picks the batches that are locked by another server instance which is not active now or has gone down (as detected by HeartBeat service). It takes a lock on those batches and resumes the workflow for them.

 

Configurable property | Type of value | Value options | Description
dcma.resume.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * (for this value, the resume service is invoked every minute). | This parameter specifies the schedule on which the resume service runs. It is specified as a cron job expression.
server.instance.resume.capacity | Integer | Any integer value. Default value: 4 | This parameter specifies the maximum number of batches that the resume service can pick in one go, i.e. in one iteration.
  • Other Configuration

This configuration covers deploying workflows and sending mails when an error occurs in the workflow. When a batch instance goes into error, a mail is sent using the configuration below:

 

Configurable property | Type of value | Value options | Description
workflow.error.from_mail | String | Any valid email id. Default value: enterprise.support@ephesoft.com | This parameter specifies the e-mail id from which the mail should be sent when some error occurs during batch processing.
workflow.deploy | String | true, false. Default value: true | This parameter re-deploys all the available workflows in the system. It should be set only once, during new installations or upgrades.
workflow.error.subject | String | Valid e-mail subject. Default value: Error in workflow execution!! | This parameter specifies the subject of the mail to be sent when some error occurs during batch processing.
workflow.error.to_mail | String | Any valid email id. Default value: enterprise.support@ephesoft.com | This parameter specifies the e-mail id to which the mail should be sent when some error occurs during batch processing.
newWorkflows.basePath | String | Valid path of directory. Default value: {Application}\\SharedFolders/workflows | This parameter specifies the path where all jpdl(s) are placed when a new workflow or plugin is deployed.
  • WebService Configuration

These configurations should be set only when configuring a Grid Computing workflow.

 

Configurable property | Type of value | Value options | Description
wb.folderPath | String | Any valid path of ftp folder. Default value: test | This parameter specifies the folder path to be picked from the ftp location for processing a batch in the Grid Computing Batch Class.
wb.hostURL | String | Valid URL. Default value: http://localhost:8080/dcma/rest | This parameter specifies the host URL for sending the batch instance from one Ephesoft instance to another.
dcma.batch.status.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * | This parameter specifies the schedule for fetching the status of a remote batch instance executing on another Ephesoft instance server. It is specified as a cron job expression.

Dependency

The e-mail on error functionality depends on the mail configuration done in the “mail.properties” property file.

 

Workflow Management

Overview

This document covers everything a user needs to configure a workflow. It focuses on preparing the content needed by any workflow, i.e. plugins and their respective dependencies, and the criteria on which these plugins work.

Features

This tab is visible only to the admin and provides features for adding a plugin and configuring its dependencies. On clicking the “Workflow Management” tab or opening “http://<Server-name>:<port number>/dcma/CustomWorkflowManagement.html”, the user sees a screen containing the following, as shown in the screenshot:

  • “Plugins List”: List of plugins already present.
  • “Add New Plugin” button: For adding or updating a plugin.
  • “Dependencies” button: On being clicked, it takes the user to a screen where they can manage the dependencies among the plugins.
  • “Help” button: Shows a pop-up message explaining how to upload a new plugin.

WorkflowManagement.jpg

Plugins list

Landing screen for the workflow management tab will contain the list of already installed plugins. The list displayed will support the default Ephesoft UI functionalities of pagination and sorting. The list will display the following for each plugin:

  • Plugin name
  • Plugin description

Add New Plugin

View

On clicking on “Add New Plugin” button, a file upload widget will open up with the following options. See the screenshot below:

WorkflowManagement AddNewPlugin.jpg

  • “Browse”: a file selection window will open up.
  • “Save”: this will save the plugin to the DB after validating the files.
  • “Cancel”: this will cancel the operation.

Working

Working of this functionality depends on the following conditions:

  • This widget will accept a .zip file for uploading. Contents of the zip file:
    • .jar file: The jar for the plugin to be added.
    • .xml file: Contains the plugin information. Please see below for the structure of this XML file.
  • The .zip file must contain only these two files, i.e. the .jar and the .xml.
  • The .zip file and the .jar file must have the same name.
  • The content of the .jar file cannot be verified, so the user must make sure that it is as required.
  • In order for this plugin to take effect, the user needs to restart the Tomcat server.
  • Note:
    • After successful validation of its contents, the zip file will be stored in the configurable location specified in the “<Ephesoft installation path>\Application\WEB-INF\classes\META-INF\application.properties” file under the property named “plugin_upload_folder_path”.
    • The JPDL file for the uploaded plugin will be stored at <Ephesoft installation build>\Application\WEB-INF\classes\META-INF\dcma-workflows\plugins\<PLUGIN_NAME>
Plugin XML structure

Validation on XML

  • All tags are compulsory and can have any string value, except for the “is-scripting”, “is-mandatory”, “is-multivalue” and “override-existing” tags, which have Boolean values (TRUE, FALSE).
  • “jar-name” tag value must match the name of the jar file present in the zip file.
  • If “is-scripting” tag has a value “TRUE”, only then the values of “back-up-file-name” and “script-name” tag will be taken into account.
  • “plugin-property” and “dependency” tag can have multiple instances and have the values for plugin configs and dependencies respectively.
  • “override-existing” tag decides whether to add the new plugin or update an existing one. If value is “true”, then the existing plugin will be updated else it will be added as new.
  • Three operations can be done on the plugin properties and will be defined by the “operations” tag inside “plugin-property” tag. Following are the supported operations:
    • Add: adds a plugin property. An error is shown if it exists already.
    • Update: updates a plugin property identified by its name and if it doesn’t exist, creates a new one.
    • Delete: deletes a plugin property identified by its name. An error is shown if no such property exists.
  • Default values for the plugin properties:

 

Property data type | Default value
String | Default
Integer | 0
Boolean | Yes

These default values are assigned to properties that are mandatory. If a property is multivalued, the first value from the list is the default value.
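
The three operations supported by the “operations” tag (Add, Update and Delete) can be sketched as follows. This is a simplified Python model that keeps the plugin properties in a dictionary; the function name and the error messages are hypothetical, and the real application stores the properties in its database.

```python
def apply_operation(properties: dict, operation: str, name: str, value=None) -> None:
    """Apply one plugin-property operation to an in-memory property map."""
    op = operation.lower()
    if op == "add":
        # Add: an error is shown if the property exists already.
        if name in properties:
            raise ValueError(f"property already exists: {name}")
        properties[name] = value
    elif op == "update":
        # Update: creates the property if it does not exist yet.
        properties[name] = value
    elif op == "delete":
        # Delete: an error is shown if no such property exists.
        if name not in properties:
            raise ValueError(f"no such property: {name}")
        del properties[name]
    else:
        raise ValueError(f"unsupported operation: {operation}")


# Hypothetical property names, used only for the demonstration.
props = {}
apply_operation(props, "Add", "sample.switch", "ON")
apply_operation(props, "Update", "sample.timeout", "30")
apply_operation(props, "Delete", "sample.switch")
print(props)
```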

Assumptions

  • “plugin-service-instance” and “method-name” tags must be correct as they cannot be validated.
  • “application-context-path” refers to the application context file name for the plugin.
  • For the dependencies tag:
    • For a new plugin, with dependencies as
    • ORDER_BEFORE : P2,P3/P4,P6/P7/P8
    • UNIQUE : TRUE

Dependencies management

On clicking the “Dependencies” button, the dependency management screen will open up. It contains the following:

  • “Plugin” List Drop Down: Allows the user to select the plugin whose dependencies they want to see.
  • “Dependency List”: List of dependencies with the following attributes:
    • Plugin Name: Name of the plugin.
    • Dependency Type: Type of dependency it shares with the dependent plugins.
    • Dependency: List of dependent plugins.
  • “Add” Button: Allows the user to add a dependency for a plugin.
  • “Edit” Button: Allows the user to edit an already existing dependency for a plugin.
  • “Delete” Button: Allows the user to delete an already existing dependency for a plugin.
  • “Save” Button: Saves the current state of dependencies of all the dirty plugins and takes the user to the “Workflow Management” screen.
  • “Apply” Button: Saves the current state of dependencies of all the dirty plugins and stays on the current screen.
  • “Cancel” Button: Discards the current state of dependencies of all the dirty plugins and takes the user to the “Workflow Management” screen.

WorkflowManagement Dependencies.jpg

 

Add Dependencies

This screen shows the following:

  • Plugin Name: name of the plugin selected on the previous screen.
  • Dependency type: List of available dependency types. Only single select is allowed.
  • Dependencies List: this list contains the list of available plugins minus the plugin selected on the previous screen. This list will be enabled only when “ORDER_BEFORE” is chosen as dependency type. Only single select is allowed.
  • Selected Dependencies: List of dependencies selected by the user.
  • And Button: on being clicked, adds the dependency selected in the “Dependencies List” as an “and” dependency to the “Selected Dependencies” text box.
  • Or Button: on being clicked, adds the dependency selected in the “Dependencies List” as an “or” dependency to the “Selected Dependencies” text box.
  • Ok Button: Saves the dependency to the plugin.
  • Reset Button: Resets all the fields to their initial values.

WorkflowManagement EditDependency.jpg

Edit Dependencies

This allows the user to edit a particular dependency record.

Delete Dependencies

This allows the user to delete a particular dependency record.

Help Content

  • On being clicked it will display a pop-up message providing information on how to upload and use a new plugin.

WorkflowManagement Help.jpg

Dependencies database table structure

Id | plugin_id | dependency_type | dependency
1 | P1 | ORDER_BEFORE | P3,P5
2 | P2 | ORDER_BEFORE | P1/P8
3 | P3 | ORDER_BEFORE | P4
5 | P1 | UNIQUE |
  • Fields:
    • Id:
      • Data type: Long
      • The unique Id for the table
    • plugin_id:
      • Data type: Long
      • Plug-in id mapped directly to the Plug-in table.
    • dependency_type:
      • Data type: ENUM(ORDER_BEFORE,UNIQUE)
      • Defines the type of dependency between the plug-in and list of plug-ins in dependency column.
    • dependency:
      • Data type: String
      • List of plug-ins on which the plug-in depends.
      • The values are stored in a delimiter-separated format:

Delimiter | Meaning
, | AND
/ | OR
  • Example: P1,P2,P3/P4,P5,P6/P7/P8
    • Above example means that plug-in needs the following:
      • P1
      • P2
      • P3 OR P4
      • P5
      • P6 OR P7 OR P8
    • NOTE: if dependency_type = UNIQUE, then this field will be empty.
  • Type of dependencies:
    • Ordering of plugins:
      • This type of dependency signifies that a plugin requires another plugin to run before it.
      • Example:
        • For 1st plugin in workflow: no dependency.
        • For any other plug-in: All of its ancestor plug-ins.
    • Uniqueness:
      • This type of dependency signifies a plugin’s uniqueness in the workflow, i.e. it should only run once in the workflow. E.g. clean up plug-in.
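
The delimiter format of the dependency column can be parsed as shown in the following Python sketch. It assumes only the rules given above (“,” means AND, “/” means OR); the function names are hypothetical and the check is an illustration, not the validation used by the application.

```python
def parse_dependency(text: str) -> list[list[str]]:
    """Split a dependency string into AND-groups of OR-alternatives."""
    if not text.strip():
        # An empty field (e.g. for dependency_type = UNIQUE) has no groups.
        return []
    return [[name.strip() for name in group.split("/")] for group in text.split(",")]


def is_satisfied(dependency: str, earlier_plugins: set[str]) -> bool:
    """True if every AND-group has at least one plugin that runs earlier."""
    return all(
        any(name in earlier_plugins for name in group)
        for group in parse_dependency(dependency)
    )


groups = parse_dependency("P1,P2,P3/P4,P5,P6/P7/P8")
print(groups)
print(is_satisfied("P1,P2,P3/P4,P5,P6/P7/P8", {"P1", "P2", "P4", "P5", "P8"}))
```

Each inner list holds the alternatives of one OR-group, so the example from the text yields five groups: P1, P2, (P3 or P4), P5 and (P6 or P7 or P8).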

Automated Regex Validation Plugin

Overview

This plugin validates documents against the given regex patterns. The regex patterns defined in the Regular Expression Listing are used for the validation. For each document, the patterns are matched against the values of all the document level fields present. If all the values match, the document is marked as valid, i.e. its valid tag is set to true. If any document level field does not match, the document is marked as invalid, i.e. its valid tag is set to false.

Configuration

Steps for configuring the plugin

  • User can select the batch class module and create the regex pattern by navigating to Regular Expression Configuration page as shown below:

BatchClassManagement RegularExpressionConfiguration.jpg

  • User can create multiple regex patterns for each document level field. This is shown below in the screenshot:

BatchClassManagement RegularExpressionListing.jpg

Steps of execution

  • The plug-in uses the regex patterns defined for the document level fields of each document type.
  • It matches all the defined regex patterns against the document level field values from batch.xml. If all the document level field values match the defined regex patterns, the document’s “Valid” tag is set to true; otherwise it is set to false.
  • Documents that are valid do not need validation, but those whose valid tag is set to false are to be validated during Validation.
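
The steps above can be sketched in Python as follows. The sketch makes two assumptions that the text does not confirm: a value has to match every pattern configured for its field, and a match has to cover the whole value. The function name, field names and patterns are hypothetical.

```python
import re


def is_document_valid(field_values: dict[str, str], field_patterns: dict[str, list[str]]) -> bool:
    """Return True only if every field value matches all of its patterns."""
    for name, value in field_values.items():
        for pattern in field_patterns.get(name, []):
            # Assumption: the whole value must match, hence fullmatch.
            if re.fullmatch(pattern, value) is None:
                return False
    return True


# Hypothetical document level fields and regex patterns.
patterns = {"InvoiceNumber": [r"\d{7}"], "InvoiceDate": [r"\d{2}/\d{2}/\d{4}"]}
valid = is_document_valid({"InvoiceNumber": "5432000", "InvoiceDate": "01/02/2013"}, patterns)
invalid = is_document_valid({"InvoiceNumber": "54A2000", "InvoiceDate": "01/02/2013"}, patterns)
print(valid, invalid)
```

The returned value corresponds to the document’s “Valid” tag: a single non-matching field is enough to send the document to manual validation.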

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

 

S no. | Error message | Possible root cause
1 | Invalid initialization of field service. | No field type initialized in a document.
2 | Invalid input pattern sequence. | Regex pattern is not supplied for required field.

Barcode Extraction Plugin

Overview

This plug-in populates the field type value when a barcode type is given. When the plug-in switch is ON, the extracted barcode value is saved as the value for the field type. If the switch is OFF, the plug-in does not do anything.

Configuration

Steps for configuring the plugin

  • User can select the batch class module and navigate to barcode extraction plug-in configuration page as shown below:

BatchClassManagement BarcodeExtractionPlugin.jpg

The User can edit the above settings by clicking on “Edit” in order to change the settings for their requirements.

 

Configurable properties

Following are the configurable properties available for the Barcode Extraction plugin:

 

Configurable property | Type of value | Value options | Description
Barcode Extraction Switch | List of values | ON, OFF | Switch to decide whether or not to perform barcode extraction. Default ON.
Barcode Extraction Maximum Confidence | Integer | 0-100 | The maximum confidence value that is used for extraction.
Barcode Extraction Minimum Confidence | Integer | 0-100 | The minimum confidence value that is used for extraction.
Barcode Extraction Reader Types | Multi select | CODE39, QR, DATAMATRIX, CODE128, CODE93, ITF, CODABAR, PDF417, EAN13 | All the barcode extraction types that are available. The preset barcode extraction types are CODE39, QR, DATAMATRIX, CODE128, CODE93, ITF, CODABAR, PDF417 and EAN13.
Barcode Extraction Valid Extension | Multi select | Tiff, gif | Used to configure all the possible types of files that will be used for extraction.

Steps of execution

  • The plug-in uses the type of barcode given in the field type listing. While creating the field type, the user can enter the type of barcode that has to be used for classification. This is shown in the following screenshot:

BatchClassManagement BarcodeType.jpg

  • While executing, if there is any barcode present on the document, then the value extracted from the document for barcode is used to populate the value of the document level field.
  • If there is no barcode given then it will not set the value for document level field.

Dependency

  • A document level field must be present, and a barcode type must be selected in the field type configuration.
  • Files must have only the extensions that are configured.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

 

S no. | Error message | Possible root cause
1 | No valid extensions are specified in resources. | There are no values for valid extensions.
2 | File has invalid extension. | A file has an extension other than the given valid extensions. For example, the valid extension given is “gif” and the document being processed is “tif”.

Barcode Reader Plugin

Overview

Barcode Reader Plugin is used to read barcode from the input images using zxing. Barcode Reader plugin is used to read the following barcode types:

  • CODE39
  • CODE93
  • CODE128
  • ITF
  • PDF417
  • QR
  • DATAMATRIX
  • CODABAR
  • EAN13

If the barcode reader plugin detects a barcode on the images, the decoded barcode value is considered as the document type name for barcode classification in the Document Assembler.

Barcode values should therefore be the document type names used in the batch class that has this plugin.

Configuration

Configurable Properties

The following is the list of configurable properties for the plugin:

 

Configurable property | Type of value | Value options | Description
Barcode Valid Extensions | String | .tif; .gif | The valid extensions of the input image files for decoding the barcode.
Barcode Max Confidence | Integer | NA | The max confidence to be set if a barcode is decoded on the input images.
Barcode Min Confidence | Integer | NA | The min confidence to be set if no barcode is found on the input images.
Barcode Classification Switch | String | ON, OFF | Switches the barcode reader plugin ON/OFF. Default ON.
Barcode Reader Type | String | CODE39, CODE93, CODE128, ITF, PDF417, QR, DATAMATRIX, CODABAR, EAN13 | The barcode types that the barcode reader plugin decodes.

This is shown in the screen shot given below:

BarcodeConfigurableProperties.jpg

Steps of execution

  • This plug-in works in the page process phase of the application, when all the import processing on the batch has been done and the batch is ready for page processing.
  • The plug-in decodes the barcodes on the input images.
  • After all the work is done, it writes the information about the decoded barcodes into the batch.xml file.

Dependency

The plugin assumes that the import processing of the batch has been done properly; after that, it decodes the barcodes from the input images.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. | Error message | Possible root cause
1 | No pages found in batch XML. | Invalid Batch.xml present in Batch Instance Folder.
2 | No valid extensions are specified in resources. | No “Barcode Valid Extensions” extensions found in the database.
3 | File {image name} has invalid extension. | This image file has an invalid file extension for processing by the Barcode Reader Plugin.

Clean Up Plugin

Overview

This plug-in is used to delete system files and UNC folder data once all the processing on the batch is complete. It also removes all the content associated with the particular batch instance.

Steps of execution

  • This plug-in ideally works after the export phase of the application, when all the processing on the batch has been done, i.e. the desired results have been exported and the batch instance’s content is no longer required by the Ephesoft application.
  • The plug-in takes the identifier of a batch instance and removes all the contents and sub-files from the following paths:
    • <SHARED_FOLDER_PATH>\<BATCH_CLASS_UNC_FOLDER>\<BATCH_INSTANCE_FOLDER_NAME>: This folder is always deleted.
    • <LOCAL_FOLDER_PATH>\<BATCH_INSTANCE_IDENTIFIER>: This folder is deleted only if the plugin configuration “Delete System Folder Information” is “TRUE”.
    • It also deletes the “<BATCH_INSTANCE_IDENTIFIER>.ser” file from the <LOCAL_FOLDER_PATH>\properties folder. This file is deleted only if the plugin configuration “Delete System Folder Information” is “TRUE”.
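
The deletion rules above can be sketched in Python as follows. The function and parameter names are hypothetical, the folder names in the demonstration are made up, and the sketch runs on a throwaway temporary directory; it is not the plugin's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(path: Path) -> None:
    """Delete a folder and its contents if it exists."""
    if path.exists():
        shutil.rmtree(path)


def clean_up(shared_folder, unc_folder, batch_folder_name, local_folder, batch_id,
             delete_system_folder=True):
    # The batch instance folder under the UNC folder is always deleted.
    remove_tree(Path(shared_folder) / unc_folder / batch_folder_name)
    # The system folder and the .ser file are deleted only when the
    # "Delete System Folder Information" setting is TRUE.
    if delete_system_folder:
        remove_tree(Path(local_folder) / batch_id)
        (Path(local_folder) / "properties" / f"{batch_id}.ser").unlink(missing_ok=True)


# Demonstration on a temporary directory with made-up folder names.
root = Path(tempfile.mkdtemp())
shared, local = root / "shared", root / "local"
(shared / "unc" / "scan-001").mkdir(parents=True)
(local / "BI1").mkdir(parents=True)
(local / "properties").mkdir()
(local / "properties" / "BI1.ser").write_bytes(b"")

clean_up(shared, "unc", "scan-001", local, "BI1", delete_system_folder=False)
unc_removed = not (shared / "unc" / "scan-001").exists()
system_kept = (local / "BI1").exists() and (local / "properties" / "BI1.ser").exists()

clean_up(shared, "unc", "scan-001", local, "BI1", delete_system_folder=True)
system_removed = not (local / "BI1").exists() and not (local / "properties" / "BI1.ser").exists()
shutil.rmtree(root)
print(unc_removed, system_kept, system_removed)
```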

Configuration

Configuration screenshot

BatchClassManagement CleanupPlugin.jpg

Configurable Properties

Following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Delete System Folder Information | List of values | TRUE (default), FALSE | Defines whether the “<Local folder>\<Batch instance>” folder and its contents are deleted.

Dependency

  • The plugin depends on “IMPORT BATCH FOLDER”, because a batch must be imported before its associated files can be cleaned up.
  • This plugin should ideally occur only once in the workflow, as its last plugin. Otherwise it removes resources needed by the plugins that run after it, which causes batches to go into error.
  • The plugin assumes the extraction for the incoming batch has been done properly and just changes the results of provided batch.xml in a desired format.

Troubleshooting

Following are a few common error messages caused by a malfunction of the plugin:

S no. | Error message | Possible root cause
1 | Unable to delete Folder | The folder could not be deleted because it is locked by another process or is open in the user’s file explorer.
2 | Not enough permission to delete folder | A security exception occurred. The user/process/JVM does not have sufficient rights to delete the folder to be cleaned.

CMIS Export Plugin

Overview

This plug-in uploads the PDF/TIFF files generated as the final output of batch execution to a CMIS-compliant repository as ‘Document’ objects. Currently the application supports the Alfresco, Nuxeo, SharePoint, Documentum and IBM CM repositories.

 

Configuration

Ephesoft Configurable Properties

Edit the configurations in CMIS Export Plugin as follows:

Cmis Export Plugin.jpg

Following is the list of configurable properties for the plugin:

 

Configurable property | Type of value | Value options | Description
CMIS Root folder Name | String | NA | Name of the folder in the CMIS repository.
CMIS Upload File Extension | List of values | pdf, tiff | The extension of the file being uploaded.
CMIS Server URL | String | For example: http://{Server_ip}:{port_number}/alfresco/service/cmis | The URL of the CMIS repository server. This URL varies between repositories such as Alfresco, SharePoint, Nuxeo and Documentum.
CMIS Server User Name | String | For example: “admin” | The username for CMIS repository server login.
CMIS Server User Password | String | For example: “admin” | The password for CMIS repository server login.
CMIS Server Repository Id | String | For example: “83b9c8bb-415e-46fd-9feb-c9fb8e4e2122” | Id of the CMIS repository used for uploading files.
CMIS Server Switch ON/OFF | List of values | ON, OFF | Enables/disables the CMIS Export Plugin.
Aspect Switch | List of values | ON, OFF | Specific to the Alfresco repository. Enables/disables aspects on the document.
CMIS Export File Name | String | For example: “$EphesoftBatchID && _ && $EphesoftDOCID” | The name of the file to be uploaded. It must contain one or more parameters out of EphesoftBatchID, EphesoftDOCID or a document level field name. A parameter name must begin with the ‘$’ symbol, and different fields must be separated by ‘&&’. If none is specified, the name of the local folder to be exported is used as the file name.

 

Documentum Repository Configurable Properties

  • CMIS Server URL: http://<host address>:<port_of_emc-cmis>/emc-cmis/resources/repositories/Repository_id

 

  • The repository id is the repository name. It can be extracted from an XML file downloaded from the URL: http://<host address>:<port_of_emc-cmis>/?repositoryId=RepositoryName

 

  • To create/edit configuration types in Documentum, use the Eclipse plugin for the WebTop Development Kit (see http://marketplace.eclipse.org/content/documentum-webtop-development-kit). Create a type corresponding to the Ephesoft Document Type to be used in CMIS as a child of the dm_document type, and add attributes that correspond to the DLFs in the Ephesoft Document Type.

 

  • Following are the steps for viewing Documentum Repository configurations:

o Use Documentum Administrator to explore uploaded files and to create types in the Documentum repository.
o The URL to access Documentum Administrator is http://<host address>:<port_of_da>/da
o It asks for login credentials to the repository.
o After a successful login, the user can select a type under Repository/Administration/Types to view its properties and attributes:

Cmis Access Properties.jpg
Figure: Showing access to properties of a type in Repository

 

  • DLF-Attribute-mapping.properties (located at [EphesoftInstallationDirectory]\SharedFolders\[Batch-class-Folder]\cmis-plugin-mapping):

DocumentTypeName=DocumentumTypeName
DocumentTypeName.FieldTypeName1=Documentum’sType’sAttributeName1
DocumentTypeName.FieldTypeName2=Documentum’sType’sAttributeName2
DocumentTypeName.FieldTypeName3=Documentum’sType’sAttributeName3
A sample file content is:
INV=invoice
INV.inv_num=inv_num
INV.inv_amount=inv_amount
INV.inv_date=inv_date

 

  • Uploaded documents can be viewed at Repository/Administrator/Cabinets/*location in Documentum Administrator:

Cmis Uploaded Doc.jpg
Figure: Showing an uploaded document via Ephesoft

 

  • The properties of an uploaded file can be viewed by right-clicking the uploaded file and selecting Properties.

Cmis Access Props Of Uploaded Doc.jpg

Figure: Showing access to properties of an uploaded document

Cmis Props Of Uploaded Doc.jpg

Figure: Properties of an uploaded document.

 

  • The properties in the dcma-cmis.properties file, located at [EphesoftInstallationDirectory]\Application\WEB-INF\classes\META-INF\dcma-cmis\*, are similar to those for the Alfresco repository (refer to ‘Properties for dcma-cmis.properties’ specified below for Alfresco).
 (NOTE: If wssecurity is used, the URL that returns a page containing a list of web services is: http://<host address>:<port_of_emc-cmis>/emc-cmis/services/RepositoryService)

Alfresco Configurable Properties

The configuration files should be placed at the following path in the Alfresco installation directory: <Alfresco installation path>\tomcat\shared\classes\alfresco\extension

 

  • There are three configuration files used in Ephesoft to map parameters:

web-client-config-custom.xml: Alfresco automatically looks for this file on the class path in the alfresco.extension package for configuration.
ephesoft-model-context.xml: Specifies the location of the custom configuration file (any file ending with “-context.xml” is used for this purpose).
ephesoftModel.xml: The custom configuration file that states the parameters (document level index fields) to be mapped to Alfresco repository parameters.
Sample entries in the ephesoftModel.xml file:
<type name="ephesoft:ephesoft">
  <title>Ephesoft Document Procedure</title>
  <parent>cm:content</parent>
  <properties>
    <property name="ephesoft:invoiceDate">
      <type>d:text</type>
    </property>
    <property name="ephesoft:partNumber">
      <type>d:text</type>
    </property>
    <property name="ephesoft:invoiceTotal">
      <type>d:text</type>
    </property>
    <property name="ephesoft:state">
      <type>d:text</type>
    </property>
    <property name="ephesoft:city">
      <type>d:text</type>
    </property>
  </properties>
</type>
o A default xml file is available with the Ephesoft release.

 

  • DLF-Attribute-mapping.properties:

o The properties file used to get the mapping of parameters onto alfresco custom parameters i.e. mapping of Ephesoft specific Document Types to Alfresco Document Types and Ephesoft specific Document Level Fields to Alfresco specific Document Level Fields.
o Sample entries in properties file:
Application-Checklist=D:ephesoft:ephesoft
Application-Checklist.InvoiceDate=ephesoft:invoiceDate
Application-Checklist.PartNumber=ephesoft:partNumber
Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
Application-Checklist.State=ephesoft:state
Application-Checklist.City=ephesoft:city
Application-Check\ list.State=ephesoft:city
o Note: If there is a space in the name of the Document Type or of a Document Level Field, escape it with the “\ ” sequence, as in the last entry above.
o A default properties file is available with Ephesoft releases from version 2.5 onwards.

 

  • Properties for dcma-cmis.properties:

A property is given as property_name=value

 

Configurable property | Type of value | Value options | Description
cmis.document_versioning_state | List of values | NONE: The document will be created as a non-versionable document. CHECKEDOUT: The document MUST be created in the checked-out state. MAJOR: The document MUST be created as a major version. MINOR: The document MUST be created as a minor version. | The document versioning state for uploading. Default, or in case of an invalid option: CHECKEDOUT
cmis.security.mode | List of values | “basic” for HTTP Basic Authentication (default). “wssecurity” for WS-Security Username Token based security. | Specifies the security mode employed by the CMIS endpoint.
cmis.repo.create_batch_subfolders | List of values | true, false | Specifies whether a subfolder should be created for the batch within the configured target repository folder. If invalid or missing, it is true.
cmis.aspect_mapping_file_name | String | For example: aspects-mapping.properties | The name of the aspect properties file, specified in “\META-INF\dcma-cmis\dcma-cmis.properties”. It is used to add aspects to documents uploaded via Ephesoft to an Alfresco CMIS repository.

Specify the WSDL URLs for each of the CMIS services if “wssecurity” is specified as the value of the “cmis.security.mode” property. The text {serverURL} may be inserted into the path to use the server URL configured in the batch class as part of the URL.
For example:
o cmis.url.acl_service=http://hostname:8080/alfresco/soap/ACLService?wsdl
Or
cmis.url.acl_service={serverURL}/ACLService?wsdl, where {serverURL} is the CMIS server URL configured within the batch class.
Similarly following properties are set for wssecurity:
o cmis.url.discovery_service=http://localhost:8181/alfresco/cmisws/DiscoveryService?wsdl
o cmis.url.multifiling_service=http://localhost:8181/alfresco/cmisws/MultiFilingService?wsdl
o cmis.url.navigation_service=http://localhost:8181/alfresco/cmisws/NavigationService?wsdl
o cmis.url.object_service=http://localhost:8181/alfresco/cmisws/ObjectService?wsdl
o cmis.url.policy_service=http://localhost:8181/alfresco/cmisws/PolicyService?wsdl
o cmis.url.relationship_service=http://localhost:8181/alfresco/cmisws/RelationshipService?wsdl
o cmis.url.repository_service=http://localhost:8181/alfresco/cmisws/RepositoryService?wsdl
o cmis.url.versioning_service=http://localhost:8181/alfresco/cmisws/VersioningService?wsdl

 

  • Mappings of Data types defined in Ephesoft and at Alfresco Server


 

Ephesoft data type | Alfresco data type | Alfresco property type mapping (internally converted to) | Comments (if any)
STRING | d:text | String |
INTEGER | d:int | Integer |
FLOAT | d:float | Decimal |
DOUBLE | d:double | Decimal |
DATE | d:datetime | DateTime |
BOOLEAN | d:boolean | Boolean |
LONG | d:long | Integer | Max allowed values: 999-999-999

Checklist

  • The mapping of document types in the DLF-Attribute-mapping.properties file should be equivalent to the type defined in the ephesoftModel.xml file in the Alfresco repository.
DLF-Attribute-mapping.properties | ephesoftModel.xml
Application-Checklist=D:ephesoft:ephesoft | <type name="ephesoft:ephesoft">
  • The data type of the document level fields defined in the DLF-Attribute-mapping.properties file should be equivalent to the types of the document attributes defined in the ephesoftModel.xml file in the Alfresco repository.
DLF-Attribute-mapping.properties | ephesoftModel.xml
Application-Checklist.InvoiceDate (the data type in the Ephesoft application should be “String”) | <type>d:text</type>
  • Screenshot from the Ephesoft application for data types:

Cmis Ephesoft Data Types.jpg
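The first checklist item can be verified before running a batch with a small script. The sketch below is a hypothetical helper, not part of Ephesoft or Alfresco: it assumes the content model snippet is wrapped in a single root element and uses no default XML namespace (a real Alfresco model file usually declares one, in which case the tag names would need the namespace prefix), and that the mapping has already been loaded into a dictionary.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the ephesoftModel.xml entries shown earlier.
MODEL_XML = """<types>
  <type name="ephesoft:ephesoft">
    <parent>cm:content</parent>
    <properties>
      <property name="ephesoft:invoiceDate"><type>d:text</type></property>
      <property name="ephesoft:partNumber"><type>d:text</type></property>
    </properties>
  </type>
</types>"""


def model_types(xml_text):
    """Return {type name: {property name: data type}} for a model snippet."""
    result = {}
    for type_el in ET.fromstring(xml_text).iter("type"):
        name = type_el.get("name")
        if name is None:  # inner <type>d:text</type> elements carry no name
            continue
        result[name] = {prop.get("name"): prop.findtext("type")
                        for prop in type_el.iter("property")}
    return result


def check_mapping(mapping, types):
    """List entries of a DLF-Attribute-mapping style dict that the model lacks."""
    problems = []
    doc_types = {k: v for k, v in mapping.items() if "." not in k}
    for key, target in mapping.items():
        if "." not in key:
            # Document type entries look like 'D:ephesoft:ephesoft'.
            if target.removeprefix("D:") not in types:
                problems.append(f"unknown type for {key}: {target}")
            continue
        doc_type = key.split(".", 1)[0]
        type_name = doc_types.get(doc_type, "").removeprefix("D:")
        if target not in types.get(type_name, {}):
            problems.append(f"unknown property for {key}: {target}")
    return problems
```

An empty list from `check_mapping` means every mapped type and property exists in the model. The data type check from the second checklist item would additionally need the Ephesoft field types, which this sketch does not have access to.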

Aspect switch configuration

Adding aspects involves the following two requirements:

 

  • To add aspects to the file being uploaded:

Aspects are added to the document file being uploaded according to its document type, as defined in the batch.xml file. To know which aspects to add to documents of which document type when uploading, a mapping of document types to aspects is required.

 

  • Add values to properties defined by an aspect:

These values will be the values of document level fields that have been mapped to that property.

 

Mapping Properties

Path of mapping properties file:

There has to be a mapping defined for the above two requirements. This will be done in Ephesoft with the help of a property file.
The absolute path of the file is specified by the following steps:
o The name of the folder that holds this property file, inside the batch class folder of ephesoft-data, is specified through the “batch.cmis_plugin_mapping_folder_name” property in the file “\META-INF\dcma-batch\dcma-batch.properties”.
e.g.: batch.cmis_plugin_mapping_folder_name=cmis-plugin-mapping
Note: This is the same property that defines the folder path of the property file for CMIS content type mapping.
o The name of this property file is specified by the property “cmis.aspect_mapping_file_name” in the property file “\META-INF\dcma-cmis\dcma-cmis.properties”.
e.g.: cmis.aspect_mapping_file_name=aspects-mapping.properties
The above defined property file contains the entire mapping associated with aspects.

 

Content of mapping properties file

Mappings need to be added for the following:
Mapping document types to aspects:
The user can map a document type to multiple aspects (i.e. the aspects to be added to documents of that document type).
This is done by adding the name of the document type as the key and the aspects as the value, with the aspects separated by a semi-colon “;”.
e.g.: Application-Checklist=P:cm:titled;P:cm:taggable
In this example the user adds two aspects, “P:cm:titled” and “P:cm:taggable”, to all documents with document type “Application-Checklist”.
Mapping document level fields to aspect properties:
The user can map document level fields to aspect properties.
This is done by using “{DocumentType}.{DocumentLevelFieldName}” as the key and the property to be mapped to as the value.
e.g.: Application-Checklist.State=cm:description
In this example, for all documents with document type “Application-Checklist”, the value of the document level field “State” is populated into the aspect property “cm:description”.
If an error is encountered while adding aspects to an uploaded document, the user has to correct the cause of the error and restart the batch, and the document will be uploaded again.
For more information on aspects, please refer to the link: [[3]]
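The two kinds of entries in the aspect mapping file can be told apart by whether the key contains a dot. The following sketch shows one way to read such a file. The function `parse_aspect_mapping` is hypothetical and only approximates Java properties parsing (it does not handle escapes or line continuations).

```python
def parse_aspect_mapping(text):
    """Split an aspect mapping file into aspects and property mappings.

    Returns (aspects, properties):
      aspects    -> {document type: [aspect, ...]}
      properties -> {document type: {document level field: aspect property}}
    """
    aspects, properties = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if "." in key:
            # {DocumentType}.{DocumentLevelFieldName}=aspect property
            doc_type, field = key.split(".", 1)
            properties.setdefault(doc_type, {})[field] = value
        else:
            # {DocumentType}=aspect;aspect;...
            aspects[key] = [a.strip() for a in value.split(";") if a.strip()]
    return aspects, properties


sample = (
    "Application-Checklist=P:cm:titled;P:cm:taggable\n"
    "Application-Checklist.State=cm:description\n"
)
aspects, properties = parse_aspect_mapping(sample)
```

With the sample above, `aspects` holds the two aspects for “Application-Checklist” and `properties` maps its “State” field to “cm:description”.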

Dependency

The plugin runs after Create Multi Page Files Plugin in Export Module. The plugin assumes that the multipage tiff/pdf has been successfully generated for the batch and uploads the multipage tiff/pdf to the CMIS repository.

 

Troubleshooting

Following are a few common error messages caused by a malfunction of the plugin:

S no. | Error message | Possible root cause
1 | com.ephesoft.dcma.core.DCMAException: Not Found | Alfresco server URL is invalid.
2 | com.ephesoft.dcma.core.DCMAException: Repository not found! | Repository ID is invalid.
3 | Cannot initialize Web Services service object [org.apache.chemistry.opencmis.binding.webservices.RepositoryService]: Failed to access the WSDL at: http://localhost:8181/alfresco/cmisws/RepositoryService?wsdl. It failed with: Connection refused: connect. | Invalid URL for wssecurity. Either change the security mode to basic or correct the URL in the dcma-cmis.properties file.
4 | com.ephesoft.dcma.core.DCMAException: Unauthorized | Invalid user name or password.
5 | Server URL is null/empty from the data base. Invalid initializing of properties. | Server URL is empty or not mapped to the database.
6 | Server User Name is null/empty from the data base. Invalid initializing of properties | Username is empty or not mapped to the database.
7 | Server User Password is null/empty from the data base. Invalid initializing of properties. | Password is empty or not mapped to the database.
8 | UploadFileTypeExt is null/empty from the data base. Invalid initializing of properties | Upload file type extension is empty or not mapped to the database.
9 | RootFolder is null/empty from the data base. Invalid initializing of properties. | Root folder is empty or not mapped to the database.
10 | org.apache.chemistry.opencmis.commons.exceptions.CmisConstraintException: Conflict | Files already exist in the specified folder hierarchy. Try deleting the old files.
11 | java.lang.IllegalArgumentException: Object Id must be set! | Unable to create a folder in the specified hierarchy.
12 | CMISExporter- Bad Request issue | The mapping defined in the DLF-Attribute-mapping.properties file is not the same as the mapping defined in the content model in the Alfresco repository. NOTE: A detailed description of error #12 is below.
13 | CMISExporter- Property ‘ephesoft:partNumber’ is a String property” from Alfresco repository. | Mismatch between the types of the document level fields defined in the Ephesoft application and those defined in the Alfresco content model. NOTE: A detailed description of error #13 is below.

Description of error #12

· The user may define a mapping in the properties file as follows:
o Application-Checklist=D:ephesoft:document
o Application-Checklist.InvoiceDate=ephesoft:invoiceDate
o Application-Checklist.PartNumber=ephesoft:partNumber
o Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
· In the Alfresco repository, however, the type may be defined as follows:
<type name="ephesoft:ephesoft">
  <title>ephesoft Document Procedure</title>
  <parent>cm:content</parent>
  <properties>
    <property name="ephesoft:invoiceDate">
      <type>d:text</type>
    </property>
    <property name="ephesoft:partNumber">
      <type>d:text</type>
    </property>
    <property name="ephesoft:invoiceTotal">
      <type>d:text</type>
    </property>
    <property name="ephesoft:state">
      <type>d:text</type>
    </property>
    <property name="ephesoft:city">
      <type>d:text</type>
    </property>
  </properties>
</type>
· This mismatch (the mapping refers to “ephesoft:document” while the model defines “ephesoft:ephesoft”) gives a “Bad Request” error from the CMIS plugin when it tries to upload the document.

Description of error #13
· The user may have the following properties defined in the Alfresco content model:
<property name="ephesoft:partNumber">
  <type>d:text</type>
</property>
<property name="ephesoft:invoiceTotal">
  <type>d:int</type>
</property>
o “partNumber” is mapped as “text” type (say it is of type LONG in the Ephesoft application).
o “invoiceTotal” is mapped as “int” type (say it is of type DOUBLE in the Ephesoft application).
This mismatch gives the “CMISExporter- Property ‘ephesoft:partNumber’ is a String property” error from the Alfresco repository.
Correction: Update the content model in the Alfresco repository with the appropriate data types (for data type mappings, see http://wiki.alfresco.com/wiki/Data_Dictionary_Guide#Data_Types):
<property name="ephesoft:partNumber">
  <type>d:int</type>
</property>
<property name="ephesoft:invoiceTotal">
  <type>d:double</type>
</property>

Copy Batch XML Plugin

Overview

Copy Batch XML is an export plugin available in the Ephesoft application. It exports the metadata generated by processing a batch to any location on the file system. Using this plugin, the generated batch.xml and the output document files (PDF and/or TIFF files) can be exported. The following configurable parameters are available:

  • Base export folder for all the batches.
  • Naming pattern of the export folder for a batch.
  • Naming pattern of the document files to be copied.
  • Type of output document files to be copied (PDF and/or TIFF files).

Plugin Properties

  • Final Export Folder: Folder path where all the output files (batch xml, multipage pdf, multipage tiff) are to be exported.
  • Export To Folder Switch: Switch to decide whether or not to copy the batch output files to the Final Export Folder. It switches the Copy Batch XML plugin ON/OFF. Default value is ON.
  • Export Folder Name: A folder with the specified name is created in the Final Export Folder, and all the document output files (multipage pdfs and tiffs) are copied into this folder.
  • Export File Name: All the document output files (multipage pdfs and tiffs) are renamed based on the parameters specified in this property.
  • Batch XML Export Folder: Folder where the batch xml file will be copied. Its possible values are:
    • Batch Instance Folder: Batch XML files are copied to the batch instance folder (BIXX) in the Final Export Folder.
    • Final Export Folder: Batch XML files are copied directly into the Final Export Folder.

Export Folder Name and Export File Name configuration

  • Values specified in these fields can be a document field name, EphesoftBatchID or EphesoftDOCID.
  • Each parameter (document field name, EphesoftBatchID or EphesoftDOCID) should be preceded by ‘$’.
  • Example: $Invoice Date && _ && $Invoice Total && _ && $EphesoftBatchID. If the batch xml has the following values for these parameters:
    • Invoice Date : 13 Jan
    • Invoice Total : 22.22
    • EphesoftBatchID : BIA

then the folder or file will be named “13 Jan_22.22_BIA”.

  • && is used as a separator between parameters.
  • If the user enters an invalid character (for either Export Folder Name or Export File Name), or the value of any specified parameter contains an invalid character, it is replaced by replace_char, which is configurable from the properties file.

Note: An invalid character is one that cannot be used in a file or folder name (e.g., / \ : * < > ? ” | for Windows).

  • If the document doesn’t contain any of the parameters specified (e.g., $Invoice Date && _ && $Invoice Total), or doesn’t contain a value for any of the parameters:
    • Document file names (multipage pdfs and tiffs) are retained as they are, in the case of Export File Name.
    • Document files are moved to a folder named Unknown, in the case of Export Folder Name.
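The naming rules above can be illustrated with a short sketch. The function `resolve_name` is hypothetical and only models the documented behaviour: tokens are separated by `&&`, tokens starting with `$` are looked up in the document's values, and anything else is treated as literal text. Returning `None` stands for the fallback case (original file name kept, or the “Unknown” folder used).

```python
from typing import Dict, Optional


def resolve_name(pattern: str, values: Dict[str, str]) -> Optional[str]:
    """Resolve a '&&'-separated naming pattern against document values.

    Returns None when a '$' parameter has no value, so the caller can
    apply the documented fallback.
    """
    parts = []
    for token in pattern.split("&&"):
        token = token.strip()
        if token.startswith("$"):
            value = values.get(token[1:].strip())
            if not value:
                return None
            parts.append(value)
        else:
            parts.append(token)  # literal text such as '_'
    return "".join(parts)


name = resolve_name(
    "$Invoice Date && _ && $Invoice Total && _ && $EphesoftBatchID",
    {"Invoice Date": "13 Jan", "Invoice Total": "22.22", "EphesoftBatchID": "BIA"},
)
# name is "13 Jan_22.22_BIA", matching the example above
```

Invalid-character replacement is a separate step applied to the resolved name and is not part of this sketch.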

Configuration

UI Configurations

User can configure Copy batch xml plugin from UI:

{Batch Class List} -> {Batch Class} -> Export -> COPY_Batch_XML
BatchClassManagement CopyBatchXMLPlugin.jpg
Properties Description:

 

Configurable property | Type of value | Value options | Description
Final Export Folder | String | NA | Folder path where all the files (batch xml, multipage pdf, multipage tiff) are to be exported.
Export To Folder Switch | String | ON, OFF | Switch to decide whether or not to copy the batch files to the Final Export Folder. Default ON.
Export Folder Name | String | NA | A folder with this name is created in the Final Export Folder, and all the document files (multipage pdfs and tiffs) are copied into this folder. Refer to the guidelines for entering Export Folder Name and Export File Name.
Export File Name | String | NA | All the document files (multipage pdfs and tiffs) are renamed based on the parameters specified in this property. Refer to the guidelines for entering Export Folder Name and Export File Name.
Batch XML Export Folder | String | Batch Instance Folder, Final Export Folder | Folder where the batch xml file will be copied. Batch Instance Folder: Batch XML files are copied to the batch instance folder (BI??) in the Final Export Folder. Final Export Folder: Batch XML files are copied directly into the Final Export Folder.

Property File Configurations

Configuration for Replacing Invalid Character:

Property File Name: dcma-export.properties

Property file location: {Ephesoft_Home}/WEB-INF/classes/META-INF/dcma-export/*
Properties Description:

 

Configurable property | Type of value | Value options | Description
export.invalid_file_name_characters | String | NA | Semi-colon separated list of characters that are treated as invalid characters for file names. Default value for a Windows environment: /;\\;\:;*;<;>;?;";|
export.replace_char | String | NA | Invalid characters are replaced by export.replace_char.
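As an illustration of how these two properties work together, the sketch below replaces each invalid character in a name. It is not the plugin's code: the function `sanitize` is hypothetical, the character list is written in its unescaped form (the properties file escapes the backslash and colon), and the default replacement character `_` is an assumption, since this page does not state the default.

```python
# Unescaped form of the semi-colon separated default list for Windows.
INVALID_CHARS = '/;\\;:;*;<;>;?;";|'.split(";")


def sanitize(name, invalid_chars=INVALID_CHARS, replace_char="_"):
    """Replace every invalid character in `name` with `replace_char`."""
    return "".join(replace_char if ch in invalid_chars else ch for ch in name)


cleaned = sanitize("13/01:2012", replace_char="-")
# cleaned is "13-01-2012"
```

Splitting on `;` only works because `;` itself is not in the list. A list that needed to include the separator would require an escaping rule that this page does not describe.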

Dependencies

CREATEMULTIPAGE_FILES plugin: This plugin is responsible for creating multipage pdf and tiff files which are copied by Copy batch XML plugin to Batch XML Export folder.

Multipage tiff files will be created only if Create Multipage Tiff Switch is ON in this plugin.

Troubleshooting

S no. | Error message | Possible root cause
1 | Could not create folder. | The batch instance folder could not be created in the Final Export Folder. Check the permissions on this folder.
2 | Folder does not exist. | The folder specified for Final Export Folder doesn’t exist.

Create Multipage Files Plugin

Overview

The Create Multipage Files plugin by default is a part of export module. This plugin generates multipage PDF and TIF files for each document type of a batch inside final drop folder. This final drop folder path is a configurable property defined inside Copy Batch XML Plugin.

This plugin also generates colored, searchable and optimized PDF depending upon the configuration made.

Configuration

UI Configurations

Following is the list of properties configurable from the UI:

BatchClassManagement CreateMultipagefilesPlugin.jpg

Configurable property | Type of value | Value options | Description
PDF Optimization switch | List of values | ON, OFF | Creates an optimized PDF by adding web view to the PDF. This feature currently works only with Ghostscript.
Create Multipage Tiff Switch | List of values | ON, OFF | When turned ON, multipage tiff files are created with the help of ImageMagick.
Multipage File Export Process | List of values | ITEXT, ITEXT-SEARCHABLE, HOCRtoPDF, IMAGE_MAGICK, GHOSTSCRIPT | Selects the API used to create multipage files.
Colored Output PDF | List of values | TRUE, FALSE | Generates a colored PDF as output.
Searchable Output PDF | List of values | TRUE, FALSE | Creates a searchable PDF when set to TRUE.
PDF Creation Parameters | String | NA | Ghostscript parameters for creating the PDF.
PDF Optimization Parameters | String | NA | Ghostscript parameters for creating the optimized PDF.

Property File Configurations

Following is the list of properties configurable from the property file located at ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-imagemagick/imagemagick.properties’:

Configurable property | Type of value | Sample value | Description
imagemagick.tif_compression | String | LZW | Compression mode used while creating multipage tiff.
imagemagick.pdf_quality | Integer | 100 | Quality of the PDF, which can vary from 0 to 100.
imagemagick.colored | String | True | Defines whether the multipage tiff will have colored or monochrome images.
imagemagick.pdf_compression | String | LZW | Compression mode used while creating multipage PDF.
imagemagick.display_image_output_parameters | String | -colorspace gray -alpha off | ImageMagick output parameters used while generating multipage tiff.
imagemagick.max_files_processed_per_gs_cmd | Integer | 75 | Maximum number of files Ghostscript can process in one command to generate multipage PDF.
imagemagick.height_for_pdf_page | Integer | 792 | Height of the PDF page while generating PDF using iText.
imagemagick.width_for_pdf_page | Integer | 612 | Width of the PDF page while generating PDF using iText.
imagemagick.max_files_processed_per_im_cmd | Integer | 100 | Maximum number of files ImageMagick can process in one command to generate multipage tiff.
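The two max_files_processed_per_*_cmd limits imply that a large document is handled in several command invocations. How the plugin splits and recombines the files is not described here, so the following is only a sketch of the splitting step under that assumption. The function `split_into_commands` is hypothetical.

```python
def split_into_commands(files, max_files_per_command):
    """Split `files` into consecutive groups of at most `max_files_per_command`."""
    if max_files_per_command < 1:
        raise ValueError("max_files_per_command must be at least 1")
    return [files[i:i + max_files_per_command]
            for i in range(0, len(files), max_files_per_command)]


pages = [f"page{i}.tif" for i in range(250)]
groups = split_into_commands(pages, 100)
# 250 pages with a limit of 100 give groups of 100, 100 and 50 files
```

Page order is preserved across groups, which matters if the partial outputs are later merged into one multipage file.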

Steps of execution

  • This plug-in works in the export phase of the application when all processing on the batch has been done and it’s ready to be exported.
  • The plug-in creates multipage tiff or PDFs in the final drop folder for all document types in a batch.
  • After all the work is done, batch.xml is updated and batch is passed to other export plugins.

Dependency

This plugin requires hocr.xml file for creating searchable PDF. It has a dependency on one of the plugins from: ‘Recostar HOCR’/ ‘Tesseract HOCR’.

Troubleshooting

Following are a few common error messages caused by a malfunction of the plugin:

S no. | Error message | Possible root cause
1 | IM4JAVA_TOOLPATH is not set for converting images using image magic | The environment variable for ImageMagick is not set.
2 | Environment Variable GHOSTSCRIPT_HOME not set. | The environment variable for Ghostscript is not set.

Create Thumbnails Plugin

Overview

This plug-in is used to create the thumbnail image of the batch images. Two types of thumbnails will be generated by this plugin:

Display Thumbnails: These thumbnails are displayed in Review and Validate screen, where pages in the documents are shown as thumbnails under the document name.

Compare thumbnails: These thumbnails are used by classify images plugin to classify the pages.

By default, this plugin is added in the page process module.

Configuration

Setting the plugin configuration

The configurable properties can be edited at the following UI:

Edit Batch Class -> Edit Page Process module -> Edit CREATE_THUMBNAILS Plugin

BatchClassManagement CreateThumbnailsPlugin.jpg

 

Configurable Properties

Following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Create Thumbnails Switch | String | ON, OFF. Default value: ON | Determines whether compare thumbnails are created. Display thumbnails are always created.
Create Thumbnails Display Thumbnail Type | String | .png | Determines the format in which the thumbnail image is displayed. It is a non-editable property.
Create Thumbnails Compare Thumbnail Type | String | .tif | The format of the thumbnail image created for comparison. It is a non-editable property.
Create Thumbnails Display Image Height | Integer | Any integer value. Default value: 200 | Sets the height of the thumbnail image.
Create Thumbnails Display Image Width | Integer | Any integer value. Default value: 150 | Sets the width of the thumbnail image.
Create Thumbnails Compare Image Height | Integer | Any integer value. Default value: 200 | Sets the height of the compare thumbnail image.
Create Thumbnails Compare Image Width | Integer | Any integer value. Default value: 150 | Sets the width of the compare thumbnail image.
Create Thumbnails output Image Parameters | String | Valid parameters for ImageMagick. Default value: -colorspace gray | Additional parameters to be passed to ImageMagick.

Dependency

The plugin is dependent on Import Batch Folder Plugin. Import Batch Folder plugin copies the batch files from UNC folder to the Ephesoft System Folder.

Troubleshooting

S no. | Error message | Description
1 | Problem generating thumbnails. | Check whether the IM4JAVA_TOOLPATH environment variable is set correctly.

Db Export Plugin

Overview

This plug-in is responsible for saving the data of document level fields for a particular batch instance to the external or same database. It takes the mapping file provided for the plugin and creates a SQL query to insert the mapped document level field into the mapped table.

Configuration

Configurable properties screenshot

BatchClassManagement DBExportPlugin.jpg

 

Configurable properties

Following are the configurable properties available with the plugin:

Configurable property | Type of value | Value options | Description
Database Export Switch | List of values | ON, OFF | The switch that defines whether this plugin will run or not. Default value is “OFF”.
Database Connection URL | String | A valid database connection URL. | The database connection URL corresponding to the selected driver.
Database Driver | List of values | net.sourceforge.jtds.jdbc.Driver, com.microsoft.jdbc.sqlserver.SQLServerDriver, com.mysql.jdbc.Driver | Type of driver to be used for the database connection.
Database User Name | String | A valid username to connect to the database | SQL account username.
Database Password | String | A valid password to connect to the database | SQL account password.

Mapping File

  • Mapping file for this plugin is stored for each batch class at the following path:
    • <SHARED_FOLDER_PATH>\<BATCH_CLASS_IDENTIFIER>\db-export-plugin-mapping\db-export-mapping.properties
  • Its contents should be in the following syntax:
    • <Document Type>.<Document Level Field Name>=<Database Table Name>:<Database Table Column Name>
    • For example:
      • Invoice.type=testTable:invoiceType
      • Invoice.sender=testTable:invoiceSender
      • Invoice.receiver=testTable:invoiceReceiver
      • Invoice.total=testTable:invoiceTotal
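
The mapping syntax above can be illustrated with a small parser. This is only a sketch in Python, not the plugin's actual code: the names `FieldMapping`, `parse_mapping_line` and `parse_mapping` are hypothetical, and it assumes that the document type itself contains no “.” and that lines starting with “#” are comments.

```python
from typing import List, NamedTuple


class FieldMapping(NamedTuple):
    """One mapping entry (hypothetical helper type, not part of Ephesoft)."""
    document_type: str
    field_name: str
    table: str
    column: str


def parse_mapping_line(line: str) -> FieldMapping:
    """Split '<Document Type>.<Field>=<Table>:<Column>' into its four parts."""
    key, _, value = line.strip().partition("=")
    document_type, _, field_name = key.partition(".")
    table, _, column = value.partition(":")
    parts = [p.strip() for p in (document_type, field_name, table, column)]
    # Any missing separator leaves at least one part empty.
    if not all(parts):
        raise ValueError(f"Invalid mapping entry: {line!r}")
    return FieldMapping(*parts)


def parse_mapping(text: str) -> List[FieldMapping]:
    """Parse every non-blank, non-comment line of a mapping file."""
    mappings = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        mappings.append(parse_mapping_line(stripped))
    return mappings


sample = """
Invoice.type=testTable:invoiceType
Invoice.total=testTable:invoiceTotal
"""
for mapping in parse_mapping(sample):
    print(mapping)
```

A malformed entry raises `ValueError` in this sketch, which loosely mirrors the plugin sending the batch into error when the mapping is invalid.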

Dependency

The plugin requires the following prerequisites:

  • The plugin does not depend on any other plugin, but it produces the desired output only when the document level fields have extracted values.
  • A table with the name provided in the mapping file must be created with the following structure:

Field Name | Null allowed
BATCH INSTANCE ID | NO
BATCH CLASS ID | NO
DOCUMENT TYPE | NO
DOCUMENT LEVEL FIELD | NO
VALUE | YES

  • If the “Database Export Switch” is ON, the mapping provided must be correct. An invalid mapping will result in the batch going into error.

Troubleshooting

Following are a few common error messages seen due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Error in parsing DB Export Plugin mapping file, FileNotFoundException | The “db-export-mapping.properties” file is not located under “<SHARED_FOLDER_PATH>\<BATCH_CLASS_IDENTIFIER>\db-export-plugin-mapping”.
2 | Error in parsing DB Export Plugin mapping file, NoSuchElementException | One or more properties in “db-export-mapping.properties” have incorrect syntax.
3 | Problem occurred in updating database table | The database connection settings are incorrect, or an error occurred while writing values to the database.
4 | Error in initialising Hibernate Connection | Database connection settings are invalid.

Document Assembler Plugin

Overview

This plugin is responsible for forming documents from single pages. It reads all the pages present under the document type “Unknown” and creates new documents on the basis of the page level fields. The plugin reviews the page level field results and decides which page is the first page and what the document type is, based on the page_level_index fields.

Ephesoft supports five types of classification: barcode, search, image, automatic and searchable PDF. Only one type of classification can be applied to a batch at a time. The user can also select ‘Automatic Classification’, which operates like search classification but also includes the top results from barcode and image classification. The default configuration in the property file applies the classifications in the order barcode, then image, then Lucene search.

  • Barcode classification: The document type is formed on the basis of the barcode present in the document being processed and the sample documents provided at learning time.
  • Search classification: The document type is formed on the basis of the text found on the images, using Lucene. During learning, HOCR is generated for the image samples provided in the batch class data. The HOCR files of the batch are then compared with the HOCR files of the samples.
  • Image classification: The document type is formed on the basis of the image samples provided at learning time. The input image is superimposed on the learnt images and the best match is selected.
  • Automatic classification: The document type is formed on the basis of the search classification results together with the top results from barcode and image classification. The default configuration in the property file applies them in the order barcode, then image, then search classification.
  • Searchable PDF classification: This classification applies only to searchable batch classes. A searchable batch class is assumed to have a single document type; if it has more than one, the first document type is assigned to all the documents and they are merged into a single document.

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-docassembler/dcma-document-assembler.properties

 

Configurable property | Type of value | Value options | Description
da.barcode_classification | String | Barcode (default) | This field is used to specify the barcode plugin name.
da.lucene_classification | String | Search_Engine_Classification (default) | This field is used to specify the Search Engine Classification plugin name.
da.image_classification | String | Image_Compare_Classification (default) | This field is used to specify the Image Compare Classification plugin name.
da.automatic_classification | String | Automatic_Classification (default) | This field is used to specify the Automatic Classification plugin name.
da.first_page | String | First_Page (default) | This field is used to specify the First Page name.
da.middle_page | String | Middle_Page (default) | This field is used to specify the Middle Page name.
da.last_page | String | Last_Page (default) | This field is used to specify the Last Page name.
da.automatic_include_list | String | Barcode;Image_Compare_Classification;Search_Engine_Classification | This field is used to specify the order of classification types, separated by semicolons.

UI Configuration

The Document Assembler plugin can be configured from the following UI:

BatchClassManagement DocumentAssemblerPlugin.jpg

 

Configurable property | Type of value | Value options | Description
DA Barcode confidence | Integer | 0-100 | This field is used to specify the barcode confidence.
DA Rule First-middle-last Page | Integer | 0-100 | This field is used to specify the confidence for first, middle and last page.
DA Rule First Page | Integer | 0-100 | This field is used to specify the confidence for first page.
DA Rule Middle Page | Integer | 0-100 | This field is used to specify the confidence for middle page.
DA Rule Last Page | Integer | 0-100 | This field is used to specify the confidence for last page.
DA Rule First-last Page | Integer | 0-100 | This field is used to specify the confidence for first and last page.
DA Rule First-middle Page | Integer | 0-100 | This field is used to specify the confidence for first and middle page.
DA Rule Middle-last Page | Integer | 0-100 | This field is used to specify the confidence for middle and last page.
DA Classification Type | List of values | Search Classification, Barcode Classification, Image Classification, Searchable Pdf Classification, Automatic Classification | This value decides the document classification type to be used for classification.
DA Merge Unknown Document Switch | List of values | ON, OFF | This value decides whether an unknown document is to be merged with the previously classified document or not.

Steps of execution

    • This plug-in works in the document assembler phase of the application, when the entire page processing on the batch has been done and it’s ready to be exported.
    • The plug-in uses the pages classified in the page processing module as input and generates the merged and classified documents as output.
    • After all the work is done, if the DA Merge Unknown Document Switch is ON, it merges each unknown document left over due to low confidence into the previous classified document.
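
The final merge step can be sketched as follows. This is an illustrative Python sketch rather than the plugin's implementation: the `Document` type and `merge_unknown_documents` function are hypothetical, and it assumes that an unknown document with no classified document before it is left untouched.

```python
from dataclasses import dataclass, field
from typing import List

UNKNOWN = "Unknown"


@dataclass
class Document:
    """Hypothetical stand-in for a document in the batch."""
    doc_type: str
    pages: List[str] = field(default_factory=list)


def merge_unknown_documents(documents: List[Document], merge_switch_on: bool) -> List[Document]:
    """Merge each Unknown document into the classified document before it."""
    if not merge_switch_on:
        return [Document(d.doc_type, list(d.pages)) for d in documents]
    merged: List[Document] = []
    for doc in documents:
        previous = merged[-1] if merged else None
        if doc.doc_type == UNKNOWN and previous is not None and previous.doc_type != UNKNOWN:
            # Append the unknown pages to the previous classified document.
            merged[-1] = Document(previous.doc_type, previous.pages + doc.pages)
        else:
            # Assumption: a leading Unknown document has nothing to merge into.
            merged.append(Document(doc.doc_type, list(doc.pages)))
    return merged


batch = [
    Document("Invoice", ["PG0"]),
    Document(UNKNOWN, ["PG1"]),
    Document("Order", ["PG2"]),
]
for document in merge_unknown_documents(batch, merge_switch_on=True):
    print(document)
```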

Dependency

The plugin assumes that page processing for the incoming batch has been done properly. It then merges the pages classified in the page processing module and creates the documents from them.

 

Troubleshooting

Following are a few common error messages received due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Invalid format of page level fields. Doc Field Type found for {Document Assembler Classification Type} classification is null. | Page level fields weren’t present on the batch, or the page processing module didn’t work properly.
2 | Document Type name is not found in the data base for the page type name | Barcode decoded value is not found as document type in the Ephesoft Application database.
3 | No Document type defined for batch instance | Batch class doesn’t have document type for classification.
4 | Invalid integer for barcode confidence score in properties file. | Invalid value for “DA Barcode confidence” at Ephesoft Admin Screen Configuration.

Docushare Export Plugin

Overview

This plug-in exports a zipped file for a batch. It transforms the batch xml into another xml format accepted by Docushare CMS and zips it, along with the multipage pdf, into the Docushare export folder location.

Steps of execution

    • This plug-in works in the export phase of the application, when all the processing on the batch has been done and it’s ready to be exported.
    • The plug-in uses a predefined xml to convert the batch xml file into a Docushare supported format, and names the new xml file according to the user specified value.
    • It then groups the pdf file associated with the batch.
    • After all the work is done, it makes a zip file of all the content and names the file according to the user specified value.

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-docushare-export/dcma-docushare-export.properties

Following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
docushare.final_export_folder | String | docushare-export-folder (default folder is <Shared folder directory>\SharedFolders\DOCUSHARE-export-folder) | This field stores the name of the folder to which the zipped file will be exported after transformation into the desired format.
docushare.final_xml_name | String | _docushare.xml | This value holds the name of the batch xml finally created.
docushare.zip_file_name | String | _docushare.zip | This value holds the name of the zip file finally created.
docushare.switch | List of values | OFF, ON | This property determines whether the plug-in will run or not.

Dependency

The plugin assumes that extraction for the incoming batch has been done properly and only transforms the results in the provided batch.xml into the desired format.

Troubleshooting

Following are a few common error messages received due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1. | Problem in zipping directory | Export folder name is invalid, i.e. it is either not present or is not a directory.
2. | Could not find xsl file | Xsl file is not present in classpath resource.
3. | Problem occurred in transforming | Xsl file is not present, or a problem occurred during transformation.

Fuzzy Db Extraction Plugin

Overview

Fuzzy DB plugin is used to extract the document level fields of a document from records in the database on the basis of the matched value of the HOCR content or the previously extracted value of a document level field. This plug-in involves creation of search engine based indexing and extracting document level field value based on fuzzy match of HOCR content against index. User can configure any Vendor database in order to capture Vendor name, Vendor ID or any other field from the incoming invoices. This can be done simply by mapping the document to the Vendor database table and the index fields of the document to the columns in the database table. The plugin will find the matching vendor from the database and update the fields in the document.

Configuration

Configurable properties

Following are the configurable properties available for the Fuzzy Db plugin:

Configurable property | Type of value | Value options | Description
Minimum Word Length | Integer | N-A | The minimum word length below which words will be ignored from the HOCR content.
Minimum Term Frequency | Integer | N-A | The frequency below which terms will be ignored in the source document.
Minimum Doc Frequency | Integer | N-A | Words that do not occur in at least this many documents will be ignored.
Maximum Query Terms | Integer | N-A | The maximum number of query terms that will be included in any generated query.
Database Password | String | A valid password value to connect to database | The password for connecting to the user SQL account.
Database User Name | String | A valid username value to connect to database | The username for connecting to the user SQL account.
Database Driver | List of values | net.sourceforge.jtds.jdbc.Driver, com.microsoft.jdbc.sqlserver.SQLServerDriver, com.mysql.jdbc.Driver | The database driver to be used. This is DBMS specific.
Database Connection URL | String | A valid database connection URL. | The database connection URL required for connection. This is DBMS specific.
Minimum Confidence Threshold | Integer | N-A | Minimum threshold value required for a Fuzzy Db row to be selected for Fuzzy Extraction.
Date Format | String | N-A | Date format to be used for identifying the date field.
No Of Pages | Integer | N-A | Maximum number of pages to be included while querying for the content.
Option To Include Pages | List of values | ALLPAGES, FIRSTPAGE | Determines whether all the pages or the first page of the document will be chosen for fetching the HOCR content.
FuzzyDB Extraction switch | List of values | ON, OFF | Determines whether the fuzzy extraction should run or not.
Query Delimiters | String | N-A | Delimiters to be used while using the fuzzy text search in the validation phase.
Ignore Words List | Multi select | Name, Title | List of words to be ignored from HOCR content.
Fuzzy Extraction Search Columns based on Fields | String | N-A | This property defines the names of the document level fields (DLFs) on which the user wants to search. E.g. for “$City, $State”, the values of the “City” and “State” DLFs are queried in the learnt indexes and the appropriate row of the database table is returned. The DLFs of the concerned document are populated accordingly.
Fuzzy Extraction HOCR Switch | List of values | ON, OFF | Defines what happens when no match is found for the fields specified in “Fuzzy Extraction Search Columns based on Fields”. ON: the search continues with the complete HOCR content. OFF: the values extracted by the previous extraction plugins are kept.

Steps for configuring the plugin

  • User can select the batch class module and navigate to fuzzy DB plugin configuration page as shown below:

BatchClassManagement FuzzyDBPlugin scrollup.jpg

BatchClassManagement FuzzyDBPlugin scrolldown.jpg

The User can edit the above settings by clicking on “Edit” in order to connect to the vendor database.

  • User can map the document type to a database table by clicking on “Mapping” as shown below:
    • The document type can be mapped to a database table (having data records to be indexed) from the list of tables provided.

BatchClassManagement FuzzyDB DatabaseMapping.jpg

    • The document level fields can be mapped to table columns for extraction.

BatchClassManagement FuzzyDB DatabaseMapping TableMapping.jpg

  • Once the mapping is defined, the user can click on “Learn DB” to create indexes of all the records present in the database.
    • Lucene indexing is generated against all database records belonging to all document types which have been mapped for current batch class. Only mapped columns are indexed.
    • Indexes are built on a string which is the combined text of all the fields mapped to various columns of the database table.
    • Separate index directories are created to store indexes per document type per batch class. The hierarchy used for storing index files against each document level field is: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Database-Name>\<Table-Name>.

Steps of execution

  • The plug-in uses the HOCR content of a document to generate a query comprising keywords chosen on the basis of their occurrence in the document. It then compares the HOCR based query against the indexes on the DB table rows.
  • Lucene returns the matching records, among which the record with the highest confidence score is selected. If the score is greater than the threshold, the corresponding values are stored as the document level field values in the batch xml file.
  • Following are cases that can occur in execution of the plugin:

 

“FuzzyDB Extraction switch” Value | “Fuzzy Extraction Search Column” Value | “Fuzzy Extraction HOCR Switch” Value | Result
OFF | N.A. | N.A. | No Fuzzy Extraction.
ON | <Empty> | N.A. | Usual Fuzzy Extraction using HOCR content.
ON | “$City,$State” | OFF | The values of the “City” and “State” document level fields extracted by previous extraction plugins are searched for in the learned Lucene content. If some data is found, it is used; otherwise the data from the previous extraction remains.
ON | “$City,$State” | ON | The values of the “City” and “State” document level fields extracted by previous extraction plugins are searched for in the learned Lucene content. If some data is found, it is used; otherwise the usual Fuzzy Extraction using HOCR content is done.
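
The four cases in the table can be expressed as a small decision function. The following Python sketch is illustrative only: the function names and the returned labels are hypothetical and are not identifiers used by the plugin.

```python
from typing import List


def parse_search_columns(value: str) -> List[str]:
    """Turn a value such as '$City, $State' into ['City', 'State']."""
    return [part.strip().lstrip("$") for part in value.split(",") if part.strip()]


def fuzzy_extraction_plan(extraction_switch: str, search_columns: str, hocr_switch: str) -> str:
    """Return a label describing which of the four documented cases applies."""
    if extraction_switch != "ON":
        return "NO_EXTRACTION"
    if not parse_search_columns(search_columns):
        # No search columns configured: usual extraction on HOCR content.
        return "HOCR_ONLY"
    if hocr_switch == "ON":
        # Search by field values first, fall back to HOCR content.
        return "FIELDS_THEN_HOCR"
    # Search by field values; keep previously extracted data if nothing matches.
    return "FIELDS_THEN_KEEP_PREVIOUS"


print(parse_search_columns("$City, $State"))
print(fuzzy_extraction_plan("ON", "$City,$State", "OFF"))
```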

Dependency

  • The Lucene engine is used instead of SQL queries for looking up the words of the html file because it is faster and more efficient. SQL queries would be too slow; furthermore, Lucene provides results even if the OCR is not perfect on every character of a word.
  • It is possible that query might not give any results. In such cases, no document level field is updated.
  • It is possible that query might give multiple results. In such cases, the one with the highest confidence score entry will be used to populate document level fields.
  • The plug-in does not involve manual intervention and will be an automated step.
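
The handling of empty and multiple result sets can be sketched as below. This is a simplified Python illustration, not the Lucene-based implementation: `select_best_match` is a hypothetical helper, and the scores and rows are invented sample data.

```python
from typing import Dict, List, Optional, Tuple

Match = Tuple[float, Dict[str, str]]  # (confidence score, database row)


def select_best_match(matches: List[Match], threshold: float) -> Optional[Dict[str, str]]:
    """Return the highest scoring row if its score exceeds the threshold."""
    if not matches:
        # No results: no document level field is updated.
        return None
    score, row = max(matches, key=lambda match: match[0])
    return row if score > threshold else None


# Invented sample data for illustration only.
results = [
    (42.0, {"VendorName": "Acme Corp", "VendorID": "V-001"}),
    (77.5, {"VendorName": "Acme Corporation", "VendorID": "V-002"}),
]
print(select_best_match(results, threshold=50))
print(select_best_match([], threshold=50))
```

Returning `None` in this sketch corresponds to leaving the document level fields unchanged.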

Troubleshooting

Following are a few common error messages seen due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | CorruptIndexException while reading Index | The Lucene indexes are either locked or corrupted.
2 | The base fuzzy db index folder does not exist. So cannot extract database fields. | Fuzzy database has not been learned yet.

HTML TO XML

Overview

Before release 3030, the HTML TO XML plugin created the HOCR xml file from the HTML file created by the RECOSTAR_HOCR/TESSERACT plugin, and the HOCR files were generated by a thread pool executor. From release 3030 onwards, the RECOSTAR_HOCR/TESSERACT plugins directly generate the HOCR xml file corresponding to each image file, so this plugin is now obsolete.

Other plugins use this HOCR xml file to read the image data.

Configuration

  • Property File:{Ephesoft-install-dir}/WEB-INF/classes/META-INF/dcma-core/dcma-core.properties/*
  • Property:thread.pool_size=5

 

Configurable property | Type of value | Value options | Description
thread.pool_size | String | Positive integer value | This field stores a string value for the thread.pool_size field. This property governs how many files will be processed simultaneously.
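
As a rough illustration of how a pool size bounds the number of files processed simultaneously, the Python sketch below uses the standard library thread pool. It is not the plugin's code; `read_pool_size`, `process_files` and the worker function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def read_pool_size(raw_value: str) -> int:
    """Convert the string property value into a positive integer."""
    pool_size = int(raw_value.strip())
    if pool_size <= 0:
        raise ValueError("thread.pool_size must be a positive integer")
    return pool_size


def process_files(paths: List[str], worker: Callable[[str], str], pool_size: int) -> List[str]:
    """Run worker on every path, with at most pool_size files in flight."""
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        # map() returns results in the same order as the input paths.
        return list(executor.map(worker, paths))


def to_xml_name(path: str) -> str:
    """Stand-in for the real conversion: only derives an output file name."""
    return path.rsplit(".", 1)[0] + "_HOCR.xml"


print(process_files(["page0.html", "page1.html"], to_xml_name, read_pool_size("5")))
```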

Dependencies

One of the below two specified plugins must be ON to generate HOCR Xml files:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

Classify Images Plugin

Overview

This plugin is responsible for classifying Ephesoft documents using an image comparison algorithm based on ImageMagick.

The plugin works in two stages:

  • Learning: The learning process generates indexes for the sample documents. The generated indexes are used for classifying documents. For further information on learning, please refer to the document “Learning document”.
  • Classification: While classifying a document, the learnt data is used as reference data. The plugin superimposes the input image on the learnt images and generates a confidence score on the basis of the comparison.

Configuration

Configurable Properties

Following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Classify Image Switch | String | ON, OFF | Switches the plugin ON or OFF. Default is ON.
Classify Image Max Result | Integer | NA | The maximum number of classification results for an input image that are stored in batch.xml.
Classify Image Comparison Metric | String | Ex: RMSE | The metric used to compare the learnt images with the input images provided for classification.
Classify Image Fuzz Percentage | Integer | NA | The fuzz distance used while comparing images with ImageMagick.

This is shown in the screen shot given below:

BatchClassManagement ClassifyImagesPlugin.jpg

Steps of execution

  • This plug-in works in the page process phase of the application, when all the import processing on the batch has been done and it’s ready for page processing.
  • Learning should be done on the batch class before using this plugin.
  • The plug-in classifies the input images using image based classification with ImageMagick.
  • After all the work is done, it writes the information into the batch.xml file for the document type being classified.

Dependency

This plugin is part of the page processing module and runs after successful completion of the import module.

Troubleshooting

Following are a few common error messages received due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Exception while executing the compare command. | A configurable parameter has an invalid value.
2 | Learning not done for batch class sample folder path. | Learning is not done for the batch class.

Import batch folder plugin

Overview

The Import Batch Folder plugin copies the batch files from <Ephesoft shared folder>\<batch class UNC folder> to the <Ephesoft local folder>. The plugin creates a batch instance folder named BI<Batch Instance identifier> in <Ephesoft local folder> and copies the batch instance files to that folder.

Only files with valid extensions are copied.

BatchClassManagement ImportBatchFolderPlugin.jpg

Configuration

Properties File

Properties file location: <Ephesoft installation path>\Application\WEB-INF\classes\META-INF\dcma-import-folder\dcma-import-folder.properties.

Properties Description:

 

Configurable property | Type of value | Value options | Description
import.invalid_char_list | String | N-A | Semicolon-separated list of characters that are invalid in file names.

Configurable properties

Following are the configurable properties available with the plugin:

 

Configurable property | Type of value | Value options | Description
Folder importer valid extensions | String |  | Defines a list of supported file extensions. Multiple values will be “;” separated. Default value “tif”.
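
The extension check can be illustrated with the short Python sketch below. The helper names are hypothetical, and the case-insensitive comparison is an assumption rather than documented plugin behaviour.

```python
import os
from typing import Set


def parse_valid_extensions(value: str) -> Set[str]:
    """Split a ';' separated property value such as 'tif;pdf' into a set."""
    return {ext.strip().lstrip(".").lower() for ext in value.split(";") if ext.strip()}


def has_valid_extension(file_name: str, valid_extensions: Set[str]) -> bool:
    """Check the file's extension against the configured list."""
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    # Assumption: extensions are compared without regard to case.
    return extension in valid_extensions


valid = parse_valid_extensions("tif;pdf")
for name in ["scan001.tif", "scan002.TIF", "notes.txt"]:
    print(name, has_valid_extension(name, valid))
```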

Troubleshooting

Following are a few common error messages received due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Invalid characters present in folder name. | If the folder/file name contains an invalid character, the batch will go into error.
2 | Could not find valid Extensions properties in the property file. | “Folder Importer Valid Extensions” property doesn’t exist for the plug-in.

Import multipage files plugin

Overview

The Import Multipage Files plugin is required when running a batch on multipage images. This plugin breaks multipage pdfs and tiffs into multiple single page tiffs. Multipage pdfs are converted to single page tiffs using Ghostscript, whereas multipage tiffs are converted to single page tiffs using ImageMagick.

BatchClassManagement ImportMultipageFilesPlugin.jpg

 

Configuration

UI Configuration

IMPORT_MULTIPAGE_FILES properties can be edited at following admin UI:

 

Configurable property | Type of value | Value options | Description
IM Convert Input Image Parameters | String | N-A | Input parameters for the ImageMagick command that should be used for multipage tiff to multiple single page tiffs conversion.
Multi Page Import | List of values | YES, NO | Switch for multipage files import plugin. If set to NO, multipage files (pdf and tiff) will not be converted to multiple single page tiffs.
IM Convert Output Image Parameters | String | N-A | Output parameters for the ImageMagick command that should be used for multipage tiff to multiple single page tiffs conversion.
Ghostscript Image Parameters | String | N-A | Parameters for the Ghostscript command that should be used for multipage pdf to multiple single page tiffs conversion.

Property File Configuration

Property File location: <Ephesoft-Installation-Path>\ Application\WEB-INF\classes\META-INF\dcma-import-folder\dcma-import-folder.properties\*

 

Configurable property | Type of value | Value options | Description
import.folder_ignore_char_list | String | N-A | Semicolon-separated list of characters that are to be replaced in the file names encountered by the plugin.
import.ignore_replace_char | String | N-A | The character that replaces the characters listed in “import.folder_ignore_char_list” in the file names encountered by the plugin.
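
How the two properties work together can be sketched in Python as follows. The function name and the sample property values are hypothetical; the document does not list the actual defaults.

```python
def sanitize_file_name(file_name: str, ignore_char_list: str, replace_char: str) -> str:
    """Replace every character from the ';' separated list with replace_char."""
    # Assumption: ';' is only a separator and cannot itself be listed.
    characters = [char for char in ignore_char_list.split(";") if char]
    for char in characters:
        file_name = file_name.replace(char, replace_char)
    return file_name


# Sample values chosen for illustration only.
print(sanitize_file_name("inv#001&copy.tif", "#;&", "_"))
```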

Optimization parameters and results

“-sDEVICE” parameter

  • -sDEVICE=tiff12nc: Produces 12-bit RGB output.
  • -sDEVICE=tiff24nc: Produces 24-bit RGB output.
  • -sDEVICE=tiff48nc: Produces 48-bit RGB output.
  • -sDEVICE=tiff32nc: Produces 32-bit CMYK output.
  • -sDEVICE=tiff64nc: Produces 64-bit CMYK output.
  • -sDEVICE=tiffscaled24 -sCompression=lzw: Produces a 24-bit RGB image and allows a compression tag to be used along with it to reduce the size of the image.
  • -sDEVICE=tifflzw: Produces black-and-white output and can be combined with various compression options.
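
To show where the device parameter sits in a complete Ghostscript call, the Python sketch below assembles an argument list without running it. The helper name is hypothetical. The switches -dNOPAUSE, -dBATCH and -sOutputFile are standard Ghostscript options that are not taken from the Ephesoft configuration, and the executable name (“gs” here) differs on Windows installations.

```python
from typing import List, Optional


def build_ghostscript_command(
    pdf_path: str,
    output_pattern: str,
    device: str = "tiffscaled24",
    compression: Optional[str] = "lzw",
    executable: str = "gs",
) -> List[str]:
    """Assemble a Ghostscript command that splits a PDF into single page TIFFs."""
    command = [executable, "-dNOPAUSE", "-dBATCH", f"-sDEVICE={device}"]
    if compression:
        command.append(f"-sCompression={compression}")
    # %04d in the output pattern is replaced by the page number.
    command += [f"-sOutputFile={output_pattern}", pdf_path]
    return command


print(" ".join(build_ghostscript_command("batch.pdf", "page-%04d.tif")))
```

The resulting list could be passed to `subprocess.run` on a machine where Ghostscript is installed.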

Results

Following are the results of images produced by splitting a PDF with the given specifications under different Ghostscript parameters:

  • PDF Size: 514 KB
  • Number of pages in PDF: 26

Note: The PDF contained a mixture of colored and B/W images.

-sDEVICE | Type of output | Size per image produced (in KB) | Total images size (in MB)
tiff12nc | Same type of images | 12,241 | 325
tiff24nc | Same type of images | 25,446 | 626
tiff48nc | Same type of images | 51,148 | 1258
tiffscaled24 -sCompression=lzw | Same type of images | 250-400 | 6.75
tifflzw | All images converted to B/W | 50-90 | 1.4

Troubleshooting

Following are a few common error messages received due to malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Invalid property file configuration | One or both of the following properties in the “<Ephesoft-Installation-Path>\Application\WEB-INF\classes\META-INF\dcma-import-folder\dcma-import-folder.properties” file are empty: import.folder_ignore_char_list, import.ignore_replace_char
2 | Converted Tiff files count not equal to the TIFF pages count. | The number of pages in the PDF/multipage tiff is not equal to the number of converted tiff files.

Key Value Learning Plugin

Overview

This plugin generates advanced KV pairs, based on data previously extracted manually by the user, to make data extraction more accurate. It keeps track of the data that the user extracts manually by populating DLFs directly from the third panel image. Based on this, it generates advanced KV pairs using the regular expressions defined in property files and saves them for the corresponding document types. The plugin is configured through an ON/OFF switch in the admin UI and through the property files ‘dcma-key-regex.properties’, ‘dcma-key-value-location.properties’ and ‘dcma-value-regex.properties’ defined in META-INF.

The plugin iterates over each document level field of each document. First, it matches the value of the document level field against the regex patterns defined in the properties file. The best-matching regular expression becomes the value pattern for that field. The document level field value is then searched for in the OCR data (HOCR file) of that page of the document.

If the value is found, the plugin searches for a key around it in all eight directions and tries to match it against the regex patterns defined in the properties file. The best-matching regular expression becomes the key pattern. If the key is found to the left of the value (i.e., the value lies to the right of the key), the location is set as RIGHT. If nothing is found to the left, the plugin searches the top, right, bottom and other locations in turn, matches what it finds against the regex patterns in the properties file to get the key pattern, and sets the location accordingly.

Note: Location is set here for processing purpose only. This location has no link with the ‘Location’ field displayed in Advanced KV pairs. Location field value will always be empty for generated advanced KV pairs.

  • If a value does not match any of the regex patterns, the value itself is set as the key pattern of the field.
  • The application searches the key locations in the order below, which can be configured as a semicolon-separated list in the property file. As soon as it finds a match at a location, it uses that location:
    • LEFT
    • RIGHT
    • TOP
    • BOTTOM
    • TOP_RIGHT
    • TOP_LEFT
    • BOTTOM_RIGHT
    • BOTTOM_LEFT
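
The ordered location search can be sketched as follows. This Python example is illustrative only: `find_key` and the `candidates` dictionary (text found at each location around the value) are hypothetical stand-ins for the plugin's HOCR lookup.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_ORDER = "LEFT;RIGHT;TOP;BOTTOM;TOP_RIGHT;TOP_LEFT;BOTTOM_RIGHT;BOTTOM_LEFT"


def parse_location_order(value: str) -> List[str]:
    """Split the semicolon-separated order; a trailing ';' is tolerated."""
    return [location.strip() for location in value.split(";") if location.strip()]


def find_key(candidates: Dict[str, str], location_order: str = DEFAULT_ORDER) -> Optional[Tuple[str, str]]:
    """Return (location, text) for the first location that holds any text."""
    for location in parse_location_order(location_order):
        text = candidates.get(location)
        if text:
            return location, text
    return None


# Hypothetical text found around a value on the page.
around_value = {"TOP": "Total", "RIGHT": "USD"}
print(find_key(around_value))
```

With the default order, RIGHT is checked before TOP, so the text found at RIGHT is chosen in this example.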

Multi word support for KV Learning

The Key Value Learning plugin in the Export module automatically creates a Key Value field corresponding to a document level field.

This enhancement allows multiple words to be used for key pattern generation. If a word is found close to the key, it is appended to the key and used for the key pattern generation.

Note: Words are appended on the left of the key for locations LEFT, BOTTOM, TOP, BOTTOM_LEFT and TOP_LEFT, and on the right for locations BOTTOM_RIGHT and TOP_RIGHT.

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-value-location.properties

 

Configurable property | Type of value | Value options | Description
key_value.location_order | String | LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT; | A semicolon-separated list of locations. It represents the order of locations in which the key will be searched for in the image. The locations specified are those of the key with respect to the value.
key_value.max_number_record | Integer | NA | The maximum number of key value pairs that can be present for any DLF. If a DLF already has this many key value fields defined, the plugin will not add any more key value pairs to it. Default value is 50.
key_value.tolerance_threshold | Integer | A, B, C | The length and width of the value rectangle created by the plugin are increased by this tolerance value (width + (width*tolerance)/100). For example, if the calculated width is 100 pixels and the tolerance specified is 10, the resultant width will be 110 pixels.
key_value.multiplier | Integer | Integer value | This property holds an integer value which decides on <some logic>. (Also mention range if applicable)
key_value.fetch_value | String | FIRST, LAST, ALL | Fetch value for the key value field that is being created by the plugin. Default value is FIRST.
key_value.min_key_char_count | Integer | NA | Minimum number of characters that must be present in the extracted key. Default value is 4.
key_value.gap_between_keys | Integer | NA | Any word found to the left or right of the key (depending on the location of the key with respect to the value) is considered part of the key depending on its distance from the key. Default value is 50. See the example below.

Example:

Consider an image that contains the following data:

Invoice Date: 05/02/2012    Invoice Number: 99888888

The following location order is specified in the property file:

LEFT; RIGHT; BOTTOM_LEFT; BOTTOM_RIGHT; TOP; BOTTOM; TOP_RIGHT;

If 99888888 is the value of the Invoice Number document level field, “Number” is extracted first as the key. The algorithm then looks to the left of “Number”: if the gap between “Invoice” and “Number” is less than the value specified for key_value.gap_between_keys, “Invoice Number” is used for key pattern generation; otherwise only “Number” is considered.
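
The gap check in this example can be sketched in Python as below. The `Word` type, `build_key` function and coordinates are hypothetical, the gap is assumed to be measured between word edges in the units of the HOCR coordinates, and only the single neighbouring word on the left is considered, as in the example.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """A word with its horizontal extent (hypothetical coordinates)."""
    text: str
    x0: int  # left edge
    x1: int  # right edge


def build_key(words: List[Word], key_index: int, max_gap: int = 50) -> str:
    """Prepend the left neighbour to the key if it is closer than max_gap."""
    key = words[key_index]
    if key_index > 0:
        neighbour = words[key_index - 1]
        gap = key.x0 - neighbour.x1
        if gap < max_gap:
            return f"{neighbour.text} {key.text}"
    return key.text


line = [Word("Invoice", 400, 470), Word("Number:", 480, 560), Word("99888888", 580, 680)]
print(build_key(line, key_index=1))
```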

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-regex.properties

This property file contains regular expressions that can be used for key pattern generation.

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-value-regex.properties

This property file contains regular expressions that can be used for value pattern generation.

UI Configuration

Key value learning can be turned ON/OFF from the following UI:

BatchClassManagement KeyValueLearningPlugin.jpg

 

Configurable property | Type of value | Value options | Description
Key Value Learning Switch | List | ON, OFF | Set it to ON or OFF depending on whether the plugin needs to be executed.

Dependencies

Key value learning plugin depends on following two plugins:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON for key value learning, because these plugins extract data from the image and create the HOCR file that key value learning requires.

Frequently Asked Questions

Question: Key value field is not added to the document level field after plugin execution.

Answer: There can be several reasons why a key value field is not created after plugin execution:

Reason 1: The maximum number of key value fields has already been added to the document level field.

Solution: Check the value of key_value.max_number_record. The default value is 50.

Reason 2: The key found during extraction has fewer characters than the minimum number of characters required for a key.

Solution: Check the key_value.min_key_char_count property. Its default value is 4.

Reason 3: The key value location order property is not defined.

Solution: Check the value of the key_value.location_order property. It should contain the required locations.

Question: Key value field is added but is not accurate.

Reason: One possible reason is that the location order specified does not match the requirement.

Solution: Check the key_value.location_order property. The most probable location of the key with respect to the value should be specified first in the list.

 

Key Value Extraction plugin

Overview

‘Key-value pair’ based extraction plug-in will be responsible for extracting document level index field values based on relative location of ‘value’ against a specified key. There are two modes for KV extraction: Simple and Advanced KV Extraction.

Plugin working

Input and output parameters

Input

  • Document Pages and corresponding HOCR
  • Document level fields
  • Plug-in Configuration
    • Key (Regular Expression)
    • Value (Regular Expression)
    • Location (left, right, top, bottom, top left, top right, bottom left, bottom right)

Output

Document level fields, values and alternative values updated in batch.xml.

Steps of execution

Plug-in execution for a batch instance will consist of following steps:

  • The extraction plug-in iterates over all documents belonging to a batch instance and, for every document, fetches the list of document level index fields based on its ‘document type’. Every document level field is associated with multiple instances of extraction filters.
  • The pages (the HOCR corresponding to each page) belonging to that document are parsed to generate an in-memory matrix holding every word and its coordinates, one matrix per page (the exact structure of the matrix is left to detailed design). The matrix is generated to improve performance: it is built once for all pages of a document and reused for key / value pattern matching for all document level fields belonging to that document.
  • For every document level field, the regular expression for ‘KEY’ is searched in the page level in-memory matrix created in the previous step.
  • If the regular expression search for ‘KEY’ returns one or more matched words, the regular expression for ‘VALUE’ is evaluated against the words located at the specified relative location to the key (as governed by the LOCATION attribute of the extraction filter or by the value zone created on the advanced KV screen). This is done for every occurrence of KEY in every page level matrix.
  • Zero or more matches found for the VALUE regular expression are used to update batch.xml as the document level field value (and alternate values).

Simple KV Extraction

Here both ‘key’ and ‘value’ are regular expressions.

Each key value field consists of following attributes in simple KV extraction:

  • Key Pattern: Regular expression pattern for the key.
  • Value Pattern: Regular expression pattern for the value.
  • Location: Specifies the location of value with respect to key. Possible values are left, right, top, bottom, top left, top right, bottom left, bottom right.
  • No of words: Specifies number of words that will be extracted to the right of value that is extracted by the value regular expression.

Example: Suppose there is document level field Date, and image contains following data:

Date: 01/01/2012

While defining the simple key value field for Date,

  • Date should be entered as key pattern.
  • [0-9]{2}/[0-9]{2}/[0-9]{2,4} should be entered as value pattern.
  • Location should be entered as right.
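
The Date example can be approximated with a short Python sketch. It is only an illustration: the function name `extract_right` is hypothetical, the page is reduced to a list of words on a single text line, and the location “right” is approximated as “any word after the key on that line” rather than by real HOCR coordinates.

```python
import re
from typing import List, Optional


def extract_right(words: List[str], key_pattern: str, value_pattern: str) -> Optional[str]:
    """Return the first value match found to the right of a word matching the key."""
    key_re = re.compile(key_pattern)
    value_re = re.compile(value_pattern)
    for i, word in enumerate(words):
        if not key_re.search(word):
            continue
        # Key found: test every word to its right against the value pattern.
        for candidate in words[i + 1:]:
            match = value_re.search(candidate)
            if match:
                return match.group(0)
    return None


words = "Date: 01/01/2012".split()
print(extract_right(words, r"Date", r"[0-9]{2}/[0-9]{2}/[0-9]{2,4}"))
```

When the key pattern does not match any word, the function returns `None`, which mirrors the case where no value is written for the field.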

Advanced KV Extraction

The admin user can also define KV pair patterns using rectangular coordinates from the Admin UI. ‘Advanced Add’ and ‘Advanced Edit’ buttons are provided to define and modify the KV patterns.
As soon as the user clicks either of these buttons, another UI opens with text boxes and labels for the following options:
  • Key Pattern (regex or other pre-defined field)
  • Value Pattern (regex)
  • Multiplier (0 to 1; multiplied with confidence score value to calculate new confidence score)
  • Fetch Value (First, Last or All)
  • Page Value (First, Last or All)
  • Length of the rectangle (in pixels)
  • Width of the rectangle (in pixels)
  • x-offset (in pixels)
  • y-offset (in pixels)

Of the above properties, the Key Pattern, Value Pattern, Multiplier (0 to 1), Fetch Value and Page Value are defined by the user, whereas the length of the rectangle, width of the rectangle, x-offset and y-offset are auto-generated.
Capture Key and Capture Value buttons are also provided to define the relative key and value coordinates respectively.
Page Value: User can specify one of the following page values while defining an advanced key value pair:

  • ALL: KV Extraction will be performed on all pages of the document.
  • FIRST: KV Extraction will be performed on first page of the document.
  • LAST: KV Extraction will be performed on last page of the document.

Fetch Value: User can specify following fetch value while defining advanced key value pair:

  • First: to extract only first data from the value zone matching the value pattern specified.
  • Last: to extract only last data from the value zone matching the value pattern specified.
  • All: to extract all data from the value zone matching the value pattern specified.
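
The three fetch values can be sketched as a selection over regex matches. This is a minimal sketch, assuming the text of the value zone is already available as a string; the function name `fetch_values` and the sample zone text are hypothetical.

```python
import re
from typing import List


def fetch_values(zone_text: str, value_pattern: str, fetch: str = "FIRST") -> List[str]:
    """Select matches of value_pattern in zone_text according to the fetch value."""
    if fetch not in ("FIRST", "LAST", "ALL"):
        raise ValueError(f"Unknown fetch value: {fetch}")
    matches = [m.group(0) for m in re.finditer(value_pattern, zone_text)]
    if fetch == "FIRST":
        return matches[:1]   # only the first match (empty list if none)
    if fetch == "LAST":
        return matches[-1:]  # only the last match (empty list if none)
    return matches           # ALL: every match in order


zone = "10.00 25.50 99.99"
for mode in ("FIRST", "LAST", "ALL"):
    print(mode, fetch_values(zone, r"\d+\.\d{2}", mode))
```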

Capturing Key and Value

Using the browse button, the user can upload the image on which the coordinates of key and value are to be defined.

Table KVExtraction.jpg

The overlay for key and value is captured using the “Capture Key” and “Capture Value” buttons.

On the basis of the relative key and value coordinates, the document level field is extracted by the KV extraction plugin.

Anchor Key Value

This functionality is added as an enhancement to existing advanced KV extraction. It aims to utilize the result of previously extracted document level fields for extraction of other document level fields. User can use previously defined field as a key while defining advanced key value field for some other document level field.

  • There is a “Use Existing Field For Key” checkbox present on advanced KV extraction UI.

UseExistingFieldForKey.jpg

  • On checking this, a list will be populated with the names of document level fields that can be used as a key.

DLFUsedAsKey.jpg

User can select any of those fields as key.

Note: Only those document level fields will be shown in drop down whose field order number is less than the field order number of the field for which key value pair is being defined.

  • While defining the advanced key value pair for the document level field, user needs to capture key and value rectangles.
  • If “Use Existing Field For Key” check box is selected, value of the field selected as key should be captured. This is required to calculate the X-Offset and Y-Offset for the KV field.

Example: Suppose there are two document level fields State and City, and image contains following data:

State: CALIFORNIA

City: LA

While defining the advanced key value field for City,

  • Use existing field for key should be checked.
  • State should be selected from the drop down for key pattern.
  • CALIFORNIA should be captured as key.
  • LA should be captured as a value.
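
One way to picture how the captured rectangles could lead to X-Offset and Y-Offset is sketched below. The document does not define how the offsets are computed, so everything here is an assumption made for illustration: the `Rect` type, both function names, all coordinates, and the choice to measure offsets between the top-left corners of the key and value rectangles. The “length” and “width” of the rectangle are represented here as `width` and `height`.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical rectangle: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def offsets(key: Rect, value: Rect) -> Tuple[int, int]:
    """Assumed definition: distance from the key's top-left to the value's top-left."""
    return value.x - key.x, value.y - key.y


def value_zone(found_key: Rect, x_offset: int, y_offset: int, width: int, height: int) -> Rect:
    """Place the value zone relative to wherever the key was found on a new page."""
    return Rect(found_key.x + x_offset, found_key.y + y_offset, width, height)


# Captured on the sample image: CALIFORNIA (value of State, used as key) and LA.
captured_key = Rect(200, 100, 180, 30)
captured_value = Rect(180, 160, 40, 30)
dx, dy = offsets(captured_key, captured_value)

# On another page the State value is found elsewhere; the zone moves with it.
zone = value_zone(Rect(220, 300, 180, 30), dx, dy, captured_value.width, captured_value.height)
print(dx, dy, zone)
```

The point of the sketch is only that the value zone is stored relative to the key, so it follows the key when the key is found at a different position.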

Editing Overlays in Advanced KV Extraction:

Functionality to edit key and value overlays on the Advanced KV Extraction Screen is also there.

Once the key has been captured using the Capture Key button, the Edit Key button gets enabled. Similarly, once the value has been captured using the Capture Value button, the Edit Value button gets enabled.

Once “Edit Key” or “Edit Value” has been clicked, all the other options become disabled on the screen.

While editing overlays for key and value, only one side of the rectangle forming the overlay becomes free for editing. Hence, there are four sides (of the rectangle) that can now be edited. To edit any side, the user now needs to click closest to that side and in the area formed by the parallel lines formed by extending its adjacent sides.

The following snapshots explain a use case where a user intends to edit the right hand side of the overlay formed for the key:

The following snapshot shows a captured Key and Value pair:

KVExtraction CapturedKey.jpg

To edit the Key overlay the user will click on the “Edit Key” button and the screen will appear as shown in the following snapshot:

KVExtraction EditKey.jpg

User can now click on any side of the key to adjust its size. Similarly, value rectangle zone can be adjusted.

 

Configuration

The following properties are configurable for KV extraction:

Configurable property | Type of value | Value options | Description
Regex Confidence Score | String | 0 to 100 | Regex confidence score for key value extraction
KV Extraction switch | Multi select | ON, OFF | KV extraction switch

KVExtractionSwitch.jpg

Simple KV Extraction

Admin can configure the simple KV extraction rule by clicking Add or Edit from following UI:

KeyValueFieldsListingAddOrEdit.jpg

The following properties are configurable for simple KV extraction:

Configurable property | Type of value | Value options | Description
Key Pattern | String | NA | Regular expression pattern for the key
Value Pattern | String | NA | Regular expression pattern for the value
Location | List | left, right, top, bottom, top left, top right, bottom left, bottom right | Specifies the location of the value with respect to the key.
No of words | Integer | Integer value | Specifies the number of words that will be extracted to the right of the value extracted by the value regular expression.

As soon as the Add or Edit button is clicked, the following screen is shown, where the user enters values for the different fields:

KVExtractionConfiguration.jpg

  • Key Pattern: Enter regular expression in text box.
  • Value Pattern: Enter regular expression in text box.
  • Location: Select location from drop down.
  • No of words: Enter an integer value in the text box. (Default value is 0)

Advanced KV Extraction

The following properties are configurable for advanced KV extraction:

Configurable property | Type of value | Value options | Description
Use existing Field For Key | Checkbox | NA | Enable to use the value of another defined field as the key
Key Pattern | String | NA | Regular expression pattern for the key
Value Pattern | String | NA | Regular expression pattern for the value
Multiplier | Decimal | 0 to 1 | Non-mandatory field that can have values between 0 and 1. Its value is multiplied with the confidence score to calculate the new confidence score during extraction.
Fetch Value | String | ALL, FIRST, LAST | Drop down with the following possible values: ALL, FIRST, LAST. Default value: FIRST
Page Value | String | ALL, FIRST, LAST | Drop down with the following possible values: ALL, FIRST, LAST. Default value: FIRST

Advanced KV extraction field is configurable from following UI:

KVExtractionConfigurationFromUI.jpg

To capture key and value, draw a rectangle on the image using the right mouse button. An overlay will be drawn on the UI. After drawing a rectangle, the user needs to click the Capture Key/Capture Value button.

Please note that the key must be captured before the value. If the user attempts to capture the value before capturing the key, the following message will be displayed:

Key not finalized. Finalize key first.

As soon as user captures both key and value, following fields will be populated automatically:

  • Length
  • Width
  • X-Offset
  • Y-Offset

Dependencies

Either one of the following must be on for KV extraction:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

Above specified plugins generate the HOCR content for an image which is used by KV extraction for extraction.

FAQs

Question: Data not extracted or incorrect data extracted for the field for which existing field is being used as key.

Answer: Check for value extracted for the field which is used as a key for this field. If incorrect value is extracted, correct the key value pair defined for that field. This can be tested via Test Adv. KV button on the Advanced KV screen.

 

Recostar Extraction plugin

Overview

The Recostar extraction plugin by default is a part of the extraction module. This plugin extracts the data for the document level fields for the particular document classified in the document assembler plugin.

Using this plugin, document level fields are populated by reading the XML file that the Recostar tool generates from the RSP project file.

The RSP file has the following format:

RSPFileFormat.jpg

  • User should map the document level fields in the RSP file at the place marked with an oval in the above screenshot.
  • User can find further details for creating the RSP file for extraction in “Recostar Design Studio and Fixed Form documentation”.

Steps of execution

    • This plug-in runs in the extraction phase of the application, once document classification on the batch has been completed.
    • This plugin extracts the document level field data from the image using the Recostar tool.
    • This plugin uses the RSP file present at <Ephesoft Shared Folder>\{Batch Class}\recostar-extraction\*.rsp; if no such file is present, the RSP file in {Application}\native\RecostarPlugin\bin\*.rsp is used.
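
The fallback between the two RSP locations in the last step can be sketched as a simple lookup. This is an illustrative sketch only: the function name `resolve_rsp` and its parameters are hypothetical, and the folder layout is taken from the paths quoted above.

```python
from pathlib import Path
from typing import Optional


def resolve_rsp(shared_folder: Path, batch_class: str, app_home: Path, file_name: str) -> Optional[Path]:
    """Return the batch class RSP file if it exists, else the copy in the bin folder."""
    candidates = [
        # <Ephesoft Shared Folder>\{Batch Class}\recostar-extraction\<file>.rsp
        shared_folder / batch_class / "recostar-extraction" / file_name,
        # {Application}\native\RecostarPlugin\bin\<file>.rsp
        app_home / "native" / "RecostarPlugin" / "bin" / file_name,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # neither location contains the project file
```

A call such as `resolve_rsp(Path("SharedFolders"), "BC1", Path("Application"), "Fpr.rsp")` would return the batch class copy when it exists and the bin folder copy otherwise; the folder names in this call are placeholders.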

Configuration

Configurable Properties

Following are the configurable properties available for the Recostar Extraction plugin:

Configurable property | Type of value | Value options | Description
Recostar color switch | List of values | ON, OFF | If the color switch is ON, the PNG file will be used for OCRing.
Recostar Auto Rotate switch | List of values | ON, OFF | Auto-rotates the input images on the basis of the orientation provided by Recostar.
Recostar Extraction Switch | List of values | ON, OFF | Turns this plugin ON/OFF.

Dependency

Apart from the above mentioned properties, there is a major configuration associated with this plugin. Recostar extracts values depending on the project file being used. Hence the project file is the important file for this plugin.

Since the project file maps document level fields with appropriate values (or patterns or barcodes), for extraction, it is purely document type specific. Hence instead of specifying the project file name at the plugin level, one needs to specify the project file name for each document type.

This mapping of each document type with the project file is provided in the BatchClassList>>BatchClass>>DocumentTypes on the Batch Class Management screen. Any “.rsp” file inside the “recostar-extraction” folder inside the batch class folder in shared folders appears in the dropdown and one can select the appropriate project file (.rsp file) in the following property: ‘Form Processing Project File’(See below):

DocumentType.jpg

This plugin only requires an image as an input (which is a PNG if color switch is ON and a TIFF if color switch is OFF). Hence one would require one of the plugins from: ‘Create OCR Input Plugin’/ ‘Create Display Image Plugin’ to run before it.

However, it is important that the document tag has been created in the batch.xml and also the document type has been selected appropriately for the batch. Hence, one should ideally place this plugin after page processing and document classification plugins are done with their processing and the manual Review stage has been crossed.

Dependency on shared folders

The batch class folder inside the main shared folder contains a folder by the name: ‘recostar-extraction’

This folder contains the project files from which a user can map the document type (for Recostar extraction).

Troubleshooting

S no. | Error message | Possible root cause
1 | Invalid License. So could not be verified. | Network connection failure; Recostar command is not valid; license is not installed or invalid; Tomcat server is not started.
2 | Problem in verifying License | Unable to connect with Ephesoft license server or some error occurred at Ephesoft license server side.
3 | Unable to load Fpr.rsp file | RSP file used for processing is invalid.
4 | Exception while reading from XML | Unable to process batch xml file or batch xml is invalid.
5 | Image Processing or XML updating failed | Unable to update batch xml.
6 | File has invalid extension | File processed by Recostar has invalid extension.
7 | Document Type could not be found for Page | Invalid document being used for processing.
8 | Unable to parse Orientation tag in Recostar xml file. | Recostar xml file has invalid value for Orientation tag.
9 | Unable to rotate the file:according to the values specified in its xml | Recostar xml file has invalid value for rotation.

Recostar HOCR Plugin

Overview

The Recostar HOCR plugin by default is the part of page processing module of Ephesoft application. This plugin uses Recostar for generating HOCR files. This plugin reads the image files listed in the batch xml of a batch instance and generates HOCR file for each one of them.

Barcode values can be decoded with this plugin using the barcode enabled project file.

Steps of execution

  • This plug-in works in the page processing phase of the application when all the import processing on the batch has been done.
  • This plugin extracts the contents of the image using Recostar tool.
  • This plugin uses the RSP file present at <Ephesoft Shared Folder>\{Batch Class}\recostar-extraction\*.rsp; if no such file is present, the RSP file in {Application}\native\RecostarPlugin\bin\*.rsp is used.
  • If barcode switch is ON, then the RSP file should be barcode enabled.

Configuration

Configurable Properties

Following are the list of configurable properties for the plugin:

BatchClassManagement RecostarHOCRPlugin.jpg

 

Configurable property | Type of value | Value options | Description
Recostar Project File Name | List of values | Fpr.rsp, Fpr_MutliLanguage.rsp | Specifies the project file used for OCRing.
Recostar color switch | List of values | ON, OFF | If the color switch is ON, the PNG file will be used for OCRing.
Recostar Auto Rotate switch | List of values | ON, OFF | Auto-rotates the input images on the basis of the orientation provided by Recostar.
Recostar Switch | List of values | ON, OFF | Turns this plugin ON/OFF.
Barcode Switch | List of values | ON, OFF | Reads barcodes from the input images using a barcode-enabled Recostar project file, e.g. “FPR_Barcode.rsp”.
Recostar Valid Extensions | List of values | tif, gif, png | Recostar accepts these three formats for OCRing. This property configures which image formats are allowed for OCRing in this plugin.

Dependency

This plugin only requires an image as an input (which is a PNG if color switch is ON and a TIFF if color switch is OFF). Hence one would require one of the plugins from: ‘Create OCR Input Plugin’/ ‘Create Display Image Plugin’ to run before it.

Dependency on shared folders

The batch class folder inside the main shared folder contains a folder named recostar-extraction. This folder contains the “Recostar Project file” specified by the first property. If the selected file does not exist there, the default file of the same name present inside Recostar will be used for Recostar OCRing.

 

Troubleshooting

Following are a few common error messages received when OCRing malfunctions:

S no. | Error message | Possible root cause
1 | Invalid License. So could not be verified. | Network connection failure; Recostar command is not valid; license is not installed or invalid; Tomcat server is not started.
2 | Problem in verifying License | Unable to connect with Ephesoft license server or some error occurred at Ephesoft license server side.
3 | Unable to load Fpr.rsp file | RSP file used for processing is invalid.
4 | Exception while reading from XML | Unable to process batch xml file or batch xml is invalid.
5 | No valid extensions are specified in resources | No valid extension is selected.
6 | Image Processing or XML updating failed | Unable to update batch xml.
7 | File has invalid extension | File processed by Recostar has invalid extension.
8 | Unable to parse Orientation tag in Recostar xml file. | Recostar xml file has invalid value for Orientation tag.
9 | Unable to rotate the file:according to the values specified in its xml | Recostar xml file has invalid value for rotation.

Regular Regex Extraction Plugin

Overview

This plug-in extracts a document level field’s value according to the given regex pattern. The user can specify the pattern as a semicolon-separated list of parts. While extracting data, the plugin splits the pattern at each semicolon and treats the last part as the value pattern. It first matches the last part; if a value is found, the remaining parts are searched for from right to left, to the left of the value found. The last part is compared as a regex pattern, while the other parts are compared as words. The value is extracted only when all parts are found; if any part is not found, the value is not extracted.

Example

Consider following value is specified for the pattern field of a document level field:

Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}

Plugin will use the last value in the semicolon-separated list, i.e., \d{1,2}[/]\d{1,2}[/]\d{2,4}, for value extraction.

Consider following data is supplied as input data, i.e., present in an image:

Case 1: Input Data: Invoice Date 21/03/2012

Result: This will extract 21/03/2012 successfully, as both Date and Invoice are found to the left of the extracted value 21/03/2012.

Case 2: Input Data: Date 21/03/2012

Result: The regex pattern will be matched in this case, but data won’t be extracted because Invoice is not found to the left of Date.
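
The two cases can be reproduced with a small Python sketch of the matching rule. It is an approximation under stated assumptions, not the plugin's code: the function name `extract` is hypothetical, the page is reduced to a list of whitespace-separated words, and the keyword parts are only required to appear somewhere to the left in the right order (whether the plugin also requires them to be adjacent is not stated in this document).

```python
import re
from typing import List, Optional


def extract(pattern_field: str, words: List[str]) -> Optional[str]:
    """Match the last semicolon-separated part as a regex, the rest as words to its left."""
    *keywords, value_pattern = pattern_field.split(";")
    value_re = re.compile(value_pattern)
    for i, word in enumerate(words):
        if not value_re.fullmatch(word):
            continue
        pos = i
        # Walk the keyword parts from right to left, each further left than the last.
        for keyword in reversed(keywords):
            found = None
            for j in range(pos - 1, -1, -1):
                if words[j] == keyword:
                    found = j
                    break
            if found is None:
                break  # a part is missing, so this value is rejected
            pos = found
        else:
            return word  # every part was found
    return None


pattern = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"
print(extract(pattern, "Invoice Date 21/03/2012".split()))  # Case 1
print(extract(pattern, "Date 21/03/2012".split()))          # Case 2
```

Case 1 returns the date, and Case 2 returns `None` because “Invoice” is missing to the left of “Date”.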

Configuration

Plugin Configurations

Regular regex extraction can be configured at following UI:

BatchClassManagement RegularRegexExtractionPlugin.jpg

Properties description:

 

Configurable property | Type of value | Value options | Description
Regular Regex Extraction Switch | String | ON, OFF | The switch that decides whether the plug-in runs. Default: ON.
Regular Regex Confidence Score | Integer | 0 to 100 | Acts as a multiplier for the confidence score calculated by matching regex.

To add or edit the regular expression required for Regular Regex Extraction, the user needs to add or edit the corresponding document level field from the following UI:

AddOrEditDLF.jpg

Upon Adding/Editing the document level field, following screen will be presented where regular expression can be entered in Pattern field:
AddOrEditDLF PatternField.jpg

Troubleshooting

Following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Invalid input pattern sequence. | This occurs if the entered regex pattern is not a valid pattern or is not in the proper format.
2 | No FieldType data found from data base for document type | This happens when there are no field types initialized in a document.

Scripting Plugin

Overview

This plug-in reads the batch’s batch.xml file and works upon the given document as per the scripts given in the scripts folder. All the scripts are placed inside “<Ephesoft shared folder>\batch class folder (ex:-BC1)\scripts”.

For any script there are two ways to write it, either it could be written in “IScript” or “JDOM”. For running any type of script user needs to place the script inside the “scripts” folder inside batch class folder for the respective batch class on which the script needs to be run.

Configuration

Configurable properties

Following are the configurable properties available for the Scripting plugin in the dcma-scripting-plugin properties file in META-INF\dcma-scripting-plugin:

Configurable property | Type of value | Value options | Description
Script Parser Type | String | jdom, iscript | Defines the type of scripts that will run. Two types of scripts can be run: JDOM and ISCRIPT. To run JDOM scripts, set the parser type to “jdom”; to run ISCRIPT scripts, set it to “iscript”. Default: jdom.
Script Switch | String | ON, OFF | Turns the execution of scripts on or off. If this switch is OFF, no script will run; otherwise scripts will run. Default: ON.

This is shown in the screen shot given below:

BatchClassManagement ScriptingPluginConfiguration.jpg

Steps for configuring the plugin

  • User can set the script switch to on/off for running the scripts and for skipping the execution of scripts respectively.
  • If the script switch is on then the parser type mentioned in the “Script Parser Type” property defines the type of scripts given.
  • If the parser type is jdom then the JDOM scripts will run and if any script is present that runs for ISCRIPT then it will give errors and vice versa.

Steps of execution

  • Configure the plugin switch in the configuration file META-INF/dcma-scripting-plugin/dcma-scripting-plugin.properties. Also specify the parser type for the scripts to run.
  • Place the desired script in the scripts folder of the batch class in which the script should run. Predefined dummy scripts are present in the scripts folder of each batch class.
  • Scripts are picked up by name, following a set naming format, so the names of the scripts must stay the same as those in the scripts folder. To run a custom script, the user either modifies the existing script or creates a custom script with the same name as the predefined script and replaces the existing one.

Dependency

There is only one dependency of this plug-in. The “import-batch-folder” plug-in needs to be executed before “scripting-plugin” to generate the files required for processing of “scripting-plugin”. If the batch goes into “Error” state then proper logs will be generated in log file kept at {Application}\dcma-all.log.

NOTE: There are some scripts placed in the “scripts” folder which are required for the system.

 

Troubleshooting

Following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Script having invalid parser type or invalid arguments. Throwing workflow in error | This occurs if the configured parser type and the type of the script present do not match. For example, if the parser given is jdom and the user places an iscript script, this error occurs.
2 | Script error out. Throwing workflow in error. | This happens when the custom script that has been placed fails during execution and needs to be corrected.

Search Classification Plugin

Overview

This plugin is responsible for classifying Ephesoft documents using Lucene-based indexing for the batch class.

The plugin works in two stages to classify a document:

  • Learning: Indexes are generated for documents. The generated indexes are then used to classify documents. For further information on learning, refer to the document “Learning document”.
  • Classification: The learnt data is used as reference data. To classify a document, the plugin takes the HOCR content extracted from the image and compares it with the data learnt in the previous stage.

Before this plugin runs, HOCR content must be generated by an HOCR generation plugin such as “Recostar HOCR” or “Tesseract HOCR”.

Configuration

Configurable Properties

Following are the list of configurable properties for the plugin:

 

Configurable property | Type of value | Value options | Description
Lucene Valid Extensions | String | Ex: html, xml | Valid extensions of the input files used for classifying the document type. Default: html, xml.
Lucene Min Term Frequency | Integer | NA | The frequency below which terms will be ignored in the source document.
Lucene Min Document Frequency | Integer | NA | Words that do not occur in at least this many documents will be ignored.
Lucene Min Word Length | Integer | NA | The minimum word length; shorter words in the HOCR content will be ignored.
Lucene Min Query Terms | Integer | NA | The minimum number of query terms that will be included in any generated query.
Lucene Top Level Field | String | NA | Configures the default field for query terms.
Lucene No Of Pages | Integer | NA | Specifies the number of documents to be returned by a query search.
Lucene Index Fields | String | Ex: summary | The index field used for searching the document type with Lucene.
Lucene Stop Words | String | Ex: name; title | Words to be ignored during document classification.
Search Classification Switch | String | ON, OFF | Turns the search classification plugin ON/OFF. Default: ON.
Search Classification Max Results | Integer | NA | The maximum number of results generated from a query.
First Page Confidence Score Value | Integer | NA | Used for updating the confidence score on the basis of the first page type.
Middle Page Confidence Score Value | Integer | NA | Used for updating the confidence score on the basis of the middle page type.
Last Page Confidence Score Value | Integer | NA | Used for updating the confidence score on the basis of the last page type.

This is shown in the screen shot given below:

BatchClassManagement SearchClassificationPlugin.jpg
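
To make the effect of the term-related properties more concrete, the following Python sketch filters the words of a page the way “Lucene Min Term Frequency”, “Lucene Min Word Length” and “Lucene Stop Words” are described above. It is plain Python and does not call Lucene; the function name `select_query_terms`, its default values and the sample text are hypothetical, and the real plugin may tokenize and score terms differently.

```python
import re
from collections import Counter
from typing import Iterable, List


def select_query_terms(
    text: str,
    min_term_freq: int = 1,
    min_word_length: int = 3,
    stop_words: Iterable[str] = (),
) -> List[str]:
    """Keep terms that are frequent enough, long enough and not stop words."""
    stop = {w.lower() for w in stop_words}
    counts = Counter(re.findall(r"[A-Za-z0-9]+", text.lower()))
    return sorted(
        term
        for term, freq in counts.items()
        if freq >= min_term_freq and len(term) >= min_word_length and term not in stop
    )


page_text = "Invoice invoice total name of total invoice"
print(select_query_terms(page_text, min_term_freq=2, stop_words=["name", "title"]))
```

In this sample, “of” is dropped by the word length limit and “name” by both the stop word list and the frequency limit, leaving “invoice” and “total” as candidate query terms.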

Steps of execution

  • This plug-in runs in the page processing phase of the application, once all import processing on the batch is done and the batch is ready for page processing.
  • Learning should be done on the batch class before using this plugin.
  • The plug-in classifies the input images via Lucene-based indexing.
  • After classification is done, it writes the information for the document type being classified into the batch.xml file.

Dependency

This plugin depends on an HOCR generation plugin such as Recostar or Tesseract. It takes the HOCR file generated by Recostar or Tesseract as its input.

Troubleshooting

Following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No index files exist inside folder | Learning is not done for the batch class.
2 | Page Types not configured in Database. | Invalid indexes present in the index data for the batch class.
3 | CorruptIndexException while reading Index. | Index data in the index folder for the batch class is corrupted.
4 | IOException while reading Index | The index data cannot be opened because an index file is corrupted or locked.
5 | No valid extensions are specified in resources | Page contains invalid HOCR file for processing.
6 | No pages found in batch XML. | Pages tag not found in the input batch.xml.

Table Extraction Plugin

Overview

This plug-in extracts tabular data from the documents in a batch.

Table extraction can be performed using any one, or a combination (AND/OR), of the following extraction techniques:

  • Column Header Validation
  • Column Coordinates Validation
  • Regex Validation

The user can select which of the above extraction techniques are to be used for table extraction.

The user needs to specify a start pattern and an end pattern for the table. Data between the start and end patterns is treated as table data. If no data matches the specified start pattern, nothing is extracted.
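The start/end pattern rule can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the function name `slice_table_lines` is hypothetical, and it assumes that the lines matching the start and end patterns are themselves excluded and that, if the end pattern never matches, everything after the start match is kept.

```python
import re

def slice_table_lines(lines, start_pattern, end_pattern):
    """Return the lines between the first start match and the next end match."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    inside = False
    table_lines = []
    for line in lines:
        if not inside:
            # Nothing is collected until the start pattern has been seen.
            if start_re.search(line):
                inside = True
            continue
        if end_re.search(line):
            break
        table_lines.append(line)
    return table_lines

page = [
    "Invoice 1001",
    "Item Qty Price",
    "Pen 2 3.00",
    "Ink 1 7.50",
    "Total 13.50",
    "Thank you",
]
print(slice_table_lines(page, r"^Item\b", r"^Total\b"))
```

If the start pattern matches nothing, the function returns an empty list, which mirrors the behaviour described above.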

Characteristics

  • Every document has one or more pages, and the algorithm extracts all the tables present in the document.
  • The document is parsed from the first page to the last page to identify tables.
  • One table may span one or more pages.
  • The user provides a start pattern and an end pattern, which decide the data to be considered for table extraction.
  • Based on the table extraction API specified, extraction is done using one or more of the following extraction methods:
    • Column Coordinates Validation
    • Column Header Validation
    • Regex Validation

Column Header Based Extraction

To extract data using the column header, the admin needs to define the Column Header Pattern parameter for the table column.

The plugin first searches for data matching the regex pattern specified by the admin. If a match is found, all the data below that column header is extracted for that column.

 

Column Coordinates Based Extraction

This extraction method extracts data based on the column coordinates specified by the admin. Data below the column coordinates is extracted for that column.

For this type of extraction, the start and end coordinates for the column need to be specified. Data between the

Regex Based Extraction

In case of regex validation, data is extracted on the basis of the regex patterns defined for the column, i.e. Column Pattern, Between Right Pattern and Between Left Pattern. Data is extracted only between the table's start and end patterns.

  • Column Pattern: Data matching this pattern is extracted for the column.
  • Between Right Pattern: Data extracted by the column pattern must have data to its right that matches this pattern. The pattern specified must capture a single word only.
  • Between Left Pattern: Data extracted by the column pattern must have data to its immediate left that matches this pattern. The pattern specified must capture a single word only.

Note

  • If a between right or between left pattern is specified but does not match the immediate right or left data, the data is extracted as invalid data.
  • Only single-word capturing patterns are allowed for the between left and between right patterns.
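As an illustration of how the three column patterns interact, here is a small Python sketch. It is an assumption-laden model, not the plugin's code: the function `extract_cell` is hypothetical, a row is treated as a simple list of words, and "invalid data" is represented by a boolean flag.

```python
import re

def extract_cell(words, column_pattern, between_left=None, between_right=None):
    """Return (value, is_valid) for the first word matching column_pattern, else None."""
    for i, word in enumerate(words):
        if not re.fullmatch(column_pattern, word):
            continue
        # Neighbouring words; an empty string stands in for "no neighbour".
        left = words[i - 1] if i > 0 else ""
        right = words[i + 1] if i + 1 < len(words) else ""
        valid = True
        if between_left is not None and not re.fullmatch(between_left, left):
            valid = False
        if between_right is not None and not re.fullmatch(between_right, right):
            valid = False
        # The value is still returned when a neighbour check fails,
        # but it is flagged as invalid.
        return word, valid
    return None

row = ["Qty", "12", "pcs"]
print(extract_cell(row, r"\d+", between_left=r"Qty", between_right=r"pcs"))
print(extract_cell(row, r"\d+", between_right=r"kg"))
```

The second call shows the case described in the note: the column pattern matches, the between right pattern does not, and the value is reported as invalid rather than dropped.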

Configuration

Table Configuration

Add/Edit/Delete Table Info

The user can add, edit or delete table information by clicking the corresponding buttons on the following UI:

TableInfoListing AddOrEditOrDelete.jpg

On clicking the Add/Edit button, the following UI is presented, where the user can enter values for the properties:

TableInfoConfiguration.jpg

Configurable Properties

The following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Name | String | | Name for the data table involved.
Start Pattern | String | A valid expression | A keyword or expression marking the beginning of the table. A correct start pattern must be specified for table data to be extracted. It can be validated using the check button.
End Pattern | String | A valid expression | A keyword or expression marking the end of the table. It can be validated using the check button.
Table Extraction API | | | Decides which automatic extraction API/APIs are to be used.

Table Column Configuration

Add/Edit/Delete Table Column Info

Table column information can be added, updated or deleted by clicking the corresponding button on the following UI:

TablecolumnInfo AddOrEditOrDelete.jpg

  • On clicking the Add/Edit button, the following UI is presented, where the user can add or edit table column fields:

TablecolumnInfoConfiguration.jpg

 

Configurable Properties

The following is the list of configurable properties for a table column:

Configurable property | Type of value | Value options | Description
Column Name | String | NA | The name of the column.
Column Pattern | Regular Expression | Valid regular expression | The regex pattern for the column data.
Between Left | Regular Expression | Valid regular expression | The regex pattern used to validate the data to the left of the actual search column.
Between Right | Regular Expression | Valid regular expression | The regex pattern used to validate the data to the right of the actual search column.
Column Header Pattern | Regular Expression | Valid regular expression | Header pattern for the column.
Start Coordinate | Integer | NA | Start coordinate for the column.
End Coordinate | Integer | NA | End coordinate for the column.
Required | Radio button | True, False | If set to True, each extracted table row must contain some valid data for that column. If invalid data is extracted for the column, the corresponding row is not added to the table data.

Column Header Based Extraction

Enter the column header regex pattern in the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Column Info]>>Edit
ColumnHeaderBasedExtraction.jpg

A configurable property for table extraction using the column header is available in:

[Ephesoft-home]\WEB-INF\classes\META-INF\dcma-table-finder\*

tablefinder.gap_between_column_words=40

This value is specified in pixels. In addition to the words below the column header, words to the left or right are also extracted for the column if the gap between them and the already extracted data is less than the value specified for tablefinder.gap_between_column_words.
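The effect of the gap setting can be sketched in Python for a single row of words. This is only a model under stated assumptions: the `Word` class, its `x0`/`x1` pixel coordinates and the function `words_for_column` are hypothetical, and the real plugin may seed and grow the column differently.

```python
from dataclasses import dataclass

GAP_BETWEEN_COLUMN_WORDS = 40  # pixels, mirrors tablefinder.gap_between_column_words

@dataclass
class Word:
    text: str
    x0: int  # left edge in pixels (assumed)
    x1: int  # right edge in pixels (assumed)

def words_for_column(row, header_x0, header_x1, max_gap=GAP_BETWEEN_COLUMN_WORDS):
    """Collect the words of one row that belong to the column under a header."""
    # Seed with the words that horizontally overlap the header span.
    picked = [w for w in row if w.x0 < header_x1 and w.x1 > header_x0]
    if not picked:
        return []
    left = min(w.x0 for w in picked)
    right = max(w.x1 for w in picked)
    grew = True
    while grew:
        grew = False
        for w in row:
            if w in picked:
                continue
            if w.x0 >= right:
                gap = w.x0 - right   # word sits to the right of the data
            elif w.x1 <= left:
                gap = left - w.x1    # word sits to the left of the data
            else:
                continue
            if gap < max_gap:
                picked.append(w)
                left, right = min(left, w.x0), max(right, w.x1)
                grew = True
    return [w.text for w in sorted(picked, key=lambda w: w.x0)]

row = [Word("ACME", 100, 160), Word("Corp", 180, 230), Word("12", 400, 420)]
print(words_for_column(row, header_x0=90, header_x1=170))
```

With the default gap of 40 pixels, "Corp" joins "ACME" because it is 20 pixels away, while "12" stays out because it is 170 pixels away.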

Column Coordinates Based Extraction

The admin can set the column coordinates by clicking the Set Coordinates button at the following location:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Column Info]

ColumnCoordinatesBasedExtraction.jpg
On clicking the Set Coordinates button, a new UI opens where the user can select an image and choose the column coordinates by drawing a zone on the image.
SetCoordinatesUI.jpg

  • The user needs to draw a rectangle to select the start and end column coordinates for the selected column.
  • To select coordinates for another column, select that column from the drop-down list on the left-hand side. This drop-down contains the names of all the table columns of the selected table.
  • Clear Button: On clicking the Clear button, the coordinates for the selected table column are cleared.
  • Clear All Button: On clicking the Clear All button, the coordinates for all the table columns of the selected table are cleared.
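Once start and end coordinates exist for each column, assigning words to columns is a range check. The sketch below is illustrative only: `assign_to_columns` is a hypothetical helper, words are modelled as `(text, x0, x1)` tuples in pixels, and a word is assumed to belong to a column only when it lies entirely between that column's start and end coordinates.

```python
def assign_to_columns(words, columns):
    """Map each word of a row to the column whose coordinate range contains it.

    words:   list of (text, x0, x1) tuples
    columns: dict of column name -> (start_coordinate, end_coordinate)
    """
    row = {name: [] for name in columns}
    for text, x0, x1 in words:
        for name, (start, end) in columns.items():
            if start <= x0 and x1 <= end:
                row[name].append(text)
                break  # a word is assigned to at most one column
    return {name: " ".join(parts) for name, parts in row.items()}

columns = {"Item": (0, 200), "Qty": (200, 300)}
words = [("Blue", 10, 60), ("Pen", 70, 110), ("2", 220, 235)]
print(assign_to_columns(words, columns))
```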

Regex Based Extraction

For regex based extraction, the user needs to enter valid regex patterns for the table and its columns. The table must have valid start and end patterns, whereas the column pattern, between left pattern and between right pattern need to be specified for each table column.

Select table extraction technique to be used

Select any of the three extraction techniques, with AND/OR between them, as shown below:

[Batch Class List]>>Edit Batch Class>>Edit Document Type>>Edit Table
TableExtractionTechnique.jpg

Dependencies

The Table Extraction plugin has the following dependencies:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON for table extraction, as these plugins extract data from the image and create the HOCR file that table extraction requires.

Troubleshooting

The following are a few common areas for troubleshooting the Table Extraction plugin:

S no. | Error message | Possible root cause
1 | Table info list is null or empty. | No table is configured for the document type.
2 | TableColumnsInfo list is null or empty. | No table column is defined for the table.
3 | Invalid input pattern sequence. | Patterns defined for table extraction are not valid.
4 | Skipping Table extraction. Switch set as off. | Table extraction switch is set to OFF.

Tesseract HOCR Plugin

Overview

The Tesseract HOCR plugin is, by default, part of page processing.

This plugin reads the image files listed in the batch.xml of a batch, generates an HOCR file for each of them and updates the batch.xml.

Configuration

Configurable Properties

The following is the list of properties of the Tesseract HOCR plugin that are configurable from the UI:

BatchClassManagement TesseractHOCRPlugin.jpg

 

Configurable property | Type of value | Value options | Description
Tesseract Switch | List of values | ON, OFF | Turns this plugin ON or OFF. If the switch is OFF, the plugin does nothing.
Tesseract color switch | List of values | ON, OFF | Tesseract is unable to read colored TIFFs. When the color switch is ON, the plugin therefore sends PNGs for OCRing instead. Switching the color switch ON helps for batch classes where colored TIFF images are expected.
Tesseract Language | String | NA | The language to be used for OCRing. At present Tesseract supports only a single language per image file. E.g. specify ‘eng’ for English, ‘tur’ for Turkish.
Tesseract Version | String | NA | The Tesseract version installed on the system. E.g. specify ‘tesseract_version_3’ for Tesseract 3.0, ‘tesseract_version_2’ for Tesseract 2.0.
Tesseract Valid Extensions | Multi-select | tif, gif, png | The image file extensions that are accepted as input for OCRing.

Steps of execution

  • This plug-in runs in the Page Process phase of the application, after all import processing on the batch is complete and the batch is ready for page processing.
  • The plug-in performs OCR on all the input images.
  • When the work is done, it writes the name of each HOCR file into the batch.xml and generates the HOCR output in the form of html and HOCR.xml.
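For orientation, the sketch below builds a standard Tesseract command line that produces hOCR output (`tesseract <image> <output base> -l <language> hocr`). How Ephesoft actually invokes Tesseract is not stated in this document, so the function `build_tesseract_command` and the wording of its second error are assumptions; the space check and the extension check mirror the conditions listed under Troubleshooting.

```python
from pathlib import Path

VALID_EXTENSIONS = {"tif", "gif", "png"}  # mirrors 'Tesseract Valid Extensions'

def build_tesseract_command(image_path, language="eng"):
    """Validate an image name and return an hOCR command as an argument list."""
    path = Path(image_path)
    if " " in path.name:
        raise ValueError(f"Space found in the name of image: {path.name}")
    if path.suffix.lstrip(".").lower() not in VALID_EXTENSIONS:
        raise ValueError(f"Extension of {path.name} is not a valid extension")
    # Tesseract appends the output extension itself, so only the base is passed.
    output_base = str(path.with_suffix(""))
    return ["tesseract", str(path), output_base, "-l", language, "hocr"]

print(build_tesseract_command("BI1_0.png"))
```

The returned list can be handed to `subprocess.run` on a machine where Tesseract is installed; the sketch itself does not execute anything.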

Dependency

This plugin only requires an image as input (a PNG if the color switch is ON, a TIFF if the color switch is OFF). Hence one of the plugins ‘Create OCR Input Plugin’ / ‘Create Display Image Plugin’ must run before it.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Tesseract Base path not configured. | The environment variable for Tesseract is either not set or its path is configured incorrectly.
2 | Space found in the name of image: xyz.png. So it cannot be processed | The image name contains spaces. Remove the spaces from the image name and restart the batch from the Page Process module.
3 | No valid extensions are specified in resources | Valid extensions for input image files are not specified.
4 | Image Processing or XML updating failed for image: xyz | The input image file has an extension other than those specified in the property ‘Tesseract Valid Extensions’.

Create Display Image Plugin

Overview

This plugin creates the display PNG files for the images being processed. It takes all the images and creates a PNG file for each, to be shown on the UI. It uses ImageMagick to convert the files to PNG; these PNG files are also used for OCRing when the color switch is “ON”.

 

Configuration

Steps for configuring the plugin

  • The user can select the Page Process module and navigate to the Create Display Image plugin configuration page as shown below:

PDCreateDisplayImagePlugin.jpg

These are the configurations required for creating the display images. The properties are not editable, as the files created using them are required by subsequent plugins.

Configurable Properties

This plugin has no configurable properties either on UI or in META-INF.

 

Steps of execution

  • The plug-in uses the file extension type given in the plugin properties.
  • During execution, ImageMagick parameters are used to generate the display PNG thumbnail files and the tif files used for comparing and OCRing, if the color switch is ON.
  • These files are then copied to the batch instance folder and their respective entries are made in the batch.xml file.
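A conversion step of this kind can be pictured as an ImageMagick `convert` call. The Python sketch below only assembles such a command; the function name, the `_thumb.png` naming and the `200x150` size are hypothetical, and the parameters Ephesoft really passes to ImageMagick are not given in this document.

```python
from pathlib import Path

def build_display_png_command(tif_path, output_dir, size="200x150"):
    """Return an ImageMagick command that writes a PNG thumbnail of a TIFF page."""
    src = Path(tif_path)
    dst = Path(output_dir) / (src.stem + "_thumb.png")
    # -thumbnail resizes the image and strips profiles to keep the file small.
    return ["convert", str(src), "-thumbnail", size, str(dst)]

print(build_display_png_command("BI1_0.tif", "BI1"))
```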

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No valid extensions are specified in resources. | Corrupt values are present in the database for the extension types given in the configuration.
2 | Problem generating thumbnails. Setting batch Status to error state. | ImageMagick encountered an issue while converting files to the desired file types.

Create OCR input Plugin

Overview

This plugin generates PNG files corresponding to the input files, which may be single-page or multipage tiff files. These PNG files are used for further processing and OCRing. ImageMagick is used to convert the files to PNG.

 

Configuration

Steps for configuring the plugin

  • User can select Page Process module and navigate to Create OCR input plugin configuration page as shown below:

PDCreateOCRinputPlugin.jpg

The user cannot edit the above settings.

Configurable Properties

This plugin has no configurable properties either on UI or in META-INF.

Steps of execution

  • Plug-in uses tiff files as input.
  • While executing, ImageMagick parameters are used to generate the OCR display PNG thumbnail files and the tiff files used for comparing and OCRing.
  • These files are then copied to the batch instance folder and their respective entries are made into batch.xml file.

Dependency

The plugin assumes the incoming batch has been imported properly and batch.xml is created successfully.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Problem in generating PNG files. | An error occurred while generating the PNG files.
2 | Improper Folder Specified folder name-> | The batch instance folder name is incorrect or the folder does not exist. Make sure that the shared folder path is specified correctly.
3 | Problem generating list of files | The batch instance folder name or path is incorrect.
4 | command cannot be run | ImageMagick is not working or the ImageMagick configuration is not correct.

CSV File Creation Plugin

Overview

This plugin enables users to export the extracted metadata for a batch in CSV format. It captures the extracted document level fields, such as “subpoena”, for each output document, along with some batch-specific fields such as the date of processing and the document type name. The generated CSV file is exported to the location configured by the “CSV Creation Final Export Folder” property. A CSV file is created for the batch only if the switch is ON.

 

Configuration

Steps for configuring the plugin

  • User can select the Export module and navigate to CSV File Creation plugin configuration page as shown below:

PDCSVFileCreationPlugin.jpg

The user can edit the above settings by clicking “Edit” to change them as per their requirements.

Configurable Properties

Following are the configurable properties available for the CSV File Creation plugin:

Configurable property | Type of value | Value options | Description
CSV Creation Final Export Folder | String path to export folder | Ex: C:\ephesoft-data\csv-export-folder | Folder to which the created CSV file is exported.
CSV Creation Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default OFF.

 

Steps of execution

  • This plug-in runs in the export phase of the application, when all processing on the batch is complete and the batch is ready to be exported.
  • The plugin creates a CSV file if the plugin switch is “ON”; otherwise no CSV file is created.
  • The plugin uses the batch.xml file to create the CSV file for the batch. It exports the document level fields present in the batch.xml to the CSV file.
  • After creation, the file is copied to the configured “CSV Creation Final Export Folder”.
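The steps above can be sketched with the Python standard library. The batch.xml layout used here (`Document`, `Type`, `DocumentLevelField`, `Name`, `Value`) is an assumption made for the example, as are the function name and the CSV column order; the real file produced by the plugin may contain different columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

def batch_xml_to_csv(batch_xml_text):
    """Turn the document level fields of a batch.xml string into CSV text."""
    root = ET.fromstring(batch_xml_text)
    rows = []
    for doc in root.iter("Document"):
        row = {"Type": doc.findtext("Type", default="")}
        for field in doc.iter("DocumentLevelField"):
            row[field.findtext("Name", default="")] = field.findtext("Value", default="")
        rows.append(row)
    if not rows:
        raise ValueError("Batch Document List is null or empty.")
    # Build the header from every field name seen, in first-seen order.
    headers = []
    for row in rows:
        for key in row:
            if key not in headers:
                headers.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=headers, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

SAMPLE = """<Batch><Documents>
<Document><Type>Invoice</Type><DocumentLevelFields>
<DocumentLevelField><Name>InvoiceNo</Name><Value>1001</Value></DocumentLevelField>
</DocumentLevelFields></Document>
</Documents></Batch>"""
print(batch_xml_to_csv(SAMPLE))
```

Raising an error for an empty document list follows the second troubleshooting entry for this plugin.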

Dependency

The plugin assumes that extraction for the incoming batch has been done properly; it only converts the results in the provided batch.xml into the desired format.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | CSV File Creation Export Folder value is null/empty from the database. Invalid initializing of properties. | The path configured is not valid.
2 | Batch Document List is null or empty. | There are no documents in the batch.

 

Filebound Export Plugin

Overview

This plugin uploads data to the Filebound content management solution. It transforms the batch xml into a document file and exports it to the configured repository. Multipage tiff and multipage pdf files can be uploaded to the Filebound content management solution.

 

Configuration

Steps for configuring the plugin

  • User can select the export module and navigate to Filebound export plugin configuration page as shown below:

PDFileboundExportPlugin.jpg

The user can edit the above settings by clicking “Edit” to change them as per their requirements.

Configurable Properties

Following are the configurable properties available for the Filebound Export plugin:

Configurable property | Type of value | Value options | Description
File Bound Connection URL | String | Ex: C:\ephesoft-data\csv-export-folder | Location url for the Filebound repository.
File Bound User Name | String | Ex: admin | The username for repository authentication.
File Bound Password | String | Ex: password | The password for repository authentication.
Filebound Project Name | String | NA | The project name for which the Filebound repository is used.
Filebound index field | String | NA | The indexing field from batch.xml that will be used to create indexes.
Filebound division | String | NA | The division type that will be used for creating the document.
Filebound separator | String | NA | The separator that will be used for breaking the document.
Filebound Export Format | List of values | pdf, tif | Determines the format of the files to be exported. Default pdf.
File Bound Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default OFF.

 

Steps of execution

  • This plug-in runs in the export phase of the application, when all processing on the batch is complete and the batch is ready to be exported.
  • If the plugin switch is “ON”, the plugin creates a document file and exports it to the given repository.
  • The document is created in the configured Filebound export format, i.e. tif or pdf.
  • The created document is indexed using the configured index field.
  • The document is then exported to the repository at the given url.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Document Level fields are null. So cannot upload documents of batch instance | There are no document level fields in the batch class.
2 | Project Name must not be null | No project name is configured.
3 | Connection url must not be null | No connection url is configured for the repository.
4 | Username must not be null | No username is configured for authentication.
5 | Password must not be null | No password is configured for authentication.
6 | Index Field must not be null | No index field is configured.
7 | Division must not be null | No division is configured.
8 | Separator must not be null | No separator is configured.
9 | Non-zero exit value for filebound command found. | The export of the document to the server was unsuccessful.

 

IBM CM Plugin

Overview

This plugin exports the batch XML in the IBM Content Management schema format. In effect, it transforms the batch xml into another XML that is acceptable to IBM Content Management.

 

Configuration

Steps for configuring the plugin

  • User can select the Export module and navigate to IBM CM plugin configuration page as shown below:

PDIBMCMPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

Following are the configurable properties available for the IBM CM plugin:

Configurable property | Type of value | Value options | Description
IBM CM Final Export Folder | String path to export folder | Ex: C:\Ephesoft\SharedFolders\ibm-cm-export-folder | Folder to which the file is exported after transformation into the desired format.
IBM CM Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default OFF.

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-ibm-cm/dcma-ibm-cm.properties

Configurable property | Type of value | Value options | Description
ibm.cmod_app_group | String | NA | Value of the cmod app group parameter in the XML.
ibm.cmod_app | String | NA | Value of the cmod app parameter in the XML.
ibm.user_name | String | NA | Value of the user name parameter in the XML.
ibm.email | String | NA | Value of the email parameter in the XML.
ibm.supplying_system | String | NA | Value of the DAT file name parameter in the XML.

Steps of execution

  • The plug-in uses the batch xml file inside the batch instance folder.
  • The batch XML is transformed into the IBM Content Management schema format, which is acceptable to the IBM Content Management system. The plugin creates three files as the result of processing: one ctl file, one dat file and one xml file. The files are named in the following format:
“name of batch folder” + “_” + “batch instance identifier” + .ctl/.dat/.xml
  • These files are then copied into the IBM CM final export folder in a fixed layout. For example, suppose the user has 5 batch folders to import, named ABC1, ABC2, ABC3, ABC4 and ABC5. After batch processing, the IBM CM plugin creates an ABC folder inside the IBM CM export folder, with a subfolder for each batch instance based on the batch instance identifier. The ABC folder will therefore have 5 subfolders: BC1, BC2, BC3, BC4 and BC5. Each of these subfolders will contain one ctl, one dat and one xml file.
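The naming rule for the three output files can be written out as a short Python helper. The function name `ibm_cm_file_names` is hypothetical and the sample values are made up for the example; only the name format itself is taken from the text above.

```python
def ibm_cm_file_names(batch_folder_name, batch_instance_id):
    """Return the ctl, dat and xml file names for one batch instance."""
    base = f"{batch_folder_name}_{batch_instance_id}"
    return [base + ext for ext in (".ctl", ".dat", ".xml")]

print(ibm_cm_file_names("ABC1", "BC1"))
```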

Dependency

The plugin assumes that extraction for the incoming batch has been done properly; it only converts the results in the provided batch.xml into the desired format.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | IBM Content Management Export Folder value is null/empty from the database. Invalid initializing of properties. | The IBM CM export folder path is incorrect in the database.
2 | Could not find xsl file in the classpath resource | The ibmCMTransform.xsl file is not present in the classpath resource. The jar file may not be valid.
3 | targetXMLPath is null. Unable to create directory | Unable to create a file in the specified folder. Check for permission issues.
4 | Unable to create directory | The IBM CM export folder is not present and the folder could not be created.
5 | Error in creating output xml file | File not found at the specified location.
6 | Could not transform ibmCMTransform.xsl file | An error occurred while transforming the batch xml file.
7 | Failed exporting batch instance for IBM Content Management | This message is logged if any of the above errors occurred.

NSI Export Plugin

Overview

This plugin exports a zipped file for a batch. It transforms the batch xml into another xml format, specified by and acceptable to NSI CMS, and zips it along with the multipage tiff and multipage pdf files to the NSI export folder location. The plugin is used when both the batch instance folder and a specific format of the batch xml are needed. The batch instance folder is zipped along with the formatted batch xml and exported to the location given in the “NSI Export Folder” parameter.

 

Configuration

Steps for configuring the plugin

  • User can select the export module and navigate to NSI export plugin configuration page as shown below:

PDNSIExportPlugin.jpg

The user can edit the above settings by clicking “Edit” to change them as per their requirements.

Configurable Properties

Following are the configurable properties available for the NSI Export plugin:

Configurable property | Type of value | Value options | Description
NSI Export Folder | String | Ex: C:\ephesoft-data\NSI-export-folder | Folder to which the zipped file is exported after transformation into the desired format, if NSI State Switch is “ON”.
Final NSI XML Name | String | Ex: _NSI.xml | Name of the batch xml finally created after transformation into the desired format.
NSI State Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default OFF.

Steps of execution

  • This plug-in runs in the export phase of the application, when all processing on the batch is complete and the batch is ready to be exported.
  • The plugin works only if the “NSI State Switch” property is “ON”.
  • The plug-in uses a predefined xsl to convert the batch xml file into an NSI supported format, and names the new xml file according to the value specified by the user in the “Final NSI XML Name” property.
  • The converted batch.xml and the batch instance folder are used for exporting. The export path is given in the “NSI Export Folder” property.
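The final packaging step can be sketched with Python's `zipfile` module. The XSL transformation is left out, so the transformed XML is passed in as a string. The function name, the zip file name and the way the “Final NSI XML Name” value is appended to the batch folder name are assumptions for the example, not documented behaviour.

```python
import zipfile
from pathlib import Path

def zip_batch_for_nsi(batch_dir, transformed_xml, export_dir, xml_suffix="_NSI.xml"):
    """Zip a batch instance folder together with the transformed XML."""
    batch_dir = Path(batch_dir)
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    zip_path = export_dir / (batch_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Add every file of the batch instance folder, keeping relative paths.
        for path in sorted(batch_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(batch_dir).as_posix())
        # Add the transformed XML under the assumed "<batch name><suffix>" name.
        archive.writestr(batch_dir.name + xml_suffix, transformed_xml)
    return zip_path
```

A call such as `zip_batch_for_nsi("SharedFolders/BI1", xml_text, "NSI-export-folder")` would write `BI1.zip` into the export folder; both paths here are placeholders.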

Dependency

The plugin assumes that extraction for the incoming batch has been done properly. It also depends on the “Create Multipage Plugin”: the NSI Export plugin requires the output of both the “Create Multipage Plugin” and the extraction module.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | NSI Export Folder value is null/empty from the database. Invalid initializing of properties. | NSI Export Folder is either null or empty.
2 | Could not find xsl file in the classpath resource | NSITransform.xsl cannot be located within the classpath.

 

Tabbed PDF Plugin

Overview

This plugin merges all multipage PDFs into a single tabbed PDF based on the Placeholder setting. In effect, it creates a bookmarked PDF in the configured export folder by merging all the multipage PDFs.

 

Configuration

Steps for configuring the plugin

User can select the Export module and navigate to Tabbed PDF plugin configuration page as shown below:

PDTabbedPDFPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

Following are the configurable properties available for the Tabbed PDF plugin:

Configurable property | Type of value | Value options | Description
Tabbed PDF Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default OFF.
Tabbed PDF Export Folder | String (Folder path) | Ex: C:\Ephesoft3\SharedFolders\tabbed-pdf-export-folder | Folder in which the output tabbed PDF and the multipage tiff file (if created) are stored.
Tabbed PDF Placeholder | List of values | YES, NO | If YES, the plug-in creates the document map on the basis of the order of priority of the document types defined in the export-script.properties file. A tab is created in the document map for each document type, and the error PDF is loaded if the document is not present in the batch xml. If NO, the plugin creates the document map on the basis of the documents present in the batch xml; tabs are created only for documents that are present in the batch xml. Default NO.
Tabbed PDF Property file | String (File path) | Ex: C:\Ephesoft\SharedFolders\property\export-script.properties | This property file holds the document types and their priority in a predefined format.
Tabbed PDF Creation Parameters | String | Ex: -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite | Parameters used by Ghostscript at the time of PDF creation.
Tabbed PDF Optimization Parameters | String | Ex: -q -dNODISPLAY -P- -dSAFER -dDELAYSAFER -- pdfopt.ps | Parameters used by Ghostscript at the time of PDF optimization.
PDF Optimization switch | List of values | ON, OFF | Determines whether PDF optimization will be performed or not. Default ON.

 

Property File Configuration

Property file: 
{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-tabbed-pdf/dcma-tabbed-pdf.properties

Configurable property | Type of value | Value options | Description
tabbed_pdf.ghost_script_command | String | gswin64c, gswin32c | Ghost script command for Linux and Windows.
tabbed_pdf.unix_ghost_script_command | String | gs | Ghost script command for Unix.

 

Steps of execution

  • The plug-in uses the batch xml file, the multipage PDF files and the multipage tiff files inside the batch instance folder.
  • The batch XML is updated according to the processing done by the plug-in.
  • All multipage PDF files are combined into a single PDF document, which is then copied to the tabbed-pdf-export-folder inside the shared folders. The PDF file is named in the following format:

“name of batch folder” + “_” + “batch instance identifier” + .pdf

  • Finally, the first multipage tiff is copied to the tabbed-pdf-export-folder inside the shared folder; the rest of the multipage tiffs are lost.
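The merge step can be pictured as a Ghostscript call that uses the creation parameters from the plugin configuration. The Python sketch below only assembles the command and the output file name; the function names are hypothetical, `gs` is the Unix command from the properties file, `-sOutputFile` is the standard Ghostscript output option, and the bookmark (pdfmarks) handling the plugin performs is not modelled.

```python
def tabbed_pdf_name(batch_folder_name, batch_instance_id):
    """Name of the merged PDF, following the format given above."""
    return f"{batch_folder_name}_{batch_instance_id}.pdf"

def build_merge_command(pdf_paths, output_pdf, gs_command="gs",
                        creation_params="-q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite"):
    """Return a Ghostscript command that merges pdf_paths into output_pdf."""
    if not pdf_paths:
        raise ValueError("no multipage PDFs to merge")
    return [gs_command, *creation_params.split(), f"-sOutputFile={output_pdf}", *pdf_paths]

command = build_merge_command(["doc1.pdf", "doc2.pdf"], tabbed_pdf_name("ABC", "BI1"))
print(" ".join(command))
```

On Windows the `gs_command` argument would be replaced by one of the values configured in tabbed_pdf.ghost_script_command.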

Dependency

The plugin assumes that the multipage PDFs have been created earlier in batch processing. The Create Multi Page Files plugin, which is responsible for creating the multipage PDFs, must therefore be in the workflow of the batch class. In addition, the export-script.properties file must be in the right location and contain correct information.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | {folderPath} is not a Directory. | The folder path configured by the user is not valid.
2 | Property file for documents not valid. | The same priority is defined for more than one document type.
3 | File does not exist. File Name=”{file-name}” | An invalid PDF file name is mentioned in the batch xml.
4 | Error in writing pdfMarks file. | An error occurred while processing.
5 | Enviornment Variable GHOSTSCRIPT_HOME not set | GHOSTSCRIPT_HOME is not set as an environment variable or in the startup.bat file inside the {Ephesoft-home}\JavaAppServer\bin folder.
6 | No ghostcript command specified in properties file. | Either the dcma-tabbed-pdf.properties file is not present or properties are missing from it.
7 | Sample PdfMarks file not provided. | pdfmarks.dat is not present at the specified location.