FIGURE 5. STRATEGIC HORIZON
4.2.3 Definition of axes, objectives, scope, and guidelines
On the left icon bar in FastDoc, choose the Configure Workflow option. You should see a window similar to Figure 7-2.
You will not change these workflows now, but you need to understand what each component does to master the use of the template.
7.3.1 The Learning template jobs
At the time of this writing, six jobs are available. They all process images in a similar way, yet they differ in the types of images that they can process and in how the images are ingested into the system:
DemoSingleTiffs
This job takes single-page TIFF images from the \datacap\appname\images\Input_SingleTIFFs folder. Even though it takes single-page TIFF images, you can input multipage documents by using the separator sheet from the APT project. Verify is set up to use the DCDesktop, Datacap Web Services, and FastDoc clients.
DemoWebScan
All processing and verification is the same as DemoSingleTiffs, but the input is done with Datacap Web Services with an additional Upload task.
DemoMultiFormat
This job takes every image that can currently be converted from the \datacap\appname\images\Input_MultFormat directory. The processing difference is that it uses the structure of the input image to determine document structure: one document per image. For instance, if you put in a three-page PDF, all three pages are treated as one document.
FlexIDSingleTiffs
This is identical to DemoSingleTiffs except that a FlexID task is inserted between scanning and processing to manually identify pages.
FlexID MultiFormat
This one is identical to DemoMultiFormat but with the FlexID task inserted.
FlexIDWebScan
This is identical to DemoWebScan but with ProtoID inserted to manually identify pages before processing.
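The differences between the six jobs can be summarized as a small data structure. This is only an illustrative sketch in Python, not Datacap configuration: the `JobSketch` class and its fields are hypothetical, the job names are copied as printed in the list above, and the input descriptions paraphrase the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobSketch:
    """Hypothetical summary record; not a Datacap object."""
    name: str                       # job name as printed in the list above
    input_source: str               # where the images come from
    manual_id_task: Optional[str]   # manual page-identification task, if any


JOBS = [
    JobSketch("DemoSingleTiffs", "Input_SingleTIFFs folder", None),
    JobSketch("DemoWebScan", "Datacap Web Services upload", None),
    JobSketch("DemoMultiFormat", "Input_MultFormat folder", None),
    JobSketch("FlexIDSingleTiffs", "Input_SingleTIFFs folder", "FlexID"),
    JobSketch("FlexID MultiFormat", "Input_MultFormat folder", "FlexID"),
    JobSketch("FlexIDWebScan", "Datacap Web Services upload", "ProtoID"),
]


def jobs_with_manual_id():
    """Return the names of jobs that add a manual identification step."""
    return [job.name for job in JOBS if job.manual_id_task is not None]


if __name__ == "__main__":
    for job in JOBS:
        print(f"{job.name}: {job.input_source}, manual ID: {job.manual_id_task}")
```

Reading the list this way makes the pattern visible: three input methods, each available with or without a manual page-identification step.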
7.3.2 The Learning template tasks
The green boxes in the middle of Figure 7-2 (shown previously) identify the rulesets that each task contains:
Import Files
This task contains the compiled user interface (UI) ruleset to input files from the disk.
Convert Files to Images
This task is not shown in Figure 7-2, but it is available by clicking either of the MultiFormat jobs. It is ready for use and is configured to convert all image types that Datacap can currently convert to 300 dpi Group 4 bi-tonal TIFF images for processing.
ManagedRotation
This task automatically rotates the images with the ScanSoft recognition engine. It is run in a managed fashion, so if there is an error, the recognition engine automatically restarts and continues processing.
Image Enhancement
This task enhances the image by deskewing it, removing lines and shaded backgrounds, and so on. Be conservative when setting this up, because it applies to all images. In this example, we must make sure that it does not erase any lines in the bar code.
PageID
This identifies pages according to the order of known pages. By default, it expects particular document separator sheets (those used with Flex and APT). It can also create documents based on the input image structure, making one document per multipage image. Because our application uses the bank’s own letter for document separation, we need to reconfigure this ruleset.
CreateDocuments
This task creates the document structure, based on the pages that you set up and configure. In our example, the cover letter of the package that the customer submits will be the first page of the document, followed by the first page of the attached bank statement, designated as Main_Page, followed by any additional pages of the bank statement, which will be designated as Trailing_Pages.
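The grouping described for CreateDocuments can be pictured with a short Python sketch. It is not the Datacap ruleset itself: `create_documents` is a hypothetical helper that assumes pages arrive as an ordered list of page-type names and that every cover letter starts a new document.

```python
def create_documents(page_types):
    """Group an ordered list of page-type names into documents.

    Assumption for illustration: a "CoverLetter" page always opens a new
    document, and every other page joins the current document.
    """
    documents = []
    for page_type in page_types:
        if page_type == "CoverLetter" or not documents:
            documents.append([])  # start a new document
        documents[-1].append(page_type)
    return documents


if __name__ == "__main__":
    batch = ["CoverLetter", "Main_Page", "Trailing_Page",
             "CoverLetter", "Main_Page"]
    for number, document in enumerate(create_documents(batch), start=1):
        print(number, document)
```

A batch of five pages with two cover letters therefore yields two documents, the first with three pages and the second with two.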
Recognize Pages and Fields
This runs a managed full-page OCR recognition of all pages (except attachments). In our application, everything is named CoverLetter, Main_Page, or Trailing_Page, so there will not be attachments designated in PageID.
Fingerprint
This task tries to match Main_Page to all known instances of Main_Page. If a previous example of a particular bank statement was recorded, the Learning template matches against the existing fingerprint for that document so that we can use known zones for extracting the data. If there is no previous example of a particular bank statement, the Learning template creates a new fingerprint automatically so that zones can be saved to the system for future encounters of statements from that bank.
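The match-or-create behavior can be sketched as follows. The text does not describe how Datacap compares a page with a fingerprint, so this Python example substitutes a simple set-overlap (Jaccard) score as a stand-in; the function name, the feature sets, the threshold, and the generated identifiers are all hypothetical.

```python
def match_or_create(page_features, known, threshold=0.8):
    """Return (fingerprint_id, created) for a page.

    page_features: set of layout features for the page (stand-in data).
    known: dict mapping fingerprint id -> feature set; updated in place
           when no existing fingerprint is similar enough.
    """
    best_id, best_score = None, 0.0
    for fp_id, features in known.items():
        union = page_features | features
        # Jaccard similarity: shared features divided by all features.
        score = len(page_features & features) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = fp_id, score

    if best_id is not None and best_score >= threshold:
        return best_id, False  # reuse the known fingerprint and its zones

    new_id = f"new-{len(known) + 1}"  # hypothetical naming scheme
    known[new_id] = set(page_features)
    return new_id, True


if __name__ == "__main__":
    fingerprints = {"bank_a": {"logo_top_left", "table_center", "total_bottom"}}
    print(match_or_create({"logo_top_left", "table_center", "total_bottom"},
                          fingerprints))
    print(match_or_create({"logo_right", "two_columns"}, fingerprints))
```

The point of the sketch is the control flow: a sufficiently close match reuses stored zones, and anything else becomes a new fingerprint whose zones are filled in later.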
Locate
Locate extracts data from zones if they are known for a particular statement and uses keyword searches or regular expressions to try to locate data if zones are unavailable or not found. This ruleset varies widely for different types of data that you need to capture, so in the template, we need to set this up.
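The zone-first, search-second strategy of Locate might look like this in outline. This is a Python sketch under stated assumptions, not a Datacap action: `locate_field` is a hypothetical helper, and the account-number pattern is invented purely for illustration.

```python
import re

# Hypothetical pattern for illustration only: "Account No." or
# "Account Number" followed by 6 to 12 digits.
ACCOUNT_PATTERN = r"Account\s+(?:No\.?|Number)[:\s]+(\d{6,12})"


def locate_field(page_text, zone_text, pattern):
    """Return the field value from a known zone, else from a regex search.

    zone_text: text read from a saved zone, or None when no zone is known.
    page_text: full-page OCR text used for the fallback search.
    """
    if zone_text:
        return zone_text.strip()
    match = re.search(pattern, page_text)
    return match.group(1) if match else ""


if __name__ == "__main__":
    ocr_text = "Statement period 01/2011\nAccount Number: 12345678\n"
    print(locate_field(ocr_text, None, ACCOUNT_PATTERN))
```

When a zone is known, the full-page text is never searched, which mirrors the preference for known zones described above.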
Lookup
Lookup returns the fingerprint class of known fingerprints into the Fingerprint_Class field. In this application, we store fingerprints by the particular bank name that is associated with them. Therefore, this Fingerprint_Class field will hold the Institution, one of the data elements that we want to capture. Often, these types of values are in logos that are unrecognizable by OCR, so this is a more reliable way of populating institution names.
Validate
Like the Locate ruleset, this ruleset must be set up after the fields are added to the project. The purpose of the ruleset is to make sure that data conforms to the business rules. Every field should be validated in some way to ensure compliance with the data types and length restrictions of whatever system we are exporting to. Notice that Validate runs at least twice: once during the Profiler task and again in the Verify task each time that an operator clicks Submit on a document.
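As a rough illustration of type and length checks, consider the following Python sketch. The `validate_field` helper, the field kinds, the date format, and the error messages are assumptions made for the example; real rules depend on the target system.

```python
from datetime import datetime


def validate_field(value, kind, max_length):
    """Return a list of rule violations for one field (empty if valid).

    kind: "date" (assumed MM/DD/YYYY), "number", or "text".
    max_length: longest value that the target system is assumed to accept.
    """
    errors = []
    if len(value) > max_length:
        errors.append("too long")
    if kind == "date":
        try:
            datetime.strptime(value, "%m/%d/%Y")
        except ValueError:
            errors.append("not a date")
    elif kind == "number":
        try:
            float(value.replace(",", ""))  # allow thousands separators
        except ValueError:
            errors.append("not a number")
    return errors


if __name__ == "__main__":
    print(validate_field("01/31/2011", "date", 10))
    print(validate_field("abc", "number", 2))
```

Because the same function can be called at any point, it also reflects the idea that validation runs more than once: after automatic extraction and again whenever an operator submits a document.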
Routing
This is used to prepare the documents for the Verify operator. Pages that violate a business rule are automatically shown to operators, but we also need to check that the data did not fall below the required confidence, even if it passed a business rule. This ruleset is also commonly used to clean up any fields before they go to Verify. In our case, when the Lookup ruleset runs, it puts the <New> value in the Fingerprint_Class field, so we look for that value and blank it so that the operator does not have to erase it before typing in the correct value.
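The two Routing duties described here, flagging low-confidence data and blanking the <New> placeholder, can be sketched together. This Python example is an assumption-laden illustration: `route_fields`, the 0.0 to 1.0 confidence scale, and the threshold value are hypothetical and do not reflect how Datacap stores confidence.

```python
NEW_MARKER = "<New>"


def route_fields(fields, min_confidence=0.9):
    """Clean field values and list the fields an operator should check.

    fields: dict mapping field name -> (value, confidence), where the
            confidence scale of 0.0 to 1.0 is assumed for illustration.
    Returns (cleaned_values, names_needing_review).
    """
    cleaned, needs_review = {}, []
    for name, (value, confidence) in fields.items():
        blanked = value == NEW_MARKER
        if blanked:
            value = ""  # operator types the real value into an empty field
        cleaned[name] = value
        if blanked or confidence < min_confidence:
            needs_review.append(name)
    return cleaned, needs_review


if __name__ == "__main__":
    batch_fields = {
        "Fingerprint_Class": ("<New>", 1.0),
        "Balance": ("100.00", 0.5),
        "Date": ("01/31/2011", 0.99),
    }
    print(route_fields(batch_fields))
```

Note that the Balance field in the usage example would pass a numeric business rule yet is still flagged, which is exactly the case that a confidence check is meant to catch.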
SetStatuses
Different clients set rejection statuses in various ways. In our application, the rejection statuses are Delete, Rescan, and Review. Those statuses can be set directly in some but not all clients, so the Learning template also provides a drop-down menu to set them at Verify time. Regardless of which method an operator chooses to reject a document, this makes sure that the status is set correctly so that the rejected documents will not go to the repository and the appropriate people are notified.
PreExport
This is where the zones are saved to any <New> fingerprint that was created. Because it runs after Verify, we should now know where all of the required data is located on this particular type of bank statement. This ruleset is also usually configured to make the output image required by the system that we export to.
Export
This ruleset outputs data in a simple way to a text file. For production systems, Datacap offers many different types of export rulesets, and you can export to as many systems as you want.
Process Exceptions
This is where we handle any documents that the operator rejected (Delete, Rescan, or Review). Customer requirements vary widely. Typically, this ruleset is configured to export to a business management workflow so that appropriate action can be taken, but some customers only want notification so that documents can be fixed as necessary and inserted into future batches.
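To make the split between exported and rejected documents concrete, here is a minimal Python sketch. The `split_batch` helper and the "OK" status for accepted documents are hypothetical; only the three rejection status names come from the text.

```python
REJECTION_STATUSES = {"Delete", "Rescan", "Review"}


def split_batch(documents):
    """Separate documents bound for export from rejected ones.

    documents: iterable of (document_id, status) pairs.
    Returns (ids_to_export, rejected), where rejected maps each rejection
    status to the list of document ids that carry it.
    """
    to_export, rejected = [], {}
    for document_id, status in documents:
        if status in REJECTION_STATUSES:
            rejected.setdefault(status, []).append(document_id)
        else:
            to_export.append(document_id)
    return to_export, rejected


if __name__ == "__main__":
    batch = [("doc1", "OK"), ("doc2", "Rescan"),
             ("doc3", "OK"), ("doc4", "Review")]
    exportable, exceptions = split_batch(batch)
    print(exportable)
    print(exceptions)
```

Grouping the rejected documents by status leaves room for different follow-up actions, such as a notification for Review and a rescan queue for Rescan.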