
On the left icon bar in FastDoc, choose the Configure Workflow option. You should see a window similar to Figure 7-2.

You do not make any changes to these workflows now, but you need to understand what each component does to master the use of the template.

7.3.1 The Learning template jobs

At the time of this writing, there are six jobs available for you. They are all similar in processing, yet they differ in the types of images that they can process and how the images are ingested into the system:

• DemoSingleTiffs

This job takes single-page TIFF images from the \datacap\appname\images\Input_SingleTIFFs folder. Although it takes single-page TIFF images, you can input multipage documents by using the separator sheet from the APT project. Verify is set up to use the DCDesktop, Datacap Web Services, and FastDoc clients.

• DemoWebScan

All processing and verification is the same as DemoSingleTiffs, but the input is done with Datacap Web Services with an additional Upload task.

• DemoMultiFormat

This job takes every image in the \datacap\appname\images\Input_MultFormat directory that can currently be converted. The processing difference is that it uses the input image to determine document structure, one document per image. For instance, if you put in a three-page PDF image, all three pages are treated as one document.

• FlexIDSingleTiffs

This is identical to DemoSingleTiffs except that a FlexID task is inserted between scanning and processing to manually identify pages.

• FlexID MultiFormat

This one is identical to DemoMultiFormat but with the FlexID task inserted.

• FlexIDWebScan

This is identical to DemoWebScan but with ProtoID inserted to manually identify pages before processing.
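The differences among the six jobs can be summarized in a small lookup table. The sketch below is plain Python, not Datacap configuration: the `Job` class and its field names are hypothetical, and the only paths used are the two quoted above. No folder is given for the WebScan jobs, so their input folder is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical summary record for one Learning template job."""
    name: str
    input_folder: Optional[str]    # None: input arrives through Datacap Web Services
    manual_page_id: bool           # True: pages are identified manually before processing
    one_document_per_image: bool   # True: each input image becomes one document


SINGLE = r"\datacap\appname\images\Input_SingleTIFFs"
MULTI = r"\datacap\appname\images\Input_MultFormat"

# Job names and behavior follow the descriptions in the list above.
JOBS = [
    Job("DemoSingleTiffs", SINGLE, manual_page_id=False, one_document_per_image=False),
    Job("DemoWebScan", None, manual_page_id=False, one_document_per_image=False),
    Job("DemoMultiFormat", MULTI, manual_page_id=False, one_document_per_image=True),
    Job("FlexIDSingleTiffs", SINGLE, manual_page_id=True, one_document_per_image=False),
    Job("FlexID MultiFormat", MULTI, manual_page_id=True, one_document_per_image=True),
    Job("FlexIDWebScan", None, manual_page_id=True, one_document_per_image=False),
]


def jobs_with_manual_id():
    """Return the names of the jobs that add a manual page identification step."""
    return [job.name for job in JOBS if job.manual_page_id]
```

Calling `jobs_with_manual_id()` returns the three FlexID job names, which makes it easy to see that each Demo job has a counterpart with manual page identification.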

7.3.2 The Learning template tasks

The green boxes in the middle of Figure 7-2 identify the rulesets that are in every task:

• Import Files

This task contains the compiled user interface (UI) ruleset to input files from the disk.

• Convert Files to Images

This one is not shown in Figure 7-2, but it is available by clicking either of the MultiFormat jobs. It is ready for use and is configured to convert every image type that Datacap can currently convert into 300 dpi, Group 4 bi-tonal TIFF images for processing.

• ManagedRotation

This task automatically rotates the images with the ScanSoft recognition engine. It is run in a managed fashion, so if there is an error, the recognition engine automatically restarts and continues processing.

• Image Enhancement

This task enhances the image by deskewing it, removing lines, removing shaded backgrounds, and so on. Be conservative when setting this up, because it applies to all images. In this example, we must make sure that it does not erase any lines in the bar code.

• PageID

This identifies pages according to the order of known pages. By default, it expects particular document separator sheets (those used with Flex and APT). It can also create documents based on the input image structure, making one document per multipage image. Because our application uses the bank’s own letter for document separation, we need to reconfigure this ruleset.

• CreateDocuments

This task creates the document structure, based on the pages that you set up and configure. In our example, the cover letter of the package that the customer submits will be the first page of the document, followed by the first page of the attached bank statement, designated as Main_Page, followed by any additional pages of the bank statement, which will be designated as Trailing_Pages.

• Recognize Pages and Fields

This runs a managed full-page OCR recognition of all pages (except attachments). In our application, everything is named CoverLetter, Main_Page, or Trailing_Page, so there will not be attachments designated in PageID.

• Fingerprint

This task tries to match Main_Page to all known instances of Main_Page. If a previous example of a particular bank statement was recorded, the Learning template matches against the existing fingerprint for that document so that we can use known zones for extracting the data. If there is no previous example of a particular bank statement, the Learning template creates a new fingerprint automatically so that zones can be saved to the system for future encounters of statements from that bank.

• Locate

Locate extracts data from zones if they are known for a particular statement and uses keyword searches or regular expressions to try to locate data if zones are unavailable or not found. This ruleset varies widely for different types of data that you need to capture, so in the template, we need to set this up.

• Lookup

Lookup returns the fingerprint class of known fingerprints into the Fingerprint_Class field. In this application, we store fingerprints by the particular bank name that is associated with them. Therefore, this Fingerprint_Class field will hold the Institution, one of the data elements that we want to capture. Often, these types of values are in logos that are unrecognizable by OCR, so this is a more reliable way of populating institution names.

• Validate

Like the Locate ruleset, this ruleset must be set up after the fields are added to the project. The purpose of the ruleset is to make sure that data conforms to the business rules. Every field should be validated in some way to ensure compliance with the data types and length restrictions of whatever system we are exporting to. Notice that Validate runs at least twice: once during the Profiler task and again in the Verify task each time that an operator clicks Submit on a document.

• Routing

This is used to prepare the documents for the Verify operator. Pages that violate a business rule are automatically shown to operators, but we also need to check that the data did not fall below the required confidence for the greatest accuracy, even though it might have passed a business rule. This ruleset is also commonly used to clean up any fields before they go to Verify. In our case, when the Lookup ruleset runs, it puts the <New> value in the Fingerprint_Class field, and we look for that value and blank it so that the operator does not have to erase it before typing in the correct value.

• SetStatuses

Different clients set rejection statuses in various ways. In our application, the rejection statuses are Delete, Rescan, and Review. Those statuses can be set directly in some but not all clients, so the Learning template also provides a drop-down menu to set them at Verify time. Regardless of which method an operator chooses to reject a document, this makes sure that the status is set correctly so that the rejected documents will not go to the repository and the appropriate people are notified.

• PreExport

This is where the zones are saved to any <New> fingerprint that was created. Because it runs after Verify, we should now know where all of the required data is located on this particular type of bank statement. This ruleset is also usually configured to make the output image required by the system that we export to.

• Export

This ruleset outputs data in a simple way to a text file. For production systems, Datacap offers many different types of export rulesets, and you can export to as many systems as you want.

• Process Exceptions

This is where we handle any documents that were rejected by the operator (Delete, Rescan, Review). Customer requirements can vary widely. Typically, this ruleset is configured to export to a business management workflow so that appropriate action can be taken, but some customers just want notification so that documents can be fixed as necessary and inserted into future batches.
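To make the flow through a few of these rulesets more concrete, the following sketch imitates four of the ideas described above in plain Python: grouping pages into documents (CreateDocuments), preferring a known zone and falling back to a regular expression (Locate), checking a business rule (Validate), and blanking the <New> placeholder before deciding whether an operator must review a field (Routing). It is an illustration only and does not use any Datacap API. The function names, the `Field` class, the Account_Number field, the confidence scale, and the 0.80 threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

NEW_MARKER = "<New>"      # placeholder that the text says ends up in Fingerprint_Class
MIN_CONFIDENCE = 0.80     # hypothetical threshold, not a Datacap default


@dataclass
class Field:
    name: str
    value: str
    confidence: float     # assumed 0.0-1.0 recognition confidence


def group_pages(page_types):
    """CreateDocuments idea: start a new document at every CoverLetter page."""
    documents = []
    for page_type in page_types:
        if page_type == "CoverLetter" or not documents:
            documents.append([])
        documents[-1].append(page_type)
    return documents


def locate(field_name, zone_values, page_text, patterns):
    """Locate idea: use a known zone value, else search the page text with a regex.

    Each pattern must contain one capture group around the value to extract.
    """
    if field_name in zone_values:
        return zone_values[field_name]
    match = re.search(patterns[field_name], page_text)
    return match.group(1) if match else ""


def validate(fld, pattern, max_length):
    """Validate idea: the value must fit the length limit and match the pattern."""
    return len(fld.value) <= max_length and re.fullmatch(pattern, fld.value) is not None


def needs_review(fld, pattern, max_length):
    """Routing idea: blank <New>, then flag rule failures and low-confidence values."""
    if fld.value == NEW_MARKER:
        fld.value = ""
    return (not validate(fld, pattern, max_length)) or fld.confidence < MIN_CONFIDENCE
```

For example, `group_pages(["CoverLetter", "Main_Page", "Trailing_Page", "CoverLetter", "Main_Page"])` yields two documents, each beginning with its cover letter. A field whose value passes `validate` but whose confidence is 0.65 is still flagged by `needs_review`, which mirrors the point that passing a business rule is not enough on its own.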

7.4 Building the document structure for the MktBankStmt
