Now that you know what you want to capture, you need to decide how to capture it. There are four ways of getting information from an image:
Zonal
Regular expression
Keywords
Operator entry
We need to decide how to get each piece of information from all the different bank statements that we will encounter (thousands of them, all in different formats). Because this is a LearningTemplate application, we have two tasks:
Learn how to find information about documents that we have never encountered before.
Table 7-1 lists the fields that we need to capture with the methods that we want to use to capture them for this example.
Table 7-1 Fields and method of capture
At the document level of the Locate ruleset in the template, the CCO files for Main_Page and Trailing_Page are merged so that all of the data from the bank statement is in one easy-to-search place.
At the page level of Main_Page, the zones are read in, if they exist. Remember, zones exist only on bank statements that have made it through the export process and had their positions saved in an FPXML file. If no FPXML file exists with the zonal information for a field, the position of that field remains 0,0,0,0.
Now, try to capture the fields. The Fingerprint_Class, Routing_Instructions, and Add_New_Fingerprint fields should be set up to work for you already, by virtue of using the LearningTemplate. For the Fingerprint_Class field, the operator must enter the data on the first encounter, but it should populate automatically thereafter when a similar bank statement is found. The Add_New_Fingerprint and Routing_Instructions fields are simply drop-down menus, and we set their default values.
First, we work to get the bar code from the Cover_Letter onto the Customer_Number field on the Main_Page. There are several ways to do this, but the easiest is to just copy the bar code value from the Cover_Letter page to a document variable, and then fill the Customer_Number field on the Main_Page with that value.
Use rrSet(@P.GetBarCode, @D.CustomerNumber) to copy the bar code value to the document level into a variable called CustomerNumber. When the next page (Main_Page) runs, the document-level variable can be copied into the field by using rrSet(@D.CustomerNumber,@F).
Field to capture | Method for first encounter | Method thereafter
Fingerprint_Class | Operator Entry | Database Lookup from Fingerprint database
AccountNumber | Keyword | Zonal, failing to keyword
Date | Keyword | Zonal, failing to keyword
Balance | Keyword | Keyword, failing to Zonal
Customer_Number | Read from Cover_Letter bar code | Read from Cover_Letter bar code
Name | Database lookup from Customer_Number field | Database lookup from
Figure 7-10 shows the two rules added to the Locate ruleset and bound to the appropriate DCO objects.
Figure 7-10 Capturing the Customer_Number into a field on Main_Page
You’ll build the rulesets for AccountNumber and Date nearly identically.
When you use the PopulateZNField action, if there is no zone configured for the data, it fails, and the trailing function begins to execute. So it is safe to look for data zonally even on the first encounter of the fingerprint: if no zone exists yet, the action fails and the rule goes on to find the data programmatically by searching the merged CCO.
Therefore, this should be the first function on these two fields:
PopulateZNField
If a zone exists for a field and data is found at that location, the data is pulled into the field and there is no further searching. Otherwise, the trailing functions look for the data programmatically.
In our example we want to use a keyword search:
FindKeyList(Keyfile name)
GoRightWord(1)
Check the data type, depending on which field it is
Updatefield
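The fallback behavior described above (try the zone first, then fall through to a keyword search) can be sketched in Python. This is only an illustration of the control flow, not Datacap code: the names run_rule, populate_zone, and keyword_search are hypothetical, the merged CCO is reduced to a simple label-to-value dictionary, and a zone of (0, 0, 0, 0) stands in for "no zone learned yet".

```python
from typing import Callable, Dict, List, Tuple

Zone = Tuple[int, int, int, int]
Action = Callable[[dict], bool]
NO_ZONE: Zone = (0, 0, 0, 0)  # assumed marker for "no zone configured"


def populate_zone(zone_text: Dict[Zone, str]) -> Action:
    """Hypothetical stand-in for a zonal read; fails if no zone or no data."""
    def action(field: dict) -> bool:
        if field["zone"] == NO_ZONE:
            return False
        value = zone_text.get(field["zone"], "")
        if not value:
            return False
        field["value"] = value
        return True
    return action


def keyword_search(label_values: Dict[str, str], labels: List[str]) -> Action:
    """Hypothetical keyword lookup: try each label in list order."""
    def action(field: dict) -> bool:
        for label in labels:
            if label in label_values:
                field["value"] = label_values[label]
                return True
        return False
    return action


def run_rule(functions: List[List[Action]], field: dict) -> bool:
    """Run functions in order; a function stops at its first failing action,
    and the next function is tried. Stop once a function fully succeeds."""
    for actions in functions:
        if all(action(field) for action in actions):
            return True
    return False


merged_cco = {"Account Number:": "12345678"}
labels = ["Primary Account:", "Account Number:"]

# First encounter: no zone yet, so the keyword function supplies the value.
first_encounter = {"zone": NO_ZONE, "value": ""}
run_rule([[populate_zone({})], [keyword_search(merged_cco, labels)]], first_encounter)

# Later encounter: a learned zone exists, so the keyword search never runs.
known_zone = (10, 20, 200, 40)
later_encounter = {"zone": known_zone, "value": ""}
run_rule(
    [[populate_zone({known_zone: "87654321"})], [keyword_search(merged_cco, labels)]],
    later_encounter,
)
```

Swapping the order of the two inner lists gives the keyword-first, zone-second arrangement.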
Figure 7-11 shows the Locate rule that is bound to AccountNumber to capture the field.
Figure 7-11 The AccountNumber field configured and bound for capture
Next, we make a key file that contains a list of the labels that you want to search for, in the order that you want to find them.
We create a new text file called AcctNum.Key in the dco_MktBankStmt directory and enter the following values:
Account Number:
Primary Account:
Primary Account
Account number:
PREMIER PLUS CHECKING
As new words or phrases are encountered as labels in future statements, users can add them to this file, and these labels will be checked when programmatically searching for the data. Order your list from most specific to least specific to get the best matches. For instance, “Primary Account:” needs to appear in the list before “Primary Account” (without the colon). If FindKeyList looked for the value without the colon first, it would match and never check for the one with a colon. Similarly, if you find a statement in the future that just says “Account” with the number to the right, it is best to add it toward the bottom of the list so that the more specific “Account Number” and “Account number:” are checked first.
With this structure, we search the CCO for each word in the key file until we find one. The search runs through the entire document for the first keyword, then through the entire document for the second keyword, and so on. Alternatively, you can use AggregateKeyList to search the CCO for all of the words at one time, so be careful about adding non-specific labels, such as just “Account”, if you are using AggregateKeyList. If a statement has “Your Account Summary” as a title, the search would match there and never get far enough down the page to reach your account number.
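To see why the order of the key file matters, here is a small Python sketch of a label-by-label search. It is an illustration under simplifying assumptions, not the actual FindKeyList implementation: find_key_list is a hypothetical helper, the document is a list of text lines, and a label matches when it appears as a substring of a line.

```python
from typing import List, Optional, Tuple


def find_key_list(lines: List[str], labels: List[str]) -> Optional[Tuple[str, int]]:
    """Search the whole document for each label in turn; return the first hit
    as (label, line index), or None if no label is found."""
    for label in labels:                      # labels are tried in file order
        for line_no, line in enumerate(lines):
            if label in line:                 # assumed: plain substring match
                return label, line_no
    return None


statement = [
    "Your Account Summary",
    "Account Number: 12345678",
]

specific_first = ["Account Number:", "Account number:", "Account"]
generic_first = ["Account", "Account Number:", "Account number:"]

good = find_key_list(statement, specific_first)  # ("Account Number:", 1)
bad = find_key_list(statement, generic_first)    # ("Account", 0), the title line
```

With the generic label listed first, the search stops on the title line and never reaches the line that holds the account number.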
You have to strike a balance. If the list is too broad, you will match the wrong word; if it is too narrow, you will miss some values that might have been found. With the LearningTemplate, it is best to lean toward the latter end of the spectrum and perhaps miss some of the less-encountered words and phrases. After the document has gone through Verify and the operator has provided a location for this value on a particular bank statement, the system uses that zone for future encounters with this type of statement rather than a keyword match.
Going through the other logic of the action, if the keyword is found, we go to the next word to the right and check its data type. If there is no word to the right, or if whatever is to the right is the wrong data type, it fails, and we can check a different direction from the found keyword. If everything is OK when we get the value to the right of the keyword, we update the value of the field, and we are finished.
If it fails, we try something similar, but perhaps looking below or two words to the right. With rules, you can search all around a found keyword for your proper data type.
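The idea of checking several positions around a found keyword can be sketched as follows. This Python example is only a model of the logic: the page is reduced to a hypothetical grid of words keyed by (row, column), value_near and is_account_number are made-up helper names, and the "six or more digits" test is an assumed data type check.

```python
import re
from typing import Callable, Dict, Optional, Sequence, Tuple

Grid = Dict[Tuple[int, int], str]  # (row, column) -> word, a simplified page model


def is_account_number(word: str) -> bool:
    """Assumed data type check: six or more digits and nothing else."""
    return re.fullmatch(r"\d{6,}", word) is not None


def value_near(
    grid: Grid,
    key_pos: Tuple[int, int],
    check: Callable[[str], bool],
    offsets: Sequence[Tuple[int, int]] = ((0, 1), (1, 0), (0, 2)),
) -> Optional[str]:
    """Try one word right, then one below, then two words right of the keyword.
    Return the first word that exists and passes the data type check."""
    row, col = key_pos
    for d_row, d_col in offsets:
        word = grid.get((row + d_row, col + d_col))
        if word is not None and check(word):
            return word
    return None


# The word directly to the right ("No.") fails the check, so the search moves on.
page = {(0, 0): "Account:", (0, 1): "No.", (0, 2): "12345678"}
found = value_near(page, (0, 0), is_account_number)  # "12345678"
```

The order of the offsets plays the same role as the order of the functions on the field: the first direction that yields a value of the right type wins.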
With the Date field, we use the same structure by copying the rule from AccountNumber and making just the required changes:
1. Make a new keyword file, with the labels based on the date.
2. Change the data check to IsDateValue().
3. Group the words for the date and test the result.
A date such as July 4 2015 would normally capture only the word July. GroupWordsRight(1.5) groups any words in vertical proximity (within 1.5 spaces) to the right of the first word found. This is shown in Figure 7-12. It is nearly identical to the AccountNumber rule, so you can begin with a copy.
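The grouping step can be pictured with a short Python sketch. It is not the GroupWordsRight implementation: group_words_right and is_date_value are hypothetical helpers, each word carries assumed left and right edges measured in character widths, proximity is taken to mean the gap between neighboring words on one line, and the date check accepts only the "Month day year" layout.

```python
from datetime import datetime
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, left edge, right edge) in character widths


def group_words_right(words: List[Word], start: int, max_gap: float) -> str:
    """Join the word at `start` with following words while each gap to the
    previous word is at most `max_gap`; stop at the first larger gap."""
    text, _, right_edge = words[start]
    parts = [text]
    for next_text, left, right in words[start + 1:]:
        if left - right_edge > max_gap:
            break
        parts.append(next_text)
        right_edge = right
    return " ".join(parts)


def is_date_value(text: str) -> bool:
    """Assumed date check for the 'Month day year' layout only."""
    try:
        datetime.strptime(text, "%B %d %Y")
        return True
    except ValueError:
        return False


line = [("Date:", 0, 5), ("July", 6, 10), ("4", 11, 12), ("2015", 13, 17), ("Page", 25, 29)]

single = line[1][0]                         # "July" on its own is not a date
grouped = group_words_right(line, 1, 1.5)   # "July 4 2015"
```

The word "Page" is left out because its gap to "2015" is much wider than the 1.5 limit, so only the three date words are grouped and tested.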
Figure 7-12 The Date rule bound to the Date field
Balance is similar. However, because totals on these types of documents tend to float around, depending on the activity of the account, do the keyword searches first. If the search fails, use the zone from the first encounter. As before, you can start by copying the AccountNumber rule and modifying it to look at a different keyword file and check for a different data type.
Figure 7-13 shows the Balance field rule. Notice that the PopulateZNField is the third function on this field, not the first.
Figure 7-13 Configuring and binding the Balance rule
We also populate the Name field before it goes to a data entry operator, but we add that function in another ruleset.