Design of the Digit Recognizer - New Application 3: Hand Printed Digit Recognition


3.5 New Application 3: Hand Printed Digit Recognition

3.5.2 Design of the Digit Recognizer

The digit recognition problem involves constructing a 10-class pattern classifier for identifying new input patterns as one of the 10 digit classes. We have argued in Chapter 1 that one can implement an N-way pattern classifier as N single-class pattern recognizers operating in parallel, with a special arbitration stage to resolve class label conflicts. In this application, we adopt a similar design approach by implementing our 10-class digit recognizer as 10 single-class digit recognizers. Each single-class digit recognizer distinguishes one digit class from the other 9 digit classes. Our arbitration scheme assumes that all input patterns belong to exactly one of the 10 digit classes, and simply returns the class label for the single digit recognizer with the strongest response.

Figure 3-10: Some sample handwritten digit patterns from the United States Postal Service's training database.
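The arbitration rule described above can be illustrated with a minimal sketch. The function name `arbitrate` and the example scores are hypothetical, and the sketch assumes each recognizer reports a single numeric response where larger means a stronger match.

```python
def arbitrate(responses):
    """Return the digit whose single-class recognizer responds most strongly.

    `responses` holds one score per recognizer, indexed by digit (0 to 9).
    Ties go to the lowest digit, which is an arbitrary choice of this sketch.
    """
    if len(responses) != 10:
        raise ValueError("expected one response per digit class (10 total)")
    return max(range(10), key=lambda digit: responses[digit])


# Hypothetical recognizer outputs for one input pattern.
scores = [0.05, 0.10, 0.02, 0.91, 0.08, 0.30, 0.01, 0.12, 0.40, 0.07]
label = arbitrate(scores)
```

Because the scheme assumes every input is a digit, there is no "reject" outcome: some label is always returned, even when all responses are weak.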

We now describe how we build the individual single-class digit recognizers in our 10-class digit recognition system. Basically, we treat each single-class digit recognizer as a 2-way pattern classifier, similar in spirit to the pattern identification stage within our proposed object and pattern class detection framework. In this approach, the key design issue is to appropriately model the target pattern distribution of each digit class in a suitable feature space for pattern matching purposes. Because each digit class has a common overall image structure with minor shape variations between individual patterns, one can reasonably assume that the target pattern distribution for each digit class is continuous and smoothly varying in a view-based feature space. Our implementation models each target digit class directly in the original 16×16 pixel image feature space of normalized digit patterns. For our particular task, we do not need to mask away "irrelevant" pixels in our chosen view-based feature space, because all the hand-segmented digit patterns we are dealing with do not contain unwanted background structure.

In the distribution-based modeling stage, each single-digit recognizer approximates its target distribution with patterns from the USPS training database that belong to its target digit class. Because the USPS training database contains, on average, fewer than 1000 pattern samples per digit class in a 256-dimensional view-based feature space, our actual distribution-based model represents each target pattern class with only 4 multi-dimensional Gaussian clusters to avoid over-fitting the available data with too complex a model. So, each single-class digit recognizer in our overall system has a distribution-based model that contains only 4 Gaussian mixture clusters for representing its target digit class.
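As a rough illustration of fitting 4 clusters to one digit class, the sketch below uses plain k-means with a deterministic farthest-first initialization. This is a stand-in chosen for brevity: this section does not say how the clusters are fit, and the sketch estimates centroids only, omitting the covariance of each Gaussian cluster. The function names and the 2-D toy data are hypothetical.

```python
import math


def farthest_first(patterns, k):
    """Pick k spread-out starting centroids (deterministic farthest-first traversal)."""
    centroids = [list(patterns[0])]
    while len(centroids) < k:
        # Choose the pattern whose nearest existing centroid is farthest away.
        next_point = max(patterns, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(next_point))
    return centroids


def fit_clusters(patterns, k=4, iterations=20):
    """Plain k-means: return k centroids for one digit class's training patterns."""
    centroids = farthest_first(patterns, k)
    for _ in range(iterations):
        # Assign every pattern to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in patterns:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its assigned patterns.
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(column) / len(members) for column in zip(*members)]
    return centroids


# Toy 2-D stand-in for the 256-dimensional digit patterns: four tight groups.
toy_patterns = [
    [0, 0], [1, 0], [0, 1],
    [10, 0], [11, 0], [10, 1],
    [0, 10], [1, 10], [0, 11],
    [10, 10], [11, 10], [10, 11],
]
toy_centroids = fit_clusters(toy_patterns, k=4)
```

With fewer than 1000 samples per class in 256 dimensions, keeping `k` small limits the number of parameters that must be estimated, which is the over-fitting concern raised in the text.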

We use a slightly different "boot-strap" procedure to obtain "near-miss" pattern samples, and to synthesize "near-miss" distribution models for the 10 single-class digit recognizers. Recall that in this demonstration, we only have available 7291 positive and negative training examples from the USPS training database to construct each single-digit classifier.

Because we are using the USPS training database as our sole data source, one can expect to find only a very small number of "useful" distractor patterns to approximate the "near-miss" distribution for each digit class. In fact, there may not even be enough "useful" distractor patterns for each digit class to reasonably construct a Gaussian mixture "near-miss" distribution model. One solution is to have each single-digit recognizer treat all its non-target digits in the USPS training database as "useful" distractor patterns. Each single-digit recognizer can then have a "near-miss" distribution model with 36 Gaussian clusters (4 clusters from each of the other 9 classes), obtained directly from the target class models of the other 9 single-digit recognizers. Unfortunately, such an approach leads to a set of very high dimensional learning problems in the final pattern classification stage, where we must now train each digit recognizer to identify its target digit class from input feature vectors of 40 2-value distance measurements (4 target clusters plus 36 "near-miss" clusters).

Figure 3-11: Design overview of our 10-class digit recognizer. We implement our 10-class digit recognizer as 10 single-class digit recognizers, each separating a given digit from the other 9. For each digit class, we construct a distribution-based model with 4 multi-dimensional Gaussian clusters. The MLP classifier for each digit recognizer receives 2-value distances from all 4 model clusters belonging to its target digit class. It also receives 2-value distance measurements from "relevant" clusters in the other 9 digit classes that help model its "near-miss" distribution. When classifying new digit patterns, the arbitration stage returns the class label of the recognizer with the strongest response.

We instead adopt an intermediate approach that also uses additional non-target digit patterns from the USPS training database to help approximate the "near-miss" pattern distribution for each digit class, without indiscriminately using the entire set of non-target patterns. Basically, the idea is to use only those non-target pattern samples near each actual "useful" distractor pattern to help locally approximate the "near-miss" distribution. The intermediate technique for refining each single-digit recognizer works in two steps. In step one, we collect all the false positive mistakes each initial single-digit recognizer (i.e., one whose distribution-based model contains only positive clusters describing the target class distribution) makes on the USPS training digit database. This step is identical to the example selection phase in our original "boot-strap" procedure. In step two, we approximate the local "near-miss" distribution near every false positive example obtained from step one, using only a "relevant" subset of Gaussian clusters from the other 9 target class models.

We determine which Gaussian clusters are "relevant" for the given single-digit recognizer as follows: For each false positive example from step one, we look up its actual digit class and pick the nearest Gaussian cluster from its digit class model to approximate the local "near-miss" distribution. The set of all clusters so chosen approximates the particular recognizer's overall "near-miss" distribution, and the final distribution-based model contains Gaussian clusters that describe both the recognizer's target digit distribution and its "near-miss" distribution.
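The two-step refinement described above can be sketched as follows. All names here are hypothetical, and two details are assumptions of the sketch rather than statements from the text: the recognizer is taken to accept a pattern when its response exceeds a `threshold`, and "nearest" is measured with plain Euclidean distance to the cluster centroid.

```python
import math


def collect_false_positives(recognizer, training_set, target_digit, threshold=0.5):
    """Step one: return (pattern, true_digit) pairs the recognizer wrongly accepts."""
    return [
        (pattern, digit)
        for pattern, digit in training_set
        if digit != target_digit and recognizer(pattern) > threshold
    ]


def select_relevant_clusters(false_positives, class_models):
    """Step two: for each false positive, pick the nearest cluster of its true class.

    `class_models` maps a digit to the list of cluster centroids in its model.
    Returns a sorted list of unique (digit, cluster_index) pairs.
    """
    relevant = set()
    for pattern, true_digit in false_positives:
        centroids = class_models[true_digit]
        nearest = min(range(len(centroids)), key=lambda j: math.dist(pattern, centroids[j]))
        relevant.add((true_digit, nearest))
    return sorted(relevant)


# Toy 2-D example for a hypothetical recognizer of digit 5.
class_models = {3: [[0, 0], [5, 5]], 8: [[10, 0], [10, 10]]}
training_set = [([4, 4], 3), ([6, 5], 3), ([9, 1], 8), ([10, 10], 8), ([5, 6], 5)]
toy_recognizer = lambda pattern: 1.0 if pattern[0] < 9.5 else 0.0  # stand-in response

mistakes = collect_false_positives(toy_recognizer, training_set, target_digit=5)
relevant = select_relevant_clusters(mistakes, class_models)
```

In this toy run, two false positives from class 3 map to the same cluster, so the "near-miss" model gains only two clusters instead of all four available in the other class models.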

In the final pattern identification stage, each single-digit recognizer uses a trained multi-layer perceptron net to identify instances of its target digit class, based on distance feature measurements between the input digit pattern and the recognizer's final distribution-based model. Each input feature vector is an ordered set of 2-value distances between the given test pattern's location and all the recognizer's model centroids in the normalized 16×16 pixel view-based feature space. We train each single-digit recognizer on distance feature vectors with appropriate output class labels from all the digit patterns in the USPS training database.
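To make the layout of the input feature vector concrete, here is a minimal sketch of how the ordered 2-value distances might be assembled. This section does not define the 2-value distance, so `two_value_distance` below is only a placeholder: it pairs a variance-normalized distance (assuming a diagonal covariance) with a plain Euclidean distance. The function names and the toy clusters are hypothetical.

```python
import math


def two_value_distance(pattern, centroid, variances):
    """Placeholder 2-value distance: (variance-normalized distance, Euclidean distance).

    This is an assumed stand-in, not the metric defined elsewhere in the source.
    """
    normalized = math.sqrt(
        sum((p - c) ** 2 / v for p, c, v in zip(pattern, centroid, variances))
    )
    euclidean = math.dist(pattern, centroid)
    return normalized, euclidean


def distance_features(pattern, clusters):
    """Concatenate the 2-value distances to every model cluster, in a fixed order.

    `clusters` is a list of (centroid, variances) pairs: target-class clusters
    first, then the "near-miss" clusters, always in the same order.
    """
    features = []
    for centroid, variances in clusters:
        features.extend(two_value_distance(pattern, centroid, variances))
    return features


# Toy 2-D model with two clusters, so the feature vector has 2 * 2 = 4 entries.
toy_clusters = [([0, 0], [1, 1]), ([3, 4], [4, 4])]
toy_features = distance_features([0, 0], toy_clusters)
```

The cluster order must stay fixed between training and classification, since the multi-layer perceptron learns a separate weight for each position in the vector.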

In document Learning and Example Selection for Object and Pattern Detection (pages 115-120)