

3.5 New Application 3: Hand Printed Digit Recognition

3.5.3 Comparison with Other Techniques

In general, building a Gaussian RBF network for a given learning task involves (1) determining the total number of Gaussian basis functions to use for each output class and for the entire system, (2) locating the Gaussian basis function centers, (3) computing the cluster variance for each Gaussian basis function, and (4) solving for the weight coefficients and bias in the summation term. One can implement a 2-way pattern classifier on input vectors x as a Gaussian RBF network by defining an appropriate output threshold that separates the two pattern classes.
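The network output and the thresholding rule above can be made concrete with a small sketch. It assumes the common parameterization G_i(x) = exp(-||x - c_i||² / (2σ_i²)) for a spherical Gaussian and an output threshold of 0; the function names are hypothetical, and the centers, variances, weights and bias are hand-picked rather than computed by steps (1) to (4).

```python
import math

def gaussian(x, center, variance):
    """Spherical Gaussian basis function exp(-||x - c||^2 / (2 * variance))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * variance))

def rbf_output(x, centers, variances, weights, bias):
    """Weighted sum of Gaussian basis function responses plus a bias term."""
    return bias + sum(
        w * gaussian(x, c, v) for c, v, w in zip(centers, variances, weights)
    )

def classify(x, centers, variances, weights, bias, threshold=0.0):
    """2-way classifier: True (in class) if the network output exceeds the threshold."""
    return rbf_output(x, centers, variances, weights, bias) > threshold

# Toy usage: one positive kernel at the origin, one negative kernel at (3, 3).
centers = [(0.0, 0.0), (3.0, 3.0)]
variances = [1.0, 1.0]
weights = [1.0, -1.0]
bias = 0.0
print(classify((0.1, -0.2), centers, variances, weights, bias))  # True
print(classify((2.9, 3.1), centers, variances, weights, bias))   # False
```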

In this first system, we implement each individual digit recognizer as a spherical Gaussian RBF network, trained with a classical RBF algorithm. Given a specified number of Gaussian basis functions for each digit class, the algorithm separately computes the Gaussian centers and variances for each of the 10 digit classes to form the system's RBF kernels. The algorithm then solves for an optimal set of weight parameters between the RBF kernels and each output node to perform the desired digit recognition task. Our actual training process constructs all 10 digit recognizers in parallel, so the same Gaussian basis functions can be re-used among the 10 digit recognizers. To avoid overfitting the available training data with an overly complex RBF classifier connected to every Gaussian kernel, we use a "boot-strap"-like operation that selectively connects each recognizer's output node to only a "relevant" subset of basis functions. The idea is similar to how we choose relevant "near-miss" clusters for each individual digit recognizer in the original system. The full training procedure proceeds as follows:

1. The first training task is to determine an appropriate number of Gaussian kernels for each digit class. This information is needed to initialize our clustering procedure for computing Gaussian RBF kernels. Because the support vector algorithm in the second system automatically computes an "optimal" number of RBF kernels for each digit class, we simply use the same figures from the second system to initialize our clustering procedure (see Table 3.3).

2. Our next task is to actually compute the desired number of Gaussian kernels for each digit class. We do this by separately performing classical k-means clustering on each digit class in the USPS training database. Each clustering operation returns a set of Gaussian centroids and their respective variances for the given digit class. Together, the Gaussian clusters from all 10 digit classes form the system's RBF kernels.

Digit Class          0    1    2    3    4    5    6    7    8    9
Number of Kernels  172   77  217  179  211  231  147  133  194  166

Table 3.3: Number of Gaussian kernels in each digit class used for initializing the classical RBF digit recognition system. These are the numbers of distinct example patterns from each class that the second system chooses as support vectors.

3. For each single-digit recognizer, we build an initial RBF network using only Gaussian kernels from its target class. We then separately collect all the false positive mistakes each initial digit recognizer makes on the USPS training database.

4. In the final training step, we augment each initial digit recognizer with additional Gaussian kernels from outside its target class to help reduce misclassification errors.

We determine which Gaussian kernels are "relevant" for each recognizer as follows: for each false positive mistake the initial recognizer makes during the previous step, we look up the misclassified pattern's actual digit class and include the nearest Gaussian kernel from that class in the "relevant" set. The final RBF network for each single-digit recognizer thus contains every Gaussian kernel from its target class and several "relevant" kernels from the other 9 digit classes. Because our final digit recognizers have fewer weight parameters than a naive system that fully connects all 10 recognizers to every Gaussian kernel, we expect our system to generalize better on new data.
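Steps 2 through 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the thesis's code: the per-cluster variance estimate (mean squared distance to the centroid divided by the input dimension), the k-means initialization and iteration count, and all function names are our own choices. Training the initial recognizers' output weights and collecting their false positives (step 3) are not shown; the false positives are passed in directly.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignment of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

def class_kernels(points, k, seed=0):
    """Step 2: Gaussian kernels (center, variance) for one digit class."""
    centroids, assign = kmeans(points, k, seed=seed)
    dim = len(points[0])
    kernels = []
    for j, c in enumerate(centroids):
        members = [p for p, a in zip(points, assign) if a == j]
        # Assumed spherical variance estimate: mean squared distance of the
        # members to their centroid, divided by the input dimension.
        var = sum(sq_dist(p, c) for p in members) / (len(members) * dim) if members else 1.0
        kernels.append((c, max(var, 1e-6)))  # floor avoids zero-width kernels
    return kernels

def relevant_kernels(target, kernels_by_class, false_positives):
    """Step 4: (class, kernel index) pairs to add to the `target` recognizer.

    false_positives: (pattern, true_class) pairs wrongly accepted by the
    initial recognizer for `target`. For each, the kernel of the pattern's
    true class nearest to the pattern joins the "relevant" set.
    """
    relevant = set()
    for x, true_class in false_positives:
        if true_class == target:
            continue  # not a false positive for this recognizer
        centers = [c for c, _ in kernels_by_class[true_class]]
        relevant.add((true_class, nearest(x, centers)))
    return relevant

# Toy data: two "digit classes" in 2-D, two kernels per class.
data = {
    0: [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)],
    1: [(2.0, 2.0), (2.1, 1.9), (8.0, 0.0), (8.2, 0.1)],
}
kernels_by_class = {c: class_kernels(pts, k=2) for c, pts in data.items()}
# Hypothetical false positive: a class-1 pattern accepted by the class-0 recognizer.
extra = relevant_kernels(0, kernels_by_class, [((2.05, 1.95), 1)])
print(len(kernels_by_class[1]))  # 2
print(extra)
```

Using a set means a kernel triggered by many false positives is connected only once, which keeps the number of added weights small, in line with the generalization argument above.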

Support Vector Gaussian RBFs

In the classical RBF system, we used a clustering technique that computes Gaussian kernels irrespective of the exact recognition task to be solved. One can view the clustering operation as building a separate distribution-based model for each digit class using spherical Gaussian clusters. The RBF digit recognizers classify new digit patterns by determining how "similar" they are to each of the 10 digit manifolds, based on distance measurements to the Gaussian kernels. In this second system, we build a similar Gaussian RBF-based 10-class digit recognizer using a different initialization technique, called the support vector algorithm [26], that concentrates Gaussian kernels at feature space locations critical for the recognition task at hand. The support vector algorithm is a general procedure that sieves through example databases for useful data subsets relevant to a given learning task. The algorithm works for many different learning machine architectures, and the resulting data subsets (i.e. the support vector sets) for different architectures are often almost identical. Interestingly, for RBF networks, the support vector sets also serve well as locations for Gaussian centers. We shall only briefly describe the support vector algorithm, with particular emphasis on its role as a mechanism for defining and locating Gaussian kernels in RBF networks. The interested reader should refer to the following papers for further details: [102] [14] [26].

Digit Recognizer     0    1    2    3    4    5    6    7    8    9
# Support Vectors  274  104  377  361  334  388  236  235  342  263

Table 3.4: Number of support vectors for each digit recognizer. Notice that for each digit recognizer, the support vector set contains both positive and negative example patterns, i.e. patterns from within and outside the target class. The same digit pattern can be a support vector for two or more recognizers. Table 3.3 shows the number of distinct patterns from each digit class selected as support vectors.
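Both systems end up as 10 single-digit RBF recognizers, and a new pattern must be assigned to one digit. The sketch below assumes a winner-take-all rule (pick the digit whose recognizer output is largest); the text does not spell out the combination rule, so treat this, the Gaussian parameterization exp(-||x - c||² / (2σ²)), and the function names as assumptions for illustration.

```python
import math

def gaussian(x, center, variance):
    """Spherical Gaussian response exp(-||x - c||^2 / (2 * variance))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, center)) / (2.0 * variance))

def make_recognizer(kernels, weights, bias):
    """Single-digit RBF recognizer f(x) = bias + sum_i w_i * G_i(x).

    kernels is a list of (center, variance) pairs.
    """
    def f(x):
        return bias + sum(w * gaussian(x, c, v) for (c, v), w in zip(kernels, weights))
    return f

def classify_digit(x, recognizers):
    """Winner-take-all: return the digit whose recognizer responds most strongly."""
    return max(recognizers, key=lambda d: recognizers[d](x))

# Toy setup: three "digits", each with a single kernel on the horizontal axis.
recognizers = {d: make_recognizer([((float(d), 0.0), 0.5)], [1.0], 0.0) for d in range(3)}
print(classify_digit((1.9, 0.1), recognizers))  # 2
```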

The support vector algorithm is based in part on the idea of structural risk minimization [102], whose motivation can be summarized as follows: in example-based function-approximation learning, the goal is to synthesize an approximation function that (1) maps input examples onto their respective output values, and (2) reasonably predicts output values at input locations where no examples are available. This second property is commonly known as the learner's generalization ability. Together, one can quantify the above two constraints in terms of a risk measure that depends on the number of training examples and the VC-dimension [100] [101] [1] (i.e. complexity) of the approximation function class.
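For concreteness, one standard form of such a risk measure is Vapnik's bound for classification with 0/1 loss. The thesis does not state which form it has in mind, so the symbols below are our notation rather than the author's: with probability at least 1 − η, for l training examples and a function class of VC-dimension h, the expected risk R of a learned function α satisfies

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  \;+\; \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

where R_emp is the error on the training examples (constraint 1 above). The square-root term grows with h and shrinks as l grows, which is why, with limited data, lowering the training error by using a more complex function class can still raise the bound on the risk.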

We refer the reader to [80] for a more detailed and mathematical treatment of structural risk minimization and function-approximation learning.

When available training data is limited, one must constrain the learning machine's structural complexity in order to minimize risk and generalize reasonably. Structural risk minimization chooses the function of "optimal" complexity from an approximation function class so that the resulting risk is minimal. The support vector algorithm essentially performs structural risk minimization on an approximation function class whose structure is a set of hyperplanes. For spherical Gaussian RBF networks, the algorithm minimizes risk by determining the number of Gaussian kernels that leads to best generalization. In our current RBF support vector algorithm formulation, we deal with a structure in which all Gaussian kernels must have the same fixed, user-specified variance.
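The link between a trained support vector classifier and a Gaussian RBF network can be shown directly. Assuming the kernel parameterization exp(-||x - x_i||² / (2σ²)) with one fixed variance σ², the standard support vector decision rule sign(Σ_i α_i y_i exp(-||x - x_i||² / (2σ²)) + b) is an RBF network whose centers are exactly the training patterns with nonzero dual coefficient α_i. The sketch assumes the α_i and b were already produced by some SVM solver (not shown); the α values in the usage lines are invented, and the function names are hypothetical.

```python
import math

def gaussian(x, center, variance):
    """Spherical Gaussian kernel exp(-||x - c||^2 / (2 * variance))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, center)) / (2.0 * variance))

def support_vector_rbf(train_x, train_y, alphas, bias, variance):
    """Read off an RBF network from a trained Gaussian-kernel SVM.

    train_y holds +1 (target digit) or -1 (the other 9 digits); alphas are the
    dual coefficients from an SVM solver, which is not shown here. Only
    patterns with alpha > 0, the support vectors, become Gaussian centers,
    each weighted by alpha * y; every kernel shares one fixed variance.
    """
    centers = [x for x, a in zip(train_x, alphas) if a > 0]
    weights = [a * y for y, a in zip(train_y, alphas) if a > 0]

    def decide(x):
        """True if x is accepted as the target digit."""
        score = bias + sum(w * gaussian(x, c, variance) for c, w in zip(centers, weights))
        return score > 0

    return centers, weights, decide

# Toy data; the alphas below are made up for illustration, not solver output.
train_x = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
train_y = [+1, +1, -1, -1]
alphas = [0.0, 0.8, 0.8, 0.0]
centers, weights, decide = support_vector_rbf(train_x, train_y, alphas, 0.0, 1.0)
print(centers)                                 # [(1.0, 0.0), (3.0, 0.0)]
print(decide((0.5, 0.0)), decide((3.5, 0.0)))  # True False
```

Patterns with α_i = 0 drop out entirely, so the number of Gaussian kernels is a by-product of training rather than a parameter the user sets; only the shared variance remains to be chosen.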

We use the support vector algorithm to construct 10 RBF-based single-digit recognizers with fixed Gaussian variances of σ² = 38.4, each trained to separate a given digit from the other 9. We experimented with several σ² values and chose the setting with the best recognition result on the USPS test database. For each single-digit recognizer, the support vector algorithm selects a set of positive and negative example digit patterns from the USPS training database as Gaussian kernel centers. Table 3.4 shows the number of support vectors selected for each recognizer. Notice that the same digit pattern can be chosen as a support vector for two or more digit recognizers. In Table 3.3, we show the number of distinct patterns from each digit class that have been selected as support vectors. We use these figures in the first, classical RBF system as an appropriate number of Gaussian kernels for each digit class.

Classification Error Rate

USPS Database              Original System   Classical RBF   Support Vector RBF
Training (7291 patterns)   0.33%             1.73%           0.01%
Test (2007 patterns)       5.33%             6.73%           4.88%

Table 3.5: 10-class digit recognition error rates for three different system architectures. The first system is based on the pattern identification framework within our proposed object and pattern class detection approach. The other two are the Gaussian RBF-based systems we trained, one with a classical RBF algorithm and the second with the support vector algorithm. The test results show that our proposed pattern identification framework compares reasonably well against classical digit recognition architectures (a 5.33% test error rate, between the 6.73% of the classical RBF system and the 4.88% of the support vector RBF system), suggesting that it is general enough to model and capture pattern variations even in problem domains that are essentially pattern recognition in spirit.
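Because one pattern can be a support vector for several recognizers, the per-class counts in Table 3.3 correspond to the union of all 10 support vector sets grouped by true class, not to a sum over Table 3.4 (the two tables total 1,727 and 2,914 respectively). The sketch below shows that counting on a toy example; the function and variable names are hypothetical.

```python
from collections import Counter

def distinct_sv_counts(sv_indices_per_recognizer, labels):
    """Distinct training patterns per digit class that are support vectors
    for at least one recognizer.

    sv_indices_per_recognizer: dict digit -> training-set indices of that
        recognizer's support vectors (positive and negative examples).
    labels: labels[i] is the true digit class of training pattern i.
    """
    union = set()
    for indices in sv_indices_per_recognizer.values():
        union.update(indices)  # a pattern shared by several recognizers counts once
    return Counter(labels[i] for i in union)

# Toy example: 5 training patterns, 3 recognizers with overlapping SV sets.
labels = [0, 0, 1, 1, 2]
sv_sets = {0: [0, 2], 1: [2, 3], 2: [1, 4]}
per_recognizer = {d: len(s) for d, s in sv_sets.items()}  # analogue of Table 3.4
print(per_recognizer)                              # {0: 2, 1: 2, 2: 2}
print(dict(distinct_sv_counts(sv_sets, labels)))   # analogue of Table 3.3
```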

In document Learning and Example Selection for Object and Pattern Detection (pages 120-124)