1. Choosing an appropriate feature space for representing and detecting faces
3.5 New Application 3: Hand Printed Digit Recognition
3.5.4 Results
Classification Error Rate

USPS Database              Original   Classical RBF   Support Vector RBF
Training (7291 patterns)   0.33%      1.73%           0.01%
Test (2007 patterns)       5.33%      6.73%           4.88%
Table 3.5: 10-class digit recognition error rates for three different system architectures. The first system is based on the pattern identification framework within our proposed object and pattern class detection approach. The other two are the Gaussian RBF-based systems we trained, one with a classical RBF algorithm and the other with the support vector algorithm. The test results show that our proposed pattern identification framework compares reasonably well against classical digit recognition architectures, suggesting that it is general enough to model and capture pattern variations even in problem domains that are essentially pattern recognition in spirit.
the other 9. We experimented with several σ² values and chose the setting with the best recognition result on the USPS test database. For each single-digit recognizer, the support vector algorithm selects a set of positive and negative example digit patterns from the USPS training database as Gaussian kernel centers. Table 3.4 shows the number of support vectors selected for each recognizer. Notice that the same digit pattern can be chosen as a support vector for two or more digit recognizers. In Table 3.3, we show the number of distinct patterns from each digit class that have been selected as support vectors. We use these figures in the first classical RBF system as an appropriate number of Gaussian kernels for each digit class.
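The difference between the per-recognizer counts and the per-class counts of distinct patterns can be illustrated with a minimal Python sketch. The function name, the index sets and the labels below are hypothetical toy values chosen for illustration; they are not the figures reported in Tables 3.3 and 3.4.

```python
from collections import Counter

def distinct_support_vectors_per_class(support_sets, labels):
    """Count distinct training patterns, per digit class, used by any recognizer.

    support_sets: dict mapping a recognizer's digit to the set of training-pattern
                  indices that recognizer selected as support vectors.
    labels:       dict mapping a training-pattern index to its true digit class.
    """
    # The union keeps one copy of any pattern shared by two or more recognizers.
    distinct = set().union(*support_sets.values())
    return Counter(labels[i] for i in distinct)

# Hypothetical toy data: three recognizers, six training patterns.
labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
support_sets = {
    0: {0, 2, 4},  # "0 versus the rest"
    1: {0, 2, 3},  # "1 versus the rest" (shares patterns 0 and 2)
    2: {2, 4, 5},  # "2 versus the rest" (shares patterns 2 and 4)
}

# Per-recognizer counts (the kind of figure a table like 3.4 reports).
per_recognizer = {digit: len(s) for digit, s in support_sets.items()}
# Per-class counts of distinct patterns (the kind of figure a table like 3.3 reports).
per_class = distinct_support_vectors_per_class(support_sets, labels)
```

In this toy case the recognizers select nine support vectors in total, but only five distinct patterns, which is why the two kinds of count need not agree.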
Chapter 4
Active Example Selection for Function Approximation Learning
One key feature in our proposed object and pattern detection approach is the "boot-strap" idea of sieving through extremely large training data sets for useful examples relevant to the learning problem. We have seen in our face and eye detection scenarios that it can be very difficult to manually obtain a small and representative sample of "non-face" and "non-eye" patterns as training examples. Without a reasonable example selection strategy, the negative example sets in both these scenarios can grow hopelessly large, making the learning problems intractable.
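The "boot-strap" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the data are one-dimensional toy values, a 1-nearest-neighbour rule stands in for the actual detector, and the names nearest_label and bootstrap_negatives are hypothetical. The point is the loop structure: train on a small negative set, scan the large pool for patterns the current detector wrongly accepts, add some of those, and repeat.

```python
import random

def nearest_label(x, positives, negatives):
    """1-nearest-neighbour stand-in for the detector: True means 'target pattern'."""
    d_pos = min(abs(x - p) for p in positives)
    d_neg = min(abs(x - n) for n in negatives)
    return d_pos < d_neg

def bootstrap_negatives(positives, negative_pool, seed_negatives, batch=5):
    """Grow the negative set using only pool patterns the current detector gets wrong."""
    negatives = list(seed_negatives)
    while True:
        # "Retraining" the 1-NN stand-in just means using the updated example lists.
        false_positives = [x for x in negative_pool
                           if nearest_label(x, positives, negatives)]
        if not false_positives:
            return negatives
        # Every added pattern is at distance 0 from itself afterwards, so it can
        # never be a false positive again and the loop must terminate.
        negatives.extend(false_positives[:batch])

rng = random.Random(0)
# Toy target class: points with |x| < 1. Everything else is a "non-target" pattern.
positives = [rng.uniform(-1.0, 1.0) for _ in range(20)]
pool = []
while len(pool) < 2000:
    x = rng.uniform(-10.0, 10.0)
    if abs(x) >= 1.0:
        pool.append(x)

negatives = bootstrap_negatives(positives, pool, seed_negatives=pool[:2])
print(len(negatives), "negatives kept out of a pool of", len(pool))
```

The design choice worth noting is that the pool is never used wholesale: only misclassified patterns enter the training set, so the negative set stays small even when the pool is very large.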
In this chapter, we take a more formal look at the problem of selecting high utility examples for training pattern detection systems. The example selection problem falls under a newly emerging general area of research, called active learning, that investigates how learners can pose intelligent queries to teachers under various learning scenarios, to achieve "better" learning results. Active learning differs from traditional example-based learning paradigms in the following way: rather than passively accepting training examples that randomly describe a target concept, an active learner uses information derived from its current state and prior knowledge about the target concept to intelligently gather useful examples from specific input space locations for further training. By carefully generating intelligent queries instead of performing random sampling, one can expect active learning techniques to have faster learning rates and better approximation results than traditional example-based learning algorithms.
Our main focus in this chapter is on active example selection strategies for a function approximation based learning framework. Specifically, we address the following three questions:
1. Given a function approximation based learning task and some prior information about the target function, are there principled strategies for selecting useful training data in some "optimal" fashion?
2. Assuming such principled data selection strategies do exist, do these active strategies require fewer examples than classical learning techniques to approximate target functions to the same degree of accuracy?
3. Can one directly apply these active example selection strategies to real-world function approximation learning tasks like our pattern detection scenarios, or easily adapt them into more feasible forms without losing too much of their original flavor?
We begin by proposing an active example selection formulation for function approximation learning to show that one can indeed select high utility examples for a given task in a principled and "optimal" fashion. While the formulation we propose is computationally intractable in its original form for a wide range of approximation function classes, we see it as a possible benchmark for evaluating other active example selection schemes. We next show how the general formulation can be used to derive precise data selection algorithms for three specific approximation function classes: (1) unit step functions, (2) polynomial approximators and (3) Gaussian radial basis function networks. For all three function classes, we provide either theoretical or empirical results suggesting that the active strategy learns the target function with fewer data examples than random sampling. Finally, we consider a reduced version of the original active learning formulation that essentially hunts for new data where approximation "error bars" are high. We show how such a scheme, with minor modifications, leads to the "boot-strap" example selection strategy we have adopted in our object and pattern class detection approach. Although the "boot-strap" strategy loses some of the original active learning flavor and may thus be "sub-optimal" in its choice of new examples, we show empirically that it still outperforms random sampling in training a frontal face detection system, and is therefore still an effective means of dealing with unmanageably large data sets to make learning tasks tractable.
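To give a feel for why an active strategy can need fewer examples than random sampling, here is a minimal Python sketch for the simplest of the three function classes, a unit step function on [0, 1]. The midpoint-query rule used here is the familiar binary-search strategy, chosen for illustration; it is not claimed to be the exact algorithm derived in this chapter. The threshold value, the number of queries and all function names are assumptions.

```python
import random

def make_step(a):
    """Unit step target on [0, 1]: 0 below the threshold a, 1 at or above it."""
    return lambda x: 1 if x >= a else 0

def uncertainty_after(queries, oracle):
    """Width of the interval that must still contain the threshold."""
    lo, hi = 0.0, 1.0
    for x in queries:
        if oracle(x) == 0:
            lo = max(lo, x)   # threshold lies above x
        else:
            hi = min(hi, x)   # threshold lies at or below x
    return hi - lo

def active_queries(oracle, n):
    """Query the midpoint of the current uncertainty interval n times."""
    lo, hi = 0.0, 1.0
    queries = []
    for _ in range(n):
        x = (lo + hi) / 2
        queries.append(x)
        if oracle(x) == 0:
            lo = x
        else:
            hi = x
    return queries

oracle = make_step(0.3141)   # hypothetical threshold, unknown to the learner
n = 10

# Active learner: each query halves the remaining uncertainty.
active_width = uncertainty_after(active_queries(oracle, n), oracle)

# Passive learner: the same number of examples, drawn uniformly at random.
rng = random.Random(0)
passive_width = uncertainty_after([rng.random() for _ in range(n)], oracle)

print("active:", active_width, "passive:", passive_width)
```

With n midpoint queries the remaining uncertainty is exactly 2^-n, whereas for uniform random sampling it shrinks only on the order of 1/n on average.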