Learning and Example Selection for Object and Pattern Detection

While there have been some successful object and pattern detection systems in the past, most such systems only deal with specific rigid objects or patterns that can be accurately described by fixed geometric models or image templates. Knowledge about the desired task is encoded in a set of examples provided by the "programmer".

Problem Definition

Detecting Spatially Well-Defined Pattern Classes

Ideally, an object and pattern detection approach should be able to deal with the full spectrum of non-rigid, highly articulated and arbitrarily shaped objects, as well as highly varied classes of objects and patterns in different contextual settings. Instead, we will investigate a reduced version of the general object and pattern detection problem, which deals only with image patterns whose spatial boundaries can be well estimated a priori.

Formulation

Goals

Pattern Detection, Recognition and Classification

Pattern Detection and Recognition

Here, the task is to determine the identity of a person from an input image of an isolated human face. Recall that the face recognition problem is identifying a person from an input face image.

Pattern Detection and Recognition as Classification Problems

The input class $X$ is the set of all suitably segmented human face images, and the set of output class labels is $W = \{\text{Person}_1, \ldots, \text{Person}_N, \text{Unknown}\}$. Each person's face recognizer identifies all face images of a given person from an input domain of all suitably segmented human face images.

Difficulty

The set of output class labels is $W = \{\text{Face}, \text{Non-Face}\}$, and the face detector is simply a classifier that performs the following mapping: $F(x) = \text{Face}$ if $x \in X$ is a face window pattern, and $F(x) = \text{Non-Face}$ otherwise. Therefore, it can be represented as a 2-class classification problem whose input class $X$ is the set of all properly segmented face images and whose output class labels are $W = \{\text{KnownPerson}, \text{UnknownPerson}\}$.

Previous Work in Recognizing and Detecting Spatially Well-Defined Patterns

View-based Correlation Templates
Sub-Space Methods
Deformable Templates
Image Feature Invariants

A closely related extension of view-based correlation proposals is the linear combination approach for modeling object appearances [97]. As with template-based and linear combination approaches, the technique collects a large set of views for a target object or class of patterns sampled under all the different conditions we wish to consider.

Example-based Learning for Object and Pattern Detection

Example-based Learning and Function Approximation
Pattern Classification as a Function Approximation Problem
Exploiting Prior Knowledge in Learning
Selecting Useful Examples

Most learning researchers today would agree that the availability of training examples is perhaps the most critical limitation in an example-based learning problem. In example-based learning, how well a learner ends up doing depends largely on the quality of the training examples it receives.

Thesis Outline and Contributions

Finally, we show how simplifying the original formulation leads to practical example selection strategies like the "boot-strap" paradigm that our approach uses for training object and pattern detectors. We describe the construction of a generic human face detection system based on our proposed object and pattern detection technique.

The Face Detection Problem

Motivation

Application-wise, face detection has direct relevance to the face recognition problem because the first important step of a fully automatic human face recognizer is usually one of identifying and detecting faces in an unknown image. From the point of view of this thesis, we are interested in face detection because faces constitute a natural and challenging class of spatially well-defined image patterns on which to demonstrate and test our object detection methodology.

Difficulty

A successful methodology for finding faces should generalize well to other spatially well-defined pattern and feature detection problems.

Differences in Facial Appearance and Expression. Although most faces are similarly structured with the same facial features arranged in roughly the same spatial

Presence or Absence of Common Structural Features. Face detection is also made difficult because certain common but significant features, such as glasses or a

Approach and System Overview

Detecting Faces by Local Pattern Matching
The Face Classification Procedure

Each window pattern is classified as "a face" or "not a face" based on a set of local image metrics. Based on a set of "difference" measurements, the trained classifier identifies a new pattern as "a face" or "not a face".

Choosing an appropriate feature space for representing and detecting faces

More Efficient Search Strategies

For the second component, our current implementation uses a very naive search paradigm that fully scans an image for faces across all possible locations and scales. Apart from computational efficiency, our extended search paradigm is an excellent test framework for the window identification procedure.
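The exhaustive search over locations and scales can be sketched as below. This is a minimal illustration under stated assumptions, not the system's implementation: the function name `scan_windows`, the 19-pixel window size, the 1.2 scale step, and the nearest-neighbour resampling are all choices made for this example.

```python
import numpy as np

def scan_windows(image, window=19, scale_step=1.2, stride=1):
    """Yield (row, col, scale, patch) for every window position at every scale.

    Each patch is resampled to window x window pixels so that a single
    fixed-size classifier can be applied to all of them.
    """
    h, w = image.shape
    scale = 1.0
    while window * scale <= min(h, w):
        size = int(round(window * scale))        # side of the region in the image
        # Nearest-neighbour sample offsets mapping the region onto the window grid.
        idx = (np.arange(window) * size / window).astype(int)
        step = max(1, int(round(stride * scale)))
        for r in range(0, h - size + 1, step):
            for c in range(0, w - size + 1, step):
                patch = image[r + idx[:, None], c + idx[None, :]]
                yield r, c, scale, patch
        scale *= scale_step
```

Each yielded patch would then be passed to the window classification procedure. The cost grows with the number of positions times the number of scales, which is why the search strategy matters for efficiency.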

Defining a Stable Feature Space

Illumination Gradient Correction: This is a normalization operation that subtracts the best-fit brightness plane from the unmasked window pixels. Histogram Equalization: This is another normalization operation that adjusts for several geometry-independent sources of window pattern variation.
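The two normalization steps can be sketched as follows. This is an illustrative sketch rather than the system's code: the function names are hypothetical, `mask` is assumed to be a boolean array that is `True` for the pixels kept (the "unmasked" pixels), and the equalization is a simple rank-based variant.

```python
import numpy as np

def correct_illumination(window, mask):
    """Subtract the least-squares brightness plane fitted to the unmasked pixels."""
    rows, cols = np.nonzero(mask)
    # Fit intensity ~ a*row + b*col + c over the unmasked pixels only.
    A = np.column_stack([rows, cols, np.ones(rows.size)])
    coeffs, *_ = np.linalg.lstsq(A, window[rows, cols], rcond=None)
    rr, cc = np.indices(window.shape)
    plane = coeffs[0] * rr + coeffs[1] * cc + coeffs[2]
    # Masked-out pixels are set to zero.
    return np.where(mask, window - plane, 0.0)

def equalize_histogram(window, mask, levels=256):
    """Spread unmasked pixel values over [0, levels - 1] according to their rank."""
    values = window[mask]
    # Double argsort gives each pixel its rank; ties are broken by pixel order.
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    out = np.zeros(window.shape)
    out[mask] = ranks * (levels - 1) / max(values.size - 1, 1)
    return out
```

Applying `correct_illumination` first and `equalize_histogram` second removes a linear brightness gradient and then fixes the overall contrast, so that windows taken under different lighting become more comparable.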

A Distribution-based Face Model

Identifying the Canonical Face Manifold
Representing the Face Distribution
Modeling the Distribution of "Face" Patterns: Clustering for Positive Prototypes
Modeling the Distribution of "Non-Face" Patterns: Clustering for Negative Prototypes
Summary and Remarks

Our "non-face" data samples are specially selected patterns that lie near the boundaries of the canonical face manifold. These "non-face" clusters separate negative regions around the "face" clusters that do not correspond to face patterns.

Matching Patterns with the Model

A 2-Value Distance Metric
The Normalized Mahalanobis Distance
The First Distance Component: Distance within a Normalized Low-Dimensional Mahalanobis Subspace
The Second Distance Component: Distance from the Low-Dimensional Mahalanobis Subspace
Relationship between our 2-Value Distance and the Mahalanobis Distance
Relationship to Probabilistic Models
Relationship between our 2-Value Distance and some PCA-based Classical Distance Measures

The approach represents the space of all face images (i.e., "face space") as a linear combination of several orthonormal eigenimages. Similarly, one can determine whether or not a given pattern is a face by measuring how well or poorly the eigenimages reconstruct the pattern, i.e., the distance of the pattern (d in Figure 2-11) from the "face space" [69].

The Classifier

A Multi-Layer Perceptron Classifier
Generating and Selecting Training Examples

There are 4150 positive examples of "face" patterns in the database, and the rest are "non-face" patterns. This "boot-strap" strategy reduces the number of "non-face" patterns needed to train a very robust face detector. Start with a small and possibly very unrepresentative set of "non-face" examples in the training database.
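The incremental selection of "non-face" examples can be sketched as a loop. This is a minimal illustration, not the thesis implementation: the names `bootstrap_negatives` and `train_nearest_centroid` and the parameters `initial` and `rounds` are hypothetical, the nearest-centroid classifier is only a stand-in for the multi-layer perceptron, and the sketch assumes that every false detection found on the candidate pool is added in each round.

```python
import numpy as np

def train_nearest_centroid(positives, negatives):
    """Stand-in classifier: label a pattern positive if it is closer to the positive mean."""
    mu_p, mu_n = positives.mean(axis=0), negatives.mean(axis=0)
    def predict(x):
        return np.linalg.norm(x - mu_p, axis=1) < np.linalg.norm(x - mu_n, axis=1)
    return predict

def bootstrap_negatives(positives, negative_pool, train_fn, initial=10, rounds=5, seed=0):
    """Grow the negative training set with the current classifier's false detections."""
    rng = np.random.default_rng(seed)
    # Start with a small, possibly unrepresentative, random set of negatives.
    chosen = set(rng.choice(len(negative_pool), size=initial, replace=False).tolist())
    for _ in range(rounds):
        predict = train_fn(positives, negative_pool[sorted(chosen)])
        # False detections: pool patterns that the current classifier calls positive.
        false_pos = np.nonzero(predict(negative_pool))[0].tolist()
        new = [i for i in false_pos if i not in chosen]
        if not new:
            break                      # no new mistakes to learn from
        chosen.update(new)
    final = train_fn(positives, negative_pool[sorted(chosen)])
    return final, sorted(chosen)
```

Only the patterns the current classifier gets wrong are added, so the negative set stays small relative to the pool while concentrating on examples near the decision boundary.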

Experimental Results

We use this database to obtain a "best case" detection rate for our system on high-quality input patterns. For the first database, our system correctly classifies 96.3% of all face patterns and makes only 3 false detections. Although this database mainly contains face images with simple background patterns, it is still a good test set to evaluate the ability of our system to successfully identify face patterns because this operation does not depend on the background appearance of the image.

Other Systems

Sinha
Moghaddam and Pentland
Rowley, Baluja and Kanade

The study is part of an effort to identify and characterize the key components of our face detection system and, more generally, of our approach to detecting spatially well-defined objects and pattern classes. In this chapter, we generalize our frontal-view human face detection approach to a scheme for detecting spatially well-defined objects and pattern classes in images. We begin by reviewing the main architectural components of our face detection system within the framework of a general object class and pattern detection paradigm.

Overview of the General Object and Pattern Detection Approach

Defining a Suitable Feature Space for Representing Target Patterns
Modeling the Target Pattern Distribution
Learning a Similarity Measure between New Patterns and the Distribution-based Target Model

Given a new detection task, how can a suitable feature space be found for modeling the target pattern distribution? Ideally, we want an exact model that represents the actual target pattern distribution in the chosen feature space. Our approach uses a piecewise smooth Gaussian mixture density estimation technique to represent the empirical target pattern distribution in the chosen feature space.

Analyzing the System's Components

The Existing Human Face Detection System
The Experiments
Performance Statistics and Interpretation

The matching stage computes the same 2-value distance metric that our original system uses; that is, for each Gaussian cluster, D1 is the normalized Mahalanobis distance in a subspace of the 75 largest eigenvectors, and D2 is the Euclidean distance between the test pattern and the subspace. Why does the nearest neighbor classifier have a much lower face detection rate than the other two network-based classifiers when used with our 2-value distance metric? Both the multi-layer perceptron net and single perceptron classifier systems produce better performance statistics with our 2-value distance metric than with the other three distance measures, especially on images from the second test database.
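The two distance components for a single Gaussian cluster can be sketched as below. This is an illustration under assumptions, not the system's code: the function name `two_value_distance` is hypothetical, the cluster is given by a mean and a covariance whose leading eigenvalues are strictly positive, and the particular normalization of D1 (adding the dimension and log-determinant terms of the Gaussian log-density) is an assumed form.

```python
import numpy as np

def two_value_distance(x, mean, cov, k=75):
    """Return (D1, D2) between pattern x and one Gaussian cluster.

    D1: normalized Mahalanobis distance inside the subspace spanned by the
        k eigenvectors of cov with the largest eigenvalues.
    D2: Euclidean distance from x to that subspace.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam, E = eigvals[order], eigvecs[:, order]
    diff = x - mean
    proj = E.T @ diff                            # coordinates inside the subspace
    d1 = 0.5 * (lam.size * np.log(2 * np.pi)
                + np.sum(np.log(lam))
                + np.sum(proj ** 2 / lam))
    d2 = np.linalg.norm(diff - E @ proj)         # residual outside the subspace
    return d1, d2
```

A classifier would receive one (D1, D2) pair per cluster as its input vector. Keeping the two components separate lets the classifier weigh the in-subspace and out-of-subspace evidence differently instead of fixing their combination in advance.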

New Application 1: Variable Pose Human Face Detection

Implementation Details
Results

The mask does a reasonable job of eliminating background pixels in both left-rotated and frontal face patterns. The non-frontal face patterns in the database account for pattern variations due to changes in posture. However, we acknowledge that there are more robust methods to eliminate background pixels from non-frontal face patterns.

New Application 2: Human Eye Detection

Implementation Details
Results

The system misses Kirk's right eye and makes a false detection on his stage. The system makes only one false detection in this rather complex scene (lower left edge of the bowl). The system makes a false detection in this image, mistaking a button on the girl's sleeve for an eye.

New Application 3: Hand Printed Digit Recognition

The United States Postal Service Digit Database
Design of the Digit Recognizer
Comparison with Other Techniques
Results

The first training task is to determine an appropriate number of Gaussian kernels for each digit class. Our next task is to actually calculate the desired number of Gaussian kernels for each digit class. We use these indicators in the first classical RBF system as an appropriate number of Gaussian kernels for each digit class.

Background and Approach

Regularization Theory and Function Approximation: A Review
A Bayesian Framework

Unfortunately, an obvious problem with the approach is that both the IMSE and the analytical expression for its decrement (not shown) assume a known objective function. We want a strategy to determine at which input location to sample the next data point, $(\vec{x}_{n+1}, y_{n+1})$, to obtain the "best possible" Bayes optimal approximation of the unknown target function $g$ with our concept class $F$. Define what we mean by the "best possible" Bayes optimal approximation of an unknown objective function.

Formalize mathematically the task of determining where in input space to sample the next data point. We express the above mentioned optimality criterion

The Active Learning Formulation

An Optimality Criterion for Learning an Unknown Target Function
Evaluating a Solution to an Unknown Target: The Expected Integrated Squared Difference
Selecting the Next Sample Location
Summary of the Active Learning Procedure

An approximation function, $\hat{g}$, for two sets of data points sampled from two (possibly different) unknown target functions. Bottom row: graphs showing the a-posteriori probability distribution of the unknown target in the approximation function space for the two systems. Consider the same two systems in the top row of Figure 4.2, where $\hat{g}$ is the regularized solution for the two unknown objective functions $g_1$ and $g_2$.

Comparing Sample Complexity

Unit Step Functions
Polynomial Approximators
Gaussian Radial Basis Functions

The approximation function class uses a higher degree polynomial with larger Gaussian variances on its coefficients ($K = 9$ and $\sigma_j = 0.9^{j+1}$) versus ($K = 8$ and $\sigma_j = 0.8^{j+1}$). Here the class of approximation functions is less complex and favors smooth estimates more strongly than the target class. Bottom graph: The target and approximation function classes have slightly different center locations and priors on model parameters.

Active Example Selection and the "Boot-strap" Paradigm

A Simpler Example Utility Measure
Example Selection in a Real Pattern Detection Training Scenario
The "Boot-strap" Paradigm
The "Boot-strap" Paradigm and Epsilon Focusing
Sample Complexity of "Boot-strapping"

To limit the number of "non-face" patterns in our training database, we introduced in Section 2.6.2 a "boot-strap" paradigm that incrementally selects "non-face" patterns that are highly relevant to the learning problem. One can reason about the "boot-strap" paradigm as a variant of the simplified example selection heuristic discussed earlier. This is because we have used "boot-strapping" to select only better "non-face" patterns, while the total number of "face" patterns is unchanged.

Improving Detection Rates by Combining Classifiers

Network Boosting
Maintaining Independence between Classifiers

The first and second classifiers disagree (with probability $2\epsilon(1-\epsilon)$), and the third classifier mislabels the input pattern (with probability $\epsilon$). Draw from the stream the required number of data samples and train the first network classifier. Use the first network classifier along with the "infinite" example stream to generate a second training set as follows: Flip a fair coin.
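The error case above can be turned into a small calculation of the ensemble error. This is a sketch under assumptions: it writes the individual error rate as `eps` (a name chosen here), supposes the three classifiers err independently with that same rate, and adds the complementary error case in which the first two classifiers agree on the wrong label.

```python
def boosted_error(eps):
    """Error rate of a three-classifier ensemble with independent errors of rate eps.

    The ensemble is wrong when the first two classifiers are both wrong, or
    when they disagree and the third classifier is wrong.
    """
    both_wrong = eps * eps
    disagree_third_wrong = 2 * eps * (1 - eps) * eps
    return both_wrong + disagree_third_wrong     # equals 3*eps**2 - 2*eps**3


# Compare individual and ensemble error rates for a few values of eps.
for eps in (0.05, 0.1, 0.3):
    print(f"eps = {eps:.2f}, ensemble error = {boosted_error(eps):.4f}")
```

The ensemble error falls below `eps` whenever `eps` is less than 0.5, but only if the independence assumption holds, which is why maintaining independence between the classifiers matters.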

Building Hierarchical Architectures from Simple Detectors

Combining Sub-Pattern Detection Results with Multi-Layer Perceptron Nets
Handling Sub-Patterns with Variable Position and Orientation
Finding Sets of Sub-Patterns from an Articulate Target

For a highly structured and relatively inarticulate target class such as human faces, there is usually very little variation in the position and orientation of its subpattern components. For an articulated target class, however, there can be considerable variation in the location and orientation of each subpattern component. We believe that one can use the same general techniques developed in this area to also limit the search for the subpattern components of an articulated target class.

Polynomial Approximators

The A-Posteriori Function Class Distribution
The Polynomial EISD Measure
The Total Output Uncertainty Measure
The Polynomial Smoothness Prior

For our polynomial approximation function class, the optimal estimate given $D_n$ has model parameters $\hat{\vec{a}}$ (Equation A.6), since $P(\vec{a} \mid D_n)$ has a global maximum there. Equation A.5 shows that this only depends on the prior on the polynomial function class $F$, the output noise variance $\sigma_s^2$ and the previously sampled input locations $\{x_1, x_2, \ldots, x_n\}$. However, recall from Equation A.5 that for this polynomial function class, the EISD between $g$ and its estimate $\hat{g}$ depends only on the input $x_i$ values in $D_n$ and not on the observed $y_i$ values.
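The independence from the observed outputs can be illustrated with standard Bayesian linear regression on polynomial features. This is a sketch of the analogous textbook property, not a reproduction of Equations A.5 or A.6: it assumes a zero-mean Gaussian prior with independent coefficients, Gaussian output noise, and the hypothetical names `posterior_covariance` and `posterior_mean`.

```python
import numpy as np

def posterior_covariance(xs, degree, prior_var, noise_var):
    """Posterior covariance of the polynomial coefficients.

    Depends on the prior variances, the noise variance and the sampled
    input locations xs only; no y values enter the computation.
    """
    Phi = np.vander(np.asarray(xs, dtype=float), degree + 1, increasing=True)
    precision = np.diag(1.0 / np.asarray(prior_var, dtype=float)) + Phi.T @ Phi / noise_var
    return np.linalg.inv(precision)

def posterior_mean(xs, ys, degree, prior_var, noise_var):
    """Posterior mean of the coefficients; unlike the covariance, it uses ys."""
    Phi = np.vander(np.asarray(xs, dtype=float), degree + 1, increasing=True)
    cov = posterior_covariance(xs, degree, prior_var, noise_var)
    return cov @ Phi.T @ np.asarray(ys, dtype=float) / noise_var
```

Because the uncertainty about the coefficients is fixed once the input locations are chosen, an uncertainty-based criterion for the next sample location can be evaluated for this function class before any new output value is observed.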

Gaussian Radial Basis Functions

The A-Posteriori Function Class Distribution

This means that we can simply remove the two "constant" terms from the exponent and insert into Equation A.21 the appropriate normalizing constants so $P(\vec{a} \mid D_n)$ becomes a standard probability distribution on $\vec{a}$ (Equation A.24). Thus, the RBF a-posteriori distribution is a multivariate Gaussian centered at $\hat{\vec{a}}$ (Equation A.23) with covariance $\Sigma_n$ (Equation A.22).

The RBF EISD Measure

For our RBF approximation function class, the optimal estimate given $D_n$ has model parameters $\hat{\vec{a}}$ (Equation A.23), since the a-posteriori distribution $P(\vec{a} \mid D_n)$ has a global maximum there. Consider Equation A.22, which depends only on the prior on the RBF function class $F$, the $K$ fixed Gaussian RBF kernels $\{G_i(\cdot) \mid i = 1, \ldots, K\}$, the output noise variance $\sigma_s^2$ and the previously sampled input locations $\{x_1, x_2, \ldots, x_n\}$. In other words, the previously observed $y$ data values in $D_n$ do not affect the EISD measure (Equation A.27) for this Gaussian RBF concept class.

The RBF Total Output Uncertainty Measure

Recognition and localization of overlapping parts from sparse data in two and three dimensions.