While there have been some successful object and pattern detection systems in the past, most such systems only deal with specific rigid objects or patterns that can be accurately described by fixed geometric models or image templates. Knowledge about the desired task is encoded in a set of examples provided by the "programmer".
Problem Definition
Detecting Spatially Well-Defined Pattern Classes
Ideally, an object and pattern detection approach should be able to deal with the full spectrum of non-rigid, highly articulated and arbitrarily shaped objects, as well as highly varied classes of objects and patterns in different contextual settings. Instead, we will investigate a reduced version of the general object and pattern detection problem, which deals only with image patterns whose spatial boundaries can be well estimated a priori.
Formulation
Goals
Pattern Detection, Recognition and Classification
Pattern Detection and Recognition
Here, the task is to determine the identity of a person from an input image of an isolated human face. Recall that the face recognition problem is identifying a person from an input face image.
Pattern Detection and Recognition as Classification Problems
The input class $X$ is the set of all suitably segmented human face images, and the set of output class labels is $W = \{\text{Person}_1, \ldots, \text{Person}_N, \text{Unknown}\}$. Each person's face recognizer identifies all face images of a given person from an input domain of all suitably segmented human face images.
Difficulty
The set of output class labels is $W = \{\text{Face}, \text{Non-Face}\}$, and the face detector is simply a classifier that performs the following mapping: $F(x) = \text{Face}$ if $x \in X$ is a face window model, and. Therefore, it can be represented as a 2-class model classification problem whose input class $X$ is the set of all properly segmented face images and whose output class labels are $W = \{\text{KnownPerson}, \text{UnknownPerson}\}$.
Previous Work in Recognizing and Detecting Spatially Well-Defined Patterns
- View-based Correlation Templates
- Sub-Space Methods
- Deformable Templates
- Image Feature Invariants
A closely related extension of view-based correlation proposals is the linear combination approach for modeling object appearances [97]. As with template-based and linear combination approaches, the technique collects a large set of views for a target object or class of patterns sampled under all the different conditions we wish to consider.
Example-based Learning for Object and Pattern Detection
- Example-based Learning and Function Approximation
- Pattern Classification as a Function Approximation Problem
- Exploiting Prior Knowledge in Learning
- Selecting Useful Examples
Most learning researchers today would agree that the availability of training examples is perhaps the most critical limitation in an example-based learning problem. In example-based learning, how well a learner ends up doing depends largely on the quality of training examples he receives.
Thesis Outline and Contributions
Finally, we show how simplifying the original formulation leads to practical example selection strategies like the "boot-strap" paradigm that our approach uses to train object and pattern detectors. We describe the construction of a generic human face detection system based on our proposed object and pattern detection technique.
The Face Detection Problem
Motivation
Application-wise, face detection has direct relevance to the face recognition problem because the first important step of a fully automatic human face recognizer is usually one of identifying and detecting faces in an unknown image. From the point of view of this thesis, we are interested in face detection because faces constitute a natural and challenging class of spatially well-defined image patterns to demonstrate and test our object detection methodology.
Difficulty
A successful methodology for finding faces should generalize well to other spatially well-defined pattern and feature detection problems.
Differences in Facial Appearance and Expression. Although most faces are similarly structured with the same facial features arranged in roughly the same spatial
Presence or Absence of Common Structural Features. Face detection is also made difficult because certain common but significant features, such as glasses or a
Approach and System Overview
- Detecting Faces by Local Pattern Matching
- The Face Classification Procedure
Each window pattern is classified as "a face" or "not a face" based on a set of local image metrics. Based on a set of "difference" measurements, the trained classifier identifies a new pattern as "a face" or "not a face".
Choosing an appropriate feature space for representing and detecting faces
More Efficient Search Strategies
For the second component, our current implementation uses a very naive search paradigm that fully scans an image for faces across all possible locations and scales. Apart from computational efficiency, our extended search paradigm is an excellent test framework for the window identification procedure.
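The naive search over all locations and scales can be sketched as a simple window enumerator. This is only an illustration: the function name `scan_windows`, the 19-pixel base window, the 1.2 scale step and the 1-pixel stride are assumptions chosen for the example, not values taken from the text above.

```python
def scan_windows(image_w, image_h, base=19, scale_step=1.2, stride=1):
    """Yield (x, y, size) for every square window at every scale.

    All parameter defaults are illustrative assumptions.
    scale_step must be greater than 1, otherwise the loop never ends.
    """
    size = float(base)
    while int(size) <= min(image_w, image_h):
        s = int(size)
        # Slide the window over every position where it fits entirely.
        for y in range(0, image_h - s + 1, stride):
            for x in range(0, image_w - s + 1, stride):
                yield x, y, s
        size *= scale_step  # move on to the next, larger scale
```

Each yielded window would then be handed to the window classification procedure, which is why the cost of an exhaustive scan grows quickly with image size.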
Defining a Stable Feature Space
Illumination Gradient Correction: This is a normalization operation that subtracts the best-fit brightness plane from the unmasked window pixels. Histogram Equalization: This is another normalization operation that adjusts for several geometry-independent sources of window pattern variation.
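The two normalization steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the brightness plane is fitted by ordinary least squares, `mask` is a boolean array that is True for unmasked pixels, and the equalization uses a rank-based mapping that breaks ties arbitrarily, which may differ from the procedure used in the original system.

```python
import numpy as np

def correct_illumination_gradient(window, mask):
    """Subtract a least-squares brightness plane fitted on the unmasked pixels."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Design matrix for the plane a*x + b*y + c over the unmasked pixels.
    A = np.column_stack([xs[mask], ys[mask], np.ones(int(mask.sum()))])
    coeffs, *_ = np.linalg.lstsq(A, window[mask].astype(float), rcond=None)
    plane = coeffs[0] * xs + coeffs[1] * ys + coeffs[2]
    out = window.astype(float)  # astype returns a copy
    out[mask] -= plane[mask]
    return out

def equalize_histogram(window, mask, levels=256):
    """Spread the unmasked intensities evenly over [0, levels - 1] by rank."""
    values = window[mask]
    # Double argsort turns values into ranks 0..n-1 (ties broken by position).
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    out = np.zeros(window.shape, dtype=float)
    out[mask] = ranks * (levels - 1) / max(values.size - 1, 1)
    return out
```

Applying the plane correction first removes smooth lighting gradients, so the equalization step then only has to compensate for overall brightness and contrast differences.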
A Distribution-based Face Model
- Identifying the Canonical Face Manifold
- Representing the Face Distribution
- Modeling the Distribution of "Face" Patterns: Clustering for Positive Prototypes
- Modeling the Distribution of "Non-Face" Patterns: Clustering for Negative Prototypes
- Summary and Remarks
Our "non-face" data samples are specially selected patterns that lie near the boundaries of the canonical face manifold. These "non-face" clusters separate negative regions around the "face" clusters that do not correspond to face patterns.
Matching Patterns with the Model
- A 2-Value Distance Metric
- The Normalized Mahalanobis Distance
- The First Distance Component: Distance within a Normalized Low-Dimensional Mahalanobis Subspace
- The Second Distance Component: Distance from the Low-Dimensional Mahalanobis Subspace
- Relationship between our 2-Value Distance and the Mahalanobis Distance
- Relationship to Probabilistic Models
- Relationship between our 2-Value Distance and some PCA-based Classical Distance Measures
The approach represents the space of all face images (i.e., "face space") as a linear combination of several orthonormal eigenimages. Similarly, one can determine whether or not a given pattern is a face by measuring how well or poorly the eigenimages reconstruct the model, i.e., the distance of the model (d in Figure 2-11) from the "face space" [69].
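The distance from "face space" can be sketched as the norm of the reconstruction residual after projecting a pattern onto the leading eigenimages. The sketch below assumes patterns are flattened into row vectors and uses an SVD to obtain the eigenimages; the function names are hypothetical and are not taken from the cited work.

```python
import numpy as np

def fit_face_space(patterns, num_components):
    """Return the mean pattern and the leading orthonormal eigenimages."""
    mean = patterns.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(patterns - mean, full_matrices=False)
    return mean, vt[:num_components]

def distance_from_face_space(pattern, mean, eigenimages):
    """Norm of what remains after reconstructing the pattern from the eigenimages."""
    centered = pattern - mean
    coeffs = eigenimages @ centered          # projection coefficients
    reconstruction = eigenimages.T @ coeffs  # best approximation in the subspace
    return float(np.linalg.norm(centered - reconstruction))
```

A pattern that the eigenimages reconstruct well yields a small distance; a pattern with large components outside the subspace yields a large one.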
The Classifier
- A Multi-Layer Perceptron Classifier
- Generating and Selecting Training Examples
There are 4150 positive examples of "face" patterns in the database, and the rest are "non-face" patterns. This "boot-strap" strategy reduces the number of "non-face" patterns needed to train a very robust face detector. Start with a small and possibly very unrepresentative set of "non-face" examples in the training database.
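One way to sketch the "boot-strap" loop is shown below. It assumes the usual reading of the strategy: train on the current database, run the detector on windows known to contain no faces, add the false detections as new "non-face" examples, and retrain. The names `bootstrap_negatives`, `train`, `background_windows` and `max_rounds` are hypothetical, and `train` is assumed to return a callable that answers True for "face".

```python
def bootstrap_negatives(train, faces, seed_non_faces, background_windows, max_rounds=10):
    """Grow the non-face set with the current classifier's false detections."""
    non_faces = list(seed_non_faces)
    classifier = train(faces, non_faces)
    selected = set()  # indices of background windows already added
    for _ in range(max_rounds):
        # Background windows contain no faces, so every "face" answer is an error.
        new = [i for i, w in enumerate(background_windows)
               if i not in selected and classifier(w)]
        if not new:
            break  # no remaining false detections on the background set
        selected.update(new)
        non_faces.extend(background_windows[i] for i in new)
        classifier = train(faces, non_faces)
    return classifier, non_faces
```

Only windows that the current classifier gets wrong are ever added, which is how the strategy keeps the "non-face" set small.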
Experimental Results
We use this database to obtain a "best case" detection rate for our system on high-quality input patterns. For the first database, our system correctly classifies 96.3% of all face patterns and makes only 3 false detections. Although this database mainly contains face images with simple background patterns, it is still a good test set to evaluate the ability of our system to successfully identify face patterns because this operation does not depend on the background appearance of the image.
Other Systems
- Sinha
- Moghaddam and Pentland
- Rowley, Baluja and Kanade
The study is part of an effort to identify and characterize the key components of our face detection system, and more generally, our approach to detecting objects and spatially well-defined pattern classes. In this chapter, we generalize our front-view human face detection approach to a scheme for detecting spatially well-defined objects and pattern classes in images. We begin by reviewing the main architectural components of our face detection system within the framework of a general object class and pattern detection paradigm.
Overview of the General Object and Pattern Detection Approach
- Defining a Suitable Feature Space for Representing Target Patterns
- Modeling the Target Pattern Distribution
- Learning a Similarity Measure between New Patterns and the Distribution-based Target Model
Given a new detection task, how can a suitable feature space be found for modeling the target pattern distribution? Ideally, we want an exact model that represents the actual target pattern distribution in the chosen feature space. Our approach uses a piecewise smooth Gaussian mixture density estimation technique to represent the empirical target pattern distribution in the chosen feature space.
Analyzing the System's Components
- The Existing Human Face Detection System
- The Experiments
- Performance Statistics and Interpretation
The matching stage computes the same 2-value distance metric that our original system uses; that is, for each Gaussian cluster, $D_1$ is the normalized Mahalanobis distance in a subspace of the 75 largest eigenvectors, and $D_2$ is the Euclidean distance between the test pattern and the subspace. Why does the nearest neighbor classifier have a much lower face detection rate than the other two network-based classifiers when used with our 2-value distance metric? Both the multi-layer perceptron net and single perceptron classifier systems produce better performance statistics with our 2-value distance metric than with the other three distance measures, especially on images from the second test database.
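The two distance components for a single Gaussian cluster can be sketched as follows. The normalization used for the first component is an assumption: it is taken here to be the negative Gaussian log-density restricted to the top-`k` eigenvector subspace, which may not match the exact definition in the original system. The function name is hypothetical, and the `k` retained eigenvalues are assumed to be strictly positive.

```python
import numpy as np

def two_value_distance(x, mean, cov, k):
    """Return (d1, d2) for pattern x relative to one Gaussian cluster."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    lam, basis = eigvals[order], eigvecs[:, order]
    centered = x - mean
    proj = basis.T @ centered                # coordinates inside the subspace
    mahalanobis_sq = np.sum(proj ** 2 / lam)
    # Assumed normalization: negative log of a k-dimensional Gaussian density.
    d1 = 0.5 * (k * np.log(2 * np.pi) + np.sum(np.log(lam)) + mahalanobis_sq)
    # Euclidean distance between the pattern and its subspace projection.
    d2 = np.linalg.norm(centered - basis @ proj)
    return float(d1), float(d2)
```

With `k = 75` this matches the subspace size quoted above; a classifier would receive one `(d1, d2)` pair per cluster as its input features.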
New Application 1: Variable Pose Human Face Detection
- Implementation Details
- Results
The mask does a reasonable job of eliminating background pixels in both left-rotated and frontal face patterns. The non-frontal face patterns in the database account for pattern variations due to changes in posture. However, we acknowledge that there are more robust methods to eliminate background pixels from non-frontal face patterns.
New Application 2: Human Eye Detection
- Implementation Details
- Results
The system misses Kirk's right eye and makes a false detection on his stage. The system makes only one false detection in this rather complex scene (lower left edge of the bowl). The system makes a false detection in this image, mistaking a button on the girl's sleeve for an eye.
New Application 3: Hand Printed Digit Recognition
- The United States Postal Service Digit Database
- Design of the Digit Recognizer
- Comparison with Other Techniques
- Results
The first training task is to determine an appropriate number of Gaussian kernels for each digit class. Our next task is to actually calculate the desired number of Gaussian kernels for each digit class. We use these indicators in the first classical RBF system as an appropriate number of Gaussian kernels for each digit class.
Background and Approach
- Regularization Theory and Function Approximation: A Review
- A Bayesian Framework
Unfortunately, an obvious problem with the approach is that both the IMSE and the analytical expression for its decrement (not shown) assume a known target function. We want a strategy to determine at which input location to sample the next data point, $(\vec{x}_{n+1}, y_{n+1})$, to obtain the "best" possible Bayes optimal approximation of the unknown target function $g$ with our concept class $\mathcal{F}$. Define what we mean by the "best" possible Bayes optimal approximation of an unknown target function.
Formalize mathematically the task of determining where in input space to sample the next data point. We express the above mentioned optimality criterion
The Active Learning Formulation
- An Optimality Criterion for Learning an Unknown Target Function
- Evaluating a Solution to an Unknown Target: The Expected Integrated Squared Difference
- Selecting the Next Sample Location
- Summary of the Active Learning Procedure
An approximation function, $\hat{g}$, for two sets of data points sampled from two (possibly different) unknown target functions. Bottom row: graphs showing the a-posteriori probability distribution of the unknown target in the approximation function space for the two systems. Consider the same two systems in the top row of Figure 4.2, where $\hat{g}$ is the regularized solution for the two unknown target functions $g_1$ and $g_2$.
Comparing Sample Complexity
- Unit Step Functions
- Polynomial Approximators
- Gaussian Radial Basis Functions
The approximation function class uses a higher degree polynomial with larger Gaussian variances on its coefficients ($K = 9$ and $\sigma_j = 0.9^{j+1}$) versus ($K = 8$ and $\sigma_j = 0.8^{j+1}$). Here the class of approximation functions is less complex and favors smooth estimates more strongly than the target class. Bottom graph: The target and approximation function classes have slightly different center locations and priors on model parameters.
Active Example Selection and the "Boot-strap" Paradigm
- A Simpler Example Utility Measure
- Example Selection in a Real Pattern Detection Training Scenario
- The "Boot-strap" Paradigm
- The "Boot-strap" Paradigm and Epsilon Focusing
- Sample Complexity of "Boot-strapping"
To limit the number of "non-face" patterns in our training database, we introduced in Section 2.6.2 a "boot-strap" paradigm that incrementally selects "non-face" patterns that are highly relevant to the learning problem. One can reason about the "boot-strap" paradigm as a variant of the simplified sample selection heuristic discussed earlier. This is because we have used "boot-strapping" to select only better "non-face" patterns, while the total number of "face" patterns is unchanged.
Improving Detection Rates by Combining Classifiers
- Network Boosting
- Maintaining Independence between Classifiers
The first and second classifiers disagree (with probability $2\epsilon(1-\epsilon)$), and the third classifier mislabels the input pattern (with probability $\epsilon$). Draw from the stream the required number of data samples and train the first network classifier. Use the first network classifier along with the "infinite" example stream to generate a second training set as follows: Flip a fair coin.
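The error-probability argument at the start of this passage can be checked numerically. The sketch assumes three classifiers that each err independently with the same rate `eps` (a symbol assumed for this example), and a combination rule in which the third classifier decides only when the first two disagree; the function name is hypothetical.

```python
def boosted_error(eps):
    """Error rate of the three-classifier combination under independent errors."""
    both_wrong = eps * eps                            # first and second both mislabel
    disagree_third_wrong = 2 * eps * (1 - eps) * eps  # they disagree, third mislabels
    return both_wrong + disagree_third_wrong          # equals 3*eps**2 - 2*eps**3
```

For any `eps` below one half the combined error is smaller than `eps`, which is the motivation for keeping the classifiers' errors as independent as possible.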
Building Hierarchical Architectures from Simple Detectors
- Combining Sub-Pattern Detection Results with Multi-Layer Perceptron Nets
- Handling Sub-Patterns with Variable Position and Orientation
- Finding Sets of Sub-Patterns from an Articulate Target
For a highly structured and relatively inarticulate target class such as human faces, there is usually very little variation in the position and orientation of its subpattern components. However, there can be considerable variation in the location and orientation of each subpattern component. We believe that one can use the same general techniques developed in this area to also limit the search for the subpattern components of an articulated target class.
Polynomial Approximators
- The A-Posteriori Function Class Distribution
- The Polynomial EISD Measure
- The Total Output Uncertainty Measure
- The Polynomial Smoothness Prior
For our polynomial approximation function class, the optimal estimate given $D_n$ has model parameters $\hat{\vec{a}}$ (Equation A.6), since $P(\vec{a} \mid D_n)$ has a global maximum there. Equation A.5 shows that this depends only on the prior on the polynomial function class $\mathcal{F}$, the output noise variance $\sigma_s^2$ and the previously sampled input locations $\{x_1, x_2, \ldots, x_n\}$. However, recall from Equation A.5 that for this polynomial function class, the EISD between $g$ and its estimate $\hat{g}$ depends only on the input $x_i$ values in $D_n$ and not on the observed $y_i$ values.
Gaussian Radial Basis Functions
The A-Posteriori Function Class Distribution
This means that we can simply remove the two "constant" terms from the exponent and insert into Equation A.21 the appropriate normalizing constants so that $P(\vec{a} \mid D_n)$ becomes a standard probability distribution on $\vec{a}$ (Equation A.24). Thus, the RBF a-posteriori distribution is a multivariate Gaussian centered at $\hat{\vec{a}}$ (Equation A.23) with covariance $\Sigma_n$ (Equation A.22).
The RBF EISD Measure
For our class of RBF approximation functions, the optimal estimate given $D_n$ has model parameters $\hat{\vec{a}}$ (Equation A.23), since the a-posteriori distribution $P(\vec{a} \mid D_n)$ has a global maximum there. Consider Equation A.22, which depends only on the prior on the RBF function class $\mathcal{F}$, the $K$ fixed Gaussian RBF kernels $\{G_i(\cdot) \mid i = 1, \ldots, K\}$, the output noise variance $\sigma_s^2$ and the previously sampled input locations $\{x_1, x_2, \ldots, x_n\}$. In other words, previously observed $y$ values in $D_n$ do not affect the EISD measure (Equation A.27) for this Gaussian RBF concept class.
The RBF Total Output Uncertainty Measure