1. Choosing an appropriate feature space for representing and detecting faces
2.4 A Distribution-based Face Model
2.4.2 Representing the Face Distribution
A Single Linear Sub-Space Representation: A Poor Model
One can model the "face" pattern distribution by fitting the face data sample with a single multi-dimensional Gaussian cluster, consisting of a centroid location and a full covariance matrix. The view-based eigen-space approach to face detection by Pentland et al. [69] is a special case of this modeling technique. The eigen-space approach assumes that all face patterns occupy a low dimensional linear sub-space in the 19 × 19 pixel view-based feature space, and penalizes test patterns according to their Euclidean distance to the linear sub-space. This linear sub-space description is equivalent to a single multi-dimensional Gaussian cluster approximation with infinitely large eigenvalues in a few eigenvector directions (those spanning the linear sub-space of faces) and equal finite eigenvalues in the remaining eigenvector directions.
Figure 2-4: Our distribution-based canonical face model. Top Row: We use a representative sample of canonical face patterns to approximate the volume of canonical face views in a masked 19 × 19 pixel image vector space. We model the "face" sample distribution with 6 multi-dimensional Gaussian clusters. Center Row: We use a selection of non-face patterns to help refine the boundaries of our Gaussian mixture approximation. We model the "non-face" sample distribution with 6 Gaussian clusters. Bottom Row: Our final model consists of 6 "face" clusters and 6 "non-face" clusters. Each cluster is defined by a centroid and a covariance matrix. The 12 centroids are shown on the right. Note: All the distribution plots are fictitious and are shown only to help with our explanation. The 12 centroids are real.
Figure 2-5: (a): An illustration to show that a single Gaussian cluster can be a very poor representation for an arbitrarily shaped "face" pattern distribution. (b): The two distance components we use in our scatter plots. The first component D1 is a distribution dependent distance between a test pattern and the multi-dimensional Gaussian centroid in a subspace of the cluster's larger eigenvectors. The second component D2 is the Euclidean distance between the test pattern and the subspace of larger eigenvectors.
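The equivalence between the eigen-space penalty and a degenerate single Gaussian can be written out explicitly. The notation below is assumed for illustration and does not come from the text: x is a test pattern, \mu the cluster centroid, e_i the orthonormal eigenvectors of the covariance matrix with eigenvalues \lambda_i, N the dimension of the image vector space, and k the number of eigenvectors spanning the face sub-space. The normalization term of the Gaussian is ignored.

```latex
\begin{align}
\mathcal{M}^2(x) &= \sum_{i=1}^{N} \frac{\left(e_i^{\top}(x-\mu)\right)^2}{\lambda_i} \\
\lim_{\lambda_1,\dots,\lambda_k \to \infty} \mathcal{M}^2(x)
  \Big|_{\lambda_{k+1}=\dots=\lambda_N=\sigma^2}
  &= \frac{1}{\sigma^2} \sum_{i=k+1}^{N} \left(e_i^{\top}(x-\mu)\right)^2 \\
  &= \frac{1}{\sigma^2} \left\| (x-\mu) - \sum_{i=1}^{k} \left(e_i^{\top}(x-\mu)\right) e_i \right\|^2
\end{align}
```

Under these assumptions, the Mahalanobis distance to the cluster reduces to a scaled squared Euclidean distance between the pattern and the sub-space, which is the quantity the eigen-space approach penalizes.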
As illustrated by our hypothetical example in Figure 2-5(a), a single Gaussian cluster can be a very poor representation for the "face" pattern distribution if the actual distribution is not unimodal. We conducted the following experiment to show that this is indeed the case, i.e., a single Gaussian distribution poorly describes the space of canonical face views. Using a face sample of 4150 patterns, we modeled the "face" distribution as a single multi-dimensional Gaussian cluster. We then chose a small number of the cluster's largest eigenvectors as basis vectors for spanning a "face space", similar in spirit to the eigen-space approach for representing faces by Pentland et al. [69]. For each face pattern in the sample, we resolved its displacement vector from the cluster centroid into two complementary components (see Figure 2-5(b)). The first component is a Distance within Face Space measure, described in [69]. This component is computed by projecting the face pattern onto the subspace of larger eigenvectors (i.e., the "face space"), and taking a distribution dependent distance between its projection and the cluster centroid. The second component is a Distance from Face Space measure, also described in [69]. We use a Euclidean distance between the face pattern and the subspace of larger eigenvectors. We also collected some non-face patterns and similarly resolved their displacement vectors from the "face" cluster centroid into the same two distance components.
Figure 2-6: Scatter plots to show that a single Gaussian cluster approximation poorly describes the space of canonical face views.
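The two distance components used in this experiment can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the "distribution dependent distance" is assumed here to be a Mahalanobis distance within the subspace, since this section does not give its exact formula.

```python
import numpy as np

def fit_single_gaussian(samples):
    """Fit one Gaussian cluster: centroid plus eigen-decomposed covariance.

    samples: (n, d) array, one pattern per row.
    Returns centroid, eigenvectors (columns), eigenvalues, sorted so that
    the largest eigenvalue comes first.
    """
    centroid = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    return centroid, eigvecs[:, order], eigvals[order]

def two_distance_components(x, centroid, eigvecs, eigvals, k):
    """Resolve the displacement of x from the centroid into (d1, d2).

    k is the number of largest eigenvectors spanning the "face space".
    """
    diff = x - centroid
    basis = eigvecs[:, :k]                      # (d, k) face-space basis
    coeffs = basis.T @ diff                     # coordinates inside the subspace
    # D1: Mahalanobis distance within the subspace (assumed form).
    d1 = np.sqrt(np.sum(coeffs ** 2 / eigvals[:k]))
    # D2: Euclidean distance from x to the subspace (norm of the residual).
    d2 = np.linalg.norm(diff - basis @ coeffs)
    return d1, d2
```

Applying `two_distance_components` to every face and non-face pattern, with the centroid and eigenvectors obtained from the face sample alone, would give the two coordinates needed for a scatter plot of the kind described here.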
Finally, we plotted the face and non-face sample distributions in a feature space of the two distance components. Figure 2-6 shows that there is a significant amount of overlap between the face and non-face distributions in this two distance feature space. The huge overlapping region suggests that a single Gaussian cluster poorly represents the face distribution, because one cannot separate the face and non-face pattern classes well using simple feature measurements derived from the representation scheme. We repeated the experiment by varying the number of eigenvectors spanning the "face space". The distribution scatter plots were qualitatively very similar for all cases.
A Piecewise Smooth Gaussian Mixture Representation
Our approach approximates the "face" pattern distribution in a piecewise smooth fashion using a few multi-dimensional Gaussian clusters (6 in our case). This model is reasonable as long as the actual face pattern distribution is locally linear, even though its global shape may be arbitrarily complex. Qualitatively, one can view the six "face" clusters as a coarse distribution-based representation of the canonical face manifold. We also use six "non-face" clusters to help define boundaries in and around the manifold by carving out nearby regions in the vector space that do not correspond to face patterns. Each prototype cluster is a multi-dimensional Gaussian with a centroid location and a covariance matrix that describes the local data distribution.
We believe our piecewise smooth modeling scheme is reasonable because the actual face pattern manifold appears continuous and smoothly varying in our multi-dimensional image feature space. More often than not, a face pattern with minor spatial and/or grey-level perturbations still looks like another valid face pattern. Similarly, a non-face pattern with minor variations would most likely still appear as a non-face pattern. The piecewise smooth modeling scheme serves two important functions. First, it performs generalization by applying a prior smoothness assumption to the observed data sample distribution. This results in a stored data distribution function that is well defined even in regions of the image vector space where no data samples have been observed. Second, it serves as a tractable scheme for representing an arbitrary data distribution by means of a few Gaussian basis functions.
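To illustrate how a few Gaussian basis functions yield a distribution function that is defined everywhere in the vector space, the following sketch evaluates a Gaussian mixture log-density at an arbitrary point. It is only an illustration under stated assumptions: the function names are hypothetical, equal mixing weights are assumed because the text does not specify any, and each covariance matrix is assumed to be non-singular.

```python
import numpy as np

def gaussian_log_density(x, centroid, cov):
    """Log of a multivariate Gaussian density evaluated at x."""
    d = x.shape[0]
    diff = x - centroid
    # Solve a linear system rather than inverting the covariance explicitly.
    maha_sq = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha_sq)

def mixture_log_density(x, centroids, covs, weights=None):
    """Log-density of a Gaussian mixture (equal weights are an assumption)."""
    k = len(centroids)
    if weights is None:
        weights = np.full(k, 1.0 / k)
    logs = np.array([
        np.log(w) + gaussian_log_density(x, c, s)
        for w, c, s in zip(weights, centroids, covs)
    ])
    # Log-sum-exp keeps the result finite far away from every centroid.
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())
```

Because each Gaussian has unbounded support, `mixture_log_density` returns a finite value even for a point far from every training sample, which is the generalization property described above.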
The bottom right image of Figure 2-4 shows the 12 cluster centroids in our canonical face model. The six "face" prototypes are synthesized by clustering the database of canonical face patterns, while the six "non-face" prototypes are similarly derived from the database of non-face patterns.
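As a rough sketch of how prototype clusters could be synthesized from a pattern database, the code below runs plain k-means with Euclidean distances and then estimates one covariance matrix per cluster. This section says only that the prototypes come from "clustering", so plain k-means is a stand-in chosen for illustration, and the function name, iteration count, and seed are hypothetical.

```python
import numpy as np

def kmeans_gaussian_clusters(samples, k=6, n_iter=20, seed=0):
    """Cluster samples and return (centroids, covariances, labels).

    samples: (n, d) array, one pattern per row.
    Plain Euclidean k-means is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct randomly chosen samples.
    init = rng.choice(len(samples), size=k, replace=False)
    centroids = samples[init].astype(float)

    def assign(cents):
        # Index of the nearest centroid for every sample.
        dists = np.linalg.norm(samples[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:        # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)

    labels = assign(centroids)          # final assignment for the final centroids
    dim = samples.shape[1]
    covs = []
    for j in range(k):
        members = samples[labels == j]
        if len(members) > 1:
            covs.append(np.atleast_2d(np.cov(members, rowvar=False)))
        else:
            covs.append(np.eye(dim))    # fallback for degenerate clusters
    return centroids, covs, labels
```

Running this once on the face database and once on the non-face database with `k=6` would give 12 centroids and 12 covariance matrices, matching the structure of the model described here, though not necessarily the same clusters.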