In this section, three evaluations are proposed to investigate the recognition performance of different facial patches and to motivate the extraction of discriminative features for 3D nose identification. The first calculates the within-class discrepancy over many kinds of expression to demonstrate that the nose region is relatively rigid and that its structure remains more stable when expression variations occur on the facial surface.
The other two evaluations concentrate on the recognition performance of both large and small scale patches on the human face. They also indicate that the nose region outperforms other facial parts and suffers from very few natural occlusions. Both the depth and surface normal maps are considered in all three evaluations, and results are presented for all the captures in the Bosphorus database, excluding those with occlusions and large pose variations.
2.4.4.1 Within-class Dissimilarity under Different Expressions
Dissimilarity maps were computed for globally registered faces using the point set features [33]. The maps were built by subtracting the captures with expressions from neutral captures of the same subject. Although these dissimilarity maps provide a good representation of the whole face and show that the human nose is relatively stable under different expressions, only depth information was considered. Li et al. [6] used the three components of the surface normal (SNx, SNy and SNz), calculated from 3D point clouds, to explore expression invariant discriminative features for recognition, demonstrating the potential of surface normals. Therefore, the three components of the surface normals are also considered in this experiment. As a preprocessing step, the pose variations are first corrected and all the captures are translated so that they are centred on the nose tip.
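The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the experiments: it assumes that each registered capture is available as a depth map on a regular grid, that normals are estimated from depth gradients rather than from the raw point cloud, and that the dissimilarity is the absolute per-pixel difference. The function names `surface_normals`, `components` and `dissimilarity_maps` are hypothetical.

```python
import numpy as np

def surface_normals(depth, spacing=1.0):
    """Unit normals of the surface z = depth(x, y) sampled on a regular grid.

    Assumption: normals come from finite-difference depth gradients,
    not from the original 3D point cloud.
    """
    dz_dy, dz_dx = np.gradient(np.asarray(depth, dtype=float), spacing)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n[..., 0], n[..., 1], n[..., 2]

def components(depth):
    """Return the four maps used in the evaluation: depth, SNx, SNy, SNz."""
    snx, sny, snz = surface_normals(depth)
    return {"depth": np.asarray(depth, dtype=float),
            "SNx": snx, "SNy": sny, "SNz": snz}

def dissimilarity_maps(neutral_depth, expression_depth):
    """Per-pixel absolute difference between an expression capture and the
    neutral capture of the same subject, for each of the four components.

    Assumption: both depth maps are already pose-corrected and centred
    on the nose tip, so pixels correspond.
    """
    neutral = components(neutral_depth)
    expression = components(expression_depth)
    return {name: np.abs(expression[name] - neutral[name]) for name in neutral}
```

Averaging such maps over all subjects for one expression would give one map per component and expression, comparable in spirit to the panels of Figure 2.6.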
[Figure 2.6 panel labels. Expressions: Combined, Anger, Disgust, Fear, Happy, Sadness, Surprise. Data: Facial Captures, Depth, SNx, SNy, SNz.]
Figure 2.6: Dissimilarity maps calculated from the captures with different expressions and the neutral one using four components of 3D data on the Bosphorus database. Darker regions show greater dissimilarity on the face. Combined expression contains facial surface deformations on the upper and lower units.
The Bosphorus database [13] is a good choice for expression invariant feature extraction because it contains Face Action Units that describe the facial surface changes when expressions occur on different parts of the face (lower, upper and combined). It also contains some basic human expressions, including anger, disgust, fear, happiness, sadness and surprise. All captures with expression variations are considered in this evaluation.
More specifically, all the Face Action Units and basic expressions are used in the calculation, which results in 32 dissimilarity maps for each component (depth, SNx, SNy and SNz): 2 combined units, 20 lower units, 4 upper units and 6 basic expressions. To demonstrate the changes on the face, the 2 combined units and 6 basic expressions are illustrated in Figure 2.6. As can be seen from the maps, some small patches on the face show different variance in each component under specific expressions. For example, the cheek bone part is widely regarded as a non-rigid region that changes more under expressions [5]. The depth maps shown in Figure 2.6 indicate that the cheek bone region has limited stability. However, the surface normals calculated on the cheek are more consistent, which motivates the investigation of different types of discriminative features extracted from the non-rigid regions. In general, as can be seen from Figure 2.6, the nasal region demonstrates higher stability under various expressions.
2.4.4.2 Large Scale Patches Evaluation Using Selected Landmarks
[Figure 2.7 panels: (a) 12 patches; (b) depth; (c) SNx; (d) SNy; (e) SNz]
Figure 2.7: Landmarks based large scale patches evaluation on the depth and three components of surface normals. The brighter patches denote higher recognition performance. In general, features extracted from nasal region outperform other patches, which is more salient in the surface normals maps.
In previous studies, both 2D and 3D facial data are usually divided into fixed-size patches, and LBP [115] is the most popular descriptor for extracting features on each patch when evaluating recognition performance [6, 51, 116, 117]. For example, in [6] all the captures are first resampled to a fixed size and different scales of patches are used for recognition performance evaluation. In addition, the three components of the surface normals are also used to calculate the dissimilarity maps.
However, the main problem with these methods is that the target facial patch may contain different content for different subjects or captures. The underlying reason is that each human face has its own characteristics (e.g. size and curvature), so its structure and the distribution of its features vary. Although such discrepancies can preserve the within-class similarity, they can have a great influence on the between-class dissimilarity. Therefore, in this section, an improved evaluation strategy that corrects the content discrepancy is proposed. In Figure 2.7(a), using the nose tip as a reference, seven further landmarks are automatically detected, so that the full set comprises: (1) the nose tip, nose root and two alar grooves [7]; (2) two cheek landmarks; (3) the middle nose bridge (the midpoint between the nose root and tip) and the middle subnasal (symmetrical to the middle nose bridge). On the basis of these landmarks, 12 patches are defined on the central facial region and each patch is resampled to a fixed size.
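The two derived landmarks and the fixed-size resampling can be illustrated with a short sketch. Several details here are assumptions rather than facts from the text: landmarks are taken as 2D image-plane coordinates, "symmetrical" is interpreted as a point reflection of the middle nose bridge through the nose tip, and nearest-neighbour sampling with a 32 x 32 output is used for the resampling. The names `derived_landmarks` and `resample_patch` are hypothetical.

```python
import numpy as np

def derived_landmarks(nose_tip, nose_root):
    """Compute the middle nose bridge and middle subnasal landmarks.

    Assumption: the middle subnasal is the reflection of the middle nose
    bridge through the nose tip, in image-plane (x, y) coordinates.
    """
    tip = np.asarray(nose_tip, dtype=float)
    root = np.asarray(nose_root, dtype=float)
    middle_bridge = (tip + root) / 2.0
    middle_subnasal = 2.0 * tip - middle_bridge
    return middle_bridge, middle_subnasal

def resample_patch(patch, size=(32, 32)):
    """Resample a rectangular patch to a fixed size by nearest-neighbour
    sampling (the interpolation scheme and size are assumptions)."""
    patch = np.asarray(patch)
    rows = np.linspace(0, patch.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, patch.shape[1] - 1, size[1]).round().astype(int)
    return patch[np.ix_(rows, cols)]
```

Because every patch is bounded by landmarks and then brought to the same size, patches from different subjects cover corresponding anatomical content even when the faces differ in scale.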
A set of LBP values is calculated for each patch and the resulting LBP histogram is used to build the final feature set. The recognition performance of each patch, tested under identification scenarios, is shown in Figure 2.7(b)-(e), where the brighter regions indicate higher recognition performance. Compared to the other facial parts, the nasal and adjoining regions are more discriminative and have more potential to produce good recognition performance.
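A per-patch evaluation of this kind can be sketched as below. This is an illustrative approximation only: it uses the basic 8-neighbour, radius-1 LBP operator, a 256-bin normalised histogram, a chi-square distance and rank-one nearest-neighbour matching, none of which are confirmed by the text as the exact configuration. The function names are hypothetical.

```python
import numpy as np

def lbp_codes(patch):
    """Basic 3x3 LBP codes for the interior pixels of a 2D patch."""
    p = np.asarray(patch, dtype=float)
    h, w = p.shape
    centre = p[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), visited clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = p[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.int64) << bit
    return codes

def lbp_histogram(patch):
    """Normalised 256-bin histogram of the LBP codes of one patch."""
    hist = np.bincount(lbp_codes(patch).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def rank_one_rate(gallery, gallery_ids, probes, probe_ids):
    """Fraction of probes whose nearest gallery histogram has the same identity.

    Assumption: chi-square distance between histograms.
    """
    g = np.asarray(gallery, dtype=float)
    p = np.asarray(probes, dtype=float)
    diff = p[:, None, :] - g[None, :, :]
    total = p[:, None, :] + g[None, :, :] + 1e-12
    dist = (diff ** 2 / total).sum(axis=-1)
    predicted = np.asarray(gallery_ids)[dist.argmin(axis=1)]
    return float((predicted == np.asarray(probe_ids)).mean())
```

Running `rank_one_rate` once per patch and per component (depth, SNx, SNy, SNz) would yield one score per patch, which could then be rendered as a brightness map like those in Figure 2.7(b)-(e).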
2.4.4.3 Local Patches Evaluation on the Main Part of the Human Face
In addition to the large scale patches used for the central face evaluation, 56 local patches, as shown in Figure 2.8(a), are used to further evaluate the discriminatory power of the nasal and surrounding regions. The local shape difference descriptor, which is explained further in Section 5.4.1, is used to extract features in each patch. The extracted features of each patch are evaluated in identification scenarios and the resulting discriminatory maps of the four components are shown in Figure 2.8(b)-(e).
[Figure 2.8 panels: (a) 56 local patches; (b) depth; (c) SNx; (d) SNy; (e) SNz]
Figure 2.8: The discriminatory power maps of local patches on the main part of the human face. The brighter patches denote higher recognition performance. Similar to the larger patches evaluation, patches on the nasal and adjoining regions produce better recognition performance.
As before, the brighter regions indicate better recognition performance. On the depth, SNx and SNy maps, patches from the nasal region generally perform better than the other patches, especially in the lower nasal part. On the SNz map, the nasal region produces better recognition performance than the eye and upper mouth regions but worse than the adjoining cheek region.