
Existing commercial non-contact biometric solutions need cooperation from the user and require controlled conditions for accurate results. This section describes existing studies evaluating their sensitivity to more challenging situations.

For faces, the FRVT included an examination of the effects of pose, lighting and resolution on verification performance. The 2002 evaluation showed a very large drop in performance when subjects were recorded with pose variations or under unconstrained exterior lighting. In 2002, with 45 degrees of pose variation, the best performing system achieved an FRR of 0.6 with a FAR of 0.01. Also in 2002, the best performing system under varying outdoor lighting achieved an FRR of 0.5 with the same FAR (0.01). However, the 2006 study demonstrated that techniques had improved to a point where, with unconstrained lighting, the best algorithm achieved an FRR of around 0.1 with a FAR of 0.01. Also in 2006, the performance on a low resolution dataset (average between-eye distance of 75 pixels) was evaluated. The best performing algorithm in this case achieved an FRR of 0.02, much closer to the high resolution performance. This information is summarised in Table 2.8.

Recent work on face recognition has focused on improving robustness as well as addressing variations due to ageing, expression and image quality [118] [107].

Iris recognition typically requires significant user cooperation [70] and there are currently no systems that attempt recognition without such cooperation. The closest unconstrained approach is a system developed by Matey et al., which identifies subjects as they walk through a controlled sensor [70]. However, this approach is still far from being practical in unconstrained environments.

In terms of gait recognition, techniques have been developed to recognise subjects walking at different angles to the camera. In addition, recent work has improved gait performance for subjects carrying objects [66]. However, gait recognition is still at an experimental stage and robust commercial systems may not be available for many years.

Table 2.8: Recognition results for unconstrained factors

[Table body not recovered from the source; surviving column headers: Evaluation, Factor, Number of ...]

In comparison with face, iris and gait, ear recognition is a relatively young field. Initial work has concentrated on demonstrating high accuracy using controlled datasets. Most of these datasets have minimal noise, little pose variation and uniform lighting. In addition, they constrain the probe image to a single profile head, removing the problem of detection within background clutter. In many cases, ear registration is performed manually, with the techniques focusing on the development of robust distance measures [56]. Manual registration has also been used in the comparison of ear and face recognition. These studies highlight the potential for combining the biometric information from both face and ear to improve recognition results [29]. Of particular significance is a study by Theoharis et al. which has shown that the face and ear shape have very low statistical correlation, with a Pearson correlation coefficient of 0.161 [100]. More recently, fully automated recognition systems have been produced [8][61]. These are more representative of true recognition performance as initial detection can be a significant source of error [8].

Recent work has also started to use less constrained datasets, which highlight a sensitivity to pose and lighting variation [39]. In particular, a study by Chang et al. shows a drop in recognition performance from 90% to 34% when the gallery and probe images have pose variations of 22.5 degrees [29]. This sensitivity has led to the development of techniques that use 3D laser scans of the ear shape. These 3D approaches have very accurate recognition results on datasets with small pose and lighting variations [59][112].

In addition, techniques have been developed to estimate 3D ear shape from video sequences and using shape-from-shading approaches [26]. These techniques have the potential to improve pose and lighting robustness without requiring specialised 3D sensors.

In terms of evaluation, most research on ear biometrics has concentrated on recognition rates rather than the verification rates used in the more established biometrics. Recognition rates measure the percentage of subjects who are correctly identified from the subjects in the gallery. Most systems produce multiple candidate identities for each test image, which are then ranked according to the estimated likelihood that they match the image. If the true subject is in the top n returned identities the result is considered to be correct to rank n. The rank n recognition rate is then the percentage of test images that are correct to rank n.
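The rank n recognition rate described above can be sketched as follows. This is a minimal illustration, assuming each test image comes with a dictionary of match scores per gallery identity (higher is a better match); the function name, identity labels and scores are hypothetical.

```python
from typing import Dict, Sequence


def rank_n_recognition_rate(scores: Sequence[Dict[str, float]],
                            true_ids: Sequence[str],
                            n: int) -> float:
    """Percentage of test images whose true identity is in the top n candidates.

    scores[i] maps each gallery identity to its match score for test image i.
    true_ids[i] is the correct identity for test image i.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not scores or len(scores) != len(true_ids):
        raise ValueError("scores and true_ids must be non-empty and equal length")

    hits = 0
    for probe_scores, true_id in zip(scores, true_ids):
        # Rank gallery identities from most to least likely match.
        ranked = sorted(probe_scores, key=probe_scores.get, reverse=True)
        # The result is correct to rank n if the true identity is in the top n.
        if true_id in ranked[:n]:
            hits += 1
    return 100.0 * hits / len(true_ids)


# Toy example with a three-subject gallery and four test images.
probe_scores = [
    {"A": 0.9, "B": 0.5, "C": 0.1},  # true identity A is ranked 1st
    {"A": 0.6, "B": 0.5, "C": 0.2},  # true identity B is ranked 2nd
    {"A": 0.7, "B": 0.6, "C": 0.3},  # true identity C is ranked 3rd
    {"A": 0.2, "B": 0.8, "C": 0.4},  # true identity B is ranked 1st
]
true_ids = ["A", "B", "C", "B"]

for n in (1, 2, 3):
    print(n, rank_n_recognition_rate(probe_scores, true_ids, n))
# 1 50.0
# 2 75.0
# 3 100.0
```

The rate can only stay the same or rise as n grows, and it reaches 100% once n equals the gallery size, provided every true identity is in the gallery.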

Many techniques use different datasets for their evaluation and so accurate comparisons of performance are not possible. However, five datasets have been used in multiple studies: XM2VTS [73], USTB [111], UND E [29], UND J2 [112] and FRGC [91]. The XM2VTS dataset was created by the University of Surrey and includes 295 high quality and high resolution head profile images. The USTB ear dataset was produced by the University of Science and Technology Beijing. It contains 79 subjects recorded with pose variations and a subset of 77 with lighting variations. The UND E, UND J2 and FRGC datasets were all produced by the University of Notre Dame. The first set, UND E, contains 114 2D profile ear images. UND J2 is a larger dataset consisting of profile 3D colour and range scans of 415 subjects. Both datasets have small variations in pose and lighting. Finally, the FRGC is one of the largest 3D face datasets available. It includes scans of 324 people who have also been recorded in UND J2 and can therefore be used for combined ear and face recognition experiments.

Tables 2.9, 2.10, 2.11 and 2.12 provide a summary of the main results in ear recognition. The tables show the relative degrees of robustness that have been obtained. The tables list research in 2D, 3D and combined face and ear recognition approaches. The research has been broadly sorted based on its robustness to pose and occlusion and on the difficulty of the evaluation datasets. For each algorithm the base recognition rate and the performance under pose variation and occlusion have been included. None of the other factors have been explicitly evaluated in the existing work. Fifteen techniques based on 2D recognition with manual registration have been compared, as well as an additional six fully automated 2D approaches. For both manual and fully automated techniques, recognition rates of over 90% have been achieved using relatively constrained datasets.

However, on the more challenging UND E dataset, recognition rates are generally within the 80%-90% range. The main exception is the technique of Naseem et al. [79], which achieves a 98% recognition rate on a small subset of the UND E images.

In addition to the 2D techniques, five 3D techniques have also been produced: one uses manual registration, two are fully automated and a further two combine both face and ear. Of these techniques, the combined face and ear approaches achieve the best results, with recognition rates of 98% and above on datasets consisting of over 300 subjects.

The next section examines each of the existing ear recognition techniques in detail.
