
Sample quality measures the properties of a sample that affect system performance, and it pertains to both probe samples and gallery samples. Sample quality can be measured at several levels:

• Lower-level quality metrics may be useful by themselves, but are more likely to be of value when combined into an aggregate quality metric:

° Generic image quality metrics are broadly applicable measurements that are not specifically oriented towards a biometric system. A large number of generic methods are available.

° Feature-specific metrics and localized quality metrics are based on the reliability of individual features within the sample.

• Aggregate quality metrics are based on combinations of lower-level quality metrics:

° Representation-based metrics are aggregate measures of the data content of the sample.

° Matcher prediction metrics are aggregate measures that use other types of measurement to predict the scores of one or more matching algorithms.

• Matcher score analysis is the process of using matcher scores from multiple [...] metrics) are estimates of matcher performance, the actual matcher scores can be seen as the definitive measure of data quality.

• Stage-specific metrics measure the fidelity of a stage of processing, such as data compression. They differ from the other measures in that they do not measure the overall quality of the sample, but the degradation of quality that can be attributed to a given stage of processing, such as scanning or compression.

There is no universal agreement as to what should be considered a quality problem, or what characteristics should be measured as inputs to quality metrics. To use fingerprints as an example, is the overall size a quality-related metric? Is the degree to which the image is centered? Is rotation? For quality metrics that have the end goal of maximizing the correlation with matcher score, any such factors that affect the matcher score need to be considered among the various inputs to the quality metric.

6.1.1 Generic image quality metrics

Since many biometric samples are images, generic image quality approaches that are not specific to biometrics can be used to assess their quality.⁸ These approaches grew out of digital image processing techniques and analyses of the human visual system (HVS) and use measurement methods that can be applied to any type of image [Schalkoff, Chalmers]. Examples of generic image quality metrics include brightness, contrast, entropy (pixel variance), and spectral magnitude/phase (the complete list is quite extensive). Since many biometric systems are designed to extract information from images, measuring and controlling generic image quality can improve system performance. Some generic image quality metrics can be effective when used in assessing biometric sample quality, such as the use of an image's spatial frequency power spectrum as a measure of quality [IQM]. Even those generic image quality metrics that are not of great value in measuring sample quality by themselves can be of use as inputs to an aggregate quality metric. For example, the number of colors (or shades of gray) in an image is not generally an effective quality metric by itself, but is a valuable input to aggregate measures, which can use limited color space as one of many indications of poor quality.

⁸ One of the reasons to refer to “sample quality” rather than “image quality” is that some people argue that the term “image quality metric” should be used solely to refer to generic image metrics rather than biometric-specific metrics.
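As an illustration of the generic metrics named in this section, the following is a minimal sketch that computes brightness, contrast, histogram entropy, and the number of distinct gray levels. The function name `generic_image_metrics`, the list-of-rows input format, and the choice of RMS contrast and Shannon entropy are assumptions made for this example, not definitions taken from the cited sources.

```python
import math
from collections import Counter


def generic_image_metrics(pixels):
    """Compute simple generic metrics for an 8-bit grayscale image.

    `pixels` is a list of rows, each a list of ints in 0..255
    (an assumed input format for this sketch).
    """
    flat = [p for row in pixels for p in row]
    n = len(flat)
    if n == 0:
        raise ValueError("image has no pixels")
    brightness = sum(flat) / n                      # mean gray level
    variance = sum((p - brightness) ** 2 for p in flat) / n
    contrast = math.sqrt(variance)                  # RMS contrast
    counts = Counter(flat)                          # gray-level histogram
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "brightness": brightness,
        "contrast": contrast,
        "entropy_bits": entropy,
        "gray_levels": len(counts),                 # distinct shades of gray
    }


# A blank image and a two-level checkerboard, both 4x4.
blank = [[255] * 4 for _ in range(4)]
checker = [[0 if (r + c) % 2 == 0 else 255 for c in range(4)] for r in range(4)]

print(generic_image_metrics(blank))
print(generic_image_metrics(checker))
```

The blank image yields a single gray level with zero contrast and zero entropy, which is the kind of "limited color space" signal an aggregate metric could use as one indication of poor quality.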

6.1.2 Feature-specific metrics and localized quality metrics

Feature-specific metrics are based on the reliability of individual features within the sample. These are likely to be by-products of feature extraction algorithms. For example, a fingerprint minutiae detector may determine the relative certainty that each minutia is valid, and may define an area of uncertainty around its reported location. The same encoder may assign a quality value to regions of the image, or even to every pixel: these are known as localized quality metrics. Feature-specific metrics and localized quality metrics are used as inputs to aggregate or global metrics. In addition, they are used in the fault-tolerant logic of matchers, so that the relative importance of a specific feature in matching is based on the quality of that specific feature.
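The way feature-specific and localized values feed an aggregate metric can be sketched as follows. The `Minutia` record, the 0.0 to 1.0 quality scale, and the 0.5 threshold are all hypothetical choices for illustration; real encoders define their own scales and outputs.

```python
from dataclasses import dataclass


@dataclass
class Minutia:
    x: int
    y: int
    quality: float  # assumed scale: 0.0 (unreliable) to 1.0 (reliable)


def summarize_minutiae(minutiae, threshold=0.5):
    """Reduce per-minutia quality values to a few aggregate inputs."""
    total = len(minutiae)
    reliable = sum(1 for m in minutiae if m.quality >= threshold)
    mean_quality = sum(m.quality for m in minutiae) / total if total else 0.0
    return {"count": total, "reliable_count": reliable, "mean_quality": mean_quality}


def good_area_fraction(quality_map, threshold=0.5):
    """Fraction of cells in a localized quality map at or above the threshold."""
    cells = [q for row in quality_map for q in row]
    if not cells:
        return 0.0
    return sum(1 for q in cells if q >= threshold) / len(cells)


minutiae = [Minutia(10, 12, 0.9), Minutia(40, 8, 0.25), Minutia(22, 30, 0.5)]
quality_map = [[0.75, 0.25], [0.5, 0.0]]  # one value per image region

print(summarize_minutiae(minutiae))
print(good_area_fraction(quality_map))
```

A matcher could use the same per-minutia quality values as weights, so that low-quality features contribute less to the comparison.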

6.1.3 Representation-based aggregate metrics

Representation-based metrics are aggregate measures of the data content of the sample. Representation-based measurement makes an overall assessment of the quality of the sample, generally based on a combination of individual feature- (or stage-) specific metrics. For example, a low-ridge count whorl fingerprint sample could be expected to contain two deltas; if one or both are missing, the sample may be missing fingerprint pattern area and could be considered poor quality. Examples of representation-based metrics include:

• Fingerprints – pattern location, number of minutiae, number of minutiae of quality X, overall minutiae quality, pattern area quality, ridge flow consistency, overall ridge quality, core quality, and ridge frequency [IQS] [Chen1].

• Iris – camera quality and imaging conditions, iris boundaries, eye-lid occlusion, imaging angle, image blur, and wavelets [Daugman05] [Chen2] [Dorairaj] [Kalka].

• Face – head and face location and cropping, imaging angle, identification of eye glasses and other obstructions, and eye-finding [IDX].

6.1.4 Matcher prediction aggregate metrics

Matcher prediction metrics are aggregate measures that use other types of measurement to predict the scores of one or more matching algorithms. Many matcher algorithm vendors have developed matcher prediction metrics for their own purposes. There are a variety of matcher prediction metrics that have been implemented over the years. For example:

• When IAFIS was engineered, metrics were designed and tuned to predict the behavior of individual algorithms within the matching process; these matcher prediction metrics have been critical to the efficient operation of the system.

• The IDENT/IAFIS Image Quality Study [IQS] evaluated fifteen fingerprint quality metrics (stage-specific or representation-based) for the purpose of predicting IAFIS performance on fingerprints of different types. A combination of rule-based and linear regression methods was used to define an optimal combination of four of these metrics, resulting in a Unified Image Quality Metric (UIQM) that uses the same scale as the target matcher scores. The metrics used by UIQM were themselves aggregate metrics [NIST-IQS], based on multiple feature-specific metrics (minutiae quality), localized quality metrics (assessment of localized ridge flow quality and consistency), and general image quality metrics (contrast).

• [NFIQ] introduced a model and implementation for generic matcher prediction, based on approaches that can be applied to any matcher, including an elegant method of using a neural net to associate several lower-level quality metrics with match scores from multiple matchers. NFIQ is based on counts of minutiae of differing qualities, and aggregate measures of ridge flow quality and consistency.
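The general idea behind a regression-based matcher prediction metric can be sketched as an ordinary least-squares fit from lower-level quality metrics to observed matcher scores. This is only an illustration of the linear regression step under assumed, synthetic data; it is not the UIQM or NFIQ procedure, and the function names and values are hypothetical.

```python
def fit_linear_combination(metric_rows, scores):
    """Least-squares weights (intercept first) mapping metrics to matcher scores.

    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination.
    Assumes the metrics are not collinear; otherwise a pivot is zero.
    """
    X = [[1.0] + [float(v) for v in row] for row in metric_rows]
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * y for x, y in zip(X, scores)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def predict_score(weights, metric_row):
    """Predicted matcher score for one sample's quality metrics."""
    return weights[0] + sum(w * m for w, m in zip(weights[1:], metric_row))


# Synthetic training data: two quality metrics per sample, and matcher
# scores constructed as 10 + 2*m1 + 3*m2 so the fit can be checked.
metric_rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5)]
scores = [12, 13, 15, 17, 31]

weights = fit_linear_combination(metric_rows, scores)
print(weights)
print(predict_score(weights, (2, 2)))
```

Because the output is fitted against matcher scores, the resulting metric is expressed on the same scale as those scores, which is the property the UIQM description above highlights.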

It is important to note that there are limits to the effectiveness of using sample quality to predict matching accuracy [IQS]. Matching accuracy depends on the following factors:

• Sample quality for both the probe sample and the gallery sample. Since matchers compare probe and gallery samples, poor quality in one or both hurts matcher accuracy.

• Sample correspondence between the probe and gallery. For example, two poor-quality face samples can match with a high score if the pose, lighting, and expressions correspond closely; two distorted fingerprints can match well if distorted in similar ways. Measurement of such correspondence requires use of both probe and gallery samples, and a matcher. It can be argued that the degree of correspondence should be considered a quality issue. For example, face matcher scores between captures will vary based mainly on trivial variations in lighting, face angle/pose, lens type, and expression: extreme problems with any of these factors are clearly quality problems, so these correspondence issues can be seen as minor or immeasurable cases of quality issues.

• Metadata quality. Metadata errors are insidious because they are difficult to detect, cannot be detected by evaluation of a single sample, and can make a match fail even if all of the other factors are perfect.

A matcher prediction metric is only as good as the care spent in its training: if the matcher(s) used do not fully represent all target matchers, or if the samples used in training do not fully represent the range of samples to be used, the metric cannot be expected to be reliable. When training, it may be wise to include non-biometric samples in the training set: the authors have seen quality metrics that designate completely blank images or images of trees as good quality!

The factors that affect the accuracy of one matcher may not affect the matcher score for all matchers, so a single metric will not correlate equally to matcher performance for all matchers. Generally, more accurate matchers gain that accuracy by being tolerant of many characteristics that would cause some less accurate matcher to fail: the differences in accuracy between matchers can mostly be attributed to differences in the processing of poor or marginal quality samples. Therefore, a metric tuned to those characteristics that cause less-accurate matchers to fail may identify as poor quality many cases that would match without problem in more-accurate matchers; a metric tuned solely to more-accurate matchers would only identify a subset of the cases that would be problematic for less-accurate matchers.

Human examiner-based measurements are a variation of matcher prediction metrics that predict human examiner matching performance by quantifying certainty associated with their decisions about a sample (ranging from easy to identify to unidentifiable). This approach asks examiners to judge whether a sample can be accurately matched. Low variance in their response indicates agreement among the examiners and more certainty whereas high variance indicates less certainty. Classifying samples by data quality type could allow for extrapolations beyond the survey set of data (Section 6.1.6).
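The use of response variance as a certainty indicator can be sketched as follows, assuming a hypothetical rating scale from 1 (unidentifiable) to 5 (easy to identify) and made-up examiner responses.

```python
from statistics import mean, pvariance


def examiner_agreement(ratings):
    """Summarize examiner ratings for one sample.

    Low variance suggests agreement (more certainty); high variance
    suggests disagreement (less certainty).
    """
    return {"mean": mean(ratings), "variance": pvariance(ratings)}


# Hypothetical ratings from four examiners for two samples.
clear_sample = examiner_agreement([4, 4, 5, 4])
disputed_sample = examiner_agreement([1, 5, 2, 4])

print(clear_sample)
print(disputed_sample)
```

Population variance is used here because the ratings are treated as the full set of responses for the sample; a study that treats examiners as a sample from a larger population might prefer the sample variance instead.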

6.1.5 Matcher score analysis

The gold standard of quality is the actual measurement of matcher scores. In practice, this can be complex. Multiple samples are used to limit sample-specific factors (a good image can match poorly if the gallery image is bad). Having only two samples per subject will not permit isolating quality as a variable (one match score needs two samples), so three or more samples are necessary. Likewise, multiple matchers should be used to avoid matcher-specific characteristics.

Matcher score analysis was used to determine ground-truth quality in [NFIQ] and [Goats].
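A minimal sketch of this idea follows: with three samples per subject and two matchers, each sample's quality is estimated as the mean of the genuine scores it participates in. The data layout, the matcher names, and the assumption that scores from different matchers are already normalized to a common 0 to 1 scale are all hypothetical; the procedures used in [NFIQ] and [Goats] are more elaborate.

```python
from collections import defaultdict
from statistics import mean


def sample_quality_from_scores(scores_by_matcher):
    """Estimate per-sample quality from genuine matcher scores.

    `scores_by_matcher` maps a matcher name to a dict of
    {(sample_a, sample_b): score} for samples of the same subject.
    Scores are assumed to be normalized to a common scale.
    """
    per_sample = defaultdict(list)
    for pair_scores in scores_by_matcher.values():
        for (a, b), score in pair_scores.items():
            # Each comparison is evidence about both samples involved.
            per_sample[a].append(score)
            per_sample[b].append(score)
    return {sample: mean(values) for sample, values in per_sample.items()}


# Three samples of one subject; sample C is the poor one.
scores_by_matcher = {
    "matcher_x": {("A", "B"): 0.9, ("A", "C"): 0.3, ("B", "C"): 0.4},
    "matcher_y": {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.4},
}

quality = sample_quality_from_scores(scores_by_matcher)
print(quality)
```

With only two samples there would be a single score per matcher, and both samples would receive the same estimate, which is why a third sample is needed to attribute a low score to one of them.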

6.1.6 Stage-specific metrics

A similar category of metrics includes those measuring the fidelity of a stage of processing. For example, a compression algorithm can use a peak signal-to-noise ratio to express a Compression Fidelity Score, a classification algorithm can assign a value to the certainty of its determination of class, or scanner certification procedures [Nill] quantify the fidelity of image capture (discussed specifically in Section 6.3).
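For reference, peak signal-to-noise ratio is conventionally computed as 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the original and processed images. The sketch below applies that formula to small grayscale images given as lists of rows; the function name and input format are assumptions, and how a particular system maps PSNR onto a Compression Fidelity Score is not specified here.

```python
import math


def psnr(original, processed, max_value=255):
    """Peak signal-to-noise ratio in decibels between two grayscale images."""
    flat_o = [p for row in original for p in row]
    flat_p = [p for row in processed for p in row]
    if not flat_o or len(flat_o) != len(flat_p):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_p)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images: no measurable degradation
    return 10 * math.log10(max_value ** 2 / mse)


original = [[52, 60], [61, 70]]
compressed = [[53, 59], [62, 69]]  # every pixel off by one gray level

print(psnr(original, original))
print(psnr(original, compressed))
```

A higher value indicates less degradation introduced by the processing stage; the score says nothing about whether the original sample was of good quality to begin with.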
