The only publications that report the use of LRs for multiple speech parameters in FSC casework are those of Rose (2012; 2013b) in connection with a fraud case in Australia. The case of R v. Hufnagl (2008) revolved around a large-scale telephone fraud of AUS$150 million, in which a criminal sent a fax to JP Morgan Chase bank requesting the transfer of $150 million from the Australian Commonwealth Superannuation Scheme to accounts in Switzerland, Greece, and Hong Kong. Before the close of business, the criminal called the bank asking for confirmation of the details in the fax he had sent. When the Australian Commonwealth Superannuation Scheme realized their account was short by $150 million, an investigation followed. A suspect was identified, and Rose was asked to compare the recording of the fraudulent telephone call with recorded telephone calls known to have been made by the suspect. The analysis and report were produced five years prior to Rose’s publications about it (2012; 2013b), so he presents both the original analysis and a retrospective critique of it.
In the original analysis, Rose identified many tokens of the word yes in both the criminal and suspect recordings, as well as the utterance not too bad in the criminal recording and multiple occurrences of the same phrase in the suspect recordings. The majority of the analysis and the resulting LR were therefore based on phonetic/linguistic parameters measured from these words. In order to establish the typicality of the criminal’s speech, Rose defined the relevant population as “adult male speaker[s] of General Australian English” (Rose, 2013b, p. 284). He then collected relevant speech samples from 35 adult males, who served as the background population. The analysis of similarity comprised formant measurements from /je/ in the word yes at three designated time-points, the fundamental frequency (F0) in not too bad taken from four designated time-points, a categorical classification of high and low tones in not too bad, formant measurements of the vowels in not too bad, and the frequency cut-off in /s/ from the word yes (Rose, 2013b). After (intentionally) naïvely combining the individual LRs from these parameters, an overall LR (OLR) of around 11 million was calculated. Rose (2013b) explicitly states that 11 million was an over-estimation of the strength of evidence, since some degree of correlation had to exist between the parameters. For this reason, parameters that were assumed to be correlated with one another (e.g. formant measurements for certain vowels) were discarded, and a more conservative LR of 300,000 was reached.
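The naïve combination step can be made concrete with a small sketch. It computes a per-parameter LR as similarity (how probable the disputed value is under a model of the suspect’s speech) divided by typicality (how probable it is under a model of the background population), then multiplies the per-parameter LRs into an OLR. This is a simplified stand-in, not Rose’s actual procedure: it assumes a single Gaussian model for the suspect and for the population, and every measurement and statistic in it is hypothetical. The final line shows why correlated parameters inflate the OLR: a near-duplicate measurement enters the product a second time.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def univariate_lr(x, suspect_mean, suspect_sd, pop_mean, pop_sd):
    """LR for one parameter: similarity to the suspect model divided by
    typicality in the background-population model (both assumed Gaussian)."""
    similarity = gaussian_pdf(x, suspect_mean, suspect_sd)
    typicality = gaussian_pdf(x, pop_mean, pop_sd)
    return similarity / typicality


def naive_olr(lrs):
    """Overall LR obtained by multiplying per-parameter LRs.
    This is only valid if the parameters are independent."""
    return math.prod(lrs)


# Hypothetical measurements (Hz) from a disputed recording, with hypothetical
# suspect and background-population statistics (not values from the case).
lr_f2 = univariate_lr(1900, suspect_mean=1880, suspect_sd=60, pop_mean=1750, pop_sd=150)  # ≈ 3.9
lr_f0 = univariate_lr(110, suspect_mean=112, suspect_sd=8, pop_mean=120, pop_sd=20)  # ≈ 2.7

print(naive_olr([lr_f2, lr_f0]))  # ≈ 10.7
# A second, near-duplicate F2 measurement (e.g. another time-point of the same
# formant) multiplies the F2 evidence in twice:
print(naive_olr([lr_f2, lr_f2, lr_f0]))  # ≈ 41.7
```

Because multiplication treats each parameter as independent evidence, every redundant parameter added in this way pushes the OLR upwards, which is the kind of over-estimation Rose (2013b) acknowledges for the figure of 11 million.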
Five years after the conclusion of the case, Rose provided a critique of the analysis and of the presentation of the evidence under an LR framework. He notes a number of developments in the field since R v. Hufnagl that could have made a significant difference to his analysis. These include vowel (and consonant) parameterization (e.g. formant dynamics; McDougall, 2004), quantification of accuracy and precision (validity and reliability; e.g. Cllr and EER as validity measures), and, most importantly, techniques for handling between-parameter correlations when calculating OLRs (e.g. fusion). Had any of these developments been implemented in R v. Hufnagl (2008), it can confidently be said that the numerical LR would not be identical to that presented in Rose (2013b), as his own reanalysis of the case material also shows; most likely, the strength of the LR would weaken once correlated parameters were accounted for during the combination of speech evidence (as acknowledged in Rose, 2013b).
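Fusion and Cllr can be illustrated together. The sketch below uses logistic-regression fusion, one common way of combining per-parameter scores into a single log LR with weights learned jointly, so that redundant parameters are not simply counted twice. It then evaluates the fused LRs with the log-likelihood-ratio cost, Cllr = ½[mean over same-speaker comparisons of log2(1 + 1/LR) + mean over different-speaker comparisons of log2(1 + LR)]. Assumptions: each comparison is represented by one score per parameter (for example, a per-parameter natural-log LR), the two classes are weighted equally so that the fitted logit can be read as a natural-log LR, the weights are fitted by plain gradient descent, and all data are hypothetical. EER is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fused_log_lr(w, scores):
    """Fused natural-log LR: offset w[0] plus a weighted sum of per-parameter scores."""
    return w[0] + sum(wi * si for wi, si in zip(w[1:], scores))


def train_fusion(ss_scores, ds_scores, step=0.1, epochs=2000):
    """Logistic-regression fusion fitted by batch gradient descent.

    ss_scores / ds_scores: one score vector per same-speaker / different-speaker
    training comparison. Each class carries half of the total weight.
    """
    w = [0.0] * (len(ss_scores[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for comparisons, label in ((ss_scores, 1.0), (ds_scores, 0.0)):
            for s in comparisons:
                # Derivative of the class-weighted cross-entropy w.r.t. the logit.
                err = (sigmoid(fused_log_lr(w, s)) - label) / (2.0 * len(comparisons))
                grad[0] += err
                for i, si in enumerate(s):
                    grad[i + 1] += err * si
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


def cllr(ss_lrs, ds_lrs):
    """Log-likelihood-ratio cost in bits; 1.0 for a system that always outputs LR = 1."""
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in ss_lrs) / len(ss_lrs)
    ds_cost = sum(math.log2(1.0 + lr) for lr in ds_lrs) / len(ds_lrs)
    return 0.5 * (ss_cost + ds_cost)


# Hypothetical two-parameter scores (not case data).
ss_train = [(1.0, 0.8), (0.5, 1.2), (1.5, 0.2), (-0.2, 0.6), (0.8, -0.3), (-0.5, -0.2)]
ds_train = [(-1.0, -0.5), (-0.3, -1.2), (0.4, -0.8), (-1.5, 0.3), (-0.6, -0.1), (0.6, 0.5)]

weights = train_fusion(ss_train, ds_train)
ss_lrs = [math.exp(fused_log_lr(weights, s)) for s in ss_train]
ds_lrs = [math.exp(fused_log_lr(weights, s)) for s in ds_train]
print("fusion weights:", weights)
print("Cllr on training comparisons:", cllr(ss_lrs, ds_lrs))
```

A system that outputs LR = 1 for every comparison has Cllr = 1, and lower values are better. In practice the fusion weights would be trained on comparisons separate from those used for testing; scoring the training comparisons, as this sketch does, gives an optimistically low Cllr.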
The final portion of Rose (2013b) comments on the court’s reaction to the presentation of evidence in the form of a numerical LR, something rarely discussed in forensic phonetics. The expert testimony did not include a complete tutorial on the LR approach; rather, it offered a more abstract presentation of the strength of evidence (the LR). Rose (2013b) condensed his analysis into two main points for the jury, which he emphasized on multiple occasions: (1) the LR estimates the strength of the evidence, not the probability that the suspect is the criminal, and (2) the jury should not give much weight to the specific value of the LR, only to the fact that it was very big. Whether Rose’s testimony made an impression on the triers of fact in R v. Hufnagl is unknown; however, the jury did return a guilty verdict (Rose, 2013b). Rose also notes that the judge’s encouraging attitude towards his approach was perhaps vital to his testimony, as it helped him articulate the strength of the speech evidence to the court. Not all judges can be assumed to act in the same manner, and presenting the same testimony before a different judge, without such support, might have been more challenging.
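Rose’s first point to the jury follows from the odds form of Bayes’ theorem, in which the posterior odds equal the prior odds multiplied by the LR. The expert supplies only the LR; the prior odds are the province of the trier of fact. The sketch below combines the conservative LR of 300,000 from the case with two purely hypothetical priors to show that the same LR can correspond to very different probabilities that the suspect is the criminal.

```python
def posterior_probability(prior_probability: float, lr: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


# The same LR combined with two hypothetical prior probabilities.
for prior in (1e-6, 0.5):
    print(prior, posterior_probability(prior, 300_000))
# 1e-06 -> ≈ 0.23
# 0.5   -> ≈ 0.999997
```

With a prior of one in a million, an LR of 300,000 yields a posterior probability of only about 0.23, whereas the same LR with even prior odds yields a posterior probability above 0.9999.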
Overall, it is encouraging to see an example of a real case in which a numerical LR framework was used. The introduction of Rose’s paper provides useful background on the case and on the type of speech material Rose chose to analyze. The critique at the end of the paper is a positive contribution, as it shows how the field evolved in the five years after the case analysis was completed. The paper also sheds light on the reception of the LR in court, which, again, often receives little attention in the literature. However, the paper perhaps raises more questions (both theoretical and practical) about the implementation of the LR framework (as used by Rose) than it answers. For instance, how does an expert begin to select parameters for analysis under an LR framework? How can an expert justify selecting certain parameters for analysis over others? How is an expert to incorporate qualitative/categorical parameters? And how many parameters need to be analyzed for the evaluation to be considered complete?
Despite raising a new set of questions, Rose (2013b) makes three pertinent statements with respect to LRs that are particularly relevant to the remainder of this thesis. The first is that “real-world cases are never the same” and there is “no one-size-fits-all” with regard to methodology (Rose, 2013b, p. 318). This means that the LR calculation is not the same in every FSC case, or for every phonetic/linguistic parameter selected for analysis; the analysis that leads to an LR will therefore always have to be adapted on a case-by-case basis. The second statement is that FSC might lend itself more readily to a verbal LR than to a numerical LR8. The reason for this is that precise figures may be misleading, as numerical LRs may be difficult for the trier(s) of fact to interpret9 (Rose, 2013b, p. 305). The final statement comes from Judge Hodgson (2002) and is reiterated by Rose (2013b): “since not all types of evidence in a trial can be sensibly assigned a LR there is no way of mathematically combining à la Bayes the LR-based evidence with the non-numerically based evidence” (Rose, 2013b, pp. 316-317). This raises the question of whether there is really an explicit need for speech evidence to be represented in numerical LR form. For example, would a phonetician ever be able to quantify the exact tongue shape of a speaker’s /ɹ/? In this instance, a qualitative description of /ɹ/ will typically be more useful than a quantitative one that is not completely transparent. If such types of evidence remain unsuitable for expression as a numerical LR, can other phonetic/linguistic parameters be made to fit the mold of LR algorithms that require specific quantitative forms? It is also important to consider that if a numerical LR is used, only a partial assessment of the speech evidence is feasible, given that numerical LRs cannot currently be calculated for all speech parameters (because of the lack of appropriate algorithms and/or the qualitative nature of certain parameters) and that population statistics are generally lacking.
8 A verbal LR is simply a verbal, rather than numerical, statement of the probability of obtaining the evidence given the prosecution hypothesis relative to the probability of obtaining it given the defense hypothesis. For example, the verbal statement could be presented as ‘it is very much more probable to obtain the given evidence under hypothesis x than under hypothesis y.’
9 For example, is there really much of a difference between an LR of 1.1 × 10^14 and an LR of 1.11
2.5.2.1 Research Question 4
The literature review provided in the previous sections revealed a number of limitations and difficulties that can occur when applying the numerical LR framework to FSCs, which are largely due to the complexity of speech data. If the field is to continue its efforts to align itself with more advanced forensic disciplines (in terms of conclusion frameworks) that have already adopted the LR framework (e.g. DNA), various aspects of the actual calculation of an LR in an FSC should be reviewed and improved (e.g. modeling techniques, population statistics, combining parameters for OLRs).
(4) For this reason, it is essential to ask: What are the practical limitations/implications that need to be considered when using the numerical LR framework in FSCs?
a. What recommendations, if any, can be provided following attempts to implement the numerical LR framework?
b. What can a human-based (acoustic-phonetic) system tell the field about the ease with which a numerical LR can be computed for FSCs?
The practical limitations and implications associated with the implementation of a numerical LR will be discussed throughout this thesis. It is only through empirical testing that these questions can be addressed.