
Different lines of research have devised metrics of phonological similarity, from purely psycholinguistic studies to language engineering.

Many methods to measure phonological similarity consider the segmental level. Focusing on speech production, speech error analyses such as slip-of-the-tongue studies provide information on the importance of different segments for overall word-form similarity. By comparing which phonemes are replaced by which in speech errors, Stemberger (1991) composed a confusion matrix that quantifies the degree of confusability between each phoneme pair. The more confusable two phonemes are, the more similar they are assumed to be in a language's phonological representation map.
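As a rough illustration of how a confusion matrix can be tallied from speech errors, the following sketch counts how often each unordered pair of phonemes is exchanged in a list of (intended, produced) errors. The function names and the toy error list are hypothetical and are not taken from Stemberger's (1991) corpus or procedure; normalising by the total number of errors is likewise an assumption made only to keep the example small.

```python
from collections import Counter


def confusion_counts(errors):
    """Count substitutions per unordered phoneme pair.

    `errors` is an iterable of (intended, produced) phoneme pairs.
    """
    counts = Counter()
    for intended, produced in errors:
        if intended != produced:
            # frozenset makes (p, b) and (b, p) the same pair
            counts[frozenset((intended, produced))] += 1
    return counts


def confusability(counts, a, b):
    """Share of all recorded errors that involve the pair (a, b)."""
    total = sum(counts.values())
    return counts[frozenset((a, b))] / total if total else 0.0


# Hypothetical toy data, for illustration only.
toy_errors = [("p", "b"), ("b", "p"), ("p", "t"), ("s", "f"), ("p", "b")]
toy_counts = confusion_counts(toy_errors)
print(confusability(toy_counts, "p", "b"))  # 3 of the 5 errors involve p/b
```

Under this simplification, a higher share for a pair corresponds to higher assumed similarity between the two phonemes.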

Stemberger's confusion matrix has been used to test the accuracy and psychological plausibility of other similarity metrics. For instance, Frisch (1996) used it to support his choice of a phoneme similarity metric based on phoneme representations derived from Broe's (1993) structured specification theory.

The priming paradigm has been used in psycholinguistics to study word recognition (see an overview in Zwitserlood, 1996). A target word is preceded by a prime word that shares one parameter with the target. If the prime affects the processing of the target, then the parameter they share must be involved in lexical access. In priming studies of phonological similarity, the results are conflicting when the prime and target share their initial segments (see review in Radeau, Morais & Seguí, 1995), but sharing the final segments, particularly if the words rhyme, has been shown to facilitate target processing (see Dumay et al., 2001, for review).

Phonological similarity has also been measured in relation to the phonological similarity effect described by Conrad and Hull (1964), who found that when people are asked to recall a list of words, they perform worse if the words sound similar to each other (although Lian & Karlsen, 2004, found that the effect depends on the type of phonological similarity considered, as reviewed in § 6.2.3.3 in chapter six). This effect is also found when words are read instead of heard, which is best explained by Baddeley and Hitch's (1974) model of working memory, which includes a component that recodes visual (orthographic) information into a phonological representation. In order to study the phonological similarity effect, researchers needed sets of phonologically similar and dissimilar stimuli. One method used to quantify phonological dissimilarity is Psimetrica (Phonological SImilarity METRIC Analysis), developed by Mueller, Seymour, Krawitz, Kieras and Meyer (2003) to test models of verbal working memory; their results supported Baddeley's model. For each word pair, Psimetrica returns a multi-dimensional vector that includes information about dissimilarity along a number of parameters such as rhyme, stress pattern or syllable onset match. The technique first defines each word in terms of a number of parameters or dimensions. It then aligns the two words and quantifies the level of matching for each dimension. Finally, the results are averaged over all the word pairs to yield the mean phonological dissimilarity profile of the word set.
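The final averaging step can be sketched as follows. This is only a schematic illustration of a per-dimension dissimilarity profile averaged over all word pairs; it is not Psimetrica itself. The three dimensions, the binary match/mismatch scoring, the toy word descriptions and all names are assumptions, and the alignment step is omitted because each word here has a single value per dimension.

```python
from itertools import combinations

# Hypothetical dimensions and word descriptions, for illustration only.
DIMENSIONS = ("onset", "rhyme", "stress")
WORDS = {
    "cat": {"onset": "k", "rhyme": "at", "stress": "1"},
    "bat": {"onset": "b", "rhyme": "at", "stress": "1"},
    "dog": {"onset": "d", "rhyme": "og", "stress": "1"},
}


def pair_dissimilarity(w1, w2):
    """One value per dimension: 0.0 if the words match on it, 1.0 if not."""
    return tuple(0.0 if w1[d] == w2[d] else 1.0 for d in DIMENSIONS)


def mean_profile(words):
    """Average the per-dimension dissimilarity over all word pairs."""
    vectors = [pair_dissimilarity(a, b)
               for a, b in combinations(words.values(), 2)]
    if not vectors:
        return tuple(0.0 for _ in DIMENSIONS)
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


profile = mean_profile(WORDS)
print(dict(zip(DIMENSIONS, profile)))
```

In this toy set every pair differs in onset, two of the three pairs differ in rhyme, and no pair differs in stress, so the profile is highest for onset and zero for stress.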

Several methods for measuring word similarity across languages were developed with the purpose of automatic cognate identification. Cognates are words from different languages that share the same etymological origin, such as 'pronounce' in English and 'pronunciar' in Spanish, both derived from the Latin verb 'pronuntiare'. These methods look for orthographically or phonetically similar words across different languages. This task involves searching and matching, including finding the word alignment that yields the best possible similarity score. Some of these methods measure the similarity of orthographic forms. One is the Longest Common Subsequence Ratio or LCSR (Melamed, 1999), which divides the length of the longest common subsequence (common characters in the same order) by the length of the longer of the two strings. Another is Dice's coefficient, used by Brew and McKelvie (1996), which equals twice the number of shared bigrams divided by the total number of bigrams in the two strings. Other methods measure the similarity of phonological forms, such as ALINE (Kondrak, 2000), which uses a list of parameters based on phonological features ranked by salience and then finds the optimal alignment of strings. The best parameter values for finding cognates are found by a hill-climbing search that optimises the values for the task at hand (in this case, cognate matching).
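The two orthographic measures can be made concrete with a minimal sketch that follows the verbal definitions given above. The function names are hypothetical, and counting repeated bigrams as a multiset is an assumption, since the definition above does not say how duplicates are treated.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """LCS length divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def bigrams(s):
    """Multiset of adjacent character pairs in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a, b):
    """Twice the shared bigrams over the total bigrams of both strings."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    shared = sum((ba & bb).values())  # multiset intersection
    return 2 * shared / total if total else 0.0


print(lcsr("pronounce", "pronunciar"))  # 7 / 10
print(dice("pronounce", "pronunciar"))  # 10 / 17
```

For the cognate pair used above, the longest common subsequence is 'pronunc' (7 characters out of 10 in the longer word), and the words share 5 of their 8 and 9 bigrams respectively.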

McMahon and McMahon (2003) propose that quantitative methods drawn from the field of genetics should be applied to the classification of languages into families. They used measures of phonological similarity between cognates to generate an unrooted phylogenetic tree for Indo-European languages. Another quantitative approach is that of Kirby and Ellison (in preparation), who carried out a study of language phylogeny based on similarity within and between languages. They created vector representations of the phonological lexicons of 95 different languages, using edit distances to compare words within each language rather than cognates across languages. They then compared the 95 languages using the divergence of their distributions of confusion probabilities. Finally, using a neighbour-joining algorithm, they constructed a language phylogenetic tree that reflected a plausible evolution of the Indo-European language family.
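To make the notion of edit distance between words of the same lexicon concrete, here is a minimal sketch. It assumes the standard Levenshtein distance with unit costs for insertion, deletion and substitution; the text above does not specify which variant Kirby and Ellison used, and the toy lexicon and function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumed variant)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pairwise_distances(lexicon):
    """Edit distance for every pair of words within one lexicon."""
    return {(a, b): edit_distance(a, b) for a, b in combinations(lexicon, 2)}


# Hypothetical three-word lexicon, for illustration only.
toy_lexicon = ["casa", "cosa", "mesa"]
distances = pairwise_distances(toy_lexicon)
print(distances)
```

The resulting within-language distances are the kind of raw material from which a vector representation of a lexicon could be built; the later steps (confusion probabilities, divergence, neighbour joining) are not sketched here.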

Phonological similarity is also used in a spoken document retrieval method (Crestani, 2003) that combines the phonological and semantic similarity of the search term with the terms contained in the documents to be searched. Crestani used a metric of phonological similarity between two words devised by Ng (1999), which uses the values in a phone confusion matrix (how liable each phoneme is to be misperceived as, or used instead of, another one).

The last few paragraphs present many studies that have measured phonological similarity, some focusing on individual segments and some on whole words, for a variety of purposes, briefly summarised in Table 3.1.

| Parameters | Paradigm | Results |
| --- | --- | --- |
| Shared phonemes | Speech errors | Phoneme confusion matrix |
| Shared segmental positions | Priming | Determines impact of parameters on lexical processing |
| Various at different levels (rhyme, stress, shared sequences) | Quantitative methods | Quantifies impact of parameters on phonological similarity |

Table 3.1. Summary of metrics of phonological similarity.

The next section presents a metric of similarity between whole word-forms based on identity at the segmental level that measures the relative importance of position, stress pattern and syllabic structure. This metric differs from the ones described above in several respects. First, unlike the phoneme similarity studies, I measure the similarity of whole words rather than of single segments. Second, using a psycholinguistic methodology means that, as in the case of priming studies, I am not measuring pure phonological similarity, but rather word-form similarity, since other factors such as morphology may affect the results. Third, unlike cognate identification and document matching, this empirical metric is not looking for certain types of similarity with a specific purpose in mind. Rather, I offer parameter combinations in a forced-choice task and analyse people's responses. Finally, this method does not take into account the identity of the phonemes compared, as spoken document retrieval systems do. Instead, I consider the positions in the words together with information on whether they are consonants or vowels and whether they are stressed or not. I then measure the impact of these parameters on the estimation of the overall perceived similarity between word-forms.

The next section describes the study and discusses the results in the light of current psycholinguistic theories.

3.2.2 Word-form similarity perception in Spanish: an empirical
