That visual speech could enhance our perception of auditory speech was first demonstrated behaviourally over 60 years ago (Sumby and Pollack, 1954, O’Neill, 1954). Specifically, it was shown that intelligibility was enhanced in noise, equivalent to an increase of up to 15 dB in signal-to-noise ratio (SNR; Sumby and Pollack, 1954), with a 1-dB improvement in SNR leading to a 5–10% increase in intelligibility, depending on speech materials (Miller et al., 1951). This led to the impression that visual speech only enhanced hearing in suboptimal listening conditions and (in line with the principle of inverse effectiveness) that this effect was inversely related to SNR and hearing ability (Erber, 1969, Erber, 1975, Erber, 1971, McCormick, 1979, Neely, 1956, Binnie et al., 1974). However, a re-examination of Sumby and Pollack’s findings (Remez, 2005) showed that the benefit of AV speech is not limited to degraded acoustic environments. It has also been shown using extended passages of natural speech (instead of discrete tokens) that AV speech is beneficial in easy-to-hear (but hard-to-understand) environments, increasing the speed at which participants could repeat words in real time (Reisberg et al., 1987) as well as improving comprehension (Arnold and Hill, 2001). The former finding fits with studies that have observed faster RTs in response to AV syllables (Besle et al., 2004a, Klucharev et al., 2003). Note that enhanced intelligibility has been demonstrated at every level of speech, including syllables (Bernstein et al., 2004b), words (Sumby and Pollack, 1954) and sentences (Grant and Seitz, 2000).

It was recently argued (Ross et al., 2007a) that many of the early behavioural studies that demonstrated an inverse relationship between AV enhancement and SNR (i.e., inverse effectiveness) may have oversimplified this relationship. Several of these studies used a delimited set of word stimuli that were presented to the participants beforehand in the form of checklists (Sumby and Pollack, 1954, Erber, 1969, Erber, 1975). Thus, it is likely that speech-reading scores were artificially high due to familiarity, particularly at lower SNRs where intelligibility is more susceptible to ceiling effects (Ross et al., 2007a, Holmes, 2009, Bernstein et al., 2004a). Furthermore, measures of multisensory gain can be erroneously high depending on how they are calculated, i.e., absolute value versus relative percentage (Ross et al., 2007a, Holmes, 2009), and can also be affected by the density of the lexical neighbourhood of the word stimuli (Tye-Murray et al., 2007). To circumvent these shortcomings, Ross et al. (2007a) conducted a word recognition task (as opposed to detection) at multiple SNRs between 0 dB and auditory threshold (−24 dB). They presented a much larger set of words so that each presentation was unique and there were no checklists available to participants, which greatly reduced speech-reading accuracy (< 10%). In doing so, they demonstrated two very important behavioural aspects of AV speech: (1) the enhancement conferred by AV speech is far greater than that accounted for by speech-reading ability, i.e., it reflects multisensory interactions and (2) AV gain is greatest at an intermediate SNR (−12 dB), not at threshold (−24 dB). In other words, AV gain does not follow the principle of inverse effectiveness beyond a certain SNR.
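
To make the point about absolute value versus relative percentage concrete, the following minimal sketch compares three ways of expressing AV gain. The function names and the scores are hypothetical and invented purely for illustration (they are not data from Ross et al., 2007a or any other study), and the third, headroom-normalised measure is included only as an assumed additional formulation for comparison.

```python
def absolute_gain(av, a):
    """AV benefit as a simple difference in proportion correct (AV - A)."""
    return av - a


def relative_gain(av, a):
    """AV benefit as a percentage of auditory-alone performance."""
    if a <= 0:
        raise ValueError("auditory-alone score must be > 0 for a relative gain")
    return 100.0 * (av - a) / a


def normalised_gain(av, a):
    """AV benefit as a fraction of the remaining headroom (1 - A)."""
    if a >= 1:
        raise ValueError("auditory-alone score must be < 1 for a normalised gain")
    return (av - a) / (1.0 - a)


# Hypothetical proportions correct, keyed by SNR in dB:
# (auditory-alone, audiovisual). Invented for illustration only.
scores = {
    -24: (0.02, 0.10),
    -12: (0.30, 0.75),
    0: (0.85, 0.97),
}

for snr_db, (a, av) in scores.items():
    print(
        f"{snr_db:>4} dB  "
        f"absolute={absolute_gain(av, a):.2f}  "
        f"relative={relative_gain(av, a):.0f}%  "
        f"normalised={normalised_gain(av, a):.2f}"
    )
```

With these invented numbers, the absolute gain is largest at −12 dB, the relative percentage is largest at −24 dB (because the auditory-alone denominator is tiny), and the headroom-normalised gain is largest at 0 dB. The same data can therefore appear to support or contradict inverse effectiveness depending solely on the measure chosen.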

Aside from studying how AV speech integration is impacted by background noise, many studies have investigated the impact of the semantic congruency between the auditory and visual streams. Much of this work was inspired by an influential study that accidentally demonstrated an interesting AV speech illusion, known as the McGurk effect (McGurk and MacDonald, 1976). The McGurk effect is a phenomenon whereby a particular incongruent pairing of auditory and visual syllables can produce the perceptual illusion of a syllable that was neither heard nor seen. For example, they found that when an auditory /ba/ was dubbed onto a visual /ga/, the syllable perceived by participants was consistently /da/. The effect has since been replicated by numerous studies and in numerous different languages (Jiang and Bernstein, 2011, Summerfield and McGrath, 1984, Green and Kuhl, 1989, Sekiyama and Tohkura, 1991, Massaro et al., 1995). The McGurk illusion is even insensitive to knowledge of its basis (Campbell, 2008), although the nature of the fusion effect has been shown to be subject-dependent (Schwartz, 2010). Other than influencing how we perceive speech, incongruent AV pairings can alter our performance, delaying RTs relative to congruent AV speech and unimodal speech (Klucharev et al., 2003). While the McGurk effect has advanced our understanding of how the human brain integrates AV speech, it is usually perceived in a controlled experimental setting with well-synchronised AV stimuli and is not an illusion typically encountered in everyday life. It has been suggested that the spatial and temporal coherence of such incongruent stimuli may be a strong cue to their co-processing and ‘binding’ (Campbell, 2008).

The visual component of speech that contributes towards enhanced behaviour is not limited to the mouth; it also includes movements of the head and eyebrows (Yehia et al., 2002, Munhall et al., 2004a, Thomas and Jordan, 2004), and even haptic movements (Fowler and Dekle, 1991). Furthermore, it has been shown that humans typically perceive desynchronised AV syllables as occurring simultaneously for audio leads of up to 90 ms and audio lags of up to 170 ms, and perceive McGurk fusion effects for audio leads of up to 30 ms and audio lags of up to 170 ms (Fig. 2.9; van Wassenhove et al., 2007, Miller and D'Esposito, 2005, Grant et al., 2004). This ~250 ms window of integration corresponds roughly to the average length of a syllable; thus, it has been suggested that syllables may be an important unit of computation in AV speech processing (van Wassenhove, 2013). Furthermore, it has been shown that this window is narrower and more asymmetric for speech versus non-speech stimuli, in support of the notion that this tolerance is fine-tuned to the natural statistics of AV speech (Maier et al., 2011). It has also been suggested that the brain tolerates AV asynchronies because of the differences in the speeds of sight and sound, as well as differences in transduction times and neural latencies (Alais et al., 2010). Although asynchrony detection has not been shown to reflect speech-reading ability (Grant and Seitz, 1998), it has been shown to predict susceptibility to the McGurk effect (Stevenson et al., 2012b).

Figure 2.9: Temporal window of encoding and integration in AV speech.

The temporal encoding window (top) represents the time necessary for speech encoding and the temporal integration window (bottom) represents the encoding window plus the tolerated temporal noise leading to suboptimal encoding performance (adapted from van Wassenhove, 2013).
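
As a rough illustration of the asymmetric tolerance described above, the sketch below encodes the reported limits (audio leads of up to 90 ms and lags of up to 170 ms for perceived simultaneity; leads of up to 30 ms and lags of up to 170 ms for McGurk fusion) as simple intervals. All names are hypothetical, the sign convention (negative offset means the audio leads) is an assumption of this example, and the hard boundaries are a deliberate simplification, since perceptual judgements fall off gradually rather than at a fixed cut-off. The second function estimates the audio lag produced by the slower speed of sound, assuming roughly 343 m/s in air at room temperature and treating the travel time of light as negligible.

```python
# Offsets are audio onset minus visual onset, in milliseconds.
# Negative = audio leads the visual stream; positive = audio lags.
SIMULTANEITY_WINDOW_MS = (-90.0, 170.0)  # limits quoted in the text above
FUSION_WINDOW_MS = (-30.0, 170.0)        # limits quoted in the text above


def within_window(offset_ms, window):
    """Return True if an AV offset falls inside a (lead, lag) window.

    Simplification: treats the window as all-or-none, whereas
    behavioural judgements change gradually near the edges.
    """
    lead_limit, lag_limit = window
    return lead_limit <= offset_ms <= lag_limit


def acoustic_lag_ms(distance_m, speed_of_sound_m_s=343.0):
    """Approximate audio lag caused by sound travelling distance_m.

    Assumes ~343 m/s (air at about 20 degrees C) and ignores the
    travel time of light, which is negligible at these distances.
    """
    return 1000.0 * distance_m / speed_of_sound_m_s


for offset in (-60.0, 100.0, 200.0):
    print(
        f"offset {offset:+.0f} ms: "
        f"simultaneous={within_window(offset, SIMULTANEITY_WINDOW_MS)}, "
        f"fusion={within_window(offset, FUSION_WINDOW_MS)}"
    )

# A talker 10 m away produces a natural audio lag of roughly 29 ms.
print(f"audio lag at 10 m: {acoustic_lag_ms(10.0):.1f} ms")
```

Under these assumptions, an audio lead of 60 ms falls inside the simultaneity window but outside the fusion window, and the audio lag introduced by ordinary viewing distances sits comfortably within the tolerated lag.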