Chapter VI Results for objective 2: the observable effects of feedback
6.1 Predefined and emergent categories of the effects of feedback
Binaural cues are based on the comparison of the level and timing of signals received by the left and right ears, as depicted in Figure 2.7. The interaural level difference (ILD), shown in Figure 2.7a, is the first of the two main binaural cues. In the image, a low-frequency sound can be seen travelling left-to-right across the page, with a wavelength sufficiently long to pass around the listener without trouble and present an approximately equal sound level to each ear. On the other hand, a high-frequency sound, here travelling right-to-left, cannot diffract around the listener’s head. This gives rise to an acoustic shadow at the far ear, and a large ILD between the ears. Particularly apparent for high-frequency sounds, this cue has been linked to what is termed the ‘better ear’ effect. This is thought to be a process by which the brain can assess the two ears independently and use the ear providing the better SNR (Edmonds and Culling, 2006).
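As a rough illustration of the ILD and the ‘better ear’ idea, the following Python sketch computes a level difference in decibels from two ear signals and picks the ear with the higher SNR. The function names (`rms`, `ild_db`, `better_ear`), the 4 kHz test tone and the halved amplitude at the shadowed ear are illustrative assumptions, not values or methods taken from the studies cited above.

```python
import math

def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def ild_db(left, right):
    """Interaural level difference in dB; positive means the left ear is louder."""
    return 20.0 * math.log10(rms(left) / rms(right))

def better_ear(snr_left_db, snr_right_db):
    """Return the label and SNR (dB) of the ear with the higher SNR."""
    if snr_left_db >= snr_right_db:
        return "left", snr_left_db
    return "right", snr_right_db

# Hypothetical example: a 4 kHz tone whose amplitude is halved
# at the shadowed (right) ear. Sample rate and duration are assumptions.
fs = 16000
tone = [math.sin(2 * math.pi * 4000 * n / fs) for n in range(fs // 10)]
left = tone
right = [0.5 * x for x in tone]

print(round(ild_db(left, right), 2))  # halving the amplitude gives about 6.02 dB
print(better_ear(3.0, -2.0))          # hypothetical per-ear SNRs in dB
```

Halving the amplitude corresponds to a level difference of 20 log10(2), roughly 6 dB. Real head-shadow ILDs depend on frequency and source direction, which this sketch does not model.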
The binaural timing cue, the interaural time difference (ITD), arises from the fact that a sound from off to one side will take longer to reach the ear on the further side of the head (cf. Figure 2.7b). On the other hand, a sound from a source directly in front of (or behind) the listener will travel an identical path length to each ear, and will arrive at both ears simultaneously. This cue is linked to the concept of ‘binaural unmasking’, which relates to the suppression of signals arriving with a given ITD between the two ears, perhaps via the equalisation-cancellation model described by Durlach (1963). At frequencies below 1.5 kHz, ITDs are thought to arise from a comparison of the two signals themselves, while at higher frequencies it has been suggested that the ITD cue is conveyed by the envelope of the signal instead (Wightman and Kistler, 1992).
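The path-length argument behind the ITD can be made concrete with Woodworth's classical spherical-head approximation, ITD = (a / c)(θ + sin θ), where a is the head radius, c the speed of sound and θ the source azimuth. This model is not part of the text above; it is introduced here only as a minimal sketch, and the head radius (0.0875 m) and speed of sound (343 m/s) are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, in metres
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, in m/s

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source in the horizontal plane.

    Uses Woodworth's spherical-head formula, intended for azimuths
    between -90 and +90 degrees (0 = straight ahead).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    print(azimuth, round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

Under these assumptions the ITD is zero for a source straight ahead and grows to roughly 650 microseconds for a source directly to one side, consistent with the qualitative description in the text.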
Taken together, ITD and ILD provide cues required to locate a sound source. The ITD cue provides a strong clue as to the azimuth of the source, i.e., its position left to right, particularly for low-frequency sources such as the voice (Wightman and Kistler, 1992). Additionally, recent research suggests that processing of the ILD cue in the central auditory system may be involved in the perception of the distance of a source (Jones et al., 2013). However, these binaural cues are unable to provide any information regarding the elevation of the source, since both ITD and ILD will be zero for a source located directly ahead, irrespective of its height above or below the listener. For this, monaural cues (or head movements) are required.
In the presence of reverberation, these binaural cues become less reliable (Rakerd and Hartmann, 1985). Judgements of ITD are disrupted as the reflected energy increases the decorrelation between the two signals received at each ear. Additionally, ILD cues may alter unpredictably since they depend on the listener’s exact position within the enclosure. Thus the spectral balance of the various sound components may vary dramatically depending on the location of the source and listener relative to the room modes and in proximity to particularly reflective or absorbent surfaces. Interestingly, distance perception has been reported to improve in reverberant conditions, while direction accuracy worsens (Shinn-Cunningham, 2000).
The latter finding may be explained by the fact that lateralisation is cued largely by the temporal fine structure of the ITD, which is decorrelated in the presence of reverberation (Smith et al., 2002). Indeed, Devore and Delgutte (2010) suggest that ILDs may provide more reliable directional information than envelope ITDs for localising high-frequency sounds in reverberant conditions.
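The decorrelation described above can be illustrated with a toy calculation of interaural coherence, taken here as the peak of the normalised cross-correlation between the two ear signals over a small range of lags. The sketch below is an assumption-laden simplification: reverberation is crudely modelled as independent noise added at each ear, and the function name `interaural_coherence`, the 8-sample interaural delay and the signal length are all hypothetical choices made for illustration.

```python
import math
import random

def interaural_coherence(left, right, max_lag):
    """Peak of the normalised cross-correlation over lags in [-max_lag, max_lag]."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Positive lag: the right-ear signal lags the left-ear signal.
            acc = sum(left[i] * right[i + lag] for i in range(n - lag))
        else:
            acc = sum(left[i - lag] * right[i] for i in range(n + lag))
        best = max(best, abs(acc) / norm)
    return best

rng = random.Random(0)      # fixed seed so the sketch is repeatable
n, delay = 2000, 8          # assumed signal length and interaural delay (samples)

direct = [rng.gauss(0.0, 1.0) for _ in range(n)]
dry_left = direct
dry_right = [0.0] * delay + direct[: n - delay]   # delayed copy at the far ear

# Crude stand-in for reverberation: independent noise at each ear.
rev_left = [x + rng.gauss(0.0, 1.0) for x in dry_left]
rev_right = [y + rng.gauss(0.0, 1.0) for y in dry_right]

coh_dry = interaural_coherence(dry_left, dry_right, max_lag=16)
coh_rev = interaural_coherence(rev_left, rev_right, max_lag=16)
print(coh_dry > coh_rev)
```

In the anechoic case the two ear signals differ only by a delay, so the coherence is close to 1 at the matching lag. Adding uncorrelated energy at each ear lowers the peak, which gives a sense of why ITD judgements based on the fine structure become less reliable. Real room reflections are correlated with the source and would need a proper room model.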
Monaural localisation cues exist in the form of directional filtering that results from the interaction of the sound source with the various parts of the human anatomy.
The pinna in particular provides a spectral cue for elevation, as the original source sound is modified with a different frequency profile depending on ‘where’ it originates: level with, above or below the ear (Pickles, 1988). Secondly, the pitch percept, or its physical correlate, F0, might be thought of as a monaural cue assisting with ‘what’ is heard. Maintained throughout the various stages of auditory processing, differences of pitch assist listeners in segregating concurrent sounds, and similar pitch contours assist the grouping of sounds originating from a single source (Darwin, 1984). However, Culling et al. (2003) report that reverberation additionally hinders listeners’ ability to capitalise on F0 cues. Their experiments showed that monotonised speech (in which there is no F0 modulation, and thus reduced prosodic information) was harder for listeners to segregate when stimuli were reverberated rather than presented in anechoic conditions.
Integration of cues
As stated earlier, listeners are remarkably robust to the effects of real-room reverberation. In part, this is likely to be due to the fact that listeners are typically active participants in their acoustic environments. By moving, tilting and rotating the head, a great deal of ambiguity about where the signal originated can be resolved. For example, by tilting the head, an elevation cue may be transformed into an azimuth cue, which is easier to resolve with a greater degree of accuracy. The ways in which such cues are swapped or integrated have not yet been well studied, however, since the bulk of the relevant research to date has used headphone presentation, which does not allow this kind of cue-swapping to occur.
Moreover, it is not yet understood how monaural and binaural aspects of hearing combine, or take over from one another, particularly in reverberant listening tasks. The two binaural cues discussed earlier are thought to contribute to the phenomenon termed spatial release from masking (SRM), which combines better-ear listening and binaural unmasking (Lavandier and Culling, 2010; Zurek, 1993).
While having two ears is clearly a benefit as it allows SRM, it is still not clear whether that benefit arises from switching between the two monaural systems (left then right individually, using the better ear effect which favours the ear with the higher SNR) or from actually synthesising concurrent information from the two ears simultaneously (as occurs in binaural unmasking arising from the ITDs). Using modulated noise maskers with simulated room reverberation, ongoing research in this area aims to better understand the relative importance of these contributions in the context of speech intelligibility (see e.g., Culling and Mansell, 2013; Weller et al., 2014).
Though questioned recently by Jones et al. (2013), the duplex theory has stood for around a hundred years, suggesting that localisation relies on time-based cues for low-frequency sounds, and level-based cues for high-frequency sounds. Clearly, to achieve good speech identification a listener must also integrate information across frequency and through time. Culling et al. (2006) examined the contributions of monaural and binaural cues, separately and in combination, to speech perception. In their study, the binaurally-derived information was helpful only in the low-frequency regions (below around 1.2 kHz), whereas monaurally-derived cues were beneficial throughout the entire frequency region. In other words, binaural information helped (alongside monaural cues) to uncover the pitch and formant structure of the speech in this task, whereas the high-frequency consonant identification was achieved by (only) the monaural system.