4.2 Objective of the proposal
4.4.2 Module II: Action Plan
4.4.2.1. Strategies for improving internal communication
The ability of the human auditory system to detect a sound is determined by the physical properties of sound waves in air and by their interaction with the structures of the outer, middle, and inner ear. The following sections cover these topics in order to identify guidelines that an auditory display must follow to ensure that a sound is at least detectable. Unless otherwise noted, the descriptions and values in this section come from Gelfand (2004). For reference, Figure 4.1 gives a rough outline of the anatomy of the ear.
Figure 4.1: Outer, middle, and inner ear. Public domain image retrieved from http://en.wikipedia.org/wiki/Image:HumanEar.jpg.
Physical acoustics
In physical terms, a sound is a longitudinal pressure wave traveling through an elastic medium (e.g., air). A simple sinusoidal sound wave may be characterized along three basic dimensions: amplitude, frequency, and duration. The amplitude of a sound wave is the maximum magnitude of the air-molecule displacement caused by the wave at a single point in space. The frequency of a sound is the rate at which these fluctuations occur at that location. The duration of a sound is the length of time over which the pressure oscillations continue at that location. The first two of these concepts extend to two properties useful for characterizing more complex sounds: intensity and spectrum.
Intensity is the power carried by the sound wave per unit area. In a free field, an environment with no sound wave reflections, the intensity of a sound falls off in inverse proportion to the square of the distance from the source. The spectrum of a sound describes the magnitudes of its sinusoidal frequency components, as given by Fourier's theorem. Individual sound spectra can be combined into a single spectrum representing the intensities of all frequencies reaching a point.
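The inverse-square relation above can be expressed in decibels. The sketch below assumes an ideal point source in a free field; the function name is hypothetical and not taken from the source.

```python
import math

def free_field_level_drop_db(r1: float, r2: float) -> float:
    """Drop in intensity level (dB) when moving from distance r1 to r2 from a
    point source in a free field, assuming intensity is proportional to 1/r**2."""
    if r1 <= 0 or r2 <= 0:
        raise ValueError("distances must be positive")
    # I2 / I1 = (r1 / r2)**2; the level change is 10*log10 of that ratio,
    # negated so that moving away gives a positive "drop".
    return -10.0 * math.log10((r1 / r2) ** 2)

# Doubling the distance lowers the level by about 6 dB.
print(round(free_field_level_drop_db(1.0, 2.0), 2))  # 6.02
```

Every doubling of distance costs about 6 dB, so in a reflection-free setting a display cannot assume that a cue audible at one listener position remains audible much farther away.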
Given just these fundamental properties, it is possible to name some rough absolute thresholds and just noticeable differences (JNDs) for sound detection based on empirical population data. The least intense sound a human listener is able to hear is $10^{-12}$ W/m$^2$ (0 dB IL, intensity level), while the most intense sound a listener can tolerate without pain is around 1 W/m$^2$ (120 dB IL). The JND for intensity is approximately 1 dB, though both it and the thresholds are frequency dependent. The minimum frequency a listener can hear is 20 Hz and the maximum is roughly 20,000 Hz, though the recommended limits for auditory display are 80 Hz to 10 kHz (Walker and Kramer, 2004). The JND for frequency can be expressed in cents, a logarithmic unit of frequency ratio: the interval between frequencies $f_1$ and $f_2$ measures $1200 \log_2(f_2/f_1)$ cents, so an octave spans 1200¢. With this measure, the JND is roughly 5¢, though both the JND and the thresholds for frequency are intensity dependent. The minimum amount of time a sound must persist for detection is negligible, while the maximum amount of time is practically infinite.
However, a listener can judge the tonal properties of a sound only if it continues for roughly 10 to 30 ms; again, these times are frequency dependent (Beament, 2001). Any auditory display should produce output well within these limits.
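The thresholds above can be collected into a few helper functions. This is a minimal sketch: the dB IL reference (10^-12 W/m^2) and the cent formula follow the text, while the helper names and the use of the upper 30 ms figure as a minimum duration are assumptions chosen for illustration.

```python
import math

I_REF = 1e-12  # reference intensity in W/m^2, i.e. 0 dB IL

def intensity_level_db(intensity_w_m2: float) -> float:
    """Intensity level in dB IL relative to 1e-12 W/m^2."""
    return 10.0 * math.log10(intensity_w_m2 / I_REF)

def cents(f1: float, f2: float) -> float:
    """Size of the interval from f1 to f2 in cents (1200 per octave)."""
    return 1200.0 * math.log2(f2 / f1)

def within_display_limits(freq_hz: float, level_db: float, duration_s: float) -> bool:
    """Hypothetical rough check against the figures quoted in the text:
    the 80 Hz - 10 kHz recommended range, a level above 0 dB IL but below the
    ~120 dB IL pain threshold, and at least 30 ms (the conservative end of the
    10-30 ms range) so that tonal properties can be judged."""
    return (80.0 <= freq_hz <= 10_000.0
            and 0.0 < level_db < 120.0
            and duration_s >= 0.030)

print(intensity_level_db(1e-12))           # 0.0
print(round(intensity_level_db(1.0), 3))   # 120.0
# A ~5 cent JND at 440 Hz corresponds to a shift of only about 1.3 Hz.
print(round(cents(440.0, 441.3), 1))       # 5.1
```

Because cents are relative, the same 5¢ JND corresponds to a much larger step in hertz at high frequencies than at low ones.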
Outer Ear
The outer ear is a set of physical structures that shape the spectrum of all sound waves reaching the listener. The shoulders, the head, the pinnae (the outer folds of cartilage and skin), and the ear canals all contribute to the head-related transfer function (HRTF). The HRTF is the Fourier transform of the impulse response measured at the eardrum for a sound located at a given azimuth, elevation, and distance from the head. It acts as a filter defining which frequency components are attenuated and which are amplified before reaching the tympanic membrane and passing through to the middle and inner ear. For reference, Figure 4.2 depicts the general shape of the outer ear transfer function for a sound presented directly in front of the listener.
Figure 4.2: Impulse response of the head related transfer function for a source located at 0° azimuth.
The roughly spherical shape of the head obstructs some sound waves and introduces a delay between the arrival of others at the two ears. When a source is placed directly in front of or behind the head (0° or 180° azimuth), its sound waves arrive at the two ears in phase. When the source is offset so that it is closer to one side of the head than the other, the sound waves incident on the ears are affected in two ways. If the wavelength of the sound is shorter than the diameter of the head (0.18 m, the wavelength of a roughly 1.9 kHz tone), the sound is largely blocked by the head. This sound shadow attenuates high-frequency sounds at the far ear, creating an interaural intensity difference (IID). On the other hand, if the wavelength of the sound is longer than the diameter of the head, the wave diffracts and wraps around to reach the far ear. The extra distance traveled to reach that ear results in a phase delay relative to the wave received by the near ear, creating an interaural phase difference (IPD). Both effects are important cues for localizing the azimuth of a sound source. Nevertheless, neither helps in distinguishing sounds in front of the head from those behind it: the cues are symmetric about the frontal (coronal) plane through the ears. Head movement is required to determine reliably whether a sound is coming from the front or the back.
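The head-shadow crossover and the interaural delay can be estimated numerically. This sketch assumes a speed of sound of 343 m/s and uses the Woodworth rigid-sphere approximation for the interaural time difference, a standard textbook model that the text itself does not name; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumption)
HEAD_DIAMETER = 0.18    # m, value quoted in the text

def shadow_crossover_hz(diameter_m: float = HEAD_DIAMETER,
                        c: float = SPEED_OF_SOUND) -> float:
    """Frequency whose wavelength equals the head diameter; above it the
    head casts an acoustic shadow (IID cue)."""
    return c / diameter_m

def woodworth_itd_s(azimuth_deg: float, radius_m: float = HEAD_DIAMETER / 2,
                    c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference for a rigid spherical head (Woodworth
    approximation), meant for azimuths between -90 and 90 degrees."""
    theta = math.radians(azimuth_deg)
    return (radius_m / c) * (theta + math.sin(theta))

def ipd_rad(freq_hz: float, azimuth_deg: float) -> float:
    """Interaural phase difference implied by the ITD at one frequency."""
    return 2 * math.pi * freq_hz * woodworth_itd_s(azimuth_deg)

print(round(shadow_crossover_hz()))          # 1906
print(round(woodworth_itd_s(90) * 1e3, 2))   # 0.67 (ms, source at the side)
```

The formula only covers the frontal hemisphere: a rigid sphere gives the same delay for a source at 30° and one at 150°, which is the front-back ambiguity described above.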
The two pinnae, often called simply the ears, feature complex folds of skin and cartilage. These folds cause numerous reflections of the sound waves reaching the pinnae, resulting in constructive and destructive interference. The primary result of this interference is a large attenuation, or notch, in the HRTF magnitude response at about 10 kHz. The frequency of this notch shifts with the elevation of the sound source relative to the horizontal (azimuthal) plane, and the notch is the primary cue for determining source elevation.
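A single delayed reflection is enough to show how interference carves a notch into a transfer function. The sketch below evaluates the discrete-time Fourier transform of a toy impulse response (direct sound plus one reflection); the sample rate, delay, and reflection gain are hypothetical values picked so that the notch lands near 10 kHz, and a real pinna produces many reflections rather than one.

```python
import cmath
import math

def frequency_response_db(impulse_response, freq_hz, sample_rate_hz):
    """Magnitude (dB) of the discrete-time Fourier transform of an impulse
    response, evaluated at a single frequency."""
    h = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
            for n, x in enumerate(impulse_response))
    return 20 * math.log10(abs(h))

# Hypothetical toy "pinna" response: the direct sound plus one reflection
# 2 samples (50 microseconds, about 1.7 cm of extra path) later at 90% amplitude.
FS = 40_000
toy_hrir = [1.0, 0.0, 0.9]

for f in (0, 5_000, 10_000):
    print(f, round(frequency_response_db(toy_hrir, f, FS), 1))
# 0 Hz: about +5.6 dB (in phase); 10 kHz: about -20 dB (half-cycle delay, notch)
```

The notch falls where the reflection arrives half a period late, at `FS / (2 * delay_samples)`; lengthening or shortening the reflected path, as happens when the source moves up or down, shifts the notch frequency.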
Sound waves incident on the shoulders are reflected in various directions, including upward toward the head and pinnae. The extra distance traveled by these reflected waves causes them to arrive slightly later than the primary wavefront. The slight echo is a secondary cue in the determination of sound source elevation.
The two ear canals, one per ear, are S-shaped tubes leading to the middle ear. Each ear canal is open peripherally near the center of the pinna and closed medially by the tympanic membrane. Besides providing a conduit for sound to reach the middle ear and protecting the eardrum from damage, the canal behaves as a tube closed at one end and produces standing waves at frequencies $f = nc/(4L)$, where $c$ is the speed of sound, $L$ is the length of the tube, and $n$ is an odd harmonic number ($n = 1, 3, 5, \ldots$). For the average adult, with an ear canal length of 2.3 cm, sounds at approximately 3700 Hz resonate in the ear canal, intensifying waves at this frequency that reach the eardrum.
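The quarter-wave formula can be checked directly. This sketch assumes a speed of sound of 343 m/s; the 2.3 cm canal length is the value from the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def ear_canal_resonances(length_m: float = 0.023, c: float = SPEED_OF_SOUND,
                         count: int = 3) -> list:
    """First `count` resonances (Hz) of a tube closed at one end:
    f = n * c / (4 * L) for odd n = 1, 3, 5, ..."""
    return [n * c / (4 * length_m) for n in range(1, 2 * count, 2)]

print([round(f) for f in ear_canal_resonances()])  # [3728, 11185, 18641]
```

The first resonance matches the roughly 3700 Hz quoted above; the higher modes fall at odd multiples of it.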
Producing sounds that appear to originate in three dimensions outside the head requires that a computer auditory display faithfully model HRTFs. Thankfully, computer software and hardware are now capable of interpolating HRTF responses, measured using far-field (> 1 m) pseudo-impulse sources around a generalized head, to simulate sound in three dimensions (Brewster et al., 2003). However, commodity products are still limited in their ability to produce believable elevation cues and to support the head tracking required to aid front-back localization (see Chapter 2). Therefore, an auditory display requiring no special hardware is practically constrained to placing virtual sound sources between -90° and 90° azimuth in front of the user at 0° elevation.
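Within that -90° to 90° constraint, one common low-cost way to place a source without HRTF processing is a constant-power stereo pan law. This is a swapped-in technique, much cruder than HRTF rendering and not described in the text; it is shown only as a sketch, and the function name is hypothetical.

```python
import math

def constant_power_pan(azimuth_deg: float) -> tuple:
    """Left/right gains for a source at azimuth_deg (negative = listener's
    left), clamped to [-90, 90]. Uses a sine/cosine pan law so that
    left**2 + right**2 == 1, keeping total power constant across positions."""
    az = max(-90.0, min(90.0, azimuth_deg))
    # Map -90..90 degrees onto 0..pi/2.
    p = math.radians(az + 90.0) / 2.0
    return math.cos(p), math.sin(p)

print(constant_power_pan(-90))  # (1.0, 0.0): hard left
print(constant_power_pan(0))    # equal gains, about 0.707 each
```

Panning conveys only an interaural level difference, so it offers no elevation cue and no front-back distinction, which is consistent with restricting sources to the frontal arc at 0° elevation.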
Middle Ear
The primary function of the middle ear is to compensate for the impedance mismatch between air and the fluids of the inner ear. Without the middle ear, all sound waves reaching the inner ear would be attenuated by roughly 20-30 dB, severely decreasing the dynamic range of human hearing. The middle ear overcomes this problem in two ways. First, the ossicles of the middle ear act as an effective lever, and the resulting mechanical advantage increases the force applied at the entrance of the inner ear relative to the force exerted by the sound wave on the eardrum. Second, the area of the tympanic membrane driving the ossicular chain is an order of magnitude greater than that of the stapes footplate pushing on the inner ear; concentrating the force onto the smaller area yields an additional gain in pressure.
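The two mechanisms multiply, so their combined pressure gain can be estimated in decibels. The ratios used below (an effective area ratio of about 17 and a lever ratio of about 1.3) are commonly cited textbook approximations, not values given in this text, and should be read as assumptions.

```python
import math

def middle_ear_gain_db(area_ratio: float, lever_ratio: float) -> float:
    """Pressure gain (dB) of an idealized transformer: the lever multiplies
    force, and the area ratio concentrates that force onto a smaller surface.
    Pressure is a field quantity, hence 20*log10."""
    return 20.0 * math.log10(area_ratio * lever_ratio)

# Assumed textbook approximations: eardrum/footplate area ~17, lever ~1.3.
print(round(middle_ear_gain_db(17.0, 1.3), 1))  # 26.9
```

A gain of roughly 27 dB sits inside the 20-30 dB loss that the middle ear is said to recover.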
A secondary consequence of the anatomy of the middle ear is that it acts as a high-pass filter. The ossicular chain can be modeled as a spring-mass system with friction. The mass of the system is negligible, since the ossicles are the tiniest bones in the body, and the friction is negligible because the ossicular chain is suspended in air by only a few ligaments and tendons. The remaining component of the system, springiness or, conversely, stiffness, dominates the middle ear transfer function. A stiffness-dominated system passes high-frequency sound waves most easily while attenuating low-frequency sounds. Figure 4.3 depicts the general shape of the transfer function of the middle ear.
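Why stiffness produces a high-pass characteristic follows from the mechanical impedance of a mass-spring-damper, Z = R + j(ωm - k/ω). The sketch below uses hypothetical parameters in arbitrary units, chosen so that stiffness dominates as the text argues; it is an idealization, not a model fitted to the middle ear.

```python
import math

def velocity_response_db(freq_hz: float, mass: float, resistance: float,
                         stiffness: float) -> float:
    """Velocity per unit driving force of a mass-spring-damper, |1/Z| with
    Z = R + j*(w*m - k/w), expressed in dB (re 1)."""
    w = 2 * math.pi * freq_hz
    z = complex(resistance, w * mass - stiffness / w)
    return 20 * math.log10(1 / abs(z))

# Hypothetical parameters: negligible mass and friction, unit stiffness.
m, r, k = 1e-12, 1e-6, 1.0
for f in (125, 250, 500, 1000):
    print(f, round(velocity_response_db(f, m, r, k), 1))
# Each doubling of frequency raises the response by about 6 dB.
```

With mass and friction negligible, |Z| is approximately k/ω, so the response grows in proportion to frequency: low frequencies are attenuated relative to high ones, and raising k (as the acoustic reflex does) lowers the low-frequency response further.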
Figure 4.3: Middle ear transfer function with input at the tympanic membrane and output at the oval window of the cochlea.
The stiffness of the middle ear is not constant; it is directly affected by the acoustic reflex. Shortly after an intense sound is first heard, the tendons attached to the ossicular chain tighten to protect the delicate structures of the inner ear from damage. The resulting increase in the stiffness of the system further attenuates the intensity of low-frequency sound waves. The intensity required to trigger the reflex varies with the center frequency and bandwidth of the sound, but is roughly 90 dB IL as a rule of thumb. The intensity, bandwidth, and frequency also have a proportional effect on
An auditory display must avoid relying on low intensity, low frequency sounds to convey information as they are likely to be partially filtered by the middle ear. A display must also avoid producing sudden, intense sounds that trigger the acoustic reflex and inadvertently attenuate the low frequency components of other sounds.
Inner Ear
The inner ear, primarily the cochlea, is responsible for the transduction of mechanical sound wave vibrations into electrical signals in the central nervous system. Figure 4.4 depicts a cross section of this spiral organ, including its primary internal structures. Figure 4.1 shows the cochlea as a whole from an external vantage.
The stapes ossicle of the middle ear pushes on the oval window at one end of the cochlea, creating pressure differences within the cochlear fluids. These pressure differences create traveling waves on the basilar membrane that peak at a location particular to the frequency of the original sound: high frequencies near the base, low frequencies near the apex. Tiny stereocilia on the tops of hair cells seated on the basilar membrane bend against the tectorial membrane near the front of the traveling wave. The lateral bending of the stereocilia triggers an increase in the release of neurotransmitter at the base of the hair cells. The released neurotransmitter triggers responses in afferent nerve cells, the cells that connect to the hair cells and send signals to the brain.
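The base-to-apex frequency layout described above can be made concrete with the Greenwood (1990) frequency-position function, a model the text does not itself use. The constants below are the commonly quoted human values and are assumptions here; position is expressed as a fraction of basilar-membrane length from the apex (0.0) to the base (1.0).

```python
import math

# Greenwood (1990) constants for the human cochlea (assumed values).
GREENWOOD_A = 165.4
GREENWOOD_ALPHA = 2.1
GREENWOOD_K = 0.88

def greenwood_frequency(x: float) -> float:
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return GREENWOOD_A * (10 ** (GREENWOOD_ALPHA * x) - GREENWOOD_K)

def greenwood_position(freq_hz: float) -> float:
    """Inverse map: relative position along the basilar membrane for a frequency."""
    return math.log10(freq_hz / GREENWOOD_A + GREENWOOD_K) / GREENWOOD_ALPHA

print(round(greenwood_frequency(0.0)))     # about 20 Hz at the apex
print(round(greenwood_frequency(1.0)))     # about 20.7 kHz at the base
print(round(greenwood_position(1000), 2))  # 1 kHz peaks about 40% of the way from the apex
```

The end points reproduce the roughly 20 Hz to 20 kHz hearing range, and the logarithmic spacing means each octave occupies a comparable stretch of membrane.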
The three rows of outer hair cells running up the length of the cochlea are motile: they can shrink and elongate spontaneously and independently. When the stereocilia atop these cells are bent laterally by the traveling wave, the cells actively push against the tectorial membrane. This added force increases the motion of the tectorial membrane and thus the response of the single, highly innervated row of inner hair cells. This active process “sharpens” the neural response at points along the basilar membrane that coincide with the most intense frequencies in the spectrum of the original sound.
Figure 4.4: Cross section of the cochlea. Image retrieved from http://en.wikipedia.org/wiki/Image:Cochlea-crosssection.png and licensed under the terms of the GNU Free Documentation License http://www.gnu.org/copyleft/fdl.html.
The primary benefit of the sharpening performed by the inner ear is an improvement in the frequency resolution of human hearing. However, this active process makes the inner ear inherently non-linear and produces a number of side effects. One is two-tone suppression: the tendency of an intense sound to inhibit the nerve-fiber response to a less intense sound close to it in frequency. For instance, if a pure tone at 400 Hz is played at 30 dB IL simultaneously with a pure tone at 450 Hz and 60 dB IL, the nerve-fiber response to the 400 Hz tone is inhibited by nearby fibers responding to the 450 Hz tone. Depending on the magnitude of the inhibition, the listener might not detect the 400 Hz tone at all.
Another result of the active cochlear process is the generation of distortion products. Aural harmonics are produced at integer multiples of an intense stimulus having a single fundamental frequency (e.g., $2f_1$, $3f_1$, $4f_1$). Combination tones result from the interaction of the fundamentals and harmonics of two simultaneous sounds. Summation and difference tones, with frequencies $f_2 + f_1$ and $f_2 - f_1$, result when two simple sounds are presented at intensities well above the threshold of hearing. Cubic difference tones occur at frequency $2f_1 - f_2$ and are generated when two stimulus sounds are presented at moderate intensity levels. Interestingly, combination tones can interact with one another and with the original stimulus tones to produce secondary combination tones, implying that the combination tones are physically present in the cochlea.
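The distortion-product frequencies listed above follow directly from the two primaries. The sketch below simply enumerates them; the function name and the choice of three harmonics are illustrative, and whether a given product is audible depends on the stimulus levels, which this sketch does not model.

```python
def distortion_products(f1: float, f2: float, n_harmonics: int = 3) -> dict:
    """Frequencies (Hz) of the distortion products named in the text for two
    primaries f1 < f2: aural harmonics (2f, 3f, ...), summation and difference
    tones, and the cubic difference tone 2*f1 - f2."""
    if not 0 < f1 < f2:
        raise ValueError("expected 0 < f1 < f2")
    return {
        "harmonics_f1": [n * f1 for n in range(2, n_harmonics + 2)],
        "harmonics_f2": [n * f2 for n in range(2, n_harmonics + 2)],
        "summation": f2 + f1,
        "difference": f2 - f1,
        "cubic_difference": 2 * f1 - f2,
    }

# The 400 Hz / 450 Hz pair from the two-tone suppression example:
dp = distortion_products(400, 450)
print(dp["difference"], dp["summation"], dp["cubic_difference"])  # 50 850 350
```

A listener presented with only 400 Hz and 450 Hz tones may therefore also hear energy near 350 Hz, one of the “ghost” frequencies that make counting tones or judging absolute pitch unreliable.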
An auditory display that relies on the presentation of multiple simultaneous sounds must separate intense sounds from less intense sounds in frequency to avoid suppressing the quieter of the two. The display must not rely solely on pure tones with similar frequencies to convey information, because the tones are likely to be perceived as one tone whose intensity varies over time. Also, the display should not rely on the listener's ability to count the exact number of sounds heard or to judge absolute frequencies, as “ghost” frequencies might be introduced by the cochlea. Overall, though, the extraordinary dynamic range and resolution of the cochlea give an auditory display a large space in which to work.
Auditory filters
More abstractly, the cochlea can be viewed as a series of overlapping bandpass filters through which sounds of various frequencies are detected and encoded as neural signals. The bandwidth of these auditory filters, or critical bands, varies with frequency and ranges from roughly 90 Hz wide below 200 Hz to 900 Hz wide around 5000 Hz. Mappings of the filters reveal that they have longer high frequency tails than low frequency tails and that the high frequency tail falls off slower for filters centered around low frequencies.
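The frequency dependence of critical bandwidth can be approximated in closed form. This sketch uses the Zwicker and Terhardt (1980) formula, a swapped-in model rather than the source's own data, which gives about 100 Hz at low frequencies and about 900 Hz near 5 kHz, close to the figures quoted above. The `same_critical_band` helper is a hypothetical simplification that treats a band as centered on the mean of the two tones.

```python
def critical_bandwidth_hz(center_hz: float) -> float:
    """Critical bandwidth (Hz) from the Zwicker & Terhardt (1980)
    approximation: CB = 25 + 75 * (1 + 1.4 * (f / 1 kHz)**2) ** 0.69."""
    f_khz = center_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def same_critical_band(f1: float, f2: float) -> bool:
    """Rough test of whether two tones fall within one critical band,
    using the bandwidth at their mean frequency."""
    return abs(f2 - f1) < critical_bandwidth_hz((f1 + f2) / 2.0)

for f in (100, 200, 1000, 5000):
    print(f, round(critical_bandwidth_hz(f)))
print(same_critical_band(1000, 1050))  # True: likely to interact
print(same_critical_band(1000, 1500))  # False: resolved separately
```

A display designer could use a check like this to keep an intense cue and a quieter cue more than a critical band apart.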
This filter abstraction provides a useful framework for explaining a number of auditory phenomena. If two simple tones close in frequency are played simultaneously, their points of maximum excitation on the basilar membrane fall into the same critical band. As a result, the two tones are perceived as a single beating tone whose intensity fluctuates at a rate equal to $f_2 - f_1$.
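The beat rate follows from the sum-to-product identity sin A + sin B = 2 sin((A+B)/2) cos((A-B)/2): two equal-amplitude tones add up to a carrier at their mean frequency inside an envelope |2 cos(π(f2 - f1)t)|, which peaks f2 - f1 times per second. The sketch below checks this numerically; the 1000 Hz and 1004 Hz tones and the 48 kHz sample rate are hypothetical choices.

```python
import math

def two_tone_sum(t: float, f1: float, f2: float) -> float:
    """Sum of two equal-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def beat_envelope(t: float, f1: float, f2: float) -> float:
    """Envelope |2 cos(pi (f2 - f1) t)| from the sum-to-product identity;
    it reaches its maximum f2 - f1 times per second."""
    return abs(2 * math.cos(math.pi * (f2 - f1) * t))

f1, f2 = 1000.0, 1004.0  # hypothetical tones 4 Hz apart
fs = 48_000              # hypothetical sample rate for the check
# Over one second of samples, the signal never exceeds its envelope.
assert all(abs(two_tone_sum(n / fs, f1, f2)) <= beat_envelope(n / fs, f1, f2) + 1e-9
           for n in range(fs))
print("beat rate:", f2 - f1, "Hz")
```

At this separation a listener hears one tone near 1002 Hz swelling and fading four times per second rather than two distinct pitches.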
Auditory filters are also important in understanding peripheral masking, in which one sound (the masker) diminishes the ability of the listener to detect another sound (the target) presented at the same time (simultaneous masking), immediately before (backward masking), or immediately after (forward masking).
Simultaneous masking occurs when spectral content from the masker falls within the same auditory filter(s) as the target such that the intensity threshold for detecting the target is elevated. For instance, band-limited white noise centered around a target tone will mask the tone even if the noise intensity is less than that of the tone. Low frequency sounds are also capable of masking remote high frequency sounds due to the long tails of auditory filters covering higher frequencies.
Forward and backward masking occur when a masker, presented either before or after a target tone, has spectral content that falls into the same auditory filter as the target. Forward masking may be the result of the listener hearing the target as a continuation of the masker, up to roughly 150 ms after the masker ceases. Backward masking is not as well understood and has an effect on targets up to 50 ms before the masker starts.
An auditory display must mitigate masking by ensuring that wide-bandwidth and low-frequency sounds do not become so intense that they mask other, simultaneously presented higher-frequency sounds. Similarly, the timing of sounds in the display must be such that a wide-bandwidth sound does not mask another sound immediately preceding or following it.
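The timing guideline can be turned into a simple scheduling check using the approximate windows quoted earlier (forward masking up to about 150 ms after the masker ends, backward masking up to about 50 ms before it starts). This is a minimal sketch: the function name is hypothetical, and it assumes the caller has already established that the target and masker share an auditory filter, since it does not examine spectra.

```python
# Approximate temporal-masking windows quoted in the text (milliseconds).
FORWARD_MASKING_MS = 150.0   # target after masker offset
BACKWARD_MASKING_MS = 50.0   # target before masker onset

def temporally_masked(target_on_ms: float, target_off_ms: float,
                      masker_on_ms: float, masker_off_ms: float) -> bool:
    """True if the target overlaps the masker or falls inside its forward or
    backward masking window. Assumes the two sounds share an auditory filter;
    checking that spectral overlap is left to the caller."""
    window_start = masker_on_ms - BACKWARD_MASKING_MS
    window_end = masker_off_ms + FORWARD_MASKING_MS
    # Do [target_on, target_off] and [window_start, window_end] overlap?
    return target_on_ms < window_end and target_off_ms > window_start

# A masker sounding from 0 to 100 ms:
print(temporally_masked(200, 260, 0, 100))  # True: starts 100 ms after offset
print(temporally_masked(300, 360, 0, 100))  # False: starts 200 ms after offset
```

A display scheduler could delay or shift in frequency any cue for which this check returns True.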