
Summing localization is generally regarded as the best current description of the process whereby multiple coherent loudspeaker signals produce a single perceived source direction. However, this theory is not without its critics, and various alternatives have been suggested. Gunther Thiele suggested that the localization of stereophonic images does not occur due to the physical superposition of the loudspeaker signals at the ears. Instead, the auditory system first detects the multiple loudspeaker signals at different spatial locations, and then merges them into a phantom source in a psychoacoustic process once their signal content has been detected to be congruent [Thiele, 1980]. Thiele’s Association Model therefore supposes that phantom sources and natural sources are perceived differently and rely on different processes. The ear signals which arise from the superposition of multiple loudspeaker signals therefore cannot provide all the information about the properties of the phantom source. While Thiele’s Association Model can account for certain aspects of stereophonic localization, such as the timbral shift exhibited by phantom sources, it cannot predict the perceived source direction and has proved difficult to verify experimentally. For these reasons, the theory of summing localization, and not Thiele’s Association Model, is used throughout the rest of this thesis.

3.6 Meta-Theories of Localization

While these localization theories had shown that stereophony was reasonably reliable for two channels in front of a central listener, experience with Quadraphonics had suggested that this principle was not necessarily the best approach for three-dimensional audio. Michael Gerzon suggested an alternative approach which separates the process into two distinct stages: recording/encoding and playback/decoding. This division allowed for the design of a decoder that would be psychoacoustically optimized, meaning it would be designed to satisfy as many of the existing localization theories and auditory cues as possible. Gerzon suggested that this approach, which forms the basis of the Ambisonics system, would provide the best performance that could be expected of any spatial audio reproduction system. His meta-theory of localization [Gerzon, 1992a] incorporated various models of localization and is fundamental to the design of ambisonic decoders. He described a hierarchy of localization models and, for each, derived a vector whose direction θ gives the predicted direction of the sound and whose magnitude r represents the stability of the localization. A real source would therefore have a vector magnitude of one, with a positional angle described using spherical coordinates. The optimal decoder design would therefore ensure that the localization vectors for each model agree at all frequencies, and that their magnitude is as large as possible in every direction.
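As a minimal sketch of how such a vector can be evaluated, the following assumes the commonly quoted forms of the two vectors discussed in the following paragraphs: the velocity vector as the gain-weighted mean of the loudspeaker directions, and the energy vector as the same mean weighted by squared gains. These formulas, the function name `localization_vectors`, the horizontal-only layout and the azimuth convention are assumptions made for illustration and are not given in the text.

```python
import math


def localization_vectors(gains, azimuths_deg):
    """Return (r_v, theta_v, r_e, theta_e) for a horizontal loudspeaker layout.

    gains        -- real loudspeaker gains for one panned source (sum must be non-zero)
    azimuths_deg -- loudspeaker azimuths in degrees (0 = front, positive = left; assumed)
    """
    angles = [math.radians(a) for a in azimuths_deg]

    def weighted_vector(weights):
        # Weighted mean of the loudspeaker unit vectors, as magnitude and angle.
        total = sum(weights)
        x = sum(w * math.cos(a) for w, a in zip(weights, angles)) / total
        y = sum(w * math.sin(a) for w, a in zip(weights, angles)) / total
        return math.hypot(x, y), math.degrees(math.atan2(y, x))

    r_v, theta_v = weighted_vector(gains)                   # velocity model: weights are gains
    r_e, theta_e = weighted_vector([g * g for g in gains])  # energy model: weights are squared gains
    return r_v, theta_v, r_e, theta_e


# A single active loudspeaker behaves like a real source (magnitude one).
print(localization_vectors([1.0, 0.0], [30.0, -30.0]))

# Equal gains on a +/-30 degree pair: a central image with magnitude below one.
print(localization_vectors([1.0, 1.0], [30.0, -30.0]))
```

In the second call both magnitudes equal cos 30°, about 0.87, which illustrates why a phantom image between two loudspeakers is predicted to be less stable than a real source.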

Gerzon’s theory emphasized the importance of two models based, respectively, on the velocity and the energy flow of the sound field at the ears. The theories of Clark, Bauer and Makita are described as special cases of a more general description based on the velocity of the wavefront. This velocity model is primarily concerned with the IPD cues produced at low frequencies. Gerzon showed how a velocity vector equal to that produced by a real source can be produced by a multichannel system, and that this ensures that the perceived source direction remains stable as the head is rotated. However, this can only be achieved at low frequencies: at higher frequencies the signal wavelength becomes comparable to the size of the human head, and the effect of head-shadowing becomes more pronounced. The frequency range in which the velocity model is applied, i.e. the decoder cross-over frequency, is therefore related to the size of the effective listening area. Above a certain limit, the size of the listening area is smaller than the human [...] would produce an effective listening area suitable for a single listener, while 400 Hz [Gerzon, 1992a] would be suitable for a domestic listening situation with approximately six listeners.
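The relation between frequency and head size can be made concrete with a short calculation. This is only a sketch: the speed of sound and the head diameter below are assumed typical values that do not come from the text, and the two frequencies are simply the 400 Hz and 4 kHz figures quoted in this section.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
HEAD_DIAMETER = 0.18    # m, assumed typical value, not taken from the text


def wavelength(frequency_hz):
    """Wavelength in metres of a plane wave at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz


for frequency in (400.0, 4000.0):
    ratio = wavelength(frequency) / HEAD_DIAMETER
    print(f"{frequency:6.0f} Hz: wavelength {wavelength(frequency):.3f} m "
          f"({ratio:.1f} head diameters)")
```

Under these assumptions the wavelength at 400 Hz is several head diameters long, while at 4 kHz it is shorter than the head itself, which is the regime in which head-shadowing dominates.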

Above this cross-over frequency, the decoder emphasizes the ILD localization cues which arise from the directional behaviour of the energy field around the listener. It can be shown mathematically that, with a small number of loudspeakers, it is only possible to recreate the energy field of a real sound source if the sound happens to be at the position of one of the loudspeakers. Therefore, at mid and high frequencies, not all of the ear’s localization mechanisms can be satisfied in a practical reproduction system. The direction of the energy localization vector can, however, be adjusted so that it matches the velocity localization vector (θE = θV) for all frequencies up to 4 kHz. This is similar to the stereophonic approach recommended by Clark (see Section 3.4) [Clark et al, 1958]. In addition, Gerzon’s design optimizes rE in all directions, which necessarily compromises localization in the directions of the loudspeakers in favour of making the quality of the localization uniform in all directions [Benjamin et al, 2006]. This effectively eliminates the timbral change which occurs with amplitude panning as the signal moves from a position at a loudspeaker to one in between loudspeakers. It also means, however, that the localization of sources positioned at a loudspeaker will be less than optimal. In summary, therefore, Gerzon recommends that the following optimizations be implemented when designing an ambisonic decoder:

- The velocity and energy vector directions should be the same up to 4 kHz (θE = θV)

- At low frequencies, the magnitude of the velocity vector should be near unity for all directions (rV = 1)

- At high frequencies, the energy vector magnitude should be maximised as much as possible and made consistent in all directions (maximum rE)
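The three criteria above can be checked numerically for a simple case. The sketch below assumes a square loudspeaker layout and a textbook first-order horizontal gain law, g = (1 + 2k cos(φ − θ)) / N, in which the weight k scales the directional component; neither the layout nor the gain law is taken from the text. The velocity and energy vectors are computed in their commonly quoted forms (loudspeaker directions weighted by gain and by squared gain respectively), which is also an assumption, and all function names are hypothetical.

```python
import math

SPEAKERS_DEG = [45.0, 135.0, 225.0, 315.0]  # square layout, assumed for illustration


def decoder_gains(source_deg, k):
    """Loudspeaker gains for an assumed first-order horizontal decode.

    k weights the first-order (directional) part against the omnidirectional part.
    """
    n = len(SPEAKERS_DEG)
    return [(1.0 + 2.0 * k * math.cos(math.radians(s - source_deg))) / n
            for s in SPEAKERS_DEG]


def weighted_vector(weights):
    """Magnitude and direction (degrees) of the weighted mean loudspeaker direction."""
    total = sum(weights)
    x = sum(w * math.cos(math.radians(s)) for w, s in zip(weights, SPEAKERS_DEG)) / total
    y = sum(w * math.sin(math.radians(s)) for w, s in zip(weights, SPEAKERS_DEG)) / total
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def analyse(source_deg, k):
    """Return (r_v, theta_v, r_e, theta_e) for one source direction and weight k."""
    gains = decoder_gains(source_deg, k)
    r_v, theta_v = weighted_vector(gains)
    r_e, theta_e = weighted_vector([g * g for g in gains])
    return r_v, theta_v, r_e, theta_e


for k in (1.0, 1.0 / math.sqrt(2.0)):
    for source in (0.0, 20.0, 45.0):
        r_v, theta_v, r_e, theta_e = analyse(source, k)
        print(f"k={k:.3f} source={source:5.1f}  "
              f"rV={r_v:.3f} thetaV={theta_v:6.2f}  rE={r_e:.3f} thetaE={theta_e:6.2f}")
```

For this gain law both vector directions coincide with the source direction, the velocity magnitude equals k, and the energy magnitude equals 2k / (1 + 2k²) regardless of source direction. With k = 1 the velocity magnitude is unity but the energy magnitude is only 2/3; with k = 1/√2 the energy magnitude rises to about 0.707 at the expense of the velocity magnitude. This is one way to see why the two criteria are applied in separate frequency bands.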

The fundamental design theory of ambisonic decoders therefore concentrates on the velocity and energy models, while other models are only used to further refine the design. It is clear, therefore, that the Ambisonics system was based on psychoacoustic principles. However, this system also built upon the work of Alan Blumlein on coincident microphone techniques, and it has been developed into a complete set of recording and production techniques.

4 Sound Field Reconstruction

Two of the most common techniques that attempt to reconstruct a sound field over a given area are Ambisonics and Wavefield Synthesis. While these techniques were initially derived from quite distinct theoretical foundations, recent research has shown that, under certain conditions, the two can be considered equivalent [Daniel et al, 1999; Nicol et al, 1999].
