Although some studies focus solely on the representation of HRTF data, data analysis is often performed with HRTF interpolation in mind. Therefore, HRTF data analysis, representation and interpolation are often intrinsically linked. Interpolation is required in two scenarios: the first involves statically spatialising virtual sources at non-measured locations; the second involves moving sources through dynamic trajectories. In the first case, interpolation essentially constitutes improving the spatial resolution of the dataset. The consequences of omitting interpolation may not be severe in this application of HRTF processing. Provided the dataset is relatively dense with respect to the Minimum Audible Angle (discussed in more detail below), simply using the nearest measured HRTF may be sufficient.
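The nearest-measurement fallback mentioned above can be sketched as follows. This is a minimal illustration only, not taken from any of the cited studies: the function name `nearest_hrtf_index`, the (azimuth, elevation) grid layout in degrees, and the use of great-circle distance as the closeness measure are all assumptions.

```python
import numpy as np

def nearest_hrtf_index(grid_deg, target_deg):
    """Return the index of the measured direction closest to the target.

    grid_deg:   shape (N, 2) array of (azimuth, elevation) in degrees
                (hypothetical layout; real datasets differ).
    target_deg: (azimuth, elevation) of the virtual source in degrees.
    """
    grid = np.radians(np.asarray(grid_deg, dtype=float))
    az, el = grid[:, 0], grid[:, 1]
    t_az, t_el = np.radians(np.asarray(target_deg, dtype=float))
    # Cosine of the great-circle angle between each grid point and the target;
    # the largest cosine corresponds to the smallest angular distance.
    cos_angle = (np.sin(el) * np.sin(t_el)
                 + np.cos(el) * np.cos(t_el) * np.cos(az - t_az))
    return int(np.argmax(np.clip(cos_angle, -1.0, 1.0)))

# Hypothetical grid: one measurement every 15 degrees on the horizontal plane.
grid = [(float(az), 0.0) for az in range(0, 360, 15)]
idx = nearest_hrtf_index(grid, (52.0, 3.0))
```

Using the angle on the sphere rather than a plain difference of azimuths means that wrap-around at 0/360 degrees is handled without special cases.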
The second scenario, in which dynamic sources move around a listener’s (virtual) spatial environment, requires interpolation more urgently. Simply switching from one HRTF to another as a nearer measured location becomes available may cause an audible discontinuity. This is discussed in [107], which suggests various crossfade methods to avoid this perceptual discontinuity. Objectively, a windowed overlap-add process performed best (compared to square root, cosine and Fourier based fade in/out envelope shapes).
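As an illustration of the crossfade idea, the sketch below filters the input with both the outgoing and incoming HRIR and blends the two outputs with a raised-cosine ramp. It is not the procedure evaluated in [107]; the function name `crossfade_switch`, its parameters, and the choice of ramp are assumptions made for the example.

```python
import numpy as np

def crossfade_switch(x, hrir_old, hrir_new, fade_start, fade_len):
    """Filter x with both HRIRs and crossfade from the old output to the new.

    fade_start and fade_len are given in samples (hypothetical parameters).
    """
    y_old = np.convolve(x, hrir_old)
    y_new = np.convolve(x, hrir_new)
    n = min(len(y_old), len(y_new))
    # Normalised position within the fade: 0 before it starts, 1 once it ends.
    t = np.clip((np.arange(n) - fade_start) / float(fade_len), 0.0, 1.0)
    # Raised-cosine gain for the new filter; the old filter gets the complement.
    gain_new = 0.5 * (1.0 - np.cos(np.pi * t))
    return (1.0 - gain_new) * y_old[:n] + gain_new * y_new[:n]

# Toy single-tap "HRIRs" make the gain change easy to see.
x = np.ones(64)
out = crossfade_switch(x, np.array([1.0]), np.array([0.5]), fade_start=16, fade_len=32)
```

With these toy filters the output moves smoothly from 1.0 to 0.5 over the fade region instead of jumping at a single sample.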
Reference | Method | Comments
Convolvotron: early approach [222] | HRIR mixing, time domain | Combing effects
Convolvotron: updated [14] | Minimum-phase HRIR mixing, time domain | Improvement on above
[233] | Spherical head model, magnitude interpolation, frequency domain | n/a
SFSRs [38] | Inherent to representation | n/a
[231] | Emphasise spectral differences | Reduces reversals, improves source movement
IPTFs [65] | Ipsilateral and contralateral difference used | n/a
[1] | Invert source-listener: emit source from eardrum | n/a
[211] | Phase Vocoder based | Non-minimum-phase
[76] | Filter root alignment | n/a
Table 2.2: HRTF interpolation summary
Perhaps a natural initial attempt at HRTF interpolation is to simply mix adjacent HRIRs, an approach initially taken in the ‘Convolvotron’ [222], an early hardware implementation of HRIR based artificial spatialisation (for more detail, see [14]). However, the authors report unnatural combing effects for dynamic sources. Time-domain interpolation of this nature (using empirical HRIRs) is problematic, as adjacent HRIRs may have different inherent delays. Therefore, interpolation of an impulse with a pre-impulse delay (leading to smearing of the two impulses), or cancellation (a pressure trough corresponding with a pressure peak) may occur (illustrated in figure 2.1, below; also discussed and illustrated in [14]).
Figure 2.1: Three plots illustrating an extreme example of the problems with time-domain interpolation; a HRIR for 0 degree elevation, 0 degree angle (top) mixed with that at 0 degree elevation, 45 degree angle (middle). The result (bottom) illustrates pressure peak and trough cancellation and time smearing.
This difficulty was observed in the Convolvotron study, and a minimum-phase plus delay approach was suggested to avoid differences in arrival time (see also [14]). This minimum-phase based approach was then implemented in the next generation Convolvotron. Interestingly, in this study, a high perceptual tolerance to interpolation was reported.
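The arrival-time problem, and the benefit of separating out the delay before mixing, can be shown with two idealised impulses. The sketch below is a deliberate simplification: it removes and re-inserts a delay but does not perform a minimum-phase reconstruction, the onset is crudely taken as the largest absolute peak, and the function names are hypothetical.

```python
import numpy as np

def mix_naive(h_a, h_b, w):
    """Sample-by-sample weighted mix of two HRIRs (w is the weight of h_b)."""
    return (1.0 - w) * h_a + w * h_b

def mix_delay_aligned(h_a, h_b, w):
    """Mix two HRIRs after removing their onset delays, then re-apply an
    interpolated delay. Onset estimate: position of the largest absolute peak."""
    d_a = int(np.argmax(np.abs(h_a)))
    d_b = int(np.argmax(np.abs(h_b)))
    # Circular shifts bring both main peaks to sample 0. np.roll wraps around,
    # which is harmless here because everything before the onset is zero.
    mixed = (1.0 - w) * np.roll(h_a, -d_a) + w * np.roll(h_b, -d_b)
    # Re-insert a delay interpolated between the two onsets (whole samples only).
    d = int(round((1.0 - w) * d_a + w * d_b))
    return np.roll(mixed, d)

# Two idealised impulses arriving at different times.
h_a = np.zeros(32)
h_a[5] = 1.0
h_b = np.zeros(32)
h_b[11] = 1.0
naive = mix_naive(h_a, h_b, 0.5)            # two half-height peaks (smearing)
aligned = mix_delay_aligned(h_a, h_b, 0.5)  # one full-height peak at sample 8
```

The naive mix produces two separate arrivals, whereas the aligned mix yields a single impulse at an intermediate arrival time.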
Interpolating by combining adjacent impulses can be performed in the time or frequency domain. The frequency domain is more suitable, as it allows representation of data as magnitude and phase values, which correlate to intensity and time differences when considered binaurally. The frequency domain therefore represents a higher level of binaural processing than the pressure fluctuations of the time domain. This is confirmed in [77], which concludes that frequency-domain interpolation is superior. Objectively, if intermediate measurements in a HRTF dataset are removed and replaced with interpolated versions, success of the interpolation algorithm can be inferred by comparing the interpolated and empirical HRTFs. This study ([77]) did not use minimum-phase HRTFs, but did remove the initial inherent delay to avoid the time domain problems discussed above.
Interpolation methods used were spherically-weighted four-point interpolation and spherical-spline interpolation. Polynomial-spline interpolation considers the whole dataset, as opposed to just the nearest measured values, so has the potential to be a more comprehensive and accurate approach. However, it is considerably more computationally costly. Spherical-spline interpolation gave best results. In [153], however, linear interpolation performed better than spline interpolation in a minimum-phase-based frequency-domain process when a relatively small number of HRTFs were used in the interpolation procedure.
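The objective test described above (remove a measurement, interpolate it from its neighbours, compare with the original) can be sketched as follows. The magnitude spectra are synthetic stand-ins rather than real HRTF data, and the RMS log-magnitude error is an assumed metric; the cited studies may use different measures.

```python
import numpy as np

def interp_magnitude(mag_a, mag_b, w):
    """Linear interpolation of two magnitude spectra (w is the weight of mag_b)."""
    return (1.0 - w) * mag_a + w * mag_b

def log_spectral_error_db(mag_ref, mag_est):
    """RMS difference in dB between two magnitude spectra across frequency bins."""
    diff_db = 20.0 * np.log10(mag_est / mag_ref)
    return float(np.sqrt(np.mean(diff_db ** 2)))

def toy_magnitude(angle_deg, n_bins=129):
    """Synthetic magnitude spectrum whose ripple shifts with source angle.
    Purely illustrative; not a model of any real HRTF."""
    ripple = 2.0 * np.pi * 3.0 * np.linspace(0.0, 1.0, n_bins)
    return 1.0 + 0.5 * np.cos(ripple + np.radians(angle_deg))

# "Measured" spectra at 0, 15 and 30 degrees; the 15 degree one is held out.
mag_0, mag_15, mag_30 = (toy_magnitude(a) for a in (0.0, 15.0, 30.0))
estimate = interp_magnitude(mag_0, mag_30, 0.5)
err_interp = log_spectral_error_db(mag_15, estimate)   # interpolated vs. held out
err_nearest = log_spectral_error_db(mag_15, mag_0)     # nearest neighbour vs. held out
```

Comparing `err_interp` with `err_nearest` gives a simple indication of what interpolation gains over reusing the nearest measurement for this toy data.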
In implementing frequency domain interpolation, phase values need to be treated with care, as the ambiguity of wrapped phase can introduce error. This is illustrated in figure 2.2, below; two phase values are shown, 10 degrees and 50 degrees. A phase value for a HRTF halfway between these measured values will be assumed to be 30 degrees in a linear interpolation scenario. However, the 50 degree phase value may actually imply 410 degrees, i.e. 50 degrees plus 1 full cycle. It is not clear how this phase interpolation issue is dealt with in the above study [77].
Figure 2.2: Phase interpolation
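The arithmetic behind this ambiguity is worked through below using the 10 and 50 degree values from the text. The function name `interp_phase_deg` and its `extra_cycles_b` parameter are hypothetical, introduced only for this illustration.

```python
def interp_phase_deg(phase_a, phase_b, w, extra_cycles_b=0):
    """Linearly interpolate two phase values given in degrees.

    extra_cycles_b is the number of full cycles by which the true value of
    phase_b is assumed to exceed its wrapped value (unknown in practice).
    The result is wrapped back into the range [0, 360).
    """
    true_b = phase_b + 360.0 * extra_cycles_b
    return ((1.0 - w) * phase_a + w * true_b) % 360.0

# Midpoint when the 50 degree value is taken at face value.
no_wrap = interp_phase_deg(10.0, 50.0, 0.5)                     # 30.0
# Midpoint when 50 degrees really means 410 degrees (one extra cycle).
one_wrap = interp_phase_deg(10.0, 50.0, 0.5, extra_cycles_b=1)  # 210.0
```

The two candidate midpoints differ by 180 degrees, so an incorrect assumption about wrapping yields an interpolated component of opposite polarity.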
This flawed nature of phase interpolation is discussed in [233], which notes that a highly populated dataset can minimise this problem for low frequencies, in accordance with a Nyquist criterion (the same authors describe the HRTF interpolation problem as an ‘open research question’ in [234]). The more complete solution suggested in forthcoming chapters is motivated by a desire for accuracy across all frequencies. The apparently appropriate solution of phase unwrapping is also discussed in forthcoming implementation chapters, which conclude that phase unwrapping cannot be deemed an infallible approach.
Considering the arrival time in empirical HRIRs and interpolating in the time domain is examined in [130]. Interestingly, in this scenario, a linear interpolation algorithm performed better than spline and DFT-based (essentially oversampling using vectors of every ith HRIR sample) algorithms.
In a study of the often neglected median plane [154], objective tests (on minimum-phase, magnitude-spectrum interpolation) suggest spline interpolation is best for sparse datasets and linear for more densely sampled data.
The required spatial resolution of the empirical dataset is also an important consideration. In [140], results from linear, time-domain, minimum-phase interpolation suggest that the number of empirical measurements can be greatly reduced, thus reducing storage requirements. Different regions also appear to require different resolutions.
Audible differences in the magnitude spectra of HRTF filters were investigated in [78] (differences in ITDs were not included). Overall, a poor ability to recognise differences of 1 degree was reported, rising swiftly to excellent ability at 16 degree differences. Listeners were more sensitive to elevation differences. The smallest audible change in location is known as the Minimum Audible Angle (MAA) [143].
Also particularly relevant in the context of development of an artificial spatialisation system is the ability to perceive changes in the directional location of moving sources. In [74], the author concludes that ‘the auditory system is relatively insensitive to motion’. Experiments also illustrate high individual differences. The amount of movement required for a source to be perceived as moving, as opposed to static, is known as the Minimum Audible Movement Angle (MAMA). Figures of 8.3 degrees at 90 degrees/sec and 21.2 degrees at 360 degrees/sec are given in [161] (suggesting that extremely fast motion can be afforded low priority in artificial-spatialisation tasks). Interestingly, this work [161] also highlights the apparent lack of studies on moving sources (albeit in 1977), perhaps due to practical limitations (literature on static localisation is described as ‘enormous’). The study also states that extrapolation of dynamic localisation from static results may not be valid. The area is further reviewed in [143].
Several other approaches to interpolation of HRTFs exist. For example, the SFSRs discussed above [38] inherently incorporate interpolation as part of the process of deriving the spatial plots. In an interesting approach [231], front/back reversals are reduced and moving source perception improved by emphasising spectral differences (essentially suppressing less prominent spectral components). In [65] the authors build on the idea of processing the mono input to their spatialisation application with the ipsilateral HRTF to obtain the ipsilateral output, then processing this result with the contralateral HRTF divided by the ipsilateral HRTF to arrive at the contralateral output. The Inter-Positional Transfer Function (IPTF) is developed from this technique. It essentially represents the ratio of a HRTF to its neighbour. When performing HRTF interpolation, these IPTFs are used in place of empirical measurements for nearest neighbouring HRTFs. The benefit of the method is the ability to represent the IPTF as a lower order function.
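The ratio idea underlying this technique can be illustrated in the frequency domain. The sketch below covers only the ipsilateral/contralateral ratio step described above, not the full IPTF method of [65]; the toy impulse responses, the function name `transfer_ratio` and the regularisation constant `eps` are assumptions.

```python
import numpy as np

def transfer_ratio(h_num, h_den, n_fft=64, eps=1e-12):
    """Regularised frequency-domain ratio H_num / H_den of two impulse responses.

    eps keeps the division stable where H_den is close to zero (an assumption
    of this sketch, not something taken from the cited work).
    """
    H_num = np.fft.rfft(h_num, n_fft)
    H_den = np.fft.rfft(h_den, n_fft)
    return H_num * np.conj(H_den) / (np.abs(H_den) ** 2 + eps)

# Toy impulse responses standing in for ipsilateral and contralateral HRIRs.
h_ipsi = np.array([1.0, 0.5])
h_contra = np.array([0.0, 0.0, 0.6, 0.2])
n_fft = 64

rng = np.random.default_rng(0)
X = np.fft.rfft(rng.standard_normal(16), n_fft)     # spectrum of a mono input

Y_ipsi = X * np.fft.rfft(h_ipsi, n_fft)             # ipsilateral output
Y_contra = Y_ipsi * transfer_ratio(h_contra, h_ipsi, n_fft)  # derived via the ratio
Y_direct = X * np.fft.rfft(h_contra, n_fft)         # contralateral output, direct
```

For these toy responses the contralateral spectrum derived through the ratio matches the one obtained by filtering directly with the contralateral response.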
The typical source-receiver model is inverted in [1]. The analysis source is emitted from the eardrum, and recorded using a circular array. The resulting spatial frequency implies 5-degree resolution is necessary for derivation of a complete dataset. Possible aliasing at higher frequencies is implied by the Nyquist Theorem, solutions for which are suggested in the elaboration in [2]. A similar inversion is performed in [56], which looks at interpolation as a scattering process. Interpolated HRTFs (uniquely, in both angle and distance) can be derived by considering an appropriately spatially sampled scattering solution.
In an interesting non-minimum-phase approach [211], an overlap-add solution is suggested. Phase Vocoder processing is used for smooth movement (which conveniently allows for Doppler-Effect pitch processing). MAMA criteria are suggested, with no interpolation in between points (with 5 degree resolution). Objectively, analysis appears to show smooth spectral evolution in moving sources. It is this author’s opinion that subjective tests could verify the method, from the point of view of omitting interpolation, particularly for slow moving sources at points in HRTF space where there are relatively large differences between adjacent measurements.
In [233], magnitude interpolation and a spherical head model for phase are used. The necessity of frequency domain processing is highlighted, from an ITD point of view, due to the sensitivity of the auditory system. Finally, in [76] an interpolation method based on the alignment of the filter roots is presented. Here, the roots of minimum-phase FIRs are aligned and interpolated in a process that guarantees a minimum-phase output.
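One way to see why root-domain interpolation can preserve the minimum-phase property is sketched below. This is not the algorithm of [76]: pairing roots by sorted angle is a simplistic stand-in for a proper alignment step, and the example roots are invented. The underlying observation is that the unit disc is convex, so a weighted average of two roots lying inside it also lies inside it.

```python
import numpy as np

def interp_roots(roots_a, roots_b, w):
    """Linearly interpolate two equally sized sets of filter roots.

    Roots are paired by sorted angle (a naive alignment, assumed here for
    illustration). w is the weight given to roots_b.
    """
    a = np.asarray(roots_a, dtype=complex)
    b = np.asarray(roots_b, dtype=complex)
    a = a[np.argsort(np.angle(a))]
    b = b[np.argsort(np.angle(b))]
    return (1.0 - w) * a + w * b

# Two toy minimum-phase filters, each defined by one conjugate pair of roots
# strictly inside the unit circle.
roots_a = 0.5 * np.exp(1j * np.array([np.pi / 4, -np.pi / 4]))
roots_b = 0.8 * np.exp(1j * np.array([np.pi / 2, -np.pi / 2]))

roots_mid = interp_roots(roots_a, roots_b, 0.5)
# Convert the interpolated roots back into real FIR coefficients.
fir_mid = np.real(np.poly(roots_mid))
```

All roots of `fir_mid` remain inside the unit circle, so the interpolated filter in this toy case is again minimum-phase.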