In this chapter, we have developed a broadband direction of arrival (DOA) estimation technique for a sensor array mounted on a complex-shaped rigid body. The key to the success of this method is the algorithm’s ability to exploit the diversity information in the frequency domain of the measured channel transfer function, produced by the scattering and reflection of sound waves off the rigid body. Subband signals extracted from the broadband received signals are used to derive a DOA estimator using a signal subspace approach. The proposed method achieves higher resolution and clearer separation of closely spaced sound sources than existing DOA estimators.
Specific contributions made in this chapter are:
i The concept of using frequency-domain diversity for DOA estimation was introduced. In this context, the diversity information was derived from the scattering and reflections caused by the rigid body that acts as the mounting object of a sensor array.
ii A subband signal decomposition and focussing method was provided for extracting the spatial diversity information in the broadband received signals. This method arose from the interpretation of a broadband source as a collection of modulated narrowband sources.
iii A method was developed to combine the subband source information across frequency, such that the diversity in the frequency domain was retained. This was achieved by creating a higher dimensional received signal correlation matrix, where the focussed subband signals act as a set of co-located independent sources. We showed that this formulation leads to a number of DOA estimation scenarios, where a DOA estimator based on signal subspace concepts could be applied.
iv The performance of the proposed DOA estimator was evaluated in each scenario and compared with existing DOA estimators. It was shown that higher resolution and clearer separation of closely spaced sources can be achieved by exploiting the frequency-domain diversity, albeit at a minimum cost of a linear increase in computational complexity.
Finally, the Cramér-Rao Bound (CRB) is an important benchmark for the comparison of direction of arrival estimators. The derivation of the CRB for a sensor array on a complex-shaped rigid body and an analysis of its closely spaced source resolution capabilities are discussed in Chapter 5.
Chapter 4
Binaural Sound Source Localization using the Frequency Diversity of the Head-Related Transfer Function
Overview: This chapter investigates the localization performance of a binaural source location estimator applied to the human auditory system. Localizing a source in 3-D using just two sensors typically results in location ambiguities and false detections, and resolving these ambiguities requires the additional diversity information contained in the frequency domain of the head-related transfer function (HRTF). In this chapter, the theoretical development of the source location estimator in Chapter 3 is applied to the binaural source localization problem. The localization performance is experimentally evaluated for single and multiple source scenarios in the horizontal and vertical planes, corresponding to regions in space where the localization ability of humans differs. The localization performance of the proposed estimator is compared with existing localization techniques, and its ability to successfully localize a sound source and resolve the ambiguities in the vertical plane is demonstrated. The performance impact of the actual source location and the calibration of the HRTF measurements to the room conditions are also evaluated and discussed.
4.1 Introduction
Accurately locating the source of a sound is a matter of life or death in the natural environment. Although binaural localization is a simple task for the neural networks in the brain, artificially replicating these abilities has been a challenge in signal processing. Many solutions to the multi-channel source localization problem have been proposed, but high spatial resolution requires sensor arrays with a large number of elements. In contrast, the auditory systems of humans and animals provide similar levels of performance using just two sensors. A localization technique that exploits the knowledge and diversity of the Head-Related Transfer Function (HRTF) could therefore provide high-precision source location estimates using a binaural system.
In the context of a human listener, a sound wave propagating from a source to the ear is transformed as it encounters the body and pinna of the individual. The scattering and reflections caused by the head, torso and pinna are both frequency- and direction-dependent, and can be characterized using the head-related transfer function [41, 65]. A human being exploits three localization cues described by the HRTF for sound source localization [67, 77]: the interaural time difference (ITD) caused by the propagation delay between the ears, the interaural intensity difference (IID) caused by the head shadowing effect, and spectral cues caused by reflections in the pinna. Perceptual experiments have shown that any change to the physical structure of the ear can affect the source localization performance of humans [45], which reaffirms the importance of the HRTF for binaural source localization [3, 12, 70, 83]. Given that the HRTF at each potential source location is known, the objective of a localization algorithm is to perform the inverse mapping of the perceived localization cues to a source location.
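As a rough illustration of the first two cues, the sketch below estimates an ITD from the peak of the cross-correlation between two ear signals, and a broadband IID from their energy ratio. It is not the estimator developed in this thesis: the signals are synthetic (a noise burst and a delayed, attenuated copy rather than measured HRTF data), and the sampling rate, delay, gain and function names are assumptions chosen for the example. Spectral cues are not modelled here.

```python
import math
import random

SAMPLE_RATE_HZ = 48_000  # assumed sampling rate, for illustration only


def best_lag(left, right, max_lag):
    """Lag (in samples) maximizing sum_i left[i] * right[i + lag].

    A positive result means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


def level_difference_db(left, right):
    """Broadband left-to-right energy ratio in dB (positive: left is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)


# Synthetic ear signals: a noise burst at the left ear and a delayed,
# attenuated copy at the right ear (hypothetical values, not HRTF data).
rng = random.Random(0)
true_delay, gain = 5, 0.5
left = [rng.uniform(-1.0, 1.0) for _ in range(256)] + [0.0] * 16
right = [0.0] * true_delay + [gain * x for x in left[:-true_delay]]

lag = best_lag(left, right, max_lag=16)
itd_seconds = lag / SAMPLE_RATE_HZ
iid_db = level_difference_db(left, right)
print(f"ITD = {itd_seconds * 1e6:.1f} us, IID = {iid_db:.2f} dB")
```

In this toy setting the delay and attenuation are the same at every frequency, so the two cues reduce to two scalars. With a real head, both quantities vary with frequency and direction, which is the information the HRTF captures.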
A number of techniques based on correlation analysis [54], beamforming [103] and signal subspace concepts [47, 101] have been developed for the broadband source localization problem in free space, and the Time Difference Of Arrival (TDOA), or ITD in the binaural scenario, remains the most popular localization cue that is exploited. This is mainly due to the TDOA being a natural estimator of the source location for two spatially separated sensors in the free field. However, the presence of the head complicates the localization process in the binaural scenario. For example, the approximately spherical shape of the human head results in regions of similar ITD, known as a “cone of confusion” [93], where the different source locations can only be identified by the IID and spectral cues. Although the change in ITD including the head and torso can be modelled using a spherical head model [4], the emphasis on ITD as the primary localization cue could lead to front-to-back confusions and poor performance distinguishing between locations on a sagittal (vertical) plane. This has been demonstrated in binaural localization experiments using artificial systems [22, 60], as well as in perceptual experiments on human subjects [21, 69, 105]. Thus, IID and spectral cues must act as the primary localization cues that enable the accurate determination of the source location at higher frequencies. Experimental results indicate that this is indeed the case, and that an accurate estimate of the elevation angle is possible when the ITD or IID cues are combined with the spectral cues generated by the pinna [3, 12, 63, 69, 70, 79, 83]. Hence, it is well established that any binaural source localization mechanism must exploit all three localization cues within the HRTF for accurate localization of a source, in both azimuth and elevation.
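The ambiguity of an ITD-only estimator can be seen even in the simplest far-field model, sketched below, where the two ears are treated as point sensors in free space and the head is ignored entirely. The ear spacing, speed of sound, angle convention (azimuth measured from straight ahead towards the left ear, elevation measured from the horizontal plane) and function name are assumptions made for this example, not values taken from the thesis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_SPACING = 0.18      # m, assumed distance between the two sensors


def free_field_itd(azimuth_deg, elevation_deg):
    """Far-field time difference of arrival between two point sensors.

    The delay depends only on the projection of the source direction onto
    the interaural axis, i.e. on cos(elevation) * sin(azimuth).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    lateral = math.cos(el) * math.sin(az)
    return EAR_SPACING * lateral / SPEED_OF_SOUND


front = free_field_itd(30.0, 0.0)    # front, horizontal plane
back = free_field_itd(150.0, 0.0)    # mirror image behind the listener
raised = free_field_itd(45.0, 45.0)  # elevated source on the same cone

for name, value in (("front", front), ("back", back), ("raised", raised)):
    print(f"{name:>6}: ITD = {value * 1e6:.2f} us")
```

All three directions share the same projection onto the interaural axis and therefore the same ITD in this model, so a time-delay measurement alone cannot separate them. Distinguishing such locations requires the direction-dependent level and spectral information described above.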