In this section, we present receiver operating characteristics (ROCs) of three DTDs, namely, the Geigel detector, the cross-correlation detector, and the normalized cross-correlation detector. As a reference, we also show the region of operation for the “threshold free” two-path logic described in Section 3.6.
Estimates of the probability of false alarm and the probability of miss were obtained according to the procedure in [6], and we present the ROCs estimated from speech as well as stationary synthetic data. The speech data we use contain sentences from three male and two female talkers. Furthermore, all sentences and synthetic data are normalized to have the same average power level.
Simulation details are as follows:
Echo path. The echo path used is a measured acoustic response between the left loudspeaker and a standard cardioid microphone positioned on top of a workstation. The original impulse response has a length of 256 ms,
consisting of 4096 coefficients at a 16 kHz sampling rate. However, it was subsequently decimated to an 8 kHz sampling rate, resulting in 2048 coefficients. It is also normalized so that for the actual speech and synthetic data.
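The relation between impulse-response duration, sampling rate, and number of coefficients quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the function name `num_coefficients` is an illustrative assumption and is not taken from the original simulation code.

```python
def num_coefficients(duration_ms: int, sampling_rate_hz: int) -> int:
    """Number of FIR coefficients needed to span duration_ms at the given rate."""
    return duration_ms * sampling_rate_hz // 1000


# Values quoted in the text: a 256 ms response at 16 kHz, then decimated to 8 kHz.
original_taps = num_coefficients(256, 16000)
decimated_taps = num_coefficients(256, 8000)
print(original_taps, decimated_taps)  # 4096 2048
```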
Probability of false alarm. When estimating the probability of false alarm we use sentences from five talkers as far-end speech. These consist of speech sequences of 21.8 seconds at an 8 kHz sampling rate. The echo-to-ambient-noise ratio, ENR, is set to 1000 (30 dB).
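As an illustration of how a power ratio such as 1000 (30 dB) can be imposed on simulated signals, the following sketch scales an ambient-noise sequence against an echo sequence. The helper names (`average_power`, `scale_noise_to_ratio`, `ratio_to_db`) and the white Gaussian test signals are assumptions made for this example only; the original simulation code is not available here.

```python
import math
import random


def average_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def scale_noise_to_ratio(echo, noise, ratio):
    """Return noise scaled so that power(echo) / power(scaled noise) equals ratio."""
    gain = math.sqrt(average_power(echo) / (ratio * average_power(noise)))
    return [gain * v for v in noise]


def ratio_to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


rng = random.Random(0)
echo = [rng.gauss(0.0, 1.0) for _ in range(1000)]    # stand-in for the echo signal
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # stand-in for ambient noise
scaled = scale_noise_to_ratio(echo, noise, 1000.0)
print(round(ratio_to_db(average_power(echo) / average_power(scaled)), 6))
```

The same scaling can be reused for other power ratios by changing the `ratio` argument.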
Probability of miss. The probability of a miss is estimated using 5 seconds of far-end speech from one male talker. As near-end speech, 8 sentences are used, each about 2 seconds long. In this case, we investigate the performance when the average echo-to-background ratio (EBR) is 1 (0 dB), since it is natural to assume equally strong talkers.
The simulation conditions for the case with synthetic data are equivalent to those of the speech case. The synthetic sequences [far-end, near-end (double-talk), and ambient noise] are all white Gaussian distributed and mutually independent.
The synthetic data enable us to assess the influence of the nonstationarity of speech (e.g., the instantaneously varying EBR/ENR).
A hold time of 30 ms (240 samples) is used in all three detectors.
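A hold time can be realized in several ways; the sketch below shows one plausible form, in which every raw detection keeps the decision active for a fixed number of samples and later detections restart the count. The function `apply_hold` and its exact semantics are assumptions for illustration, not the logic used in the original simulations.

```python
def apply_hold(raw_decisions, hold_samples):
    """Keep the decision True for hold_samples samples after each raw detection."""
    held = []
    counter = 0
    for detected in raw_decisions:
        if detected:
            counter = hold_samples      # a new detection restarts the hold period
        held.append(counter > 0)
        if counter > 0:
            counter -= 1
    return held


# 30 ms at an 8 kHz sampling rate, as stated in the text.
hold_samples = 30 * 8000 // 1000
print(hold_samples)  # 240

# Small example with a 3-sample hold so the effect is visible.
print(apply_hold([False, True, False, False, False, False], 3))
# [False, True, True, True, False, False]
```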
The thresholds of the detectors are chosen such that their probability of false alarm is in the range of 0 to about 1. For these thresholds, we then estimate the corresponding probability of detection (see [6] for details). Since the two-path method is threshold free, we instead vary the smoothing parameter over a range of practical values and estimate the resulting probabilities. These probabilities are presented together with the ROCs of the other detectors. The smoothing parameter (time constant) is varied from 0.1 to 0.5 s in steps of 0.1 s.
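The threshold sweep can be sketched on synthetic white Gaussian data as follows. This is a deliberately simplified stand-in for the procedure in [6]: it uses a Geigel-style level comparison, a short hypothetical echo path `h`, no hold time, and counts decisions per sample. All names, the echo path, the near-end level (chosen equal in power to the echo), and the sequence length are assumptions for illustration.

```python
import math
import random


def fir(h, x):
    """Causal FIR filtering of x with coefficients h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def geigel(far_end, mic, threshold, window):
    """Declare double-talk when |mic| exceeds threshold times the recent far-end peak."""
    decisions = []
    for n in range(len(far_end)):
        recent_peak = max(abs(v) for v in far_end[max(0, n - window + 1):n + 1])
        decisions.append(abs(mic[n]) > threshold * recent_peak)
    return decisions


def roc_points(thresholds, n_samples=2000, seed=0):
    """Return (threshold, P_false_alarm, P_miss) triples for each threshold."""
    rng = random.Random(seed)
    h = [0.5, 0.3, -0.2, 0.1]                        # hypothetical echo path
    echo_power = sum(c * c for c in h)               # far-end has unit variance
    far_end = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    near_end = [rng.gauss(0.0, math.sqrt(echo_power)) for _ in range(n_samples)]
    ambient = [rng.gauss(0.0, math.sqrt(echo_power / 1000.0)) for _ in range(n_samples)]
    echo_only = [e + w for e, w in zip(fir(h, far_end), ambient)]
    double_talk = [e + v for e, v in zip(echo_only, near_end)]
    points = []
    for t in thresholds:
        p_f = sum(geigel(far_end, echo_only, t, len(h))) / n_samples
        p_m = 1.0 - sum(geigel(far_end, double_talk, t, len(h))) / n_samples
        points.append((t, p_f, p_m))
    return points


for t, p_f, p_m in roc_points([0.25, 0.5, 1.0, 2.0]):
    print(f"threshold={t:4.2f}  P_f={p_f:.3f}  P_m={p_m:.3f}")
```

Raising the threshold can only remove detections on a fixed data set, so in this sketch the false-alarm estimate never increases and the miss estimate never decreases as the threshold grows; that trade-off is what the ROC displays.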
Results from these simulations are shown in Fig. 6.5. These results are consistent with those reported in [6] and, by the use of the ROC, it is possible to set thresholds such that the DTDs are compared fairly. In general, increases and decreases when we compare the ROC curves estimated using speech versus the ROC curves estimated using synthetic signals. We also find that the ROC of the Geigel detector is more sensitive to this signal condition change than the ROCs of the other detectors. It is clear from these results that the normalized cross-correlation detector has superior performance compared to the two others. Also in this figure, the operation region for the threshold-free two-path implementation is shown. The two-path method takes into account whether or not it is beneficial to update the adaptive algorithm for the specific data set.
Hence, the estimated probability of false alarm increases and this should be taken into account when interpreting the results. The lowest probability of miss is attained when the smoothing parameter (time constant) is 0.5 s.
5. DISCUSSION
In this chapter, we have presented double-talk detection algorithms suitable for acoustic echo cancellation. Because of the often unknown attenuation and the continuously time-varying nature of acoustic echo paths, devising an appropriate DTD is more challenging than in the network echo canceler case. There are basically two types of double-talk detectors. The first type forms its test statistics from the estimated level or power of the far-end signal, the near-end signal including echo, or the residual echo signal. The second type makes its decisions from cross-correlation or coherence estimates of the same signals. In this second group we also find detectors that utilize the estimate of the echo path, since these estimates are derived through cross-correlation as well. Double-talk detectors based on cross-correlation techniques exhibit properties that are desirable in the acoustic case. Mainly, they have very low sensitivity to the attenuation of the echo path. However, a problem that has to be considered and designed for is the longer response time that may result, because good (low variance) test statistics need to be based on a large amount of data.
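The low sensitivity of a normalized correlation statistic to echo path attenuation can be illustrated with a toy example. The sketch below assumes a single-tap echo path (a pure gain), white Gaussian signals, and a near-end level tied to the echo level so that the echo-to-background ratio stays at 1; it is far simpler than the detectors discussed in this chapter, and all names are hypothetical.

```python
import math
import random


def normalized_xcorr(x, y):
    """Zero-lag cross-correlation of x and y, normalized by their energies."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


rng = random.Random(1)
far_end = [rng.gauss(0.0, 1.0) for _ in range(5000)]
near_end = [rng.gauss(0.0, 1.0) for _ in range(5000)]


def statistic(gain, double_talk):
    """Statistic for a single-tap echo path with the given gain (toy model)."""
    mic = [gain * x for x in far_end]
    if double_talk:
        # Near-end scaled by the same gain, so the echo-to-background ratio is 1.
        mic = [m + gain * v for m, v in zip(mic, near_end)]
    return normalized_xcorr(far_end, mic)


for gain in (1.0, 0.01):
    print(gain, round(statistic(gain, False), 3), round(statistic(gain, True), 3))
```

In this toy setting the statistic stays close to 1 for echo only and drops clearly during double-talk, for both gains. A level comparison against the far-end signal would instead need a threshold matched to the gain.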
The ideal double-talk detector should be insensitive to echo path variations, have equal performance whether the echo canceler has converged or not, have a quick response time, and be sensitive to low near-end speech levels. Moreover, the DTD must not slow down the convergence rate of the AEC, which can result either from erroneous decisions or from the introduction of delays. Some of these properties can be characterized by the probability of detection and the probability of false alarm.
Notes
1. Originally from [1]. This quotation was borrowed from [2].
References
[1] S. B. Weinstein, “Echo cancellation in the telephone network,” IEEE Commun. Soc. Mag., pp. 9-15, 1977.
[2] C. R. Johnson, “On the interaction of adaptive filtering, identification, and control,” IEEE Signal Proc. Mag., vol. 12, pp. 22-37, Mar. 1995.
[3] S. Haykin, Adaptive Filter Theory. New Jersey: Prentice-Hall, Inc, 2002.
[4] T. Gänsler, M. Hansson, C.-J. Ivarsson, and G. Salomonsson, “A double-talk detector based on coherence,” IEEE Trans. Commun., vol. 44, pp. 1421-1427, Nov. 1996.
[5] J. Benesty, D. R. Morgan, and J. H. Cho, “A new class of doubletalk detectors based on cross-correlation,” IEEE Trans. Speech Audio Processing, vol. 8, pp. 168-172, Mar. 2000.
[6] J. H. Cho, D. R. Morgan, and J. Benesty, “An objective technique for evaluating doubletalk detectors in acoustic cancelers,” IEEE Trans. Speech Audio Processing, vol. 7, pp. 718-724, Nov. 1999.
[7] M. M. Sondhi, “An adaptive echo canceler,” Bell Syst. Tech. J., vol. XLVI, pp. 497-510, Mar. 1967.
[8] D. L. Duttweiler, “A twelve-channel digital echo canceler,” IEEE Trans. Commun., vol. 26, pp. 647-653, May 1978.
[9] H. Ye and B. X. Wu, “A new double-talk detection algorithm based on the orthogonality theorem,” IEEE Trans. Commun., vol. 39, pp. 1542-1545, Nov. 1991.
[10] R. D. Wesel, “Cross-correlation vectors and double-talk control for echo cancellation,” Unpublished work, 1994.
[11] J. Prado and E. Moulines, “Frequency-domain adaptive filtering with applications to acoustic echo cancellation,” Ann. Télécomun., vol. 49, pp. 414-428, 1994.
[12] S. M. Kuo and Z. Pan, “An acoustic echo canceller adaptable during double-talk periods using two microphones,” Acoustics Letters, vol. 15, pp. 175-179, 1992.
[13] K. Ochiai, T. Araseki, and T. Ogihara, “Echo canceler with two echo path models,” IEEE Trans. Commun., vol. COM-25, pp. 589-595, June 1977.
[14] J. Benesty, D. R. Morgan, and J. H. Cho, “A family of doubletalk detectors based on cross-correlation,” in Proc. IWAENC, Sept. 1999, pp. 108-111.
[15] C. H. Knapp and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech and Signal Processing, vol. 24, pp. 320-327, Aug. 1976.
[16] J. R. Zeidler, “Performance analysis of LMS adaptive prediction filters,” Proc. of the IEEE, vol. 78, pp. 1781-1806, Dec. 1990.
[17] J. Benesty et al., Advances in Network and Acoustic Echo Cancellation. Berlin: Springer-Verlag, 2001.
[18] D. J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. of the IEEE, vol. 70, pp. 1055-1096, Sept. 1982.
[19] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty-V,” Bell Syst. Tech. J., vol. 40, pp. 1371-1429, 1978.
[22] E. J. Diethorn, “Improved decision logic for two-path echo cancelers,” in Proc. IWAENC, 2001.
[23] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty, “Double-talk robust fast converging algorithms for network echo cancellation,” IEEE Trans. Speech Audio Processing, vol. 8, pp. 656-663, Nov. 2000.
[24] P. J. Huber, Robust Statistics. New York: Wiley, 1981, pp. 68-71, 135-138.
Tomas Gänsler
Agere Systems
Volker Fischer
Darmstadt University of Technology, Department of Communication Technology
Eric J. Diethorn
Avaya Labs, Avaya
Jacob Benesty
Université du Québec, INRS-EMT
Abstract A software application has been designed that runs a stereophonic acoustic echo canceler natively under Windows operating systems on personal computers: the WinEC. This is a major achievement since echo cancelers require that the sound card’s input and output signals are time-synchronous. Synchronizing the audio streams is a great challenge in such an “asynchronous” environment as the operating system of a PC. Furthermore, stereophonic echo cancellation is significantly more complicated to handle than the monophonic case because of computational complexity, nonuniqueness of solution, and convergence problems. In this chapter we present the system design and the core algorithms we use. This system has been evaluated in point-to-point as well as multi-point communication scenarios. We regularly use the software for teleconferencing in wideband stereo audio over commercial IP networks.
Keywords: Real-Time Implementation, Hands-Free, Stereophonic, Acoustic Echo Canceler, VoIP
1. INTRODUCTION
Real-time echo cancellation requires a significant amount of computational resources. For this reason, real-time implementations have usually been realized using custom-designed very large scale integration (VLSI) circuits or digital signal processors (DSPs) [1]. These processors are specifically designed for signal processing tasks. They provide parallel processing of operations and optimized pipeline structures. However, since the computational power of personal computers (PCs) has increased tremendously in the last few years, it is possible to perform very demanding real-time signal processing in this environment as well. Moreover, the PC environment permits the use of high-level programming languages, like C++, without the restrictions commonly imposed by DSPs, such as fixed-point arithmetic. The resulting source code can easily be used for implementing new algorithms and testing them in real time without the need to port to special hardware. Furthermore, modern PC processors have SIMD (single instruction, multiple data) processing capabilities, which can be used to optimize the program for speed.
The objective of this chapter is to present a flexible echo-cancelling speakerphone algorithm that runs natively under the operating system (OS) on a PC. The additional hardware needed to support hands-free communication on a PC is a full-duplex capable sound card and a network adaptor, like a modem or an Ethernet card. Depending on the desired operation mode, a mono or stereo microphone and loudspeakers are needed. Off-the-shelf hardware can be used for all of these components.
This work was done in 2000, when the authors were with Bell Labs, Lucent Technologies. The system has previously been presented in [2, 3, 4]. Many of the original underlying research results can also be found in [5]. The echo canceler implementation provides the capability of communicating hands-free in single-channel mode (receive one and transmit one audio stream), synthetic-stereo mode (receive two and transmit one stream), or full stereo mode (receive two and transmit two audio streams). In the full stereo case, natural stereo is transmitted to the receiving side. In the synthetic case, synthesized stereo [6] or 3D-audio [7] is generated from the mono audio stream at an intermediate conference server. The bandwidth of the audio is 8 kHz. To accommodate different acoustic environments, the echo canceler can span acoustic paths of lengths 32, 64, 128, or 256 ms.
1.1 SIGNAL MODEL
A block diagram of a two-channel, point-to-point speech communication link with one echo canceler is shown in Figure 7.1. We denote the signals picked up by the microphones in the transmission room by $x_1(n)$ and $x_2(n)$, and the return signal picked up by one of the microphones in the receiving room by $y(n)$. The receiving room signal is in general composed of echo, ambient noise, and possibly receiving room speech. Hence, we have the signal model:
$$y(n) = y_e(n) + w(n) + v(n),$$
where
$$y_e(n) = h_1 * x_1(n) + h_2 * x_2(n)$$
is the echo, $*$ denotes convolution, $h_1$ and $h_2$ are the receiving room echo paths, $w(n)$ is the ambient noise, and $v(n)$ is the receiving room speech.
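The signal model described above (a receiving room microphone signal made up of echo from two loudspeaker channels, ambient noise, and possibly receiving room speech) can be sketched numerically as follows. The variable names `x1`, `x2`, `h1`, `h2`, the short example echo paths, and the helper functions are assumptions chosen for illustration; real acoustic echo paths are thousands of coefficients long.

```python
def convolve(h, x):
    """Causal FIR convolution of x with h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def receiving_room_signal(x1, x2, h1, h2, noise=None, speech=None):
    """Microphone signal: echo from both channels plus optional noise and speech."""
    echo = [a + b for a, b in zip(convolve(h1, x1), convolve(h2, x2))]
    if noise is None:
        noise = [0.0] * len(echo)
    if speech is None:
        speech = [0.0] * len(echo)
    return [e + w + v for e, w, v in zip(echo, noise, speech)]


# Impulses on each channel make the two echo paths visible in the output.
x1 = [1.0, 0.0, 0.0, 0.0]
x2 = [0.0, 1.0, 0.0, 0.0]
h1 = [0.5, 0.25]     # hypothetical echo path from loudspeaker 1
h2 = [0.2, 0.1]      # hypothetical echo path from loudspeaker 2
print(receiving_room_signal(x1, x2, h1, h2))
```

With the impulse inputs shown, the output is the first echo path plus a one-sample-delayed copy of the second, which makes it easy to verify the convolution sum by hand.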