
This function consists of the a posteriori SNR normalized by the threshold of speech activity detection. Specifically, the noise reduction formula is

where the gain function is given by

The threshold specifies the a posteriori SNR level at which the certainty of speech is declared, and the gain expansion factor is a positive integer. Typical values for the detection threshold fall in the range though the (subjectively) best value depends on the characteristics of the filter bank architecture and the time constants used to compute the envelope estimates, among other things. The expansion factor controls the rate of decay of the gain function for a posteriori SNRs below unity. With an expansion factor of one, for example, the gain decays linearly with a posteriori SNR. The expansion factor also governs the amount of noise reduction possible by controlling the lower bound of (4.35); a larger factor results in a smaller lower bound. The operator ensures that the gain reaches a value no greater than unity.

Looking at (4.35), subband time series whose a posteriori SNR exceeds the speech detection threshold are passed to the synthesis bank with unity gain.

Subband time series whose a posteriori SNR is less than the threshold are passed to the synthesis bank with a gain that is proportional to the SNR raised to the power of the expansion factor.
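The behavior described above can be sketched in a few lines. This is a minimal illustration rather than the chapter's own equation (4.35), which is not reproduced here: it assumes the gain is the threshold-normalized a posteriori SNR raised to the expansion factor and clipped at unity, and that the SNR is an amplitude (envelope) ratio. The names `noise_reduction_gain`, `snr_post`, `threshold` and `p` are hypothetical.

```python
import numpy as np

def noise_reduction_gain(snr_post, threshold, p=1):
    """Assumed gain form: min(1, (snr_post / threshold) ** p).

    snr_post  : a posteriori SNR per subband (amplitude ratio, assumed)
    threshold : speech detection threshold (hypothetical name)
    p         : positive-integer gain expansion factor (hypothetical name)
    """
    snr_post = np.asarray(snr_post, dtype=float)
    # Normalize by the threshold, expand, then clip so the gain never exceeds unity.
    return np.minimum(1.0, (snr_post / threshold) ** p)

# Illustrative values only: SNRs at or above the threshold pass with unity gain,
# SNRs below it are attenuated in proportion to (SNR / threshold) ** p.
gains = noise_reduction_gain([1.0, 4.0, 8.0, 16.0], threshold=8.0, p=1)
gains_p2 = noise_reduction_gain([1.0, 4.0, 8.0, 16.0], threshold=8.0, p=2)
```

With `p=1` the gain below the threshold falls linearly with the SNR; with `p=2` it falls quadratically, which lowers the gain floor.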

Note in particular that (4.35) does not involve a spectral subtraction operation.

This has the benefit of circumventing the problem of a negative argument, as occurs with the parametric form in (4.26). A disadvantage of (4.35) is that the gain function, and therefore the noise reduction level, is bounded below by the reciprocal of the detection threshold. That is, as the a priori SNR goes to zero, the gain approaches this lower bound.

For example, with the system provides no more than 20 dB of noise reduction.
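The size of this bound can be checked numerically. The sketch below assumes that the a posteriori SNR is an amplitude ratio that tends to unity as the speech vanishes, so that the gain floor is the reciprocal of the threshold raised to the expansion factor. The function name and the threshold values are illustrative choices, not values recovered from the text.

```python
import math

def max_noise_reduction_db(threshold, p=1):
    """Largest attenuation (in dB) under the assumed gain floor (1 / threshold) ** p."""
    # 20*log10 is used because the SNR is assumed to be an amplitude ratio.
    return 20.0 * p * math.log10(threshold)

limit_a = max_noise_reduction_db(10.0)       # illustrative threshold of 10
limit_b = max_noise_reduction_db(8.0)        # illustrative threshold of 8
limit_c = max_noise_reduction_db(8.0, p=2)   # a larger expansion factor lowers the floor
```

Under these assumptions a threshold of 10 caps the noise reduction at 20 dB and a threshold of 8 at roughly 18 dB, while doubling the expansion factor doubles the cap in dB.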

A variation on the above technique incorporates for each subband both the per-band, or narrowband, normalized a posteriori SNR and an arithmetic average of the a posteriori SNRs from neighboring bands. This narrowband-broadband hybrid gain function can provide improved noise reduction performance for wideband speech utterances, such as fricatives. The reader is referred to [19] for more information.

resulting in a minimum gain in (4.35) of –18 dB. The lower-most solid line in Fig. 4.1 shows the gain function (4.35) in comparison with the Wiener, magnitude subtraction and power subtraction gain functions.

The upper trace in Fig. 4.4 shows a segment of raw (unprocessed) time series for a series of short utterances (digit counting) recorded in an automobile traveling at highway speeds. The speech was recorded from the microphone channel of a wireless-phone handset and later digitized at an 8 kHz sampling rate. The lower trace in Fig. 4.4 shows the corresponding noise-reduced time series produced by the noise reduction algorithm. Figure 4.5 shows spectrograms corresponding to the time series in Fig. 4.4. The spectrograms show, at least visually, that the noise reduction method introduces no noticeable distortion. Figure 4.6 shows the averaged power-spectral density of the background noise for the raw and noise-reduced time series. These power spectral densities were computed from the time series in Fig. 4.4 over the interval

As can be seen, the noise floor of the processed time series is about 18 dB below that of the raw time series uniformly across the speech band.
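A comparison of this kind can be sketched as follows. This is only an illustration of the measurement, not the procedure used for Fig. 4.6: synthetic white noise stands in for the recorded background noise, a fixed attenuation by a factor of 8 stands in for the noise reduction algorithm, and the frame length and window are arbitrary choices. The helper name `averaged_psd` is hypothetical.

```python
import numpy as np

def averaged_psd(x, frame_len=256):
    """Average of windowed periodograms over non-overlapping frames."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hanning(frame_len)          # taper each frame
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spectra.mean(axis=0)

rng = np.random.default_rng(0)
raw_noise = rng.standard_normal(8000)                # one second at 8 kHz (synthetic)
processed_noise = raw_noise / 8.0                    # stand-in for the noise-reduced signal

# Per-bin drop of the noise floor in dB between the raw and processed signals.
floor_drop_db = 10.0 * np.log10(averaged_psd(raw_noise) / averaged_psd(processed_noise))
mean_drop_db = float(np.mean(floor_drop_db))
```

Because the stand-in attenuation is the same at every frequency, the drop is uniform across the band at about 18 dB.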

6. CONCLUSION

The subject of noise reduction for speech enhancement is a mature one with a 40-year history in the field of telecommunication. The majority of research has focused on the class of noise reduction methods incorporating the technique of short-time spectral modification. These methods are based upon subband filter bank processing architectures, are relatively simple to implement and can provide significant gains to the subjective quality of noisy speech. The earliest of these methods was developed in 1960 by researchers at Bell Laboratories.

Noise reduction processing has its roots in classical Wiener filter theory.

Reviewed in this chapter were the most commonly used noise reduction formulations, including the short-time Wiener filter, spectral magnitude subtraction, spectral power subtraction, and the generalized parametric Wiener filter. When implemented digitally, these methods frequently suffer from the presence of processing artifacts, a phenomenon known as musical noise. The origins of musical noise were reviewed, as were approaches to combating the problem.

The subject of speech envelope estimation was presented in detail and several averaging techniques for computing envelope estimates were reviewed. A low-complexity noise reduction algorithm was presented and demonstrated by example.

Notes

1. The estimated instantaneous a priori SNR is the ratio

References

[1] M. R. Schroeder, U.S. Patent No. 3,180,936, filed Dec. 1, 1960, issued Apr. 27, 1965.

[2] M. R. Schroeder, U.S. Patent No. 3,403,224, filed May 28, 1965, issued Sep. 24, 1968.

[3] M. M. Sondhi and S. Sievers, AT&T Bell Laboratories Internal Report (unpublished), Dec. 1964.

[4] M. M. Sondhi, C. E. Schmidt, and L. R. Rabiner, “Improving the quality of a noisy speech signal,” Bell Syst. Techn. J., vol. 60, Oct. 1981.

[5] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[6] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications. New York: Wiley, 1949.

[7] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley & Sons, 1968.

[8] M. R. Portnoff, “Time-frequency representation of digital signals and systems based on short-time Fourier analysis,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 55-69, Feb. 1980.

[9] J. B. Allen, “Short-time spectral analysis, synthesis and modification by discrete Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, pp. 235-238, June 1977.

[10] L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, “Applications of a nonlinear smoothing algorithm to speech processing,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-23, Dec. 1975.

[11] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, Apr. 1979.

[12] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, Dec. 1984.

[13] R. J. McAulay and M. L. Malpass, “Speech enhancement using a soft-decision noise suppression filter,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, Apr. 1980.

[14] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proc. of the IEEE, vol. 67, Dec. 1979.

[15] M. R. Weiss, E. Aschkenasy, and T. W. Parsons, “Processing speech signals to attenuate interference,” in Proc. IEEE Symposium on Speech Recognition, Carnegie-Mellon Univ., Apr. 15-19, 1974, pp. 292-295.

[16] O. Cappé, “Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor,” IEEE Trans. Speech Audio Process., vol. 2, Apr. 1994.

[17] W. Etter and G. S. Moschytz, “Noise reduction by noise-adaptive spectral magnitude expansion,” J. Audio Eng. Soc., vol. 42, May 1994.

[18] B. M. Helf and P. L. Chu, “Reduction of background noise for speech enhancement,” U.S. Patent No. 5,550,924, Mar. 13, 1995.

[19] E. J. Diethorn, “A subband noise-reduction method for enhancing speech in telephony & teleconferencing,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, Oct. 19-22, 1997.

[20] P. Vary, “Noise suppression by spectral magnitude estimation – mechanism and theoretical limits,” Signal Processing, vol. 8, pp. 387-400, July 1985.

[21] Y. Ephraim, D. Malah, and B. H. Juang, “On the application of hidden Markov models for enhancing noisy speech,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-37, pp. 1846-1856, Dec. 1989.

[24] D. E. Tsoukalas and J. N. Mourjopoulos, “Speech enhancement based on audible noise suppression,” IEEE Trans. Speech Audio Process., vol. 5, Nov. 1997.

[25] F. Plante, G. Meyer, and W. A. Ainsworth, “Improvement of speech spectrogram accuracy by the method of reassignment,” IEEE Trans. Speech Audio Process., vol. 6, May 1998.

[26] R. H. Erving, W. A. Ford, and R. Miller, U.S. Patent No. 5,007,046, filed Dec. 28, 1988, issued Apr. 9, 1991.

Jacob Benesty

Université du Québec, INRS-EMT

Tomas Gänsler

Agere Systems

Yiteng (Arden) Huang

Bell Laboratories, Lucent Technologies

Markus Rupp

TU Wien, Institute for Communication and RF Engineering

Abstract The first thing that comes to mind when we talk about acoustic echo cancellation is adaptive filtering. In this chapter, we discuss a large number of multichannel adaptive algorithms, both in time and frequency domains. This discussion will be developed in the context of multichannel acoustic echo cancellation where we have to identify a multiple-input multiple-output (MIMO) system (e.g., room acoustic impulse responses).

Keywords: Acoustic Echo Cancellation, Multichannel, Adaptive Algorithms, LMS, APA, RLS, FRLS, Exponentiated, MIMO, Frequency-Domain

1. INTRODUCTION

All of today’s teleconferencing systems are hands-free and single-channel (meaning that there is only one microphone and one loudspeaker). In the near future, we expect that multichannel systems (with at least two loudspeakers and at least one microphone) will be available to customers, thereby providing a realistic presence that single-channel systems cannot offer.

In hands-free systems, the coupling between loudspeakers and microphones can be very strong, and this can generate significant echoes that eventually make the system completely unstable (e.g., the system starts howling). Therefore, multichannel acoustic echo cancelers (MCAECs) are absolutely necessary for full-duplex communication [1]. Let P and Q be, respectively, the numbers of loudspeakers and microphones. For a teleconferencing system, the MCAECs consist of PQ adaptive filters aiming at identifying PQ echo paths from P loudspeakers to Q microphones. This scheme is, in fact, a multiple-input multiple-output (MIMO) system. We assume that the teleconferencing system is organized between two rooms: the “transmission” and “receiving” rooms.

The transmission room is sometimes referred to as the far-end and the receiving room as the near-end. So each room needs an MCAEC for each microphone.

Thus, multichannel acoustic echo cancellation consists of a direct identification of an unknown linear MIMO system.
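The MIMO structure can be made concrete with a small simulation. The sketch below is illustrative only: it assumes finite-length (FIR) echo paths and random signals, and the function name `mimo_echo` and the array layout are hypothetical choices rather than notation from the chapter.

```python
import numpy as np

def mimo_echo(loudspeaker_signals, echo_paths):
    """Echo at each microphone as the sum of the P convolved loudspeaker signals.

    loudspeaker_signals : array of shape (P, N)
    echo_paths          : array of shape (P, Q, L), one FIR path per loudspeaker/microphone pair
    returns             : array of shape (Q, N)
    """
    P, N = loudspeaker_signals.shape
    Q = echo_paths.shape[1]
    echoes = np.zeros((Q, N))
    for q in range(Q):
        for p in range(P):
            # Path from loudspeaker p to microphone q, truncated to N samples.
            echoes[q] += np.convolve(loudspeaker_signals[p], echo_paths[p, q])[:N]
    return echoes

rng = np.random.default_rng(0)
P, Q, L, N = 2, 1, 8, 100                 # stereo loudspeakers, one microphone (illustrative)
x = rng.standard_normal((P, N))
h = rng.standard_normal((P, Q, L))
y = mimo_echo(x, h)
num_adaptive_filters = P * Q              # one adaptive filter per echo path
```

An echo canceler for this setup would estimate all `P * Q` paths in `h` from `x` and the microphone signals.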

Although conceptually very similar, multichannel acoustic echo cancellation (MCAEC) is fundamentally different from traditional mono echo cancellation in one respect: a straightforward generalization of the mono echo canceler would not only have to track changing echo paths in the receiving room, but also in the transmission room! For example, the canceler would have to reconverge if one talker stops talking and another starts talking at a different location in the transmission room. There is no adaptive algorithm that can track such a change sufficiently fast and this scheme therefore results in poor echo suppression.

Thus, a generalization of the mono AEC in the multichannel case does not result in satisfactory performance.

The theory explaining the problem of MCAEC was described in [1] and [2]. The fundamental problem is that the multiple channels may carry linearly related signals which in turn may make the normal equations to be solved by the adaptive algorithm singular. This implies that there is no unique solution to the equations but an infinite number of solutions, and it can be shown that all but the true one depend on the impulse responses of the transmission room. As a result, intensive studies have been made of how to handle this properly. It was shown in [2] that the only solution to the nonuniqueness problem is to reduce the coherence between the different loudspeaker signals, and an efficient low complexity method for this purpose was also given.
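The nonuniqueness can be reproduced numerically. In the sketch below, two loudspeaker signals are generated from a single source through two hypothetical transmission-room responses `g1` and `g2`, so they are linearly related and the sample covariance matrix of the stacked input is singular. Adding a little independent noise to each channel is used here only as a simple stand-in for coherence reduction; it is not the method of [2]. All names and values are illustrative.

```python
import numpy as np

def stacked_data_matrix(x1, x2, L):
    """Rows are [x1(n), ..., x1(n-L+1), x2(n), ..., x2(n-L+1)] for n = L-1 .. N-1."""
    rows = [np.concatenate((x1[n - L + 1:n + 1][::-1], x2[n - L + 1:n + 1][::-1]))
            for n in range(L - 1, len(x1))]
    return np.array(rows)

rng = np.random.default_rng(0)
N, L = 2000, 4
s = rng.standard_normal(N)                    # single talker in the transmission room
g1 = np.array([1.0, 0.5, 0.2, 0.1])           # hypothetical transmission-room responses
g2 = np.array([0.8, -0.4, 0.3, 0.05])
x1 = np.convolve(s, g1)[:N]
x2 = np.convolve(s, g2)[:N]

X = stacked_data_matrix(x1, x2, L)
R = X.T @ X / len(X)                          # sample covariance of the stacked input
u = np.concatenate((g2, -g1))                 # g2 * x1 - g1 * x2 = 0, so X @ u = 0
null_residual = float(np.max(np.abs(X @ u)))
eig_coherent = np.linalg.eigvalsh(R)          # ascending; the smallest is numerically zero

# Illustrative decorrelation: small independent noise added to each channel.
x1d = x1 + 0.1 * rng.standard_normal(N)
x2d = x2 + 0.1 * rng.standard_normal(N)
Xd = stacked_data_matrix(x1d, x2d, L)
eig_decorrelated = np.linalg.eigvalsh(Xd.T @ Xd / len(Xd))
```

Any multiple of `u` can be added to a solution of the normal equations without changing the fit, and `u` depends only on the transmission-room responses. Once the channels are no longer perfectly coherent, the smallest eigenvalue moves away from zero and the solution becomes unique, although the matrix remains poorly conditioned.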

The performance of the MCAEC is more severely affected by the choice of the adaptive algorithm than the monophonic counterpart [9], [10]. This is easily recognized since the performance of most adaptive algorithms depends on the condition number of the input signal covariance matrix. In the multichannel case, the condition number is very high; as a result, algorithms such as the least-mean-square (LMS) or the normalized LMS (NLMS), which do not take into account the cross-correlation among all the input signals, converge very slowly to the true solution. It is therefore highly interesting to study multichannel adaptive filtering algorithms.
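The effect of input correlation on NLMS can be illustrated with a toy two-channel identification problem. The sketch below is a plain stacked two-channel NLMS run on noise-free synthetic data, once with independent channels and once with strongly correlated channels. The step size, filter length, signal model and helper names are arbitrary illustrative choices, not taken from the chapter.

```python
import numpy as np

def stacked_regressors(x1, x2, L):
    """Rows are [x1(n), ..., x1(n-L+1), x2(n), ..., x2(n-L+1)] for n = L-1 .. N-1."""
    return np.array([np.concatenate((x1[n - L + 1:n + 1][::-1], x2[n - L + 1:n + 1][::-1]))
                     for n in range(L - 1, len(x1))])

def nlms(X, d, mu=0.5, delta=1e-8):
    """Normalized LMS over the rows of X with desired signal d."""
    w = np.zeros(X.shape[1])
    for x_n, d_n in zip(X, d):
        e = d_n - w @ x_n                         # a priori error
        w += mu * e * x_n / (x_n @ x_n + delta)   # normalized update
    return w

def misalignment(h, w):
    """Normalized squared distance between true and estimated coefficients."""
    return float(np.sum((h - w) ** 2) / np.sum(h ** 2))

rng = np.random.default_rng(1)
N, L = 2000, 8
h_true = rng.standard_normal(2 * L)               # two stacked echo paths (synthetic)
x1 = rng.standard_normal(N)
x2_independent = rng.standard_normal(N)
x2_coherent = x1 + 0.05 * rng.standard_normal(N)  # nearly identical to x1

results = {}
for name, x2 in (("independent", x2_independent), ("coherent", x2_coherent)):
    X = stacked_regressors(x1, x2, L)
    d = X @ h_true                                # noise-free microphone signal
    results[name] = misalignment(h_true, nlms(X, d))
```

With independent channels the misalignment falls to a negligible level within the 2000 samples, whereas with nearly coherent channels the error cancels well but the coefficient estimate stays far from the true paths, which is the slow convergence the text describes.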

In this chapter, we develop a general framework for multichannel adaptive filters with the aim of improving their performance in time and frequency domains. We also investigate a recently proposed class of adaptive algorithms that exploit sparsity of room acoustic impulse responses. These algorithms are very interesting both from theoretical and practical standpoints since they converge and track much better than, for example, the NLMS algorithm.

2. NORMAL EQUATIONS AND IDENTIFICATION
