
Reverberation is caused by the multi-path propagation of an acoustic signal from its source to the microphone. In a room, it arises from reflections off the surfaces of the room, as illustrated in Figure 1.1. Both speakers produce wavefronts that propagate outward; some reach the microphones directly, while others reflect off the walls and superimpose at the microphones. Because the propagation paths differ in length, the reflections reach the microphones with a different energy and phase than the direct signals. As a result, the microphone signals contain delayed and attenuated copies of the source signal, which are described as reverberation [58, 89, 118].
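The multi-path picture above can be illustrated with a minimal sketch that builds a microphone signal as a sum of delayed, attenuated copies of a source. The function name `multipath_signal`, the sampling rate, and the three (delay, gain) pairs are hypothetical values chosen for illustration; they are not taken from Figure 1.1 or from any measurement.

```python
import numpy as np

def multipath_signal(source, fs, paths):
    """Sum delayed, attenuated copies of `source`.

    paths: list of (delay_seconds, gain) pairs, one per propagation path.
    """
    delays = [int(round(delay * fs)) for delay, _ in paths]
    out = np.zeros(len(source) + max(delays))
    for n0, (_, gain) in zip(delays, paths):
        # Each path contributes the same waveform, shifted and scaled.
        out[n0:n0 + len(source)] += gain * source
    return out

fs = 16000                      # sampling rate in Hz (assumed)
source = np.zeros(160)
source[0] = 1.0                 # unit impulse as a stand-in source

# Hypothetical paths: the direct sound plus two wall reflections.
paths = [(0.005, 1.0), (0.012, 0.6), (0.020, 0.35)]
mic = multipath_signal(source, fs, paths)

for n0 in np.flatnonzero(mic):
    print(f"arrival at {1000 * n0 / fs:.1f} ms with gain {mic[n0]:.2f}")
```

With an impulse as the source, the output is itself a crude impulse response: one peak per path, with longer paths arriving later and weaker.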

The signal received at the microphone is generally composed of the direct sound travelling from the source to the microphone, reflections that arrive shortly after the direct sound (early reflections, also called early reverberation), and reflections that arrive after the early reflections (commonly known as late reverberation). The combination of direct sound and early reflections is sometimes called the early sound component. Early reverberation is not perceived as a sound separate from the direct sound as long as the delay of the reflections does not exceed approximately 80-100 ms relative to the arrival time of the direct sound. Instead, it is perceived as reinforcing the direct sound and is therefore considered useful for speech intelligibility. This phenomenon is often referred to as the precedence effect. Early reverberation mainly causes a spectral distortion called colouration, which results from the non-flat frequency response. Late reverberation, which arrives at the microphone with longer delays, is perceived as separate echoes or as reverberation and impairs speech intelligibility. This is due to two masking effects introduced by late reverberation: self masking, where the speech spectrum is smeared by the late reverberation, and overlap masking, where the energy of a preceding phoneme overlaps with that of the subsequent phonemes. Late reverberation can severely degrade the performance of automatic speech recognition (ASR) systems, and it is also one of the main factors in the performance degradation of source separation algorithms [58, 89, 118].

Figure 2.4: Room impulse response of a room with T60 = 0.89 s. (Plot of amplitude against time [s]; labelled regions: direct signal, early reflections, late reverberation.)

The behaviour of the acoustic channel between the source and the microphone can be characterized by a room impulse response (RIR), which represents the signal recorded at the microphone in response to a source that emits a sound impulse. For the sake of argument, we model the room’s acoustical behaviour in response to different sounds with a second-order differential equation as follows:

$$\frac{d^2 y}{dt^2} + 2\alpha\,\frac{dy}{dt} + \omega_0^2\, y = 0, \tag{2.16}$$

where α is the damping attenuation and ω0 is the natural frequency of the system (i.e. the room), with their ratio ζ = α/ω0 known as the damping factor. Depending on the damping factor ζ, the system can be over damped (ζ > 1), critically damped (ζ = 1) or under damped (ζ < 1). When the natural frequency ω0 is greater than α (i.e. ζ < 1), the impulse response oscillates and can take negative values, so RIRs are not always positive.

As shown in Figure 2.4, the RIR can be split into three main sections: the direct path, the early reflections and the late reflections. The direct sound, early reverberation and late reverberation are obtained by convolving the corresponding segments with the desired signal. Additionally, the energy of the reflections is observed to decay at an exponential rate. This exponential decay property of the RIR gives rise to the concept of reverberation time (RT). It is defined as the time required for the average sound energy at a given frequency to fall to one-millionth of its initial steady-state value after the sound source has been switched off, which corresponds to a decrease of 60 decibels (dB).

To illustrate the effects of reverberation on speech perception, an example is given in Figures 2.5 and 2.6. The effects of reverberation are clearly visible in the waveform and spectrogram of a speech signal, and audible in the signal itself. Figure 2.5 shows the waveforms of the clean and reverberant signals. The reverberant signal is obtained by convolving the clean signal with the room impulse response of a real room with T60 = 0.89 s and a source-microphone distance of 1.5 m.
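The exponential decay, the 60 dB definition of reverberation time, and the convolution of RIR segments with a signal can be tried out in a small sketch. Everything below is an assumption made for illustration: the RIR is synthetic (white noise under an exponential envelope, not the measured response of Figure 2.4), the reverberation time is estimated with Schroeder backward integration and a line fit between -5 dB and -25 dB (a common choice, not a method stated in this text), the 80 ms early/late boundary is taken from the lower end of the 80-100 ms range quoted earlier, and the names `estimate_t60` and `split_rir` are hypothetical.

```python
import numpy as np

fs = 16000            # sampling rate in Hz (assumed)
t60 = 0.89            # target reverberation time in s, matching Figure 2.4
rng = np.random.default_rng(0)

# Synthetic RIR (an assumption, not a measured response): white noise under
# an exponential amplitude envelope whose energy reaches -60 dB at t = t60.
t = np.arange(int(1.5 * t60 * fs)) / fs
rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)

def estimate_t60(h, fs, start_db=-5.0, stop_db=-25.0):
    """Estimate T60 from the slope of the backward-integrated energy decay."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]        # energy remaining from each sample on
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.flatnonzero((edc_db <= start_db) & (edc_db >= stop_db))
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)   # decay rate in dB/s
    return -60.0 / slope                       # time needed to fall by 60 dB

def split_rir(h, fs, early_ms=80.0):
    """Split h into an early part (direct sound + early reflections) and a late part."""
    n_early = int(early_ms * 1e-3 * fs)
    early = np.zeros_like(h)
    late = np.zeros_like(h)
    early[:n_early] = h[:n_early]
    late[n_early:] = h[n_early:]
    return early, late

t60_est = estimate_t60(rir, fs)
print(f"estimated T60: {t60_est:.2f} s (target {t60} s)")

# Stand-in "clean" signal (assumption): a 0.1 s noise burst.
clean = rng.standard_normal(fs // 10)
early, late = split_rir(rir, fs)
early_sound = np.convolve(clean, early)        # early sound component
late_reverb = np.convolve(clean, late)         # late reverberation
reverberant = np.convolve(clean, rir)          # full reverberant signal
```

Because convolution is linear, `early_sound + late_reverb` reproduces `reverberant` up to rounding, which is what allows the reverberant signal to be discussed in terms of separate early and late components.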

The spectrograms of the clean and reverberant signals are shown in Figure 2.6.

The distortion caused by the acoustic channel is visible in both the waveform and the spectrogram. In the spectrogram a blurring effect is visible, while in the waveform the silent gaps are filled with energy. These distortions result in an audible difference between the anechoic and the reverberant speech, and hence in degraded speech intelligibility. Methods should therefore be developed to reduce these detrimental effects of reverberation on the speech signal. In Chapter 5 we introduce a method to mitigate the reverberation effect.

Figure 2.5: The waveform of one utterance from the TIMIT data set (top) and the reverberant version of the same signal (bottom). (Panels: original signal, reverberant signal; axes: time (s) against magnitude.)

Figure 2.6: The spectrogram of one utterance from the TIMIT data set (top) and the reverberant version of the same signal with T60 = 0.89 s (bottom). (Axes: time [s] against frequency [kHz].)
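The filling of silent gaps seen in the waveform can be reproduced with a short sketch. It does not use the TIMIT utterance or the measured room response: the "speech" is a stand-in made of two noise bursts separated by silence, and the RIR is a synthetic one (a unit direct path followed by exponentially decaying noise). Both, along with the sampling rate and the 0.1 scaling of the reflections, are assumptions for illustration only.

```python
import numpy as np

fs = 16000                       # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)

# Stand-in for speech (assumption): two 0.2 s noise bursts around 0.2 s of silence.
n = fs // 5
clean = np.concatenate([rng.standard_normal(n), np.zeros(n), rng.standard_normal(n)])

# Synthetic RIR (assumption): unit direct path followed by exponentially
# decaying noise whose energy reaches -60 dB at t60 = 0.89 s.
t60 = 0.89
t = np.arange(int(t60 * fs)) / fs
rir = 0.1 * rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)
rir[0] = 1.0

# Reverberant signal, truncated to the length of the clean signal.
reverberant = np.convolve(clean, rir)[: clean.size]

gap = slice(n, 2 * n)            # samples of the silent gap
clean_gap_energy = np.sum(clean[gap] ** 2)
reverb_gap_energy = np.sum(reverberant[gap] ** 2)
burst_energy = np.sum(reverberant[:n] ** 2)
gap_level_db = 10.0 * np.log10(reverb_gap_energy / burst_energy)

print(f"clean gap energy: {clean_gap_energy:.1f}")
print(f"reverberant gap level relative to the first burst: {gap_level_db:.1f} dB")
```

The gap is exactly silent in the clean signal, whereas in the reverberant signal it carries the decaying tail of the first burst. That tail is the energy that overlaps with whatever sound follows.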
