Autocorrelation is another very popular time-domain method for pitch detection. The algorithm measures how similar a signal is to a time-shifted copy of itself, using the sum of products shown in Eq. (5.5).
$$\mathrm{acf}_{xx}[\tau] = x[\tau] * x[-\tau] = \sum_{n=0}^{N-1} x[n]\cdot x[n+\tau] \tag{5.5}$$
In Eq. (5.5), τ is the lag (a discrete delay index), acf_xx[τ] is the corresponding autocorrelation value, N is the length of the frame (the portion of the signal under analysis, taken with a rectangular window), and n is the sample index. When τ = 0, acf_xx[τ] reduces to the energy of the frame, that is, the sum of the squared samples within the window (dividing by N gives the frame's average power). Similar to the way RMS is computed, autocorrelation steps through windowed portions of a signal: each sample of the frame is multiplied by the corresponding sample of a copy shifted by τ, and the products are summed according to Eq. (5.5). This is repeated for increasing τ, with the frame x[n] held fixed while the second copy, x[n + τ], slides forward through the input by one sample per lag.
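As a minimal sketch of Eq. (5.5), assuming NumPy, the hypothetical helper `acf` below holds the frame x[start], ..., x[start + N − 1] fixed and slides the second copy forward by τ. The `start` parameter is an illustrative addition that lets the frame begin anywhere in the input.

```python
import numpy as np

def acf(x, start, N, max_lag):
    """Autocorrelation of one frame, following Eq. (5.5).

    The frame x[start : start + N] stays fixed while the second copy,
    x[n + tau], slides forward by tau samples. The input must therefore
    hold at least start + N + max_lag samples.
    """
    x = np.asarray(x, dtype=float)
    if start + N + max_lag > len(x):
        raise ValueError("signal too short for this frame length and lag range")
    frame = x[start:start + N]
    # One sum of products per lag tau = 0, 1, ..., max_lag
    return np.array([np.dot(frame, x[start + tau:start + tau + N])
                     for tau in range(max_lag + 1)])

# Usage: at tau = 0 the result equals the frame's energy
x = np.sin(2 * np.pi * np.arange(64) / 16)   # illustrative test signal, period 16
r = acf(x, start=0, N=32, max_lag=16)
assert np.isclose(r[0], np.sum(x[:32] ** 2))
```

Because the second copy reads N + max_lag samples, the guard rejects inputs that are too short instead of silently truncating the last products.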
What does all this mean? Ideally, a periodic signal repeats itself after one complete cycle. In terms of similarity, it is therefore most similar to itself when compared with a copy shifted by an integer multiple of its period. For example, consider the signal in Fig. 5.5, a sine wave sampled at a low rate. A perfect sine wave repeats itself after one cycle, which in Fig. 5.5 is every P = 8 samples. In the autocorrelation algorithm, the shifting is achieved via τ, starting at τ = 0 and stepping through the sine wave by increasing the lag. Let's step through a few values of τ to get a better grasp of the idea behind the algorithm, using a frame of N = 2 samples that starts at n = 1 (where x[1] = 1 and x[2] = 2).

Fig. 5.5. A sine wave with period P = 8.
$$
\begin{aligned}
\mathrm{ACF}[0] &= \sum_{n=1}^{2} x[n]\cdot x[n+0] = \sum_{n=1}^{2} x[n]^2 = 1(1) + 2(2) = 5\\
\mathrm{ACF}[1] &= \sum_{n=1}^{2} x[n]\cdot x[n+1] = 1(2) + 2(1) = 4\\
\mathrm{ACF}[2] &= \sum_{n=1}^{2} x[n]\cdot x[n+2] = 1(1) + 2(0) = 1\\
\mathrm{ACF}[3] &= \sum_{n=1}^{2} x[n]\cdot x[n+3] = 1(0) + 2(-1) = -2\\
\mathrm{ACF}[4] &= \sum_{n=1}^{2} x[n]\cdot x[n+4] = 1(-1) + 2(-2) = -5\\
\mathrm{ACF}[5] &= \sum_{n=1}^{2} x[n]\cdot x[n+5] = 1(-2) + 2(-1) = -4\\
\mathrm{ACF}[6] &= \sum_{n=1}^{2} x[n]\cdot x[n+6] = 1(-1) + 2(0) = -1\\
\mathrm{ACF}[7] &= \sum_{n=1}^{2} x[n]\cdot x[n+7] = 1(0) + 2(1) = 2\\
\mathrm{ACF}[8] &= \sum_{n=1}^{2} x[n]\cdot x[n+8] = 1(1) + 2(2) = 5 = \mathrm{ACF}[0]
\end{aligned}
\tag{5.6}
$$

September 25, 2009 13:32 spi-b673 9in x 6in b673-ch02
60 Introduction to Digital Signal Processing
Figure 5.6 shows the autocorrelation values from the above example graphically, with N = 2 and lags τ from 0 to 8, and Fig. 5.7 shows the results of the autocorrelation calculation. We immediately notice that ACF[0] = ACF[8], and if you compute the autocorrelation beyond τ = 8, you will find that ACF[1] = ACF[9], and so on. In general:
$$\mathrm{ACF}[\tau] = \mathrm{ACF}[\tau + P] \tag{5.7}$$
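The worked example can be checked with a few lines of plain Python. The sample values below are read off the products in Eq. (5.6) (x[1] = 1, x[2] = 2, x[3] = 1, and so on) and repeated over several periods so that larger lags can also be tested against Eq. (5.7). The helper name `acf` is illustrative.

```python
# Samples of the low-rate sine wave in Fig. 5.5 (period P = 8), as implied
# by the products in Eq. (5.6): x[0] = 0, x[1] = 1, x[2] = 2, ...
one_period = [0, 1, 2, 1, 0, -1, -2, -1]
x = one_period * 3          # three periods: enough room for lags up to 16

def acf(tau, start=1, N=2):
    # Eq. (5.6): sum over n = 1, 2 of x[n] * x[n + tau]
    return sum(x[n] * x[n + tau] for n in range(start, start + N))

values = [acf(tau) for tau in range(9)]
print(values)  # [5, 4, 1, -2, -5, -4, -1, 2, 5]

P = 8
# Eq. (5.7): shifting the lag by one period leaves the ACF unchanged
assert all(acf(tau) == acf(tau + P) for tau in range(9))
```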
In other words, whenever the shift τ equals the period P (or a multiple of it), the autocorrelation reaches a maximum, which gives us vital information about the signal's periodicity. The fundamental frequency, and hence the pitch, can then be computed using Eq. (5.8), where f_s is the sampling frequency in samples/sec and P is the period in samples.
$$f_{\mathrm{Hz}} = \frac{f_s}{P} \tag{5.8}$$
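Eq. (5.8) is a single division. The sketch below wraps it in a hypothetical helper, `pitch_hz`. The text does not state the sampling rate of the piano recording discussed next, so 44.1 kHz is assumed purely for illustration; with the period of 168 samples reported there, that assumption would give 262.5 Hz.

```python
def pitch_hz(fs, period_samples):
    """Eq. (5.8): fundamental frequency in Hz from the period P in samples."""
    if period_samples <= 0:
        raise ValueError("period must be a positive number of samples")
    return fs / period_samples

# Assumed (not stated in the text) sampling rate of 44.1 kHz
print(pitch_hz(44100, 168))  # 262.5
```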
Figure 5.8 shows the autocorrelation results for the same piano sound we used with the ZCR method, without any modification of the signal (no filtering). In this example, the fundamental period (P = 168 samples) is clearly visible by inspection and can easily be found with a peak detection algorithm. For example, one could design a rough peak detector for this situation by locating all peaks via a change in slope polarity and discarding those whose autocorrelation value (an energy measure) falls below, say, 40.
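A rough peak detector along these lines might look like the sketch below. It assumes NumPy, takes the slope as the first difference of the ACF values, and reports every lag where the slope changes from positive to non-positive. The threshold of 40 mirrors the example value in the text and would need tuning for other signals, frame lengths, and amplitudes; the function name `pick_peaks` is hypothetical.

```python
import numpy as np

def pick_peaks(acf_values, threshold=40.0):
    """Rough peak detector based on a change in slope polarity.

    A lag k is reported when the slope goes from positive (rising into k)
    to non-positive (falling or flat after k) and the ACF value at k is at
    least `threshold`. Lag 0 is never reported.
    """
    r = np.asarray(acf_values, dtype=float)
    slope = np.diff(r)              # slope[k] = r[k + 1] - r[k]
    peaks = []
    for k in range(1, len(slope)):
        if slope[k - 1] > 0 and slope[k] <= 0 and r[k] >= threshold:
            peaks.append(k)
    return peaks

# The local peak at lag 4 (value 30) is discarded by the threshold
print(pick_peaks([0, 10, 50, 20, 30, 25, 5, 45, 60, 30]))  # [2, 8]
```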
Fig. 5.6. Autocorrelation with N = 2, lag τ = 0, ..., 8.
Fig. 5.7. Autocorrelation values for N = 2 and lags up to 8.
The autocorrelation-based fundamental frequency computation algorithm is summarized in Fig. 5.9. The slope itself is computed simply via Eq. (5.9):

$$\mathrm{slope} = x[n] - x[n-1] \tag{5.9}$$
Fig. 5.8. Autocorrelation results for a piano sample.
Fig. 5.9. Autocorrelation pitch computation.
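Fig. 5.9 is not reproduced here, so the following is only one plausible reading of the steps described in the text, assuming NumPy. The sketch computes the ACF of a frame with Eq. (5.5), takes its slope with Eq. (5.9), picks the first peak after lag 0, and converts that lag to a frequency with Eq. (5.8). Unlike the fixed threshold of 40 mentioned earlier, it assumes a threshold relative to ACF[0], because absolute ACF values scale with frame length and signal amplitude. The names `detect_pitch` and `rel_threshold` are hypothetical.

```python
import numpy as np

def detect_pitch(x, fs, N, max_lag, rel_threshold=0.5):
    """Return the fundamental frequency in Hz of the frame x[0:N], or None."""
    x = np.asarray(x, dtype=float)
    if len(x) < N + max_lag:
        raise ValueError("need at least N + max_lag samples")
    # Step 1: autocorrelation of the frame, Eq. (5.5)
    r = np.array([np.dot(x[:N], x[tau:tau + N]) for tau in range(max_lag + 1)])
    if r[0] <= 0:
        return None                      # silent frame, no pitch
    # Step 2: slope of the ACF, Eq. (5.9): slope[k] = r[k + 1] - r[k]
    slope = np.diff(r)
    # Step 3: first peak after lag 0 (slope goes from + to -) that is
    # strong enough relative to ACF[0]
    for k in range(1, len(slope)):
        if slope[k - 1] > 0 and slope[k] <= 0 and r[k] >= rel_threshold * r[0]:
            # Step 4: period P = k samples, Eq. (5.8)
            return fs / k
    return None

fs = 8000
n = np.arange(500)
x = np.sin(2 * np.pi * 200 * n / fs)     # 200 Hz test tone, period 40 samples
print(detect_pitch(x, fs, N=400, max_lag=100))  # 200.0
```

Taking the first qualifying peak rather than the global maximum avoids reporting a multiple of the period, since Eq. (5.7) makes the peaks at P, 2P, 3P, ... nearly equal in height.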
In general, a larger frame size (N: frame length) results in clearer autocorrelation peaks, as we have more data to analyze. However, the longer the frame, the poorer the transient response, because the ACF is computed over a longer segment of time. If the pitch changes rapidly, a large frame size will produce inaccurate pitch values because transient information is lost. Hence, there has to be a compromise between transient response and accuracy in computing the fundamental frequency. We can also improve the performance of a fundamental frequency detector if we know beforehand which instrument we want to analyze. If, for example, you were designing a guitar tuner using the autocorrelation method, you would not need to worry about frequencies that only the bass guitar produces, and hence could make the maximum frame size considerably shorter, since the longest period to be detected is inversely proportional to the lowest fundamental frequency of interest (P = f_s/f, from Eq. (5.8)). This improves the overall transient response.
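Rearranging Eq. (5.8) as P = f_s/f shows how the lowest expected fundamental sets the largest lag the detector must search. Eq. (5.5) then needs N + P_max input samples per frame. The sketch below assumes a 44.1 kHz sampling rate and standard tunings: E2 ≈ 82.41 Hz for the guitar's lowest string and E1 ≈ 41.20 Hz for a four-string bass. Both the numbers and the helper name `max_lag_for` are illustrative.

```python
import math

def max_lag_for(fs, f_min):
    """Largest period, in samples, that the detector must search: fs / f_min."""
    return math.ceil(fs / f_min)

fs = 44100             # assumed sampling rate in samples/sec
guitar_f_min = 82.41   # E2, lowest open string of a guitar in standard tuning
bass_f_min = 41.20     # E1, lowest open string of a 4-string bass

print(max_lag_for(fs, guitar_f_min))  # 536
print(max_lag_for(fs, bass_f_min))    # 1071
```

Under these assumptions, a guitar-only tuner searches roughly half as many lags as one that must also handle the bass, so its frames can be correspondingly shorter.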
Another way to potentially improve transient response in the autocorrelation algorithm is to change the frame size dynamically. This method uses a “very” long frame, whose size is determined by the lowest allowed fundamental frequency, as a guide. Because longer frames produce clearer and more pronounced maxima, this long frame guides the search for sharp peaks. The basic idea exploits the fact that if the computed period is short, and hence the detected pitch is “high,”
then there is no need for a long frame to get the job done; a “shorter” frame will be sufficient. The resulting benefit is that, whenever possible, the algorithm selects a shorter frame length, which improves the transient response. This algorithm is outlined in Fig. 5.10.
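Fig. 5.10 is not reproduced here, so the sketch below is only one possible interpretation of the dynamic frame-size idea, assuming NumPy. A guide pass uses a frame long enough for the lowest allowed fundamental. A second pass then re-estimates the period with a frame only a few detected periods long. The choice of two periods per frame, the search range extending about 50% beyond the guide lag, the relative peak threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

def first_peak_lag(x, N, max_lag, rel_threshold=0.5):
    """Lag of the first ACF peak after lag 0 for the frame x[0:N], or None."""
    r = np.array([np.dot(x[:N], x[tau:tau + N]) for tau in range(max_lag + 1)])
    if r[0] <= 0:
        return None                          # silent frame
    slope = np.diff(r)                       # slope[k] = r[k + 1] - r[k]
    for k in range(1, len(slope)):
        if slope[k - 1] > 0 and slope[k] <= 0 and r[k] >= rel_threshold * r[0]:
            return k
    return None

def adaptive_period(x, fs, f_min, periods_per_frame=2):
    """Guide with a long frame, then re-estimate with a frame sized to the pitch.

    Returns (period in samples, frame length used), or None if no peak is found.
    """
    x = np.asarray(x, dtype=float)
    max_lag = int(np.ceil(fs / f_min))
    long_N = periods_per_frame * max_lag        # guide frame for the lowest pitch
    if len(x) < long_N + max_lag:
        raise ValueError("signal too short for the guide frame")
    guide = first_peak_lag(x, long_N, max_lag)
    if guide is None:
        return None
    short_N = periods_per_frame * guide         # just enough for the detected period
    search = min(guide + max(2, guide // 2), max_lag)  # look only near the guide lag
    period = first_peak_lag(x, short_N, search)
    return (period, short_N) if period is not None else None

fs = 8000
x = np.sin(2 * np.pi * 200 * np.arange(1000) / fs)  # 200 Hz: period 40 samples
print(adaptive_period(x, fs, f_min=50))  # (40, 80)
```

In a streaming detector, the shorter frame length returned here would presumably be used for the following frames, which is where the gain in transient response comes from.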
Although the autocorrelation method works robustly in many cases and is widely used in cell phone and voice coding applications as well as for musical signals, there are many instances in which it fails to compute the fundamental frequency correctly. This typically happens when local peaks unrelated to the fundamental period are not small enough to be negligible, unlike the local peaks in Fig. 5.8. As those local peaks grow larger, they complicate the peak-picking process considerably. We could again apply a moving average filter, as we did with ZCR, to alleviate this problem, but the bottom line is that there is no perfect, bullet-proof algorithm that detects all pitches for all instruments with total accuracy. Ultimately, not only are the technical aspects of the algorithms difficult to deal with, but the pitch itself is also hard to compute, because pitch is largely a perceptual attribute of sound: psychoacoustic considerations have to be taken into account.
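As a hedged illustration of the smoothing remedy mentioned above, the sketch below applies a causal moving average filter, assuming NumPy. The filter length L, the 8 kHz rate, and the test tones are illustrative choices, not values from the text. A strong component with a period of 4 samples would add local ACF peaks every 4 lags. A 4-point average places a spectral null at f_s/4 and cancels that component entirely, while the 200 Hz fundamental passes with only slight attenuation.

```python
import numpy as np

def moving_average(x, L):
    """Causal L-point moving average: y[n] = (1/L) * sum_{k=0}^{L-1} x[n - k]."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(L) / L
    # 'full' convolution, truncated so that y has the same length as x
    return np.convolve(x, kernel, mode="full")[:len(x)]

fs = 8000
n = np.arange(400)
fundamental = np.sin(2 * np.pi * 200 * n / fs)
high = 0.8 * np.sin(2 * np.pi * 2000 * n / fs)   # period of 4 samples

# After the first L - 1 samples, filtering the mixture gives the same result
# as filtering the fundamental alone: the 2 kHz component is cancelled.
y_mix = moving_average(fundamental + high, 4)
y_fund = moving_average(fundamental, 4)
print(np.allclose(y_mix[3:], y_fund[3:]))  # True
```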