by human beings
7. A final reflection
Let’s take one more look at a single butterfly. The FFT butterfly structures in
Figures 4-8, 4-9, 4-11, and 4-12 are the direct result of the derivations of the decimation-in-time and decimation-in-frequency algorithms. Although it’s not very obvious at first, the twiddle factor exponents shown in these structures do have a consistent pattern. Notice how they always take the general forms shown in Figure 4-14(a).† To implement the decimation-in-time butterfly of Figure 4-14(a), we’d have to perform two complex multiplications and two complex additions. Well, there’s a better way. Consider the decimation-in-time butterfly in Figure 4-14(a). If the top input is x and the bottom input is y, the top butterfly output would be
† Remember, for simplicity the butterfly structures in Figures 4-8 through 4-13 show only the twiddle factor exponents, k and k+N/2,
and not the entire complex twiddle factors.
Figure 4-14 Decimation-in-time and decimation-in-frequency butterfly structures: (a) original form; (b)
simplified form; (c) optimized form.
$$x + W_N^{k}\,y \tag{4-26}$$
and the bottom butterfly output would be
$$x + W_N^{k+N/2}\,y \tag{4-27}$$
Fortunately, the operations in Eqs. (4-26) and (4-27) can be simplified because the two twiddle factors are related by
$$W_N^{k+N/2} = W_N^{k}\,W_N^{N/2} = W_N^{k}\,e^{-j2\pi (N/2)/N} = W_N^{k}\,e^{-j\pi} = -W_N^{k}. \tag{4-28}$$
So we can replace the twiddle factors $W_N^{k+N/2}$ in Figure 4-14(a) with $-W_N^{k}$ to give us the simplified butterflies shown in Figure 4-14(b). Because the twiddle factors in Figure 4-14(b) differ only by their signs, the optimized butterflies in Figure 4-14(c) can be used. Notice that these optimized butterflies require two complex additions but only one complex multiplication, thus reducing our computational workload.†
† It’s because there are $(N/2)\log_2 N$ butterflies in an N-point FFT that we said the number of complex multiplications performed by an FFT is $(N/2)\log_2 N$ in Eq. (4-2).
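The saving is easy to check numerically. The following is a minimal Python sketch, assuming the usual definition W_N^k = e^(-j2πk/N); the function names are illustrative and do not come from the text. It computes a decimation-in-time butterfly both ways, once with two complex multiplications and once with a single product reused with opposite signs, and confirms that the outputs agree to within rounding error.

```python
import cmath

def twiddle(n, k):
    """Return W_N^k = exp(-j*2*pi*k/N) (assumed standard definition)."""
    return cmath.exp(-2j * cmath.pi * k / n)

def butterfly_two_mults(x, y, n, k):
    """Original form: separate twiddle factors W_N^k and W_N^(k+N/2)."""
    top = x + twiddle(n, k) * y
    bottom = x + twiddle(n, k + n // 2) * y
    return top, bottom

def butterfly_one_mult(x, y, n, k):
    """Optimized form: one complex product, added and then subtracted."""
    t = twiddle(n, k) * y
    return x + t, x - t

# Arbitrary test inputs for an 8-point FFT butterfly.
x, y = 1.0 + 2.0j, -0.5 + 0.25j
for k in range(4):
    a = butterfly_two_mults(x, y, 8, k)
    b = butterfly_one_mult(x, y, 8, k)
    assert all(abs(p - q) < 1e-12 for p, q in zip(a, b))
```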
We’ll often see the optimized butterfly structures of Figure 4-14(c) in the literature instead of those in Figure 4-14(a). These optimized butterflies give us an easy way to recognize decimation-in-time and decimation-in-frequency algorithms. When we do come across the optimized butterflies from Figure 4-14(c), we’ll know that the algorithm is decimation-in-time if the twiddle factor precedes the −1, or else the algorithm is decimation-in-frequency if the twiddle factor follows the −1.
Sometimes we’ll encounter FFT structures in the literature that use the notation shown in Figure 4-15[5, 12]. These wingless butterflies are equivalent to those shown in Figure 4-14(c). The signal-flow convention in
Figure 4-15 is such that the plus output of a circle is the sum of the two samples that enter the circle from the left, and the minus output of a circle is the difference of the samples that enter the circle. So the outputs of the decimation-in-time butterflies in Figures 4-14(c) and 4-15(a) are given by
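$$X = A + W_N^{k}B \quad\text{and}\quad Y = A - W_N^{k}B, \tag{4-29}$$
where A is the butterfly’s top input, B its bottom input, and X and Y its top and bottom outputs.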
Figure 4-15 Alternate FFT butterfly notation: (a) decimation in time; (b) decimation in frequency.
The outputs of the decimation-in-frequency butterflies in Figures 4-14(c) and 4-15(b), for top input A and bottom input B, are
$$X = A + B \quad\text{and}\quad Y = W_N^{k}(A - B). \tag{4-30}$$
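To make the difference between the two optimized butterflies concrete, here is a small Python sketch. It assumes a decimation-in-time butterfly of the form X = A + W·B, Y = A − W·B and a decimation-in-frequency butterfly of the form X = A + B, Y = W·(A − B), with W standing for the twiddle factor; the function names are hypothetical. Feeding each butterfly unit inputs exposes its 2×2 matrix, and the two matrices turn out to be transposes of one another.

```python
import cmath

def dit_butterfly(a, b, w):
    """Decimation in time: the twiddle factor is applied before the add/subtract."""
    t = w * b
    return a + t, a - t

def dif_butterfly(a, b, w):
    """Decimation in frequency: the twiddle factor is applied after the subtract."""
    return a + b, w * (a - b)

def as_matrix(butterfly, w):
    """Build the butterfly's 2x2 matrix by feeding it the unit inputs (1,0) and (0,1)."""
    col0 = butterfly(1, 0, w)
    col1 = butterfly(0, 1, w)
    return [[col0[0], col1[0]], [col0[1], col1[1]]]

w = cmath.exp(-2j * cmath.pi * 1 / 8)  # W_8^1, chosen only as an example
dit = as_matrix(dit_butterfly, w)      # [[1,  w], [1, -w]]
dif = as_matrix(dif_butterfly, w)      # [[1,  1], [w, -w]]

# The DIF matrix is the transpose of the DIT matrix.
for i in range(2):
    for j in range(2):
        assert abs(dif[i][j] - dit[j][i]) < 1e-12
```

With W = 1 both functions reduce to the same sum-and-difference pair, which is the 2-point DFT.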
So which FFT structure is the best one to use? It depends on the application, the hardware implementation, and convenience. If we’re using a software routine to perform FFTs on a general-purpose computer, we usually don’t have a lot of choices. Most folks just use whatever existing FFT routines happen to be included in their commercial software package. Their code may be optimized for speed, but you never know. Examination of the software code may be necessary to see just how the FFT is implemented. If we feel the need for speed, we should check to see if the software calculates the sines and cosines each time it needs a twiddle factor. Trigonometric calculations normally take many machine cycles. It may be possible to speed up the algorithm by calculating the twiddle factors ahead of time and storing them in a table. That way, they can be looked up, instead of being calculated each time they’re needed in a butterfly. If we’re writing our own software routine, checking for butterfly output data overflow and careful magnitude scaling may allow our FFT to be performed using integer arithmetic that can be faster on some machines.
† Care must be taken, however, when using integer arithmetic; some Reduced Instruction Set Computer (RISC)
processors actually take longer to perform integer calculations because they’re specifically designed to operate on floating-point numbers.
† Overflow is what happens when the result of an arithmetic operation has too many bits, or digits, to be represented in the hardware
registers designed to contain that result. FFT data overflow is described in Section 12.3.
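The table-lookup idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: N is a power of two, the twiddle factors are defined as W_N^k = e^(-j2πk/N), and the names make_twiddle_table and lookup_twiddle are made up for this example. A single table of N/2 values serves every FFT stage, because a twiddle factor for a smaller stage size m satisfies W_m^k = W_N^(kN/m).

```python
import cmath

def make_twiddle_table(n):
    """Precompute W_N^k = exp(-j*2*pi*k/N) once, for k = 0 .. N/2 - 1."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def lookup_twiddle(table, k, m):
    """Return W_m^k from an N-point table without any trigonometric calls.

    Assumes m is a power of two with 2 <= m <= N and 0 <= k < m/2,
    and uses the identity W_m^k = W_N^(k*N/m).
    """
    n = 2 * len(table)
    return table[k * (n // m)]

table = make_twiddle_table(8)

# W_4^1 should equal -j; it is read from index 2 of the 8-point table.
assert abs(lookup_twiddle(table, 1, 4) - (-1j)) < 1e-12
```

The sines and cosines are evaluated only when the table is built, so each butterfly costs an index calculation and a memory read instead of two trigonometric function calls.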
If we’re using commercial array processor hardware for our calculations, the code in these processors is always optimized because their purpose in life is high speed. Array processor manufacturers typically publicize their
products by specifying the speed at which their machines perform a 1024-point FFT. Let’s look at some of our options in selecting a particular FFT structure in case we’re designing special-purpose hardware to implement an FFT.
The FFT butterfly structures previously discussed typically fall into one of two categories: in-place FFT algorithms and double-memory FFT algorithms. An in-place algorithm is depicted in Figure 4-5. The output of a butterfly operation can be stored in the same hardware memory locations that previously held the butterfly’s input data, so no intermediate storage is necessary. This way, for an N-point FFT, only 2N memory locations are needed. (The 2 comes from the fact that each butterfly node represents a data value that has both a real and an imaginary part.) The rub with the in-place algorithms is that data routing and memory addressing are rather complicated.
A double-memory FFT structure is depicted in Figure 4-10. With this structure, intermediate storage is necessary because we no longer have the standard butterflies, and 4N memory locations are needed. However, data routing and memory address control are much simpler in double-memory FFT structures than in the in-place technique. High-speed, floating-point integrated circuits used to implement pipelined FFT architectures take better advantage of their pipelined structure when the double-memory algorithm is used[13].
There’s another class of FFT structures, known as constant-geometry algorithms, that make the addressing of memory both simple and constant for each stage of the FFT. These structures are of interest to those folks who build special-purpose FFT hardware devices[4,14]. From the standpoint of general hardware, the decimation-in-time algorithms are optimum for real input data sequences, and decimation-in-frequency is appropriate when the input is complex[6]. When the FFT input data is symmetrical in time, special FFT structures exist to eliminate unnecessary calculations. These special butterfly structures based on input data symmetry are described in the literature[15].
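The in-place idea can be illustrated with a short Python sketch. It is not the structure of any particular figure, just a conventional radix-2 decimation-in-time FFT written under these assumptions: N is a power of two, the data live in two length-N lists (one for real parts, one for imaginary parts, giving the 2N storage locations), and the names bit_reverse_permute and fft_in_place are invented for this example. Every butterfly writes its two outputs back over its two inputs, so no second buffer is allocated.

```python
import math

def bit_reverse_permute(re, im):
    """Reorder both arrays into bit-reversed index order, swapping in place."""
    n = len(re)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:          # propagate the carry of a bit-reversed increment
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]

def fft_in_place(re, im):
    """Radix-2 DIT FFT that overwrites re[] and im[] with the transform."""
    n = len(re)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(re, im)
    span = 1                    # distance between a butterfly's two nodes
    while span < n:
        for j in range(span):
            angle = -math.pi * j / span          # twiddle W_(2*span)^j
            wr, wi = math.cos(angle), math.sin(angle)
            for top in range(j, n, 2 * span):
                bot = top + span
                # One complex multiplication: t = W * (bottom input).
                tr = wr * re[bot] - wi * im[bot]
                ti = wr * im[bot] + wi * re[bot]
                # Outputs overwrite the inputs they were computed from.
                re[bot], im[bot] = re[top] - tr, im[top] - ti
                re[top], im[top] = re[top] + tr, im[top] + ti
        span *= 2

# Usage: a unit impulse transforms to a flat spectrum of ones.
re = [1.0, 0.0, 0.0, 0.0]
im = [0.0, 0.0, 0.0, 0.0]
fft_in_place(re, im)
```

The price of saving memory shows up in the index arithmetic: the bit-reversal pass and the stage-dependent spacing between butterfly nodes are exactly the routing and addressing complications mentioned above.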
For two-dimensional FFT applications, such as processing photographic images, the decimation-in-frequency algorithms appear to be the optimum choice[16]. Your application may be such that FFT input and output bit reversal is not an important factor. Some FFT applications allow manipulating a bit-reversed FFT output sequence in the frequency domain without having to unscramble the FFT’s output data. Then an inverse transform that’s expecting bit-reversed inputs will give a time-domain output whose data sequence is correct. This situation avoids the need to perform any bit reversals at all. Multiplying two FFT outputs to implement convolution or correlation is an example of this possibility.† As we can see, finding the optimum FFT algorithm and hardware architecture is a fairly complex problem to solve, but the literature provides guidance[4,17,18].
† See Section 13.10 for an example of using the FFT to perform convolution.
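The bit-reversal-free approach to convolution can be sketched as follows. This Python example rests on several assumptions that are not spelled out in the text: the sequences have equal, power-of-two length, the convolution wanted is circular, the forward transform is a radix-2 decimation-in-frequency FFT that leaves its output in bit-reversed order, and the inverse is a radix-2 decimation-in-time FFT that accepts bit-reversed input. All function names are hypothetical. Because both spectra are scrambled in the same way, they can be multiplied element by element as they stand.

```python
import cmath

def dif_fft_scrambled_out(x):
    """Radix-2 DIF FFT: natural-order input, bit-reversed-order output."""
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                top, bot = a[start + j], a[start + j + span]
                a[start + j] = top + bot              # sum output
                a[start + j + span] = (top - bot) * w  # twiddle after the subtract
        span //= 2
    return a

def dit_ifft_scrambled_in(spectrum):
    """Radix-2 DIT inverse FFT: bit-reversed-order input, natural-order output."""
    a = list(spectrum)
    n = len(a)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(2j * cmath.pi * j / (2 * span))  # conjugate twiddle
                top, bot = a[start + j], w * a[start + j + span]
                a[start + j] = top + bot
                a[start + j + span] = top - bot
        span *= 2
    return [v / n for v in a]

def circular_convolve(x, h):
    """Circular convolution with no bit-reversal pass anywhere."""
    xf = dif_fft_scrambled_out(x)
    hf = dif_fft_scrambled_out(h)
    product = [p * q for p, q in zip(xf, hf)]  # same scrambled order in both
    return dit_ifft_scrambled_in(product)

# Convolving with a unit impulse returns the original sequence.
y = circular_convolve([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 0.0, 0.0])
assert all(abs(a - b) < 1e-12 for a, b in zip(y, [1.0, 2.0, 3.0, 4.0]))
```

The same trick does not apply when the spectrum itself must be inspected in frequency order, for example when locating a spectral peak; in that case the unscrambling step is still needed.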
References
[1] Cooley, J., and Tukey, J. “An Algorithm for the Machine Calculation of Complex Fourier Series,” Math. Comput., Vol. 19, No. 90, April 1965, pp. 297–301.
[2] Cooley, J., Lewis, P., and Welch, P. “Historical Notes on the Fast Fourier Transform,” IEEE Trans. on Audio and Electroacoustics, Vol. AU-15, No. 2, June 1967.
[3] Harris, F. J. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform,” Proceedings of the IEEE, Vol. 66, No. 1, January 1978, p. 54.
[4] Oppenheim, A. V., and Schafer, R. W. Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, New Jersey, 1989, p. 608.
[5] Rabiner, L. R., and Gold, B. Theory and Application of Digital Signal Processing, Prentice Hall, Englewood Cliffs, New Jersey, 1975, p. 367.
[6] Sorensen, H. V., Jones, D. L., Heideman, M. T., and Burrus, C. S. “Real-Valued Fast Fourier Transform Algorithms,” IEEE Trans. on Acoust., Speech, and Signal Proc., Vol. ASSP-35, No. 6, June 1987.
[7] Evans, D. “An Improved Digit-Reversal Permutation Algorithm for the Fast Fourier and Hartley Transforms,” IEEE Trans. on Acoust., Speech, and Signal Proc., Vol. ASSP-35, No. 8, August 1987.
[8] Burrus, C. S. “Unscrambling for Fast DFT Algorithms,” IEEE Trans. on Acoust., Speech, and Signal Proc., Vol. 36, No. 7, July 1988.
[9] Rodriguez, J. J. “An Improved FFT Digit-Reversal Algorithm,” IEEE Trans. on Acoust., Speech, and Signal Proc., Vol. ASSP-37, No. 8, August 1989.
[10] Land, A. “Bit Reverser Scrambles Data for FFT,” EDN, March 2, 1995.
[11] JG-AE Subcommittee on Measurement Concepts, “What Is the Fast Fourier Transform?,” IEEE Trans. on
[12] Cohen, R., and Perlman, R. “500 kHz Single Board FFT System Incorporates DSP Optimized Chips,” EDN, October 31, 1984.
[13] Eldon, J., and Winter, G. E. “Floating-Point Chips Carve Out FFT Systems,” Electronic Design, August 4, 1983.
[14] Lamb, K. “CMOS Building Blocks Shrink and Speed Up FFT Systems,” Electronic Design, August 6, 1987.
[15] Markel, J. D. “FFT Pruning,” IEEE Trans. on Audio and Electroacoustics, Vol. AU-19, No. 4, December 1971.
[16] Wu, H. R., and Paoloni, F. J. “The Structure of Vector Radix Fast Fourier Transforms,” IEEE Trans. on Acoust., Speech, and Signal Proc., Vol. ASSP-37, No. 8, August 1989.
[17] Ali, Z. M. “High Speed FFT Processor,” IEEE Trans. on Communications, Vol. COM-26, No. 5, May 1978.
[18] Bergland, G. “Fast Fourier Transform Hardware Implementations—An Overview,” IEEE Trans. on Audio and Electroacoustics, Vol. AU-17, June 1969.