Fundamental to microsound synthesis is the recognition of the continuum between rhythm (the infrasonic frequencies) and pitch (the audible frequencies). This idea was central to what the poet and composer Ezra Pound called the theory of the "Great Base" (Pound 1934). In 1910 he wrote:
Rhythm is perhaps the most primal of all things known to us . . . Music is, by further analysis, pure rhythm; rhythm and nothing else, for the variation of pitch is the variation in rhythms of the individual notes, and harmony, the blending of these varied rhythms. (Pound 1910, in Schafer 1977)
Pound proposed the Great Base theory in 1927:
You can use your beat as a third or fourth or Nth note in the harmony. To put it another way; the percussion of the rhythm can enter the harmony exactly as another note would. It enters usually as a Bassus . . . giving the main form to the sound. It may be convenient to call these different degrees of the scale the megaphonic and microphonic parts of the harmony. Rhythm is nothing but the division of frequency plus an emphasis or phrasing of that division. (Pound 1927, in Schafer 1977)
In this theory, Pound recognized the rhythmic potential of infrasonic frequencies. The composer Henry Cowell also described this relationship:
Rhythm and tone, which have been thought to be entirely separate musical fundamentals . . . are definitely related through overtone ratios. (Cowell 1930)
Later in his book he gives an example:
Assume that we have two melodies in parallel to each other, the first written in whole notes and the second in half-notes. If the time for each note were to be indicated by the tapping of a stick, the taps for the second melody would recur with double the rapidity of those of the first. If now the taps were to be increased greatly in rapidity without changing the relative speed, it will be seen that when the taps for the first melody reach sixteen to the second, those for the second melody will be thirty-two to the second. In other words, the vibrations from the taps of one melody will give the musical tone C, while those of the other will give the tone C one octave higher. Time has been translated, as it were, into musical tone. Or, as has been shown above, a parallel can be drawn between the ratio of rhythmical beats and the ratio of musical tones by virtue of the common mathematical basis of both musical time and musical tone. The two times, in this view, might be said to be "in harmony," the simplest possible. . . . There is, of course, nothing radical in what is thus far suggested. It is only the interpretation that is new; but when we extend this principle more widely we begin to open up new fields of rhythmical expression in music. (Cowell 1930)
Cowell formulated this insight two decades before Karlheinz Stockhausen's temporal theory, explained later in this chapter.
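Cowell's arithmetic can be checked with a short sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz, a tuning convention that Cowell's text does not specify, and the function names are hypothetical. Under that assumption, taps at 16 and 32 per second fall nearest to the notes C0 and C1, exactly one octave apart.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(rate_hz, a4_hz=440.0):
    """Name the equal-tempered note nearest to a tap rate (assumes A4 = 440 Hz)."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12).
    midi = round(69 + 12 * math.log2(rate_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

def interval_in_octaves(rate_a_hz, rate_b_hz):
    """Size of the interval between two tap rates, measured in octaves."""
    return math.log2(rate_b_hz / rate_a_hz)

print(nearest_note(16.0), nearest_note(32.0), interval_in_octaves(16.0, 32.0))
```

The 2:1 ratio between the two tap rates is the same whether the taps are heard as rhythm or as tone; only the absolute rate decides which percept results.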
Temporal Continuity in Perception
Inherent in the concept of microsound is the notion that sounds on the object time scale can be broken down into a succession of events on a smaller time scale. This means that the apparently continuous flow of music can be considered as a succession of frames passing by at a rate too fast to be heard as discrete events. This ideal concept of time division is ancient (consider Zeno of Elea's four paradoxes). It could not be fully exploited by technology until the modern age.
In the visual domain, the illusion of cinema (motion pictures) is made possible by a perceptual phenomenon known as persistence of vision. This enables a rapid succession of discrete images to fuse into the illusion of a continuum. Persistence of vision was first explained scientifically by P. M. Roget in 1824 (Read and Welch 1977). W. Fritton demonstrated it with images on the two sides of a card: one of a bird, the other of a cage. When the card was spun rapidly, it appeared that the bird was in the cage (de Reydellet 1999).
The auditory analogy to persistence of vision is the phenomenon of tone fusion induced by the forward masking effect, described in chapter 1.
Throughout the nineteenth century, slow progress was made toward the development of more sophisticated devices for the display of moving images. (See Read and Welch 1977 for details.) A breakthrough, however, did come in 1834 with W. G. Horner's Zoetrope (originally called the Daedaleum). The Zoetrope took advantage of persistence of vision by rotating a series of images around a fixed window fitted with a viewing lens. Depending on the speed of rotation, the image appeared to move in fast or slow motion.
After the invention of celluloid film for photography, the ubiquitous Thomas Alva Edison created the first commercial system for motion pictures in 1891. This consisted of the Kinetograph camera and the Kinetoscope viewing system. Cinema came into being with the projection of motion pictures onto a large screen, introduced by the Lumière brothers in 1895.
In 1889 George Eastman demonstrated a system which synchronized moving pictures with a phonograph, but the "talking picture" with optical soundtrack did not appear until 1927. An optical soundtrack, however, is not divided into frames. It appears as a continuous band running horizontally alongside the succession of vertical image frames.
In music, automated mechanical instruments had long quantized time into steps lasting as little as a brief note. But it was impossible for these machines to operate with precision on the time scale of microsound. Electronics technology was needed for this, and the modern era of microsound did not dawn until the acoustic theory and experiments of Dennis Gabor in the 1940s.
The Gabor Matrix
Inherent in the concept of a continuum between rhythm and pitch is the notion that tones can be considered as a succession of discrete units of acoustic energy. This leads to the notion of a granular or quantum approach to sound, first proposed by the British physicist Dennis Gabor in a trio of brilliant papers. These papers combined theoretical insights from quantum physics with practical experiments (1946, 1947, 1952). In Gabor's conception, any sound can be decomposed into a family of functions obtained by time and frequency shifts of a single Gaussian particle. Another way of saying this is that any sound can be decomposed into an appropriate combination of thousands of elementary grains. It is important to emphasize the analytical orientation of Gabor's theory. He was interested in a general, invertible method for the analysis of waveforms. As he wrote in 1952:
The orthodox method [of analysis] starts with the assumption that the signal s is a function s(t) of time t. This is a very misleading start. If we take it literally, it means that we have a rule of constructing an exact value of s(t) to any instant of time t. Actually we are never in a position to do this. . . . If there is a bandwidth W at our disposal, we cannot mark time any more exactly than by a time-width of the order 1/W; hence we cannot talk physically of time elements smaller than 1/W. (Gabor 1952, p. 6)
Gabor took exception to the notion that hearing was well represented by Fourier analysis of infinite signals, a notion derived from Helmholtz (1885). As he wrote:
Fourier analysis is a timeless description in terms of exactly periodic waves of infinite duration. On the other hand it is our most elementary experience that sound has a time pattern as well as a frequency pattern. . . . A mathematical description is wanted which ab ovo takes account of this duality. (Gabor 1947, p. 591)
Gabor's solution involved the combination of two previously separated dimensions: frequency and time, and their correlation in two new representations: the mathematical domain of acoustic quanta, and the psychoacoustical domain of hearing. He formed a mathematical representation for acoustic quanta by relating a time-domain signal $s(t)$ to a frequency-domain spectrum $S(f)$. He then mapped an energy function from $s(t)$ over an "effective duration" $\Delta t$ into an energy function from $S(f)$ over an "effective spectral width" $\Delta f$ to obtain a characteristic cell or acoustic quantum. Today one refers to analyses that are limited to a short time frame as windowed analysis (see chapter 6). One way to view the Gabor transform is to see it as a kind of collection of localized Fourier transforms. As such, it is highly useful for the analysis of time-varying signals, such as music.
Gabor recognized that any windowed analysis entails an uncertainty relation between time and frequency resolution. That is, a high resolution in frequency requires the analysis of a large number of samples. This implies a long time window. It is possible to pinpoint specific frequencies in an analyzed segment of samples, but only at the cost of losing track of when exactly they occurred. Conversely, it is possible to pinpoint the temporal structure of audio events with great precision, but only at the cost of giving up frequency precision. This relation is expressed in Gabor's formula:
$$\Delta t \, \Delta f \geq 1$$
For example, if the uncertainty product is 1 and $\Delta t$ is 10 ms (1/100 of a second), then $\Delta f$ can be no less than 100 Hz. Another way of stating this is: to resolve frequencies to within a bandwidth of 100 Hz, we need a time window of at least 10 ms.
Time and frequency resolution are bound together. The more precisely we fix one magnitude, the more inexact is the determination of the other.
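The trade-off can be made concrete with a small sketch. It takes the uncertainty product at its limiting value of 1, as in the example above, so the smallest resolvable bandwidth is simply the reciprocal of the window duration. The function names are hypothetical.

```python
def min_bandwidth_hz(window_s):
    """Smallest frequency spread (Hz) allowed by a window of window_s seconds,
    assuming the uncertainty product sits at its limit of 1."""
    return 1.0 / window_s

def min_window_s(bandwidth_hz):
    """Shortest window (seconds) needed to resolve bandwidth_hz, same assumption."""
    return 1.0 / bandwidth_hz

# Shorter windows locate events more precisely in time but blur frequency.
for window_ms in (1.0, 10.0, 100.0):
    print(f"{window_ms:6.1f} ms window -> at least {min_bandwidth_hz(window_ms / 1000.0):7.1f} Hz")
```

Halving the window doubles the minimum bandwidth, which is the sense in which the two resolutions are bound together.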
Gabor's quanta are units of elementary acoustical information. They can be represented as elementary signals with oscillations at any audible frequency $f$, modulated by a finite duration envelope (a Gaussian curve). Any audio signal fed into a Gabor analyzer can be represented in terms of such signals by expanding the information area (time versus frequency) into unit cells and associating with each cell an amplitude factor (figure 2.2). His formula for sound quanta was:
$$g(t) = e^{-a^2 (t - t_0)^2} \, e^{2 \pi j f_0 t} \qquad (1)$$
where
$$\Delta t = \pi^{1/2} / a \quad \text{and} \quad \Delta f = a / \pi^{1/2}$$
The first part of equation 1 defines the Gaussian envelope, while the second part defines the complex sinusoidal function (frequency plus initial phase) within each quantum.
The geometry of the acoustic quantum $\Delta t \, \Delta f$ depends on the parameter $a$, where the greater the value of $a$, the greater the time resolution at the expense of the frequency resolution. (For example, if $a = 1.0$, then $\Delta t = 1.77245$ and $\Delta f = 0.56419$. Setting the time scale to milliseconds, this corresponds to a time window of 1.77245 ms, and a frequency window of 564.19 Hz. For $a = 2.0$, $\Delta t$ would be 0.88 ms and $\Delta f$ would be 1128.38 Hz.) The extreme limiting cases of the Gabor series expansion are a time series (where $\Delta t$ is the delta function $\delta$), and the Fourier series (where $\Delta t = \infty$).
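As an illustration, equation 1 and the two width formulas can be evaluated directly. The sketch below uses only the Python standard library; the function names are hypothetical, and the choice of units (time in milliseconds, hence frequency in kilohertz) follows the numerical example above rather than anything fixed by Gabor's formula.

```python
import cmath
import math

def effective_widths(a):
    """Return (delta_t, delta_f) for sharpness parameter a.
    With t in milliseconds, delta_f comes out in kilohertz."""
    delta_t = math.sqrt(math.pi) / a
    delta_f = a / math.sqrt(math.pi)
    return delta_t, delta_f

def gabor_grain(t, a, t0, f0):
    """Evaluate equation 1 at time t: a Gaussian envelope centered on t0,
    multiplied by a complex sinusoid at frequency f0."""
    envelope = math.exp(-(a ** 2) * (t - t0) ** 2)
    carrier = cmath.exp(2j * math.pi * f0 * t)
    return envelope * carrier

for a in (1.0, 2.0):
    delta_t, delta_f = effective_widths(a)
    print(f"a = {a}: delta_t = {delta_t:.5f} ms, delta_f = {delta_f * 1000:.2f} Hz")
```

Whatever the value of $a$, the product of the two widths stays at 1: raising $a$ narrows the cell in time and widens it in frequency by the same factor.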
Gabor proposed that a quantum of sound was a concept of significance to the theory of hearing, since human hearing is not continuous and infinite in resolution. Hearing is governed by quanta of difference thresholds in frequency, time, and amplitude (see also Whitfield 1978). Within a short time window (between 10 and 21 ms), he reasoned, the ear can register only one distinct sensation, that is, only one event at a specific frequency and amplitude.
Gabor gave an iterative approximation method to calculate the matrix. By 1966 Helstrom showed how Gabor's analysis/resynthesis approximation could be recast into an exact identity by turning the elementary signals into orthogonal functions. Bacry, Grossman, and Zak (1975) and Bastiaans (1980, 1985) verified this hypothesis. They developed analytic methods for calculating the matrix and resynthesizing the signal.
A similar time-frequency lattice of functions was also proposed in 1932 in a different context by the mathematician John von Neumann. It subsequently became known as the von Neumann lattice and lived a parallel life among quantum physicists (Feichtinger and Strohmer 1998).
Electro-optical and Electromechanical Sound Granulation
Gabor was also an inventor, and indeed, he won the Nobel Prize for the invention of holography. In the mid-1940s, he constructed a sound granulator based on a sprocketed optical recording system adapted from a 16 mm film projector (Gabor 1946). He used this "Kinematical Frequency Convertor" to make pitch-time changing experiments: changing the pitch of a sound without changing its duration, and vice versa.
Working with Pierre Schaeffer, Jacques Poullin constructed another spinning-head device, dubbed the Phonogène, in the early 1950s (Schaeffer 1977, pp. 417–9, 427–8; Moles 1960). (See also Fairbanks, Everitt, and Jaeger 1954 for a description of a similar invention.) Later, a German company, Springer, made a machine based on similar principles, using the medium of magnetic tape and several spinning playback heads (Morawska-Büngler 1988; Schaeffer 1977, pp. 427–8). This device, called the Zeitregler or Tempophon, processed speech sounds in Herbert Eimert's 1963 electronic music composition Epitaph für Aikichi Kuboyama (recorded on Wergo 60014).
The basic principle of these machines is time-granulation of recorded sounds. In an electromechanical pitch-time changer, a rotating head (the sampling head) spins across a recording (on film or tape) of a sound. The sampling head spins in the same direction that the tape is moving. Because the head only contacts the tape for a short period, the effect is that of sampling the sound on the tape at regular intervals. Each of these sampled segments is a grain of sound.
In Gabor's system, the grains were reassembled into a continuous stream on another recorder. When this second recording played back, the result was a more-or-less continuous signal but with a different time base. For example, shrinking the duration of the original signal was achieved by slowing down the rotation speed of the sampling head. This meant that the resampled recording contained a joined sequence of grains that were formerly separated. For time expansion, the rotating head spun quickly, sampling multiple copies (clones) of the original signal. When these samples were played back as a continuous signal, the effect of the multiple copies was to stretch out the duration of the resampled version. The local frequency content of the original signal, and in particular its pitch, is preserved in the resampled version.
Figure 2.2 The Gabor matrix. The top image indicates the energy levels numerically. The middle image indicates the energy levels graphically. The lower image shows how the cells of the Gabor matrix (bounded by $\Delta \nu$, where $\nu$ is frequency, and $\Delta t$, where $t$ is time) can be mapped into a sonogram.
To effect a change in pitch without changing the duration of a sound, one need only change the playback rate of the original and use the timescale modification just described to adjust its duration. For example, to shift the pitch up an octave, play back the original at double speed and use time-granulation to double the duration of the resampled version. This restores the duration to its original length. Chapter 5 looks at sound granulation using digital technology.
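The principle of time-granulation can be sketched digitally. The following is a deliberately simplified illustration, not a model of Gabor's or Springer's machines: grains are cut from a list of samples and butt-joined with no envelope or crossfade, and the names time_stretch, grain_len and factor are hypothetical. The read position advances through the source more slowly than the output is written (expansion, so material is cloned) or more quickly (compression, so material is skipped).

```python
def time_stretch(samples, grain_len, factor):
    """Change the duration of `samples` by `factor` using fixed-length grains.

    factor > 1 lengthens the sound (grains overlap in the source, so material
    repeats); factor < 1 shortens it (material between grains is skipped).
    Grains are joined end to end without any smoothing.
    """
    if grain_len <= 0 or factor <= 0:
        raise ValueError("grain_len and factor must be positive")
    target_len = int(round(len(samples) * factor))
    hop = grain_len / factor      # how far the read position moves per grain
    out = []
    read_pos = 0.0
    while len(out) < target_len:
        start = int(read_pos)
        grain = samples[start:start + grain_len]
        if not grain:             # ran off the end of the source
            break
        out.extend(grain)
        read_pos += hop
    return out[:target_len]

source = list(range(8))
print(time_stretch(source, 2, 0.5))   # compression: every other grain is skipped
print(time_stretch(source, 4, 2.0))   # expansion: source regions are repeated
```

Because each grain is copied at its original rate, the local waveform (and so the pitch) inside a grain is unchanged; only the order and spacing of grains differ. Combined with playback of the source at double speed, an expansion factor of 2 would give the octave shift described above. Note that the expanded output can fall slightly short of the target length, because the final grains are truncated at the end of the source.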
Meyer-Eppler
The acoustician Werner Meyer-Eppler was one of the founders of the Westdeutscher Rundfunk (WDR) studio for electronic music in Cologne (Morawska-Büngler 1988). He was well aware of the significance of Gabor's research. In an historic lecture entitled Das Klangfarbenproblem in der elektronischen Musik ("The problem of timbre in electronic music") delivered in August 1950 at the Internationale Ferienkurse für Neue Musik in Darmstadt, Meyer-Eppler described the Gabor matrix for analyzing sounds into acoustic quanta (Ungeheuer 1992). He also presented examples of Oskar Fischinger's animated films with their optical images of waveforms as the "scores of the future." In his later lecture Metamorphose der Klangelemente, presented in 1955 at, among other places, the studio of Hermann Scherchen in Gravesano, Switzerland, Meyer-Eppler described the Gabor matrix as a kind of score that could be composed with a "Mosaiktechnik." In his textbook, Meyer-Eppler (1959) described the Gabor matrix in the context of measuring the information content of audio signals. He defined the "maximum structure content" of a signal as a physical measurement
$$K = 2 W T$$
where $W$ is the bandwidth in hertz and $T$ is the signal duration in seconds. Thus for a signal with a full bandwidth of 20 kHz and a duration of 10 seconds, the maximum structure content is $2 \times 20{,}000 \times 10 = 400{,}000$, which is, by the sampling theorem, the number of samples needed to record it. He recognized that aural perception was limited in its time resolution, and estimated that the lower boundary on perception of parameter differences was of the order of 15 ms, about 1/66th of a second.
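The arithmetic of this measure is easy to reproduce. In the sketch below the function name is hypothetical; it simply evaluates Meyer-Eppler's product of twice the bandwidth and the duration.

```python
def max_structure_content(bandwidth_hz, duration_s):
    """Meyer-Eppler's maximum structure content K = 2 * W * T."""
    return 2 * bandwidth_hz * duration_s

# A full-bandwidth (20 kHz) signal lasting 10 seconds.
k = max_structure_content(20_000, 10)
print(k)

# The same count read as a rate: 2 * W samples per second.
print(k / 10)
```

Dividing K by the duration gives 40,000 samples per second, twice the bandwidth, which is the sampling-theorem reading mentioned above.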
The concept of time-segmentation was central to his notion of systematic sound transformation (Meyer-Eppler 1960). For example, he described experiments with speech in which grains from one word could be interpolated into another to change its sense.
Moles
The physicist Abraham Moles (1960, 1968) was interested in applying Shannon's information theory to aesthetic problems, particularly in new music (Galante and Sani 2000). Pierre Schaeffer hired him to work at the Groupe de Recherches Musicales (GRM). Significantly, this coincided with Iannis Xenakis's residencies in the GRM studios (Orcalli 1993). Moles had read Meyer-Eppler's book. He sought a way to segment sound objects into small units for the purpose of measuring their information content. Following the Gabor matrix, he set up a three-dimensional space bounded by quanta in frequency, loudness, and time. He described this segmentation as follows:
We know that the receptor, the ear, divides these two dimensions [pitch and loudness] into quanta. Thus each sonic element may be represented by an elementary square. A pure sinusoidal sound, without any harmonics, would be represented by just one of these squares. . . . Because thresholds quantize the continua of pitch and loudness, the repertoire is limited to some 340,000 elements. Physically, these elements are smaller and denser toward the center of the sonic domain, where the ear is more acute. . . . In most cases each symbol [in a sonic message] is a combination of elements, that is, of a certain number of these squares. (Moles 1968)
Wiener
The MIT mathematician Norbert Wiener (the founder of cybernetics) was well aware of Gabor's theory of acoustic quanta, just as Gabor was well aware of Wiener's work. In 1951, Gabor was invited to present his acoustical quantum theory in a series of lectures at MIT (Gabor 1952).
Like Gabor, Wiener rejected the view (expounded by Leibniz in the eighteenth century) that time, space, and matter are infinitely subdivisible or continuous. He supported Planck's quantum theory principle of discontinuity in light and in matter. Wiener noted that Newton's model of deterministic physics was being replaced by Gibbsian statistical mechanics, a "qualified indeterminism." And like Gabor, he was skeptical of Fourier analysis as the best representation for music.
The frequency and timing of a note interact in a complicated manner. To start and stop a note involves an alteration of its frequency content which may be small but very real. A note lasting only a finite time is to be analyzed as a band of simple harmonic motions, no one of which can be taken as the only simple harmonic motion present. The considerations are not only theoretically important but correspond to a real limitation of what a musician can do. You can't play a jig in the lowest register of an organ. If you take a note oscillating at sixteen cycles per second and continue it only for one twentieth of a second, what you get is a single push of air without any noticeable periodic character. Just as in quantum theory, there is in music a difference of behavior between those things belonging to small intervals of time and what we accept on the normal scale of every day. (Wiener 1964a, 1964b)
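Wiener's organ example can be put into numbers. The sketch below is only a rough illustration: it borrows the reciprocal-duration bound from Gabor's uncertainty relation, taken at its limit of 1, as an estimate of spectral spread, which is an assumption added here and not part of Wiener's text. The function names are hypothetical.

```python
def cycles_completed(frequency_hz, duration_s):
    """Number of periods of a tone that fit inside a note of the given length."""
    return frequency_hz * duration_s

def spectral_spread_hz(duration_s):
    """Rough lower bound on bandwidth for an event of this duration,
    assuming an uncertainty product of 1."""
    return 1.0 / duration_s

frequency = 16.0       # cycles per second, the lowest organ register
duration = 1.0 / 20.0  # one twentieth of a second

print(cycles_completed(frequency, duration))
print(spectral_spread_hz(duration))
```

The note completes less than one full cycle, and under the stated assumption its bandwidth (about 20 Hz) is larger than its nominal frequency of 16 Hz, which is consistent with hearing a single push of air and no definite pitch.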
Going further, Wiener stressed the importance of recognizing the time scale of a