The advantage of this method over the exact calculation method is that the rate function $\lambda_f$ may be computed in advance for each bin $f$, before the partial frequencies are estimated, which reduces the computation required when evaluating the likelihood for multiple frames of music. Often the bins will coincide with the frequencies of the DFT used to estimate the partial frequencies.
6.2.3.3 Censored Frequencies
The partial estimation method used may only indicate whether or not there is a partial in a frequency bin. An example is a single-step peak picking scheme, which selects all the spectrum bins with amplitudes larger than those of neighbouring bins and above a noise threshold. It is possible that multiple frequencies are present within the region of the frequency axis covered by a single observation bin, for example in the case of overlapping harmonics. Although we have only "observed" at most one frequency per observation bin, we wish to allow for the possibility that more than one frequency could be present in each bin. This is useful in practice because the rate function of the Poisson process is a superposition of the rate functions of harmonically related notes.
Whenever harmonics overlap within the region of a single bin, we would expect two or more partial frequencies to occur within that bin. We therefore assert that an observed peak in the spectrum implies the existence of one or more partial frequencies in that bin, and that no observed peak implies that no partial frequencies were present in the bin.
For the observations to be valid as a Poisson process, when a peak is detected in a bin we calculate the probability that one or more frequencies were present in that bin, i.e., $p(N_f \ge 1) = 1 - p(N_f = 0) = 1 - \exp(-\lambda_f)$. When a peak is not detected in a bin, the probability is given by $p(N_f = 0) = \exp(-\lambda_f)$.
The likelihood over all the frequency bins is thus given by
$$
\prod_{f=1}^{F}
\begin{cases}
1 - \exp(-\lambda_f) & \text{peak observed in bin } f \\
\exp(-\lambda_f) & \text{no peak observed in bin } f
\end{cases}
\tag{6.5}
$$
The likelihood calculation in this case is the same as for a set of independent Bernoulli trials, one per bin, with success probability $1 - \exp(-\lambda_f)$ in bin $f$.
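As an illustration of (6.5), the following is a minimal sketch of the censored log-likelihood. The function name `censored_log_likelihood` and the example rates are hypothetical, chosen only for illustration; the rates are assumed to have been computed per bin beforehand.

```python
import math

def censored_log_likelihood(rates, peak_observed):
    """Log of (6.5): one Bernoulli term per frequency bin.

    rates         -- per-bin Poisson rates lambda_f (assumed non-negative)
    peak_observed -- per-bin booleans, True if a peak was detected in bin f
    """
    if len(rates) != len(peak_observed):
        raise ValueError("rates and peak_observed must have the same length")
    total = 0.0
    for lam, peak in zip(rates, peak_observed):
        if peak:
            if lam <= 0.0:
                # A peak in a bin with zero rate has probability zero.
                return -math.inf
            # log p(N_f >= 1) = log(1 - exp(-lam)); expm1 stays accurate for small lam.
            total += math.log(-math.expm1(-lam))
        else:
            # log p(N_f = 0) = -lam
            total -= lam
    return total

# Illustrative values only: three bins, a peak detected in the middle one.
example = censored_log_likelihood([0.05, 2.0, 0.1], [False, True, False])
```

Working in the log domain avoids underflow when the product in (6.5) runs over many bins.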
would be the fundamental frequency of the musical note.
Here however we pursue two designs of Bayesian prior which may be inferred tractably and are amenable to additional, higher level, prior structure.
6.3.1 Fixed Bins
When observations are grouped into fixed bins, the model parameters are a finite set of positive values $\lambda_f$. Each $\lambda_f$ is the intensity parameter of a Poisson distribution, for which the conjugate prior choice is the Gamma distribution:
$$
p(\lambda_f) = \mathcal{G}(\alpha_f, \beta_f) \tag{6.6}
$$
The posterior distribution when observing $N_f$ is
$$
p(\lambda_f \mid N_f) = \mathcal{G}(\alpha_f + N_f, \beta_f + 1)
$$
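We can integrate out the unknown $\lambda_f$ to obtain a negative binomial (Pascal) distribution (Figure 6.2 on page 87)
$$
p(N_f) = \frac{\Gamma(\alpha_f + N_f)}{N_f!\,\Gamma(\alpha_f)}\, p_f^{\alpha_f} (1 - p_f)^{N_f} \tag{6.7}
$$
$$
p_f = \frac{\beta_f}{\beta_f + 1}
$$
where $\beta_f$ is the rate parameter of the Gamma prior, consistent with the posterior update $\beta_f + 1$ above.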
Figure 6.2 on page 87 shows the prior distribution on the expected number of partials $\lambda_f$ (6.6) with $\alpha_f = 2$, $\beta_f = 1$, and the corresponding marginal distribution (6.7) on the observed number of partials $N_f$.
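The conjugate update and the marginal distribution can be checked numerically. The sketch below assumes that the Gamma prior is parameterized by shape $\alpha_f$ and rate $\beta_f$ (the convention under which the posterior rate is $\beta_f + 1$), in which case the success probability of the negative binomial marginal is $\beta_f / (\beta_f + 1)$. The function names are hypothetical.

```python
import math

def gamma_posterior(alpha, beta, count):
    """Posterior Gamma(shape, rate) after observing one Poisson count."""
    return alpha + count, beta + 1.0

def marginal_count_pmf(n, alpha, beta):
    """Negative binomial marginal p(N_f = n) with lambda_f integrated out.

    Assumes beta is a rate parameter, so that p = beta / (beta + 1).
    """
    p = beta / (beta + 1.0)
    log_pmf = (math.lgamma(alpha + n) - math.lgamma(n + 1) - math.lgamma(alpha)
               + alpha * math.log(p) + n * math.log1p(-p))
    return math.exp(log_pmf)

# Parameters used for Figure 6.2: alpha_f = 2, beta_f = 1.
pmf = [marginal_count_pmf(n, 2.0, 1.0) for n in range(11)]
```

A useful sanity check is that the mean of the marginal equals the prior mean of the rate, $\alpha_f / \beta_f$.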
The hyperparameters may be optimized for the purposes of training. For example, to train the hyperparameters of the rate function for a particular musical instrument and pitch, we would use $I$ example frames of data and estimate the partial frequencies in each frame, obtaining a set of observations $N_f^{(1)}, \ldots, N_f^{(I)}$ for each frequency bin. The posterior of the rate function given these observations is
$$
p\left(\lambda_f \mid N_f^{(1)}, \ldots, N_f^{(I)}\right) = \mathcal{G}\left(\alpha_f + \sum_{i=1}^{I} N_f^{(i)},\; \beta_f + I\right) \tag{6.8}
$$
and the hyperparameters can thus be set to new values: $\alpha_f \to \alpha_f + \sum_{i=1}^{I} N_f^{(i)}$ and $\beta_f \to \beta_f + I$. The new values of the hyperparameters can now be used as the prior (6.6) when new frames of data are observed, thus transparently incorporating training data into the Bayesian model.
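The training update for a single frequency bin amounts to two additions. The following minimal sketch uses a hypothetical function name and made-up counts, and shows that updating frame by frame gives the same result as the batch update.

```python
def train_hyperparameters(alpha, beta, counts):
    """Update Gamma hyperparameters for one bin from per-frame partial counts.

    counts -- observed counts N_f^(1), ..., N_f^(I) for this bin
    Returns (alpha + sum of counts, beta + number of frames).
    """
    return alpha + sum(counts), beta + len(counts)

# Illustrative counts for I = 4 training frames (not real data).
counts = [1, 0, 2, 1]
batch = train_hyperparameters(2.0, 1.0, counts)

# Sequential updating, one frame at a time, reaches the same hyperparameters.
alpha, beta = 2.0, 1.0
for n in counts:
    alpha, beta = train_hyperparameters(alpha, beta, [n])
sequential = (alpha, beta)
```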
6.3.2 Gaussian Mixture Model
Figure 6.2: Prior on expected number of partials and marginal distribution of number of partials (curves: $p(\lambda_f \mid \alpha_f, \beta_f)$ and $p(N_f \mid \alpha_f, \beta_f)$)

In this section we model the entire rate function as a Gaussian mixture model (GMM). Modelling the rate function as a GMM is a convenient way to use prior information concerning the partial frequencies of harmonic instruments. The rate function is shaped by the probability density function of a Gaussian mixture model:
$$
\lambda(f) = \sum_{h=1}^{H} c_h\, \mathcal{N}\left(\mu_h, h\sigma^2\right) \tag{6.9}
$$
$$
c_h \ge 0 \quad \forall h \tag{6.10}
$$
$H$ denotes the number of mixture components. A meaningful interpretation of the above model is that $H$ is the number of harmonic positions for a note with fundamental frequency $f_0$, and that a single component of the mixture corresponds to a single harmonic $h$. We assign
$$
H = \left\lfloor \frac{f_s}{2 f_0} \right\rfloor
$$
where $f_s$ is the sampling frequency. The means of the components are set to the expected harmonic positions, $\mu_h \approx h f_0$, and are allowed to deviate from their ideal positions to account for inharmonicity. The variance $\sigma^2$ allows for further spread around the harmonics, which may occur with split peaks or modulations in the signal. Finally, $c_h$ weights each harmonic; we expect low-frequency partials to have a higher probability of being detected and hence to have larger weights $c_h$.
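A minimal sketch of evaluating the rate function (6.9) follows. It makes several assumptions that are not stated in the text: the variance of component $h$ is read as $h\sigma^2$, frequencies are in Hz, and the sampling frequency of 8000 Hz and the variance of 25 are illustrative values only. All function names are hypothetical.

```python
import math

def num_harmonics(fs, f0):
    """H = floor(fs / (2 * f0)): harmonics that fit below the Nyquist frequency."""
    return int(math.floor(fs / (2.0 * f0)))

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_rate(f, f0, fs, sigma2, weights=None):
    """Rate function lambda(f) as a weighted sum of Gaussians at h * f0.

    Assumes component h has mean h * f0 and variance h * sigma2.
    If no weights are given, c_h = 1 / h is used as an illustrative choice.
    """
    H = num_harmonics(fs, f0)
    if weights is None:
        weights = [1.0 / h for h in range(1, H + 1)]
    return sum(c * gaussian_pdf(f, h * f0, h * sigma2)
               for h, c in zip(range(1, H + 1), weights))

H = num_harmonics(8000.0, 440.0)           # number of components for a 440 Hz note
rate_at_f0 = gmm_rate(440.0, 440.0, 8000.0, 25.0)
rate_between = gmm_rate(660.0, 440.0, 8000.0, 25.0)
```

Because each Gaussian integrates to one, the integral of $\lambda(f)$ over the frequency axis equals the sum of the weights, which is the expected number of partials for the note.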
Inference of the unknown parameters in a Gaussian mixture model involves introducing labels for each observation and using Expectation Maximization (EM). When we train our model by fitting the parameters to the estimated partial frequencies of a set of frames of audio from a harmonic musical instrument with a particular pitch, the values for $\sigma^2$ are typically small, so that there is negligible overlap between the mixture components for different harmonic positions. Figure 6.3 on page 88 is provided as an example of this, where we have chosen $\sigma^2 = 10^{-4}$, $\mu_h = h f_0$ and $c_h = 1/h$ as illustrative parameters. This assumption is also used in [Davy et al., 2006] for the model of the detuning parameters for each partial frequency. In practice, the Expectation step for training the GMM can be replaced with a K-means clustering step, where the component means are set to the expected harmonic positions, which reduces the amount of computation required for inferring the unknown parameters compared to the full Expectation step.

Figure 6.3: Intensity function for a musical note with fundamental frequency 440 Hz parameterized using a Gaussian mixture model with $H = 9$ harmonics (axes: $\lambda(f)$ against $f$)
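One possible reading of the K-means style simplification is sketched below: each estimated partial frequency is hard-assigned to the nearest expected harmonic position $h f_0$, after which the weights, means and variance are re-estimated from the assigned observations. The estimators used here (average count per frame for $c_h$, and a pooled variance under the assumption that component $h$ has variance $h\sigma^2$) are assumptions made for illustration, and the function name and data are hypothetical.

```python
def fit_harmonic_gmm(frames, f0, H):
    """Hard-assignment fit of the GMM rate function.

    frames -- list of frames, each a list of estimated partial frequencies
    f0     -- fundamental frequency; cluster centres are fixed at h * f0
    H      -- number of harmonic components
    Returns (weights, means, sigma2).
    """
    assigned = {h: [] for h in range(1, H + 1)}
    for frame in frames:
        for f in frame:
            # Nearest harmonic position, clamped to the valid range 1..H.
            h = min(max(int(round(f / f0)), 1), H)
            assigned[h].append(f)

    weights, means = [], []
    squared_dev, n_obs = 0.0, 0
    for h in range(1, H + 1):
        obs = assigned[h]
        # c_h: average number of partials per frame assigned to harmonic h.
        weights.append(len(obs) / len(frames))
        # mu_h: sample mean, falling back to the ideal position if unobserved.
        mu = sum(obs) / len(obs) if obs else h * f0
        means.append(mu)
        # Pooled estimate of sigma2, assuming component h has variance h * sigma2.
        squared_dev += sum((f - mu) ** 2 for f in obs) / h
        n_obs += len(obs)
    sigma2 = squared_dev / n_obs if n_obs else 0.0
    return weights, means, sigma2

# Made-up partial estimates for two frames of a 440 Hz note.
frames = [[441.0, 879.0], [439.0, 881.0, 1320.0]]
weights, means, sigma2 = fit_harmonic_gmm(frames, 440.0, 3)
```

The hard assignment is only reasonable when the components barely overlap, which is the small-variance situation described above.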
6.3.3 Model for mixture weights
The Gaussian mixture prior in the previous section allows for inharmonicity through the variance of each mixture component. Rather than inferring the unknown parameters of the mixture model, we can also set the parameters to particular values which match existing generative models of partial frequencies. The prior model of Godsill and Davy [2005], which is also used in Chapter 5, can easily be adapted as a Poisson process by interpreting the prior probability distribution over the number of partials and their frequencies as a counting process of the number of partials along the frequency axis. The number of partials $H$ per note is modelled as Poisson distributed
$$
p(H) = \mathrm{Po}(H \mid \Lambda) = \frac{\Lambda^{H} e^{-\Lambda}}{H!}
$$
where $\Lambda$ is the expected number of partials. The position of each partial frequency is normally distributed around $h f_0$, where $h$ is the harmonic number and $f_0$ is the fundamental frequency of the note.
To convert this to a Gaussian mixture model of the form (6.9), we note that each mixture weight $c_h$ gives the expected number of partials around $h f_0$. We interpret this as the probability under the generative model that the number of partials is greater than or equal to $h$. Following this model, the mixture weights in (6.9) are given by
$$
c_h = \left(1 - \sum_{m=1}^{h} \mathrm{Po}(m - 1 \mid \Lambda)\right) p_{\text{note}} \tag{6.11}
$$
$p_{\text{note}}$ is the prior probability that the note is playing in the mixture, and is applied as a scaling to all of the mixture weights for that model. $\sum_{m=1}^{h} \mathrm{Po}(m - 1 \mid \Lambda)$ is the cumulative Poisson distribution of observing up to $h - 1$ partials. $c_h$, when calculated by (6.11), gives the probability of observing a partial at that frequency under the prior model.
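The weights in (6.11) can be computed with a running Poisson cumulative sum. The sketch below uses hypothetical function names and illustrative values ($\Lambda = 5$, $p_{\text{note}} = 1$); it assumes $\Lambda > 0$.

```python
import math

def poisson_pmf(k, rate):
    """Po(k | rate), computed in the log domain; assumes rate > 0."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def mixture_weights(expected_partials, p_note, H):
    """Mixture weights c_1, ..., c_H of (6.11).

    c_h = (1 - sum_{m=1}^{h} Po(m - 1 | Lambda)) * p_note,
    i.e. the prior probability of at least h partials, scaled by p_note.
    """
    weights = []
    cdf = 0.0
    for h in range(1, H + 1):
        cdf += poisson_pmf(h - 1, expected_partials)
        # Clamp at zero to guard against rounding when the cdf reaches 1.
        weights.append(max(0.0, 1.0 - cdf) * p_note)
    return weights

# Illustrative values only.
c = mixture_weights(5.0, 1.0, 60)
```

Since the tail probabilities of a non-negative integer variable sum to its mean, the weights sum to approximately $\Lambda\, p_{\text{note}}$ when $H$ is large, which agrees with reading the integral of the rate function as the expected number of partials.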
When we set $\mu_h = h f_0$ and $\sigma^2$ to $3 \times 10^{-8}$, we obtain the multiplicative inharmonicity model suggested in Godsill and Davy [2005].