
… from the McGill database. As this set of notes covers only sustained-note instruments and does not include piano or guitar notes, we use only the Gabor model with Hamming basis functions for our comparison.

Additionally, we have prepared an implementation of the auditory-model-based system of Klapuri [2008] as a state-of-the-art polyphonic transcription system to compare against. We present our results in Table 5.3 on page 76. The results show an improvement in transcription accuracy for two-note mixtures when using the analytic representation, and we suggest that this is due to the improved estimation of the higher partial frequency structure demonstrated in the previous section. However, for three- and four-note mixtures, the performance is not appreciably different from the prior work of Davy et al. [2006], although there is an improvement in accuracy over the auditory model system for four-note mixtures. Since the inference algorithm is largely identical to that of Davy et al. [2006], we suggest that it is the inference algorithm that limits performance in these cases. During this work, we took the opportunity to study the reversible jump MCMC algorithm in progress, observing the current state of the fitted model after each iteration. We observed that in many of the situations where transcription errors occurred, the algorithm did reach the correct configuration of notes at some point, but was unable to sustain this state because of inaccuracies in the frequency estimates.

… partial frequencies present were detected and modelled. We saw that using the analytic representation of the signal resulted in significantly more partials being detected, and we conclude that the reduction in the ambiguity of instantaneous phase and amplitude afforded by this representation is beneficial for signal-model-based inference methods. We also present polyphonic transcription results for the case where the number of notes is known, and compare them with prior work. We found that the new models improve transcription performance for two-note mixtures when compared with prior work, but that performance for more complicated mixtures is limited by the ability of the inference algorithm to estimate the partial frequencies accurately enough to avoid spurious partial detections. The benefits of increasing the accuracy of partial frequency estimation are shown in the next chapter, where we simplify the polyphonic inference algorithm to a two-stage process, first estimating the partial frequencies and then inferring the harmonic structure, in order to improve transcription accuracy for larger numbers of notes and also to estimate correctly the number of notes playing in the mixture.

Chapter 6

Multiple Pitch Estimation using Non-homogeneous Poisson Processes

Point estimates of the partial frequencies of a musical note are modelled as realizations of a non-homogeneous Poisson process defined on the frequency axis. When several notes are combined, the processes for the individual notes superpose to give a new Poisson process whose likelihood is easy to compute.

This model avoids the data-association step of linking the harmonics of each note with the corresponding partials, and is ideal for efficient Bayesian inference of multiple unknown fundamental frequencies in a polyphonic mixture of notes.
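The likelihood mentioned above can be sketched for the superposed process. For a Poisson process with intensity λ(f) on the frequency axis, the log-likelihood of observed points f_1, …, f_n is the sum of log λ(f_i) minus the integral of λ over the axis, and superposing independent processes simply adds their intensities. The per-note intensity below (a Gaussian bump of fixed width and fixed expected count at each harmonic up to a cut-off) and the constants `F_MAX`, `MASS` and `WIDTH` are hypothetical choices for illustration only; they are not the priors developed in this chapter.

```python
import math

F_MAX = 3000.0  # hypothetical upper frequency limit (Hz)
MASS = 0.9      # hypothetical expected detections per harmonic
WIDTH = 5.0     # hypothetical spread (Hz) of a partial-frequency estimate


def note_intensity(f, f0):
    """Expected detections per Hz at frequency f for a single note at f0.

    Assumed form: a Gaussian bump carrying MASS expected detections at every
    harmonic h * f0 <= F_MAX (edge effects at 0 Hz and F_MAX are ignored).
    """
    norm = MASS / (WIDTH * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((f - h * f0) / WIDTH) ** 2)
               for h in range(1, int(F_MAX // f0) + 1))


def expected_count(f0):
    """Integral of note_intensity over the frequency axis."""
    return MASS * int(F_MAX // f0)


def log_likelihood(observed, f0s):
    """Poisson-process log-likelihood of observed partial frequencies.

    The superposition of independent Poisson processes is a Poisson process
    whose intensity is the sum of the individual intensities, so no observed
    partial ever has to be assigned to a particular note.
    """
    total = -sum(expected_count(f0) for f0 in f0s)
    for f in observed:
        lam = sum(note_intensity(f, f0) for f0 in f0s)
        if lam == 0.0:
            return float("-inf")  # an observation no note can explain
        total += math.log(lam)
    return total


# Partials of notes at 440 Hz and 660 Hz; 1320 and 2640 Hz are shared.
observed = [440.0, 880.0, 1320.0, 1760.0, 2200.0, 660.0, 1980.0, 2640.0]
print(log_likelihood(observed, [440.0, 660.0]) > log_likelihood(observed, [440.0, 650.0]))  # True
```

The observation at 1320 Hz lies on a harmonic of both notes; its intensity is just the sum of the two notes' contributions, so the likelihood never decides which note produced it.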

6.1 Introduction

By observing the periodogram of a polyphonic mixture of notes, a trained observer can estimate the partial frequencies present in the signal from the localized peaks in the spectrum, and then suggest fundamental frequencies by observing that some of the partial frequencies are regularly spaced along the frequency axis.

For example, peaks in the spectrum at 440, 880, 1320 Hz and so on suggest a fundamental frequency of 440 Hz.
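The regular-spacing argument can be made concrete with a deliberately naive sketch, which is an illustration and not the method of this chapter: candidate fundamentals are formed by dividing each peak by small integers, and each candidate is scored by how many peaks lie near one of its harmonics. The relative tolerance and the divisor range are hypothetical choices.

```python
def harmonic_matches(peaks, f0, tol=0.02):
    """Count peaks within a relative tolerance `tol` of some harmonic h * f0, h >= 1."""
    count = 0
    for p in peaks:
        h = round(p / f0)
        if h >= 1 and abs(p - h * f0) <= tol * p:
            count += 1
    return count


def candidate_f0s(peaks, max_divisor=4):
    """Candidate fundamentals: every peak divided by 1, 2, ..., max_divisor."""
    return sorted({p / d for p in peaks for d in range(1, max_divisor + 1)})


peaks = [440.0, 880.0, 1320.0, 1760.0]
scores = {f0: harmonic_matches(peaks, f0) for f0 in candidate_f0s(peaks)}
best = max(scores.values())
tied = [f0 for f0 in sorted(scores) if scores[f0] == best]
print(tied)
```

All four peaks are explained by 440 Hz, but equally by 220, 146.7 and 110 Hz, because every multiple of 440 Hz is also a multiple of each of these. Counting explained peaks alone therefore cannot separate octave-related candidates; a model must also account for harmonics that a candidate predicts but that are not observed.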

In the author's experience, transcribing mixtures of notes from the periodogram is more reliable and quicker than listening to the mixture. This manual method also outperforms automated transcription systems, such as the signal models described in the previous chapter and state-of-the-art auditory systems, and in particular avoids the octave errors that plague other systems. One of the goals of this chapter is to investigate and propose models of this procedure in order to improve the accuracy of polyphonic transcription.

Two assumptions about the transcription process are made. The first is that the observer does not change his or her estimates of the partial frequencies when attempting to find a set of notes that fits the observations. In plain terms, the observer is trying to fit a model to the observations, incorporating errors in the partial frequency estimates into the prior, rather than fitting the observations to the model. This motivates a two-stage process in which the partial frequencies are estimated first, and a harmonic model is then fitted to those frequencies. A prior model on the partial frequencies is still required, however, as the observer may, for example, know the range of fundamental frequencies that the instrument can produce; this prior must also be defined when the number of notes in the mixture is not known.
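The second stage of this two-stage process can be sketched as follows: the partial-frequency estimates from the first stage are held fixed, and a single fundamental is chosen by grid search to maximize a Poisson-process likelihood. The single-note restriction, the Gaussian-bump intensity, the 2000 Hz cut-off, the parameter values and the grid are all hypothetical choices for illustration, not the priors of this chapter.

```python
import math


def log_likelihood(partials, f0, f_max=2000.0, mass=0.9, width=3.0):
    """Log-likelihood of fixed partial-frequency estimates under one note at f0.

    Hypothetical intensity: a Gaussian bump (std `width` Hz) carrying `mass`
    expected detections at every harmonic h * f0 <= f_max; edge effects at
    0 Hz and f_max are ignored.
    """
    harmonics = [h * f0 for h in range(1, int(f_max // f0) + 1)]
    norm = mass / (width * math.sqrt(2.0 * math.pi))
    total = -mass * len(harmonics)  # minus the integrated intensity
    for f in partials:
        lam = sum(norm * math.exp(-0.5 * ((f - fh) / width) ** 2) for fh in harmonics)
        if lam == 0.0:
            return float("-inf")  # this note cannot explain the observation
        total += math.log(lam)
    return total


def fit_f0(partials, grid):
    """Stage two: keep the stage-one estimates fixed and search over f0."""
    return max(grid, key=lambda f0: log_likelihood(partials, f0))


partials = [440.0, 880.0, 1320.0, 1760.0]      # assumed stage-one output
grid = [float(f) for f in range(100, 1001)]    # candidate fundamentals (Hz)
print(fit_f0(partials, grid))  # 440.0
```

A lower candidate such as 220 Hz predicts more harmonics below the cut-off, so its integrated intensity is larger and it scores worse than 440 Hz even though it explains the same four observations; 880 Hz cannot explain 440 or 1320 Hz at all.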

The second assumption is that the spectral shape in the vicinity of a peak is important for estimating the partial frequencies, whereas only the frequencies, and sometimes the amplitudes, of the partials are required for transcription. The spectral shape sometimes allows us to distinguish between merged harmonics of two or more notes. There are various cases where simply picking peaks of the spectrum above an adaptive noise floor is inadequate, and these cases are often the cause of transcription errors. The notes of chords in music often have overlapping harmonics, which may not manifest as separate peaks but are obvious to the observer because of differences in spectral shape. The spectral shape also helps to distinguish between noise or artifacts in the signal and genuine partial frequencies, reducing spurious detections of partials, which can lead to over- or under-reporting of the number of notes playing. We will use an explicit signal model with a prior on the expected spectral shape of harmonic notes to estimate partial frequencies accurately.
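The naive alternative discussed above, picking local maxima of the spectrum that exceed an adaptive noise floor, can be sketched to show how overlapping harmonics are lost. Everything here is a hypothetical illustration: the magnitude spectrum is synthetic (one Gaussian-shaped lobe per partial on a 1 Hz grid), and the running-median window and threshold factor are arbitrary. The partials at 1320 Hz (third harmonic of 440 Hz) and 1326 Hz (second harmonic of a hypothetical 663 Hz note) are closer together than the lobe width.

```python
import math
import statistics


def synthetic_spectrum(partials, n_bins=2000, lobe_width=8.0, noise_level=0.01):
    """Hypothetical magnitude spectrum on a 1 Hz grid: one Gaussian-shaped lobe
    per partial on top of a constant noise level."""
    return [noise_level + sum(math.exp(-0.5 * ((k - f) / lobe_width) ** 2) for f in partials)
            for k in range(n_bins)]


def pick_peaks(mag, half_window=50, factor=4.0):
    """Return bins that are local maxima and exceed `factor` times a
    running-median estimate of the noise floor."""
    peaks = []
    for k in range(1, len(mag) - 1):
        lo, hi = max(0, k - half_window), min(len(mag), k + half_window + 1)
        noise_floor = statistics.median(mag[lo:hi])
        if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > factor * noise_floor:
            peaks.append(k)
    return peaks


# 1320 Hz = 3 x 440 Hz and 1326 Hz = 2 x 663 Hz: two genuine, overlapping partials.
spectrum = synthetic_spectrum([440.0, 880.0, 1320.0, 1326.0])
print(pick_peaks(spectrum))  # [440, 880, 1323]
```

The two overlapping harmonics are reported as a single peak at 1323 Hz, between the true frequencies. The merged peak is noticeably taller and broader than an isolated lobe, which is exactly the spectral-shape information that a peak picker discards.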

We do not assume that the partial estimation procedure is perfect, however, and therefore need a transcription system that is capable of dealing with both missed and duplicated partial detections. The solution we present in this chapter is to use an iterative algorithm based on the signal model presented in the previous chapter to provide high-quality estimates of the partial frequencies, and to model the prior on the frequency estimates as a non-homogeneous Poisson process. Choosing to use a signal model rather than a heuristic estimation scheme for the partial frequency estimation is advantageous, as present and future improvements to that model will also benefit the estimation procedure here. However, it is also permissible to use other methods to estimate the partial frequencies, as was carried out previously using periodogram peak picking [Peeling et al., 2007b] and subspace methods [Peeling et al., 2007a]. In these cases, the prior on the frequencies needs to reflect the estimation procedure, for example by including a uniform clutter process across the frequency axis if many spurious partials are detected.
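The uniform clutter process mentioned above can be added to the intensity as a constant term. In the sketch below, which uses a hypothetical Gaussian-bump intensity at each harmonic and an arbitrary clutter rate (the expected number of spurious detections over the analysed band), a single spurious partial drives the clutter-free likelihood to minus infinity, while with clutter the likelihood stays finite and still prefers the correct fundamental over its lower octave.

```python
import math


def log_likelihood(partials, f0s, clutter_rate=0.0, f_max=2000.0, mass=0.9, width=3.0):
    """Poisson-process log-likelihood with harmonic notes plus uniform clutter.

    Hypothetical model: each note contributes a Gaussian bump (std `width` Hz,
    `mass` expected detections) at every harmonic below f_max; the clutter
    process adds a constant intensity of clutter_rate / f_max per Hz, i.e.
    `clutter_rate` spurious detections expected over [0, f_max].
    """
    norm = mass / (width * math.sqrt(2.0 * math.pi))
    harmonics = [h * f0 for f0 in f0s for h in range(1, int(f_max // f0) + 1)]
    background = clutter_rate / f_max
    total = -(mass * len(harmonics) + clutter_rate)  # minus the integrated intensity
    for f in partials:
        lam = background + sum(norm * math.exp(-0.5 * ((f - fh) / width) ** 2)
                               for fh in harmonics)
        if lam == 0.0:
            return float("-inf")  # neither a note nor clutter can explain f
        total += math.log(lam)
    return total


# Harmonics of 440 Hz plus one spurious detection at 1003 Hz.
partials = [440.0, 880.0, 1320.0, 1760.0, 1003.0]
print(log_likelihood(partials, [440.0]))                    # -inf
print(log_likelihood(partials, [440.0], clutter_rate=1.0))  # finite, roughly -20.7
```

The clutter rate is the knob the text refers to: it should reflect how many spurious partials the chosen estimation procedure tends to produce.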

The structure of this chapter is as follows. In Section 6.2 we introduce the properties of non-homogeneous Poisson processes and show how to calculate the likelihood of a set of observed frequencies. In Section 6.3 priors for harmonic models are discussed, and suggestions are given for how these priors should be modified for different partial estimation methods. In Section 6.4 a general method for obtaining partial frequency estimates from a signal model is presented. Transcription results for polyphonic mixtures of notes are presented in Section 6.5 and compared with the previous chapter and prior work. Conclusions and suggestions for future research are given in Section 6.6.