In this section, the methodology for evaluating a range of objective measures is presented. The principle is that different objective measures can be compared through resynthesis. By using each objective measure as the fitness function in an iterative synthesis process, the measure that best encapsulates aspects of the perception of the sounds can be identified. The aim of the optimisation is to determine which objective similarity measure correlates best with perceptual similarity, as evaluated through a listening test. In essence, each objective similarity measure is used to produce audio samples, and subjective listening tests on those samples are then used to validate and compare the measures.
6.3.1 Parameter Optimisation
Given a specific synthesis method and a recording of a sound, the parameters of that synthesis model were selected to best reflect the original audio sample. This parameter selection was performed using a particle swarm optimisation (PSO) technique. PSO is an evolutionarily inspired, population-based optimisation technique in which a swarm of particles iteratively propagates through a search space, with each particle's movement modelled as a weighting between individual and global preferences. Each particle is evaluated with a ‘fitness’ function, which, in this case, is a computational objective similarity measure. Each objective function presented in Section 6.3.3 was used in turn as this fitness function, over a range of audio samples, so that the objective functions could be compared.
Given a single input audio file and a specific synthesis method, an iterative approach was undertaken. The aim of this iterative approach was to find the set of synthesis parameters that produces the closest sound the synthesis method can make to the original input audio file.
This is demonstrated in Figure 6.1. An initial input audio file is analysed, and this analysis is used to measure the similarity between a synthesised example and the input audio file. The synthesis engine parameters were initialised at random values, so the first sample is likely to be very different from the input. The PSO algorithm then suggests modifications to the synthesis parameters, which are passed to the synthesis engine. The new audio sample is then compared with the input, using the objective similarity measure as the fitness function, to determine how similar the sounds are. Once the PSO algorithm has converged on what it takes to be a global minimum, it stops and reports the best synthesis parameters and the corresponding output audio example.
Chapter 6 Objective Evaluation Metric for Synthesised Environmental Sounds 87
Figure 6.1: Flow Diagram of Synthesis Optimisation Approach
PSO is a method of optimisation inspired by evolutionary biology and the social behaviour of a flock of birds or swarm of insects, and is an effective optimisation method for highly nonlinear search spaces. There are many examples of evolutionary algorithms applied to audio research [Garcia, 2001b; Mäkinen et al., 2012; McDermott et al., 2008; Ronan et al., 2018; Yee-King and Roth, 2011]. PSO works on the basis of a large number of particles, each of which is a potential solution that samples the search space. Each particle evaluates its own effectiveness, and both the particle's own (local) best result and the swarm's global best result are considered when deciding in which direction the particle should move. A comprehensive overview of PSO is presented by Marini and Walczak [2015].
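As a rough illustration of the particle update described in Section 6.3.1, the following is a minimal sketch of a basic global-best PSO in Python. It is not the implementation used in this work: the inertia and acceleration coefficients, the fixed iteration budget used as a stopping rule, and the toy quadratic fitness (standing in for an objective similarity measure between a synthesised and a target sound) are all assumptions made for the example.

```python
import random


def pso_minimise(fitness, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `fitness` over a box-bounded search space with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and score.
    best_pos = [p[:] for p in pos]
    best_val = [fitness(p) for p in pos]
    # The swarm shares a single global best.
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Weighted pull towards the personal best and the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the allowed parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Hypothetical target: three synthesis parameters in [0, 1].
target = [0.3, 0.8, 0.5]


def toy_fitness(params):
    """Toy stand-in for the distance between synthesised and target sound."""
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_score = pso_minimise(toy_fitness, [(0.0, 1.0)] * 3)
```

In the setting of this chapter, `toy_fitness` would be replaced by a function that synthesises a sound from the candidate parameters and returns its objective distance to the input audio file.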
6.3.2 Sound Synthesis Methods
Four different sound effects were used for evaluation purposes. All of them are available and hosted online as part of the FXive synthesis platform [Bahadoran et al., 2017, 2018a].¹ The FXive platform is a free, online-hosted web service for sound synthesis, created at Queen Mary University. This system was used because it contains a large number of synthesis methods, all openly accessible through a simple URL (Uniform Resource Locator) interface, which allowed for a simple and standard method of interaction. Each of the selected synthesis methods has a limited number of control parameters and is designed to produce a sound of a specific type. All approaches used within this chapter were originally derived from work by Farnell [2010] and are examples of physically inspired synthesis.
Fire The fire synthesis model² is a noise-shaping synthesis method. The individual sonic components of a fire (the hiss, crackle and lapping) are each modelled through filtered and envelope-shaped noise signals. Three control parameters are exposed to the user: lapping, hissing and crackling.
Rain In the rain model,³ the components of rain are broken into a number of categories: ambience, which is modelled as constant shaped noise, droplets, rumble and drips. Three control parameters are exposed to the user: density, rumble and ambience.
Stream The stream⁴ is modelled entirely on the bubbling sounds that are made as water runs over substances, based on control of filtered chirp sounds. Three control parameters are exposed to the user: bubbles, frequency and filter Q.
Wind The wind model⁵ uses a varying filtered noise approach, where wind parameters control the overall envelope of the sound. The different materials that the wind hits, such as doors or branches and wires, select the timesteps over which the wind envelope shaping will occur. Ten parameters are exposed to the user: Wind Speed, Gustiness, Squall, Buildings, Doorways, Branches, Leaves, Pan, Directionality and Gain. The parameters Pan, Directionality and Gain were all left constant at their default values, as discussed in Section 6.3.2.
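The fire and wind models above are both described in terms of filtered, envelope-shaped noise. The following is a minimal sketch of that general idea in Python, not the FXive or Farnell [2010] implementation: the one-pole low-pass filter, the exponential decay envelope, and every parameter name and default value are assumptions chosen for illustration.

```python
import math
import random


def shaped_noise(n_samples, sample_rate=44100, cutoff_hz=800.0,
                 decay_s=0.05, seed=0):
    """White noise -> one-pole low-pass filter -> exponential decay envelope."""
    rng = random.Random(seed)
    # One-pole low-pass coefficient for the chosen cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for n in range(n_samples):
        noise = rng.uniform(-1.0, 1.0)
        # Filter the noise: the state moves a fraction alpha towards the input.
        state += alpha * (noise - state)
        # Shape its amplitude with a decaying envelope.
        envelope = math.exp(-n / (decay_s * sample_rate))
        out.append(envelope * state)
    return out


# A 100 ms burst at 44.1 kHz, loosely comparable to a single crackle.
burst = shaped_noise(4410)
```

Exposed controls such as crackling or gustiness would, in a sketch like this, map onto quantities such as how often bursts are triggered, the filter cutoff, and the envelope time constant.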
¹ https://fxive.com/
² https://fxive.com/app/main-panel/fire.html
³ https://fxive.com/app/main-panel/rain.html
⁴ https://fxive.com/app/main-panel/stream.html
⁵ https://fxive.com/app/main-panel/wind.html
Parameters Not Changed
Several parameters were not used, both to limit the search space and because these parameters were considered to have no immediate impact on the synthesis of the sound. During analysis, all samples were loudness normalised, so output gain controls were redundant. As no evaluation metric used spatial aspects to evaluate synthesis, pan controls were also not considered. Each sound effect also offered the ability to apply a range of audio effects, including equalisation, distortion, delay, convolution reverb and HRTF spatialisation. However, because all of these effects can be added to every synthesised sample, it was felt that including them would significantly grow the search space without significantly improving the synthesis. The impact of individual audio effects on the perceived realism of a synthesised sound is outside the scope of this work.
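The redundancy of the gain control after normalisation can be seen in a short sketch. Here a simple RMS normalisation is used as an assumed stand-in for loudness normalisation (the text does not state which loudness measure was used), and the function name and target level are hypothetical.

```python
import math


def rms_normalise(signal, target_rms=0.1):
    """Scale a signal so that its RMS level equals `target_rms`."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        # A silent signal cannot be rescaled; return it unchanged.
        return list(signal)
    scale = target_rms / rms
    return [x * scale for x in signal]


# The same tone rendered at two different output gains.
tone = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
quiet = [0.25 * x for x in tone]

# After normalisation the two versions are the same up to rounding error,
# so searching over the gain parameter could not change the fitness.
loud_norm = rms_normalise(tone)
quiet_norm = rms_normalise(quiet)
```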
6.3.3 Objective Function
The set of fitness functions, or objective functions, was taken from the literature. Each objective measure describes a set of audio features that are used to summarise the audio sample as a set of audio statistics. The distance measure between audio samples was constructed as the Euclidean distance between these audio feature sets. This measure was then used as the fitness function for the PSO described in Section 6.3.1. The audio features that make up each objective measure are described in Table 6.1. To standardise implementations, all audio features were extracted using Essentia [Bogdanov et al., 2013], based on recommendations from Chapter 3 and in Moffat et al. [2015]. The Allamanche et al. [2001] distance measure was first designed for measuring similarity of musical content, and is included as part of the MPEG-7 standard list of low-level descriptors. Gygi et al. [2007] produced a set of audio features to apply to environmental sounds, both for measuring similarity and performing categorisation. Moffat et al. [2017] produced a reduced set of audio features to perform hierarchical classification of an audio effects library, where the audio features were selected through a machine learning feature selection approach, as discussed in Chapter 4. The Wichern et al. [2007] feature set was produced for indexing and segmenting natural sound environments. MFCCs were used for parameter optimisation of musical tones, and are generally considered a good measure of timbre [Pachet and Aucouturier, 2004; Yee-King and Roth, 2011]. This method was intended to be a baseline measure, through which to compare each synthesis method. PEAQ (Perceptual Evaluation of Audio Quality) [Thiede et al., 2000] is an evaluation method specifically designed to measure the sample-by-sample difference between audio samples. This method was first designed to measure the impact of different audio compression algorithms, such as when comparing MP3 to WAV.
Table 6.1: Attributes of Each Objective Function
Objective Function | Features and Attributes
Allamanche [Allamanche et al., 2001] | Loudness, Spectral Flatness, Spectral Crest Factor
Gygi [Gygi et al., 2007] | Envelope Statistics, Pitch, Autocorrelation, Waveform Peaks, Spectral Centroid, Spectral Moments, Frequency Band Energy, Modulation Statistics, Subband Correlation, Spectral Flux
MFCC [Yee-King and Roth, 2011] | MFCC
Moffat [Moffat et al., 2017] | Loudness, Pitch, MFCC, Envelope Statistics, Spectral Contrast, Spectral Flux
PEAQ [Thiede et al., 2000] | Signal Bandwidth, Masking Content, Modulation Difference, Distortion, Harmonic Structure
Wichern [Wichern et al., 2007] | Loudness, Spectral Centroid, Spectral Sparsity, Harmonicity, Temporal Sparsity, Transient Index (∆MFCC)
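The construction used for every objective function in Section 6.3.3, summarising each sound as a feature vector and taking the Euclidean distance between vectors, can be sketched as follows. This is an illustration only: the two features used here (RMS level and zero-crossing rate) are hypothetical stand-ins rather than any of the published feature sets, they are computed by hand rather than with Essentia, and no feature scaling is applied.

```python
import math


def summarise(signal):
    """Hypothetical stand-in feature set: RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (len(signal) - 1)]


def euclidean_distance(features_a, features_b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))


def fitness(target_signal, candidate_signal):
    """Objective distance between two sounds; lower means more similar."""
    return euclidean_distance(summarise(target_signal),
                              summarise(candidate_signal))


# Three one-second test tones at an assumed 8 kHz sample rate.
sr = 8000
target = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
near = [0.9 * math.sin(2 * math.pi * 230 * n / sr) for n in range(sr)]
far = [0.2 * math.sin(2 * math.pi * 1500 * n / sr) for n in range(sr)]

near_score = fitness(target, near)
far_score = fitness(target, far)
```

Because the distance is taken over raw feature values, features with a larger numeric range contribute more to the result in a sketch like this one.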