

The core application of the thesis is an investigation into the conditional independence relationships between five Romance languages. The main analysis is given in Section 7.1, though the description of the data set is given here along with the preprocessing steps. The data set comprises audio recordings originating from speakers of one of five different Romance languages: French, Italian, Portuguese, Spanish (American), and Spanish (Iberian). While two dialects of Spanish are used in this study, they are treated as different spoken languages in this analysis because the interest is in pronunciation rather than textual representation; the difference between “dialect” and “language” is a matter of degree rather than an absolute quantitative difference. Each recording is of an individual saying an integer from ‘one’ to ‘ten’ in their particular language. In total there are 219 word recordings, and each can be classified by the language, the gender of the speaker and the number being spoken. Observations of the same word being spoken in different languages are treated as sharing the same word attribute. For example, the word ‘four’ includes recordings of ‘quatre’ (French) and ‘quattro’ (Italian) as well as the word ‘four’ in the other languages. Integers were chosen because they have no ambiguity in terms of translation, making comparison of their use across languages straightforward. Furthermore, the cardinals ‘one’ to ‘ten’ of Romance languages (among many other words) stem from shared Latin forms [Price, 1992]. This suggests that these words might also be suitable when comparing languages acoustically.

As mentioned, the observations are modelled as functional data, as is becoming increasingly common in studies involving sound recordings (e.g. Holan et al. [2010]). Such models make the reasonable assumption that the data have been obtained by observing an underlying function at finitely many discrete points along a continuum, and that this underlying function is smooth (i.e. a certain number of derivatives exist).

3.3.1 Romance data set pre-processing

The data set used in the final analysis had already been preprocessed, with the full description given in Hadjipantelis [2013, Chapter 6]. The original acoustic data set originated from a number of sources and the specifications of the recordings differed across these sources. Therefore, the audio recordings were resampled at a rate of 16000 samples per second to make the observations comparable for processing. A short-time (10 ms window) Fourier transform was taken of each audio recording to produce a spectrogram. A spectrogram is a two-dimensional representation of audio signal energy intensity in frequency-time space [Fulop, 2011]. Spectrograms are a natural choice for representing power with functional data [Holan et al., 2010, Martinez et al., 2013], though approaches such as Mel-frequency cepstra can provide alternative representations [Davis and Mermelstein, 1980]. The value stored at a frequency-time point is a function of the power (or amplitude). A $10 \log_{10}(\cdot)$ transformation of the original power was taken so that the units of power are decibels.
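The spectrogram construction described above can be sketched as follows. This is a minimal illustration and not the processing code of Hadjipantelis [2013]: the function name `spectrogram_db`, the non-overlapping frames, the Hann taper and the small constant `eps` are all assumptions made for the example. Only the 16000 Hz sampling rate, the 10 ms window and the $10 \log_{10}(\cdot)$ transformation come from the text.

```python
import numpy as np

def spectrogram_db(signal, fs=16000, window_ms=10.0, eps=1e-12):
    """Short-time Fourier power spectrogram in decibels.

    Returns (freqs, spec), where spec has shape (n_freqs, n_frames).
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(fs * window_ms / 1000.0))    # samples per window (160 at 16 kHz)
    n_frames = len(signal) // n                # assumption: non-overlapping frames
    frames = signal[: n_frames * n].reshape(n_frames, n)
    frames = frames * np.hanning(n)            # assumption: Hann taper on each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)     # bin centres in Hz
    # eps (an assumption) avoids log10(0) for silent frames
    return freqs, 10.0 * np.log10(power.T + eps)

# Usage on a synthetic one-second 1 kHz tone
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
freqs, spec = spectrogram_db(tone, fs)
```

With these choices a 10 ms window at 16000 Hz contains 160 samples, so the frequency bins are 100 Hz apart and run from 0 Hz to the Nyquist frequency of 8000 Hz, giving 81 bins.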

In Holan et al. [2010], spectrograms of mating calls are used as predictors of mating success of treehoppers. Martinez et al. [2013] investigate regional differences in bat chirps by considering a functional mixed model with spectrograms as the image response. In contrast, the emphasis of this analysis will not be to seek a model that acts as the data generating process. Instead it aims to identify meaningful low-dimensional representations of spectrograms that highlight differences between languages, and subsequently to assess whether these distinctive acoustic features are compatible with the class of GLTM.

Frequencies were binned every 100 Hz up to the Nyquist frequency of 8000 Hz. The resulting spectrograms were stored as matrices of 81 frequency by 100 time points. These spectrograms were still distorted in two main ways: firstly, the data were undoubtedly noisy (amplitude distortion), and secondly, there were phase distortions. Amplitude distortion is a common feature of many data sets and can be considered as an error term added to the power recording at every frequency-time point. The phase (time) distortion resulted from the overall duration of a word, and the timing of intra-word elements such as syllables, varying significantly between speakers. To adjust for amplitude distortion, the spectrograms underwent a smoothing algorithm aimed at removing noise. This is consistent with the smoothness assumption inherent to the functional data framework. A penalised least squares filtering approach was used to smooth the data. Roughness was penalised using second-order differences, and the unsmoothed data underwent a discrete cosine transformation following the algorithm in Garcia [2010].

Having smoothed the data, the remaining unadjusted distortion was in the time dimension. The available techniques to deal with differences in the phase of curves are known as curve alignment, curve registration or warping (see Lucero and Koenig [2000], Ramsay and Silverman [2005]). The method used on this data set was based on pairwise synchronisation as described in Tang and Müller [2008], with adjustments for the two-dimensional nature of the data. Although the spectrograms were warped only in the time dimension, the frequency information was required for calculating the discrepancy between spectrograms. These time-phase adjustments were performed on a word-gender basis, as there are known differences in the frequency ranges spoken by male and female speakers. As part of this process, the word durations were all standardised to have the same arbitrary length. In our analysis we refer to this as “standardised time” across a range of 0 to 100. Figure 3.3 is a spectrogram (post pre-processing) of a female French speaker saying the word ‘quatre’. Broadly, this interpolated plot indicates that there is greater power in the lower frequencies, and that the beginning and end portions of the standardised time period are quieter.
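To make the smoothing step concrete, the sketch below minimises a penalised least squares criterion with a second-order difference roughness penalty, $\lVert y - z \rVert^2 + s \lVert D z \rVert^2$. It is an illustration under stated assumptions rather than the algorithm of Garcia [2010]: the linear system is solved directly instead of through the discrete cosine transform, the smoothing parameter `s` is fixed by hand, and smoothing is applied along the time axis only. The function names are hypothetical.

```python
import numpy as np

def smooth_second_difference(y, s=10.0):
    """Return the z minimising ||y - z||^2 + s * ||D z||^2.

    D is the second-order difference operator acting along axis 0.
    y may be a vector or a matrix whose columns are smoothed independently.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    D = np.diff(np.eye(n), n=2, axis=0)        # (n - 2) x n second-difference matrix
    # Normal equations of the penalised criterion: (I + s D'D) z = y
    return np.linalg.solve(np.eye(n) + s * D.T @ D, y)

def smooth_spectrogram_in_time(spec, s=10.0):
    """Smooth every frequency row of a (n_freq, n_time) matrix along time."""
    return smooth_second_difference(np.asarray(spec, dtype=float).T, s).T

# Usage on a noisy sine curve sampled at 100 time points
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
noisy = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(100)
smooth = smooth_second_difference(noisy, s=50.0)
```

Larger values of `s` penalise roughness more heavily. Constant and linear signals have zero second differences and therefore pass through unchanged.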

FIGURE 3.3: Post-registration spectrogram of a female French speaker saying ‘quatre’. It can be seen that there is greater power in the lower frequencies, and that the very beginning and end of the standardised time period are quieter.

3.3.2 Notation

The underlying function of each spectrogram is denoted $x^{d,g}_{l,m}(f, t)$, with the two dimensions $f$ and $t$ referring to frequency and time respectively. Recall that each spectrogram is derived from a spoken word. The subscripts and superscripts encode observational information: $l = 1, \ldots, n_l$ denotes the language being spoken; $d = 1, \ldots, n_d$ indicates the word being spoken; $m = 1, \ldots, m_{ld}$ is a counter, where $m_{ld}$ is the number of observations of word $d$ from language $l$; $g$ refers to the gender of the speaker.

It is well documented that there are differences in the acoustics of male and female speakers which go beyond a simple shift in the spoken frequencies (for instance Nittrouer et al. [1990], Pépiot [2013]). Parris and Carey [1996] present a statistical method for discriminating between speaker gender of short acoustic recordings. In their analysis of seven Indo-European languages (of which Romance is a subset), gender was correctly identified on average 98% of the time. This suggests that there are commonalities in acoustic gender differences across Indo-European languages. In light of this result, it is judged that gender should be adjusted for at the macro level:

$$x^{d}_{l,m}(f, t) = x^{d,g}_{l,m}(f, t) + \tilde{x}^{g}(f, t)$$

where $\tilde{x}^{g}$ is the difference between the mean of all samples with gender $g$ and the mean of all samples. Henceforth it will be the gender-adjusted function that is the object of interest in the thesis.

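The mean spectrograms for language $l$ and word $d$ are defined in (3.3.1), for language $l$ in (3.3.2), and the grand mean spectrogram in (3.3.3).

$$\bar{x}^{d}_{l}(f, t) = \frac{1}{m_{ld}} \sum_{m=1}^{m_{ld}} x^{d}_{l,m}(f, t) \tag{3.3.1}$$

$$\bar{x}_{l}(f, t) = \frac{1}{m_{l\cdot}} \sum_{d=1}^{n_d} m_{ld}\, \bar{x}^{d}_{l}(f, t) \tag{3.3.2}$$

$$\bar{x}(f, t) = \frac{1}{m_{\cdot\cdot}} \sum_{l=1}^{n_l} m_{l\cdot}\, \bar{x}_{l}(f, t) \tag{3.3.3}$$

where $m_{l\cdot} = \sum_{d=1}^{n_d} m_{ld}$, $m_{\cdot\cdot} = n = \sum_{l=1}^{n_l} m_{l\cdot}$, and for $t \in T$, $f \in F$. The parameters $m_{\cdot\cdot}$ …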
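The three levels of averaging in (3.3.1) to (3.3.3) can be illustrated with a short sketch. The nested-dictionary layout `data[l][d]`, the function name `mean_spectrograms` and the toy array sizes are assumptions made for the example; the random arrays stand in for gender-adjusted spectrograms and are not real data.

```python
import numpy as np

def mean_spectrograms(data):
    """Word, language and grand mean spectrograms.

    data[l][d] is an array of shape (m_ld, n_freq, n_time) holding the
    m_ld spectrograms of word d in language l.
    """
    # (3.3.1): mean over the m_ld observations of each language-word pair
    word_means = {l: {d: x.mean(axis=0) for d, x in words.items()}
                  for l, words in data.items()}
    counts = {l: {d: x.shape[0] for d, x in words.items()}
              for l, words in data.items()}
    lang_totals = {l: sum(c.values()) for l, c in counts.items()}    # m_{l.}
    # (3.3.2): word means weighted by the number of observations m_ld
    lang_means = {
        l: sum(counts[l][d] * word_means[l][d] for d in word_means[l]) / lang_totals[l]
        for l in data
    }
    n = sum(lang_totals.values())                                    # m_{..}
    # (3.3.3): language means weighted by m_{l.}
    grand_mean = sum(lang_totals[l] * lang_means[l] for l in data) / n
    return word_means, lang_means, grand_mean

# Usage on toy data with unequal numbers of observations per cell
rng = np.random.default_rng(1)
data = {
    "French": {"one": rng.normal(size=(3, 4, 5)), "two": rng.normal(size=(2, 4, 5))},
    "Italian": {"one": rng.normal(size=(4, 4, 5))},
}
word_means, lang_means, grand_mean = mean_spectrograms(data)
```

Because each level is weighted by its observation count, the grand mean equals the plain average over all $n$ spectrograms even when the numbers of recordings per language and word differ.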

To get a further feel for the data, we plot the mean spectrograms by gender (Figure 3.4). Note that although the higher frequencies appear to have more power for males, this does not suggest that males speak at a higher pitch, only that the male recordings tend to be louder in general. Recall that this gender effect is adjusted for throughout the rest of the analyses.

FIGURE 3.4: Mean spectrograms by gender.

For even more detail, we plot the mean spectrograms for each language-number combination (Figure 3.5). Observe that certain words have two clear syllables whereas others have just one (e.g. the number seven: French “sept” versus Italian “sette”).
