The annotations of word boundaries and of the boundaries of unstressed portions in polysyllabic words allowed us to measure the whole word duration as well as the stressed syllable duration in milliseconds (note that in monosyllabic words, the values of these two variables are equal). However, the different subsets (defined by the combination of spoken language and speakers’ L1) within the dataset contained different lexical items, and their mean word durations were therefore inevitably different. In order to reduce the variance due to the different lexical items involved, we inspected some additional durational measures, including the mean syllable duration (whole word duration divided by the number of syllables) and the mean unstressed syllable duration (duration of the unstressed portion [41] divided by the number of unstressed syllables [42]). The latter measure was, naturally, only relevant in polysyllabic words. Lastly, in order to capture the variability in durations of stressed and unstressed syllables in polysyllabic words, we calculated a ratio of unstressed-to-stressed syllable duration using the formula:
$$\mathrm{ratio}_{uns:str} = \frac{\mathrm{dur}_{uns}}{\mathrm{dur}_{str}},$$
where $\mathrm{dur}_{uns}$ is the mean duration of the unstressed syllables and $\mathrm{dur}_{str}$ is the duration of the syllable with the primary stress. A similar approach has been used in a number of studies on lexical stress and rhythm in non-native production. For example, Lee et al. (2006) used a ratio of the duration of the unstressed vowel to that of the stressed vowel in a given word to investigate the realisation of English lexical stress by Japanese and Korean learners. Volín (2005) calculated a durational reduction coefficient as a ratio of the duration of the stressed vowel to the mean duration of the unstressed vowels within the word. This measure was applied to a small set of polysyllabic English words produced by native speakers and by Czech speakers judged as having a strong non-native accent. Furthermore, Trofimovich and Baker (2006) defined a measure of stress-timing as the ratio of mean unstressed syllable duration to mean stressed syllable duration in a study on the acquisition of L2 suprasegmentals. Although using vowel durations to calculate ratios would possibly offer more reliable results than using durations of whole syllables, we preferred to avoid the difficulties associated with the detailed segmentation of spontaneous speech signals and only measured the syllable durations. As a result, the values of the unstressed-to-stressed syllable duration ratio in particular word tokens reflect not only the amount of stress-induced durational contrast that we intend to measure, but also the complexity of the syllables occurring in the given words.

[41] This included all syllables except for the syllable with primary stress.
[42] All syllables that did not have primary stress.
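The durational measures described above can be illustrated with a minimal Python sketch. The class and field names (`WordToken`, `word_dur_ms`, `stressed_dur_ms`, `n_syllables`) are hypothetical, and the sketch assumes that the unstressed portion equals the whole word duration minus the stressed syllable duration; it is not the script used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordToken:
    # Hypothetical container; field names are illustrative only.
    word_dur_ms: float      # whole word duration in ms
    stressed_dur_ms: float  # duration of the primary-stressed syllable in ms
    n_syllables: int        # total number of syllables in the word


def mean_syllable_duration(tok: WordToken) -> float:
    """Whole word duration divided by the number of syllables."""
    return tok.word_dur_ms / tok.n_syllables


def mean_unstressed_duration(tok: WordToken) -> Optional[float]:
    """Unstressed portion divided by the number of unstressed syllables.

    Returns None for monosyllabic words, where the measure is undefined.
    Assumption: unstressed portion = word duration - stressed syllable.
    """
    n_unstressed = tok.n_syllables - 1
    if n_unstressed == 0:
        return None
    return (tok.word_dur_ms - tok.stressed_dur_ms) / n_unstressed


def unstressed_to_stressed_ratio(tok: WordToken) -> Optional[float]:
    """Mean unstressed syllable duration over stressed syllable duration."""
    dur_uns = mean_unstressed_duration(tok)
    if dur_uns is None:
        return None
    return dur_uns / tok.stressed_dur_ms
```

For an invented trisyllabic token of 600 ms with a 300 ms stressed syllable, the mean unstressed syllable duration is 150 ms and the ratio is 0.5; a lower ratio corresponds to a stronger durational contrast between stressed and unstressed syllables.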
Apart from the durational measures, we intended to inspect vowel quality in the stressed syllable (in tokens containing monophthongs in their stressed syllable). The mean formant values (F1 and F2) in Bark were measured in the stable portion of the vowel, as annotated in the TextGrid. To eliminate errors in automatic formant tracking, an additional semi-automatic method was used to detect any abrupt jumps between nearby values of the same formant, as determined by the Praat formant-tracking algorithm. The items where a jump of over 3 Bark within 12.5 ms was present in the formant measurements were then manually checked, and the automatically obtained values were corrected to match the formants, as determined by a careful inspection of the spectrogram. The resulting formant values were used to calculate a measure of vowel spectral contrast, expressing a one-dimensional distance of a vowel to the vowel-space centre. To be able to calculate this value, we first needed to determine a point that would serve as a vowel-space centre (referred to as the centroid). We calculated female and male centroids separately for each of the observed languages (using formant values reported by Hedbávná, 2004, for Czech; Deterding, 1997, for English; and van Dommelen, 2011, for Norwegian). Each centroid was determined as a point in the formant space representing the mean F1 and mean F2 value of all vowel phonemes in the inventory of the respective language in Bark. Table 4.2 lists the resulting formant values of female and male centroids in each language. Although the differences between centroids in the three languages are not very large, using separate centroids for the three languages was considered more appropriate given the presumed differences in language-specific articulatory settings (cf. Gick et al., 2004; Wilson, 2006). The distance to the centroid was then calculated as the Euclidean distance in the F1-F2 space between the given vowel and the gender-specific centroid for the given language in Bark.
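The screening for abrupt formant jumps could be sketched as follows. This is only an illustration of the stated criterion (a difference of more than 3 Bark between measurements of the same formant lying at most 12.5 ms apart); the function name, the input layout and the frame times in the test data are assumptions, and the actual semi-automatic procedure may have differed.

```python
def has_abrupt_jump(times_s, values_bark, max_jump_bark=3.0, window_s=0.0125):
    """Return True if any two measurements of one formant that lie within
    `window_s` seconds of each other differ by more than `max_jump_bark`.

    times_s     -- frame times in seconds, in ascending order (assumed input)
    values_bark -- formant values in Bark, one per frame
    """
    eps = 1e-9  # tolerance for floating-point comparison of time differences
    n = len(times_s)
    for i in range(n):
        j = i + 1
        # Compare frame i with every later frame inside the time window.
        while j < n and times_s[j] - times_s[i] <= window_s + eps:
            if abs(values_bark[j] - values_bark[i]) > max_jump_bark:
                return True
            j += 1
    return False
```

Tokens for which the function returns True would then be passed on for manual inspection of the spectrogram rather than being corrected automatically.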
Table 4.2: Formant values (in Bark) of female and male centroids in Czech (CZE), British English (ENG) and Norwegian (NOR).
Language   Speakers   F1 (Bark)   F2 (Bark)
CZE        females    5.12        11.05
CZE        males      4.47        10.22
ENG        females    5.66        11.43
ENG        males      4.67        10.41
NOR        females    5.21        11.51
NOR        males      4.40        10.44
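As an illustration of the centroid-based measure, the following Python sketch computes a centroid as the mean F1 and mean F2 of a vowel inventory and the Euclidean distance of a vowel to the centroids listed in Table 4.2. The function names and the dictionary layout are assumptions made for this example; the centroid values themselves are taken from the table.

```python
import math

# Centroids in Bark from Table 4.2, keyed by (language, speaker sex).
CENTROIDS = {
    ("CZE", "female"): (5.12, 11.05),
    ("CZE", "male"): (4.47, 10.22),
    ("ENG", "female"): (5.66, 11.43),
    ("ENG", "male"): (4.67, 10.41),
    ("NOR", "female"): (5.21, 11.51),
    ("NOR", "male"): (4.40, 10.44),
}


def centroid(vowels_bark):
    """Mean F1 and mean F2 over a list of (F1, F2) pairs in Bark."""
    n = len(vowels_bark)
    mean_f1 = sum(f1 for f1, _ in vowels_bark) / n
    mean_f2 = sum(f2 for _, f2 in vowels_bark) / n
    return mean_f1, mean_f2


def distance_to_centroid(f1_bark, f2_bark, language, sex):
    """Euclidean distance in the F1-F2 plane (Bark) between a vowel and
    the sex-specific centroid of the given language."""
    c1, c2 = CENTROIDS[(language, sex)]
    return math.hypot(f1_bark - c1, f2_bark - c2)
```

A vowel lying exactly at the centroid has a distance of 0 Bark; more peripheral vowels yield larger values, so a decrease in this distance can be read as centralisation.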
Note that the measure of distance to the centroid was calculated only for tokens containing monophthongs in their stressed syllable (i.e. not for words that had a diphthong or a syllabic /r/ as the nucleus of their stressed syllable). Moreover, in a few cases where vowel formant measurements were unreliable due to background noise (e.g. overlapping speech), the values for all triplet members were excluded from the analyses. The same was done whenever a vowel’s distance to the centroid was less than 1 Bark, on the assumption that the distance to the centroid serves as a reliable measure of centralisation mainly in peripheral vowels. In addition, we checked all items where the measured vowel occurred next to a nasal consonant. For items where the position of the formants corresponding to the vowel quality could not be reliably determined due to a strong influence of nasalisation, the values for all triplet members were excluded from the analyses.
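The triplet-wise exclusion described above might be implemented along the following lines. The record layout (dictionaries with the hypothetical keys `triplet_id` and `distance_bark`) and the use of `None` to mark an unreliable measurement are assumptions made for this sketch, not a description of the original data files.

```python
from collections import defaultdict


def keep_complete_triplets(tokens, min_distance_bark=1.0):
    """Drop every triplet in which at least one member has an unreliable
    measurement (distance_bark is None) or lies closer than
    `min_distance_bark` to the centroid.

    tokens -- list of dicts with the hypothetical keys
              'triplet_id' and 'distance_bark'
    """
    by_triplet = defaultdict(list)
    for tok in tokens:
        by_triplet[tok["triplet_id"]].append(tok)

    kept = []
    for members in by_triplet.values():
        usable = all(
            m["distance_bark"] is not None
            and m["distance_bark"] >= min_distance_bark
            for m in members
        )
        if usable:
            kept.extend(members)
    return kept
```

Excluding whole triplets rather than single tokens keeps the comparison between the members of each triplet balanced.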
A similar measure of distance to the centroid was used by Koopmans-van Beinum (1980: 55-62) to investigate the differences in vowel quality in different speaking styles in Dutch. Harmegnies and Poch-Olivé (1992) calculated the Euclidean distances of vowels from schwa (defined as F1 = 500 Hz, F2 = 1500 Hz) and the centralisation indices as the differences between the vowel-schwa distance for a given vowel in laboratory speech and in spontaneous speech in Spanish. Similarly, Laan (1997) calculated the Euclidean distances of vowels in different speaking styles from “ideal” formant frequencies measured in vowels produced in isolation. A slight drawback of our approach may be the use of a common centroid for all female or all male speakers rather than speaker-specific centroids. Unfortunately, the available recordings (both spontaneous speech and read text; for details see Chapter 2) were not specifically designed for investigations of speakers’ vowel systems, and any attempt to collect a representative sample from these recordings was expected to provide rather unreliable results. We assumed, however, that the possible inaccuracies in centroid position were not very large, and that this simplification should be appropriate for the purpose of comparing distances between the successive mentions of words, rather than using their absolute values to compare between speakers. Moreover, as mentioned above, all vowels with a distance to the centroid of less than 1 Bark, which were most at risk of yielding misleading values of distance to the centroid, were excluded from the analyses.
Apart from the measures directly concerning the observed word tokens, we calculated the local articulation rate in the speech fragment surrounding the observed word (approximately 1 second on each side). The value was determined as the number of syllables divided by clean speech duration in seconds, in the speech fragment excluding the observed content word. The procedure was comparable [43] to that used in the previous chapter investigating the realisations of English function words (see Section 3.2.3.4 for details).
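The articulation-rate computation can be sketched as below. The sketch assumes that "clean speech duration" means the duration of the surrounding fragment after subtracting pauses, and that the observed word has already been removed from both the syllable count and the fragment duration; the function and parameter names are hypothetical.

```python
def local_articulation_rate(n_syllables, fragment_dur_s, pause_dur_s=0.0):
    """Syllables per second of clean speech in the fragment around a word.

    n_syllables    -- syllables in the fragment, observed word excluded
    fragment_dur_s -- fragment duration in seconds, observed word excluded
    pause_dur_s    -- total non-speech time inside the fragment (assumption:
                      this is what is subtracted to obtain clean speech)
    """
    clean_dur_s = fragment_dur_s - pause_dur_s
    if clean_dur_s <= 0:
        raise ValueError("clean speech duration must be positive")
    return n_syllables / clean_dur_s
```

With invented values of 9 syllables in a 2.0 s fragment containing 0.2 s of pauses, the rate is 9 / 1.8 = 5 syllables per second.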