Because this study is predominantly phonetic, and acoustic in particular, it is important to describe how the data were elicited, including the selection of an appropriate recording environment and equipment and the control of the recording procedure.
2.2.2.1. Recording place
The physical environment is important when recording speech sounds for phonetic analysis (Samarin, 1967; Ladefoged, 1999). With the very supportive assistance of the local community, especially my parents and relatives, I was able to investigate several places for recording, including broadcasting and meteorological stations. Unfortunately, most of these places were unsuitable for a variety of reasons: external noise from upper floors and/or from surrounding building construction, dominant echoes and/or noisy machines inside the rooms, or conflicts between my recording schedule, the availability of informants, and the working hours of the stations.
In the end, the Zhangzhou Hotel was selected as the recording place for several reasons.
(1) This hotel is surrounded by tall, leafy trees that, to a large extent, dampen external noise from vehicles and building construction.
(2) The recording room was in a very quiet corner and was acoustically absorbent. The carpet and the sound-proof windows and door largely minimised background noise and echo inside the room.
(3) This hotel is located in the centre of the city, making it convenient for my informants. The comfortable environment made them feel relaxed and happy to provide their utterances in their best voice, even though most of them had never been recorded in phonetic fieldwork before. The informants' relaxed state of mind and good vocal condition helped to ensure the quality of the language data for processing and analysis.
(4) I was able to manage the recording schedule flexibly within the pre-arranged fieldwork period, without worrying about the constraints on space or time that the official stations would have imposed.
(5) I was able to focus entirely on data recording without being interrupted, which is normally an issue for an investigator doing research in his or her hometown.
2.2.2.2. Recording equipment
Even if the corpus for data elicitation is carefully designed, the informants are well selected, and the recording environment is excellently sound-proofed, the recordings will be poor if the recording equipment is of low quality (Samarin, 1967; Ladefoged, 1999; Bowern, 2008). Several microphones, kindly provided by the local broadcasting stations and universities, were tested and compared, but not every microphone produced satisfactory auditory and acoustic results. One professional but expensive cardioid condenser microphone, the Blue Baby Bottle SL, provided by Huaqiao University, was adopted for the recording.
This microphone features a switchable 100 Hz high-pass filter and a -20 dB pad, and can accurately capture the human voice, regardless of vocal type, instruments, or environments. It also features a single-membrane large-diaphragm capsule mounted in a “lollipop” style enclosure for enhanced sensitivity. Its cardioid polar pattern is effective in minimising noise and ambiance at the off-axis sections of the capsule.
Another important advantage of this microphone is that it worked well with my MacBook Air computer and the sound recorder in Praat, version 5.3. The sounds recorded and reproduced via this microphone were of very high quality, with clear waveforms and spectrograms shown in Praat. During the recording, the microphone was placed about 8 inches from the speaker’s mouth and about 4 to 6 inches below it, as suggested by Samarin (1967, p.100), to avoid recording strong aspiration noise such as puffs or pops.
2.2.2.3. Recording procedure
As Samarin (1967, p.99) stated, “The highest quality of recording equipment is no substitute for poor recording procedure. Any one of a number of details can ruin what could have been an extremely good and valuable recording”. The recording procedure for this phonetic fieldwork was accordingly scheduled very carefully in terms of time management and corpus presentation.
2.2.2.3.1 Time management
Each informant was assigned an individual time slot of 4 hours according to their availability. The slot comprised a 0.5-hour demonstration and 3.5 hours of recording. The demonstration time was designed for informants to see how their predecessors were recorded. It helped familiarise the informants with the recording procedure and helped them understand the mechanisms of the work session, enhancing their confidence and reducing their nervousness about being recorded. In addition, the informants were able to look at the word list to get an overall impression of what was being elicited.
The 3.5-hour recording session was broken into three formal elicitation sessions (3 × 1 hour) and one interaction session (0.5 hour). Each formal session of about one hour involved four small tasks:
● Reading monosyllabic tokens for citation tone elicitation,
● Reading disyllabic tokens for tone sandhi elicitation,
● Reading multi-disyllabic tokens for specific tone sandhi investigation,
● Reading supplementary tokens of local cultural relevance, for example, place names, numbers, food, and so on.
Breaks of about 5 minutes were taken between tasks. During the breaks, speakers could rest their voices and have water, dessert, or fruit to recover their energy, and I could save individual sound files. Speakers would normally also use the breaks to give feedback about their recording experiences and to correct items that had not been properly produced. The three sessions involved the same tasks and elicited the same data sets but differed in the ordering of tokens, to control for the variation in voice volume and intensity that frequently occurs when the same tokens are read three times in sequence.
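The re-ordering of tokens across the three sessions can be sketched as follows. This is a minimal illustration only, not a record of how the orders were actually produced in the fieldwork: the seeded shuffle, the function name `session_orders`, and the placeholder token labels are all assumptions introduced here.

```python
import random


def session_orders(tokens, n_sessions=3, seed=0):
    """Return one ordering of `tokens` per session.

    Hypothetical helper: every session contains the same tokens,
    but each session presents them in an independently shuffled order.
    """
    rng = random.Random(seed)  # fixed seed so the orders are reproducible
    orders = []
    for _ in range(n_sessions):
        order = list(tokens)   # copy so the input list is left untouched
        rng.shuffle(order)
        orders.append(order)
    return orders


# Placeholder labels, not items from the actual word list.
tokens = ["token_01", "token_02", "token_03", "token_04", "token_05"]
orders = session_orders(tokens)
for i, order in enumerate(orders, start=1):
    print(f"session {i}: {order}")
```

Because each session is a permutation of the same list, the data set elicited is identical across sessions and only the position of each token changes.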
The 0.5-hour interaction session was used to elicit informants’ personal information, including date of birth, birth place, current residence, language acquisition, education, and occupation (see Tables 2-2 and 2-3). If time allowed, informants were encouraged to read a short narrative, tell a local story, or produce other utterances that they were interested in and wanted to share.
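The time budget of one informant's slot can be checked with a short calculation. The durations below are those given in the text; the variable names are illustrative, and the reading time per session rests on the assumption that the breaks fall between consecutive tasks and count towards the session's hour.

```python
# All durations in minutes, taken from the schedule described in the text.
DEMONSTRATION = 30
FORMAL_SESSIONS = 3
FORMAL_SESSION_LENGTH = 60   # "about one hour" each
INTERACTION = 30
TASKS_PER_SESSION = 4
BREAK_LENGTH = 5             # "about 5 minutes" between tasks

recording = FORMAL_SESSIONS * FORMAL_SESSION_LENGTH + INTERACTION
slot = DEMONSTRATION + recording

# Assumption: breaks occur only between consecutive tasks and are
# included in the one-hour formal session.
breaks_per_session = TASKS_PER_SESSION - 1
reading_per_session = FORMAL_SESSION_LENGTH - breaks_per_session * BREAK_LENGTH

print(f"recording time: {recording / 60} hours")                   # 3.5 hours
print(f"total slot: {slot / 60} hours")                            # 4.0 hours
print(f"reading time per session: {reading_per_session} minutes")  # 45 minutes
```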
2.2.2.3.2 Corpus presentation
Tokens to be elicited were all written in simplified Chinese characters and presented via PowerPoint, with one token per slide, rather than in a typed list. Presenting the corpus in PowerPoint offered several advantages.
● It helped to ensure that each token, shown individually on its own slide, was produced in a clear and unexaggerated voice, with balanced and well-controlled intensity and speech rate. By contrast, using a typed list might cause speakers to articulate tokens with a higher F0 and amplitude at the beginning of the list but with a lower or reduced F0 and amplitude at the end.
● It could minimise possible sandhi and intonation effects on the tokens being elicited. Using a typed list, on the other hand, might induce some speakers to read tokens in quick succession, which often introduces unwanted sandhi and intonation effects.
● It also controlled speakers’ emotional state and motivated them to produce utterances in a natural and coherent state throughout the elicitation process.
All recordings were digitised in Praat at a sampling frequency of 44100 Hz, a rate reported to be high enough to capture the highest sinusoidal frequencies detectable by the human ear (Huckvale, 2012, p. 195). Each sound file was then saved and named with a corresponding code for further data processing and analysis. The sound files for each individual speaker can be found in Appendix C (attached USB).
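The adequacy of the sampling frequency can be illustrated with the standard sampling (Nyquist) relation, under which a signal sampled at a given rate can represent frequencies up to half that rate. The sketch below is illustrative only: the 20000 Hz upper limit of hearing is a conventional nominal figure assumed here rather than a value taken from the source, and the function name is hypothetical.

```python
SAMPLING_RATE_HZ = 44_100        # value reported in the text
UPPER_HEARING_LIMIT_HZ = 20_000  # assumed nominal figure, not from the source


def nyquist_frequency(sampling_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2


nyquist = nyquist_frequency(SAMPLING_RATE_HZ)
print(nyquist)                           # 22050.0
print(nyquist > UPPER_HEARING_LIMIT_HZ)  # True
```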
This field trip proved to be very productive and enjoyable. Twenty-one speakers were recorded for experimental purposes and another six speakers for vocabulary documentation.