

3.3.1. Defining the ‘record’ phase

The record phase is the data collection stage. As discussed in Chapter 2, few current MM corpora are publicly available, and those that are have proven unsuitable for exploring the line of linguistic enquiry that is the concern of this thesis. Developing MM corpora therefore requires completely new and relevant datasets to be recorded.

It is vital that all such recordings are both ‘suitable and rich enough in the information required for in-depth linguistic enquiry, and of a high enough quality’ (Knight and Adolphs, 2006) to be used and re-used in a corpus database. Corpus developers should therefore strive to collect data that is as accurate and exhaustive as possible, capturing as much information about the content and context of the discursive environment as they can (Strassel and Cole, 2006: 3; also refer back to Sinclair’s suggestions in Section 3.2.1). This is because lost or omitted data cannot easily be recovered at a later date: real-life communication cannot be authentically rehearsed and replicated. Hence, it is paramount for the researcher to decide exactly what is to be recorded before picking up a dictaphone or video camera.

This necessitates a process of planning; the importance of planning for the construction of qualitative datasets, including corpora, is discussed by Psathas and Anderson (1990) and Thompson (2005). Primarily, the plan helps to determine the types of subjects to be involved, in other words who the participants are, how many will take part in the recordings, and so on. It also determines the design of the recording process: the types of data that need to be recorded, the amount, the topics discussed in the corpus (if specific) and how such topics are to be adequately covered. Furthermore, the plan helps to define the physical conditions under which the recordings are to take place, in other words the when and where of the recording; whether data is written, audio or visual; what equipment is used; and where and how this is set up. Corpus developers often keep a checklist or a log of their progress throughout construction. This not only helps to detail specific recordings and to catalogue and organise them, but also acts as an invaluable point of reference for discussing and/or justifying anomalies or ‘gaps’ that occur in the data, and for accounting for interesting patterns that may become apparent in subsequent analyses.
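The checklist or log described above can be as simple as a spreadsheet. As a minimal sketch, assuming a Python workflow, the structure below records the planning dimensions listed in this section (participants, location, modes, equipment, duration) together with free-text notes on anomalies, so that sessions with documented problems can be retrieved later. The class and field names are hypothetical and do not describe any existing corpus tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class RecordingSession:
    """One entry in a hypothetical corpus recording log."""
    session_id: str
    recorded_on: date
    location: str
    participants: list[str]
    modes: list[str]        # e.g. ["audio", "video"]
    equipment: list[str]    # e.g. ["DV camera x2", "table microphone"]
    duration_min: float
    notes: str = ""         # anomalies, gaps, technical problems


@dataclass
class RecordingLog:
    sessions: list[RecordingSession] = field(default_factory=list)

    def add(self, session: RecordingSession) -> None:
        # Reject duplicate IDs so every recording can be catalogued unambiguously.
        if any(s.session_id == session.session_id for s in self.sessions):
            raise ValueError(f"duplicate session id: {session.session_id}")
        self.sessions.append(session)

    def total_hours(self) -> float:
        return sum(s.duration_min for s in self.sessions) / 60

    def flagged(self) -> list[RecordingSession]:
        """Sessions with notes, i.e. candidate explanations for gaps in the data."""
        return [s for s in self.sessions if s.notes]
```

Keeping the notes alongside the structured fields means that an unexpected pattern in a later analysis can be checked against what actually happened in each session.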

3.3.2. Blueprints for recording multi-modal corpus datasets

3.3.2.1. The recording set-up

The conditions used in the recording phase perhaps require the most redefinition with the advent of new MM corpus datasets. Although research using audio recordings of conversation has a long history in corpus-based linguistics, the use of digital video records as ‘data’ is still fairly innovative. Granted, cameras have sometimes been used in the past alongside dictaphones when collecting spoken corpora, acting as an aide-mémoire when compiling a corpus (see the BASE¹⁶ corpus, for example). However, these recordings are not generally integrated into the final assembled corpus. Considerations such as the quality of the recordings, the basic set-up and the type of cameras used therefore took less precedence than they do in developing MM datasets, for which cameras are integral to the design of the record phase.

¹⁶ BASE (British Academic Spoken English Corpus) is a corpus of 160 lectures and 40 seminars recorded in a variety of different academic departments at Warwick and

It is interesting to note that the conditions and procedures used in the VACE, AMI, MSC 1, NIST and MM4 corpora (refer to Figure 2.1 in Chapter 2 for further details and related references) are all based on a similar model, utilising a range of highly specialised equipment in a standardised, and thus replicable, recording set-up. This tends to be a variation of the set-up seen in Figure 3.1, an example of a MM corpus recording plan taken from the VACE Multimodal Meeting Corpus (Chen et al., 2005: 3).

Figure 3.1: An example of the recording set-up typically used in specialist meeting room corpora (example taken from the VACE corpus, Chen et al., 2005).

The use of multiple Digital Video (DV) cameras in this set-up allows for a fairly large number of speakers (ranging from 2 to 8 in each of these corpora) to be recorded simultaneously, at a relatively close range. These cameras are either fixed on static tripods around the room, or suspended from the ceiling using overhead rail systems, as with the VACE corpus. In the case of the AMI corpus, additional remote participants are also actively involved in discussions by means of video links and conferencing software.

Each camera also records sound which, when coupled with the output from fixed mounted microphones and, often, wireless microphones attached to each participant, allows high-quality audio to be collected as well. The audio and video outputs can subsequently be synchronised by time after the recording, allowing users to navigate the data with ease.
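Time-based synchronisation of this kind can be illustrated with a small sketch. Assuming that the start time of each camera and microphone recording is known on a shared session clock (how that clock is obtained is left open here), any moment in the session can be mapped onto each stream's own timeline, which is what allows a user to jump to the same instant in every view. The `Stream` class and `positions_at` function are hypothetical and are not part of VACE, AMI or any other corpus toolkit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Stream:
    name: str
    start: float     # seconds on the shared session clock when this recording began
    duration: float  # length of this recording in seconds

    def local_time(self, t: float) -> Optional[float]:
        """Map a session-clock time to this stream's own timeline, if it covers t."""
        local = t - self.start
        return local if 0 <= local <= self.duration else None


def positions_at(streams: List[Stream], t: float) -> Dict[str, float]:
    """Where each stream should be cued to show session time t."""
    result = {}
    for s in streams:
        local = s.local_time(t)
        if local is not None:
            result[s.name] = local
    return result


streams = [Stream("cam1", 0.0, 3600.0), Stream("cam2", 2.5, 3600.0), Stream("mic", 1.0, 3500.0)]
print(positions_at(streams, 10.0))  # {'cam1': 10.0, 'cam2': 7.5, 'mic': 9.0}
```

Streams that had not yet started, or had already stopped, at the requested time are simply omitted from the result.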

Given that the set-up is so fixed, large datasets can probably be assembled fairly swiftly, as with the 100 hours contained within the AMI corpus, since the positioning of cameras and so on can be maintained from one recording session to the next; only the participants and the specific content of the discussions change. Obviously, however, this relies on the corpus compiler having the resources, firstly, to access this equipment and, secondly, to dedicate these cameras to corpus compilation alone, (semi-)permanently fixing them in these specific positions in the recording room.

A primary criticism of the VACE corpus recording set-up, one which holds true for all forms of video recording, is that although no researchers or bystanders are physically present during the recording of the data (only the recorded participants), the presence of the cameras alone can cause some of the effects associated with the ‘observer’s paradox’ (Labov, 1972). Participants may consciously, or even subconsciously, adjust their behaviour because they are aware that they are being filmed, as video cameras are generally quite obtrusive. However, since it is not ethical to ‘hide’ cameras, it is difficult to minimise the potential effect that the observer’s paradox has on how naturalistic the participants’ behaviour is.

Another shortcoming of this method of recording, one which perhaps limits the extent to which it can be transferred beyond this specialist context, is that the fixed positioning of the table, participants and even cameras produces almost experimental, laboratory-type conditions. Although this set-up is perhaps not as strictly experimental as that used in the MIBL corpus and the Czech audio-visual speech corpus (see Figure 2.1 in Chapter 2 for further details), it can be seen to be far from naturalistic. Firstly, the use of the table means that each participant is only visible from the torso upwards. Thus, should a researcher wish to explore, for example, leg and lower-body movements or even exaggerated hand and arm movements, this would not be possible, as these movements are likely to take place out of view of the camera lens. Secondly, as participants are only allowed to sit in specific locations, they are not really encouraged to, for example, get up and move around as they perhaps naturally would. This is because such movements would likely affect the quality of the recordings, since participants would move out of the focus of the cameras.

Since the cameras used are static, the data collected is very much fixed in terms of location and time. This set-up does not support recordings of spontaneous interaction in real-life environments ‘on-the-move’. It is relevant to note that, as discussed in Section 2.2.4 of Chapter 2, both the SVC corpus and the SK-P 2.0 corpus (see Figure 2.1, Chapter 2) begin to tackle this limitation by utilising a corpus recording approach that is less context-specific, and thus more ‘mobile’. The SVC, for example, uses portable Smartphone devices to record in a range of different public spaces, some:

Indoors (office, lobby, public cafe) and some outdoors (courtyard, park) with varying acoustic and lighting conditions, changing sources of background noise and visual background (resulting for example from different weather conditions: sunny with blue sky or cloudy). These conditions were not controlled for the experiment but have been documented in the recording protocol. (Schiel and Mögele, 2008: 2)

Similar environments were recorded as part of the SK-P 2.0 (see Figure 2.1, Chapter 2 for further details, also see Schiel et al., 2002).

In theory, this variability starts to overcome some of the drawbacks of using laboratory-type settings for recording MM corpora. In reality, however, these corpora are not without shortcomings of their own. Primarily, the Smartphone devices are only used to record single participants in these corpora, despite the fact that the SVC is based on dyadic conversations. This limits the potential for exploring patterns of dyadic or group behaviour in the data. Furthermore, the quality of these recordings is not particularly good, and only specific sequences of behaviour, facial expressions and, in this case, head movements can be captured at a high resolution. However, it is appropriate to note that this is perhaps more a limitation of the equipment than of the recording design methodology. An additional, more general limitation of these corpora is that they are both task-orientated: although discourse occurs in natural contexts, the prescribed nature of the tasks affects the spontaneity and perceived naturalness of the data collected.

Despite this, these corpora can be seen to offer an insight into possible directions that linguistic corpus development may take in the future, and into the type of corpus datasets that may supersede 4th generation MM corpora. Indeed, plans for similar ‘mobile’ corpus datasets, comprising ubiquitous information, are being drawn up by researchers at the University of Nottingham as part of the DReSS II¹⁷ project. This includes data from a range of different contexts, from face-to-face situated discourse through to the use of SMS messages, MMS messages, interaction in virtual environments and so on. The DReSS II project aims to utilise digital technologies to develop a system for recording the language experience of individuals from multiple perspectives. The intention is to enable a more detailed investigation of the interface between various communicative modes, tracking a specific person’s (inter)actions over time, i.e. across an hour, day or even week. The analysis of information of this kind can potentially help to question the extent to which language choices are determined by different communicative environments. Such advances will help to overcome some of the limitations of current MM corpora, i.e. those associated with context-specificity, the observer’s paradox, fixed and static recording methods, the perceived ‘naturalness’ of data, and so on. Furthermore, they will perhaps allow us to gain a better insight into true, ‘real-life’ language-in-use, as indeed corpora aim to provide (refer back to Sinclair’s suggestions, 2005).

¹⁷ More information on DReSS II can be found at: http://www.ncess.ac.uk/research/digital_records/

Studies into corpora of this nature are therefore very much a priority for the future of CL research and development. At present, however, no fully functioning corpus of this nature exists, because linguists are still tackling the problems associated with MM corpora of the kind discussed in the current thesis.

3.3.2.2. The recording set-up used for the NMMC

As with the CID, IFADV and the Göteborg Spoken Language Corpus (refer to Section 2.2.4 of Chapter 2 for more details), the NMMC was designed to allow more flexibility in the recording of natural language data than more experimental, specialist meeting room corpora such as the VACE corpus allow. This was in order to meet the following prescriptions (Knight, 2006, in alignment with Sinclair’s prescriptions, 2005):

• To record multiple modes of communication in natural contexts.

• To use a recording method that can be easily replicated in future studies.

• To record the individual sequences of body movements of all speakers in an interaction, while also allowing synchronised videos to be analysed so that co-ordinated movement (i.e. across speakers) can be examined.

• To obtain recordings that can be replayed and annotated by other researchers.

However, as with the corpora noted above, it proved difficult to strike a balance between the resolution of the recordings, i.e. the quality of the data collected, and the perceived naturalness of that data. It was even more difficult to maintain a balance between these factors and the usability of the corpus data collected. Consequently, the basic recording set-up used for the NMMC is still somewhat similar to the laboratory-type settings seen with the VACE corpus and the other corpora listed above, although it was not restricted to a meeting room environment. Figure 3.2 presents a plan of this set-up (Knight et al., 2009).

Figure 3.2: A basic recording set-up for multi-modal corpus development, based on the NMMC.

Two DV cameras were used as part of this set-up, specifically to allow the individual bodily movements of each participant to be recorded and to enable the data to be digitised for subsequent MPEG compression. These images were later synchronised using Adobe Premiere¹⁸, so that the behaviours of both participants could be observed simultaneously during the analysis of the data.
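Before two camera recordings can be viewed side by side, the offset between them has to be established. In the NMMC this alignment was carried out in Adobe Premiere; the sketch below only illustrates the underlying arithmetic, assuming that a shared sync event (for instance a hand clap) is visible in both recordings and that both cameras run at 25 frames per second, the PAL DV rate. The function names are hypothetical, and this is not a description of how the NMMC videos were actually aligned.

```python
FPS = 25  # assumption: PAL DV frame rate for both cameras


def frame_offset(sync_frame_a: int, sync_frame_b: int) -> int:
    """Frames to add to a camera-B frame number to get the matching camera-A frame.

    sync_frame_a / sync_frame_b: frame at which the same sync event appears in each camera.
    """
    return sync_frame_a - sync_frame_b


def to_timecode(frame: int, fps: int = FPS) -> str:
    """Format a frame count as HH:MM:SS:FF."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


offset = frame_offset(130, 80)  # the sync event is at frame 130 in A and frame 80 in B
print(offset)                   # 50
print(to_timecode(80 + offset)) # 00:00:05:05, camera-A timecode of camera-B frame 80
```

Working in whole frames rather than seconds avoids rounding drift when the offset is applied across a 45 to 60 minute recording.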

These recordings took place in ‘relaxed, familiar settings’, with ‘each conversation last[ing] 45-60 minutes’ (see Knight et al., 2006). The purpose of this was ‘to minimise the effects of observer’s paradox, by enabling speakers to become more at ease around recording equipment, thus promoting talk that is as natural as possible’. Although the setting used was perhaps more laboratory-like than ‘natural’, as Argyle notes, it is possible to arouse innate responses and patterns of behaviour from participants in such environments (1988: 11), provided that they feel relaxed and at ease with, amongst other things, the setting and the people with whom they are communicating.

To enhance the quality of the audio data collected, a high-specification microphone was positioned between the speakers. When the CID was recorded, this microphone was supplemented by headset microphones for each participant, so that the corpus could be used to explore the phonetic characteristics of talk, one of the key aims of the CID. Similar devices were not used in the NMMC, as it was decided that such headsets would be likely to obscure images of the head, face and upper torso, making it difficult to explore specific sequences of movement in these areas, which is the concern of the present study.

Furthermore, unlike in the set-up seen in Figure 3.1, participants in the NMMC were not specifically requested to sit around tables. This enabled the recordings to capture a range of different forms of NVB and NVC, focusing not only on the head and face but also on hand and arm movements and the complete torso of each participant, so that a range of iconic gestures and certain proxemic movements could be studied.

Although the conversations recorded for the NMMC were not strictly task-driven, it is important to note that all data was collected in a university setting: all episodes featured native English speakers in academic environments at the University of Nottingham. These conditions suggest that the results of any analyses of such data are likely to be somewhat context- and/or genre-dependent. Although this is obviously a shortcoming of the corpus, perhaps aligning it with a more ‘specialised’ type, this restricted cross-section of participants serves as a useful starting point for the development and analysis of new MM methodologies. Nonetheless, it would be beneficial if data from a wider range of socio-cultural contexts, collected under different conditions, were available for future MM CL research.

3.3.2.3. Corpus size

The question of ‘how much data is enough?’ when constructing a MM corpus is a complex and challenging one, for which no definitive answer exists. This is true not only for MM corpora but also for mono-modal corpora. On the topic of corpus size, Baroni and Ueyama (2006) suggest that:

Because of Zipfian properties of language, even a large corpus such as the BNC contains a sizeable number of examples only for a relatively limited number of frequent words, with most words of English occurring once or not occurring at all. The problem of ‘data sparseness’ is of course even bigger for word combinations and constructions.
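Data sparseness can be observed directly by counting how many word types in a sample occur only once (hapax legomena), and by repeating the count for adjacent word pairs. The sketch below assumes pre-tokenised input; the function names are illustrative, and the toy utterance in the usage lines is invented purely to show that word combinations are sparser than single words.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def sparseness(items: Sequence[Hashable]) -> dict:
    """Token count, type count and share of types that occur exactly once."""
    counts = Counter(items)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": len(items),
        "types": len(counts),
        "hapax_share": hapaxes / len(counts) if counts else 0.0,
    }


def bigrams(tokens: Sequence[str]) -> list:
    """Adjacent word pairs (word combinations) from a token sequence."""
    return list(zip(tokens, tokens[1:]))


words = "yeah yeah yeah i know yeah right".split()
print(sparseness(words))           # {'tokens': 7, 'types': 4, 'hapax_share': 0.75}
print(sparseness(bigrams(words)))  # {'tokens': 6, 'types': 5, 'hapax_share': 0.8}
```

Even in this tiny sample, a larger share of word pairs than of single words is attested only once, which is the effect Baroni and Ueyama describe at corpus scale.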

In 1935, Zipf used a counting-based method to ascertain the frequencies of various linguistic features in order to draw interesting observations about real-life language use. As a result of this pioneering work, ‘Zipf’s law’ (1935) was proposed, stating that ‘the product of rank order and frequency [of lexemes] is constant’ (Kilgarriff, 1996: 39) in language. In theory, this implies that ‘the most common word in a corpus is a hundred times as common as the hundredth most common, a thousand times as common as the thousand[th], and a million times as common as the millionth’ (Kilgarriff, 1996: 39).
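Zipf’s law can be written as f(r) ≈ C / r, where r is a word’s frequency rank, f(r) its frequency and C a corpus-dependent constant (approximately the frequency of the top-ranked word). The ratios in the quotation follow directly, since f(1) / f(r) = r. The sketch below computes these idealised frequencies and, for a frequency table, the rank-times-frequency products that should stay roughly constant if the law holds. The function names are illustrative, and the toy counts in the usage lines are invented so that the products come out exactly equal; real corpus counts only approximate this.

```python
def zipf_frequency(rank: int, c: float) -> float:
    """Idealised Zipfian frequency of the word at a given rank (rank * f = c)."""
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return c / rank


def rank_frequency_products(counts: dict) -> list:
    """(rank, word, rank * frequency) for a word-frequency table, most frequent first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, word, rank * freq) for rank, (word, freq) in enumerate(ranked, start=1)]


c = 1_000_000
print(zipf_frequency(1, c) / zipf_frequency(100, c))   # 100.0
print(zipf_frequency(1, c) / zipf_frequency(1000, c))  # 1000.0
print(rank_frequency_products({"the": 60, "of": 30, "and": 20}))
# [(1, 'the', 60), (2, 'of', 60), (3, 'and', 60)]
```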

This regularity suggests that a key ‘factor that affects how many different encounters you have to record [for a corpus] is how frequently the variable you are interested in occurs in talk’ (Cameron, 2001: 28). Larger datasets, or indeed datasets from specific contexts, will thus be required to study less common words, whereas for more commonplace phenomena this is not always necessary.

So there is little point in collecting, for example, 70 hours of video data to explore the presence of yeah in discourse when the results would probably be no more revealing than those seen in 7 hours of data, given how frequent this minimal response is (see Beach, 1993; Drummond and Hopper, 1993a; Gardner, 2001). Conversely, if 70 hours of data include only a couple of instances of the phenomenon under focus, it is prudent to think of other ways of collecting relevant data, or indeed to reconsider whether it would be more cost-effective to focus on something that is more frequent in discourse.
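Cameron’s point about frequency can be turned into rough arithmetic. If a phenomenon occurs at an average rate of λ instances per hour of recorded talk, h hours are expected to yield λh instances; treating occurrences as a Poisson process (a simplifying assumption, since real talk is bursty and speaker-dependent) also gives the probability of observing at least k instances. The function names are illustrative, and the rates in the usage lines are hypothetical, chosen only to contrast a very frequent item such as yeah with a rare one; they are not measurements from the NMMC or any other corpus.

```python
import math


def expected_count(rate_per_hour: float, hours: float) -> float:
    """Expected number of instances in a given amount of recorded talk."""
    return rate_per_hour * hours


def hours_for_expected(target: int, rate_per_hour: float) -> float:
    """Hours of recording needed to expect `target` instances."""
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    return target / rate_per_hour


def prob_at_least(k: int, rate_per_hour: float, hours: float) -> float:
    """P(X >= k) if occurrences follow a Poisson process (a simplifying assumption)."""
    lam = rate_per_hour * hours
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


print(expected_count(300, 7))               # 2100: hypothetical frequent item, 7 hours
print(hours_for_expected(100, 0.5))         # 200.0: hypothetical rare item, 100 instances
print(round(prob_at_least(1, 0.1, 10), 3))  # 0.632: chance of even one instance in 10 hours
```

Under these invented rates, the frequent item saturates a small dataset, while the rare one would need far more recording than is practical, which is the cost-benefit trade-off at issue here.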

Referring to spoken corpora specifically, Thompson (2005) highlights the necessity of deciding between the ‘breadth’ and ‘depth’ of what is to be recorded, and of providing a cost-benefit analysis of this. This notion of cost-benefit is also relevant for emergent MM corpora. Essentially, this