2.1 REVIEW OF PREVIOUS WORK
2.1.2 INTERNATIONAL
A number of chapters in this volume deal with the analysis of spoken corpora, so the issue is not covered here in any detail. Other chapters discuss what the unique nature of spoken discourse implies for any type of spoken corpus analysis (see, for example, the chapters by Evison, Vo and Carter, and Rühlemann, this volume). The discourse-level frameworks that may be of use for the analysis of spoken corpora are not necessarily compatible with the kind of concordance-based and frequency-driven analysis that is often used in large-scale lexicography studies. One of the key differences between spoken and written corpora is that most spoken discourse is collaborative in nature; as such, it is more fluid and is marked by the emerging and changing orientations of the participants (McCarthy 1998).
Yet it is important to identify external categories for grouping transcripts in a corpus, especially when we are concerned with analysing levels of formality and other functions that need to be judged against the wider context of the encounter. This is often more straightforward with written texts, as many of the genres that tend to form the basis of large written corpora are readily recognised as belonging to a particular category, such as fiction versus non-fiction or letters versus e-mails. The group membership of such texts is more clearly demarcated than is the case with the majority of spoken discourse. The development of suitable frameworks for analysing spoken corpus data is thus particularly complex, and further research is needed to explore and evaluate ways in which the analysis of concordance data and discourse phenomena can be fully integrated.
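As a rough illustration of the concordance-based and frequency-driven analysis referred to above, the following minimal sketch produces keyword-in-context lines and counts a node word per external category. The sample transcripts, the category labels and the function names are all hypothetical and are not taken from any corpus or tool discussed in the chapter.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: each transcript carries an external category label.
TRANSCRIPTS = [
    {"category": "service encounter", "text": "well I think you know it is fine"},
    {"category": "service encounter", "text": "you know we could try again"},
    {"category": "academic seminar", "text": "I would argue that you know the data differ"},
]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, width=2):
    """Return a (left context, node, right context) triple per occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def frequency_by_category(transcripts, node):
    """Count occurrences of node, grouped by each transcript's category label."""
    counts = Counter()
    for transcript in transcripts:
        counts[transcript["category"]] += tokenize(transcript["text"]).count(node)
    return dict(counts)


if __name__ == "__main__":
    for transcript in TRANSCRIPTS:
        for left, node, right in concordance(tokenize(transcript["text"]), "know"):
            print(f"{transcript['category']:>18} | {left:>12} [{node}] {right}")
    print(frequency_by_category(TRANSCRIPTS, "know"))
```

A sketch of this kind treats each transcript as a flat string of words. It therefore shows the limitation noted above: turn-taking, overlap and the participants' shifting orientations are not represented unless the transcription conventions encode them.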
The development of techniques and tools to record, store and analyse naturally occurring interaction in spoken corpora has revolutionised the way in which we describe language and human interaction. Spoken corpora serve as an invaluable resource for the research of a large range of diverse communities and disciplines, including computer scientists, social scientists and researchers in the arts and humanities, policy makers and publishers. In order to be able to share resources across these diverse communities, it is important that spoken corpora are developed in a way that enables re-usability. This can be achieved through the use of guidelines and frameworks for recording, representing and replaying spoken discourse. In this chapter we have outlined some of the issues that surround these three stages of spoken corpus development and analysis. As advances in technology allow us to develop new kinds of spoken corpora, which include audio-visual data-streams, as well as a much richer description of contextual variables, it will become increasingly important to agree on conventions for recording and representing this kind of data, and the associated metadata. Similarly, advances in voice-to-text software may ease the burden of transcription, but will also rely heavily on the ability to follow clearly articulated conventions for coding and transcribing communicative events. Adherence to agreed conventions of this kind, especially when developing new kinds of multi-modal and contextually enhanced spoken corpora, will significantly extend the scope of spoken corpus linguistics in the future.
Further reading
Adolphs, S. (2006) Introducing Electronic Text Analysis. London and New York: Routledge. (This book contains chapters on spoken corpus analysis and corpus pragmatics.)
Kress, G. and van Leeuwen, T. (2001) Multimodal Discourse: The Modes and Media of Contemporary Communication. London: Arnold. (This text provides an introduction to the notion of the ‘multi-modal’ in linguistic research.)
McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press. (This book covers a wide range of issues relating to the use of spoken corpora in the broad area of Applied Linguistics.)
Wynne, M. (ed.) (2005) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books. (This book provides a general overview of some of the key issues and challenges faced in the construction of corpora; from collection to analysis.)
References
Adolphs, S., Brown, B., Carter, R., Crawford, P. and Sahota, O. (2004) ‘Applying Corpus Linguistics in a Health Care Context’, Journal of Applied Linguistics 1: 9–28.
Aijmer, K. (2002) English Discourse Particles: Evidence from a Corpus. Amsterdam: John Benjamins.
Anderson, J., Beavan, D. and Kay, C. (2007) ‘SCOTS: Scottish Corpus of Texts and Speech’, in J. Beal, K. Corrigan and H. Moisl (eds) Creating and Digitizing Language Corpora: Volume 1: Synchronic Databases. Basingstoke: Palgrave Macmillan, pp. 17–34.
Anderson, W. and Beavan, D. (2005) ‘Internet Delivery of Time-synchronised Multimedia: The SCOTS Corpus’, Proceedings from the Corpus Linguistics Conference Series 1(1): 1747–93.
Ashby, S., Bourban, S., Carletta, J., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., Karaiskos, V., Kraaij, W., Kronenthal, M., Lathoud, G., Lincoln, M., Lisowska, A., McCowan, I., Post, W., Reidsma, D. and Wellner, P. (2005) ‘The AMI Meeting Corpus’, in Proceedings of Measuring Behavior 2005. Wageningen, Netherlands, pp. 4–8.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) Longman Grammar of Spoken and Written English. Longman: Pearson.
Bird, S. and Liberman, M. (2001) ‘A Formal Framework for Linguistic Annotation’, Speech Communication 33(1–2): 23–60.
Bolton, K., Nelson, G. and Hung, J. (2003) ‘A Corpus-based Study of Connectors in Student Writing: Research from the International Corpus of English in Hong Kong (ICE-HK)’, International Journal of Corpus Linguistics 7(2): 165–82.
Burnard, L. (2005) ‘Developing Linguistic Corpora: Metadata for Corpus Work’, in M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, pp. 30–46.
Cameron, D. (2001) Working with Spoken Discourse. London: Sage.
Canepari, L. (2005) A Handbook of Phonetics. Munich: Lincom Europa.
Carter, R. (2004) Language and Creativity: The Art of Common Talk. London: Routledge.
Carter, R. and Adolphs, S. (2008) ‘Linking the Verbal and Visual: New Directions for Corpus Linguistics’, ‘Language, People, Numbers’, special issue of Language and Computers 64: 275–91.
Carter, R. and McCarthy, M. (2006) Cambridge Grammar of English: A Comprehensive Guide to Spoken and Written Grammar and Usage. Cambridge: Cambridge University Press.
Cheng, W. and Warren, M. (1999) ‘Facilitating a Description of Intercultural Conversations: The Hong Kong Corpus of Conversational English’, ICAME Journal 23: 5–20.
——(2000) ‘The Hong Kong Corpus of Spoken English: Language Learning through Language Description’, in L. Burnard and T. McEnery (eds) Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching and Language Corpora. Frankfurt: Lang, pp. 133–44.
——(2002) ‘// aa beef ball // a you like //: The Intonation of Declarative-Mood Questions in a Corpus of Hong Kong English’, Teanga 21: 1515–26.
Conrad, S. (2002) ‘Corpus Linguistic Approaches for Discourse Analysis’, Annual Review of Applied Linguistics 22: 75–95.
Cook, G. (1990) ‘Transcribing Infinity: Problems of Context Presentation’, Journal of Pragmatics 14: 1–24.
Cotterill, J. (2004) ‘Collocation, Connotation and Courtroom Semantics: Lawyers’ Control of Witness Testimony through Lexical Negotiation’, Applied Linguistics 25: 513–37.
Crabtree, A., Benford, S., Greenhalgh, C., Tennent, P., Chalmers, M. and Brown, B. (2006) ‘Supporting Ethnographic Studies of Ubiquitous Computing in the Wild’, in DIS’06: Proceedings of the 6th Conference on Designing Interactive Systems. New York, USA, pp. 60–9.
Dahlmann, I. and Adolphs, S. (2007) ‘Designing Multi-modal Corpora to Support the Study of Spoken Language – A Case Study’, poster delivered at the Third Annual International eSocial Science Conference, October 2007, University of Michigan, USA.
——(2009) ‘Spoken Corpus Analysis: Multi-modal Approaches to Language Description’, in P. Baker (ed.) Contemporary Approaches to Corpus Linguistics. London: Continuum Press, pp. 125–39.
De Cock, S., Granger, S., Leech, G. and McEnery, T. (1998) ‘An Automated Approach to the Phrasicon of EFL Learners’, in S. Granger (ed.) Learner English on Computer. London: Longman, pp. 67–79.
Douglas, F. (2003) ‘The Scottish Corpus of Texts and Speech: Problems of Corpus Design’, Literary and Linguistic Computing 18(1): 23–37.
Du Bois, J., Schuetze-Coburn, S., Paolino, D. and Cumming, S. (1992) ‘Discourse Transcription’, Santa Barbara Papers in Linguistics, Vol. 4. Santa Barbara, CA: UC Santa Barbara.
——(1993) ‘Outline of Discourse Transcription’, in J. A. Edwards and M. D. Lampert (eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 45–89.
Edwards, J. (1993) ‘Principles and Contrasting Systems of Discourse Transcription’, in J. A. Edwards and M. D. Lampert (eds) Talking Data: Transcription and Coding in Discourse Research. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 3–44.
Farr, F., Murphy, B. and O’Keeffe, A. (2004) ‘The Limerick Corpus of Irish English: Design, Description and Application’, Teanga 21: 5–29.
French, A., Greenhalgh, C., Crabtree, A., Wright, W., Brundell, B., Hampshire, A. and Rodden, T. (2006) ‘Software Replay Tools for Time-based Social Science Data’, in Proceedings of the 2nd Annual International e-Social Science Conference, available at www.ncess.ac.uk/events/conference/2006/papers/abstracts/FrenchSoftwareReplayTools.shtml (accessed 1 February 2009).
Graddol, D., Cheshire, J. and Swann, J. (1994) Describing Language. Buckingham: Open University Press.
Halliday, M. A. K. (2004) ‘The Spoken Language Corpus: A Foundation for Grammatical Theory’, in K. Aijmer and B. Altenberg (eds) Advances in Corpus Linguistics. Amsterdam: Rodopi, pp. 11–38.
Ide, N. (1998) ‘Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora’, First International Language Resources and Evaluation Conference, Granada, Spain.
Kendon, A. (1982) ‘The Organisation of Behaviour in Face-to-face Interaction: Observations on the Development of a Methodology’, in K. R. Scherer and P. Ekman (eds) Handbook of Methods in Nonverbal Behaviour Research. Cambridge: Cambridge University Press, pp. 440–505.
Kilgarriff, A. (1996) ‘Which Words are Particularly Characteristic of a Text? A Survey of Statistical Procedures’, in Proceedings of the AISB Workshop of Language Engineering for Document Analysis and Recognition. Guildford: Sussex University, pp. 33–40.
Kipp, M. (2001) ‘Anvil – A Generic Annotation Tool for Multimodal Dialogue’, in P. Dalsgaard, B. Lindberg, H. Benner and Z.-H. Tan (eds) Proceedings of 7th European Conference on Speech Communication and Technology, 2nd INTERSPEECH Event, Aalborg, Denmark. Aalborg: ISCA, pp. 1367–70, available at www.isca-speech.org/archive/eurospeech_2001
Knight, D. and Adolphs, S. (2006) Text, Talk and Corpus Analysis [academic online module, restricted access], University of Nottingham, UK.
Knight, D., Bayoumi, S., Mills, S., Crabtree, A., Adolphs, S., Pridmore, T. and Carter, R. A. (2006) ‘Beyond the Text: Construction and Analysis of Multi-modal Linguistic Corpora’, Proceedings of the 2nd International Conference on e-Social Science, Manchester, 28–30 June 2006, available at www.ncess.ac.uk/events/conference/2006/papers/abstracts/KnightBeyondTheText.shtml (accessed 4 January 2009).
Lapadat, J. C. and Lindsay, A. C. (1999) ‘Transcription in Research and Practice: From Standardisation of Technique to Interpretative Positioning’, Qualitative Inquiry 5(1): 64–86.
Laver, J. (1994) Principles of Phonetics. Cambridge: Cambridge University Press.
Leech, G., Myers, G. and Thomas, J. (eds) (1995) Spoken English on Computer: Transcription, Mark-up and Application. London: Longman.
McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
Morrison, A., Tennent, P., Williamson, J. and Chalmers, M. (2007) ‘Using Location, Bearing and Motion Data to Filter Video and System Logs’, in Proceedings of the Fifth International Conference on Pervasive Computing. Toronto: Springer, pp. 109–26.
Newton, E. M., Sweeney, L. and Malin, B. (2005) ‘Preserving Privacy by De-identifying Face Images’, IEEE Transactions on Knowledge and Data Engineering 17(2): 232–43.
Ochs, E. (1979) ‘Transcription as Theory’, in E. Ochs and B. B. Schieffelin (eds) Developmental Pragmatics. New York: Academic Press, pp. 43–72.
O’Connell, D. C. and Kowal, S. (1999) ‘Transcription and the Issue of Standardisation’, Journal of Psycholinguistic Research 28(2): 103–20.
O’Keeffe, A. (2006) Investigating Media Discourse. London: Routledge.
Psathas, G. and Anderson, T. (1990) ‘The “Practices” of Transcription in Conversation Analysis’, Semiotica 78(1/2): 75–99.
Reppen, R. and Simpson, R. (2002) ‘Corpus Linguistics’, in N. Schmitt (ed.) An Introduction to Applied Linguistics. London: Arnold, pp. 92–111.
Scholfield, P. (1995) Quantifying Language. Clevedon: Multilingual Matters.
Simpson, R., Lucka, B. and Ovens, J. (2000) ‘Methodological Challenges of Planning a Spoken Corpus with Pedagogic Outcomes’, in L. Burnard and T. McEnery (eds) Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching and Language Corpora. Frankfurt: Lang, pp. 43–9.
Sinclair, J. (1987) ‘Collocation: A Progress Report’, in R. Steele and T. Threadgold (eds) Language Topics: Essays in Honour of Michael Halliday. Amsterdam: John Benjamins.
——(2004) Trust the Text: Language, Corpus and Discourse. London: Routledge.
——(2005) ‘Corpus and Text – Basic Principles’, in M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, pp. 1–16.
——(2008) ‘Borrowed Ideas’, in A. Gerbig and O. Mason (eds) Language, People, Numbers – Corpus Linguistics and Society. Amsterdam: Rodopi BV, pp. 21–42.
Sperberg-McQueen, C. M. and Burnard, L. (1994) Guidelines for Electronic Text Encoding and Interchange (TEI P3). Chicago and Oxford: ACH-ALLC-ACL Text Encoding Initiative.
Strassel, S. and Cole, A. W. (2006) ‘Corpus Development and Publication’, Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC) 2006, available at http://papers.ldc.upenn.edu/LREC2006/CorpusDevelopmentAndPublication.pdf (accessed 4 January 2009).
Stuart, K. (2005) ‘New Perspectives on Corpus Linguistics’, RAEL: revista electrónica de lingüística aplicada 4: 180–91, available at http://dialnet.unirioja.es/servlet/articulo?codigo=1426958 (accessed 4 January 2009).
Svartvik, J. (ed.) (1990) ‘The London Corpus of Spoken English: Description and Research’, Lund Studies in English 82. Lund: Lund University Press.
Tennent, P., Crabtree, A. and Greenhalgh, C. (2008) ‘Ethno-goggles: Supporting Field Capture of Qualitative Material’, Proceedings of the 4th International e-Social Science Conference, 18–20 June, University of Manchester: ESRC NCeSS.
Thompson, P. (2005) ‘Spoken Language Corpora’, in M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, pp. 59–70, available at http://ahds.ac.uk/linguistic-corpora/ (accessed 4 January 2009).
Wray, A., Trott, K. and Bloomer, A. (1998) Projects in Linguistics: A Practical Guide to Researching Language. London: Arnold.
Wynne, M. (2005) ‘Archiving, Distribution and Preservation’, in M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice. Oxford: Oxbow Books, pp. 71–8.