PART TWO: THEORY
POETRY: LINGUISTIC INTELLIGENCE EXEMPLIFIED
Sinclair (2004b) wrote that through studying corpora “we observe a creative stream that is awesome in its wide applicability, its subtlety, and its flexibility” (p. 1). These characteristics have made corpus linguistics an essential component of many areas of language material development, including lexicography in general (e.g., Corda, 1998; Howarth, 1996; Rizo-Rodriguez, 2004; Rundell & Granger, 2007; Tono, 1996) and learner dictionaries in particular, such as the Cobuild Dictionary (Sinclair, 2001) and the Macmillan Dictionary for Advanced Learners of English (Rundell, 2007). Most learner dictionaries are now based on corpus data, and many of the largest and most representative corpora are proprietary corpora owned by publishing companies (e.g., Cambridge University Press, Longman Pearson, and Oxford University Press).
Corpus linguistics has had a similar effect on grammar materials. In the Cambridge Grammar of English (Carter & McCarthy, 2006), the authors draw on the Cambridge International Corpus, which the publisher claimed would “provide evidence about language use that helps to produce better language teaching materials” (p. vii). Likewise, the grammar reference book from Collins Cobuild (Collins Cobuild English Grammar, 1990) uses data from the Bank of English, another large proprietary corpus that purports to be representative of many areas of English.
Perhaps the most deliberate and comprehensive corpus-driven grammar, however, is the Longman Grammar of Spoken and Written English (LGSWE) (Biber et al., 1999), which, more than any other grammar reference, makes explicit and frequent reference to corpus data throughout each section of the book. Table 2 illustrates one example of explicit corpus data: in this case, frequency data is used to compare the use of -er and -est adjective forms across the four registers of the Longman corpus (conversation, fiction, news, and academic writing). This example highlights the competing visions of how to expose teachers to corpus data. In the LGSWE, the emphasis is unabashedly corpus-centric, reflecting the view that frequency lists provide valuable information; in publishing, anything that takes up space is considered more valuable than what does not make it into the book (J. Mairs, personal communication, March 20, 2010). In the much smaller Longman Student Grammar of Spoken and Written English, which is based on the same materials but geared more towards students, the distribution tables are removed, though the frequency lists remain intact (Biber, Conrad, & Leech, 2002). To varying degrees in all of these reference books, corpus data informs the choice of material covered, the patterns highlighted, and the examples used.
Table 2 - Example of corpus data in LGSWE (adapted from Biber et al., 1999, p. 524)
Frequency of comparative and superlative forms (per million words; each * = 100 occurrences)

Register       Comparatives (-er forms)            Superlatives (-est forms)
Conversation   **********                          *****
Fiction        ****************                    *******
News           ***********************             **************
Academic       ********************************    ********
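The figures in Table 2 are normalized rates rather than raw counts, which is what allows registers of different sizes to be compared. The following is a minimal sketch of that arithmetic. It assumes the usual normalization (raw count multiplied by 1,000,000 and divided by the number of words in the register) and the table's convention that each asterisk stands for 100 occurrences per million words. The function names, raw counts, and register sizes are hypothetical and invented for illustration; they are not LGSWE data.

```python
def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return raw_count * 1_000_000 / corpus_words


def star_bar(rate: float, unit: int = 100) -> str:
    """Render a rate as asterisks, one '*' per `unit` occurrences, rounded."""
    return "*" * round(rate / unit)


# Hypothetical raw counts and register sizes (NOT figures from Biber et al., 1999).
registers = {
    "Conversation": (4_000, 4_000_000),
    "Academic": (16_000, 5_000_000),
}

for name, (count, size) in registers.items():
    rate = per_million(count, size)
    print(f"{name:<13} {rate:>8.1f}  {star_bar(rate)}")
```

Normalizing first matters because a larger register will yield more raw hits simply by being larger. Comparing raw counts across registers of unequal size would conflate how common a form is with how big the sample happens to be.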
In addition to grammar reference books and dictionaries, corpus data has been moving slowly into traditional ESL/EFL textbooks. Promotional materials for Real Grammar (Conrad & Biber, 2009), a student grammar book that highlights its corpus-based approach, proclaim that the book’s “focus on authentic usage helps students move past traditional grammar texts and use English more like native speakers” (back cover). The book explicitly identifies material as corpus-based by using sections labeled “What does the corpus show?” This explicit labeling presumably directs learners’ and teachers’ attention to corpus data as the basis for deciding which information is important. Other books on writing, grammar, and vocabulary have also used, to varying degrees, principles of corpus analysis in selecting topics and creating materials (e.g., Bunting, Diniz, et al., 2013; Schmitt & Schmitt, 2005).
There are also numerous non-commercial materials available, including materials based on the Michigan Corpus of Academic Spoken English (MICASE) and the Corpus of Contemporary American English (COCA), both in print and online (Shaw, 2011b; Simpson-Vlach & Leicher, 2006). Another alternative is locally created corpus-based materials, such as those built around the PRAC (Published Research Article Corpus), which Cortes (2007, 2009, 2013a) created both for research purposes and for a graduate-level academic writing class for non-native speakers of English. In her materials, corpus tools were used to identify patterns in experimental research papers, and students created and analyzed their own corpora of relevant texts.
Cortes (2011) examined student written production and perceptions in both corpus-based and non-corpus-based classes using this material, with similar results, though it was noted that the students in the corpus-based class had “new skills that [they] could eventually keep on using once the semester was over” (p. 77). Although teachers were not interviewed, students in both sections were satisfied with the class; however, around half of the students in the corpus-based class were concerned about having “too many papers to analyze” (45%) and “too little time for analysis” (55%). In contrast, around two-thirds of the students in the traditional class felt that they had too few papers to analyze (69%) and had not considered using online papers (61%). They also saw having the class in a computer lab strictly in terms of access to grammar and spellcheck tools, without an awareness of other benefits, such as the use of corpus tools. As in many studies, the perspective of the teacher was not directly addressed, though it was noted that the course was never implemented on a broader scale due to institutional and teacher-related issues.
Books specifically written to encourage pre-service or in-service language teachers to use corpus tools are the final group of materials examined in this section. The three books included here are From Corpus to Classroom: Language Use and Language Teaching (O'Keefe et al., 2007), Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers (Bennett, 2010), and Using Corpora in the Language Classroom (Reppen, 2010).
These materials are especially relevant because they are perhaps the texts that teachers are most likely to read to become familiar with corpora. Of the three, O’Keefe et al. (2007) is the least likely to be used by in-service teachers because of its cost and size. Further, the authors quite openly state that “we do not intend to tell you how to teach and what to do in your own classes; only you can know best what is effective and appropriate in your local context, and you are by far the best person to take the final, practical steps in applying our ‘applied’ linguistics, if you judge the book to have value” (O'Keefe et al., 2007, p. xi). Instead, the book aims to provide the theoretical and research knowledge that the authors feel teachers need in order to make the best use of corpus tools in their own classrooms. The authors include several chapters on vocabulary topics, several on grammar topics, and one on academic and business corpora.
The other two books emphasize the use of corpus tools and methods in the classroom. More than half of Bennett’s (2010) book is devoted to corpus-designed activities. Chapter 4 examines patterns of the/a/an in television news shows, using concordance lines from COCA. Chapter 5 uses MICASE to draw attention to signal words in academic speaking. The next chapter focuses on Academic Word List (AWL) vocabulary in a text used in a reading class. The final chapter on hands-on activities involves creating a tagged corpus, with the assistance of other teachers, to help students identify comma errors in their academic writing. This chapter requires rather involved research techniques, such as tagging a corpus and analyzing tagged concordance lines in the TextSTAT concordancing software. Bennett’s book thus provides very specific tasks and tools, in contrast to O’Keefe et al.’s (2007) deliberate distance from telling teachers what to do.
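Concordance lines of the kind Bennett’s activities rely on are usually displayed in key-word-in-context (KWIC) format. Every hit of a search term is aligned in a central column, with a fixed window of context on either side. The following is a minimal, assumed illustration of that format. It runs over a toy text and uses a simple regular-expression tokenizer. It is not how COCA’s interface or TextSTAT is implemented, and the function name `kwic` and the sample sentence are hypothetical.

```python
import re


def kwic(text: str, keyword: str, window: int = 4) -> list[str]:
    """Return key-word-in-context lines for every occurrence of `keyword`.

    Matching is case-insensitive on whole word tokens, so "an" does not
    match "anchor"; `window` is the number of tokens kept on each side.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in one column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


sample = ("The anchor said a storm was coming. The storm reached the coast "
          "overnight, and an official warned residents to stay inside.")

for word in ("the", "a", "an"):
    for line in kwic(sample, word):
        print(line)
```

A classroom version of the article activity could run the same function for each of the, a, and an over a larger set of transcripts. The resulting lines could then be sorted by the word that follows the keyword, so that learners can notice recurring patterns.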
Reppen (2010), in addition to laying out the basic principles and tools of corpus linguistics, identifies specific ways to use corpora with language learners: building word lists, analyzing concordance lines, using tagged texts, and examining the role of register. She also provides suggestions on how to use online corpora in the classroom with students, guidelines for creating a corpus of student writing, and a series of activities that teachers can modify to match their own classes. All three of these resources offer valuable information, and yet teachers still appear to resist using corpus tools and corpus-based teaching methodologies in their professional lives (Cortes, 2013b).
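Building a word list, the first of Reppen’s suggested uses, amounts to tokenizing a text and counting how often each word form occurs. The following is a minimal sketch that assumes lowercase normalization and a simple letters-and-apostrophes tokenizer. Dedicated corpus tools make more careful decisions about hyphens, numbers, and lemmas. The function name `word_list` and the sample text are hypothetical.

```python
import re
from collections import Counter


def word_list(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent word forms in `text` with their counts."""
    # Lowercasing merges "The" and "the" into a single entry.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)


sample = ("Students wrote the essay, revised the essay, and then "
          "compared the essay with a model text.")

for word, count in word_list(sample, top=3):
    print(f"{word:<10} {count}")
```

Counting lowercase forms merges Essay and essay. A teacher building such a list would also need to decide whether to go further and count lemmas (essay and essays together), since that choice changes which items appear at the top.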