In the case of Present-day English (PDE), data were extracted from the post-1900 sections of both CLMETEV and ARCHER (detailed in tables 3.7 and 3.8, respectively) as well as from the Freiburg-Lancaster-Oslo-Bergen Corpus (FLOB) and the British National Corpus (via the web-based interface, BNCweb).
Sub-period CLMETEV
1900-1920 2,093,333
Total 2,093,333
Table 3.7 PDE - The Corpus of Late Modern English Texts (extended version)
Sub-period ARCHER
1900-1949 176,907
1950-1999 178,241
Total 355,148
Table 3.8 PDE - A Representative Corpus of Historical English Registers
Compiled in the early 1990s, the FLOB corpus (see Hundt et al. 1998) was designed to match the Lancaster-Oslo-Bergen Corpus (for details, see Johansson et al. 1978) of twenty years earlier. To this end, it contains c.1 million words of written British English from the 1990s, across fifteen genres. When added to the present-day data from CLMETEV and from ARCHER, the database for this period consisted of c.3.4 million words.
The decision to consult the c.100 million-word BNC in addition, thus bringing the total number of words in the PDE database up to c.103.4 million, had a twofold motivation. On the one hand, the further increase in sample size was deemed advantageous in terms of achieving maximal representativeness and thus further enhancing the validity of the results of the study. On the other hand, an added benefit was the fact that this corpus contains both spoken and written data. Although the study primarily focuses on written data (largely for reasons of uniformity, since this type of data is more readily available from the earlier stages of the language), access to spoken data was required in order to aid the investigation of the additional, more pragmatic ‘response particle’ function, which is evidenced by a number of the maximizers under investigation 26.
Covering the latter years of the twentieth century (specifically, 1960 to 1993), this web-based version of the BNC consists of a series of text samples from a variety of genres, registers, domains and situations (for further details of the sampling method, see e.g. Burnard 2007 or Hoffmann et al. 2008: 27-45). As the breakdown in table 3.9 shows, the ratio of written to spoken data in this corpus is roughly 9:1, the former comprising 87,903,571 words (from 3140 texts) and the latter 10,409,858 words (from 908 texts).
Data type BNC
Written 87,903,571
Spoken 10,409,858
Total 98,313,429
Table 3.9 The British National Corpus (accessed via BNCweb)
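The database sizes quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the exact counts are copied from tables 3.7 to 3.9, while the FLOB and rounded BNC figures are the approximate values given in the running text, and all variable names are chosen here for illustration.

```python
# Exact word counts copied from tables 3.7-3.9.
CLMETEV_POST_1900 = 2_093_333
ARCHER_POST_1900 = 176_907 + 178_241   # 1900-1949 plus 1950-1999
BNC_WRITTEN = 87_903_571
BNC_SPOKEN = 10_409_858

# Approximate figures quoted in the running text (assumed round values).
FLOB_APPROX = 1_000_000
BNC_APPROX = 100_000_000


def millions(words: int) -> float:
    """Express a word count in millions, rounded to one decimal place."""
    return round(words / 1_000_000, 1)


pde_without_bnc = CLMETEV_POST_1900 + ARCHER_POST_1900 + FLOB_APPROX
pde_with_bnc = pde_without_bnc + BNC_APPROX
bnc_total = BNC_WRITTEN + BNC_SPOKEN
written_share = BNC_WRITTEN / bnc_total          # proportion of written data
written_to_spoken = BNC_WRITTEN / BNC_SPOKEN     # written words per spoken word

print(f"PDE database without BNC: c.{millions(pde_without_bnc)} million words")
print(f"PDE database with BNC:    c.{millions(pde_with_bnc)} million words")
print(f"BNC total (table 3.9):    {bnc_total:,} words")
print(f"Written share of BNC:     {written_share:.1%}")
print(f"Written to spoken ratio:  {written_to_spoken:.1f}:1")
```

On these figures the written material makes up just under nine tenths of the BNC, so the ratio quoted in the text is best read as a rounded value.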
Whilst the inclusion of the BNC proved useful to some extent for the investigation of the above-mentioned ‘response particle’ functions, the number of examples in some cases was too small to ascertain any developmental patterns. For this reason, and given that an American source was intuitively assumed for this function in certain cases (cf. e.g. discussion of totally in chapter 9), the decision was made to conduct an additional examination of relevant terms in the Corpus of Contemporary American English (COCA) compiled by Davies (2008-)27.
COCA is a web-based corpus of c.425 million words of present-day American English, spanning the time period 1990-2011 and featuring texts from five genres, viz. spoken, fiction, popular magazines, newspapers, and academic texts (for further details, see Davies 2008-).
26 In the absence of spoken data, the few instances retrieved were found in data from the written corpora that imitate speech, for instance, in dialogue in novels and plays. Unfortunately, due to lack of access to spoken data from previous stages of the language, these latter sources had to be relied on for any evidence of this use prior to PDE.
27 Since the present study focuses on British English, any consultation of COCA is purely exploratory, as a means of hypothesising how the emergence of this function in British English may be connected to earlier usage in American English. Such consultation is clearly indicated in each of the relevant forthcoming sections.
3. Methodology
Once the OED had been consulted to determine the initial attestation date of each of the seven selected maximizers, each item was searched individually in corpora dating from the time of its earliest recorded use until the present day, in order to retrieve data for the analysis of its development. For the majority of the corpora, the concordance programme MonoConc Pro, version 2.2 (Barlow 2004), was used for the data retrieval; the exceptions were BNCweb and COCA, which were each searched (the latter, when necessary) using their own web-based facilities.
In most cases, exhaustive searches were conducted, though particularly high frequencies sometimes made it necessary to take a random sample of the results. This was often so with the CLMETEV corpus for the LModE period and was required for all seven of the BNCweb searches. Where a sample was taken, this is clearly indicated during the presentation of results for that item.
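The sampling step can be pictured as follows. This is a minimal sketch of drawing a reproducible random sample from a list of concordance hits, not a description of the sampling facilities built into the tools used in the study; the function name, the sample size of 500 and the seed are all hypothetical.

```python
import random


def sample_hits(hits, sample_size, seed=0):
    """Return every hit if there are few enough; otherwise draw a random
    sample without replacement, keeping the hits in their original order."""
    if len(hits) <= sample_size:
        return list(hits)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    chosen = sorted(rng.sample(range(len(hits)), sample_size))
    return [hits[i] for i in chosen]


# Hypothetical high-frequency item: 5,000 concordance lines, of which 500 are kept.
hits = [f"concordance line {i}" for i in range(1, 5001)]
sample = sample_hits(hits, sample_size=500, seed=42)
print(len(sample))
```

Fixing the seed means that the same subset can be retrieved again later, which is helpful when the sampled data are re-examined during the manual sorts.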
Following extraction, any cases where a maximizer was used within the scope of a negative were eliminated, since this type of context falls outside the scope of the present study. The remaining data for each item were then subjected to a series of manual sorts in preparation for the analysis. In the first instance, and taking each period consecutively, the data were sorted according to the function of the item, so as to enable the charting of the functional development of each item across time (in conjunction with qualitative analysis of the data). The categories (depending on the item28) included manner adverb, maximizer, scalar degree modifier, emphasizer, focussing adverb, epistemic/modal adverb, and response particle (each of which is defined in section 3.1, below). There was also a category entitled ‘ambiguous’, in which examples were placed if they were potentially ambiguous between two or more of the named categories and thus could not be positively classified in any particular one.
In the second of the data sorts, all of the degree modifier cases were further grouped according to the type of constituent that each was used to modify. The purpose of this was to examine, again in conjunction with qualitative analysis, the historical development, particularly in terms of collocational behaviour, within the maximizer class. The categories distinguished were ‘adjective phrase’ (ADJP), ‘verb phrase’ (VP), ‘adverb phrase’ (ADVP), ‘prepositional phrase’ (PP) and ‘noun phrase’ (NP). In addition, there was a category designated ‘other’ to admit those items that did not fit any of the common named categories.
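The two data sorts described above were carried out manually, but the resulting tallies can be sketched in code. The example below assumes that each retained hit has already been labelled by the analyst; the `Hit` record, its field names and the sample data are hypothetical, and a constituent label is assumed to be present only for degree modifier cases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Category labels taken from the text; the data structures are illustrative only.
FUNCTIONS = {
    "manner adverb", "maximizer", "scalar degree modifier", "emphasizer",
    "focussing adverb", "epistemic/modal adverb", "response particle", "ambiguous",
}
CONSTITUENTS = {"ADJP", "VP", "ADVP", "PP", "NP", "other"}


@dataclass(frozen=True)
class Hit:
    period: str                        # e.g. "LModE" or "PDE"
    function: str                      # label assigned in the first sort
    constituent: Optional[str] = None  # second sort; degree modifier cases only
    negated: bool = False              # True if within the scope of a negative


def tally(hits):
    """Drop negated hits, then count by (period, function) and,
    for hits carrying a constituent label, by (period, constituent)."""
    kept = [h for h in hits if not h.negated]
    for h in kept:
        if h.function not in FUNCTIONS:
            raise ValueError(f"unknown function label: {h.function}")
        if h.constituent is not None and h.constituent not in CONSTITUENTS:
            raise ValueError(f"unknown constituent label: {h.constituent}")
    by_function = Counter((h.period, h.function) for h in kept)
    by_constituent = Counter(
        (h.period, h.constituent) for h in kept if h.constituent is not None
    )
    return by_function, by_constituent


# Invented labels for illustration; these are not data from the study.
hits = [
    Hit("LModE", "manner adverb"),
    Hit("LModE", "maximizer", "ADJP"),
    Hit("PDE", "maximizer", "ADJP"),
    Hit("PDE", "maximizer", "VP"),
    Hit("PDE", "maximizer", "ADJP", negated=True),  # excluded from both tallies
    Hit("PDE", "response particle"),
    Hit("PDE", "ambiguous"),
]
by_function, by_constituent = tally(hits)
for key, count in sorted(by_function.items()):
    print(key, count)
```

Counting by period and category in this way yields the kind of table from which functional and collocational change across time can be charted, alongside the qualitative analysis.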
28 Not all of these functions were available to every one of the items.
After the manual sorting of the extracted data, a combination of quantitative and qualitative analysis was conducted for each item along the above-noted lines in order to elucidate the development of the individual maximizers. To this end, detailed diachronic accounts of each of the seven selected maximizers are presented consecutively in chapters 4-10. Subsequently, these individual accounts were compared so as to ascertain similarities and differences and thus build up a picture of the entire maximizer sub-class (see chapter 11). Finally, the data and the findings of the analyses conducted for the above-mentioned purposes were also used to address the wider concerns presented in the previous chapter (see chapter 11).