In his (2009) study of Shakespeare's Romeo and Juliet, Culpeper makes clear that the actual quality of key results and their usefulness to the process of stylistic analysis depend on the researcher fully understanding the implications of various alterable parameters in the corpus linguistic software tools, and the influence of different kinds of reference corpora. He suggests some practical ways for improving the quality and reliability of key results, which also have bearing on the other methods applied in my study. I explain the consideration given to these matters below, and the settings I use (reference corpora are discussed in 3.6).
I begin with the type of statistical significance test used (in 3.4.1), followed by the minimum/maximum frequency settings (3.4.2) and the p value (3.4.3). These constrain the number of results generated (given that the possibilities for analysis are limited in any study). As Baker (2004:351-352) points out, however, there are no set rules about the cut-off points and parameters which should be used, since these vary according to the size and contents of corpora and the aims which individual scholars wish to fulfil using their data. From the point of view of linguistic importance, it is hard to argue that some results are significant and yet others are not, just because some occur above a chosen threshold, particularly if those just below it occur with a closely similar p value or log-likelihood value to those just above. For that reason, absolute keyness values are generally not used by corpus stylistic researchers as a definitive guide to the results which justify further analysis, but as a way of ordering the words and other language features which are most potentially important, on an empirical basis. The researcher can then use the relative statistical values (not the absolute values) as an indicator of which ones might reward closer investigation.
3.4.1 Tests for statistical significance
As indicated in 2.5.2, I use the log-likelihood statistical test (Dunning 1993) for calculating key and locked results in WordSmith and Wmatrix, since it is the only option in Wmatrix. The chi-square test (see e.g. Oakes 1998:24-29) is also an option in WordSmith, though Culpeper (2002, 2009) finds that comparisons with both tests produce very similar results. Rayson et al. (2004b:928) argue that the log-likelihood statistical test is more reliable than the chi-square test for expected frequencies below 5 (although such low frequencies are not an issue in my study). I noted some criticisms of statistical significance tests as a measure of keyness in 2.5.3.
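By way of illustration, the log-likelihood comparison of one item across two corpora can be sketched as below. This is a minimal sketch of a common two-term form of the statistic, not a description of how WordSmith or Wmatrix compute it internally; the function name, parameter names and the counts in the usage lines are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood for one item observed in two corpora.

    freq_a, freq_b: observed frequencies of the item in corpus A and B.
    size_a, size_b: total token counts of corpus A and B.
    (All names are illustrative assumptions, not taken from any tool.)
    """
    total = freq_a + freq_b
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Invented counts for two corpora of 800,000 words each.
same_rate = log_likelihood(200, 800_000, 200, 800_000)
different_rate = log_likelihood(300, 800_000, 200, 800_000)
```

An item with the same relative frequency in both corpora scores 0, and the score grows as the relative frequencies diverge, which is why low values point towards locking and high values towards keyness.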
3.4.2 Minimum and maximum frequency settings for key and locked results
As indicated in 3.2.1, non-localised results which are potential style markers best serve my aims of comparing Shakespeare's language style to that of other contemporaneous dramatists in the parallel corpus. Culpeper (2009:35-36) discusses the imposition of a minimum frequency cut-off to reduce the number of localised or topical results, i.e. those which are less likely to be stylistically important (see further 2.7). In my study, the minimum frequency needs to be sufficiently high to eliminate, or at least minimise, results which are localised to one or a few plays. These are likely to be topical in the corpus of Shakespeare’s plays, or associated with an individual authorial style in the parallel corpus.
Furthermore, as explained in 2.6, the locked results are by definition those of high frequency. In identifying lockwords in British English, Baker (2011:70) uses a minimum frequency of 1,000, taken from his four corpora overall, each of which is a million words in size. Baker points out, though, that his corpora are still relatively small samples, and therefore that smaller numbers of results from them might not reliably represent language trends in whole language varieties. My corpora are each about 800,000 words (see 4.4), but although they are smaller than Baker's corpora, they are larger samples of what is available of the text-type under investigation. The SDC contains Shakespeare's First Folio in its entirety, and an estimate of the NDC would be that it contains slightly less than one third of the English drama produced at around the same time (based on Craig and Kinney's (2009:xvii) corpus of EModE drama being 3.25 million words; see further 4.3.1).
My tests showed that a minimum frequency of 200 produced a manageable number of results for keywords and lockwords, and also for key and locked semantic domains. For the 3-word cluster results, however, I used a lower minimum frequency of 50, in order to generate sufficient results, since recurrent word combinations occur much less frequently than single words (as pointed out by Mahlberg 2007:12). In principle it is important to use the same minimum frequency for key and locked results of the same type, in order to be able to claim them as statistical opposites (part of Baker's 2011 definition of lockwords; see 2.6). For all the locked results, I raised the maximum frequency parameter in WordSmith to its highest, which is 16,000 in version 3.0 (Scott 1999) (as advised by Baker, personal communication, 27.10.11). This is because the default maximum of 500 risks excluding some of the high-frequency results which qualify as locked. There is no maximum frequency setting in Wmatrix.
It is worth noting that the minimum frequency settings in WordSmith and Wmatrix relate to observed (actual) frequencies. With small numbers of results, the distinction between observed and expected frequencies can affect reliability in corpus studies, as is made clear by Rayson et al. (2004b). The minimum frequencies used in my study are above the levels at which reliability is likely to be compromised, however, so I do not discuss observed and expected frequencies further.
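The difference between a cut-off on observed frequency and the expected frequency used by the significance test can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the inclusive treatment of the boundaries is an assumption rather than documented tool behaviour, and the counts are invented.

```python
def within_frequency_range(observed, minimum=200, maximum=None):
    """Return True if an observed frequency passes the cut-offs.

    Boundaries are treated as inclusive, which is an assumption made for
    this sketch. maximum=None models a tool with no upper limit.
    """
    if observed < minimum:
        return False
    if maximum is not None and observed > maximum:
        return False
    return True


def expected_frequency(freq_study, freq_reference, size_study, size_reference):
    """Frequency expected in the study corpus if the item were evenly
    distributed across the study and reference corpora."""
    combined = freq_study + freq_reference
    return size_study * combined / (size_study + size_reference)


# Invented example: an item observed 600 times in the study corpus.
passes_default_maximum = within_frequency_range(600, minimum=200, maximum=500)
passes_raised_maximum = within_frequency_range(600, minimum=200, maximum=16_000)
expected = expected_frequency(300, 200, 800_000, 800_000)
```

With a maximum of 500 the invented item is excluded, whereas with the maximum raised to 16,000 it is retained. The expected frequency is computed from both corpora together, so it can differ considerably from the observed count on which the cut-off operates.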
3.4.3 P values for keyness and locking
The log-likelihood cut-off points associated with different p values discussed in this section are taken from Rayson et al. (2004b:7)26. In 2.5.3 I noted that p values as a measure for keyness are widely used, though not without criticism, and I argued that substantial existing research indicates that this method nevertheless produces results which are potentially useful for stylistic analysis. I do not discuss the pros and cons of using p values further, other than to explain the choice of the setting I used.
Baker outlines the statistical comparison performed by WordSmith in generating a keywords list, and explains the role of the p value as follows:27
The p value (a number between 0 and 1) indicates the amount of confidence that we have that a word is key due to chance alone - the smaller the p value, the more likely that the word's strong presence in one of the sub-corpora isn't due to chance but a result of the author's (conscious or subconscious) choice to use that word repeatedly.
(2006:125)
The setting of the p value in keyness studies determines the threshold above which words or other linguistic units will be considered to be statistically significant in the two corpora (in other words, what will qualify as "key" and "locked"). The nearer to 0, the more evidence there is for a difference in frequency between the two corpora (i.e. the more key the result is). The opposite nature of locking to keyness, explained in 2.6, means that the nearer to 1.0 the p value (or the log-likelihood to 0), the less evidence there is of a difference in frequency, and the more locked a result will be.
26 See also http://ucrel.lancs.ac.uk/llwizard.html (last accessed 10.08.12).
27 Wmatrix works in a similar way, as detailed for example by B. Walker (2010:369-370).
For key results, other corpus stylistic researchers have used p values of between 0.05 and 0.000001. A p value of 0.05 equates to a log-likelihood cut-off value of 3.84, and a 5% probability that results are key due to chance alone (i.e. 95% confidence that they are not), which is a generally acceptable level in social science studies (Baker 2006:125). A p value of 0.000001 is the lowest possible setting in WordSmith, equating to a one in a million probability that results are key due to chance28. The decision largely depends on the size of the corpora and the type of language unit being analysed. For example, to obtain keywords for their respective studies of Shakespearean dialogue, Culpeper (2002) uses a p value of 0.05 with a corpus of c. 20,000 words from one play, whereas Scott and Tribble (2006) use a p value of 0.000001 with a corpus of 37 Shakespeare plays amounting to c. 800,000 words. Corpus stylisticians working with prose fiction have found other p values to be satisfactory: B. Walker (2010:370) opts for a p value of 0.001 to obtain keywords, key parts of speech and key semantic domains from his 73,000-word corpus of a novel, and Mahlberg (2007:12) uses a p value of 0.00001 to extract key 5-word clusters from her 4.5 million-word Dickens corpus.
28 See also Scott (e.g. 1999:Help menu).
There are no precedents for determining the principle of a cut-off point for locked results, i.e. how far below a p value of 1.0 can a result still be considered to be locked. WordSmith's output shows a keyness value for any item which has a log-likelihood value of 1.0 or above, but for items with a lower log-likelihood value only the p value is shown. For the purposes of this study, I therefore take a log-likelihood value of 1.0 as the point below which results become locked. This equates to a p value of about 0.3. There may well be cases for applying other principles, however (for example, that an equivalent set of distances from p=1.0 should be established for lockwords as for keywords, such as those in the studies mentioned above). However, the testing and evaluation of these ideas is well beyond the scope of this study, and is not necessary to achieve my research aims. The principle I follow enables me to generate ordered lists of the most locked items, from which I can investigate stylistic similarities between the two corpora. The statistical basis of "locking" would benefit from further exploration and discussion in future research, however.
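The correspondence between log-likelihood values and p values can be checked with a short calculation. The sketch below assumes that the log-likelihood statistic is referred to a chi-square distribution with one degree of freedom, in which case the p value is erfc(sqrt(LL / 2)). The function names and the labels "key", "locked" and "neither" are hypothetical conveniences for this sketch, and the default key cut-off of 3.84 is chosen only for illustration.

```python
import math


def p_value_from_log_likelihood(ll):
    """Upper-tail probability of a chi-square variable with 1 degree of
    freedom (the assumption made in this sketch) at the value ll."""
    return math.erfc(math.sqrt(ll / 2))


def classify(ll, key_cutoff=3.84, locked_cutoff=1.0):
    """Label a result using illustrative cut-offs: 'key' at or above
    key_cutoff, 'locked' below locked_cutoff, otherwise 'neither'."""
    if ll >= key_cutoff:
        return "key"
    if ll < locked_cutoff:
        return "locked"
    return "neither"


p_at_locking_threshold = p_value_from_log_likelihood(1.0)   # roughly 0.317
p_at_conventional_level = p_value_from_log_likelihood(3.84)  # roughly 0.05
```

On this assumption a log-likelihood of 1.0 corresponds to a p value of roughly 0.32, in line with the figure of about 0.3 given above, and a log-likelihood of 0 corresponds to a p value of exactly 1.0.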
I found that even the lowest possible p value of 0.000001 in WordSmith (which corresponds to a log-likelihood value of approximately 27 in Wmatrix) produced more than sufficient keyword and key semantic domain results to discuss in the space available. Since fewer key word cluster results were generated, as mentioned in 3.4.2, for these results I used a p value of 0.01 (log-likelihood=6.63) as the cut-off threshold. A p value of 1.0 (log-likelihood=0) generated more than adequate numbers of locked results for words, word clusters and semantic domains.
The cut-off thresholds for p value and minimum frequency effectively help to manage the distribution of results, an aspect of corpus output which I discuss in more detail in the next section.