As we have seen, grammaticality and acceptability judgement studies have been used to investigate phenomena in both native speakers’ and learners’ language use. However, a number of problems have been associated with such studies (e.g. Alanen 1997; Dąbrowska 2010; Ellis and Barkhuizen 2005, 19-21; Gibson and Fedorenko 2013; Myles 2013, 55-56; Whong and Wright 2013). These criticisms are summarised in Table 2.
First of all, the overall validity of Universal Grammar is questionable: according to Dinsmore’s (2006, 80) meta-analysis, Universal Grammar “does not fully operate in adult/adolescent L2 learning”. Furthermore, it now seems that there are very few universals across languages (e.g. Cook 2016, 25). Another criticism is that grammaticality judgements have often included measuring reaction times of participants’ identification of grammatical or ungrammatical sentences (e.g. Ellis and Barkhuizen 2005). Quite often, participants have had to make their judgement under time pressure and with isolated sentences, without any context. The practical value of such studies is questionable. Furthermore, learners participating in grammaticality judgement tests are “influenced by non-linguistic factors, such as learners’ varying attention or willingness” to make judgements (Norris and Ortega 2011, 578).
Table 2. Criticism of grammaticality and acceptability judgements
| Nature of criticism | Reason for criticism | Corrective measures |
|---|---|---|
| Isolated, decontextualised sentences | Specific contexts may require different solutions | Provide more context |
| Too few judges | Enables bias and unjustified generalisation | Use more judges |
| Experts used as judges | Experts have repeated exposure to linguistic forms that are rare in actual use | Use naïve judges |
| Yes-no scale | Acceptability is a gradient phenomenon | Use a more extensive scale |
| Artificially created sentences | Little value in studying contrived sentences | Use more natural sentences |
| Unreliable | Impossible to say what the judgement is based on; allows guessing | Exercise great caution in reporting the results |
Another problem with grammaticality and acceptability judgement studies is that grammar is not rigid. There are instances where “otherwise unacceptable utterances become acceptable in a given context” (Boas 2011, 1272). In addition to the conventionalised senses of particular items, it is sometimes possible to find unconventional usages in specific situations and contexts (Boas 2011, 1275-1284). Furthermore, if the judgement is simply between grammaticality and ungrammaticality, it remains difficult to prove what the learner’s criterion is for the judgement (Dinsmore 2006, 59).
A further problem arises from the number of judges used to testify to the appropriateness of particular expressions. Many studies assessing learners’ ability to judge the grammaticality or acceptability of sentence/meaning pairs have used very few judges or even a single judge, often only the writer of the article (Gibson and Fedorenko 2013, 89; see also Dąbrowska 2010; cf. Section 3.2.2). However, if this one judge is given unlimited power, it may lead to unjustified conclusions. When the use of one judge is combined with a small number of participants and a small number of stimuli, the researchers’ cognitive biases can become overly prominent and lead to overgeneralisations and faulty interpretations (Gibson and Fedorenko 2013, 89).
Gibson and Fedorenko (2013, 98) also argue that since expert informants (e.g. experienced linguists) typically have cognitive biases related to the judgement of the acceptability or grammaticality of certain items, it would be better to use naïve participants. In particular, expert informants are “biased due to their understanding of the theoretical hypotheses”, i.e. they may deduce which phenomenon is being investigated and respond according to what they think the researcher expects (Gibson and Fedorenko 2013, 99; see also Dąbrowska 2010). Furthermore, such informants may have greater exposure to particular types of structures and potentially prescriptive attitudes to the stimuli, while naïve informants are more likely to respond “based on their own intuitions and are not affected by cues from the experimenter” (Gibson and Fedorenko 2013, 101). According to Gibson and Fedorenko (2013, 116), there are several cases where “questionable intuitive judgement has led to an incorrect generalisation, which has then led to unsupported theorising that many researchers have followed”.
While the above studies focus on research in testing situations, research on authentic student production and on second language learning often also uses single informants, typically the students’ teacher or the researcher. For example, Tremblay (2011, 351; see Section 6.4.2) does this: she is the researcher, teacher and the only judge in determining the acceptability of learner French. Similarly, many studies that assess learners’ skills in a foreign language rely on error-tagging, with predefined decisions on what constitutes an error; an example of such practice is the study by Murakami and Alexopoulou (2016), cited in Section 4.4.1. In such cases, limitations in the teachers’ or researchers’ knowledge and their language-related biases may distort the results and assessment.
Repeated exposure may also affect people’s judgement so that if they see a structure, even an inappropriate one, reappear, they may start finding it more acceptable. For example, Dąbrowska (2010), who studied naïve and expert judges’ intuitions in grammaticality or acceptability judgements, found that linguists’ judgements were black-and-white, while the naïve informants displayed a continuum. Dąbrowska (2010, 15) argues that “linguists’ judgments are sensitive to grammatical structure and relatively insensitive to lexical content, while the opposite is the case for the nonlinguists”. She further argues that linguists may have a different view on certain structures compared to naïve informants simply because they have seen more such structures; some structures that are prevalent in literature on linguistics are fairly rare in actual use (Dąbrowska 2010, 15-21). Hence, experts “cannot simply rely on their own intuitions and assume that they are representative of the community at large” (Dąbrowska 2010, 21). To some extent, teachers may have a similar bias.
As was discussed in Section 3.3.1, great caution needs to be exercised when applying the results of grammaticality and acceptability judgement studies. A related problem is that it remains uncertain what the studies try to tap (e.g. Alanen 1997; Ellis and Barkhuizen 2005; Mackey and Gass 2005, 50). If a learner marks a sentence as unacceptable but continues to produce similar sentences, what does this tell us about the learner’s understanding of the second language? Furthermore, what is the learners’ criterion for the judgement: idealised use, actual use or something else? Thus, using acceptability judgements remains somewhat problematic because very little can be said about the learners’ skills based on how they rate isolated sentences, focusing on the appropriateness of sentences that may have been artificially created (for a discussion, see Mackey and Gass 2005).
As can be seen, grammaticality and acceptability judgements usually expect learners to work under time pressure and judge the appropriateness of isolated, artificially created sentences without any context. In contrast, my study (see Chapter 6) is very different in that it provides a coherent text with contextual cues and exerts no time pressure on the students; they simply fill in the gaps to the best of their ability. Furthermore, it is not the students who are to assess the grammaticality or acceptability but the teachers, and they work within a specific context, taking as much time as they need. Nevertheless, I acknowledge the influence of the criticism presented above in some features of my study. For example, I do not claim that my test evaluates the students’ overall proficiency in English, and I use 13 teacher informants to avoid bias from a limited number of raters. Furthermore, the teachers are provided with a gradient scale instead of a black-and-white assessment criterion. In addition, the test does not consist of isolated sentence pairs but provides a contextualised story.
3.4 Summary
Grammar is understood in many ways, from a narrow view focusing on syntax to a view focusing on the meaning-making potential of language. Studies on teachers’ beliefs and their attitudes to grammar seem to produce conflicting results. It may be that teachers whose own background contained detailed attention to grammar continue to provide such instruction to their students, while teachers whose attention was not drawn to grammar may feel uncomfortable with providing grammar instruction. Non-native teachers are perhaps more prepared to teach grammar as their background more likely included extensive instruction in grammar, while native teachers may not have experienced much grammar instruction at all. Furthermore, teachers in non-Western countries are perhaps expected to teach grammar explicitly, while many teachers in Europe may be less willing to do so. In addition, there can be some mismatch between teachers’ and learners’ understanding of the term ‘grammar’. While teachers may not enjoy teaching grammar or correcting errors, learners often tend to expect instruction in grammar and explicit error correction. Moreover, if learners’ attention is not drawn to any systematic errors they may make, some of these features may become fossilised. While fossilisation is not necessarily permanent, recent research suggests that learners would benefit from explicit attention to grammar.
In testing students’ skills in grammar, different raters do not always agree on the rating to be given to a particular response. Despite attempts at standardisation and extensive training, raters may continue to assess students’ production in different ways. Ultimately, even the best raters have occasional ‘blind spots’, when they either lose concentration or occasionally take a more severe or lenient approach compared to their general performance as a rater.
Grammaticality and acceptability judgements have been used to assess students’ proficiency in grammar, but the approach has some problems. The results cannot be easily generalised, and they may not truly reflect what the participants can do in a language. Moreover, if there is only one judge determining whether the sentences are grammatical or acceptable, there is a danger of bias, which may lead to ill-justified conclusions about students’ ability or acceptable language use.