This chapter has aimed to provide insight into research on the rating of written compositions and the factors affecting that process, which lead to variation in rating (e.g. differences in topic/prompt of composition, differences in score scale or criteria used, differences in rater training). We also reviewed relevant topics concerning the rating process, such as the kinds of behaviors observed, sequences of rating behaviors, and the TA research methods often used to gather data. Although most of the literature concerns rating with a testing or performance assessment purpose, we attempted to focus on literature with possible implications for our study of the rating of writing done with a practice purpose, by the writers' own teachers, without any imposed scale or criteria. In fact, in pursuing our literature review it became noticeable that only two studies (Huot, 1993; Wolfe et al., 1998) deal with rating where no scale at all is supplied to the raters, although a number supplied or imposed a scale without specifying the criteria to be used to determine how scores on the scale were to be awarded. These two studies are both now quite old, which encouraged us to explore this type of rating further. Though not the sort of rating that would occur in a testing/examination situation, it is nevertheless a kind of rating which no doubt occurs very widely around the world when ordinary practising teachers rate student compositions written for practice. The present study attempts to contribute to this line of inquiry by examining rater behavior in the context of pre-university EFL writing courses (see next chapter).
To conclude, we feel it is useful to present our research questions with some words about the rationale of each, based on the above review.
1. What do our English writing teachers perceive as good writing? This targets the criteria that teachers generally believe to be important for writing, and which will come into play on any rating occasion where they are supplied with no criteria. Indeed, these beliefs may also have an influence where criteria are supplied but are unclearly stated or clash with the rater's own beliefs. We discussed this briefly in 2.2.4.6, where we found little literature that attempted to access such beliefs as we aim to, as a feature of the rater's belief system accessible separately from what occurs in the process of rating.
2. What rating scale and criteria do the teachers claim to typically use when rating student writing, and what are their sources? We discussed scales and criteria in 2.2.3, but inevitably in contexts where they are supplied. We found no studies which really tackled the issue of where raters obtain criteria, other than through training or having them imposed.
3. What training have the teachers received that is relevant to assessing writing and what are their views on such training? Training emerged as a common theme in the literature, but its effect was debated (2.2.4.7). We wish to examine what the teacher raters themselves think about this.
4. What are the most important qualities that the teachers look at in practice when they are rating their students’ samples (criteria used and their weighting)? This emerged in places in 2.3, in studies where these were not fully specified by the rating task specification (especially ones with holistic rating). However, we feel it is of interest to examine this further for our unconstrained type of context, and indeed to compare it with what we find for RQ1. Furthermore, how elaborate a taxonomy will we find (cf. 2.3.1.5)?
5. How do the teachers support, explain, or justify their rating/scoring? This RQ covers how much and what sort of justification raters offer for the award of a mark or rating on one criterion, or when they combine individual judgments into a whole. Justification or an equivalent term occurred in almost all the models in 2.3 but does not appear to have been studied in depth. This question also entails consideration of what kinds of evidence they use (e.g. what they know about the writer, reading the text multiple times, comparing with other students...), which emerged in various places in this chapter (e.g. 2.3.1.5, 2.3.2.3). This is of interest because some sources of evidence, such as knowledge of the individual writer, are available to our participants as justification but are not available in the usual situation of exam rating.
6. What other kinds of comments do the teachers make, beyond those directly related to achieving the rating itself? As we saw in 2.3.1, various kinds of disparate 'other comments' have been identified in the literature, but it is unclear what a full list of such comments would contain or how far they can be seen as truly irrelevant to the evaluation itself.
7. What common sequence of activity can we find in the rating behavior of our teachers? For example, do they always read the whole text first, or consider criteria in a certain order? We wish to see which of the models in 2.3.2 best fits the kind of data that we obtain in our less studied context, or whether just a three-stage model is supported.
8. What distinct rating styles can we detect being used by our teachers? Rater variation per se, not necessarily related to rater experience etc., was evidenced in studies such as those in 2.3.1.3 and 2.3.2.1 with respect to choice of criteria and sequences followed. This is clearly an area that needs more illumination and is particularly relevant in a study such as ours where there is no imposed rating scheme.
9. Is there any difference in any of the above between individual teachers, according to their general training, experience of rating writing, prior knowledge of a specific rating system, etc.? In 2.3 we reported studies that found or failed to find differences in rating due to such rater variables, especially training. Hence, we aim to supplement such findings with some from our study in a different type of context and see how far findings in the literature are matched