Jonson and Plake (1998) studied the relationship between validity theory and actual validity practices. Their design was longitudinal: they examined how the validity standards in five versions of the AERA/APA/NCME Standards were applied in the evaluation of one test, the Metropolitan Achievement Test (MAT), in successive editions of the Mental Measurements Yearbooks (MMY) (12 editions altogether, starting with Buros 1938). Over 57 years, the MAT had been reviewed in the Yearbooks eight times. Jonson and Plake developed a matrix of the validity standards mentioned in the successive editions of the Standards and compared it with the standards that the reviewers had applied when reviewing the MAT.
Jonson and Plake (1998) operationalized the different versions of the Standards into two long lists of classes of requirement, one concerning content validity and the other concerning construct validity. Under construct validity, one of the categories that they identified and analysed was the test’s construct framework. The first edition of the Standards, from 1954, had required test developers to provide an outline of the construct theory. From the next edition (1966) onwards, the Standards called for a full statement of the theoretical interpretation and its distinction from other interpretations. An account of the network of interrelationships [between different constructs] was introduced in 1974 and appeared in all editions of the Standards from then onwards. In other words, the requirements in the Standards regarding the test’s construct network are quite substantial, and they have been part of this professional code of practice for a long time.
Nevertheless, when Jonson and Plake (1998) analysed the successive reviews of the Metropolitan Achievement Test, they found no mention of the construct framework categories, either as a validity criterion mentioned by the reviewers or as a category of evidence presented about the test. What is more, the discussion and conclusion of the article do not draw any attention to this result. My interpretation is that researchers and practitioners in educational measurement find the test construct a difficult concept to deal with, and that their answer often appears to be to leave it alone and deal with something else.
I find it both intriguing and disconcerting that the stated theoretical need for construct definitions is met by an apparent lack of demand for them in practice and, in turn, by an apparent lack of actual working definitions of test-related constructs. I think construct definitions should be written and used in test development and validation. In this chapter, I will discuss a range of theoretical and empirical approaches to defining constructs for language tests. I will analyse them as alternative, possibly complementary ways for language test developers to describe what their tests are testing.
The contrast between demand and non-supply of test-related construct definitions is closely related to a paradoxical contrast between psychometric, quantitative quality indicators for the measurement properties of tests and verbal, theory-based quality indicators for the conceptual coherence of the assessment system. The psychometric quality of a test is very important because it indicates the degree to which the observations have been, and can be expected to be, consistent. The consistency can of course be associated with performance variation, provided that the conditions of variation can be specified on theoretical grounds. The reliability of scores is important when decisions are based on differences between scores: accountability requires that test developers be able to say which differences are “real”. However, both of these considerations also entail the ability to say what the scores and the differences between them mean. This requires verbal definitions that are ultimately based on theory and on empirical evidence about how the definitions are related to the test, the testing process, and the scores. To clarify the range of variables that theoretical construct definition involves, I will discuss the components or variables that an interactionalist definition assumes to underlie performance consistency. I will also present a model of the features of an interactive testing process, which can help researchers analyse the dimensions that can vary when testing and assessment situations are interactive and potentially variable rather than predetermined.
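The question of which score differences are “real” can be illustrated with a standard result from classical test theory. The following is only a sketch under stated assumptions: two scores on the same scale with equal reliability and independent errors, a scale standard deviation of 10, and a reliability of 0.91. None of these values comes from the studies cited above; they are chosen purely for illustration.

```latex
% Classical test theory sketch; s_X = 10 and r_{XX'} = 0.91 are illustrative values only.
\begin{align*}
SEM &= s_X \sqrt{1 - r_{XX'}} = 10\sqrt{1 - 0.91} = 3 \\
SE_{\mathrm{diff}} &= SEM\sqrt{2} \approx 4.24 \\
|X_1 - X_2| &> 1.96 \times SE_{\mathrm{diff}} \approx 8.3
\end{align*}
```

Under these assumptions, two scores would need to differ by more than about eight points before the difference could be treated as real at the conventional 95% level. The calculation says nothing, however, about what such a difference means, which is why verbal construct definitions are still required.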
4.3 Factors underlying performance consistency: the interactionalist view
There is potentially a vast range of alternatives for explaining any performance consistencies observed in language tests. Chapelle (1998), following Messick (1981, 1989a), discusses three different theoretical perspectives. Trait theorists “attribute consistencies to characteristics of test takers, and therefore define constructs in terms of the knowledge and fundamental processes of the test taker” (Chapelle 1998:34). Behaviorists, in contrast, attribute consistencies to context. They define constructs “with reference to the environmental conditions under which performance is observed” (Chapelle 1998:34). Interactionalists see performance consistencies as “the result of traits, contextual features, and their interaction” (Chapelle 1998:34). Thus, in order to describe ability, interactionalists would consider it necessary to define the types of knowledge and fundamental processes that the individual has in relation to different contexts, and how these interact and vary in response to different contexts. Citing Hymes 1972, Canale and Swain 1980 and Bachman 1990, Chapelle (1998:43-44) argues that there is strong theoretical support for the interactionalist view in current language testing theory, concerned as it is with individual factors, contextual factors and their interaction. This became evident in Chapter 2 through the range of elements that theorists considered it necessary to define in test specifications. Chapelle (1998:47) warns, however, that the challenges of this perspective are considerable, because it combines two philosophies that locate the explanation of consistencies in different parts of an interactional world: one with the individual across situations, the other with situations or contextual characteristics across individuals. The combination requires the analysis of an individual’s abilities in interaction with different contexts. To explain or even detect performance consistency in such a complex network is a demanding task. Chapelle (1998:52) illustrates the interactions between its variables with a figure, which I reproduce as Figure 1.
Figure 1 illustrates the range of factors that are required in an interactionalist model of construct-test relationships. As in trait theories, the learner-related factors of language knowledge and fundamental processes are included. However, because the learner’s interaction in different contexts also has to be modelled, it is necessary to assume that the learner uses strategies to facilitate the interaction and that, in addition to language knowledge, she also needs world knowledge that varies by situation. Contextual factors are detailed in Chapelle’s model with the help of Halliday and Hasan’s (1989) theory of context. Their concept of field refers to the locations, topics and actions in the language use situation; tenor includes the participants, their relationship and their objectives; and mode defines the communication mode through the channel, texture and genre of language as it is contextualised in the situation. For the analysis of the settings where performance consistencies are sought in tests, Chapelle incorporates Bachman and Palmer’s (1996) task characteristics of rubric, input, expected response, and relationship between input and expected response. Chapelle (1998:57) points out that in an interactionalist definition, task characteristics cannot simply be dismissed as error or undesirable construct-irrelevant variance. Instead, researchers and testers must consider some of the contextual variables of test tasks as relevant to the interpretation of performance consistencies.
The approach or model to which a theorist or a test developer adheres is important because it “encompasses beliefs about what can and should be defined, how tests should be designed, and what the priorities for validation should be” (Chapelle 1998:50). The richness of Chapelle’s framework for the evaluation and analysis of influences on a testing event may be daunting, and not all of these influences can be operationalized intentionally in a single study or test instrument, but the advantage of the richness is that it enables test developers to focus on different facets of test-construct relationships. At the same time, the complexity is a warning against simple interpretations of data. In the rest of the thesis, I will use this model as an organising framework to identify areas in which test developers and researchers in language testing have worked.
Figure 1. Factors in an interactive model of language testing (Chapelle 1998:52)
[Figure 1 labels: Learner factors (World Knowledge; Strategic Competence; Language Knowledge & Fundamental Processes); Contextual factors (Field; Tenor; Mode); Test task characteristics (Environment; Rubric; Input (I); Expected Response (ER); Relationship of I and ER); outcome: Performance Consistency]