2) Passengers of foreign origin arriving in national territory with goods of foreign provenance in motor homes or vessels
1.19. Important aspects relating to the grounds for precautionary seizure
Where does this leave us? This chapter has explained how tests can be used to promote social justice and enhance meritocracy. This is part of what is termed consequential validity. But the very act of testing has unintended consequences, many of which are difficult to control and can be very harmful. On the other hand, for as far back as we can trace, tests have been used as tools to control teachers and educational systems, so that they deliver the kind of society and economy envisioned by the powerful. The critics of testing and assessment analyse the social evils associated with them, but offer no way out.
One person who does offer us another view is John Stuart Mill. In his discussion of testing (Mill, 1859: 118–119), he argues that tests are essential to achieving meritocracy in democratic societies. He believed that the primary purpose of testing was to ensure that learners acquired the basic skills needed to participate fully in their society.
Literacy and language skills were seen as extremely important within this context. We must remember that in Mill’s time suffrage was very limited. Mill saw the growth of national educational systems, with assessment at key ages, as preparing for universal suffrage through the production of critical, socially aware individuals, capable of making decisions about their own lives and how they wished to live (Mill, 1873: 257). Aware that tests could be used to control educational systems and indoctrinate individuals, Mill set out three principles that limit what can be done with tests. The first relates to who makes tests, and thus to who is able to make judgements about which knowledge is valued. The second relates to test content and what may not be tested; its aim is to prevent tests from being used for ideological ends. The third relates to test use and the kinds of decisions made about people on the basis of test scores.
The first principle is that tests and their content should be controlled not by the state or ministry of education, but by an independent authority. Mill does not discuss how this authority should be constituted, but he clearly envisages teachers contributing to such decisions as professionals. In this, Mill was far ahead of his time. The problem today, of course, is that governments and transnational institutions often seem unable to see that many of their testing policies are not aligned with Mill’s enlightened approach, but instead reflect a controlling agenda. They therefore ignore the warnings of democratic educationalists whose touchstone is ‘whatever the exact character of the built-in safeguards, the best Ministry of Education is that which interferes least in the operation of the system’ (Cecil, 1971: 4). For Mill, the role of a ministry was merely to provide the infrastructure for the system to operate.
The second principle is that no test should ask questions that require test takers to hold views, or to consent to beliefs, that commit them to a ‘disputed view’. A disputed view is any matter on which a test taker might reasonably hold a position that would lead them to answer a question incorrectly unless they shared the principles or world view of the test creator. For Mill this was not just bad test design, but ethically reprehensible. As he says: ‘The knowledge for passing an examination (beyond the merely instrumental parts of knowledge, such as languages and their use) should, even in the higher classes of examinations, be confined to facts and positive science exclusively’ (Mill, 1859: 119).
The third principle is that the state should not take upon itself the task of deciding which qualifications are recognised and which are not. This specifically limits the tendency to use tests and qualifications for gatekeeping, except where those concerned with specific professions, and the public, require the demonstration of specific skills or abilities for particular roles. It would certainly rule out the use of language tests as surreptitious restrictions on immigration, for example (McNamara, 2005). Mill puts it refreshingly: ‘public certificates of scientific or professional acquirements, should be given to all who present themselves for examination, and stand the test; but that such certificates should confer no advantage over competitors, other than the weight which may be attached to their testimony by public opinion’ (Mill, 1859: 119).
These three principles provide a basis for analysing the current use of externally mandated tests in general, and language tests in particular. Although it makes no direct reference to Mill, Shohamy’s (2001b) notion of democratic assessment reflects these principles, particularly in its inclusion of stakeholders in discussions of test use. Shohamy also emphasises the responsibility of test developers for the fair use of their tests, and the right of test takers to be treated as valued individuals. Each of Mill’s principles is concerned with justice and freedom for the individual in the testing process. Fulcher and Davidson (2007) address the same issue by arguing that test developers should explicitly state whom a test is designed for and what its intended impact on them is.
If, after democratic consultation, this intended impact is judged acceptable, the test is designed with that effect in mind:
The task for the ethical language tester is to look into the future, to picture the effect the test is intended to have, and to structure the test development to achieve that effect. This is what we refer to as effect-driven testing.
(Fulcher and Davidson, 2007: 144)
Language testers have attempted to deal with their ethical responsibilities through open debate and the creation of an agreed Code of Ethics and Guidelines of Practice. These are published by the International Language Testing Association (ILTA), and are freely available from its website (www.iltaonline.com). Establishing shared understandings of what is and is not ethical within the context of a professional organisation is part of the process of the professionalisation of any body of practitioners (Davies, 1997). There are also Codes of Practice that cover testing and assessment generally, the most important of which is the Standards for Educational and Psychological Testing (AERA, 1999). These Codes, Guidelines and Standards do not constrain the practice of language testing professionals in ways that are clear-cut and easily applied in every circumstance. However, they do articulate principles for ethical test practice.
10. Validity
The codes and guidelines all place the concept of validity at the centre of the testing enterprise. It is the concept of validity that guides our work in testing and assessment.
What is validity? Until 1989, the same definition had echoed down the decades. This version is taken from Ruch (1924: 13):
By validity is meant the degree to which a test or examination measures what it purports to measure. Validity might also be expressed more simply as the ‘worthwhileness’ of an examination. For an examination to possess validity it is necessary that the materials actually included be of prime importance, that the questions sample widely among the essentials over which complete mastery can reasonably be expected on the part of the pupils, and that proof can be brought forward that the test elements (questions) can be defended by arguments based on more than mere personal opinion.
With this traditional definition, the key validity question has always been: ‘does my test measure what I think it does?’ If the evidence suggests that it does, the test developer’s responsibility is at an end.
However, since Messick’s (1989) work, our understanding of validity has changed. It is now seen as a single concept, with a number of different facets, or aspects. The notion of consequential validity extends the possible responsibility of the test developer to all uses of the test. It raises the question of the extent to which the score is relevant and useful to any decisions that might be made on the basis of scores, and whether the use of the test to make those decisions has positive consequences for test takers.
The question of relevance and usefulness relates to whether it can be shown that the inferences we draw from a test score about the knowledge, skills and abilities of a test taker are justified. This is the substantive aspect of validity that replaces the traditional definition in the quotation above.
Next is the structural aspect, which is closely related to the substantive aspect. If we claim that a test provides information on a number of different skills or abilities, it should be structured and scored according to the skills and abilities of interest. Thirdly, the content of the test should be reasonably representative of the content of a course of study, or of a particular domain (such as ‘aviation English’ or ‘travel Spanish’) in which we are interested. We often wish the test score to be meaningful beyond the immediate questions or tasks on a particular test, because we cannot include every relevant content area, situation and task on any single test; it would simply be too long. So the fourth aspect is the generalisability of score meaning beyond the test itself, or whether scores predict ability in contexts beyond those modelled in the test. Finally, there is the external aspect: the relationship of scores on the test to other measures of the same, or different, skills and abilities. We would expect tests of the same skill to produce similar results, and tests of different skills to be less closely related. Convergence with other measures of the same skill gives us more confidence in the test outcomes.
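Evidence for the external aspect is often gathered by correlating scores from different measures. The sketch below is a minimal illustration under stated assumptions, not a procedure given in this chapter: the score lists (`new_reading`, `established_reading`, `speaking`) are invented for eight hypothetical test takers, and `pearson_r` is a hypothetical helper computing the Pearson product-moment correlation. Convergent evidence would appear as a high correlation between the two reading tests, while the speaking test, which targets a different skill, would be expected to correlate less strongly with the new reading test.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (n times the covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a score list has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same eight test takers (invented data, for illustration only).
new_reading = [12, 15, 18, 20, 22, 25, 27, 30]          # the test being validated
established_reading = [14, 16, 17, 21, 23, 24, 28, 29]  # an existing test of the same skill
speaking = [20, 12, 25, 15, 18, 22, 14, 19]             # a test of a different skill

if __name__ == "__main__":
    convergent = pearson_r(new_reading, established_reading)
    discriminant = pearson_r(new_reading, speaking)
    print(f"new reading vs established reading: r = {convergent:.2f}")
    print(f"new reading vs speaking:            r = {discriminant:.2f}")
```

With real data, far larger samples would be needed, and a single coefficient is only one strand of the external evidence for a test.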
Our interest in validity is all about trying to build tests for which there is a strong link between inferences and decisions, and ensuring that test use has a positive impact on people and institutions. Whether the test is for use in the classroom, or for large-scale administration, we need a convincing argument that it is useful for its purpose (Kane, 2006).
People engaged in language testing do believe that tests can be used to make fair decisions, and that classroom assessment can inform teaching and learning. Yet we could easily fall into a counsel of despair when we see how tests of all kinds have been used in society. The practice of testing is itself a social construct. It is a practice invented by humankind to make difficult decisions, and to shape educational practices and institutions. Testing has been used to achieve goals of control and manipulation, and has been used to provide opportunities to those who would otherwise have none. Like all social constructs, it can be used for good or ill.
The rest of this book is about how to design and build tests, and how language teachers and testers can develop practices that use testing and assessment to good effect. With a clear definition of test purpose, and a vision of the effect that we wish our test to have, we can let planned intention inform the decisions we make along the way.