1. INTRODUCTION

1.5. The Andalusian Palliative Care Plan

1.5.1. Integrated care processes as strategic processes of the Quality Plan of the

The history of current validity theory begins in the late 1800s with the birth of objective testing. With more than a hundred years of development, validity theory has evolved considerably, but at first the development was slow. The standard definition of validity in the first half of the twentieth century was that it was the extent to which a test “measures what it purports to measure” (Garrett 1937:324). This was chiefly judged in terms of how well a test predicted the criterion that it was used to predict, and it was operationalized as a correlation coefficient between the test score and the criterion value. According to Angoff (1988:19), validation work was “characteristically pragmatic and empirical, even atheoretical, and validity data were generally developed to justify a claim that a test was useful for some particular purpose.”

One reason why validity theory developed slowly at first was that validity was not considered a problematic theoretical concept. It was a technical quality related to test use. Since educational tests were mostly used to predict performance on some criterion, validation simply required that the test correlate with the criterion. Bingham (1937:214), for instance, defined validity as the correlation of scores on a test “with some other objective measure of that which the test is used to measure”. Guilford (1946:429) expressed this in even more radical terms: “In a very general sense, a test is valid for anything with which it correlates.” The meaning of the scores was not the main focus of interest in validation; the usefulness of a test for predicting a criterion was.
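As an illustration of validity operationalized as a test-criterion correlation, the following minimal sketch computes a Pearson correlation coefficient between test scores and criterion values. The function name `pearson_r` and the score lists are hypothetical and invented for the example; they do not come from any of the studies cited above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: six test takers' scores and a later criterion measure.
test_scores = [52, 61, 70, 48, 85, 77]
criterion_values = [2.1, 2.8, 3.0, 2.0, 3.9, 3.4]

# In the early view described above, this single number was the test's validity.
validity_coefficient = pearson_r(test_scores, criterion_values)
print(round(validity_coefficient, 2))
```

On this view a test has as many validity coefficients as there are criteria it is correlated with, which is the point of Guilford's remark.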

There were some tests, however, for which it was difficult to find an external criterion against which to compare them. These included achievement and proficiency tests, which measured the level of skill that an individual had acquired. In his review of the early history of validity, Angoff (1988:22) explains that measurement experts such as Rulon (1946) considered these tests valid by definition. They were their own criterion, and all the validity evidence that was needed for them was a review by subject matter experts to confirm that the content of the test was representative of the domain of skill being measured. In the first edition of his Essentials of Psychological Testing, Cronbach (1949) similarly discussed test content as a quality criterion in achievement testing. In validity proper, he distinguished two aspects, logical and empirical. Empirical validity involved correlation with a criterion. Logical validity was based on expert judgements of what the test measured, and what was sought was “psychological understanding of the processes that affect scores” (p. 48). This non-empirical and non-behavioral side of validation was a precursor of the theoretical development in the latter half of the century.

Instead of validity, it was reliability that measurement theorists focused on. Classical test theory defined reliability as accuracy and consistency of measurement, or as the relationship between people’s observed scores on a fallible test and their true scores on an ideal, error-free measure of what was being tested. Reliability, like validity, was a correlation coefficient. It also technically defined the upper limit of the validity coefficient (Angoff 1988:20, Henning 1987:90). This was because validity involved the relationship between the “true scores” and the criterion rather than the test scores, which always included measurement error. Furthermore, the validity coefficient could only reach the upper limit of the reliability coefficient if the desired “true score” and the criterion were identical. Since this could very rarely be the case, validity would tend to be lower than the reliability coefficient, all the more so because the indicators for the criterion were also likely to include measurement error.
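The way measurement error caps the validity coefficient can be sketched with the attenuation relationship commonly attributed to Spearman in classical test theory. This formula is an assumption brought in for illustration; the sources cited above are not quoted as stating it. Under it, the observed test-criterion correlation equals the correlation between the underlying true scores multiplied by the square root of the product of the two reliabilities, so the ceiling is expressed through the square root of the reliability. The function names below are hypothetical.

```python
from math import sqrt


def observed_validity(true_correlation, test_reliability, criterion_reliability=1.0):
    """Observed test-criterion correlation under the classical attenuation formula.

    Assumes r_xy = r_true * sqrt(r_xx * r_yy), where r_xx and r_yy are the
    reliabilities of the test and of the criterion measure.
    """
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_correlation * sqrt(test_reliability * criterion_reliability)


def validity_ceiling(test_reliability, criterion_reliability=1.0):
    """Largest observable validity: true score and criterion are identical (r_true = 1)."""
    return observed_validity(1.0, test_reliability, criterion_reliability)


# Hypothetical figures: a test with reliability 0.81 and an error-free criterion.
ceiling_perfect_criterion = validity_ceiling(0.81)

# The same test against a criterion that is itself measured with error (0.64).
ceiling_fallible_criterion = validity_ceiling(0.81, 0.64)

# A true score that only partly overlaps with the criterion lowers validity further.
typical_validity = observed_validity(0.5, 0.81, 0.64)
```

The three values fall in the order the paragraph describes: the ceiling drops when the criterion also contains measurement error, and it drops again when the true score and the criterion are not identical.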

The kinds of language tests that were developed when the psychometric notions of reliability and validity were first being formed were a new breed of “objective” tests. Spolsky (1995:33-41) says that the rise of the objectively scorable discrete-point test was a response to criticisms of the unfairness of the traditional essay examination. He cites Edgeworth’s (1888) criticism of the “unavoidable uncertainty” of these examinations as an important motivation for the development, and mentions objectively scorable spelling tests and Thorndike’s (1903) work on the development of improved essay marking scales as important early responses in the area of language testing. A unifying theme when new tests and marking systems were developed was the desire to be fair to test takers by improving the reliability of the tests.

Spolsky (1995:42-43) states that the work on the form and content of objective language tests was guided by four concerns: validity, reliability, comprehensiveness, and administrative feasibility. In practice, he reports, administrative feasibility sometimes won over all the other concerns. Speaking and writing were considered important aspects of knowing a foreign language, but they often came to be excluded from large-scale language test batteries because it was so difficult to develop objective scoring systems for them. Hence, most objective language tests tested vocabulary, grammar, reading, and listening through multiple choice and true-false items. However, considerable effort was also spent on scales for rating written composition (Spolsky 1995:44-46). Where speaking was tested, the testing boards attempted to ensure reliability through using a board of examiners and investigating inter-rater reliability.

In sum, the early focus of validation was on the prediction of specific criteria, which later validity theory termed criterion-related validity. The content of a test was considered to be relevant proof of its validity when no obvious criterion existed for the evaluation of the test. Reliability was a prime concern, and a necessary condition for validity. The concern for the prediction of the criterion lives on in the modern version of validity theory, but it is not as central as it was earlier. Content concerns are similarly included, but they are now considered relevant for all tests, including the ones that are used to predict future performance. Reliability continues to be important for test evaluation, but it is not as clearly separable from validity in modern psychometrics (see e.g. Moss 1994:7, Wiley 1991:76 for arguments on the desirability of this development). The early focus on the usefulness of tests for specific intended uses continues to the present day.

In document Validación de la clasificación de complejidad del proceso asistencial Integrado de cuidados paliativos andaluz (pages 54-56)