A test score is arrived at by giving the test takers a number of items or tasks to do. The responses to these items are scored, usually as correct (1) or incorrect (0). The number of correct responses for each individual is then added up to arrive at a total raw score.

Imagine that I have a test with 29 items (I really do – but I’m not going to show it to you yet!). Each item is scored as correct or incorrect, and I ask 25 language learners to take the test. Each person responds to each test item, and I then add up the number of correct answers for each test taker. The scores for my hypothetical language learners are presented as follows, from the lowest to the highest. Two learners only managed to answer one item correctly, and one learner answered 28 correctly. The rest are spread between these two extremes.

1 1 2 3 5 6 6 7 8 10 10 11 11 11 13 13 14 15 15 16 17 18 25 27 28

We can present these scores visually in histograms like the one in Figure 2.3. This tells us how many learners achieved a particular score. With this very small group of learners the most frequent score is 11. The most frequent score in a distribution is called the mode. We are also interested in the score that falls in the middle of the distribution, called the median score. We have 25 scores, an odd number, so the middle score is the thirteenth, with twelve scores on either side of it. In this case that is 11. If we had an even number of scores there would be no single middle score; we would take the two scores on either side of the middle, add them together and divide by 2. With 24 scores, for example, these would be the twelfth and thirteenth scores.

In our list the twelfth and thirteenth scores are both 11, so this calculation gives the same answer: 11 + 11 = 22; 22 / 2 = 11. Both the mode and the median tell us something about the mid-point of a distribution. We also know something very basic about our distribution, because we can see the range of scores from 1 to 28. This is a wide range, so we can infer that our group of learners are at very different levels of ability and that the test spreads them out reasonably well.
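The mode, median and range described above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and not part of the original text.

```python
import statistics

# The 25 raw scores listed in the text, sorted from lowest to highest.
scores = [1, 1, 2, 3, 5, 6, 6, 7, 8, 10, 10, 11, 11, 11, 13,
          13, 14, 15, 15, 16, 17, 18, 25, 27, 28]

mode = statistics.mode(scores)      # the most frequent score
median = statistics.median(scores)  # the middle score of the sorted list
lowest, highest = min(scores), max(scores)

print(f"mode = {mode}, median = {median}, range = {lowest} to {highest}")
```

With these 25 scores the mode and the median are both 11, and the range runs from 1 to 28.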

The most useful description of the mid-point for norm-referenced tests is the mean.

This is calculated by adding all the scores together and dividing the total by the number of test takers. When we add the scores together we get the sum, which is represented by the Greek capital sigma (Σ). If I add together the 25 numbers above, Σ = 293. This is the total number of correct responses given by all the students on the test. When we divide it by the number of students we have 293 / 25 = 11.72. This is fairly easy, even if the formula for the mean looks difficult at first sight:

$$\bar{X} = \frac{\sum X}{N}$$
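As a quick check of the arithmetic, the mean can be computed directly from the list of scores. The sketch below assumes nothing beyond the 25 scores given earlier; the names `total`, `n` and `mean` are illustrative.

```python
# The 25 raw scores from the text.
scores = [1, 1, 2, 3, 5, 6, 6, 7, 8, 10, 10, 11, 11, 11, 13,
          13, 14, 15, 15, 16, 17, 18, 25, 27, 28]

total = sum(scores)  # the sum of X
n = len(scores)      # N, the number of test takers
mean = total / n     # X bar

print(f"sum = {total}, N = {n}, mean = {mean:.2f}")
```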

This is read as follows: X bar (the mean) equals the sum of X divided by N. Each individual score is an X (X1, X2, X3, ... Xn) and N is the total number of test takers. The mean is the most important measure of the centre of the distribution of test scores because it allows us to calculate a much more useful description of the distribution of scores than the range, called the standard deviation. We will look at how this is calculated first. Then we will consider the formula, and finally consider the use of these descriptive statistics in language testing.

Fig. 2.3. A histogram of scores (horizontal axis: Scores, 0 to 30; vertical axis: Frequency, 0 to 8)

In our example, we know that the mean score on the test is 11.72. The mean has one very important property that the mode and median do not have. If we take away the mean from each of the individual scores we get a deviation score from the mean, and the mean of these scores is always zero. We illustrate this in Table 2.1. The first column contains the scores for each of 25 test takers (X1–25). The second column contains the mean, which is of course the same in all rows. In the third column we subtract the mean from the score, giving the deviation score. This number shows how far an individual score is away from the mean, and may be a negative or positive number. If we add up the deviation scores above zero and the deviation scores below zero, the two totals are the same size and cancel each other out, so the overall sum is always zero. The final column is the square of the deviation score. That is, we multiply the deviation score by itself. On many calculators this function is achieved by pressing a button marked x² after inputting the deviation score. So with a score of 10 and a deviation score of –1.7, –1.7 × –1.7 is approximately 2.9. We can then add up all of the squared deviation scores.
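The columns described above (score, mean, deviation score, squared deviation score) can be rebuilt from the raw scores. The following is a rough sketch rather than a reproduction of Table 2.1; the column layout and variable names are assumptions made for illustration.

```python
# The 25 raw scores from the text.
scores = [1, 1, 2, 3, 5, 6, 6, 7, 8, 10, 10, 11, 11, 11, 13,
          13, 14, 15, 15, 16, 17, 18, 25, 27, 28]

mean = sum(scores) / len(scores)

# Deviation score: how far each score lies from the mean (may be negative).
deviations = [x - mean for x in scores]
# Squared deviation score: the deviation multiplied by itself (never negative).
squared = [d ** 2 for d in deviations]

print(f"{'score':>5} {'mean':>6} {'dev':>7} {'dev^2':>8}")
for x, d, sq in zip(scores, deviations, squared):
    print(f"{x:>5} {mean:>6.2f} {d:>7.2f} {sq:>8.2f}")

# Positive and negative deviations cancel out (allowing for float rounding).
print("deviations sum to zero:", abs(sum(deviations)) < 1e-9)
```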

From this table we can work out the standard deviation with the help of the following formula.

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

This formula states that the standard deviation is the square root of the sum of the squared deviation scores, divided by N – 1. From our table we know that the sum of the squared deviation scores is 1345.08 and N – 1 is 24; 1345.08 / 24 is approximately 56, and the square root of 56 is approximately 7.5. Our standard deviation is therefore 7.5.
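The same calculation can be sketched in Python. Note one assumption: working from unrounded deviation scores, the sum of squares comes out at about 1345.04 rather than the 1345.08 quoted from the table, presumably because the table values were rounded before being added. The standard deviation still rounds to 7.5 either way.

```python
import math
import statistics

# The 25 raw scores from the text.
scores = [1, 1, 2, 3, 5, 6, 6, 7, 8, 10, 10, 11, 11, 11, 13,
          13, 14, 15, 15, 16, 17, 18, 25, 27, 28]

n = len(scores)
mean = sum(scores) / n

# Sum of the squared deviation scores.
sum_sq = sum((x - mean) ** 2 for x in scores)

# Standard deviation with the N - 1 divisor used in the formula above.
sd = math.sqrt(sum_sq / (n - 1))

print(f"sum of squares = {sum_sq:.2f}, SD = {sd:.2f}")

# statistics.stdev also divides by N - 1, so it should agree.
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(scores)))
```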

What can we do with this information? We can place our figures back on to a curve of normal distribution as follows (Figure 2.4).

The mean (zero standard deviations) is in the centre. We now know that each standard deviation = 7.5, and so for each standard deviation (marked on the diagram as –3sd to +3sd), we add or subtract 7.5. So, for example, the score we would expect at one standard deviation above the mean = 11.72 + 7.5 = 19.22 (for convenience we will round to one decimal place and call it 19.2). You will notice that the score at +3sd (34.22) is not possible, as our test only has 29 items. Similarly, it is not possible to get a score at –2sd (–3.28), as this would be a negative score. This is rare, and indicates quite a serious problem with this particular test. But we will return to this later. The most important observation at the moment is this: if a learner scores 19.2 (again impossible, but close to 19) we know that approximately 15.87 per cent of the test takers are expected to score higher, and 84.13 per cent of test takers are expected to score lower. We know this because of the probability of scores occurring under the normal curve. The meaning of the score is therefore its place on the scale measured in standard deviations.
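The raw score at each standard deviation mark, and the share of test takers expected below it, can be sketched as follows. The mean, standard deviation and number of items are taken from the text; the helper `proportion_below` is a hypothetical name, and it uses the standard normal curve via `math.erf` rather than a printed table.

```python
import math

mean, sd = 11.72, 7.5  # values from the text
max_score = 29         # number of items on the test

def proportion_below(z):
    """Proportion of a normal distribution expected below z standard deviations."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in range(-3, 4):
    raw = mean + k * sd
    possible = 0 <= raw <= max_score  # can this raw score actually occur?
    print(f"{k:+d}sd: raw = {raw:6.2f}, possible = {possible}, "
          f"expected below = {proportion_below(k):.2%}")
```

At +1sd roughly 84 per cent of test takers are expected to score lower and roughly 16 per cent higher, while the marks at –2sd, –3sd and +3sd fall outside the scores the test can actually produce.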

However, we can be even more accurate than this. We do this by transforming the raw score (the number of right answers) into a new kind of score called a z-score. A z-score is simply the raw score expressed in standard deviations. So, if your score on the test was 11.72, your z-score would be zero. And if your score was 19.2, your z-score would be 1. There is a very straightforward formula for converting any raw score to a z-score:

Fig. 2.4. The curve of normal distribution with raw scores for a particular test

$$Z = \frac{X - \bar{X}}{SD}$$

We read this as: Z equals X (the raw score) minus X bar (the mean score), divided by the standard deviation. If my score was 11.72 on the test (rounded here to 11.7), we can see what this means:

11.7 – 11.7 = 0; 0 / 7.5 = 0

Similarly, let’s try it for 19.2:

19.2 – 11.7 = 7.5; 7.5 / 7.5 = 1
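The two worked examples above can be wrapped in a small function. This is a minimal sketch; `z_score` is an illustrative name, and the rounded mean (11.7) and standard deviation (7.5) are the values used in the text.

```python
def z_score(raw, mean, sd):
    """Express a raw score as a number of standard deviations from the mean."""
    return (raw - mean) / sd

mean, sd = 11.7, 7.5  # rounded values used in the worked examples

for raw in (11.7, 19.2):
    print(f"raw score {raw}: z = {z_score(raw, mean, sd):.2f}")
```

The same function can be applied to any raw score on the test, not just these two.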

We can do this for all scores. The lowest score in the range for our test was 1. The z-score is:

1 – 11.7 = –10.7; –10.7 / 7.5 = –1.43 (standard deviations below the mean)

The highest score, on the other hand, was 28. The z-score is:

28 – 11.7 = 16.3; 16.3 / 7.5 = 2.17 (standard deviations above the mean)

To find where a test taker with any z-score stands in relation to all other test takers, all we need is a table that tells us the percentage of test takers we would expect to be higher, and the percentage we would expect to be lower. You will find a copy of the table of z-scores in Appendix 1. Refer to it now, as you read the following explanation.

In the first column of the table you will find the z-score with the first decimal place. Along the first row of the table you will find the second decimal place of the z-score. We will consider our highest score first, which has a z-score of 2.17. In the first column read down to 2.1, which is at the 22nd row. We then read across the top of the table to find the column headed 0.07. We then find the intersection of the row and column. The number in this cell is the proportion associated with a z-score of 2.17. The entry in this cell is .4850, or just 48.5 per cent (move the decimal point two places to the right). This number can be read as the percentage of test takers falling between the mean (a z-score of zero, or a raw score of 11.7) and the score of 28 (a z-score of 2.17). Also remember that 50 per cent of the scores are below the mean. Therefore, 50 per cent + 48.5 per cent = 98.5 per cent of test takers are expected to get a score of less than 28. Our test taker who scored 28 is therefore in the top 2 per cent of the population. We can see this visually on our curve of normal distribution (Figure 2.5).
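The table lookup described above can also be approximated in code. The sketch below does not read the table in Appendix 1; it assumes the standard normal curve and computes the area between the mean and a given z-score with `math.erf`. The function name `area_mean_to_z` is hypothetical.

```python
import math

def area_mean_to_z(z):
    """Proportion of a normal distribution lying between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

between = area_mean_to_z(2.17)  # corresponds to the table cell for 2.1 and 0.07
below = 0.5 + between           # add the 50 per cent who fall below the mean

print(f"between mean and z: {between:.4f}")
print(f"expected below a score of 28: {below:.1%}")
```

Rounded to four decimal places the computed area agrees with the table entry of .4850.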

What about our raw score of 1, giving a z-score of –1.43? If you read down the left-hand column to the row marked 1.4 and then find the cell along that row under the column 0.03, you will find the number .4236. Rounded to two decimal places, this means that 42 per cent of scores fall between the mean and this score. Of course, as the score is negative, this means that all these scores are better than this score. As 50 per cent of scores fall above the mean, 42 per cent + 50 per cent = 92 per cent of scores are expected to be higher than the score of this test taker.
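For a negative z-score the same reasoning can be sketched with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This assumes the standard normal curve rather than the printed table, and the variable names are illustrative.

```python
from statistics import NormalDist

z = -1.43  # z-score for a raw score of 1

below = NormalDist().cdf(z)  # proportion expected to score lower
between = 0.5 - below        # proportion between this score and the mean
above = 0.5 + between        # add the 50 per cent who fall above the mean

print(f"between score and mean: {between:.4f}")
print(f"expected higher: {above:.0%}, expected lower: {below:.0%}")
```

The area between the score and the mean matches the table entry of .4236, giving about 92 per cent of scores expected to be higher and about 8 per cent lower.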

‘But,’ I can hear you saying, ‘this means that 100 per cent – 92 per cent = 8 per cent of test takers are expected to get scores below –1.43, but in raw scores the only lower score is zero, and no one got all the questions wrong! And even if someone did, we wouldn’t expect 8 per cent of test takers to get all the questions wrong.’ And you would be right to point this out. We said above that there is (at least) one problem with the test from which these scores came, and now we know what it is. It is far too difficult for the population of test takers for whom it was designed. It just isn’t sensitive enough at the lower end of the scale, and so we are told to expect test takers below a level at which they can actually score. The authors of the test admitted that this was the case. It represents a serious flaw in the test design.