
This section and the next concern the stability and consistency of school value-added measures, respectively. The study of consistency and stability has formed an important and ongoing part of educational effectiveness research referred to as ‘foundational studies’ (Scheerens, 1993a, Teddlie and Reynolds, 2000). This strand of research has studied the properties of the school effect revealed through value-added analysis with a view to ‘resolving basic conceptual questions regarding the construct of school effectiveness’ (Scheerens, 1993a, p.17). For the construct to be meaningful, it must have certain properties, as Scheerens (1993a, p.21) explains:

“The concept "school effectiveness" has connotations of duration and scope. That is, in order to call a school effective, high achievement levels should persist over time (stability) and effectiveness judgements should not be based on the functioning of just a partial segment of the total organization (scope).”

As is discussed at length in Section 4.3, trying to understand the properties of a construct without knowing the validity of the measure is highly problematic. If a school value-added score is unstable (or inconsistent), does this reveal that the measure is heavily influenced by the numerous sources of CIV discussed above, or does it just indicate that the effectiveness of schools is highly changeable (or multifaceted)? This simple question underpins many of the issues of interpretation discussed in Section 4.3. Presumably, the truth is that both of these are to some degree correct. If this is the case, this raises the question of how one judges when variation reveals information about the school effect and when it reveals something about validity. The next section considers the issue of interpretation further, drawing on the empirical evidence presented in this section on stability and the next (Section 4.2.5) on consistency.

Existing evidence has consistently shown a considerable degree of instability in value-added measures, although this has been interpreted in remarkably different lights by different researchers. Depending on one's perspective and the particular dataset, value-added scores have a “fair degree of stability” over time (Teddlie and Reynolds, 2000, p.126); show ‘broad stability in some areas’ but ‘also a substantial degree of change over time in some schools’ (Thomas et al., 1997, p.193); show ‘considerable stability’ in adjacent years but are ‘much more variable for larger periods’ (Thomas et al., 2007, p.277); are of little value for school choice as correlations over more than a few years are low and uncertain (Leckie and Goldstein, 2009); ‘are not particularly reliable or stable over time’ (Marsh et al., 2011, p.286); or are ‘almost entirely useless for practical purposes because [value-added] is not a consistent characteristic of schools’ (Gorard et al., 2012, p.8). Some of these differences presumably relate to the specific dataset, the application and the model specification. There also seem to be large differences in the interpretation of correlation scores. Luyten and de Wolf (2011), for example, described correlations (between school mean raw scores) across consecutive years of 0.66 and 0.61 as demonstrating ‘considerable stability across years’. In contrast, Goldstein and Leckie (2008, p.68) state that “the correlation between school-effects for cohorts of children taking such exams 6 years apart is only about 0.6” (emphasis added), adding that, “In other words, exam performance now is a poor guide to performance in 6 years’ time.” While these are not exactly like-for-like comparisons in terms of what is being compared to what, we can nevertheless see clear differences of interpretation. Note also that one would expect the difference in interpretation to be in the opposite direction, if anything: Luyten compared raw scores, which are generally found to be relatively stable, across only two years (Luyten et al., 2005), whereas Goldstein compared school effects, which are generally less stable, certainly over a period of 6 years, where one might expect larger changes in school performance.

Another difference in interpretation, suggested by reading how researchers summarise correlation scores, is whether a Pearson r correlation is considered in terms of the coefficient itself or in terms of the percentage of variance common to both variables (i.e. R²). Gorard et al. (2012), for example, whose interpretation is that stability is low, explicitly give the latter (followed by the former in brackets); most others make no mention of this distinction. Obviously these are two different ways of presenting the same substantive result, but the choice may make some difference to the substantive interpretation, given that a Pearson r of 0.6 gives an r² of 0.36. At first glance, these give markedly different impressions of the level of similarity between the scores. More profound differences in interpretation are discussed further in Section 4.3. The remainder of this section reviews the empirical evidence on the stability of VA measures, starting with the English secondary CVA measure.

Two studies which have examined the level of stability in the English KS2-4 CVA measure (2005-2010) are Leckie and Goldstein (2009) and Gorard et al. (2012). Gorard et al. (2012) present the correlations of school CVA scores 1, 2, 3 and 4 years apart, finding correlations ranging from 0.58 to 0.79, 0.48 to 0.67, 0.56 and 0.46 respectively. These results show that, even 1 year apart, there is only a moderate correlation in school CVA scores. Gorard et al. (2012, p.7) reach the conclusion that the CVA scores appear to be ‘meaningless’. These correlations are in line with earlier results from Leckie and Goldstein (2009) who estimate correlations 1, 2, 3, 4 and 5 years apart as 0.80, 0.73, 0.57, 0.46 and 0.40, respectively. Note that these correlations (i.e. in Leckie and Goldstein, 2009) were from a model which included compositional variables (discussed further below).
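To make the r versus r² distinction concrete for these figures, the following minimal sketch converts the Leckie and Goldstein (2009) correlations quoted above into the proportion of variance shared between years. The variable and function names are illustrative only and are not taken from either study.

```python
# Correlations between school CVA scores 1 to 5 years apart, as quoted
# in the text from Leckie and Goldstein (2009).
correlations_by_gap = {1: 0.80, 2: 0.73, 3: 0.57, 4: 0.46, 5: 0.40}


def shared_variance(r: float) -> float:
    """Proportion of variance common to both variables for a Pearson r."""
    return r * r


# Shared variance for each gap, rounded to two decimal places for display.
shared_by_gap = {
    gap: round(shared_variance(r), 2) for gap, r in correlations_by_gap.items()
}

for gap, r in correlations_by_gap.items():
    print(f"{gap} year(s) apart: r = {r:.2f}, r^2 = {shared_by_gap[gap]:.2f}")
```

On this reading, a correlation of 0.80 between adjacent years corresponds to roughly two thirds of the variance being shared, while the 5-year correlation of 0.40 corresponds to about one sixth.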

There is also research pertaining to other measures, ages and school systems. This research has produced largely similar results to those regarding the English CVA measure. In general, school-level value-added scores at primary level (age 4-11) are found to be even more unstable than those at secondary level, as might be expected given smaller cohort sizes at lower ages. Dumay et al. (2013) looked at value-added performance of different primary grades across time for 1, 2, 3 and 4 years apart, finding correlations of 0.40-0.53, 0.40-0.43, 0.36-0.40 and 0.29 respectively. The only thing that was stable was that the vast majority of schools had ‘indeterminable’ effectiveness (Dumay et al., 2013, p.75). As they pointed out, this low level of stability ‘poses a significant challenge to the conventionally accepted view that we can make a generalized evaluation of how effective a school is, based on cross-sectional data from a single cohort’ (Dumay et al., 2013, pp.78-79). Similarly, research into school systems other than England's has found very low correlations in performance. Marks (2014, p.14) estimated year-on-year correlations in VA performance in grades 5 and 9 as ranging ‘from a very low 0.10 to 0.30 for Year 5 and from 0.16 to 0.50 for Year 9.’

Another study examining primary school stability, this time in Portugal, is Ferrão and Couto (2013). They analysed the sign of the value-added scores across the 3 study years, finding that 26% of schools had positive VA for all 3 years, 15% had negative scores for all 3 years, and 85% had the same sign for at least 2 adjacent years within the 3 years. Note that some level of consistency would be expected by chance alone (Gorard et al., 2012): if the sign in each year were effectively random, the corresponding figures would be 12.5%, 12.5% and 75%, respectively. Correlations are not presented, but judging from the scatter plots they appear to be quite small. Ferrão and Couto (2013, p.186) conclude that “the findings reveal a systematic pattern of educational units’ performance is more than just randomness.” At this (rather low) threshold for value, they conclude that Portugal should include a VA indicator in its system of evaluation.
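The chance figures of 12.5%, 12.5% and 75% can be checked by enumerating every possible pattern of signs over 3 years. The sketch below assumes that each year's sign is an independent, equally likely positive or negative outcome; this null model is an illustrative assumption, not a description of how either study computed its figures.

```python
from itertools import product

# All 8 equally likely sign patterns over 3 years, under the assumed null
# model that each year's sign is an independent coin flip.
patterns = list(product("+-", repeat=3))


def share(predicate) -> float:
    """Percentage of sign patterns that satisfy the given condition."""
    return 100 * sum(1 for p in patterns if predicate(p)) / len(patterns)


all_positive = share(lambda p: p == ("+", "+", "+"))
all_negative = share(lambda p: p == ("-", "-", "-"))
# Same sign in at least 2 adjacent years: years 1-2 match or years 2-3 match.
adjacent_match = share(lambda p: p[0] == p[1] or p[1] == p[2])

print(all_positive, all_negative, adjacent_match)
```

Only the two strictly alternating patterns fail the adjacency condition, which is why three quarters of schools would meet it by chance. Against that baseline, the observed 26%, 15% and 85% are above chance, but not dramatically so.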

All of this suggests that value-added scores exhibit some degree of stability but that this is less than might be desired. Primary-level stability correlations are generally estimated at less than 0.50 and often far lower than this. These are very low correlations in this context and mean that scores even 1 year apart typically show marked differences. Scores separated by several years bear hardly any relation to one another. At secondary level, the correlations are moderate and so the issue of whether these are meaningfully stable (i.e. reflective of a valid measure of school effectiveness) is more contestable. There are certainly grounds for serious concern with these levels of stability and a strong suggestion that the measures consist to an appreciable degree of measurement error and unobserved bias.

A problem in terms of generalising about stability is that differences appear to relate to the specific dataset and the model specification. There are important links which need to be made between stability, validity, data quality and model specification (Dumay et al., 2013). Regarding model specification, for example, Leckie and Goldstein (2009) show that not including compositional variables tends to inflate stability because bias carries through the scores over a number of years and so mean school intake achievement is a ‘major driver’ of between-school differences in later years. Consider this in light of the continuum (from raw scores to perfectly valid VA scores) considered above. Raw scores tend to be highly stable over time (Luyten et al., 2005, Dumay et al., 2013, Gray et al., 2001). This is because the intake characteristics of schools tend to be relatively stable and possibly also because schools aim for similar standards with successive cohorts, smoothing performance: the ‘stable target hypothesis’ (Dumay et al., 2013, p.78). As one corrects for non-school factor bias, and moves along the continuum, two things happen: a) the measure is likely to become more valid (assuming appropriate specification) and b) the measure is likely to become less stable. Value-added is a residual after accounting for the effect of non-school factors. As further non-school factors are removed, measurement error and other sources of CIV become larger relative to the residual value-added. Moreover, as noted in the previous section, it is likely that more complex contextual models with too many and poorly theoretically grounded control variables may be over-correcting differences between schools, removing some genuine school effect (Willms, 2003, OECD, 2008, p.126). As Allen and Burgess (2011, p.253) note, “CVA is unstable because it results from fitting a complex model with many imprecisely measured parameters.” Similarly, Gorard et al. (2012) note variation in the model coefficients over time as well as stressing a number of problems with the quality of the underlying data, especially in relation to the measured contextual variables. This particular problem serves as an excellent example of the linkages between the issues discussed so far: the choices of model specification and the quality of the data have serious implications for both stability and validity.
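The trade-off between validity and stability along the continuum can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a reconstruction of any model in the studies cited: each school's score is taken to be a perfectly stable intake component, plus a fixed genuine school effect, plus independent year-specific noise, and "adjustment" simply removes a chosen fraction of the intake component. All standard deviations, the number of schools and the seed are hypothetical values chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(1)       # fixed seed so the sketch is reproducible
n_schools = 2000     # hypothetical number of schools

# Hypothetical components: stable intake, stable school effect, yearly noise.
intake = [random.gauss(0, 1.0) for _ in range(n_schools)]
effect = [random.gauss(0, 0.3) for _ in range(n_schools)]
noise_y1 = [random.gauss(0, 0.45) for _ in range(n_schools)]
noise_y2 = [random.gauss(0, 0.45) for _ in range(n_schools)]

# adjustment = 0.0 is a raw score; 1.0 removes the intake component entirely.
stability = {}
for adjustment in (0.0, 0.5, 1.0):
    year1 = [(1 - adjustment) * i + e + u
             for i, e, u in zip(intake, effect, noise_y1)]
    year2 = [(1 - adjustment) * i + e + u
             for i, e, u in zip(intake, effect, noise_y2)]
    stability[adjustment] = pearson(year1, year2)

for adjustment, r in stability.items():
    print(f"intake removed: {adjustment:.0%}  year-to-year r = {r:.2f}")
```

Under these assumptions the year-to-year correlation falls as more of the intake component is removed, even though the genuine school effect is held constant, because the noise makes up a growing share of what remains. The sketch leaves out over-correction and changes in true effectiveness over time, both of which the text also identifies as sources of instability.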
