5. THEORETICAL AND CONCEPTUAL FRAMEWORK
5.1. THEORETICAL FOUNDATIONS
5.1.2. Human Resources Provision Subsystem
The core research question is as follows:
Are school value-added measures valid measures of school effectiveness?
Here, ‘school effectiveness’ is operationally defined, following the value-added method, as the relative effect of schools on measured outcomes, and it is examined in terms of the estimated size of the residual school-level differences. Note that the term school effect is frequently used elsewhere (e.g. Luyten, 2003) to describe the overall variance attributable to schools relative to all other, non-school factors (see also Willms, 2003). In this alternative usage, subordinate levels such as the classroom may or may not be included, depending on the purpose. Note also that school value-added measures are viewed here as estimates of the causal effects of schools rather than merely as school-level unexplained differences in performance. It is the interpretation of school value-added scores as causal estimates of school effects that raises the question of validity. In sum, a school value-added measure is valid to the degree that it captures the relative causal effect of the school and can therefore be used to draw conclusions about a school’s performance relative to other schools.
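The operational definition above (effectiveness as the residual school-level difference) can be illustrated with a minimal sketch. It assumes a single prior-attainment covariate, ordinary least squares pooled across all pupils, and school value-added taken as the mean pupil residual within each school. The function name `school_value_added` and the simulated data are hypothetical; operational models such as the English contextual value-added measure include further covariates and may shrink school estimates, neither of which this sketch does.

```python
import numpy as np


def school_value_added(prior, outcome, school):
    """Hypothetical simplified value-added: mean OLS residual per school.

    Regresses pupil outcomes on prior attainment (plus an intercept) across
    all pupils, then averages the residuals within each school. No contextual
    covariates and no shrinkage are applied.
    """
    prior = np.asarray(prior, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    school = np.asarray(school)
    # Design matrix: intercept column and prior attainment
    X = np.column_stack([np.ones_like(prior), prior])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta
    # Residual school-level difference = mean residual of the school's pupils
    return {s.item(): residuals[school == s].mean() for s in np.unique(school)}


if __name__ == "__main__":
    # Simulated (hypothetical) data: three schools with known true effects
    rng = np.random.default_rng(0)
    effects = np.array([-2.0, 0.0, 2.0])
    school = np.repeat([0, 1, 2], 500)
    prior = rng.normal(100.0, 15.0, size=school.size)
    outcome = 10.0 + 0.5 * prior + effects[school] + rng.normal(0.0, 5.0, size=school.size)
    for s, va in school_value_added(prior, outcome, school).items():
        print(f"school {s}: estimated value-added {va:+.2f}")
```

Because the residuals average to zero across all pupils, the estimates are relative by construction: they say how a school compares with the others in the data, which matches the relative definition used here. Whether such a residual can be read causally is exactly the validity question at issue.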
5.2.2 Primary Research Questions
The core research question is broken down into the following primary research questions:
Table 5.2.2a - Primary Research Questions
Study 1 – Biases and Error
RQ 1.1 Are there observable biases in the current English value-added measure?
RQ 1.2 What is the level of missing data in the National Pupil Database?
RQ 1.3 What is the influence of measurement error on value-added scores?
Study 2 – Inter-Method Reliability
RQ 2.1 How similar are estimates of effectiveness produced by value-added (VA), cross-sectional regression discontinuity (RD) and longitudinal regression discontinuity (LRD) designs?
Study 3 – Stability over Time
RQ 3.1 How stable is the current English value-added measure across several years?
RQ 3.2 Is the rate of stability in value-added scores associated with school performance?
RQ 3.3 How stable is the contextual value-added performance of a given cohort over time?
Study 4 – Consistency across and within Cohorts
RQ 4.1 How consistent are value-added estimates of performance across cohorts from within a single school in a single year?
RQ 4.2 How consistent is performance within cohorts?
RQ 4.3 Does within-cohort consistency vary by mean school performance?
These are the primary questions for each study. Within each study there are also supplementary questions, which provide key supporting information, and more fine-grained questions, which address specific issues or aspects of the data used.
5.2.3 Analytical Approach
At the start of this chapter (Section 5.1.1), it was asserted that the best practicable way to examine the validity of value-added is to draw on several types of indirect evidence. This section provides further justification and explanation for this assertion and thereby links the core research question to the primary research questions for which empirical evidence is provided. Two issues are discussed: first, how various sources of evidence are combined to reach conclusions about validity; and second, how inconclusive, indirect or ambiguous evidence is interpreted. These issues underpin the approach taken in the four empirical studies, as well as the way the findings are handled in the discussion to reach overall conclusions.
Combining Validity Evidence
Each study and analysis is intended to present a self-contained, self-explanatory argument about a particular property of value-added evidence in the given context. To a large extent, therefore, the four studies presented here, and the analyses and sub-analyses within them, could be regarded as a number of discrete issues that happen to be grouped under a common topic (the properties of value-added). Nevertheless, the results are understood to form a larger picture of value-added measures and their methodology, rather than merely a series of independent findings, so it is useful to comment briefly on how these analyses and findings can be brought together to reach more general conclusions.
The basic approach to all research questions has been to seek a concrete and objective answer within a single analysis, using the highest-quality data and the most robust research design available. Often, however, this is not sufficient to provide a definitive answer to the specific research question or to the core question of validity. For this reason, multiple sources of evidence are brought together with a view to building coherence between a body of empirical findings and a theoretical understanding (Kvanvig, 2008). By looking across studies rather than within them, it is possible, for example, to examine whether instability over time is likely to have been caused by inconsistency in the performance of cohorts, changes in school performance, measurement error or changes in the value-added model. While the evidence from any single analysis may fall short of being conclusive, results from other studies will often narrow down the possible explanations for the observed properties of value-added (and whether these are likely to reflect error or effect). To this end, the research questions have been designed to complement each other, addressing questions raised by other analyses. Most analyses were conducted around the same time and were only separated into the four main studies afterwards, for clarity of presentation. Note that many of the research questions relate to properties of school value-added rather than bearing directly on validity (see Chapter 4, Section 4.2.1 for a discussion of direct and indirect validity evidence).
Bringing together the findings of all four studies and discussing the evidence in terms of validity (rather than properties) is the task of the discussion and conclusion chapters, where the results are considered collectively in order to reach conclusions about the validity of value-added that can be reconciled, to the greatest possible extent, with the findings of all four studies. These conclusions can be described as underpinned by a coherentist approach to justification (Kvanvig, 2008; Little, 2013), in which explanations for the observed variance are weighed against the collective results to develop the evaluation of validity that is most consistent with all of the available evidence.
Dealing with Inconclusive, Indirect or Ambiguous Evidence
While the approach to justification described above will generate some general conclusions about the validity of value-added, highly specific conclusions are unlikely to be possible with the evidence available. As well as the interpretative difficulties discussed in Chapter 4, there are difficulties in generalising from the specific data and value-added models used here to value-added evidence more generally (see Chapter 2). Rather than stopping at general conclusions based on findings that are inescapably ambiguous to some degree, the discussion chapter shifts the consideration of the results to the various areas of use across research, policy and practice reviewed in Chapter 3.
As an example, consider the issue of stability. A specific conclusion on reliability cannot be reached because there are (at least) two unknowns: the reliability of value-added and the changeability of school performance. While it might be possible, drawing on various other results, to get an indication of the extent to which each of these factors drives the observed level of stability, a precise answer is not possible. Rather than leaving this problem open, it is carried into the practical context of decision-making where, even if conclusions about the validity of value-added evidence cannot be reached, conclusions about the validity of arguments for the use and interpretation of this evidence can (Kane, 2013). This approach bears some similarity to that taken by Leckie and Goldstein (2009) and Allen and Burgess (2013).
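The two-unknowns problem can be made concrete under classical test theory assumptions, introduced here for illustration only: an observed value-added score equals the true school effect plus error that is independent across years and of the true effects, and the reliability λ (true variance divided by observed variance) is the same in both years. The covariance between years then comes only from the true effects, while each year's observed variance is inflated by 1/λ, so the expected observed correlation is the true correlation multiplied by λ. A single observed correlation is therefore compatible with many pairings of measure reliability and true stability. The function names and parameter values below are hypothetical.

```python
import numpy as np


def observed_stability(true_corr, reliability):
    """Expected year-to-year correlation of observed value-added scores.

    Assumes (hypothetically) classical test theory: observed = true + error,
    errors independent across years and of the true effects, and the same
    reliability in both years.
    """
    return true_corr * reliability


def simulate_stability(true_corr, reliability, n_schools=200_000, seed=0):
    """Monte Carlo check of observed_stability under the same assumptions."""
    rng = np.random.default_rng(seed)
    # True school effects in two years: unit variance, correlation true_corr
    cov = [[1.0, true_corr], [true_corr, 1.0]]
    true = rng.multivariate_normal([0.0, 0.0], cov, size=n_schools)
    # Error variance chosen so that var(true) / var(observed) = reliability
    error_sd = np.sqrt((1.0 - reliability) / reliability)
    observed = true + rng.normal(0.0, error_sd, size=true.shape)
    return np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]
```

For instance, a true correlation of 0.9 with reliability 0.5 and a true correlation of 0.5 with reliability 0.9 both imply an expected observed correlation of 0.45: a stable school effect measured unreliably and a changeable school effect measured reliably cannot be told apart from the stability figure alone.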
The studies by Leckie and Goldstein (2009) and Allen and Burgess (2013) both examine performance data in relation to a specific use: parental choice of school. One can construct the following pragmatic argument: for a measure to be of value for the purposes of school choice, it must be able to provide a meaningful estimate of future school performance. In this practical context, if the measure is not sufficiently stable to be meaningful for school choice, it is arguably of lesser importance whether this reflects a lack of validity in the measure or instability in the underlying phenomenon. It can simply be concluded that the uses and interpretations demanded by the particular context are, or are not, valid given the properties of the measure. In the study by Leckie and Goldstein (2009), for example, the measure is found to be insufficiently stable over time, and to carry too much statistical uncertainty, to be meaningful for the purposes of parental choice. Allen and Burgess (2013) estimate the extent to which several performance measures improve school choice compared with a random choice and conclude that the measures (including the value-added measure) do improve on random choice. Both studies reach a concrete conclusion about a practical application of the measure without needing to address more fundamental questions about the underlying source of the observed variance.
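The pragmatic argument about school choice can be explored with a toy simulation. This is not the method of Leckie and Goldstein (2009) or Allen and Burgess (2013); it is a hypothetical sketch assuming unit-variance true school effects that are independent across schools, an observed score equal to this year's true effect plus independent error with reliability λ, and a next-year true effect that correlates with this year's by a chosen amount. Each simulated parent picks among a fixed number of local schools either at random or by taking the highest observed score, and the quantity compared is the chosen school's true effect next year. Under these assumptions the expected gain over random choice equals the true correlation times √λ times the expected maximum of the options' scores in standard units, so it shrinks towards zero as either reliability or true stability falls. The function `choice_gain` and all parameter values are hypothetical.

```python
import numpy as np


def choice_gain(true_corr, reliability, n_options=5, n_parents=100_000, seed=0):
    """Toy model: next-year benefit of choosing by the measure versus at random.

    Hypothetical assumptions: school effects have unit variance and are
    independent across schools; next year's true effect correlates true_corr
    with this year's; the observed score is this year's true effect plus
    independent error giving the stated reliability.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, true_corr], [true_corr, 1.0]]
    # Shape (n_parents, n_options, 2): this year's and next year's true effects
    effects = rng.multivariate_normal([0.0, 0.0], cov, size=(n_parents, n_options))
    error_sd = np.sqrt((1.0 - reliability) / reliability)
    observed_now = effects[..., 0] + rng.normal(0.0, error_sd, size=(n_parents, n_options))
    future = effects[..., 1]
    # Measure-based choice: the option with the highest observed score now
    best = np.argmax(observed_now, axis=1)
    chosen_future = future[np.arange(n_parents), best]
    # Random choice: its expected future effect is the mean over the options
    random_future = future.mean(axis=1)
    return chosen_future.mean() - random_future.mean()
```

The simulation only quantifies the gain; whether a given gain is large enough to justify using the measure for school choice remains a judgement about the particular use, which is the point of the pragmatic position.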
The value of this pragmatic position is that it allows more fine-grained conclusions about validity despite ambiguous and inconclusive results. The approach can be applied to any use of value-added: the general idea is to look at the practical application of value-added evidence and consider what is required for the measure to be used beneficially in those circumstances.