DISCUSSION OF RESULTS

6.1 Comparison of the results with similar studies, with respect to the general objective

Although similar EGRA instruments may be used for impact assessment and for other aims (such as national testing or classroom diagnostics), impact assessment demands distinct methodological considerations and particular caution in the interpretation of results. For example, during the development and piloting of instruments, parallel (and statistically equated) versions must be developed to ensure that baseline and endline tests are equivalent. Also, impact evaluations require careful attention to sampling methodologies to ensure that comparisons between control and treatment groups and between pre- and post-intervention assessments are reliable. Other EGRA uses—for example, taking a national snapshot that compares socioeconomic subgroups and regions—also apply sophisticated sampling frameworks. To be confident that changes in reading skills can be attributed to an intervention, however, evaluators using EGRA must also ensure that the sampling for the endline data collection precisely matches the sampling used for the baseline.

Proper randomization can be a challenge for all impact evaluations, but can be particularly complex in a developing-country context. Randomization into equivalent comparison groups depends on maximizing the completeness of information. The implementers of an impact evaluation need a list of all the schools (or children, or teachers who make up the population of interest) to assign them randomly into treatment and control groups. With near-complete information, in fact, random selection will capture most of the representative characteristics of the target population and the sample can be stratified and/or clustered to ensure that particularly important features are equally represented across treatment and control groups (and to reduce data collection costs).
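The stratified assignment described above can be sketched as follows. This is a minimal illustration, not the procedure used in the evaluations discussed here: the function name `assign_schools`, the `region` stratum, and the school list are all hypothetical, and the sketch assumes a complete list of eligible schools is already in hand.

```python
import random
from collections import defaultdict


def assign_schools(schools, stratum_key, seed=0):
    """Randomly split schools into treatment and control within each stratum.

    `schools` is a list of dicts with an "id" field and a field named by
    `stratum_key`. Both field names are assumptions made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Group school ids by stratum (for example, region or language).
    by_stratum = defaultdict(list)
    for school in schools:
        by_stratum[school[stratum_key]].append(school["id"])

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        half = len(ids) // 2
        # First half of the shuffled list goes to treatment, the rest to
        # control; with an odd count the extra school goes to control.
        for i, school_id in enumerate(ids):
            assignment[school_id] = "treatment" if i < half else "control"
    return assignment


# Hypothetical sampling frame: 12 schools in two regions.
schools = [
    {"id": f"S{n:02d}", "region": "north" if n < 6 else "south"}
    for n in range(12)
]
assignment = assign_schools(schools, "region")
```

Because the shuffle happens inside each stratum, every region contributes the same share of schools to both groups, which is the balancing property the paragraph above refers to.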

However, reliable information is not always readily available. This was the case in Mali, where the research design called for identifying a pool of schools that fit certain eligibility criteria (such as language of instruction and accessibility) and then randomly assigning those schools into treatment and control groups. The ministry did not have a data set that would enable the researchers to be sufficiently confident in the representativeness of such a pool; in other words, we did not have information that was adequate and valid for randomized selection. In this situation, we needed to take compensatory (and very expensive) steps to overcome the gaps in data, such as undertaking fieldwork to collect such information and identifying alternative data sources.

When EGRA is used as a snapshot or in national approaches (see Chapters 1 and 2), results may point to elements of fundamental reading skills that are not addressed in the current education system; alternatively, they may point simply and dramatically to extremely low levels of reading overall. Responding to EGRA findings, policy makers and educators can revise policy, curricula, and/or lesson plans to strengthen identified weaknesses. They can sometimes get to the core of the problem by claiming curricular or scheduling space for teaching reading and providing appropriate reading materials.

In contrast, when EGRA is used for impact evaluation, the focus is typically on determining whether a specific program or intervention improves performance in a particular context. Although the links between specific techniques and outcomes may seem apparent and likely, they cannot be established without a more complex research design than those described for South Africa and Mali. For example, a notable improvement in comprehension skills in South African treatment schools cannot be assumed to be a result of the progressively leveled stories developed for the SMRS intervention because we made no effort in the evaluation design to measure this specific effect.

Overall impacts should still alert policy makers and educators to areas for improvement, but to isolate the impacts of components of an intervention, evaluators must specifically structure the research design for this purpose. They must then randomize each component of the approach—an expensive proposition. Unless an impact evaluation’s research design is structured to isolate and test different parts of an intervention, any changes in outcome between the baseline and endline assessments are attributable to the intervention as a whole (assuming, of course, that control and treatment groups were also subject to the same exogenous changes). For an example of a research design that isolated particular techniques, see the “Overall Impact” section below.
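One common way to express the whole-intervention comparison described above is a difference-in-differences calculation: the baseline-to-endline gain in the treatment group minus the gain in the control group. The sketch below is illustrative only. The source does not state that this estimator was used, the function name `diff_in_diff` is hypothetical, and the scores are made up.

```python
from statistics import mean


def diff_in_diff(treat_base, treat_end, ctrl_base, ctrl_end):
    """Return the treatment-group gain minus the control-group gain.

    Each argument is a list of scores for one group at one point in time.
    The control gain stands in for exogenous change affecting both groups.
    """
    treat_gain = mean(treat_end) - mean(treat_base)
    ctrl_gain = mean(ctrl_end) - mean(ctrl_base)
    return treat_gain - ctrl_gain


# Made-up scores for illustration, not data from any evaluation.
effect = diff_in_diff(
    treat_base=[10, 12, 14],
    treat_end=[20, 22, 24],
    ctrl_base=[10, 12, 14],
    ctrl_end=[13, 15, 17],
)
```

In this toy case the treatment group gains 10 points and the control group gains 3, so the estimate for the intervention as a whole is 7 points. The figure says nothing about which component of the intervention produced the gain, and it ignores sampling error and clustering, which a real analysis would need to address.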

Such information on the whole intervention’s impact is needed, however, in contexts where improving learning outcomes is perceived as a challenging, long-term objective. Identifying interventions that can show significant improvements in a short period can thus provide valuable input to policy makers, even without more complex analyses of individual intervention components.

This aspect of impact evaluation can be misunderstood in any context, but the risk of confusion may be greater where such research is relatively new. In addition, it demands close collaboration between evaluators and the intervention or program implementers, with the latter willing to follow the research design required for rigorous evaluation. This may be difficult for implementers who do not fully accept these research demands and who are reluctant to allow schools to be randomly selected into treatment or control groups, or even to have control groups at all, since doing so would withhold elements of the intervention from particular sites in order to test their effect. In such situations, the impact evaluation involves not only a strong research design and skillful conduct of the research, but also a carefully balanced relationship between the evaluators and the implementers.
