

Although PISA provides rich data for analysing features of education systems worldwide (Fuchs and Wößmann, 2007) and for identifying relationships between input factors, process factors, and learning outcomes (Schleicher and Zoido, 2016), a reflective use of PISA data is recommended (Dolin and Krogh, 2010). Like many other ILSAs, PISA has limitations that are important to keep in mind when using and interpreting its data. In the following, I review the limitations that researchers have discussed most: measurement (Section 3.5.1), sampling (Section 3.5.2), translation (Section 3.5.3), content coverage (Section 3.5.4), and causal inference (Section 3.5.5).

3.5.1 Measurement model

In pre-2015 cycles, item difficulty parameters were estimated with the Rasch model (Rasch, 1960), in which the probability of responding correctly to an item is modelled as a function of the item’s difficulty parameter and the respondent’s latent ability parameter. One of the assumptions of the Rasch model is parameter invariance: item difficulty parameters are assumed to be equivalent across respondent populations. Under this assumption, international item parameters in early PISA cycles were estimated from random samples drawn only from OECD countries (the so-called “international calibration”), and the item parameter estimates for all PISA-participating education systems were then scaled to these international parameters (OECD, 2002; 2005; 2009). However, empirical investigations have found that this assumption is hard to sustain across the student populations of different systems (Kreiner and Christensen, 2014; Rutkowski et al., 2016). Researchers argue that violating it may weaken the accuracy of performance rankings, because the estimated differences in students’ performance between education systems may be biased under this approach (L. Rutkowski and D. Rutkowski, 2016). The PISA technical team has made efforts to reduce the bias introduced by heterogeneous item difficulty parameters across populations. For example, from PISA 2009, which gave countries the option of using booklets easier than the standard ones, a subsample drawn from the countries using the easier booklets was added to the international calibration sample (OECD, 2012). In PISA 2015, data from all systems participating in 2015 and in previous cycles (2006-2012) were added to the international calibration, taking into account population differences across systems and over cycles (OECD, 2017).
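To make the invariance problem concrete: the dichotomous Rasch model gives the probability of a correct response as P = exp(θ − b) / (1 + exp(θ − b)), where θ is the respondent’s latent ability and b the item difficulty. The Python sketch below uses invented values (not PISA estimates) and a single item, a deliberate simplification of PISA’s multi-item scaling, to illustrate how an item that is harder in one system than its international calibration assumes leads to an underestimate of that system’s ability.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p: float) -> float:
    """Log-odds of a probability; the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical values, for illustration only (not PISA estimates):
# the item is calibrated internationally at b = 0.0 logits, but for
# students in one education system it behaves as if b = 0.5.
international_b = 0.0
local_b = 0.5
true_theta = 0.0  # an average student in that system

p_expected = rasch_probability(true_theta, international_b)  # what invariance predicts
p_observed = rasch_probability(true_theta, local_b)          # what the item actually yields

# Ability implied by the observed success rate if the international
# difficulty is (wrongly) assumed to hold in this system.
theta_hat = international_b + logit(p_observed)

print(f"expected P(correct) under invariance: {p_expected:.3f}")
print(f"observed P(correct):                  {p_observed:.3f}")
print(f"inferred ability: {theta_hat:.2f} logits (true: {true_theta:.2f})")
```

With one item, the 0.5-logit difficulty shift reappears one-for-one as a 0.5-logit underestimate of ability. With many items the effect of a single shifted item is diluted, but shifts in the same direction across many items would not cancel out.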

Another debated measurement limitation of PISA concerns the unidimensionality assumption of the Rasch model (Rasch, 1960), which holds that each scale (e.g. mathematics literacy) measures a single ability dimension. Items that fail to meet this assumption at the national level are treated as “dodgy items” and excluded from international scaling (e.g. OECD, 2014a). Although this assumption allows PISA to produce league tables of students’ performance on each scale of the assessed domains (Bonnet, 2002), it is argued that, like the parameter invariance assumption, the unidimensionality assumption may not hold in ILSAs such as PISA (Bonnet, 2002; Goldstein, 2004; Goldstein, 2018), given the linguistic and cultural influences on the test (Bonnet, 2002). Eliminating the “dodgy items” also removes information that might be meaningful for understanding differences between countries (Goldstein, 2004). Hence, adopting multidimensional statistical models that retain this information has been suggested (Goldstein, 2004).
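The text does not specify the criteria PISA uses to flag “dodgy items”. As an assumption, the sketch below shows the infit and outfit mean-square statistics, item-fit indices commonly used in Rasch analysis to detect items whose responses depart from what a single-dimension model predicts; PISA’s actual flagging rules may involve other diagnostics and are not reproduced here. The abilities and response patterns are invented.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: list[int], thetas: list[float], b: float) -> tuple[float, float]:
    """Outfit and infit mean-square statistics for one dichotomous item.

    responses: 0/1 scores on the item; thetas: the matching ability estimates;
    b: the item's difficulty. Both statistics have an expected value near 1.0
    when responses follow the unidimensional Rasch model; values well above
    1.0 signal responses the model did not predict.
    """
    if len(responses) != len(thetas) or not responses:
        raise ValueError("need one ability estimate per response")
    sum_std_sq = 0.0  # sum of squared standardised residuals (outfit)
    sum_sq = 0.0      # sum of squared raw residuals (infit numerator)
    sum_var = 0.0     # sum of model variances (infit denominator)
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        var = p * (1.0 - p)
        sum_std_sq += (x - p) ** 2 / var
        sum_sq += (x - p) ** 2
        sum_var += var
    outfit = sum_std_sq / len(responses)
    infit = sum_sq / sum_var
    return outfit, infit


# Hypothetical data: five students of increasing ability and an item with b = 0.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [0, 0, 1, 1, 1]  # abler students succeed, as the model expects
reversed_ = [1, 1, 0, 0, 0]   # weaker students succeed: a misfitting pattern

print("consistent item: outfit=%.2f infit=%.2f" % item_fit(consistent, thetas, 0.0))
print("misfitting item: outfit=%.2f infit=%.2f" % item_fit(reversed_, thetas, 0.0))
```

The reversed pattern produces values far above 1.0, while the perfectly ordered pattern falls below 1.0 because it is more predictable than the model expects. A multidimensional model, as Goldstein (2004) suggests, would try to explain such patterns rather than discard the item.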

3.5.2 Sampling

PISA has been criticised for the limited comparability of its target populations. PISA targets students of a given age (15-year-olds). However, the percentage of 15-year-olds enrolled in school varies across education systems, depending on national policies (e.g. on grade repetition or school-starting age) or on students’ maturation (Prais, 2003; Bracey, 2004; Duru-Bellat and Suchaut, 2005). In some systems, therefore, some pupils have completed compulsory education and left school by age 15, and are not captured by PISA sampling. In addition, as reviewed in Section 3.4.1, because school-starting ages differ across systems, students of the same age (e.g. 15) may have received different numbers of years of schooling (Leung, 2014). For this reason, some researchers (e.g. Prais, 2003) prefer the grade-based sampling used in other ILSAs such as TIMSS. Yet this approach is also contestable, since varying school-starting ages create problems for it as well (Goldstein and Thomas, 2008; Leung, 2014).

Uncertainty about the demographic composition of the student population can hinder accurate estimation of performance trends across cycles (Gebhardt and Adams, 2007; Aloisi and Tymms, 2017), as reviewed in Section 3.4.1.4. For example, by reweighting the Portuguese samples in PISA 2009 and PISA 2012 by students’ distribution across grades, tracks, and school types to reflect changes in population composition over this period, Freitas et al. (2016) find that the trends in each of the three primary assessment domains (reading, mathematics, and science) were notably different from those reported in the official PISA reports.
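A simple way to see how composition changes can masquerade as performance trends is to compare mean scores under different cell weights. The sketch below assumes that scores are summarised by grade cells and that adjustment means reweighting a later cycle to an earlier cycle’s composition. It is a generic post-stratification idea with invented numbers, not a reproduction of Freitas et al.’s (2016) procedure, which also used tracks and school types; the cell names are hypothetical.

```python
def composition_mean(cell_means: dict[str, float], shares: dict[str, float]) -> float:
    """Mean score implied by per-cell mean scores and the share of students in each cell."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("cell shares must sum to 1")
    return sum(shares[cell] * cell_means[cell] for cell in cell_means)


# Hypothetical numbers: mean scores within each grade cell are identical in
# both cycles; only the mix of students across grades changes.
cells_cycle1 = {"below_modal_grade": 430.0, "modal_grade": 520.0}
cells_cycle2 = {"below_modal_grade": 430.0, "modal_grade": 520.0}
shares_cycle1 = {"below_modal_grade": 0.40, "modal_grade": 0.60}
shares_cycle2 = {"below_modal_grade": 0.25, "modal_grade": 0.75}

baseline = composition_mean(cells_cycle1, shares_cycle1)

# Unadjusted trend: each cycle is weighted by its own sample composition.
naive_trend = composition_mean(cells_cycle2, shares_cycle2) - baseline

# Adjusted trend: cycle 2 is reweighted to cycle 1's composition, so only
# changes in within-cell performance remain.
adjusted_trend = composition_mean(cells_cycle2, shares_cycle1) - baseline

print(f"unadjusted trend:           {naive_trend:+.1f} points")
print(f"composition-adjusted trend: {adjusted_trend:+.1f} points")
```

Here the whole apparent gain of 13.5 points comes from more students being in the modal grade; once composition is held fixed, the trend disappears. Real adjustments also have to decide which composition (earlier, later, or the true population) to weight to, and that choice can change the reported trend.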

3.5.3 Translation

The international comparability of PISA assessment instruments has been questioned with respect to the linguistic and cultural equivalence of its language versions. Since the first cycle of PISA was administered in 2000, the number of participating education systems has grown (see Figure 3.1 in Section 3.1), and with the launch of its extension programme, PISA for Development, more and more developing countries have become involved. PISA develops its assessment instruments from two source languages, English and French, from which the national versions are translated and adapted (e.g. OECD, 2014a). The expanded participation of education systems with diverse languages and cultures has made developing national versions of the instruments in many different languages a challenge for PISA. Admittedly, the OECD has made great efforts, applying systematic and rigorous translation and back-translation procedures to ensure the linguistic and cultural equivalence of these versions across systems (Grisay, 2003; McQueen and Mendelovits, 2003; Grisay et al., 2007). However, some researchers argue that full comparability across language versions is hard to achieve in international assessments (Brown et al., 2007).

Concerns remain about possible bias arising from linguistic and cultural differences across countries (Goldstein, 2004; Grisay et al., 2007; Goldstein, 2018). Comparing the English version of the PISA cognitive tests with other language versions such as Finnish, Irish, and German, Eivers (2010) finds that translation changed the length of item texts, which raises the issue of test speededness, since students in all participating systems have the same time limit on the test. Extending the comparison to a worldwide linguistic and cultural context, Grisay et al. (2007) examine item difficulties across participating jurisdictions in the first three cycles of PISA and find that the test instruments tended to favour students in Western countries where Indo-European languages are used.
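Comparisons of item difficulties across jurisdictions can be illustrated with item-by-country interactions: how much harder or easier an item is in one country than across countries on average. This method is assumed here for illustration rather than taken from Grisay et al. (2007); the country names, difficulty estimates, and the 0.3-logit flagging threshold are all hypothetical.

```python
from statistics import mean


def item_by_country_interactions(difficulties: dict[str, list[float]]) -> dict[str, list[float]]:
    """Deviation of each country's item difficulty from the cross-country average.

    Each country's difficulties are centred on their own mean first, because
    separate calibrations are only identified up to a constant shift (which
    also absorbs overall differences in performance between countries).
    """
    centred = {c: [d - mean(ds) for d in ds] for c, ds in difficulties.items()}
    n_items = len(next(iter(centred.values())))
    international = [mean(centred[c][i] for c in centred) for i in range(n_items)]
    return {c: [ds[i] - international[i] for i in range(n_items)] for c, ds in centred.items()}


def flag_items(interactions: dict[str, list[float]], threshold: float) -> list[tuple[str, int]]:
    """(country, item index) pairs whose interaction exceeds the threshold in absolute value."""
    return sorted(
        (c, i) for c, values in interactions.items() for i, v in enumerate(values) if abs(v) > threshold
    )


# Hypothetical national difficulty estimates (logits) for four items.
difficulties = {
    "country_A": [-0.8, 0.1, 0.9, 0.2],
    "country_B": [-0.6, 0.3, 0.7, 0.0],
    "country_C": [-0.2, 0.6, 1.4, 1.6],
}

interactions = item_by_country_interactions(difficulties)
for country, values in interactions.items():
    print(country, [round(v, 2) for v in values])
print("flagged (|interaction| > 0.3):", flag_items(interactions, 0.3))
```

Because of the centring, one item that is unusually hard in country_C also appears relatively easy elsewhere, so a single anomaly can flag more than one country. A fuller analysis would then check whether flagged items cluster in particular language groups.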

3.5.4 Content coverage

As discussed in Section 3.1, PISA indicators inform educational policies. Because of this policy-driven aim, PISA focuses on content that is “of high value or interest to educational policy-makers or practitioners” across participating education systems (Adam, 2003, p.379). Constructs that can feasibly be measured and assessed internationally with existing technology are selected (Gorur, 2016) and operationalised in the assessment (OECD, 1999; Adam, 2003).

For the PISA cognitive tests, the primary domains assessed in each cycle are limited to reading, mathematics, and science literacy (OECD, 1999). As a senior PISA official stated in an in-depth interview, “reading, science and maths are there largely because we can do it. We can build a common set of things that are valued across the countries and we have the technology for assessing them” (Interview transcript, senior PISA official, cited in Gorur, 2011, p.83). To keep up with social development, further aspects of cognitive content, such as financial literacy (OECD, 2013a), collaborative problem-solving (OECD, 2016d), and global competence (OECD, 2019b), have been successively included in the PISA assessment frameworks as optional domains since PISA 2012.

Data collected through the background questionnaires are also subject to debate over content limitations. PISA questionnaires collect a range of data related to educational effectiveness, including students’ backgrounds, educational policies, and classroom practices, through self-reports from students and school principals (OECD, 2013a). Researchers suggest that PISA does not cover all the key factors that lead to high educational achievement. For example, Wu (2014) argues that, like many educational surveys, PISA does not comprehensively capture environmental factors in students’ lives outside school (e.g. coaching schools, parental pressure), which may also contribute to successful educational outcomes. Feniger and Lefstein (2014) argue that PISA data cannot show how policies have actually been enacted, nor the impact of actual classroom teaching and learning practices. Goldstein (2004) further argues that, like many other ILSAs, PISA as a cross-sectional survey is limited in comparing the effectiveness of education systems, because it lacks the longitudinal data that would allow social and other differences, in addition to differences between education systems, to be taken into account.

3.5.5 Causal inference

By building an understanding of the characteristics of an education system, PISA can serve as an empirical knowledge base for policy formulation (Gustafsson, 2008; Yore et al., 2010). However, assuming that a single study could provide a comprehensive understanding of a system as complex as education is seen as unrealistic (Gist, 1998). Mainly because of PISA’s limited content coverage, a number of researchers caution against drawing causal inferences from the associations between students’ performance and learning-related factors identified in PISA data (e.g. Duru-Bellat and Suchaut, 2005; Goldstein and Thomas, 2008; Gustafsson, 2008; Rutkowski and Delandshere, 2016).

The educational outcomes of each education system are considered to be generated by the overall structure of its societal coherence (Duru-Bellat and Suchaut, 2005). Mediating variables may affect the relationships between given factors and students’ performance (Gillis et al., 2016), and some of them, as reviewed in the previous section, are absent from the PISA assessment frameworks. As the official PISA results reports state (e.g. OECD, 2010, p.18), “PISA cannot identify cause-and-effect relationships between inputs, processes and educational outcomes”. Statistics alone cannot establish causal relationships (Wu, 2014). Admittedly, therefore, PISA is limited in answering what we would like to know about how to improve education (Buckingham, 2012; Jerrim, 2015; Baird et al., 2016; Gillis et al., 2016). Forming good policies from PISA data still requires a considerable amount of reasoning and evidence beyond statistical analyses (Wu, 2014).

Regarding the additional evidence needed to uncover causal relationships, Goldstein and Thomas (2008) point to the need to control for potential confounding factors such as prior achievement and social background, which could be identified through in-depth qualitative investigations (Egelund, 2008). Since longitudinal data, through repeated observation of the same students and education systems over time, are considered more powerful for controlling for these factors (Gustafsson, 2008), incorporating longitudinal data collected through qualitative or mixed methods has been proposed as a way to reveal causal relationships (Goldstein, 2004; Egelund, 2008; Gustafsson, 2008). It has been suggested that PISA-participating education systems could additionally collect such data for their national purposes by adapting and extending the design of the PISA international assessment instruments (Gustafsson, 2008).

In summary, the literature reviewed above (Sections 3.2-3.5) gives an overall picture of existing discussions and debates on PISA’s policy impact in domestic education contexts. It also highlights the importance of understanding PISA data and their limitations, both for making evidence-based policies and for discussing and evaluating PISA’s impact on educational policies and practices. Building on this synthesis of the literature, in the next section I develop the conceptual framework, drawing on a hybrid of theories of the washback (also called backwash³) effect (e.g. Alderson and Walker, 1993; Hughes, 1993, cited in Bailey, 1996) and ecological systems theory (Bronfenbrenner, 1979) as theoretical underpinnings.