CHAPTER IV: RESEARCH RESULTS
4.3. PROPOSAL
4.3.3. Programming
The survey data were collected and stored electronically using SNAP 10 software (Snap Surveys 2010). They were then exported and analysed using the Statistical Package for the Social Sciences (SPSS) (IBM Corp. 2012) for the quantitative responses and NVivo (QSR International 2011) for the qualitative responses.
Quantitative analysis using SPSS
Initial analysis of the quantitative data was conducted on the basis of response frequencies as a proportion of each group (TTO, NTO and Comparator). The responses to the Likert-style items were considered in their original five categories: two degrees each of positivity and negativity, alongside the neutral option. Initial observations were made on this basis.
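As a minimal illustration of this first step, the Python sketch below converts one group's responses to a single item into proportions of that group. It is not the SPSS procedure used in the study; the category labels and the example responses are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the five scale points (two positive, neutral, two negative).
CATEGORIES = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]

def proportions(responses):
    """Return the share of a group's responses that falls in each scale category."""
    if not responses:
        raise ValueError("no responses for this group")
    counts = Counter(responses)
    return {category: counts[category] / len(responses) for category in CATEGORIES}

# Hypothetical responses from two groups to the same item.
groups = {
    "TTO": ["agree", "agree", "strongly agree", "neutral", "disagree"],
    "NTO": ["neutral", "agree", "disagree", "disagree", "strongly disagree"],
}
for name, answers in groups.items():
    print(name, proportions(answers))
```

Expressing counts as proportions allows groups of different sizes to be compared directly.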
Descriptive data
As described in Section 3.4.2, Section 1 of the questionnaire contained a small number of demographic questions aimed at identifying general trends in the respondent population. These data were analysed in terms of frequency, and the data regarding year-group and education type were employed as bases for the quantitative and qualitative analyses described below. The data regarding gender were ultimately not used for this purpose: although consideration of gender differences would likely prove interesting and revealing, as has been the case in previous CLIL studies (e.g. Merisuo-Storm 2007), it was believed that it might detract from the central goals of the research. The same reasoning lay behind the decision not to include any questions aimed at determining social class.
Statistical testing & attitude to data
Following the initial frequency analysis, the data were considered in terms of the differences between groups and, within each group, the differences between year-groups. To this end, Chi squared (χ²) analysis was conducted in SPSS to identify areas of statistically significant variation, using a significance level of 0.05 (i.e. differences with p < 0.05 were treated as significant). As recommended by Muijs (2011), the Chi squared analyses were complemented by the Phi measure of effect size (r), interpreted using the parameters displayed in Table 3.10.
Table 3.10 Parameters for effect size (from Muijs 2011).
Value of r    Effect size
< 0.1         weak
< 0.3         modest
< 0.5         moderate
< 0.8         strong
≥ 0.8         very strong
In the first instance, these analyses were conducted only on the data from the NTO and TTO groups, as including the Comparator responses risked distorting the results of the tests. Comparator responses were later considered in relation to the responses from the NTO group in particular.
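The calculation behind these tests can be sketched in Python as follows, assuming a table of observed counts with one row per group (e.g. TTO and NTO) and one column per response category. This is an illustrative reimplementation, not SPSS output: the p-value uses the closed-form chi-squared tail probability for integer degrees of freedom, Phi is computed as √(χ²/n), and the effect-size labels follow Table 3.10. The example counts are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # odd df: two-sided normal tail plus a finite series in sqrt(x)
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    total, term = math.erfc(chi / math.sqrt(2)), chi
    for k in range(1, (df - 1) // 2 + 1):
        total += 2 * pdf * term
        term *= x / (2 * k + 1)
    return total

def pearson_chi2(table):
    """Pearson chi-squared test of independence.
    table: observed counts, rows = groups, columns = response categories
    (assumes no row or column is entirely empty).
    Returns (chi2, df, p, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df), min_expected

def phi(chi2, n):
    """Phi effect size: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)

def effect_label(r):
    """Verbal label for an effect size, using the thresholds in Table 3.10."""
    for bound, label in [(0.1, "weak"), (0.3, "modest"), (0.5, "moderate"), (0.8, "strong")]:
        if abs(r) < bound:
            return label
    return "very strong"

# Hypothetical counts: rows TTO, NTO; columns positive, negative.
counts = [[10, 20], [20, 10]]
chi2, df, p, min_expected = pearson_chi2(counts)
r = phi(chi2, sum(map(sum, counts)))
print(f"chi2={chi2:.2f}, df={df}, p={p:.4f}, phi={r:.2f} ({effect_label(r)})")
```

For these hypothetical counts the sketch gives χ² ≈ 6.67 with one degree of freedom (p below 0.05) and φ ≈ 0.33, a ‘moderate’ effect under Table 3.10. For tables larger than 2x2, √(χ²/n) can exceed 1, which is why Cramér's V is often reported alongside or instead of Phi. The smallest expected count is returned because Chi squared results become unreliable when expected counts are small.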
Due to the relatively low response rate to the R2 survey, and its uneven spread across schools and year-groups, it was decided that these data should be considered only with regard to the changes that appeared to take place within the group of respondents who participated in both data collection rounds. As every response analysed in R2 could therefore be paired with a response from the same participant in R1, it was possible to employ a McNemar Chi squared test (McNemar 1947) or a Bowker test of symmetry (Bowker 1948) to identify areas of statistically significant difference. The McNemar test is an alternative to Pearson’s Chi squared test for cases where the two sets of data being compared come from the same subjects and the variable has only two categories, giving a 2x2 table. The Bowker test extends the McNemar test to variables with more than two categories. The same tests were also employed to identify statistically significant differences between one group’s responses to different items, for example TTO learners’ responses regarding English-medium and Dutch-medium teaching and learning. As with the Pearson Chi squared analyses, the results of these tests were interpreted on the basis of a significance level of 0.05. Because the McNemar-Bowker analyses compared one sample at two different points in time rather than two separate samples, measures of effect size were not calculated for these comparisons.
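The symmetry test can be sketched in Python as below, assuming each participant's R1 and R2 answers are keyed by a participant identifier (the identifiers, categories and answers here are hypothetical). The statistic sums (n_ij − n_ji)² / (n_ij + n_ji) over the off-diagonal pairs of the k x k table; for k = 2 it reduces to McNemar's test without continuity correction. This illustrates the test itself, not the SPSS implementation, which may, for example, report an exact binomial p-value for 2x2 tables.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # even df: closed-form Poisson-type sum
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    chi = math.sqrt(x)  # odd df: normal tail plus a finite series
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    total, term = math.erfc(chi / math.sqrt(2)), chi
    for k in range(1, (df - 1) // 2 + 1):
        total += 2 * pdf * term
        term *= x / (2 * k + 1)
    return total

def paired_table(r1, r2, categories):
    """Cross-tabulate answers from participants present in both rounds.
    r1, r2: dicts mapping a (hypothetical) participant ID to that participant's answer.
    Rows are R1 categories, columns are R2 categories."""
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * len(categories) for _ in categories]
    for pid in r1.keys() & r2.keys():  # only participants who answered in both rounds
        table[index[r1[pid]]][index[r2[pid]]] += 1
    return table

def mcnemar_bowker(table):
    """Bowker's test of symmetry on a k x k table of paired answers.
    Returns (statistic, df, p) with df = k(k-1)/2."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            discordant = table[i][j] + table[j][i]
            if discordant:  # pairs with no changes in either direction add nothing
                stat += (table[i][j] - table[j][i]) ** 2 / discordant
    df = k * (k - 1) // 2  # packages differ in how they treat empty pairs
    return stat, df, chi2_sf(stat, df)

# Hypothetical answers from the same participants in R1 and R2.
r1 = {"p01": "positive", "p02": "neutral", "p03": "positive", "p04": "negative"}
r2 = {"p01": "neutral", "p02": "neutral", "p03": "negative", "p05": "positive"}
table = paired_table(r1, r2, ["positive", "neutral", "negative"])
print(table, mcnemar_bowker(table))
```

Only participants p01 to p03 appear in both rounds, so p04 and p05 are excluded from the table, mirroring the restriction to paired responses described above.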
For a large number of items, reliable Chi squared and Phi analyses, or McNemar-Bowker analyses, were not possible because of low numbers of responses in individual categories. This was especially the case for the year-group analyses, where response numbers were smaller. In these cases, the data from the scaled items were collapsed into three broader values: ‘positive’, ‘neutral’ and ‘negative’. This allowed for more consistent and widespread application of Chi squared analysis, and therefore for more reliable identification of statistical significance. Cohen et al. (2011) highlight that the subtle distinction between extreme and more measured positive and negative responses can add to the depth and significance of questionnaire data, and emphasise that data should be used in their purest form wherever possible. Care was therefore taken to ensure that distinctions between groups that were visible only in the uncollapsed values were not ignored. To this end, where collapsing the data appeared to conceal statistically significant differences that had already been identified in the uncollapsed data, the original values were restored, even where this invalidated the Chi squared measurement. In the presentation of findings in Chapters 4 and 5, most of the data are presented visually in their original form, sometimes with complementary illustrations of the reduced data. The Chi squared, Phi and McNemar-Bowker analyses reported are based on either the reduced or the original values; which of the two was used is always stated clearly to avoid ambiguity.
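The collapsing step, and the reason it helps, can be illustrated with a short Python sketch. It merges the five scale columns of a group-by-category count table into ‘positive’, ‘neutral’ and ‘negative’, and reports the share of cells whose expected count falls below 5; a widely used rule of thumb treats Chi squared as unreliable when more than 20% of cells fall below that level. The labels, the mapping and the counts are hypothetical.

```python
# Hypothetical mapping from the five original scale points to the three collapsed values.
COLLAPSE = {
    "strongly agree": "positive", "agree": "positive",
    "neutral": "neutral",
    "disagree": "negative", "strongly disagree": "negative",
}
COLLAPSED = ["positive", "neutral", "negative"]

def collapse(table, labels):
    """Merge the columns of a groups x categories count table into three values."""
    result = []
    for row in table:
        merged = dict.fromkeys(COLLAPSED, 0)
        for label, count in zip(labels, row):
            merged[COLLAPSE[label]] += count
        result.append([merged[value] for value in COLLAPSED])
    return result

def small_expected_share(table, threshold=5):
    """Share of cells whose expected count (row total * column total / n) is below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    return sum(e < threshold for e in expected) / len(expected)

labels = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]
five_point = [[4, 20, 8, 5, 3], [6, 25, 5, 4, 2]]  # hypothetical TTO and NTO counts
three_point = collapse(five_point, labels)
print(small_expected_share(five_point), three_point, small_expected_share(three_point))
```

With these counts, half of the cells in the five-category table have expected counts below 5, whereas none do once the table is collapsed to [[24, 8, 8], [31, 5, 6]].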
Some statisticians might argue that it would have been more meaningful to analyse the Likert-style data using analysis of variance instead of non-parametric tests such as Chi squared. Boone and Boone (2012) provide an overview of common misconceptions and mistakes in the analysis of Likert scale and Likert-type data. Much of the confusion in this area, they suggest, results from a lack of clarity in distinguishing between scales that are employed in the way that Likert (1932) intended and scales that are simply influenced by Likert’s design (‘Likert-type’ or ‘Likert-style’ scales). They argue that, while the original Likert scale was designed to involve simultaneous analysis of responses to a number of related items as interval data, Likert-type items, which are often analysed individually, require a different approach. As has been echoed elsewhere (e.g. Cohen, Manion et al. 2011, Muijs 2011), viewing such data as continuous implies the assumption that the response labels will be interpreted in exactly the same way by all respondents, and that the distances between them are equal.
According to Boone and Boone, among others, this assumption is unfounded. Viewing Likert-style data as ordinal, they argue, is a more appropriate approach. Means would therefore have little meaning in relation to such data, and neither would analyses of variance, which are based on means. Instead, frequency analysis and non-parametric tests are recommended for use with Likert-type data such as those obtained in the current study (Muijs 2011). This approach was also considered most appropriate to the epistemological values described in Section 3.2.3, as the current research took the interpretivist view that human experience is individual and context-bound (Cohen, Manion et al. 2011).
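The practical consequence of treating Likert-style responses as ordinal can be shown with a small hypothetical example in Python. Two sets of responses, coded 1 (most negative) to 5 (most positive), share the same mean and even the same median, yet their frequency distributions show one group to be polarised and the other uniformly neutral. The numeric coding is an assumption made for illustration only.

```python
from statistics import mean, median

def frequencies(responses, scale=range(1, 6)):
    """Count how many responses fall on each point of the scale."""
    return {point: responses.count(point) for point in scale}

polarised = [1, 1, 1, 3, 5, 5, 5]  # hypothetical responses
neutral = [3, 3, 3, 3, 3, 3, 3]    # hypothetical responses

for name, data in [("polarised", polarised), ("neutral", neutral)]:
    print(name, "mean:", mean(data), "median:", median(data), "frequencies:", frequencies(data))
```

Only the frequency distribution distinguishes the two sets, which is consistent with the emphasis on frequency analysis described above.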
Analysis of qualitative responses
As with the Phase I data, the qualitative responses from the R1 questionnaire were copied into NVivo, then coded and categorised within the TTO, NTO and Comparator respondent groups. Although these data were generally regarded and analysed qualitatively, data transformation was employed on occasion (Cohen, Manion et al. 2011), with analysis tools such as the ‘Explore’ function enabling quick identification of the most common responses. Together with specific examples from the qualitative responses, these transformed data were employed to elaborate upon, and suggest explanations for, trends identified in the quantitative survey responses.
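The kind of data transformation described here, turning coded qualitative responses into counts, can be sketched in Python as follows. It does not reproduce NVivo's ‘Explore’ function; it simply assumes that the responses have already been coded and counts the most frequent codes within each respondent group. The codes shown are hypothetical.

```python
from collections import Counter

def most_common_codes(coded_responses, top=3):
    """coded_responses: iterable of (group, code) pairs from manual coding.
    Returns, for each group, its `top` most frequent codes with their counts."""
    by_group = {}
    for group, code in coded_responses:
        by_group.setdefault(group, Counter())[code] += 1
    return {group: counts.most_common(top) for group, counts in by_group.items()}

# Hypothetical coded responses to an open question.
coded = [
    ("TTO", "more speaking practice"), ("TTO", "more speaking practice"),
    ("TTO", "difficult vocabulary"), ("NTO", "more grammar"),
    ("NTO", "more speaking practice"), ("NTO", "more grammar"),
]
print(most_common_codes(coded))
```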
Reporting of data
The findings produced by the majority of the items from the R1 questionnaire are presented and discussed in Chapters 4 and 5. In Chapter 4, nearly all of these analyses were performed on the basis of the R1 responses of all three year-groups in TTO and NTO. In Chapter 5, the analyses shown also include comparisons across year-groups within TTO or NTO, or between TTO and NTO respondents from a single year-group. Chapter 5 also contains within-sample analyses comparing R1 and R2 responses from the same group, or comparing TTO responses to items regarding, for example, teaching practices in English-medium and Dutch-medium lessons. In all of these cases, the figure caption and the text state precisely which comparisons the data refer to.
Responses to a small number of items were deliberately omitted due to lack of conclusiveness. Moreover, year-group analyses and comparisons with the R2 questionnaire are presented only for those items whose findings were considered pertinent to answering the research questions.
3.5 Ethical considerations
Where living beings are the subjects of research, ethics will most likely play an important role (Dörnyei 2007). Furthermore, as has been emphasised in evaluations of previous pupil research projects (Atweh, Burton 1995), ethical issues are of particular importance in research where children or young people are actively involved. Because of the high level of involvement of young people in Phase I of this research, additional attention was paid to adherence to guidelines on research ethics, both to ensure the ethical validity of the research and to ensure that the participants felt involved and respected from the outset. These measures, along with those taken to the same ends in Phase II, are described below. A copy of the Student Research Ethics Application Form submitted for approval to the University of Aberdeen can be found in Appendix A.
3.5.1 Election of pupil researchers and consultation
In an approach that is intended to be democratic, deciding which voices should be heard can be both a considerable challenge (Arnot, Reay 2007) and a matter that pupils value intensely (Leitch, Gardner et al. 2007). Participants in Phase I were asked to nominate classmates on the basis of how representative they were of the class as a whole, and the pupils with the most nominations were invited to participate on a voluntary basis. While this did not necessarily lead to the selection of a representative group, particularly in Class B, it was an attempt to employ democratic methods and to make the research as inclusive as possible.