CHAPTER V: ANALYSIS OF THE IMPLEMENTATION OF THE REI
5.2 Development of the REI and Identification of Dialectics
5.2.3 Third Stage: Study of Falling with Air Resistance
Both descriptive and analytical statistics were used to analyse students’ ‘literacy’ scores. By “analytical”, I mean more advanced statistical procedures that test the statistical significance of patterns observed in the data. The two categories of statistics used in the quantitative data analysis process are detailed below:
Descriptive statistics were employed to describe the central tendency (the mean and median of test scores) and the dispersion of students’ ‘literacy’ scores. Because the data were found not to be normally distributed, the median and interquartile range were mainly used for commentary and analysis. Simply put, a normal distribution is one whose shape is generally bell-shaped. If a data distribution is fairly symmetrical, in that the marks cluster around the middle of the range, then the data can be said to be approximately normally distributed (Rowntree, 2000: 58). This was not the case here, as evidenced by, for example, the level of skewness reported in the data tables (see Chapter Six, Tables 1-3).
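By way of illustration, the descriptive measures mentioned above can be sketched in a few lines of Python. This is a minimal sketch only: the function name `describe`, the moment-based skewness formula and the scores themselves are assumptions made for illustration, and the scores are invented rather than taken from this study.

```python
import statistics


def describe(scores):
    """Summarise a list of scores: mean, median, interquartile range, skewness."""
    n = len(scores)
    mean = statistics.fmean(scores)
    median = statistics.median(scores)
    # Quartiles; the "inclusive" method treats the scores as the full data set.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    sd = statistics.pstdev(scores)
    # Moment-based skewness: positive when the right-hand tail is longer.
    skewness = sum((x - mean) ** 3 for x in scores) / (n * sd ** 3) if sd > 0 else 0.0
    return {"mean": mean, "median": median, "iqr": q3 - q1, "skewness": skewness}


# Hypothetical scores, invented purely for illustration (not data from this study).
scores = [2, 3, 3, 4, 4, 5, 5, 6, 14]
summary = describe(scores)
print(summary)
```

In this invented example, one unusually high score pulls the mean above the median and produces positive skewness, which is the kind of pattern that motivates reporting the median and interquartile range instead of the mean.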
Analytical statistics were employed to evaluate the sample distributions of students’ ‘literacy’ scores. Again, because most of the score distributions were not normal, a parametric test, which assumes normality of the data distribution in question (or of the underlying population distribution), was not suitable for statistical testing purposes. Rather, a nonparametric test, which is distribution-free and more flexible in the assumptions it makes about the data distribution(s) in question, was more appropriate for assessing whether there were statistically significant differences between the relevant (paired) sample distributions. In keeping with the statistical methods used in my pilot study of RtL (Millin, 2011; Millin & Millin, 2014), the Wilcoxon signed-rank test was employed, as it is suitable for paired data that are generally not normally distributed.
4.9.1.1 Explanation of the Wilcoxon Signed-Rank Test
The analytical results were generated using the Wilcoxon (matched-pairs) signed-rank test (see Chapter Six, Section 6.3). The parametric equivalent of this test is the dependent, or paired-samples, t-test. An alternative parametric procedure that might have been used is repeated measures analysis of variance (ANOVA). However, both of these tests make stronger distributional assumptions about the sample data, and these assumptions do not necessarily hold in small samples of roughly 10 to 50 observations. A nonparametric test is therefore likely to be more flexible and robust in its application. “[I]n statistical parlance, the Wilcoxon test is more robust (less susceptible) to violations in the distributional assumptions so important in parametric testing procedures – in layman’s terms, under varying distributional properties and small sample sizes, the Wilcoxon test is more powerful at discerning the ‘truth’ in the data” (Millin & Millin, 2014: 35). In other words, nonparametric techniques such as the Wilcoxon test can be more powerful than parametric techniques at discerning meaningful patterns in the data when one needs to be flexible about distributional assumptions, that is, when samples are small and the data are not necessarily symmetrical (are otherwise skewed) around the mean or median (Upton & Cook, 2008: 360-361). Hence the use of the nonparametric Wilcoxon signed-rank test in this study.
In contrast to a simple sign test, which uses only the direction of the differences within pairs, the Wilcoxon test is more powerful in that it uses both the direction and the magnitude of those differences (Siegel, 1956: 75-83). The test is used to determine whether the collective differences in each student’s performance across two assessments represent a meaningful (statistically significant) difference in students’ demonstration of their written ‘literacy’ skills across the two tasks (the two samples of data). The Wilcoxon test does not assume that the sample, and hence the population, is normally distributed, and it is best applied to research questions in which the researcher wants to compare paired (or dependent) samples. The test proceeds by converting continuous data into ordinal (ranked) data: it computes the difference in performance for each student across the two tasks, discarding pairs that show no difference; ranks the absolute values of these differences from smallest to largest; reattaches the sign of each difference to its rank; and then uses the sum of the positive or negative ranks to compute a test statistic which, once standardised, approximately follows a standard normal distribution (mean of 0 and standard deviation of 1) when there is no true difference. In the pairwise comparisons that follow in Chapter Six, a comparison written as, for example, A4-A0 literally means A4 “minus” A0, that is, the difference between the two data points in each pair. This is important to note because, if the RtL process is to show a significant improvement from pre-intervention (A0) to post-intervention (A4), the differences should, on the whole, be positive more often than not.
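The steps described above can be sketched in Python as follows. This is a hedged illustration, not the software routine used to produce the results in Chapter Six: the function name `wilcoxon_signed_rank` is hypothetical, the scores are invented, tied absolute differences are given their average rank, and the z-statistic uses the plain large-sample normal approximation, with mean n(n+1)/4 and variance n(n+1)(2n+1)/24 for the sum of positive ranks, without any tie or continuity correction. Statistical packages may apply such corrections and therefore report slightly different values.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, z, two-tailed p) for paired scores, using after - before."""
    # Step 1: differences for each pair, discarding pairs with no difference.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are identical; the test is undefined")

    # Step 2: rank the absolute differences (ties share their average rank).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Step 3: reattach the signs and sum the positive and negative ranks.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Step 4: standardise W+ under the hypothesis of no difference.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, w_minus, z, p_value


# Hypothetical pre- and post-intervention scores for ten students
# (invented for illustration; not data from this study).
a0 = [10, 12, 9, 14, 11, 13, 8, 15, 10, 12]
a4 = [13, 14, 8, 19, 17, 20, 8, 23, 19, 22]
w_plus, w_minus, z, p_value = wilcoxon_signed_rank(a0, a4)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, p = {p_value:.3f}")
```

Because the differences are computed as A4 minus A0, a large sum of positive ranks (and hence a positive z) corresponds to improvement after the intervention.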
The test statistic is deemed significant (statistically different from zero, where zero essentially means no difference between the two assessed literacy performances or sample distributions of ‘literacy’ scores) if the probability value (p-value) is less than or equal to the 0.05 (5%) level of significance. For interest’s sake, the critical z-values for the test statistic at various two-tailed levels of significance are approximately 1.65 (10%), 1.96 (5%) and 2.58 (1%). Hence, a z-statistic with an absolute value greater than or equal to 1.96 indicates a statistically significant (meaningful) difference between two sample distributions, which in this case relates to pairwise ‘literacy’ performance (scores). For a more detailed explanation of the Wilcoxon signed-rank test and its statistical power, the interested reader is referred to Wilcoxon (1945), Siegel (1956) and Posten (1982).
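The relationship between the significance levels and the critical z-values quoted above can be checked with a short Python sketch that uses only the standard library. The helper names `two_tailed_critical_z` and `two_tailed_p_value` are hypothetical and are introduced here purely for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_tailed_critical_z(alpha):
    """Critical |z| that leaves alpha / 2 in each tail of the standard normal."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def two_tailed_p_value(z):
    """Two-tailed probability of a statistic at least as extreme as z."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: critical |z| = {two_tailed_critical_z(alpha):.3f}")
```

To three decimal places the 10% critical value is closer to 1.645, which is why the figures in the text are described as approximate.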
Table 4.5: Pairwise Samples of Data for the Wilcoxon Signed-Rank Test
                  Paired Samples of Data
Narrative         N1 – N0    N2 – N1    N3 – N2    N4 – N3    N4 – N0
Academic Essay    A1 – A0    A2 – A1    A3 – A2    A4 – A3    A4 – A0
* Note: See section 4.7, Table 4.2 and Table 4.3 for definition of the above paired samples of data.