Capítulo XXIII - DON QUIJOTE DE LA MANCHA

The following definitions explain the meaning of the terms as used in the context of the present study.

Bilingualism

The term ‘bilingualism’ refers to the use of two languages by individual students, regardless of their fluency in each language.

Bilingual Education

Bilingual education is defined as education that aims to promote bilingual competence by using both languages as media of instruction for significant portions of the academic curriculum, with respect to three features: linguistic goals, pedagogical approaches, and levels of schooling. More specifically, for the purpose of the present study, it is defined as the teaching of English using English and Bahasa Indonesia in combination, each for an equal share of instructional time.

Bilingual Classroom Setting

A bilingual classroom setting is an English language classroom in which English is used as the medium of instruction, whereas Indonesian is used as the medium of instruction in all other classes.

Monolingual Education

A monolingual classroom setting is an English language classroom in which Bahasa Indonesia is used as the medium of instruction to explain English words, sentences, stories and context. More specifically, for the purpose of the present study, it is defined as the teaching of English mainly in Bahasa Indonesia, with English used for only a small share of instructional time.

English Reading Comprehension

English Reading Comprehension is the ability of second language readers to obtain meaning from texts by actively using both lower-order and higher-order skills to decode the smaller elements of a text and construct its meaning. By using their schemata (previous or background knowledge), readers are able to understand the main idea, sequence events, and obtain detailed information. English reading comprehension in this study represents student English achievement and is based on Bloom’s Taxonomy.

English Reading Comprehension Test

The English Reading Comprehension Test used in the present study consists of 12 multiple choice items and three written items to be answered by the students after reading a given piece of English text (see the full test in Chapter Three). A special scoring rubric in which scoring is ordered by item difficulty for the multiple choice items and ordered by quality for the written answers was designed for the test (see the scoring rubric in Chapter Three). This scoring is consistent with Rasch measurement principles and is used to create a linear, unidimensional measure for the variable English Reading Comprehension.

English Text Writing

English Text Writing is the ability of second language writers to produce written texts by actively using writing strategies, techniques and skills acquired during their English language instruction, whether in bilingual or monolingual classes. English text writing in this study represents student English achievement and is based on Bloom’s Taxonomy (Bloom et al., 1956).

English Text Writing Test

The English Paragraph Writing Test consists of two compulsory topics on which the students are asked to write several paragraphs in English (see the full test in Chapter Three). A special scoring rubric in which scoring on three aspects of writing is ordered by quality was designed for this test (see Chapter Three). This scoring is in line with Rasch measurement principles and is used to create a linear unidimensional measure of English Writing Quality.

Attitude and Behaviour with regard to Learning English

The Attitude and Behaviour with regard to Learning English Questionnaire (see Appendix E) contains 18 items: three on tasks for listening, three on tasks for speaking, three on tasks for reading, three on tasks for writing, three on tasks for student/student relationships, and three on tasks for student/teacher relationships. Each item was answered from the two perspectives of attitude and behaviour, that is ‘ideally, this is what I think should happen’ (attitude and easier); and ‘this is what actually happened’ (behaviour and harder). The full questionnaire is given in Chapter Three. Response category ‘most or all of the time’ was scored 3, response category ‘some of the time’ was scored 2, and response category ‘never or rarely’ was scored 1. This scoring is ordered in line with Rasch measurement principles and was used to create a linear unidimensional measure.

Middle School

Middle school in Aceh is the level after primary school. Middle school students are 12-15 years old. Middle school lasts three years, and English classes begin in the first semester of the first year of this level.

True Score Theory Measurement

True Score Theory is a way of measuring variables in the social sciences (and education) which claims that the observed total score obtained by a person on a set of test or questionnaire items is made up of a ‘true score’ and a random error score. The scale created by True Score Theory does not contain equal units of measurement and is therefore non-linear. That is, the difference between, for example, 50% and 60% does not represent the same amount of variable difference as that between 70% and 80%. True Score Theory scores are commonly considered to have at least six problems: (1) non-linearity; (2) multi-dimensionality with ‘noise’; (3) item difficulties not ordered; (4) person ‘measures’ and item difficulties not ordered on the same scale; (5) the ‘measures’ are test (item content) dependent; and (6) the ‘measures’ from different tests, even on the same topic, cannot be validly added or linked onto a single scale (see Michel, 1990, 1999; Smith, 1996; Waugh & Chapman, 2005).
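The non-linearity of percentage scores can be illustrated with a short sketch. It assumes, purely for illustration, that percentages are re-expressed as log odds; the function name `log_odds` is hypothetical and this is not the analysis procedure used in the study.

```python
import math

def log_odds(proportion):
    """Return the natural log odds of a proportion strictly between 0 and 1."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(proportion / (1.0 - proportion))

# Two gaps of ten percentage points on the raw percentage scale.
gap_50_to_60 = log_odds(0.60) - log_odds(0.50)
gap_70_to_80 = log_odds(0.80) - log_odds(0.70)

# Illustrative assumption: log odds stand in for a linear scale.
print(f"50% -> 60%: {gap_50_to_60:.3f} log-odds units")
print(f"70% -> 80%: {gap_70_to_80:.3f} log-odds units")
```

Under this assumption, the two ten-point gaps correspond to different distances on the log-odds scale, with the gap nearer the top of the percentage range being the larger one.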

Rasch Measurement

In Rasch measurement, items are ordered from easy to hard on a continuum and their difficulties are calculated on a linear scale (a log odds scale). The person measures are calculated on the same linear scale. An important point to understand is that when the data fit a Rasch measurement model, the differences between the person measures and the item difficulties can be calibrated together in such a way that they are freed from the distributional properties of the incidental parameter, because of the mathematics involved in the measurement model. This means that ‘scale-free’ measures and ‘sample-free’ item difficulties can be estimated with the creation of a mathematically objective linear scale with standard units. The standard units are called logits (the log odds of successfully answering the items) (adapted from Waugh, 2003, 2005, 2010a, 2010b).
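As a minimal sketch of the log odds scale described above, the dichotomous Rasch model gives the probability of success as a function of the difference between a person measure and an item difficulty, both in logits. The function name `rasch_probability` and the example values are hypothetical, and the sketch covers only the two-category (right/wrong) case.

```python
import math

def rasch_probability(person_measure, item_difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both arguments are locations in logits on the same linear scale.
    """
    difference = person_measure - item_difficulty
    return 1.0 / (1.0 + math.exp(-difference))

# A person located exactly at the item's difficulty has even odds of success.
p_matched = rasch_probability(1.0, 1.0)

# Only the difference matters: shifting both locations by the same amount
# leaves the probability unchanged.
p_one_logit_above = rasch_probability(2.0, 1.0)
p_shifted = rasch_probability(0.5, -0.5)

print(round(p_matched, 3), round(p_one_logit_above, 3), round(p_shifted, 3))
```

Because the probability depends only on the difference between the two locations, persons and items can be placed on one common scale.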

Unidimensionality of Variables

In the present study, unidimensional measures are created for English Writing, English Reading Comprehension, and Attitude and Behaviour about Learning English. These measures involve a variety of aspects including low order thinking (such as knowing facts and basic comprehension), higher order thinking (such as analysis, synthesis and evaluation), low and high order attitudes, and physical dexterity and, in this everyday sense, are not unidimensional. With Rasch measurement, unidimensional means that a single parameter for each person (person measure) can be created as applying to all of the scale items, that a single parameter can be created for each item (item difficulty) applying to all the persons measured on the same scale, and that these parameters can be applied to accurately predict each person’s response to each item.

Person Separation Index

Person Separation Index is an index ranging from 0 to 1 that shows the proportion of observed variance considered to be true. A high value of the index indicates that measures of the respondent’s ability or preference are sufficiently well separated along the scale in relation to the errors of measurement. It is “structured as the ratio of estimated observed variance among persons, using estimates of their locations (measures) and the standard errors of these locations (measures)” (Andrich & van Schoubroeck, 1989, p. 483). The Person Separation Index is interpreted like Cronbach’s alpha, which measures the internal reliability of non-linear scales (Cronbach, 1951).
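One way to make the idea of a proportion of observed variance considered to be true concrete is sketched below. It assumes a commonly used form of the index: the variance of the person measures minus the mean squared standard error, divided by the variance of the person measures. The function name, the use of the sample variance, and the example data are all assumptions for illustration; the software used in the study may compute the index differently.

```python
from statistics import mean, variance

def person_separation_index(measures, standard_errors):
    """Estimate the proportion of observed person variance not due to error.

    measures: person locations in logits.
    standard_errors: standard error of each person location, same order.
    """
    observed_variance = variance(measures)  # sample variance (assumption)
    error_variance = mean([se ** 2 for se in standard_errors])
    return (observed_variance - error_variance) / observed_variance

# Hypothetical data: five person measures with equal standard errors.
example_measures = [-1.5, -0.5, 0.0, 0.5, 1.5]
example_errors = [0.4, 0.4, 0.4, 0.4, 0.4]

psi = person_separation_index(example_measures, example_errors)
print(round(psi, 3))
```

The sketch does not clamp the result, so very noisy data could give a value below zero; well-separated measures with small standard errors give a value close to one.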

Item Thresholds

Item thresholds show the location on a continuum at which a person is likely to obtain a particular score. More specifically, thresholds are points between adjacent response categories where the odds of answering in either category are 1:1. With three response categories there are two thresholds and with four response categories there are three thresholds. Thresholds should be ordered in line with the ordering of the response categories, showing that the responses are answered consistently and logically (Andrich et al., 2010; RUMM 2030 Manual, 2009).
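The 1:1 odds at a threshold can be checked with a small sketch of a polytomous Rasch model, in which the log odds of adjacent categories equal the person measure minus the threshold between them. The function name and the threshold values are hypothetical, and categories are indexed from 0 here rather than scored 1 to 3.

```python
import math

def category_probabilities(person_measure, thresholds):
    """Return the probability of each response category for one item.

    thresholds: ordered threshold locations in logits; m thresholds
    define m + 1 response categories (indexed 0 to m).
    """
    exponents = [0.0]
    for threshold in thresholds:
        # Each step up a category adds (person measure - threshold).
        exponents.append(exponents[-1] + (person_measure - threshold))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical item with three categories, hence two thresholds.
item_thresholds = [-1.0, 1.0]

at_first_threshold = category_probabilities(-1.0, item_thresholds)
at_second_threshold = category_probabilities(1.0, item_thresholds)
print([round(p, 3) for p in at_first_threshold])
print([round(p, 3) for p in at_second_threshold])
```

At each threshold the two adjacent categories are equally probable, which is the 1:1 odds condition in the definition above.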

Standardised Residual

Residuals are differences between the expected response according to the Rasch measurement model and the actual response. The standardised residual is the residual divided by its standard deviation. When the data fit the Rasch measurement model and there are many standardised residuals, their mean should be close to zero and their standard deviation should be close to one (Andrich et al., 2010; RUMM 2030 Manual, 2009).
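A minimal sketch of this calculation follows, assuming the simplest case of a right/wrong item under the dichotomous Rasch model, where the expected response is the probability of success and its variance is that probability times its complement. The function name and values are hypothetical; items with more than two categories would need the expected score and variance computed over all categories.

```python
import math

def standardised_residual(observed, person_measure, item_difficulty):
    """Standardised residual for one response to a dichotomous item.

    observed: 1 for a correct response, 0 for an incorrect one.
    """
    expected = 1.0 / (1.0 + math.exp(-(person_measure - item_difficulty)))
    model_variance = expected * (1.0 - expected)
    return (observed - expected) / math.sqrt(model_variance)

# Person located exactly at the item difficulty: expected response is 0.5.
z_correct = standardised_residual(1, 0.0, 0.0)
z_incorrect = standardised_residual(0, 0.0, 0.0)

# An unexpected failure by a much more able person gives a large negative value.
z_surprise = standardised_residual(0, 3.0, 0.0)
print(z_correct, z_incorrect, round(z_surprise, 2))
```

Large positive or negative values flag responses that the model did not expect, which is why their overall mean and spread are used as a check on fit.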

Response Category Curves

Response Category Curves show whether items have been answered logically and consistently. The actual curve that is produced shows the relationship between the probabilities of answering each category in relation to the specific measure. For example, the ideal curve for an item with three response categories shows that when the measure is low, the probability is high that the participant response is low (category one). As the measure increases, the probability of answering category one decreases and the probability of answering category two increases. As the measure increases further still, the probability of answering category two decreases and the probability of answering category three increases (Andrich et al., 2010; RUMM 2030 Manual, 2009).

Differential Item Functioning

Differential Item Functioning refers to items that give different success rates for two or more groups, at the same ability level (Holland & Wainer, 1993). Masters (1988a) states that item bias occurs if an item's estimated difficulty is significantly greater when calibrated on one sub-group than when calibrated on the other, resulting in the item being considered 'biased' with respect to those two sub-groups. In other words, test bias can occur when the test requires different information or knowledge than that being tested, causing test scores to be less valid for a particular group of test-takers (see also Penfield & Lam, 2000).

Item Characteristic Curves

Item Characteristic Curves show how well the items differentiate between persons with differing measures. An ogive curve (see Figure 1.1) is produced for each item, showing the relationship between the expected response score and the particular measure (Andrich et al., 2010; RUMM 2030 Manual, 2009).
