The following definitions explain the meaning of the terms as used in the context of the present study.
Bilingualism
The term ‘bilingualism’ is used to describe the use of two languages by individual students, regardless of the students’ fluency in each language.
Bilingual Education
Bilingual education is defined as education that aims to promote bilingual competence by using both languages as media of instruction for significant portions of the academic curriculum, and it can be characterised by three features: linguistic goals, pedagogical approaches, and levels of schooling. More specifically, for the purpose of the present study, it is defined as the teaching of English using a combination of English and Bahasa Indonesia, with each language used for an equal proportion of instructional time.
Bilingual Classroom Setting
A bilingual classroom setting is an English language classroom in which English is used as the medium of instruction, whereas Bahasa Indonesia is used as the medium of instruction in all other classes.
Monolingual Education
Monolingual education takes place in a monolingual classroom setting, that is, an English language classroom in which Bahasa Indonesia is used as the medium of instruction to explain English words, sentences, stories and context. More specifically, for the purpose of the present study, monolingual education is defined as the teaching of English in which Bahasa Indonesia is used for most of the instructional time and English for only a small proportion of it.
English Reading Comprehension
English Reading Comprehension is the ability of second language readers to obtain meaning from texts by actively using both lower-order and higher-order skills to decode the smaller elements of a text and construct its meaning. By drawing on their schemata (prior or background knowledge), readers are able to understand the main idea, sequence events in order, and obtain detailed information. In this study, English reading comprehension represents student English achievement and is based on Bloom’s Taxonomy.
English Reading Comprehension Test
The English Reading Comprehension Test used in the present study consists of 12 multiple-choice items and three written items, which the students answer after reading a given English text (see the full test in Chapter Three). A scoring rubric was designed specifically for the test, in which scores are ordered by item difficulty for the multiple-choice items and by quality for the written answers (see the scoring rubric in Chapter Three). This scoring is consistent with Rasch measurement principles and is used to create a linear, unidimensional measure of English Reading Comprehension.
English Text Writing
English Text Writing is the ability of second language writers to produce written texts by actively using the writing strategies, techniques and skills they have acquired during their English language instruction, whether in bilingual or monolingual classes. In this study, English text writing represents student English achievement and is based on Bloom’s Taxonomy (Bloom et al., 1956).
English Text Writing Test
The English Paragraph Writing Test consists of two compulsory topics on which the students are asked to write several paragraphs in English (see the full test in Chapter Three). A scoring rubric was designed specifically for this test, in which each of three aspects of writing is scored in order of quality (see Chapter Three). This scoring is in line with Rasch measurement principles and is used to create a linear, unidimensional measure of English Writing Quality.
Attitude and Behaviour with regard to Learning English
The Attitude and Behaviour with regard to Learning English Questionnaire (see Appendix E) contains 18 items: three on tasks for listening, three on tasks for speaking, three on tasks for reading, three on tasks for writing, three on tasks for student/student relationships, and three on tasks for student/teacher relationships. Each item was answered from two perspectives: attitude (‘ideally, this is what I think should happen’), conceived as the easier perspective, and behaviour (‘this is what actually happened’), conceived as the harder perspective. The full questionnaire is given in Chapter Three. Responses were scored 1 for ‘never or rarely’, 2 for ‘some of the time’, and 3 for ‘most or all of the time’. This ordered scoring is in line with Rasch measurement principles and was used to create a linear, unidimensional measure.
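The scoring rule above can be sketched as follows. This is a minimal illustration only: the response labels and their scores come from the questionnaire definition, but the data layout (one pair of attitude and behaviour responses per item) and the function name `score_respondent` are hypothetical. The sketch computes raw totals; under the Rasch model the raw total is the sufficient statistic from which a person's logit measure is estimated, so the total itself is not yet the linear measure.

```python
# Hypothetical sketch: raw scoring of one respondent on the 18-item questionnaire.
# Assumed layout: a list of (attitude_response, behaviour_response) pairs, one per item.

RESPONSE_SCORES = {
    "never or rarely": 1,
    "some of the time": 2,
    "most or all of the time": 3,
}

N_ITEMS = 18  # six aspects x three items each


def score_respondent(answers):
    """Return (attitude_total, behaviour_total, raw_total) for one respondent."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(answers)}")
    attitude = sum(RESPONSE_SCORES[a] for a, _ in answers)
    behaviour = sum(RESPONSE_SCORES[b] for _, b in answers)
    return attitude, behaviour, attitude + behaviour


# Example: a respondent who thinks every task should happen 'most or all of the time'
# but reports that it actually happened only 'some of the time'.
answers = [("most or all of the time", "some of the time")] * N_ITEMS
print(score_respondent(answers))  # (54, 36, 90)
```

With 36 scored responses per person, raw totals range from 36 to 108; the Rasch analysis, not this sum, places persons on the linear scale.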
Middle School
Middle school in Aceh is the level after primary school. Middle school students are 12 to 15 years old. Middle school lasts three years, and English classes begin in the first semester of the first year at this level.
True Score Theory Measurement
True Score Theory is an approach to measuring variables in the social sciences (including education) which holds that the observed total score obtained by a person on a set of test or questionnaire items is made up of a ‘true score’ and a random error score. The scale created by True Score Theory does not have equal units of measurement and is therefore non-linear. That is, the difference between, for example, 50% and 60% does not represent the same amount of difference in the underlying variable as the difference between 70% and 80%. True Score Theory scores are commonly considered to have at least six problems: (1) they are non-linear; (2) they are multidimensional, containing ‘noise’; (3) the item difficulties are not ordered; (4) person ‘measures’ and item difficulties are not located on the same scale; (5) the ‘measures’ are test (item content) dependent; and (6) the ‘measures’ from different tests, even on the same topic, cannot be validly added or linked onto a single scale (see Michel, 1990, 1999; Smith, 1996; Waugh & Chapman, 2005).
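The non-linearity of percentage scores can be illustrated with a short calculation. The sketch below assumes the simplest possible case, a test of dichotomous items that all have the same difficulty; under the Rasch model the measure for such a test is then the log odds of the proportion correct, ln(p / (1 - p)), shifted by a constant (the common item difficulty), so differences between measures are unaffected by that constant. The helper `percent_to_logit` is hypothetical.

```python
import math


def percent_to_logit(percent):
    """Log odds of success for a percentage correct (assumes 0 < percent < 100)."""
    p = percent / 100.0
    return math.log(p / (1.0 - p))


# Equal ten-point steps in percentage correct, compared on the logit scale.
for low, high in [(50, 60), (70, 80), (80, 90)]:
    gap = percent_to_logit(high) - percent_to_logit(low)
    print(f"{low}% -> {high}%: {gap:.3f} logits")

# Output:
# 50% -> 60%: 0.405 logits
# 70% -> 80%: 0.539 logits
# 80% -> 90%: 0.811 logits
```

Equal steps in percentage correct correspond to increasingly large steps on the logit scale as scores move away from 50%, which is the sense in which percentage scores are non-linear.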
Rasch Measurement
In Rasch measurement, items are ordered from easy to hard on a continuum, and their difficulties are calculated on a linear scale (a log odds scale). Person measures are calculated on the same linear scale. An important point is that, when the data fit a Rasch measurement model, person measures and item difficulties can be calibrated together in such a way that the estimates are freed from the distributional properties of the incidental parameter, because of the mathematics of the measurement model. This means that ‘scale-free’ measures and ‘sample-free’ item difficulties can be estimated, creating a mathematically objective linear scale with standard units. These standard units are called logits (the log odds of successfully answering the items) (adapted from Waugh, 2003, 2005, 2010a, 2010b).
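For dichotomous items, the Rasch model gives the probability of success as P = exp(β - δ) / (1 + exp(β - δ)), where β is the person measure and δ the item difficulty, both in logits. The sketch below, using hypothetical persons and items, shows the property behind sample-free comparison: the difference in log odds between two persons is the same on every item (β_a - β_b), whatever the item's difficulty. The function names are illustrative only.

```python
import math


def rasch_probability(beta, delta):
    """P(success) under the dichotomous Rasch model (beta, delta in logits)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def log_odds(p):
    """Convert a probability to log odds (logits)."""
    return math.log(p / (1.0 - p))


beta_a, beta_b = 1.5, 0.5       # two hypothetical persons, 1 logit apart
for delta in (-1.0, 0.0, 2.0):  # three hypothetical items of different difficulty
    diff = log_odds(rasch_probability(beta_a, delta)) - log_odds(rasch_probability(beta_b, delta))
    print(f"item delta={delta:+.1f}: log-odds difference = {diff:.3f}")
```

Each line reports a difference of 1.000 logits: the comparison of the two persons does not depend on which item is used.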
Unidimensionality of Variables
In the present study, unidimensional measures are created for English Writing, English Reading Comprehension, and Attitude and Behaviour with regard to Learning English. These measures involve a variety of aspects, including lower-order thinking (such as knowing facts and basic comprehension), higher-order thinking (such as analysis, synthesis and evaluation), lower-order and higher-order attitudes, and physical dexterity; in this conceptual sense, they cannot be unidimensional. In Rasch measurement, however, unidimensional means that a single parameter (the person measure) can be estimated for each person that applies to all of the scale items, that a single parameter (the item difficulty) can be estimated for each item that applies to all of the persons measured on the same scale, and that these parameters can be used to predict accurately each person’s response to each item.
Person Separation Index
The Person Separation Index is an index, ranging from 0 to 1, that shows the proportion of the observed variance in person measures that is considered to be true variance. A high value indicates that respondents’ measures of ability or preference are well separated along the scale relative to the errors of measurement. It is “structured as the ratio of estimated observed variance among persons, using estimates of their locations (measures) and the standard errors of these locations (measures)” (Andrich & van Schoubroeck, 1989, p. 483). The Person Separation Index is interpreted in the same way as Cronbach’s alpha, which measures the internal consistency reliability of non-linear scales (Cronbach, 1951).
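The ratio described above can be computed as in this minimal sketch, which assumes the common formulation PSI = (observed variance of person measures - mean squared standard error) / observed variance of person measures. The function name and example values are hypothetical, and the use of the population variance is an assumption; software such as RUMM 2030 may differ in detail.

```python
from statistics import fmean, pvariance


def person_separation_index(measures, standard_errors):
    """Proportion of observed person variance that is not error variance (assumed formulation).

    measures: person locations in logits; standard_errors: their standard errors.
    """
    observed_variance = pvariance(measures)                     # spread of estimated person measures
    error_variance = fmean(se ** 2 for se in standard_errors)   # average squared standard error
    return (observed_variance - error_variance) / observed_variance


# Hypothetical example: five persons, each measured with a standard error of 0.5 logits.
psi = person_separation_index([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5)
print(f"PSI = {psi:.3f}")  # PSI = 0.875
```

Here 87.5% of the observed variance among the five persons is treated as true variance; larger standard errors or a narrower spread of measures would lower the index.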
Item Thresholds
Item thresholds show the locations on the continuum at which a person becomes more likely to obtain a given score than the score immediately below it. More specifically, thresholds are the points between adjacent response categories at which the odds of answering in either category are 1:1. With three response categories there are two thresholds, and with four response categories there are three thresholds. Thresholds should be ordered in line with the ordering of the response categories; ordered thresholds show that the response categories have been used consistently and logically (Andrich et al., 2010; RUMM 2030 Manual, 2009).
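The 1:1 property of thresholds can be checked numerically. This sketch assumes the polytomous Rasch model in its partial credit form, with absolute threshold locations τ_1 ... τ_m, so that the probability of category x is proportional to exp of the sum over k ≤ x of (β - τ_k). Categories are indexed 0, 1, 2 here; for the questionnaire they would correspond to scores 1, 2 and 3. The threshold values and function names are hypothetical.

```python
import math


def category_probabilities(beta, thresholds):
    """Category probabilities (0..m) for a polytomous Rasch item with absolute thresholds."""
    log_numerators = [0.0]  # category 0 has an empty sum
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (beta - tau))
    numerators = [math.exp(v) for v in log_numerators]
    total = sum(numerators)
    return [n / total for n in numerators]


def thresholds_ordered(thresholds):
    """True if each threshold lies above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


thresholds = [-1.0, 1.0]  # hypothetical item with three response categories
for tau in thresholds:
    # At a threshold, the two adjacent categories are equally likely (odds 1:1).
    print([round(p, 3) for p in category_probabilities(tau, thresholds)])
print(thresholds_ordered(thresholds), thresholds_ordered([0.8, -0.4]))

# Output:
# [0.468, 0.468, 0.063]
# [0.063, 0.468, 0.468]
# True False
```

A pair such as [0.8, -0.4] would be reported as disordered, which is the pattern that signals inconsistent use of the response categories.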
Standardised Residual
Residuals are the differences between the actual responses and the responses expected under the Rasch measurement model. A standardised residual is a residual divided by its standard deviation (the square root of the model variance of the response). When the data fit the Rasch measurement model, a large set of standardised residuals should have a mean close to zero and a standard deviation close to one (Andrich et al., 2010; RUMM 2030 Manual, 2009).
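The fit check described above can be illustrated by simulation. This sketch assumes dichotomous items, for which the expected response is the Rasch probability P and its variance is P(1 - P). It generates hypothetical data from the model itself and, as a simplification, computes residuals from the generating parameters rather than from estimated ones; the seed, sample sizes and function names are arbitrary choices for illustration.

```python
import math
import random
from statistics import fmean, pstdev


def rasch_probability(beta, delta):
    """P(success) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def standardised_residual(observed, beta, delta):
    """(observed - expected) / sqrt(model variance) for a 0/1 response."""
    expected = rasch_probability(beta, delta)
    variance = expected * (1.0 - expected)
    return (observed - expected) / math.sqrt(variance)


random.seed(2010)
persons = [random.gauss(0.0, 1.0) for _ in range(500)]   # hypothetical person measures
items = [-2.0 + 4.0 * i / 19 for i in range(20)]          # 20 hypothetical items from -2 to +2

residuals = []
for beta in persons:
    for delta in items:
        # Simulate a response that fits the model by construction.
        observed = 1 if random.random() < rasch_probability(beta, delta) else 0
        residuals.append(standardised_residual(observed, beta, delta))

print(f"mean = {fmean(residuals):.3f}, SD = {pstdev(residuals):.3f}")
```

Because the simulated data fit the model, the printed mean is close to 0 and the standard deviation close to 1; real data that misfit would tend to depart from these values.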
Response Category Curves
Response Category Curves show whether items have been answered logically and consistently. The curves show the probability of answering in each response category as a function of the person measure. For example, the ideal curves for an item with three response categories show that when the measure is low, the probability of a low response (category one) is high. As the measure increases, the probability of answering in category one decreases and the probability of answering in category two increases. As the measure increases further still, the probability of answering in category two decreases and the probability of answering in category three increases (Andrich et al., 2010; RUMM 2030 Manual, 2009).
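The ideal pattern just described can be traced numerically. This sketch again assumes the polytomous Rasch model in partial credit form with two hypothetical, ordered thresholds at -1 and +1 logits, and reports categories as 1, 2 and 3 to match the questionnaire scoring; it tabulates the curves at a few measures rather than plotting them.

```python
import math


def category_probabilities(beta, thresholds):
    """Category probabilities for a polytomous Rasch item with absolute thresholds."""
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (beta - tau))
    num = [math.exp(v) for v in log_num]
    total = sum(num)
    return [n / total for n in num]


THRESHOLDS = [-1.0, 1.0]  # hypothetical, ordered thresholds

print("measure  P(cat1)  P(cat2)  P(cat3)  most likely")
for beta in [-3.0, -1.5, 0.0, 1.5, 3.0]:
    probs = category_probabilities(beta, THRESHOLDS)
    modal = probs.index(max(probs)) + 1  # report categories as 1, 2, 3
    print(f"{beta:+7.1f}  " + "  ".join(f"{p:7.3f}" for p in probs) + f"  {modal}")
```

The most likely category moves from 1 to 2 to 3 as the measure rises, the probability of category one falls steadily, the probability of category three rises steadily, and category two peaks in between.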
Differential Item Functioning
Differential Item Functioning (DIF) refers to items that give different success rates for two or more groups of persons at the same ability level (Holland & Wainer, 1993). Masters (1988a) states that item bias occurs if an item’s estimated difficulty is significantly greater when calibrated on one sub-group than when calibrated on the other, in which case the item is considered ‘biased’ with respect to those two sub-groups. More generally, test bias can occur when a test requires information or knowledge other than that being tested, making test scores less valid for a particular group of test-takers (see also Penfield & Lam, 2000).
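Masters’s comparison of sub-group calibrations can be made concrete with a simple significance test on the difference between two difficulty estimates. The sketch below is a common textbook formulation and not necessarily the procedure used in this study (RUMM 2030, for example, examines DIF through an analysis of variance of residuals). It assumes both calibrations are expressed on a common logit scale; the estimates, standard errors, function names and the 5% criterion are all hypothetical.

```python
import math
from statistics import NormalDist


def dif_z(delta_1, se_1, delta_2, se_2):
    """Standardised difference between an item's difficulty in two sub-groups."""
    return (delta_1 - delta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


def dif_p_value(z):
    """Two-tailed p-value under a standard normal reference distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical item: harder for sub-group 1 (0.40 logits) than for sub-group 2 (-0.30 logits).
z = dif_z(0.40, 0.15, -0.30, 0.18)
p = dif_p_value(z)
flagged = p < 0.05  # assumed significance criterion
print(f"z = {z:.2f}, p = {p:.4f}, flagged as DIF: {flagged}")
```

Here the difference of 0.70 logits is about 2.99 standard errors, so this hypothetical item would be flagged as functioning differently for the two sub-groups.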
Item Characteristic Curves
Item Characteristic Curves show how well the items differentiate between persons with differing measures. An ogive curve (see Figure 1.1) is produced for each item, showing the relationship between the expected response score and the person measure (Andrich et al., 2010; RUMM 2030 Manual, 2009).
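For a dichotomous item, the Rasch item characteristic curve is the expected score P(β) = exp(β - δ) / (1 + exp(β - δ)). The sketch below, for a hypothetical item with δ = 0.5, traces this ogive together with its slope P(1 - P), which indicates how sharply the item differentiates between persons: the slope is largest (0.25) where the person measure equals the item difficulty. Polytomous items have an analogous curve running from the lowest to the highest category score. The function names are illustrative only.

```python
import math


def expected_score(beta, delta):
    """Expected score on a dichotomous Rasch item (the item characteristic curve)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def icc_slope(beta, delta):
    """Slope of the curve at beta, equal to P(1 - P) for the Rasch model."""
    p = expected_score(beta, delta)
    return p * (1.0 - p)


delta = 0.5  # hypothetical item difficulty in logits
for beta in [-2.5, -0.5, 0.5, 1.5, 3.5]:
    print(f"measure {beta:+.1f}: expected score {expected_score(beta, delta):.3f}, "
          f"slope {icc_slope(beta, delta):.3f}")
```

The expected score rises in an S-shape from near 0 to near 1, and the item discriminates best among persons whose measures lie close to its difficulty.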