The items were written in English because English was the medium of instruction in the classroom. I have been teaching the pre-university programme for at least 25 years and am therefore proficient enough in the language to construct the items. The initial instrument comprised five questions. To establish content validity, the instrument was sent to three experts in the area, two local and one international (Appendix E), who reviewed the items for relevance and concordance with the content domain and syllabus. Adjustments were made based on the feedback from the experts:
1) The experts proposed a more specific title for the survey than ‘Cross Sectional Survey 1’. It was therefore changed to Mathematical Visuality Test, since the purpose of the instrument is to examine the students’ mathematical visuality when answering mathematical word problems.
2) The words and terminology used in the instruction section were to be as simple as possible so that students would be able to understand and follow the instructions.
3) Items 2 and 3 of the original instrument were advised to be replaced, since students were still able to solve for the answers using the information given, without having to sketch any graph.
4) In item 5 of the original instrument, the instruction to find dA/dt was proposed to be excluded, since it would indirectly lead students to solve the item using algebraic manipulation.
The instrument was pilot-tested with the same 50 students who took the VRUL test. The students’ worked solutions were marked using the final rubric to increase the consistency of scoring (Moskal & Leydens, 2000). The rubric was based on the categories listed under the encoding process of the framework for assessing visual reasoning, as shown in Table 3.3. The students’ work was then checked and points were assigned according to the respective category. Frequencies, percentages, means and standard deviations were then calculated for all parts of the items. As a precaution and to refine the rubric, the work of five students was selected as ‘anchor papers’: sets of scored solutions that reflect a variety of solutions to the items and different aspects of the rubric. Minor adjustments were made to the rubric based on the ‘anchor papers’.
Table 3.3: The rubric for the MVT
Point Code Description
6 CGCS Correct graph with correct solution
- Produces a correct graph to explain and represent the solution and arrives at the correct solution.
5 CGIS Correct graph with incorrect solution
- Produces a correct graph to explain and represent the solution but does not arrive at the correct solution.
4 IGCS Incorrect graph with correct solution
- Produces an incorrect graph to explain and represent the solution and arrives at a correct solution based on the incorrect graph. Solutions may differ from the original solution set.
3 IGIS Incorrect graph with incorrect solution
- Produces an incorrect graph to explain and represent the solution and does not arrive at the correct solution.
2 NGCS No graph with correct solution
- Produces no graph to explain and represent the solution but arrives at the correct solution.
1 NGIS No graph with incorrect solution
- Produces no graph at all to explain and represent the solution and does not arrive at the correct solution.
0 NA No answer / Not attempted
- Leaves the item unattempted: no graph or algebraic solution.
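The rubric in Table 3.3 and the descriptive statistics computed from it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels "correct", "incorrect" and "none", the function names `score` and `summarise`, and the use of the sample standard deviation are all hypothetical choices and are not taken from the study.

```python
from collections import Counter
from statistics import mean, stdev

# Table 3.3 as a lookup: (graph status, solution status) -> (point, code).
# The status labels are illustrative assumptions, not the study's wording.
RUBRIC = {
    ("correct", "correct"): (6, "CGCS"),
    ("correct", "incorrect"): (5, "CGIS"),
    ("incorrect", "correct"): (4, "IGCS"),
    ("incorrect", "incorrect"): (3, "IGIS"),
    ("none", "correct"): (2, "NGCS"),
    ("none", "incorrect"): (1, "NGIS"),
    ("none", "none"): (0, "NA"),
}


def score(graph, solution):
    """Return the (point, code) pair for one worked solution."""
    return RUBRIC[(graph, solution)]


def summarise(points):
    """Frequencies, percentages, mean and sample SD of the points for one item."""
    n = len(points)
    freq = Counter(points)
    pct = {p: 100 * c / n for p, c in freq.items()}
    # stdev() is the sample standard deviation (assumption); it needs n >= 2.
    return freq, pct, mean(points), stdev(points)


# Made-up points for four students on one item.
freq, pct, avg, sd = summarise([6, 6, 5, 3])
```

Keeping the rubric as a single lookup table means that any adjustment prompted by the anchor papers would be made in one place.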
when it is being marked and who marked the test (Noraini, 2010). Two local experts in the area and subject were assigned to mark the students’ worked solutions, and their scores were subjected to inter-rater reliability analysis. Noraini (2010) defines inter-rater reliability as the consistency of scores given by two independent experts or raters, which results from a well-constructed rubric and clear scoring criteria for each level. The overall reliability of the MVT, measured with Cohen’s Kappa on the scores of the two experts, was 0.94, which indicates that the MVT was reasonably reliable for the study. Although scoring rubrics may not completely eliminate variation among raters, they do reduce the occurrence of discrepancies. The main objective is for the raters to arrive at the same score for the same student.
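For reference, Cohen’s Kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of the unweighted form. Whether the study used a weighted or an unweighted variant is not stated, so this choice, the function name `cohen_kappa` and the example scores are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two raters scoring the same solutions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    # Observed agreement: share of solutions given the same point by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if p_e == 1:
        # Both raters used one identical category throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up rubric points from two raters for six worked solutions.
kappa = cohen_kappa([6, 6, 5, 3, 2, 0], [6, 5, 5, 3, 2, 0])
```

Because the rubric points are ordered, a weighted Kappa that penalises a 6-versus-5 disagreement less than a 6-versus-0 disagreement would also be a defensible choice.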
An item analysis was performed on the results of the pilot test. Items whose difficulty index or discriminant index fell outside the range of 0.2 to 0.8 (Singh, 2012) were modified. The difficulty index indicates the proportion of students who were able to solve each item correctly. These values help to identify how vague or complex each item was for the majority of the students (Kaplan & Saccuzzo, 2005). The discriminant index determines whether a student who did well on one part or item also performed well on the whole set of items. These values help to differentiate students of varying ability in terms of the subject content. Items that caused confusion among the students were re-worded or reviewed for clarity (Ghadi, Abu Bakar & Alwi, 2013). The final instrument had levels of difficulty ranging from 0.6 to 0.97 and levels of discrimination ranging from 0.66 to 0.89.
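The two indices can be illustrated with a short Python sketch. The text does not specify how the discriminant index was computed, so the upper-lower group method with 27% groups used here is an assumption, as is the reduction of each response to correct (1) or incorrect (0). All function names and data are hypothetical.

```python
def difficulty_index(item_correct):
    """Proportion of students who solved the item correctly (1) versus not (0)."""
    return sum(item_correct) / len(item_correct)


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower group index: p(correct | top group) - p(correct | bottom group).

    Groups are the top and bottom `fraction` of students ranked by total score
    (the 27% split is an assumed convention, not taken from the study).
    """
    n = len(item_correct)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


def needs_revision(p, d, low=0.2, high=0.8):
    """Flag an item whose difficulty p or discrimination d is outside [low, high]."""
    return not (low <= p <= high and low <= d <= high)


# Made-up data: one item, ten students, with their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
p = difficulty_index(item)              # 0.5
d = discrimination_index(item, totals)  # about 0.67
```

An item-total correlation would be an equally reasonable way to express discrimination; the group-difference form is shown only because it is simple to compute by hand.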
3.4.3 Graph Reasoning Test (GRT)