Chapter I: Conceptual Clarifications
I.5. Knowledge as a Strategic Resource
I.5.1. Intellectual Capital
CBL systems usually include formative assessment units to estimate a learner's level of understanding of a topic. For automatic formative assessment in particular, MC tests are preferred because they are easy to administer and mark, and they can cover a wide range of topics. By contrast, it is difficult to design software that automatically marks essay-type questions. However, MC tests are not sufficient to assess the level of understanding precisely. The interface bandwidth is limited: it only allows users to select one or more answers. The following paragraphs discuss such criticisms and potential solutions in detail.
An MC test item consists of three parts: a question (stem), the correct answer (key) and the other options (distracters). The potential of MC tests in educational testing is widely discussed in the educational psychology and measurement literature. In the late 1990s, researchers in educational assessment regarded them as a type of objective test used for measuring knowledge and skills (Ben-Simon et al. 1997). More recently, Simkin et al. (2005) have listed more than 15 advantages of using them. However, the educational psychology literature raises three main criticisms of their use for assessment (Simkin et al. 2005):
• They are not suitable for assessing higher-level cognitive abilities such as synthesising or evaluating (in Bloom’s taxonomy (1956)).
• They are not reliable, as guessing may still play a role.
• They cannot measure partial knowledge.
Each criticism will be considered in turn. The first one claims that MC tests can assess only the ability to recognise or recall, rather than the ability to apply, analyse, synthesise or evaluate, the latter tasks being more demanding than the former according to Bloom’s taxonomy (Bloom 1956). A related criticism is that MC tests lack structural fidelity, defined as “the congruence between performance called upon by the test and proficient performance in the referent domain” (Simkin et al. 2005, p. 77). Although some studies by educational psychologists have demonstrated that carefully selected MC tests can be as efficient as Constructed Response (CR) tests (such as essay-type or problem-solving tests) in testing higher-level abilities, MC tests are still considered inefficient (ibid.). A recent study in the domain of computer programming languages supports this view. Simkin et al. state, “Despite research from educational psychology demonstrating the potential for MC tests to measure the same levels of students’ mastery as CR tests, recent studies in specific domain find imperfect relationship between two performance measures” (Simkin et al. 2005, p. 73).
They also maintain that in programming language instruction and similar domains CR tests measure the student’s ability to solve real world problems better than do MC tests.
The second criticism of MC tests is that answering a test question correctly does not imply that the student has mastered the concept. The result of an MC test may not be reliable, as “testwiseness” (clever format-specific strategies) may also play a role (Simkin et al. 2005). There have been a number of attempts to alleviate the effect of lucky guesses in conventional (number-correct) MC tests. Normalisation (Bush 2001) is one of the easiest correction-for-guess marking methods used in practice to smooth this effect. In normalisation, the total mark is adjusted for the marks that blind guessing would be expected to earn. In one-in-four MC tests (four options with one correct answer), the probability of a correct guess is 1/4. Therefore, for n test items with x correct, the corrected mark is (x - n/4) × 4/3. In correction-for-guess, a penalty of 1/3 of a mark is imposed for each incorrect answer. Apart from these methods, allowing multiple correct answers in a test also reduces the effect of guessing considerably (Bush 2001). Correction-for-guess marking methods, however, have been severely criticised by researchers because they penalise candidates whose unsuccessful guesses were based on partial knowledge (informed guesses rather than blind ones). The next section discusses a number of research attempts that address the third criticism by giving reasonable credit for partial knowledge in MC tests.
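The two corrections described above can be written out as a short Python sketch. The function names are illustrative, and the `options` parameter that generalises the rule beyond four options is an assumption; the text only states the one-in-four case.

```python
from fractions import Fraction


def normalised_score(correct: int, total: int, options: int = 4) -> Fraction:
    """Normalisation: remove the marks expected from blind guessing, then rescale.

    For four options this is (x - n/4) * 4/3, as given in the text.
    The generalisation to other option counts is an assumption.
    """
    expected_by_chance = Fraction(total, options)  # n/4 in the one-in-four case
    return (correct - expected_by_chance) * Fraction(options, options - 1)


def penalty_score(correct: int, incorrect: int, options: int = 4) -> Fraction:
    """Correction-for-guess: deduct 1/3 of a mark per incorrect answer
    (1/(options - 1) in the assumed general case)."""
    return correct - Fraction(incorrect, options - 1)
```

When every item is answered, the two forms agree: with x correct out of n, x - (n - x)/3 equals (x - n/4) × 4/3. A candidate who answers only a quarter of the items correctly, the level expected from blind guessing, scores zero under both.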
2.10.1 Rewarding Partial Knowledge
Several scoring methods have been proposed to reward guesses made using partial knowledge. Ben-Simon et al. (1997) present a comparative study of different methods that account for partial knowledge. In the 1950s, Coombs et al. (1956) devised a scheme called the Elimination Testing procedure. In this procedure, students need to mark all the incorrect answers. One point is given for each wrong answer that is marked as wrong (but three points are deducted if the right answer is marked as wrong). Even after five decades, the Coombs method is considered to be more effective than many more recent methods of this kind (Bradbard et al. 2004). Bush (2001) explains his “liberal test” method (a form of subset selection) and claims it is better than many other methods. In the subset-selection method, the candidate is asked to specify a subset of options that includes the correct answer. If the subset is a singleton containing the correct answer, 3 points are awarded. When the correct answer is included, the mark decreases as the size of the subset grows. A variation of this method is series selection, where the order of preference is to be given explicitly. Higher marks are awarded when the correct answer is at the top (ibid.).
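As an illustration of the Elimination Testing rule described above, the following minimal Python sketch scores a single four-option item. The function name and the representation of options as sets of labels are assumptions made for this example; the marks (+1 per distracter marked wrong, -3 if the key is marked wrong) are those stated in the text.

```python
from __future__ import annotations


def elimination_score(marked_wrong: set[str], key: str, options: set[str]) -> int:
    """Score one item under the elimination rule described in the text."""
    unknown = marked_wrong - options
    if unknown:
        raise ValueError(f"not options of this item: {sorted(unknown)}")
    # One point for every distracter the student eliminated.
    score = len(marked_wrong - {key})
    # Three points deducted if the correct answer was eliminated.
    if key in marked_wrong:
        score -= 3
    return score
```

A student with full knowledge eliminates all three distracters and earns 3 points; one who can rule out only a single distracter still earns 1 point, which is how the scheme credits partial knowledge.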
In line with this research, Gardner-Medwin’s (1995) Confidence-Based Marking, which has been used for the last 10 years for formative assessment (and for four years for summative assessment) in a medical course at University College, London, is worth discussing further. Gardner-Medwin states, “confidence-based marking (CBM) has been known for many years to stimulate reflection and constructive thinking by students and improve both the reliability and validity of exam data in measuring partial knowledge”
(Gardner-Medwin 2005, p. 1).
He further explains how CBM works:
“Our system employs three confidence levels 1-2-3. Students are asked to rate their confidence after each time they have answered a question (True/False, MCQ, numeric, or open text) that will be marked categorically right or wrong. With low confidence they receive just 1 mark if correct and no penalty if their answer is wrong. At levels 2, 3 they receive accordingly 2 or 3 marks if correct, but increasing penalties (normally -2 and -6 marks) if wrong” (Gardner-Medwin
knowledge state of a student minutely on each item is crucial in order to provide rich item-specific feedback. However, Gardner-Medwin’s method requires the student to state their confidence on one selected answer only. The student’s knowledge state on other answers will not be explicitly identified.
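The three-level marking scheme quoted above can be sketched as a small lookup table. This is an illustration only, not Gardner-Medwin's implementation: the names are hypothetical, and the penalties are the "normal" values given in the quotation (0, -2 and -6).

```python
# Confidence level -> (mark if correct, mark if wrong), as quoted above.
CBM_MARKS = {
    1: (1, 0),
    2: (2, -2),
    3: (3, -6),
}


def cbm_mark(confidence: int, is_correct: bool) -> int:
    """Return the mark for one answer given the stated confidence level."""
    if confidence not in CBM_MARKS:
        raise ValueError("confidence must be 1, 2 or 3")
    if_correct, if_wrong = CBM_MARKS[confidence]
    return if_correct if is_correct else if_wrong


def cbm_total(responses: list[tuple[int, bool]]) -> int:
    """Sum the marks over (confidence, is_correct) pairs for a whole test."""
    return sum(cbm_mark(conf, ok) for conf, ok in responses)
```

Note that each response carries one confidence value for the single selected answer, which reflects the limitation discussed above: nothing is recorded about the student's view of the other options.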