Amendments to the test blueprint were then taken into account during item writing and development of the first draft of the I-STUDIO instrument. The draft items were developed under the supervision of the graduate advisors for the project. Candidate items were designed and developed according to published standards and best practices. Items were then organized into a draft I-STUDIO instrument, which was submitted for feedback to the same panel of expert reviewers that had reviewed the test blueprint. Minimally acceptable responses were drafted to accompany the draft instrument so that expert reviewers had a general sense of what would count as an acceptable response for each item. However, the final rubric was informed by actual student responses, so most rubric development took place after the field test data were collected.
3.3.3.1 Item design.
To achieve the goals of the I-STUDIO assessment, the tasks were written in an open-ended format (i.e., constructed-response or produce-response). According to Thorndike and Thorndike-Christ (2010), “the major advantage of the produce-response, or essay, type of question lies in its potential for measuring examinees’ abilities to organize, synthesize, and integrate their knowledge; to use information to solve novel problems; and to demonstrate original or integrative thought.” These remarks support the use of open-ended tasks, since the outcomes they describe align closely with the cognitive transfer outcomes that the I-STUDIO assessment was intended to measure.
A drawback of open-ended tasks is that content knowledge may be confounded with the ability to organize and synthesize a coherent response (R. M. Thorndike & Thorndike-Christ, 2010). While this may be true in general, the target construct here could itself be described as an interaction between content knowledge and the organization of schema, so the drawback is minimally (if at all) problematic for the goals of the I-STUDIO assessment. Another challenge presented by open-ended tasks is the time required to produce thoughtful responses. To mitigate this challenge, instructors were permitted to administer the assessment outside of class, although student fatigue remained a constraint.
3.3.3.2 Item development.
The items included in the I-STUDIO assessment were selected from a pool that included tasks adapted or adopted from published sources as well as original tasks developed by the author. Item development adhered to published guidance and best practices in the literature (e.g., AERA, APA, & NCME, 1999; Haladyna & Rodriguez, 2013; R. M. Thorndike & Thorndike-Christ, 2010). These guidelines include attention to the suitability of the content presented in each item with respect to its pertinence to the target domain, appropriate cognitive demand, and consistency of expectations among similar tasks (Haladyna & Rodriguez, 2013). Haladyna and Rodriguez (2013) also recommend that careful attention be paid to writing specific instructions that include information about the desired format of a quality response. Additionally, item development attended to cultural diversity and an appropriate level of language sophistication in order to mitigate these sources of construct-irrelevant variance (Haladyna & Rodriguez, 2013).
3.3.3.3 Draft instrument review.
The draft I-STUDIO instrument was developed under the supervision of the graduate advisors for the project. As with the test blueprint, the complete draft I-STUDIO instrument was then reviewed by the panel of expert reviewers (Appendix D: Draft I-STUDIO Version Prior to Expert Feedback). The final roster of expert reviewers for the draft I-STUDIO instrument was:
- Sanford Weisberg (University of Minnesota – Statistics Dept.)
- Roxy Peck (California Polytechnic State University – Statistics Dept.)
- Sashank Varma (University of Minnesota – Educational Psychology Dept.)
- Tim Jacobbe (University of Florida – Teaching & Learning Dept.)
- Beth Chance (California Polytechnic State University – Statistics Dept.)
- Marsha Lovett (Carnegie Mellon University – Psychology Dept.)
As with the test blueprint, reviewers were again provided a questionnaire designed to elicit specific feedback and recommendations on critical components of the draft I-STUDIO instrument (Appendix E: Expert Feedback Questionnaire Accompanying Draft I-STUDIO Assessment Tool). The instrument was updated to reflect the recommended changes before use with students, and then revised again based on observations drawn from the cognitive interview data.
3.3.3.4 Item scoring.
Since all I-STUDIO tasks were open-ended, scoring decisions required careful consideration. Depending on the nature of the task and the expectations for task performance, an open-ended task may accommodate objective as well as subjective scoring criteria. Objective scoring was used where possible in order to reduce the dependence on subject matter expertise and subjective judgments that may differ between raters or even within a rater over time (Haladyna & Rodriguez, 2013). For example, the rubrics for item 5 (matched pairs study design), item 6 (underlying principle of inference), and item 7 (inference not required) compare the response against a checklist of target characteristics. A subjective scoring approach using a pre-defined rubric was used for item 1 (ATC preparation), item 2 (note identification), item 3 (display screen inspection), and item 4 (Walleye fishing), and their constituent subtasks.
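As a rough illustration of the checklist-style objective scoring described above, the following Python sketch awards points for each target characteristic that a rater marks as present in a response. The characteristic labels, the one-point weights, and the function name `score_checklist` are hypothetical and are not taken from the I-STUDIO rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One target characteristic on an objective scoring checklist."""
    label: str
    points: int = 1


# Hypothetical checklist for a matched-pairs design task (illustration only).
MATCHED_PAIRS_CHECKLIST = [
    ChecklistItem("pairs units on a relevant characteristic"),
    ChecklistItem("randomizes treatment within each pair"),
    ChecklistItem("analyzes within-pair differences"),
]


def score_checklist(checklist, observed):
    """Return the total points for characteristics marked as present.

    `observed` is the set of checklist labels a rater found in the response.
    Labels that are not on the checklist raise an error, so a typo cannot
    silently change a score.
    """
    known = {item.label for item in checklist}
    unknown = set(observed) - known
    if unknown:
        raise ValueError(f"labels not on checklist: {sorted(unknown)}")
    return sum(item.points for item in checklist if item.label in observed)


if __name__ == "__main__":
    marked = {
        "pairs units on a relevant characteristic",
        "analyzes within-pair differences",
    }
    print(score_checklist(MATCHED_PAIRS_CHECKLIST, marked))
```

Because the score depends only on which characteristics are marked present, two raters who agree on the marks necessarily agree on the score, which is the property that motivates objective scoring where it is feasible.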
Rubrics for items scored subjectively were designed to distinguish levels of response quality (e.g., essentially correct; partially correct; incorrect). Examples were provided to describe work commensurate with each score in the rubric and to illustrate detail that is irrelevant to the target domain. Minimally acceptable responses were created to accompany the draft I-STUDIO instrument during expert review, but final item rubrics were developed and tuned using a sample of actual student responses.
The student responses used for rubric development were selected as a stratified random sample from the pool of usable responses collected in the field test. Three randomly selected students were chosen from 13 unique courses that participated in the field test for a total of 24 complete student responses. For each subtask, the 24 responses were ranked by desirability and noted for exceptional features. Themes among responses were then translated into rubric criteria for the subtask. Model responses were selected for inclusion in the rubric as exemplars of each scoring level. The draft rubric was developed under the supervision of graduate advisors to the project, and then the rubric and a small number of actual student responses were provided to a Statistics Education PhD candidate for additional feedback that was used to further refine the rubric.
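The stratified random selection described above, with courses as strata and a fixed number of students drawn from each, can be sketched in Python as follows. This is a minimal sketch rather than the procedure actually used in the study: the function name `stratified_sample`, the course labels, the response identifiers, and the pool sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(responses, per_stratum, seed=None):
    """Draw `per_stratum` responses at random from each course.

    `responses` is an iterable of (course, response_id) pairs. The result
    maps each course to a list of sampled response ids, drawn without
    replacement within the course.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    by_course = defaultdict(list)
    for course, response_id in responses:
        by_course[course].append(response_id)

    sample = {}
    for course in sorted(by_course):  # fixed order keeps seeded runs stable
        pool = by_course[course]
        if len(pool) < per_stratum:
            raise ValueError(f"{course} has only {len(pool)} usable responses")
        sample[course] = rng.sample(pool, per_stratum)
    return sample


if __name__ == "__main__":
    # Toy pool: 4 hypothetical courses with 10 usable responses each.
    pool = [(f"course_{c}", f"c{c}_s{s}") for c in range(4) for s in range(10)]
    for course, ids in stratified_sample(pool, per_stratum=3, seed=1).items():
        print(course, ids)
```

Sampling within each course, rather than from the pooled responses, guarantees that every participating course contributes to rubric development regardless of its enrollment.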