
such, relies on evidence of observed performance. Observation of professional performance in real classrooms, with real students, is the most direct approach to determining professional competence (Kane, 1992, p. 172), and the extrapolation inference is likely to have high fidelity. However, as is the case with all assessment methods, the inferences made from performance assessments may still be problematic. In several articles, Kane (2004, 1999, 1992) examines the chain of inferences from evaluation to extrapolation for performance assessment and warns that measures must be taken to protect against challenges in the evaluation and generalization inferences. In the validation of performance assessments of professional competence, “extrapolation is usually the strongest link . . . evaluation can be a problem, and generalization is almost always a problem” (Kane, Crooks, & Cohen, 1999, p. 173). It is important to note that, when conducting a validation inquiry using ABV, seeking out the weakest links in the IA is vital. If the validity challenges discovered in the validation argument can be addressed, these inferences can be improved and strengthened. Kane describes how the weak links in the IA are central to the VA in this way:

It is convenient, but ultimately misguided, for advocates of performance testing or high-fidelity simulations to ignore issues of generalizability and scoring problems, just as it is convenient but misguided for advocates of objective testing to focus their attention on the objectivity of scoring and on the generalizability of the resulting scores. We all like good news and feel some inclination to shoot the bearer of bad tidings. But in evaluating assessment procedures, it is important to play devil’s advocate. Claims about the validity of performance tests and high-fidelity simulations cannot be accepted without evidence indicating that the scoring is defensible and that the results are generalizable, no matter how realistic, natural, or authentic the assessment. (Kane, 1992, p. 181)

It is particularly important not to conflate what Sackett calls “two distinct problems: evaluating a person and evaluating a work product” (Sackett, 1998, p. 118). To avoid this problem, one of the first links in the ABV chain of inferences is evaluation.

Evaluation:

Kane notes that “the assignment of scores to performances in real practice settings involves some serious problems” (1992, p. 172). First, evaluation requires consensus with regard to “best practices” and pedagogy, and where experts disagree there are likely to be errors of measurement and a potential for incoherence in the definition of the operationalized construct. Scorer disagreement about the action a candidate should have taken is likely to cause difficulty when scoring “quality.” The advantage of the TPA as an assessment of teacher readiness is that it asks for a sample of teaching practice in the context of actual teaching. The TPA tasks ask candidates to work in complex, realistic situations. However, these classroom-based settings demand the most of scorer judgment and “pose the greatest difficulties in evaluating performance” (p. 172). Inter-rater reliability is a prerequisite for trustworthy scores, and this requires intense, costly, and time-consuming scorer training and monitoring. Similarly, variability in the authentic settings that different candidates face when sitting the same assessment makes it necessary to create more generalized criteria that can account for a “wide range of situations that may arise in actual practice” (p. 172). The more general the criteria, the more judgment is involved, both for the candidate in interpreting those criteria and for the raters in scoring the sample using those criteria. Specific rubrics and better-trained, more experienced raters reduce bias and subjectivity, but Kane notes that “these problems cannot be completely eliminated” (p. 172).
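As an illustration of how inter-rater reliability might be quantified, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic for two raters. The source does not name a particular statistic, so the choice of kappa, the function name, and the rubric scores are all assumptions made for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same samples."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of samples.")
    n = len(rater_a)
    # Proportion of samples on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric levels (1-4) assigned by two raters to eight work samples.
scores_rater_a = [3, 3, 2, 4, 1, 3, 2, 4]
scores_rater_b = [3, 2, 2, 4, 1, 3, 3, 4]
kappa = cohens_kappa(scores_rater_a, scores_rater_b)
print(f"kappa = {kappa:.3f}")
```

In this made-up case the raters agree on six of eight samples (75 percent), but kappa is lower than the raw agreement rate because some of that agreement would be expected by chance alone.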

Generalization:

One area in which critics and proponents agree about performance assessment is that it is expensive to administer and to score. As stated before, the strength of the TPA is that it is an authentic assessment. Unfortunately, observing performance in actual classroom settings is inconvenient, time-consuming, and expensive. These factors influence the generalizability of scores. Generalizability inferences are influenced by the samples of performance because they are typically (a) rather short in length or small in number, (b) collected over a limited period of time, and (c) limited to the number of contextual factors present in that given sample. In addition, because the time spent scoring observed performances is extensive, the number of raters is typically low (one or two). At the same time, the level of control candidates have over the situation in which they will be evaluated is often very limited (by placement location, curriculum taught as determined by the district or test due date for the learning segment, and classroom population). Candidates are often placed in a student teaching classroom not of their selection but based on the availability of the mentor teacher, school administrator, and school district. Allowing for the level of variation in the contexts in which two otherwise equal candidates will complete the TPA requires that the rubric criteria used to evaluate each candidate be general enough to account for those variations. The very need for this level of generalizability will create “substantial errors of measurement” (p. 172).
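One way to see why short samples and one or two raters limit generalizability is the Spearman-Brown formula, which projects the reliability of an average over k observations from the reliability of a single observation. The source does not use this formula; the sketch below is an illustration that assumes the observations are parallel (interchangeable and equally reliable), and the single-observation reliability of 0.5 is a hypothetical value.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the mean of k parallel observations."""
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("Reliability must lie between 0 and 1.")
    if k <= 0:
        raise ValueError("The number of observations must be positive.")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Hypothetical reliability of a score from one rater on one lesson segment.
single = 0.5
for k in (1, 2, 4, 8):
    print(f"{k} observation(s): projected reliability = {spearman_brown(single, k):.3f}")
```

Under these assumptions, moving from one observation to two raises the projected reliability from 0.5 to about 0.67, and further gains require progressively more observations, which is exactly where the cost of scoring performance samples becomes a constraint.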

Adding to the problematic nature of generalizability inferences in performance assessment is the concern that the sample may not be representative (or reproducible). The logistics of student teaching dictate that the choice of encounters for candidates is determined by their placement, which is determined by the university they attend and the district where they are placed. Taken together, the validity challenges around generalization likely constitute a weak link in the TPA because it is a performance assessment. Determining readiness by generalizing from a small sample of observed performance, drawn from a single setting (or one lesson) where students are likely to have much in common with each other (they are assigned to schools based on geographic proximity), to a universe of encounters covering the larger domain of “teaching effectiveness” is problematic and may represent a serious threat to validity if not properly addressed (Kane, 1992, p. 173).

Extrapolation:

Of the three central links in the chain of inferences, extrapolation is likely to be the strongest for performance assessment. Observing performance for assessment purposes in the authentic settings in which that professional performance is practiced makes an inference of teacher readiness highly probable, in theory. Despite this, validity challenges can also be present in the extrapolation inferences of performance assessment. Kane notes that simply observing performance can cause “subtle and not so subtle influences in the quality” (p. 173). The TPA, which requires that candidates record and submit timed sections of their taught segment, adds a further element of observational influence: the video camera. Students may be accustomed to having other adults (besides the teacher) in the classroom. Certainly, the school administrator would make periodic visits, but other guardians, school board members, teachers, and interested parties are often welcome to visit classrooms. Not so typical is the introduction of a video camera. Some students may perform better simply because of the recording, whether out of solidarity with the candidate or because of other conditioning, while other students’ behavior may deteriorate if they become self-conscious or seek additional attention. Class participation may be enhanced or non-existent simply because a video camera is turned on in the back of the room. While students can be conditioned over time to accept the presence of the video camera, it is likely to still have an impact on the candidate, who is intently aware of being observed and evaluated. These issues constitute challenges to the extrapolation inference for any performance assessment that requires recorded teaching and an electronic submission.
