6. STATE OF THE ART
6.4. ON FUNCTIONAL DIVERSITY
6.4.2. APPLICABLE LEGISLATION
The classic stopping rule in CMT, the SPRT algorithm (e.g., Eggen, 1999; Reckase, 1983; Spray & Reckase, 1996), simplifies the classification task. Assume that a test administrator must classify examinees into one of two categories separated by a cut-point. Let $\theta_0$ denote this a priori selected ability value separating true failures from true masters. Then point hypotheses can be specified as
$$H_0 : \theta_i = \theta_0 - \delta \qquad H_1 : \theta_i = \theta_0 + \delta$$
where $\delta > 0$, so that $\theta_0 - \delta$ lies inside of the non-mastery region and $\theta_0 + \delta$ lies inside of the mastery region.
The purpose of any stopping rule in CMT is to determine whether an examinee should be classified as a master, a non-master, or be administered another item. To make one of these three decisions, the SPRT compares the likelihood ratio test statistic to appropriate critical values. As an example of how the likelihood ratio statistic might be applied, let responses be conditionally independent and follow the unidimensional, binary, item response function defined in Equation (2.1). Then the log-likelihood for a single examinee given a particular response pattern, $y_{i,J} = [y_{i1}, y_{i2}, \ldots, y_{iJ}]^T$, is
$$\log[L(\theta \mid y_{i,J})] = \sum_{j=1}^{J} \Big[ y_{ij} \log[p_j(\theta)] + (1 - y_{ij}) \log[1 - p_j(\theta)] \Big] \tag{2.4}$$
with $p_j(\theta)$ defined in Equation (2.1). If $H_0 : \theta_l = \theta_0 - \delta$ and $H_1 : \theta_u = \theta_0 + \delta$, then the log-likelihood ratio of examinee $i$ manifesting $\theta_u$ relative to $\theta_l$ is
$$C_{i,j} = \log\big[LR(\theta_u, \theta_l \mid y_{i,j})\big] = \log\left[\frac{L(\theta_u \mid y_{i,j})}{L(\theta_l \mid y_{i,j})}\right] = \log\big[L(\theta_u \mid y_{i,j})\big] - \log\big[L(\theta_l \mid y_{i,j})\big]. \tag{2.5}$$
When Equation (2.5) is a large, positive number, then there is sizable evidence that $\theta_u$ generated the particular response pattern, $y_{i,j}$. Conversely, when Equation (2.5) is a large, negative number, there is considerable evidence supporting $\theta_l$.
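As a minimal sketch of Equations (2.4) and (2.5), the log-likelihood ratio can be computed as below. Equation (2.1) is not reproduced in this section, so the code assumes a two-parameter logistic item response function as a stand-in; the function names, the item parameters, and the values of $\theta_0$ and $\delta$ are hypothetical and chosen only for illustration.

```python
import math

def irf_2pl(theta, a, b):
    """Assumed stand-in for Equation (2.1): a two-parameter logistic IRF."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Equation (2.4): sum of Bernoulli log-probabilities over the items."""
    total = 0.0
    for y, (a, b) in zip(responses, items):
        p = irf_2pl(theta, a, b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def log_likelihood_ratio(theta_u, theta_l, responses, items):
    """Equation (2.5): log-likelihood at theta_u minus log-likelihood at theta_l."""
    return (log_likelihood(theta_u, responses, items)
            - log_likelihood(theta_l, responses, items))

# Hypothetical (discrimination, difficulty) pairs and cut-point settings.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]
theta_0, delta = 0.0, 0.5

c_all_correct = log_likelihood_ratio(theta_0 + delta, theta_0 - delta, [1, 1, 1], items)
c_all_wrong = log_likelihood_ratio(theta_0 + delta, theta_0 - delta, [0, 0, 0], items)
```

Under this assumed model, every correct response adds a positive term to the ratio and every incorrect response adds a negative one, so `c_all_correct` is positive and `c_all_wrong` is negative.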
Justification for using a likelihood ratio test statistic when testing simple hypotheses is due to the Neyman-Pearson lemma (Casella & Berger, 2001, p. 366). According to the Neyman-Pearson lemma, for a fixed sample size, $N$, and conditional on a particular Type I error rate, $\alpha$, the uniformly most powerful (UMP) test rejects $H_0$ based solely on the size of the likelihood ratio test statistic. Likelihood ratio-based test statistics are also optimal in the case of optional stopping, as proved in the Wald-Wolfowitz theorem (Wald & Wolfowitz, 1948). Specifically, let $Y_1, Y_2, \ldots$ be a (possibly infinite) independent and identically distributed (i.i.d.) sample from common density $f$ with unknown parameter vector $\theta$ ($\dim(\theta) \geq 1$). Then assuming a pair of simple hypotheses, $H_0 : \theta = \theta_1$ versus $H_1 : \theta = \theta_2$, and pre-specified critical values, $A$ and $B$, where $0 < A < B < \infty$, a rule that stops sampling at
$$N = \inf\left\{ n \geq 1 : \prod_{i=1}^{n} \left[\frac{f(y_i \mid \theta_1)}{f(y_i \mid \theta_2)}\right] \leq A \ \text{ or } \ \prod_{i=1}^{n} \left[\frac{f(y_i \mid \theta_1)}{f(y_i \mid \theta_2)}\right] \geq B \right\}$$
is optimal (i.e., minimizes the expected sample size under both $H_0$ and $H_1$) in the set of all tests with the same Type I and Type II error rates (Lai, 1997). Using the log-likelihood ratio test statistic (rather than the likelihood ratio) and given specific $\alpha$ (Type I error rate) and $\beta$ (Type II error rate) levels, Wald (1947) recommended choosing
$$C_l = \log[A] = \log\left[\frac{\beta}{1 - \alpha}\right]$$
as the critical value separating non-mastery from uncertainty and
$$C_u = \log[B] = \log\left[\frac{1 - \beta}{\alpha}\right]$$
as the critical value separating mastery from uncertainty.
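The two critical values follow directly from the nominal error rates. The sketch below is illustrative only: the function name and the input check (which requires $\alpha + \beta < 1$ so that the lower bound is negative and the upper bound is positive) are assumptions added here, not part of the cited sources.

```python
import math

def wald_critical_values(alpha, beta):
    """Return (C_l, C_u) on the log-likelihood ratio scale.

    C_l = log[beta / (1 - alpha)] separates non-mastery from uncertainty.
    C_u = log[(1 - beta) / alpha] separates mastery from uncertainty.
    """
    # Assumed sanity check: with alpha + beta < 1, C_l < 0 < C_u.
    if not (0 < alpha < 1 and 0 < beta < 1) or alpha + beta >= 1:
        raise ValueError("need 0 < alpha, beta < 1 and alpha + beta < 1")
    c_l = math.log(beta / (1 - alpha))
    c_u = math.log((1 - beta) / alpha)
    return c_l, c_u

# Symmetric error rates give bounds that are symmetric around zero.
c_l, c_u = wald_critical_values(0.05, 0.05)

# Unequal error rates shift the two bounds asymmetrically.
c_l_asym, c_u_asym = wald_critical_values(0.01, 0.10)
```

With $\alpha = \beta = 0.05$, the bounds are $\pm\log(19)$, roughly $\pm 2.94$.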
Modeled on sequential decision theory, psychometricians have designed a simple template for ending unidimensional mastery tests. After each item between the minimum number of items, $j_{\min}$, and the maximum number of items, $j_{\max}$, calculate
$$C_{i,j} = \log\big[LR(\theta_u, \theta_l \mid y_{i,j})\big]$$
as defined in Equation (2.5). If $C_{i,j} < C_l$, classify the examinee as a failure and terminate the test. If $C_{i,j} > C_u$, classify the examinee as a master and terminate the test. But if $C_l \leq C_{i,j} \leq C_u$, administer another item. Once $j = j_{\max}$, use a final critical value of $(C_l + C_u)/2$ (Finkelman, 2008a) to make a decision. Often, researchers set $\alpha = \beta$, so that $(C_l + C_u)/2 = 0$, but practitioners sometimes desire to avoid one type of error depending on the ultimate costs of misclassification.
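The termination template can be sketched as a single decision function. The function name and the returned labels are hypothetical. Two details are assumptions because the text does not specify them: no classification is attempted before $j_{\min}$ items, and a statistic exactly equal to the final critical value at $j_{\max}$ is classified as non-mastery.

```python
import math

def sprt_decision(c_ij, j, c_l, c_u, j_min, j_max):
    """Return 'master', 'non-master', or 'continue' after item j."""
    if j >= j_max:
        # Forced decision at the maximum length using the midpoint (C_l + C_u) / 2.
        # Assumption: a tie at the midpoint is classified as non-mastery.
        return "master" if c_ij > (c_l + c_u) / 2 else "non-master"
    if j < j_min:
        # Assumption: no classification before the minimum number of items.
        return "continue"
    if c_ij < c_l:
        return "non-master"
    if c_ij > c_u:
        return "master"
    return "continue"

# Wald critical values for alpha = beta = 0.05 (about -2.94 and 2.94).
c_l = math.log(0.05 / 0.95)
c_u = math.log(0.95 / 0.05)

early = sprt_decision(5.0, 3, c_l, c_u, j_min=5, j_max=20)     # too few items so far
middle = sprt_decision(1.0, 10, c_l, c_u, j_min=5, j_max=20)   # between the bounds
forced = sprt_decision(0.2, 20, c_l, c_u, j_min=5, j_max=20)   # decision at j_max
```

In a full adaptive test, the statistic passed as `c_ij` would be recomputed from Equation (2.5) after each administered item.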
Unfortunately, researchers have identified several limitations of the standard SPRT in adaptive mastery testing. First, although Wald and Wolfowitz (1948) proved optimality of the SPRT when testing simple hypotheses, the SPRT is inefficient relative to other procedures if $\theta_i \neq \theta_l$ and $\theta_i \neq \theta_u$ (Finkelman, 2008a). In light of this concern, the Generalized Likelihood Ratio (GLR; Bartroff, Finkelman, & Lai, 2008; Thompson, 2009, 2010) was proposed as a simple modification of the SPRT that tests composite hypotheses. Second, the SPRT controls the error rate for infinitely long experiments under certain conditions, but every CAT must be terminated after a maximum number of items. Finkelman (2003, 2008a) proposed several procedures that use the likelihood ratio test statistic to estimate the probability of examinee $i$ switching categories by $j_{\max}$. In the next several sub-sections, I explore each of the common adjustments to the SPRT algorithm.