1. Mobile Learning and Higher Education
1.6. Toward a Flexible and Personalized Model of Education
This thesis defined a new type of classification for the multidimensional case, namely grid classification, and compared the performance of several ACT item selection methods and termination criteria for it using a between-item multidimensional polytomous item bank. In general, ACT was more efficient for grid classification than the two-step approach based on measurement CAT. The newly developed PWCD-optimal method was the best item selection method, whereas MMI did not perform well in ACT; it is therefore suggested that MMI be dropped from ACT. Paired with the two best item selection methods, D-optimal and PWCD-optimal, the best termination criterion was SPRT. Classification was more difficult when examinees were closer to the cutoff scores. PWCD-optimal and D-optimal led to stable test length and classification accuracy. SPRT and CI resulted in the most stable test length and the largest changes in classification accuracy when examinees were close to the cutoff scores.
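The SPRT termination rule, and the reason examinees near a cutoff are harder to classify, can be illustrated with a minimal sketch. It is not the procedure used in this thesis: it assumes a single dimension, dichotomous 2PL items, one cutoff, and randomly drawn rather than adaptively selected items. The function names (`p_correct`, `sprt_classify`, `accuracy`), the indifference half-width `delta`, the error rates, and the 30-item maximum are all hypothetical choices made for illustration.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (illustrative stand-in model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_classify(responses, items, cutoff, delta=0.3, alpha=0.05, beta=0.05):
    """Wald SPRT of H0: theta = cutoff - delta against H1: theta = cutoff + delta.

    Returns (decision, items_used, llr). The decision is 'above' or 'below'
    once a bound is crossed, or 'undecided' if the responses run out first.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this bound
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this bound
    llr, used = 0.0, 0
    for u, (a, b) in zip(responses, items):
        p1 = p_correct(cutoff + delta, a, b)
        p0 = p_correct(cutoff - delta, a, b)
        # Add this item's log-likelihood ratio contribution.
        llr += math.log(p1 / p0) if u == 1 else math.log((1 - p1) / (1 - p0))
        used += 1
        if llr >= upper:
            return "above", used, llr
        if llr <= lower:
            return "below", used, llr
    return "undecided", used, llr


def accuracy(theta, cutoff=0.0, max_items=30, n_reps=500, seed=1):
    """Share of simulated examinees placed on the correct side of the cutoff."""
    rng = random.Random(seed)
    truth = "above" if theta >= cutoff else "below"
    hits = 0
    for _ in range(n_reps):
        # Assumption: items are drawn at random, not selected adaptively.
        items = [(1.0, rng.gauss(cutoff, 1.0)) for _ in range(max_items)]
        responses = [int(rng.random() < p_correct(theta, a, b)) for a, b in items]
        decision, _, llr = sprt_classify(responses, items, cutoff)
        if decision == "undecided":  # forced decision at maximum test length
            decision = "above" if llr >= 0 else "below"
        hits += decision == truth
    return hits / n_reps
```

Under these assumptions, comparing `accuracy(0.1)` with `accuracy(2.0)` shows the pattern described above: an examinee just above the cutoff produces a log-likelihood ratio that drifts only slowly toward either bound, so the decision is often forced at the maximum test length and is wrong more often than for an examinee far from the cutoff.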
Given the promising performance of ACT, it is the recommended procedure for grid classification. More studies are needed to explore the performance of ACT in other scenarios, such as with within-item multidimensional item banks; only between-item multidimensional item banks were used in this thesis. Although all the item selection methods and termination criteria are applicable to within-item multidimensional tests, it remains important to conduct grid classification ACT with within-item multidimensional item banks and to examine the potential influence of item bank type.
Moreover, the current item selection methods and termination criteria were either adopted directly or generalized from unidimensional ACT and measurement CAT. More item selection methods and termination criteria designed specifically for ACT are needed.