This thesis defined a new type of classification for the multidimensional case, namely grid classification, and compared the performance of several ACT item selection methods and termination criteria on this task using a between-item multidimensional polytomous item bank. Generally, ACT was more efficient for grid classification than the two-step approach based on measurement CAT. The newly developed PWCD-optimal method was the best item selection method, whereas MMI did not perform well in ACT; it is therefore suggested that MMI not be used in ACT. Paired with the two best item selection methods, D-optimal and PWCD-optimal, the best termination criterion was SPRT. Classification was more difficult when examinees were closer to the cutoff scores. PWCD-optimal and D-optimal led to stable test length and classification accuracy. SPRT and CI resulted in the most stable test length and the most dramatic changes in classification accuracy when examinees were close to the cutoff scores.
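To make concrete why an SPRT-type termination criterion needs more evidence for examinees near a cutoff, the following is a minimal sketch of Wald's SPRT for a single cutoff. It assumes a unidimensional, dichotomous two-parameter logistic model, which is a deliberate simplification of the multidimensional polytomous setting studied in this thesis. The function names, the indifference-region half-width `delta`, and the nominal error rates are illustrative assumptions and are not the implementation used in the simulations.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(responses, items, cutoff, delta=0.3, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: theta = cutoff - delta vs. H1: theta = cutoff + delta.

    responses: list of 0/1 item scores
    items:     list of (a, b) pairs, hypothetical discrimination and difficulty
    Returns "above", "below", or "continue" (administer another item).
    """
    lower = math.log(beta / (1.0 - alpha))   # at or below: accept H0
    upper = math.log((1.0 - beta) / alpha)   # at or above: accept H1
    llr = 0.0                                # accumulated log-likelihood ratio
    for u, (a, b) in zip(responses, items):
        p1 = p_correct(cutoff + delta, a, b)
        p0 = p_correct(cutoff - delta, a, b)
        if u == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
    if llr >= upper:
        return "above"
    if llr <= lower:
        return "below"
    return "continue"


if __name__ == "__main__":
    items = [(1.0, 0.0)] * 10
    print(sprt_decision([1] * 10, items, cutoff=0.0))
    print(sprt_decision([1, 0] * 5, items, cutoff=0.0))
```

In this sketch, a mixed response pattern, which is typical of an examinee whose trait level lies near the cutoff, keeps the log-likelihood ratio between the two bounds, so the test continues. This is one way to see why classification is harder for such examinees.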

Given the promising performance of ACT, it is the recommended procedure for conducting grid classification. More studies are needed to explore the performance of ACT in other scenarios, such as within-item multidimensional item banks, because only between-item multidimensional item banks were used in this thesis. Although all the item selection methods and termination criteria are applicable to within-item multidimensional tests, it remains important to conduct grid classification ACT with within-item multidimensional item banks and to explore the potential influence of item bank type.

Moreover, the current item selection methods and termination criteria were either adopted directly or generalized from unidimensional ACT and measurement CAT. More item selection methods and termination criteria designed specifically for ACT are needed.
