DOI: 10.3724/SP.J.1041.2016.01047

Acta Psychologica Sinica (心理学报) 2016/48:8 PP.1047-1056

Warm's weighted maximum likelihood estimation of latent trait in the four-parameter logistic model

There are two types of aberrant responses: correct responses resulting from lucky guesses, and incorrect responses resulting from carelessness. Because neither type reflects the examinee's actual knowledge, both may lead to erroneous estimates of the examinee's latent trait. Compared with guesses, careless errors might cause more serious estimation bias, especially if they occur at the beginning of a test. To account for the effect of careless errors, Barton and Lord (1981) developed a four-parameter logistic (4PL) model by adding an upper asymptote parameter to the three-parameter logistic (3PL) model. Recently, the 4PL model has received more attention, and several studies have highlighted its potential and usefulness both from a methodological point of view and for practical purposes. It can be expected that the 4PL model will be promoted as a competing item response model in psychological and educational measurement.
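For reference, the 4PL item response function is commonly written in the form below. The notation is assumed here, since this abstract does not define symbols: $\theta$ is the latent trait, $a_j$ the discrimination, $b_j$ the difficulty (location), $c_j$ the lower asymptote (guessing), and $d_j$ the upper asymptote (carelessness) of item $j$. Some presentations also include a scaling constant $D = 1.7$ in the exponent.

$$
P_j(\theta) = c_j + \frac{d_j - c_j}{1 + \exp\left[-a_j(\theta - b_j)\right]}, \qquad 0 \le c_j < d_j \le 1 .
$$

Setting $d_j = 1$ recovers the 3PL model, which is the sense in which the 4PL model adds an upper asymptote to it.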
This paper focuses on one important aspect of the 4PL model, namely the estimation of latent trait levels. In general, unbiased parameter estimation is desirable, and reducing bias in the latent trait estimator is very important for applications of IRT models. Warm (1989) proposed a weighted maximum likelihood (WML) method for estimating the latent trait parameter in the 3PL model, which was found to be less biased than the maximum likelihood (ML) and expected a posteriori (EAP) estimates. The WML estimator has also been extended to the generalized partial credit model (GPCM). In light of the superior performance of the WML method in previous studies, this study applies a WML latent trait estimator to the 4PL model. The main contributions of this article are to present the derivation of the WML estimator under the 4PL model and to conduct a simulation study comparing the properties of the WML estimator with those of the ML and EAP estimators.
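As a rough illustration of what a WML estimator under the 4PL model involves, the sketch below solves Warm's (1989) general weighted score equation, sum_j (u_j - P_j) P'_j / (P_j Q_j) + J(theta) / (2 I(theta)) = 0, with I = sum_j P'_j^2 / (P_j Q_j) and J = sum_j P'_j P''_j / (P_j Q_j), using the usual 4PL response function P = c + (d - c) / (1 + exp(-a (theta - b))). It is not the article's own derivation or code: the function names, the item parameters, the search interval [-6, 6], and the use of bisection are all assumptions made for this example, and bisection returns only one root if the equation has several.

```python
import math

def p4pl(theta, a, b, c, d):
    """4PL response probability: c + (d - c) / (1 + exp(-a (theta - b)))."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

def wml_score(theta, responses, items):
    """Warm's weighted score: ML score plus the correction J / (2 I)."""
    score = info = j_term = 0.0
    for u, (a, b, c, d) in zip(responses, items):
        p = p4pl(theta, a, b, c, d)
        q = 1.0 - p
        dp = a * (p - c) * (d - p) / (d - c)         # first derivative P'
        d2p = a * dp * (c + d - 2.0 * p) / (d - c)   # second derivative P''
        score += (u - p) * dp / (p * q)              # ML score contribution
        info += dp * dp / (p * q)                    # test information I
        j_term += dp * d2p / (p * q)                 # Warm's J term
    return score + j_term / (2.0 * info)

def wml_estimate(responses, items, lo=-6.0, hi=6.0, tol=1e-10):
    """Find a root of the weighted score on [lo, hi] by bisection."""
    f_lo = wml_score(lo, responses, items)
    f_hi = wml_score(hi, responses, items)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the weighted score on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = wml_score(mid, responses, items)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical five-item test: (a, b, c, d) per item, not taken from the article.
items = [(1.2, b, 0.2, 0.95) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
responses = [1, 1, 0, 1, 0]
theta_hat = wml_estimate(responses, items)
print(round(theta_hat, 3))
```

With a lower asymptote above 0 and an upper asymptote below 1, the correction term keeps the weighted score changing sign even for all-correct or all-incorrect response patterns, so the sketch returns a finite estimate in cases where the ML score alone has no root.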
The results of the simulation study suggested that the bias of the WML estimator was consistently smaller than that of the ML and EAP estimators. In addition, the accuracy of the WML estimator was superior to that of the ML estimator and nearly equivalent to that of the EAP estimator. The differences in bias (and accuracy) among the three estimators were substantial when the latent trait was far from the location of the test, but negligible when the latent trait matched the location of the test. Furthermore, both test length and item discrimination had a greater impact on the performance of the ML and EAP estimators than on that of the WML estimator. In relatively short tests composed of low-discrimination items, the EAP estimator displayed grossly inflated bias and the ML estimator displayed the largest decrease in accuracy, whereas the WML estimator performed more robustly.
In general, the WML estimator maintains better properties than both the ML and EAP estimators, especially under conditions in which the test information function is relatively small. Such conditions include, but are not limited to: (a) a mismatch between the latent trait and the location of the test; (b) short tests (e.g., n ≤ 12); and (c) low item discrimination. Because the simulation was conducted under linear testing, the findings do not extend to the framework of computerized adaptive testing (CAT). Accordingly, this research may be of great value to test developers concerned with constructing fixed, non-adaptive tests.

Key words: item response theory, four-parameter logistic model, Warm’s weighted maximum likelihood estimation

Release date: 2017-01-05 18:12:40

Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.

Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item response model (Research Bulletin No. 81-20). Princeton, NJ: Educational Testing Service.

Green, B. F. (2011). A comment on early student blunders on computer-based adaptive tests. Applied Psychological Measurement, 35, 165-174.

Liao, W. W., Ho, R. G., Yen, Y. C., & Cheng, H. C. (2012). The four-parameter logistic item response theory model as a robust method of estimating ability despite aberrant responses. Social Behavior and Personality, 40, 1679-1694.

Linacre, J. M. (2004). Discrimination, guessing and carelessness asymptotes: Estimating IRT parameters with Rasch. Rasch Measurement Transactions, 18, 959-960.

Loken, E., & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63, 509-525.

Magis, D. (2013). A note on the item information function of the four-parameter logistic model. Applied Psychological Measurement, 37, 304-315.

Magis, D. (2014). Accuracy of asymptotic standard errors of the maximum and weighted likelihood estimators of proficiency levels with short tests. Applied Psychological Measurement, 38, 105-121.

Magis, D. (2015). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80, 200-204.

Magis, D., & Raiche, G. (2012). On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models. Psychometrika, 77, 163-169.

Mathilda, D. T. (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Chicago, IL: Scientific Software International.

Osgood, D. W., McMorris, B. J., & Potenza, M. T. (2002). Analyzing multiple-item measures of crime and deviance I: Item response theory scaling. Journal of Quantitative Criminology, 18, 267-296.

Penfield, R. D., & Bergeron, J. M. (2005). Applying a weighted maximum likelihood latent trait estimator to the generalized partial credit model. Applied Psychological Measurement, 29, 218-233.

Qi, S. Q., Dai, H. Q., & Ding, S. L. (2002). Principles of modern educational and psychological measurement. Beijing: Higher Education Press.

[漆书青, 戴海琦, 丁树良. (2002). 现代教育和心理测量学原理. 北京: 高等教育出版社.]

Rulison, K. L., & Loken, E. (2009). I've fallen and I can't get up: Can high ability students recover from early mistakes in computerized adaptive testing? Applied Psychological Measurement, 33, 83-101.

Rupp, A. A. (2003). Item response modeling with BILOG-MG and MULTILOG for Windows. International Journal of Testing, 3, 365-384.

Tavares, H. R., de Andrade, D. F., & Pereira, C. A. (2004). Detection of determinant genes and diagnostic via item response theory. Genetics and Molecular Biology, 27, 679-685.

Waller, N. G., & Reise, S. P. (2009). Measuring psychopathology with non-standard IRT models: Fitting the four parameter model to the MMPI. In S. Embretson & J. S. Roberts (Eds.), New directions in psychological measurement with model-based approaches (pp. 147-173). Washington, DC: American Psychological Association.

Wang, S. D., & Wang, T. Y. (2001). Precision of Warm's weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25, 317-331.

Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.

Yen, Y. C., Ho, R. G., Liao, W. W., Chen, L. J., & Kuo, C. C. (2012). An empirical evaluation of the slip correction in the four parameter logistic models with computerized adaptive testing. Applied Psychological Measurement, 36, 75-87.