DOI: 10.3724/SP.J.1041.2017.01473

Acta Psychologica Sinica (心理学报), 2017, 49(11), 1473-1482

Performance of the entropy as an index of classification accuracy in latent profile analysis: A Monte Carlo simulation study

Latent Profile Analysis (LPA) is a latent variable modeling technique that identifies latent (unobserved) subgroups of individuals within a population on the basis of continuous indicators. LPA has become a popular statistical method for modeling unobserved population heterogeneity in the social and behavioral sciences. Entropy is a standardized index of model-based classification accuracy, with higher values indicating more precise assignment of individuals to latent profiles. Because the aim of much substantive research is to assign individuals to latent subgroups, Entropy is commonly reported as an index of the accuracy of class membership assignment. Unfortunately, very few methodological studies have examined how Entropy behaves when sample size, latent class separation, number of indicators, and number of classes vary. The primary purpose of this study was therefore to examine how Entropy performs under different sample sizes, latent class separations, numbers of indicators, and numbers of classes.
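As a sketch of how this index is usually computed, the function below assumes the relative-entropy definition reported by Mplus, which rescales the classification entropy of the posterior class probabilities p_ik to the 0-1 range: E_K = 1 − Σ_i Σ_k (−p_ik ln p_ik) / (n ln K). The function name `relative_entropy` is hypothetical; its input is an n × K matrix of posterior probabilities from a fitted model.

```python
import numpy as np


def relative_entropy(post: np.ndarray) -> float:
    """Relative entropy E_K of an n x K matrix of posterior class probabilities.

    Returns 1.0 when every individual is assigned with certainty and 0.0 when
    all posteriors are uniform (1/K for every class).
    """
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    if k < 2:
        raise ValueError("Entropy is undefined for a one-class model")
    # Treat 0 * ln(0) as 0: clip only inside the log so zero posteriors contribute nothing.
    logp = np.log(np.clip(post, 1e-300, 1.0))
    classification_entropy = -np.sum(post * logp)
    return float(1.0 - classification_entropy / (n * np.log(k)))
```

Dividing by n ln K, the maximum possible classification entropy, is what makes the index comparable across sample sizes and numbers of classes.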
Using Monte Carlo simulation, we generated artificial data under known true models and evaluated the performance of Entropy and the entropy-based indexes (CLC, ICL_BIC, and sample-size adjusted ICL_BIC) under different modeling conditions. Each of 120 conditions was replicated 100 times; the conditions fully crossed sample size (50, 100, 500, 1000, 3000), latent class separation (0.5, 1.2, 3), number of indicators (4, 8, 12, 20), and number of latent classes (3, 5). The continuous indicators were not allowed to correlate within latent classes. Differences between class means on the observed variables were specified in terms of Mahalanobis distance (MD). Data generation and analysis were carried out with the Monte Carlo facilities of Mplus 7.4.
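The 5 × 3 × 4 × 2 design yields the 120 conditions described above. The sketch below enumerates them and generates one data set for a given condition. The abstract does not say how the class means were placed, so `class_means` is a hypothetical scheme: with uncorrelated, unit-variance indicators (identity covariance), shifting every indicator by MD/√J between adjacent classes makes their Mahalanobis distance exactly MD. Equal expected class sizes are also an assumption, and all function names are illustrative.

```python
import itertools

import numpy as np

SAMPLE_SIZES = (50, 100, 500, 1000, 3000)
SEPARATIONS = (0.5, 1.2, 3.0)  # Mahalanobis distance between adjacent classes
N_INDICATORS = (4, 8, 12, 20)
N_CLASSES = (3, 5)

# Full factorial design: 5 * 3 * 4 * 2 = 120 conditions.
conditions = list(itertools.product(SAMPLE_SIZES, SEPARATIONS, N_INDICATORS, N_CLASSES))


def mahalanobis(mu_a, mu_b, cov):
    """Mahalanobis distance between two mean vectors under a shared covariance."""
    diff = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def class_means(n_classes, n_indicators, md):
    """Hypothetical placement: adjacent classes differ by md / sqrt(J) on every indicator."""
    step = md / np.sqrt(n_indicators)
    return np.arange(n_classes)[:, None] * step * np.ones((1, n_indicators))


def simulate(n, md, n_indicators, n_classes, rng):
    """One replication: equal expected class sizes, uncorrelated N(0, 1) residuals."""
    means = class_means(n_classes, n_indicators, md)
    labels = rng.integers(n_classes, size=n)
    y = means[labels] + rng.standard_normal((n, n_indicators))
    return y, labels, means
```

Repeating `simulate` 100 times per entry of `conditions` with `np.random.default_rng(seed)` mirrors the replication structure; model estimation itself (done in Mplus in the study) is outside this sketch.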
For 3 latent classes, Entropy values of about 0.76 and above corresponded to at least 90% correct assignment, and Entropy values of about 0.64 and below corresponded to a classification error rate of at least 20%. For 5 latent classes, Entropy values of about 0.84 and above corresponded to at least 90% correct assignment. As sample size increased, the Entropy value decreased and the classification error rate increased. Entropy performed well with small samples (50-100) and with more indicators. Entropy consistently performed better when latent class separation was large (MD = 3), and this result was quite consistent across sample sizes and numbers of latent classes. CLC, ICL_BIC, and sample-size adjusted ICL_BIC showed similar trends: all increased as sample size increased and also increased under large class separation, but the differences caused by class separation were more noticeable for Entropy.
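The entropy-based indexes combine a fitted model's log-likelihood lnL with the unscaled classification entropy EN = −Σ_i Σ_k p_ik ln p_ik. The sketch below assumes the definitions commonly used in the mixture-modeling literature: CLC = −2 lnL + 2EN, ICL_BIC = BIC + 2EN with BIC = −2 lnL + p ln n, and a sample-size adjusted version that replaces n in the penalty with n* = (n + 2)/24. The function names are hypothetical, and lnL and the number of free parameters p would come from the fitted model's output.

```python
import numpy as np


def classification_entropy(post):
    """Unscaled classification entropy EN = -sum_i sum_k p_ik * ln(p_ik)."""
    post = np.asarray(post, dtype=float)
    return float(-np.sum(post * np.log(np.clip(post, 1e-300, 1.0))))


def entropy_based_criteria(loglik, n_params, post):
    """CLC, ICL_BIC, and sample-size adjusted ICL_BIC (lower values are preferred)."""
    n = np.asarray(post).shape[0]
    en = classification_entropy(post)
    bic = -2.0 * loglik + n_params * np.log(n)
    sabic = -2.0 * loglik + n_params * np.log((n + 2) / 24)  # Sclove-style adjusted n
    return {
        "CLC": -2.0 * loglik + 2.0 * en,
        "ICL_BIC": bic + 2.0 * en,
        "SA_ICL_BIC": sabic + 2.0 * en,
    }
```

Because both −2 lnL and EN are sums over individuals, these indexes tend to grow with sample size, whereas the relative entropy E_K is normalized by n ln K.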
These simulation results indicate that Entropy values are strongly correlated with correct class membership assignment, but the relationship varies with the number of latent classes, sample size, latent class separation, and number of indicators. Hence, it is difficult to establish cutoff values for Entropy as an index of class assignment accuracy.
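The link between Entropy and correct assignment can be illustrated with a small sketch that computes posterior probabilities from the true (generating) parameters rather than from an estimated model, so it isolates the effect of class separation and does not reproduce the sample-size effects reported above. The mean-placement scheme (adjacent classes differ by MD/√J on every indicator, identity covariance, equal class sizes) and all names are assumptions for illustration, not the authors' setup.

```python
import numpy as np


def posterior_true(y, means, weights):
    """Posterior class probabilities under N(mu_k, I) components with known parameters."""
    # Squared Euclidean distance = squared Mahalanobis distance when the covariance is I.
    sq = ((y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logits = np.log(weights)[None, :] - 0.5 * sq
    logits -= logits.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def entropy_and_accuracy(n, md, n_indicators, n_classes, seed=0):
    """Relative entropy and proportion correctly assigned by the modal posterior."""
    rng = np.random.default_rng(seed)
    step = md / np.sqrt(n_indicators)  # hypothetical mean placement
    means = np.arange(n_classes)[:, None] * step * np.ones((1, n_indicators))
    labels = rng.integers(n_classes, size=n)
    y = means[labels] + rng.standard_normal((n, n_indicators))
    post = posterior_true(y, means, np.full(n_classes, 1.0 / n_classes))
    logp = np.log(np.clip(post, 1e-300, 1.0))
    entropy = 1.0 - (-(post * logp).sum()) / (n * np.log(n_classes))
    accuracy = float((post.argmax(axis=1) == labels).mean())
    return float(entropy), accuracy


for md in (0.5, 1.2, 3.0):
    e, a = entropy_and_accuracy(n=3000, md=md, n_indicators=8, n_classes=3)
    print(f"MD={md}: entropy={e:.2f}, correct assignment={a:.2f}")
```

Changing `n_classes` or the mean placement shifts the entropy value that accompanies a given accuracy, which is consistent with the difficulty of fixing a single cutoff.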

Key words: latent profile analysis, accuracy of class membership assignment, Entropy, latent class separation, Monte Carlo simulation

Release date: 2017-12-29 15:49:13
