Advances in Psychological Science (心理科学进展), 2018, 26(5), pp. 770-780

Technology of text analysis in the big data era: Application of the topic model

The topic model is a computerized text analysis method that has been widely used in psychology. In counseling research, it has the potential to explore the themes of conversations between therapist and patient, compare the semantic similarity of different treatments, and establish behavioral coding systems. Using data from social media, researchers may use topic models to identify and predict various mental disorders and to carry out calculations pertaining to personality. This paper further discusses needed improvements to the topic model and its application in the Chinese-language environment, where it can be used to explore the psychological meaning of Chinese texts.

Key words: topic model, text analysis, psychological counseling research, mental health, personality calculation

Release date: 2018-07-02 16:10:46
