DOI: 10.3724/SP.J.1042.2018.00951

Advances in Psychological Science (心理科学进展) 2018/26:6 PP.951-965

The Bayes factor and its implementation in JASP: A practical primer


Abstract:
Statistical inference plays a critical role in modern scientific research. However, the dominant method for statistical inference in science, null hypothesis significance testing (NHST), is often misunderstood and misused, which leads to unreproducible findings. To address this issue, researchers have proposed adopting the Bayes factor as an alternative to NHST. The Bayes factor is a principled Bayesian tool for model selection and hypothesis testing, and can be interpreted as the strength of evidence that the current data provide for both the null hypothesis H0 and the alternative hypothesis H1. Compared to NHST, the Bayes factor has the following advantages: it quantifies the evidence that the data provide for both H0 and H1, it is not "violently biased" against H0, it allows one to monitor the evidence as the data accumulate, and it does not depend on sampling plans. Importantly, the recently developed open-source software JASP makes the calculation of Bayes factors accessible to most researchers in psychology, as we demonstrate for the t-test. Given these advantages, adopting the Bayes factor will improve psychological researchers' statistical inferences. Nevertheless, to make their analyses more reproducible, researchers should keep their data analysis transparent and open.
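To make the idea of a Bayes factor concrete, the following minimal sketch computes BF10 for a simple binomial test and tracks it as observations accumulate. It is an illustration under stated assumptions, not the default t-test implemented in JASP: H0 fixes the success probability at 0.5, H1 places a uniform prior on it, and the function names `bf10_binomial` and `sequential_bf10` are hypothetical.

```python
from math import comb


def bf10_binomial(k: int, n: int) -> float:
    """Bayes factor BF10 for k successes in n trials.

    Assumed models (illustrative only):
      H0: theta = 0.5
      H1: theta ~ Uniform(0, 1)
    """
    # Marginal likelihood under H1: the integral of
    # C(n, k) * theta^k * (1 - theta)^(n - k) over [0, 1] equals 1 / (n + 1).
    marginal_h1 = 1.0 / (n + 1)
    # Likelihood under H0, where theta is fixed at 0.5.
    marginal_h0 = comb(n, k) * 0.5 ** n
    # BF10 > 1 favours H1; BF10 < 1 favours H0.
    return marginal_h1 / marginal_h0


def sequential_bf10(outcomes):
    """Return BF10 after each observation in a list of 0/1 outcomes."""
    trace = []
    successes = 0
    for trials, outcome in enumerate(outcomes, start=1):
        successes += outcome
        trace.append(bf10_binomial(successes, trials))
    return trace


if __name__ == "__main__":
    # Five successes in ten trials: the data favour H0 (BF10 below 1).
    print(bf10_binomial(5, 10))
    # Ten successes in ten trials: the data favour H1 (BF10 above 1).
    print(bf10_binomial(10, 10))
    # Evidence monitored as the data accumulate.
    print(sequential_bf10([1, 1, 0, 1, 1, 1]))
```

In this sketch the binomial coefficient appears in both marginal likelihoods and cancels in the ratio, so the final value depends only on the observed successes and failures and not on the rule used to decide when to stop collecting data. The reciprocal 1 / BF10 gives BF01, the evidence for H0 relative to H1.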

Key words: Bayes factor, Bayesian statistics, Frequentist, NHST, JASP

ReleaseDate:2018-07-26 11:53:56


