One of my recent blog discoveries is the excellent Xi’an’s Og. This is a blog about statistics with an unusual layout and title. The title may indeed confuse the newcomer as to the real origin of its author… But anyway, the content is of assured quality, especially on statistics topics.
I couldn’t help but repost here today the recent post on Xi’an’s Og about Bayesian parameter estimation and how easily it gets confused with Bayesian model comparison in null hypothesis testing, when no such confusion should ever arise. The author of the post gives us his precise take on this, and I thought it appropriate to reproduce it as an enhancement to the statistical and data literacy efforts of The Information Age.
Also interesting is the motive of the post: a paper that, while not fully reviewed, was read through in full. As it should always be for serious researchers and serious critical thinking on any particular subject:
John Kruschke [of puppies’ fame!] wrote a paper in Perspectives on Psychological Science a few years ago on the comparison between two Bayesian approaches to null hypotheses, of which I became aware through an X validated question that seemed to confuse Bayesian parameter estimation with Bayesian hypothesis testing.
After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than engaging in formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.
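To make the decision rule above concrete, here is a minimal sketch of checking whether the null value falls inside the 95% HPD region, with a tolerance region (ROPE) around the null. It is an illustration under assumptions of my own, not Kruschke’s exact procedure: the conjugate Beta posterior, the function names, and the ROPE half-width are all chosen purely for the example.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    s = np.sort(samples)
    n_in = int(np.ceil(mass * len(s)))
    widths = s[n_in - 1:] - s[:len(s) - n_in + 1]
    i = np.argmin(widths)
    return s[i], s[i + n_in - 1]

def null_decision(samples, null_value, rope_halfwidth, mass=0.95):
    """Reject the null if the tolerance region around it lies entirely
    outside the HPD region; accept if the HPD region lies entirely
    inside the tolerance region; otherwise remain undecided."""
    lo, hi = hpd_interval(samples, mass)
    rope_lo = null_value - rope_halfwidth
    rope_hi = null_value + rope_halfwidth
    if rope_hi < lo or rope_lo > hi:
        return "reject"
    if rope_lo <= lo and hi <= rope_hi:
        return "accept"
    return "undecided"

rng = np.random.default_rng(0)
# Illustrative Beta(45, 15) posterior for a proportion; the 95% HPD
# region sits well above 0.5, so the null value 0.5 is rejected.
draws = rng.beta(45, 15, size=100_000)
print(null_decision(draws, null_value=0.5, rope_halfwidth=0.05))
```

Note how the decision is read entirely off the posterior of the parameter: no Dirac mass at the null, no Bayes factor, but the rejection level (95%) and the tolerance half-width both require calibration, as the quoted passage points out.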
While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair if only because most numerical techniques to approximate the Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well taken, although the selection of the null in the “estimation” approach does depend as well on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.
I would add here this passage from the Wikipedia entry on Lindley’s paradox:
Although referred to as a paradox, the differing results from the Bayesian and frequentist approaches can be explained as using them to answer fundamentally different questions, rather than actual disagreement between the two methods.
Nevertheless, for a large class of priors the differences between the frequentist and Bayesian approach are caused by keeping the significance level fixed: as even Lindley recognized, “the theory does not justify the practice of keeping the significance level fixed” and even “some computations by Prof. Pearson in the discussion to that paper emphasized how the significance level would have to change with the sample size, if the losses and prior probabilities were kept fixed.” In fact, if the critical value increases with the sample size suitably fast, then the disagreement between the frequentist and Bayesian approaches becomes negligible as the sample size increases.
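The point about keeping the significance level fixed can be demonstrated numerically. In the sketch below (same standard point-null normal setup as above, with illustrative values of my own choosing), the data are pinned exactly at the frequentist rejection boundary for a fixed 5% level while the sample size grows: the test rejects every time, yet the Bayes factor swings ever more strongly in favour of the null.

```python
import math

def bf01(xbar, n, sigma=1.0, tau=1.0):
    """B01 for H0: mu = 0 versus H1: mu ~ N(0, tau^2),
    normal data with known sd sigma and sample mean xbar."""
    v0, v1 = sigma**2 / n, sigma**2 / n + tau**2
    def norm_pdf(x, v):
        return math.exp(-x * x / (2 * v)) / math.sqrt(2 * math.pi * v)
    return norm_pdf(xbar, v0) / norm_pdf(xbar, v1)

# Hold the data at the two-sided alpha = 0.05 boundary
# (xbar = 1.96 * sigma / sqrt(n)) and let n grow: the frequentist
# test rejects at every n, while B01 increasingly supports the null.
for n in (10, 1_000, 100_000):
    xbar = 1.96 / math.sqrt(n)
    print(n, round(bf01(xbar, n), 2))
```

This is the disagreement the passage describes: it vanishes only if the critical value is allowed to grow with the sample size instead of being held fixed.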
Featured Image: Tutorial on testing at O’Bayes 2015, Valencià, June 1, 2015