Jeffreys paradoxes and the Bayes factor controversy

The new schedule of posts here at The Information Age continues with its three-posts-per-week installment. I hope this new schedule will also improve the selection of topics.

Anyone following this blog may already have noticed that I often share content from the Xi’an’s Og blog, written by a professional statistician with real expertise in the advanced mathematics used in cutting-edge data science and machine learning research.

Xi’an’s Og is not limited to these topics: it also features many photos, images and even some graphic-novel-style comedy dialogues that the author deems interesting or appropriate. Here, however, we will be concerned exclusively with the data science and machine learning topics the blog usually deals with.

One recent entry was particularly worth reading and sharing. It concerned the controversy over the Xi’an’s Og author’s position on what he himself called the demise of the Bayes factor, and a reply by colleagues in the Journal of Mathematical Psychology. I had also previously shared an earlier Xi’an’s Og post dealing with similar issues around Lindley’s paradox, so this one continues that line of thought. The links to the relevant papers in the post are well worth a click for the interested reader, who should go further than the abstracts. Here, though, I just want to highlight some passages I think are worth remembering and noting. Below are the abstract and the concluding remarks of the paper by the author of Xi’an’s Og on this controversy (the full paper is worth checking):

First, the Xi’an’s Og comments on a response by Ly, Verhagen, and Wagenmakers:

 Rather unsurprisingly (!), the authors agree with my position on the dangers of ignoring decisional aspects when using the Bayes factor. A point of dissension is the resolution of the Jeffreys[-Lindley-Bartlett] paradox. One consequence derived by Alexander and co-authors is that priors should change between testing and estimating, because the parameters have a different meaning under the null and under the alternative, a point I agree with in that these parameters are indexed by the model [index!]. But I disagree when they argue that the same parameter (e.g., a mean under model M₁) should have two priors when moving from testing to estimation. To state that the priors within the marginal likelihoods “are not designed to yield posteriors that are good for estimation” (p.45) amounts to wishful thinking. I also do not find a strong justification within the paper or the response for choosing an improper prior on the nuisance parameter, e.g. σ, with the same constant. Another a posteriori validation in my opinion. However, I agree with the conclusion that the Jeffreys paradox prohibits the use of an improper prior on the parameter being tested (or of the test itself). A second point made by the authors is that Jeffreys’ Bayes factor is information consistent, which is correct but does not solve my quandary with the lack of precise calibration of the object, namely that alternatives abound in a non-informative situation.
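To make the Jeffreys-Lindley paradox a bit more concrete, here is a minimal Python sketch, assuming the textbook setting rather than anything taken from either paper: n normal observations with known σ, a point null H0: θ = 0, and a N(0, τ²) prior on θ under the alternative. Holding the observed z-statistic fixed at 1.96, which a classical test would call significant at the 5% level, the Bayes factor in favour of the null keeps growing with the prior scale τ, and it diverges in the flat (improper) limit. That divergence is why an improper prior on the tested parameter is ruled out. The function names `normal_pdf` and `bayes_factor_01` are illustrative only.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_01(xbar: float, n: int, sigma: float = 1.0, tau: float = 1.0) -> float:
    """BF01 for H0: theta = 0 versus H1: theta ~ N(0, tau^2), given the sample
    mean of n observations from N(theta, sigma^2) with sigma known."""
    se2 = sigma ** 2 / n                        # sampling variance of the mean
    m0 = normal_pdf(xbar, 0.0, se2)             # marginal likelihood under H0
    m1 = normal_pdf(xbar, 0.0, se2 + tau ** 2)  # marginal likelihood under H1
    return m0 / m1


if __name__ == "__main__":
    n, sigma, z = 100, 1.0, 1.96
    xbar = z * sigma / math.sqrt(n)  # data "significant" at the 5% level
    for tau in (0.1, 1.0, 10.0, 100.0, 1000.0):
        print(f"tau = {tau:>7}: BF01 = {bayes_factor_01(xbar, n, sigma, tau):.3f}")
```

The conjugate normal case is chosen deliberately so that both marginal likelihoods are closed-form normal densities; the point is the trend in τ, not any particular number.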
The paper mentioned as a response to the Xi’an’s Og author’s paper is also well worth checking, not least because it is published in the Journal of Mathematical Psychology. It made me wonder whether serious psychology is treated like any other serious scientific field, since too often it is seen instead as a poor relation among the peers of Science with a capital S.
The second part of the comments is highly supportive of our mixture approach and I obviously appreciate very much this support! Especially if we ever manage to turn the paper into a discussion paper! The authors also draw a connection with Harold Jeffreys’ distinction between testing and estimation, based upon Laplace’s succession rule. Unbearably slow succession law. Which is well-taken if somewhat specious since this is a testing framework where a single observation can send the Bayes factor to zero or +∞. (I further enjoyed the connection of the Poisson-versus-Negative Binomial test with Jeffreys’ call for common parameters. And the supportive comments on our recent mixture reparameterisation paper with Kaniav Kamary and Kate Lee.) The other point, that the Bayes factor is more sensitive to the choice of prior (beware the tails!), can be viewed as a plus for mixture estimation, as acknowledged there.
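The remark about a single observation can be checked in the Laplace setting that Jeffreys discussed. Here is a minimal sketch, assuming the simplest version (an illustration, not taken from either paper): a general law H0: p = 1 against H1: p ~ Uniform(0, 1) for Bernoulli trials. With n successes in n trials the Bayes factor in favour of the law is only n + 1, since the marginal probability of the data under H1 is 1/(n + 1), so evidence for the law accumulates slowly, while a single failure sends it straight to zero. The sketch also computes Laplace’s rule of succession, (s + 1)/(n + 2), and the probability that the next m trials all succeed, (n + 1)/(n + m + 1), which goes to zero as m grows. All function names are hypothetical.

```python
from fractions import Fraction


def bf_general_law(successes: int, trials: int) -> Fraction:
    """BF01 for H0: p = 1 versus H1: p ~ Uniform(0, 1), for a specific
    sequence of Bernoulli outcomes with the given number of successes."""
    if successes < trials:
        return Fraction(0)  # one failure is impossible under H0: p = 1
    # P(data | H0) = 1 and P(data | H1) = integral of p^n over (0, 1) = 1/(n + 1)
    return Fraction(trials + 1)


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace: posterior predictive probability of a success on the next
    trial under a uniform prior, (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


def prob_next_all_successes(n: int, m: int) -> Fraction:
    """After n successes in n trials (uniform prior, posterior Beta(n + 1, 1)),
    the probability that the next m trials all succeed, (n + 1) / (n + m + 1)."""
    return Fraction(n + 1, n + m + 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, bf_general_law(n, n), rule_of_succession(n, n),
              float(prob_next_all_successes(n, 10 * n)))
    print("one failure:", bf_general_law(999, 1000))
```

Exact fractions are used so that the closed-form expressions can be compared directly, without floating-point noise.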

The expected demise of the Bayes factor


This note is a discussion commenting on the paper by Ly et al. on “Harold Jeffreys’s Default Bayes Factor Hypothesis Tests: Explanation, Extension, and Application in Psychology” and on the perceived shortcomings of the classical Bayesian approach to testing, while reporting on an alternative approach advanced by Kamary, Mengersen, Robert and Rousseau (2014, arXiv:1412.2044) as a solution to this quintessential inference problem.

From the conclusion of the paper:
“In induction there is no harm in being occasionally wrong; it is inevitable that we shall be.” H. Jeffreys, ToP (p.302)
As a genuine pioneer in the field, Harold Jeffreys set a well-defined track, namely the Bayes factor, for conducting Bayesian testing and by extension model selection, a track that has become the norm in Bayesian analysis, while incorporating the fundamental aspect of reference priors and highly limited prior information. However, I see this solution as a child of its time, namely, as impacted by the ongoing formalisation of testing by other pioneers like Jerzy Neyman or Egon Pearson. Returning a single quantity for the comparison of two models fits naturally in decision making, but I strongly feel in favour of the alternative route that Bayesian model comparison should abstain from automated and hard decision making. Looking at the marginal likelihood of a model as evidence makes it harder to refrain from setting decision bounds when compared with returning a posterior distribution on α or an associated predictive quantity, as further discussed in Kamary et al. (2014). Different perspectives on this issue of constructing reference testing solutions are obviously welcome, from the incorporation of testing into the PC priors and baseline models of Simpson et al. (2014) to the non-local tests of Johnson and Rossell (2010), and I would most gladly welcome exchanges on such perspectives.
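As a rough illustration of returning a posterior distribution on α instead of a single Bayes factor, here is a minimal Gibbs sampler sketch, assuming the simplest possible case: both competing models are fully specified densities (N(0, 1) and N(1, 1), chosen only for illustration), the data are modelled as the mixture α f1 + (1 - α) f2, and α gets a Beta(a0, a0) prior. Kamary et al. (2014) treat the general case where each model carries its own parameters; the value a0 = 0.5 and the names `gibbs_mixture_weight` and `normal_pdf` are assumptions made here, not taken from the paper.

```python
import math
import random
from typing import Callable, List


def normal_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def gibbs_mixture_weight(data: List[float],
                         f1: Callable[[float], float],
                         f2: Callable[[float], float],
                         a0: float = 0.5,
                         n_iter: int = 3000,
                         burn_in: int = 500,
                         seed: int = 0) -> List[float]:
    """Posterior draws of alpha in x_i ~ alpha * f1 + (1 - alpha) * f2,
    with alpha ~ Beta(a0, a0) and f1, f2 fully specified (hypothetical setup)."""
    rng = random.Random(seed)
    alpha = 0.5
    draws = []
    for it in range(n_iter):
        # Allocation step: count observations assigned to component 1.
        k = 0
        for x in data:
            w1 = alpha * f1(x)
            w2 = (1.0 - alpha) * f2(x)
            if rng.random() < w1 / (w1 + w2):
                k += 1
        # Conjugate update: alpha | allocations ~ Beta(a0 + k, a0 + n - k).
        alpha = rng.betavariate(a0 + k, a0 + len(data) - k)
        if it >= burn_in:
            draws.append(alpha)
    return draws


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(0.0, 1.0) for _ in range(200)]  # simulated under model 1
    draws = gibbs_mixture_weight(data,
                                 f1=lambda x: normal_pdf(x, 0.0),
                                 f2=lambda x: normal_pdf(x, 1.0))
    mean = sum(draws) / len(draws)
    print(f"posterior mean of alpha: {mean:.3f}")
```

The output is a whole set of posterior draws for α, which can be summarised or plotted without forcing a decision threshold, in the spirit of the conclusion quoted above.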
The tree of knowledge (a phrase inspired by the featured image of this post, which resembles an ancient book frontispiece) is certainly enhanced by a readership such as this. 😊
Featured image: a response by Ly, Verhagen, and Wagenmakers
