Following yesterday’s post about Approximate Bayesian Computation (ABC), I will continue with this topic today. I will share, comment on and briefly review a short paper I found while doing a bit of research. It was also featured in yesterday’s post from Xi’an’s Og blog, and it is about an implementation of ABC in Python, the scripting language widely used for scientific computing. The paper introduces the Engine for Likelihood-Free Inference (ELFI), a Python software package for performing approximate Bayesian inference when the likelihood function is unknown or hard to evaluate.
The likelihood function plays a central role in statistical inference and Bayesian statistics. But, as its Wikipedia entry points out, there is an important distinction to be made about how and in what context the term is used. If we are describing possible future outcomes before any data are available, the appropriate term is probability, and "likelihood" is used in that sense only as an informal synonym. If instead data are available and we need to determine the parameter (or parameter vector) that best describes them, then the correct term is likelihood function. It is this definition that applies to inference problems:
In statistics, a likelihood function (often simply the likelihood) is a function of the parameters of a statistical model given data. Likelihood functions play a key role in statistical inference, especially methods of estimating a parameter from a set of statistics. In informal contexts, “likelihood” is often used as a synonym for “probability.” In statistics, a distinction is made depending on the roles of outcomes vs. parameters. Probability is used before data are available to describe possible future outcomes given a fixed value for the parameter (or parameter vector). Likelihood is used after data are available to describe a function of a parameter (or parameter vector) for a given outcome.
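The distinction can be made concrete with a single formula read two ways. The sketch below uses a binomial model with hypothetical data (7 successes in 10 trials, my own choice, not from the post): holding the data fixed and varying the parameter gives the likelihood function, whose maximum is the maximum-likelihood estimate; holding the parameter fixed and varying the outcome gives a probability distribution that sums to one.

```python
import math

def binomial_likelihood(p: float, successes: int, trials: int) -> float:
    """Binomial formula read as a function of the parameter p, with the data held fixed."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

# Hypothetical observed data: 7 successes in 10 trials.
successes, trials = 7, 10

# Likelihood: data fixed, parameter varies over a grid of candidate values.
grid = [i / 100 for i in range(101)]
values = [binomial_likelihood(p, successes, trials) for p in grid]
p_hat = grid[values.index(max(values))]  # grid maximum-likelihood estimate

# Probability: parameter fixed, outcomes vary; summing over all outcomes gives 1.
total = sum(binomial_likelihood(0.7, k, trials) for k in range(trials + 1))

print(p_hat, round(total, 10))  # 0.7 1.0
```

Note that the likelihood values over the grid do not need to sum to one: the likelihood is not a probability distribution over the parameter.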
We introduce an Engine for Likelihood-Free Inference (ELFI), a software package for approximate Bayesian inference that can be used when the likelihood function is difficult to evaluate or unknown, but a generative simulator model exists. The software is in Python, and its modular library design emphasizes both ease-of-use and expandability, allowing arbitrary user-defined simulators and implementation of new inference methods with minimal effort. Probabilistic inference models can be represented intuitively as graphs, and users can execute the inference in a computational environment best suited for their needs, from single laptops to cluster computers. The whole inference pipeline is automatically parallelized, and intermediate results may be stored to disk for later use. The package includes implementations of some of the most advanced likelihood-free inference techniques. One example of these is BOLFI, which estimates the discrepancy function using Gaussian processes and uses Bayesian optimization for parameter search, which has recently been shown to accelerate likelihood-free inference up to several orders of magnitude.
It is important to underline that likelihood-free does not mean the likelihood function is absent from the models. Rather, these methods find a clever way to bypass it, because it is difficult to evaluate precisely. The approaches for doing so include indirect inference, synthetic likelihood and approximate Bayesian computation (ABC), all of which assume that the properties of the simulated data vary smoothly with changes in the model parameters:
The likelihood function associated with a probabilistic model is often difficult or impossible to evaluate directly, in particular when the generative model is only defined as an executable simulator. Inference of such models may be accomplished with indirect inference, synthetic likelihood or Approximate Bayesian Computation (ABC) [2, 7, 9]. All of these methods are based on the assumption that the qualities of the simulated data vary relatively smoothly with changes in the model parameters. The general approach is that observed data are systematically compared to data acquired from simulations with sampled parameters. For example, in the case of ABC, this allows us to create an approximate posterior distribution for the parameters of interest [5, 6, 4].
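The general approach in that passage, comparing observed data with data simulated from sampled parameters, is easiest to see in the simplest form of ABC, rejection sampling. This is a minimal sketch, not ELFI code, assuming a toy Normal(mu, 1) simulator, a uniform prior on [-5, 5], the sample mean as the summary statistic and a hypothetical tolerance epsilon = 0.1. The simulator is only ever run, never evaluated as a density, which is exactly what makes the method likelihood-free.

```python
import random
import statistics
from typing import List

def simulator(mu: float, n: int, rng: random.Random) -> List[float]:
    """Toy generative model: n draws from Normal(mu, 1). We sample from it but never evaluate its density."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def rejection_abc(observed: List[float], n_proposals: int, epsilon: float, rng: random.Random) -> List[float]:
    """Keep prior draws whose simulated sample mean lies within epsilon of the observed one."""
    s_obs = statistics.fmean(observed)
    accepted = []
    for _ in range(n_proposals):
        mu = rng.uniform(-5.0, 5.0)                      # 1. sample a parameter from the prior
        simulated = simulator(mu, len(observed), rng)    # 2. simulate data with it
        if abs(statistics.fmean(simulated) - s_obs) < epsilon:  # 3. compare with the observed data
            accepted.append(mu)
    return accepted

rng = random.Random(1)
observed = simulator(2.0, 50, rng)  # stand-in "observed" data, true mu = 2
posterior = rejection_abc(observed, 10_000, 0.1, rng)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values of mu form a sample from the approximate posterior; shrinking epsilon makes the approximation tighter at the cost of accepting fewer draws.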
The development of ELFI is particularly interesting for two reasons. First, its structure is modular and parallelizable, which increases the computational power available to inference pipelines. Second, it is written in a programming language whose ease of use and wide adoption ensure a steady supply of motivated developers. Statistical inference has long been the garden of Eden of the R language, but the addition of a highly modular Python pipeline broadens the scope of this statistical computing task in a way that benefits both Python and R (or any other language):
Multiple software packages for ABC already exist in many programming languages. However, with the increasing popularity of the Python programming language also among data scientists, an expandable Python package for likelihood-free inference is attractive, in particular because the most efficient machine learning based tools for accelerating ABC inference have recently become available in Python and these options remain unavailable in existing ABC packages. We introduce a general, modular ABC framework: Engine for Likelihood-Free Inference (ELFI).
Description of ELFI
One important aspect of ELFI is that it lets the researcher or developer define the inference problem in the form of a graphical model, which in turn supports the creation of complex probabilistic models. Furthermore, the likelihood of the model parameters is defined implicitly by a simulator, which can be any user-defined Python function or binary executable:
The ELFI Python package has been designed with three main requirements in mind. First ELFI is easy to use and features a user interface that allows the researcher to define the inference problem in the form of graphical models. This supports the intuitive creation of complex probabilistic models. In likelihood free inference (LFI), the probabilistic model contains parameters for which the likelihood is defined implicitly by a simulator model. The simulator may be any user-defined Python function or a binary executable that fulfils minimal interface requirements.
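To illustrate what "likelihood defined implicitly by a simulator" means in a graph, here is a toy sketch. The `Node` class is hypothetical and is not ELFI's API; it only shows the idea of a directed graph running from a prior node through a simulator and a summary statistic to a distance node. The Normal(mu, 1) model, uniform prior, sample-mean summary and tolerance of 0.2 are assumptions. Nowhere is the likelihood of mu written down: it exists only implicitly, through the `sim` node.

```python
import random
from typing import Callable, Dict, List, Optional

class Node:
    """Hypothetical minimal graph node (not ELFI's API): applies fn to the outputs of its parents."""
    def __init__(self, name: str, fn: Callable, parents: Optional[List["Node"]] = None):
        self.name = name
        self.fn = fn
        self.parents = parents or []

    def evaluate(self, cache: Dict[str, object]) -> object:
        # Each node is computed once per pass; shared ancestors are reused from the cache.
        if self.name not in cache:
            inputs = [parent.evaluate(cache) for parent in self.parents]
            cache[self.name] = self.fn(*inputs)
        return cache[self.name]

def sample_mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)
observed = [rng.gauss(1.5, 1.0) for _ in range(30)]  # hypothetical observed data

# Graph: prior -> simulator -> summary -> distance <- observed summary.
mu = Node("mu", lambda: rng.uniform(-5.0, 5.0))
sim = Node("sim", lambda m: [rng.gauss(m, 1.0) for _ in range(30)], [mu])
s_sim = Node("s_sim", sample_mean, [sim])
s_obs = Node("s_obs", lambda: sample_mean(observed))
dist = Node("dist", lambda a, b: abs(a - b), [s_sim, s_obs])

# Run the graph repeatedly; keep parameter draws whose discrepancy is small.
accepted = []
for _ in range(5000):
    cache: Dict[str, object] = {}  # fresh cache: each pass is a new joint draw of all random nodes
    if dist.evaluate(cache) < 0.2:
        accepted.append(cache["mu"])
print(len(accepted), sample_mean(accepted))
```

Separating the model description (the graph) from the inference loop is what lets an inference method be swapped without touching the simulator.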
Parallelization is handled by Dask, a Python package for distributed computing that scales from a single laptop to a cluster environment…
Second, any modern software should take advantage of the increasing number of processing units in computing hardware. Therefore, ELFI has been implemented so that running the user-defined simulator, the calculation of the similarity metric and feasible parts of the ABC algorithms, are automatically parallelized for distributed computing. The parallelization has been implemented with the Python package Dask, which scales well from a laptop computer up to a distributed cluster environment.
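Rejection ABC parallelizes naturally because every proposal is independent. The sketch below splits proposals into batches and gives each batch its own random number generator, seeded from a master seed and the batch index, so the result is reproducible however the batches are scheduled. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for Dask; it does not show how ELFI itself uses Dask. The toy Normal(mu, 1) model, the seeding formula and the batch sizes are assumptions.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import List

N_OBS = 50  # size of each simulated dataset (assumption)

def run_batch(batch_index: int, batch_size: int, master_seed: int, s_obs: float, epsilon: float) -> List[float]:
    """Run one batch of ABC proposals with its own RNG, seeded from (master_seed, batch_index)."""
    rng = random.Random(master_seed * 1_000_003 + batch_index)
    accepted = []
    for _ in range(batch_size):
        mu = rng.uniform(-5.0, 5.0)
        s_sim = statistics.fmean(rng.gauss(mu, 1.0) for _ in range(N_OBS))
        if abs(s_sim - s_obs) < epsilon:
            accepted.append(mu)
    return accepted

def parallel_rejection(n_batches: int, batch_size: int, master_seed: int,
                       s_obs: float, epsilon: float, workers: int = 4) -> List[float]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_batch, i, batch_size, master_seed, s_obs, epsilon)
                   for i in range(n_batches)]
        # Collect results in batch order, so the output does not depend on scheduling.
        return [mu for f in futures for mu in f.result()]

s_obs = 2.0  # hypothetical observed summary statistic
a = parallel_rejection(8, 1000, 42, s_obs, 0.1, workers=4)
b = parallel_rejection(8, 1000, 42, s_obs, 0.1, workers=1)
print(len(a), a == b)
```

Threads keep the example simple, but in CPython they will not speed up pure-Python, CPU-bound simulators; a process pool or a Dask cluster is needed for real gains. The per-batch seeding is the part that carries over.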
… all within a modular, object-oriented design whose computational efficiency is surprising:
Third, ELFI is modular and follows the object-oriented paradigm. The development and testing of new inference methods on top of the framework requires minimal effort and provides all the advantages of existing functionality, such as parallelization, handling the states of the pseudo random number generators and persisting results to disk. Due to the object-oriented approach, one can readily extend existing functionality. Currently included LFI methods are the rejection sampling, the sequential Monte Carlo sampling and the Bayesian Optimization for Likelihood-Free Inference (BOLFI) framework, which has been shown to reduce the total computational burden by several orders of magnitude.
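Of the methods listed, rejection sampling is the simplest; sequential Monte Carlo (SMC) ABC improves on it by moving a population of weighted particles through a decreasing sequence of tolerances. The sketch below implements one common SMC variant, the ABC population Monte Carlo scheme of Beaumont et al. (2009), on a toy model. It is not ELFI's implementation, whose kernel and threshold choices may differ; the Normal(mu, 1) simulator, uniform prior on [-5, 5], tolerance schedule and particle count are assumptions for illustration.

```python
import math
import random
import statistics
from typing import List, Tuple

N_OBS = 50
PRIOR_LOW, PRIOR_HIGH = -5.0, 5.0

def distance(mu: float, s_obs: float, rng: random.Random) -> float:
    """Simulate a toy dataset for mu and return |mean(simulated) - observed summary|."""
    s_sim = statistics.fmean(rng.gauss(mu, 1.0) for _ in range(N_OBS))
    return abs(s_sim - s_obs)

def normal_pdf(x: float, sd: float) -> float:
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def abc_pmc(s_obs: float, n_particles: int, epsilons: List[float],
            rng: random.Random) -> Tuple[List[float], List[float]]:
    # Generation 0: plain rejection sampling from the prior at the loosest tolerance.
    particles: List[float] = []
    while len(particles) < n_particles:
        mu = rng.uniform(PRIOR_LOW, PRIOR_HIGH)
        if distance(mu, s_obs, rng) < epsilons[0]:
            particles.append(mu)
    weights = [1.0 / n_particles] * n_particles

    for eps in epsilons[1:]:
        # Gaussian perturbation kernel with twice the weighted particle variance.
        mean = sum(w * p for w, p in zip(weights, particles))
        var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
        sd = math.sqrt(2 * var)
        new_particles: List[float] = []
        while len(new_particles) < n_particles:
            # Resample a previous particle by weight, perturb it, then test it by simulation.
            mu = rng.choices(particles, weights)[0] + rng.gauss(0.0, sd)
            if PRIOR_LOW <= mu <= PRIOR_HIGH and distance(mu, s_obs, rng) < eps:
                new_particles.append(mu)
        # Importance weight = prior density / kernel mixture density; the uniform prior's
        # constant density cancels when the weights are normalized.
        raw = [1.0 / sum(w * normal_pdf(mu - p, sd) for w, p in zip(weights, particles))
               for mu in new_particles]
        total = sum(raw)
        particles, weights = new_particles, [w / total for w in raw]
    return particles, weights

rng = random.Random(3)
observed = [rng.gauss(2.0, 1.0) for _ in range(N_OBS)]  # stand-in data, true mu = 2
s_obs = statistics.fmean(observed)
particles, weights = abc_pmc(s_obs, 200, [1.0, 0.5, 0.2, 0.1], rng)
post_mean = sum(w * p for w, p in zip(weights, particles))
print(post_mean)
```

Compared with a single rejection pass at the final tolerance, later generations propose only near regions that already fit the data, so fewer simulations are wasted.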
The import to Python and conclusion
This short, 4-page paper is nevertheless significant from both a technological and a scientific perspective, worth several hundred pages of effort in other papers or even academic theses. Often, smaller really is beautiful. I will finish this post with a figure of the Python code used to define a simple graph simulator for a moving average (MA) model, from which ELFI infers the parameters of interest:
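The sketch below is not the code in that figure; it is a plain-Python stand-in for the same kind of MA example, written without ELFI so that no ELFI classes or signatures are implied. The choices of a second-order MA(2) model, a uniform prior on the triangle that keeps the model identifiable, lag-1 and lag-2 autocovariances as summary statistics, a Euclidean distance and a "keep the closest 2%" rejection rule are my assumptions, mirroring the common textbook version of this ABC example.

```python
import math
import random
from typing import List, Tuple

N_OBS = 100  # length of each time series (assumption)

def ma2(t1: float, t2: float, n: int, rng: random.Random) -> List[float]:
    """Simulate y_t = w_t + t1 * w_{t-1} + t2 * w_{t-2}, with w_t ~ N(0, 1)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
    return [w[t + 2] + t1 * w[t + 1] + t2 * w[t] for t in range(n)]

def autocov(y: List[float], lag: int) -> float:
    """Summary statistic: average of y_t * y_{t-lag} (the series has mean zero by construction)."""
    return sum(y[t] * y[t - lag] for t in range(lag, len(y))) / (len(y) - lag)

def sample_prior(rng: random.Random) -> Tuple[float, float]:
    """Uniform prior on the triangle -2 < t1 < 2, t1 + t2 > -1, t1 - t2 < 1, drawn by rejection."""
    while True:
        t1, t2 = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        if t1 + t2 > -1.0 and t1 - t2 < 1.0:
            return t1, t2

def rejection_by_quantile(y_obs: List[float], n_proposals: int, quantile: float,
                          rng: random.Random) -> List[Tuple[float, float]]:
    """Simulate n_proposals datasets and keep the parameters of the closest fraction `quantile`."""
    s_obs = (autocov(y_obs, 1), autocov(y_obs, 2))
    draws = []
    for _ in range(n_proposals):
        t1, t2 = sample_prior(rng)
        y = ma2(t1, t2, len(y_obs), rng)
        d = math.dist((autocov(y, 1), autocov(y, 2)), s_obs)  # Euclidean distance of summaries
        draws.append((d, t1, t2))
    draws.sort()  # smallest distance first
    n_keep = max(1, round(n_proposals * quantile))
    return [(t1, t2) for _, t1, t2 in draws[:n_keep]]

rng = random.Random(7)
y_obs = ma2(0.6, 0.2, N_OBS, rng)  # "observed" series from known parameters
samples = rejection_by_quantile(y_obs, 4000, 0.02, rng)
mean_t1 = sum(s[0] for s in samples) / len(samples)
mean_t2 = sum(s[1] for s in samples) / len(samples)
print(len(samples), round(mean_t1, 2), round(mean_t2, 2))
```

Keeping a fixed fraction of the closest simulations, rather than fixing a tolerance in advance, makes the number of posterior samples predictable; the effective tolerance is then the distance of the last sample kept.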