Accelerating Metropolis-Hastings Algorithms by Delayed Acceptance

by M. Banterle, C. Grazian, A. Lee & C. P. Robert

MCMC algorithms such as Metropolis-Hastings algorithms are slowed down by the computation of complex target distributions, as exemplified by huge datasets. We offer in this paper an approach to reduce the computational costs of such algorithms by a simple and universal divide-and-conquer strategy. The idea behind the generic acceleration is to divide the acceptance step into several parts, aiming at a major reduction in computing time that outweighs the corresponding reduction in acceptance probability. The division decomposes the “prior × likelihood” term into a product such that some of its components are much cheaper to compute than others. Each of the components can be sequentially compared with a uniform variate, the first rejection signalling that the proposed value is considered no further. This approach can in turn be accelerated as part of a prefetching algorithm taking advantage of the parallel abilities of the computer at hand. We illustrate those accelerating features on a series of toy and realistic examples.
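The factorised acceptance step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `delayed_acceptance_mh`, the Gaussian random-walk proposal, the toy normal-mean target and the step size are all hypothetical choices made for the example. The factors are supplied cheapest first, each is compared with its own uniform variate, and the expensive factor is only evaluated when the cheap one has not already rejected the proposal.

```python
import math
import random


def delayed_acceptance_mh(log_factors, x0, n_iter, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a factorised acceptance step.

    log_factors: list of functions, cheapest first; their sum is the log of
    the unnormalised target density (e.g. [log_prior, log_likelihood]).
    Returns the chain and the number of evaluations of each factor.
    """
    rng = random.Random(seed)
    x = x0
    current = [f(x) for f in log_factors]
    chain = []
    evaluations = [0] * len(log_factors)
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)  # symmetric proposal (assumption)
        proposed = []
        accepted = True
        for k, f in enumerate(log_factors):
            fy = f(y)
            evaluations[k] += 1
            proposed.append(fy)
            # Compare this factor's ratio with a fresh uniform variate;
            # the first rejection ends the examination of the proposal.
            if math.log(1.0 - rng.random()) > fy - current[k]:
                accepted = False
                break
        if accepted:
            x, current = y, proposed
        chain.append(x)
    return chain, evaluations


# Toy target (assumption): normal mean with a N(0, 10^2) prior
data = [1.2, 0.8, 1.1, 0.9, 1.0]


def log_prior(theta):
    return -0.5 * theta * theta / 100.0


def log_likelihood(theta):
    return -0.5 * sum((y - theta) ** 2 for y in data)


chain, evaluations = delayed_acceptance_mh(
    [log_prior, log_likelihood], x0=0.0, n_iter=5000
)
```

In this sketch `evaluations[1]` counts how often the costly likelihood factor was actually computed; it can never exceed `evaluations[0]`, and the saving grows when the cheap factor rejects more often.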

Download the paper

Bayesian Computation: A Summary of the Current State, and Samples Backwards and Forwards

by P.J. Green, K. Latuszynski, M. Pereyra & C.P. Robert

The past decades have seen enormous improvements in computational inference based on statistical models, with continual enhancement of a wide range of competing computational tools. In Bayesian inference, first and foremost, MCMC techniques continue to evolve, moving from random walk proposals to Langevin drift, to Hamiltonian Monte Carlo, and so on, with both theoretical and algorithmic inputs opening wider access to practitioners. However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the models and datasets to be addressed. The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduces the dimension and size of the raw data while capturing its essential aspects. Approximate models and algorithms may thus be at the core of the next computational revolution.

Download the paper

Expectation Propagation as a Way of Life

by A. Gelman, A. Vehtari, P. Jylänki, C. Robert, N. Chopin & J.P. Cunningham

We revisit expectation propagation (EP) as a prototype for scalable algorithms that partition big datasets into many parts and analyze each part in parallel to perform inference of shared parameters. The algorithm should be particularly efficient for hierarchical models, for which the EP algorithm works on the shared parameters (hyperparameters) of the model.

The central idea of EP is to work at each step with a “tilted distribution” that combines the likelihood for a part of the data with the “cavity distribution,” which is the approximate model for the prior and all other parts of the data. EP iteratively approximates the moments of the tilted distributions and incorporates those approximations into a global posterior approximation. As such, EP can be used to divide the computation for large models into manageable sizes. The computation for each partition can be made parallel, with occasional exchange of information between processes through the global posterior approximation. Moments of multivariate tilted distributions can be approximated in various ways, including MCMC, Laplace approximations, and importance sampling.
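The cavity and tilted-distribution steps can be illustrated with a one-dimensional sketch. It assumes a scalar parameter, Gaussian site approximations stored as natural parameters, a sequential sweep rather than parallel updates, and tilted moments computed by a simple grid quadrature instead of the MCMC, Laplace or importance sampling options mentioned above. The names `tilted_moments` and `ep_gaussian` and the toy normal-mean data are hypothetical.

```python
import math


def tilted_moments(cav_mean, cav_var, log_lik, n_grid=2001, width=8.0):
    """Mean and variance of cavity(theta) * exp(log_lik(theta)) on a grid."""
    sd = math.sqrt(cav_var)
    grid = [
        cav_mean + width * sd * (2.0 * i / (n_grid - 1) - 1.0)
        for i in range(n_grid)
    ]
    logw = [-0.5 * (t - cav_mean) ** 2 / cav_var + log_lik(t) for t in grid]
    top = max(logw)  # subtract the maximum to avoid underflow
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    mean = sum(wi * t for wi, t in zip(w, grid)) / total
    var = sum(wi * (t - mean) ** 2 for wi, t in zip(w, grid)) / total
    return mean, var


def ep_gaussian(parts, log_lik_part, prior_mean, prior_var, n_sweeps=3):
    """EP over data partitions; returns the approximate posterior mean, variance."""
    # Natural parameters: q = precision, r = precision * mean
    glob_q, glob_r = 1.0 / prior_var, prior_mean / prior_var
    sites = [(0.0, 0.0) for _ in parts]
    for _ in range(n_sweeps):
        for k, part in enumerate(parts):
            # Cavity: global approximation with site k removed
            cav_q, cav_r = glob_q - sites[k][0], glob_r - sites[k][1]
            # Tilted distribution: cavity times the likelihood of part k
            mean, var = tilted_moments(
                cav_r / cav_q, 1.0 / cav_q, lambda t: log_lik_part(t, part)
            )
            # Moment matching updates the global approximation and site k
            glob_q, glob_r = 1.0 / var, mean / var
            sites[k] = (glob_q - cav_q, glob_r - cav_r)
    return glob_r / glob_q, 1.0 / glob_q


def normal_log_lik(theta, ys):
    return -0.5 * sum((y - theta) ** 2 for y in ys)


parts = [[1.2, 0.8, 1.1], [0.9, 1.0, 1.3], [0.7, 1.0]]
ep_mean, ep_var = ep_gaussian(parts, normal_log_lik, prior_mean=0.0, prior_var=100.0)
```

With a Gaussian likelihood the tilted distributions are themselves Gaussian, so this toy case should reproduce the exact conjugate posterior; it is chosen only because the answer can be checked, not because EP is needed there.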

Download the paper

Approximate Bayesian Computation in State Space Models

by G.M. Martin, B.P.M. McCabe, W. Maneesoonthorn & C.P. Robert

A new approach to inference in state space models is proposed, based on approximate Bayesian computation (ABC). ABC avoids evaluation of the likelihood function by matching observed summary statistics with statistics computed from data simulated from the true process; exact inference being feasible only if the statistics are sufficient. With finite sample sufficiency unattainable in the state space setting, we seek asymptotic sufficiency via the maximum likelihood estimator (MLE) of the parameters of an auxiliary model. We prove that this auxiliary model-based approach achieves Bayesian consistency, and that – in a precise limiting sense – the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score. In multiple parameter settings a separate treatment of scalar parameters, based on integrated likelihood techniques, is advocated as a way of avoiding the curse of dimensionality. Some attention is given to a structure in which the state variable is driven by a continuous time process, with exact inference typically infeasible in this case as a result of intractable transitions. The ABC method is demonstrated using the unscented Kalman filter as a fast and simple way of producing an approximation in this setting, with a stochastic volatility model for financial returns used for illustration.
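The basic matching of observed and simulated summary statistics can be sketched with a plain rejection sampler. This is only the generic ABC mechanism, not the auxiliary-model or state space method of the paper: the function `abc_rejection`, the i.i.d. normal toy model, the uniform prior, the sample mean as summary statistic and the tolerance are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_draw, n_draws, tol, seed=0):
    """Keep prior draws whose simulated summary falls within tol of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                       # draw from the prior
        pseudo = simulate(theta, len(observed), rng)  # simulate pseudo-data
        if abs(summary(pseudo) - s_obs) <= tol:       # no likelihood evaluation
            kept.append(theta)
    return kept


# Toy model (assumption): i.i.d. N(theta, 1) data, uniform prior on (-5, 5)
observed = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0]
accepted = abc_rejection(
    observed,
    simulate=lambda theta, n, rng: [rng.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    prior_draw=lambda rng: rng.uniform(-5.0, 5.0),
    n_draws=20000,
    tol=0.1,
)
```

The sample mean is sufficient in this toy model, which is exactly the situation that is unavailable in the state space setting and motivates the asymptotic argument made in the abstract.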

Download the paper

Reliable ABC Model Choice via Random Forests

by P. Pudlo, J.-M. Marin, A. Estoup, J.-M. Cornuet, M. Gauthier & C.P. Robert

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency by a factor of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on CRAN.

Download the paper

Pre-Processing for Approximate Bayesian Computation in Image Analysis

by M. T. Moores, C. C. Drovandi, K. Mengersen & C.P. Robert

Most of the existing algorithms for approximate Bayesian computation (ABC) assume that it is feasible to simulate pseudo-data from the model at each iteration. However, the computational cost of these simulations can be prohibitive for high dimensional data. An important example is the Potts model, which is commonly used in image analysis. Images encountered in real world applications can have millions of pixels, so scalability is a major concern. We apply ABC with a synthetic likelihood to the hidden Potts model with additive Gaussian noise. Using a pre-processing step, we fit a binding function to model the relationship between the model parameters and the synthetic likelihood parameters. Our numerical experiments demonstrate that the precomputed binding function dramatically improves the scalability of ABC, reducing the average runtime required for model fitting from 71 hours to only 7 minutes. We also illustrate the method by estimating the smoothing parameter for remotely sensed satellite imagery. Without precomputation, Bayesian inference is impractical for datasets of that scale.

Download the paper

Supervised Clustering in the Data Cube

by Vincent Roulet, Fajwel Fogel, Alexandre d’Aspremont & Francis Bach

We study a supervised clustering problem seeking to cluster either features, tasks or sample points using losses extracted from supervised learning problems. We formulate a unified optimization problem handling these three settings and derive algorithms whose core iteration complexity is concentrated in a k-means clustering step, which can be approximated efficiently. We test our methods on both artificial and realistic data sets extracted from movie reviews and 20NewsGroup.

Download the paper

Convex Relaxations for Permutation Problems

by Fajwel Fogel, Rodolphe Jenatton, Francis Bach & Alexandre d’Aspremont

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing, for example. We prove the equivalence between the seriation problem and the combinatorial 2-sum problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case, and we produce a convex relaxation for the 2-sum problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.
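A spectral approach to the noiseless case can be sketched by sorting the items along the Fiedler vector of the Laplacian of the similarity matrix. The sketch below is an illustration under assumptions, not the paper's code: it assumes non-negative similarities, uses a shifted power iteration in pure Python rather than a library eigensolver, and the function name `spectral_seriation` and the toy similarity `n - |i - j|` are hypothetical.

```python
import math
import random


def spectral_seriation(similarity, n_iter=500, seed=0):
    """Order items by the Fiedler vector of the Laplacian L = D - A.

    similarity: symmetric matrix (list of lists) with non-negative entries;
    the diagonal is ignored. The order is defined up to reversal.
    """
    n = len(similarity)
    degree = [
        sum(similarity[i][j] for j in range(n) if j != i) for i in range(n)
    ]
    shift = 2.0 * max(degree)  # Gershgorin bound on the largest eigenvalue of L
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Multiply by (shift * I - L): its dominant remaining eigenvector
        # is the eigenvector of the second smallest eigenvalue of L.
        lv = [
            degree[i] * v[i]
            - sum(similarity[i][j] * v[j] for j in range(n) if j != i)
            for i in range(n)
        ]
        v = [shift * v[i] - lv[i] for i in range(n)]
    return sorted(range(n), key=lambda i: v[i])


# Toy noiseless example (assumption): similarity decreases with the distance
# between hidden positions, and the items are presented in shuffled order.
true_position = [3, 0, 5, 1, 4, 2]
n = len(true_position)
similarity = [
    [n - abs(true_position[i] - true_position[j]) for j in range(n)]
    for i in range(n)
]
order = spectral_seriation(similarity)
recovered = [true_position[i] for i in order]
```

Here `recovered` lists the hidden positions in the order found; since a linear order and its reverse are indistinguishable from similarities alone, either the increasing or the decreasing sequence counts as a correct reconstruction.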

Download the paper



Spectral Ranking Using Seriation

by Fajwel Fogel, Alexandre d’Aspremont & Milan Vojnovic

We describe a seriation algorithm for ranking a set of n items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, then using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact even when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than other scoring methods. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive and in some cases superior performance compared to classical ranking methods.
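One way to turn pairwise comparisons into a similarity matrix is to count, for each pair of items, how many opponents they compare identically against. The abstract does not spell out the construction, so the agreement-count formula below is an assumption made for illustration, as are the helper names `comparison_matrix` and `match_similarity` and the convention that an item ties in its own favour.

```python
def comparison_matrix(rank):
    """C[i][j] = 1 if item i is ranked at or above item j, else -1."""
    n = len(rank)
    return [
        [1 if rank[i] <= rank[j] else -1 for j in range(n)] for i in range(n)
    ]


def match_similarity(C):
    """S[i][j] = number of items k against which i and j compare identically."""
    n = len(C)
    return [
        [sum((1 + C[i][k] * C[j][k]) // 2 for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Comparisons consistent with a total order (hypothetical ranks, 0 = best)
rank = [2, 0, 3, 1]
C = comparison_matrix(rank)
S = match_similarity(C)
```

When all comparisons are consistent with a total order, this count equals `n - |rank[i] - rank[j]|`, so items that are close in the ranking are the most similar, which is the structure a seriation method can then reorder.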

Download the paper