Given a collection of estimators, the problem of linear, convex or model selection type aggregation consists in constructing a new estimator, called the aggregate, which is nearly as good as the best among them (or nearly as good as their best linear or convex combination), with respect to a given risk criterion. When the underlying model is sparse, which means that it is well approximated by a linear combination of a small number of functions in the dictionary, the aggregation techniques turn out to be very useful in taking advantage of sparsity. On the other hand, aggregation is a general technique of producing adaptive nonparametric estimators, which is more powerful than the classical methods since it allows one to combine estimators of different nature. Aggregates are usually constructed by mixing the initial estimators or functions of the dictionary with data-dependent weights that can be computed is several possible ways. Important example is given by aggregates with exponential weights. They satisfy sharp oracle inequalities that allow one to treat in a unified way three different problems: Adaptive nonparametric estimation, aggregation and sparse estimation.
High dimensional Poisson regression has become a standard framework for the analysis of massive counts datasets. In this work we estimate the intensity function of the Poisson regression model by using a dictionary approach, which generalizes the classical basis approach, combined with a Lasso or a group-Lasso procedure. Selection depends on penalty weights that need to be calibrated. Standard methodologies developed in the Gaussian framework can not be directly applied to Poisson models due to heteroscedasticity. Here we provide data-driven weights for the Lasso and the group-Lasso derived from concentration inequalities adapted to the Poisson case. We show that the associated Lasso and group-Lasso procedures are theoretically optimal in the oracle approach. Simulations are used to assess the empirical performance of our procedure, and an original application to the analysis of Next Generation Sequencing data is provided.
We consider the problem of estimating a low rank covariance function K(t, u) of a Gaussian process S(t), t ∈ [0, 1] based on n i.i.d. copies of S observed in a white noise. We suggest a new estimation procedure adapting simultaneously to the low rank structure and the smoothness of the covariance function. The new procedure is based on nuclear norm penalization and exhibits superior performances as compared to the sample covariance function by a polynomial factor in the sample size n. Other results include a minimax lower bound for estimation of low-rank covariance functions showing that our procedure is optimal as well as a scheme to estimate the unknown noise variance of the Gaussian process.
We consider the problem of recovering sparse vectors from underdetermined linear measurements via ℓp-constrained basis pursuit. Previous analyses of this problem based on generalized restricted isometry properties have suggested that two phenomena occur if p≠2. First, one may need substantially more than s log (en/s) measurements (optimal for p = 2) for uniform recovery of all s-sparse vectors. Second, the matrix that achieves recovery with the optimal number of measurements may not be Gaussian (as for p = 2). We present a new, direct analysis which shows that in fact neither of these phenomena occur. Via a suitable version of the null space property we show that a standard Gaussian matrix provides ℓq/ℓ1-recovery guarantees for ℓp-constrained basis pursuit in the optimal measurement regime. Our result extends to several heavier-tailed measurement matrices. As an application, we show that one can obtain a consistent reconstruction from uniform scalar quantized measurements in the optimal measurement regime.
This paper considers the problem of recovery of a low-rank matrix in the situation when most of its entries are not observed and a fraction of observed entries are corrupted. The observations are noisy realizations of the sum of a low rank matrix, which we wish to recover, with a second matrix having a complementary sparse structure such as element-wise or column-wise sparsity. We analyze a class of estimators obtained by solving a constrained convex optimization problem that combines the nuclear norm and a convex relaxation for a sparse constraint. Our results are obtained for the simultaneous presence of random and deterministic patterns in the sampling scheme. We provide guarantees for recovery of low-rank and sparse components from partial and corrupted observations in the presence of noise and show that the obtained rates of convergence are minimax optimal.
Several new estimation methods have been recently proposed for the linear regression model with observation error in the design. Different assumptions on the data generating process have motivated different estimators and analysis. In particular, the literature considered (1) observation errors in the design uniformly bounded by some ¯δ, and (2) zero mean independent observation errors. Under the first assumption, the rates of convergence of the proposed estimators depend explicitly on ¯δ, while the second assumption has been applied when an estimator for the second moment of the observational error is available. This work proposes and studies two new estimators which, compared to other procedures for regression models with errors in the design, exploit an additional ℓ∞-norm regularization. The first estimator is applicable when both (1) and (2) hold but does not require an estimator for the second moment of the observational error. The second estimator is applicable under (2) and requires an estimator for the second moment of the observation error. Importantly, we impose no assumption on the accuracy of this pilot estimator, in contrast to the previously known procedures. As the recent proposals, we allow the number of covariates to be much larger than the sample size. We establish the rates of convergence of the estimators and compare them with the bounds obtained for related estimators in the literature. These comparisons show interesting insights on the interplay of the assumptions and the achievable rates of convergence.
We consider the linear regression model with observation error in the design. In this setting, we allow the number of covariates to be much larger than the sample size. Several new estimation methods have been recently introduced for this model. Indeed, the standard Lasso estimator or Dantzig selector turn out to become unreliable when only noisy regressors are available, which is quite common in practice. We show in this work that under suitable sparsity assumptions, the procedure introduced in  is almost optimal in a minimax sense and, despite non-convexities, can be efficiently computed by a single linear programming problem. Furthermore, we provide an estimator attaining the minimax efficiency bound. This estimator is written as a second order cone programming minimisation problem which can be solved numerically in polynomial time.
by Simon Barthelmé & Nicolas Chopin
Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Extending classical results on the multinomial-Poisson transform (Baker, 1994), we show that inferring the parameters of a unnormalised model on a space Ω can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on Ω.
The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which includes for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parameteric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive estimation method of Gutmann and Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
We develop a scoring and classification procedure based on the PAC-Bayesian approach and the AUC (Area Under Curve) criterion. We focus initially on the class of linear score functions. We derive PAC-Bayesian non-asymptotic bounds for two types of prior for the score parameters: a Gaussian prior, and a spike-and-slab prior; the latter makes it possible to perform feature selection. One important advantage of our approach is that it is amenable to powerful Bayesian computational tools. We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method. We also extend our method to a class of non-linear score functions, essentially leading to a nonparametric procedure, by considering a Gaussian process prior.
A non-parametric extension of control variates is presented. These leverage gradient information on the sampling density to achieve substantial variance reduction. It is not required that the sampling density be normalised. The novel contribution of this work is based on two important insights; (i) a trade-off between random sampling and deterministic approximation and (ii) a new gradient-based function space derived from Stein’s identity. Unlike classical control variates, our estimators achieve super-root-n convergence, often requiring orders of magnitude fewer simulations to achieve a fixed level of precision. Theoretical and empirical results are presented, the latter focusing on integration problems arising in hierarchical models and models based on non-linear ordinary differential equations.
Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of interest also depends on unknown static parameters that need to be estimated from the data. In this context, standard particle methods fail and it is necessary to rely on more sophisticated algorithms. The aim of this paper is to present a comprehensive review of particle methods that have been proposed to perform static parameter estimation in state-space models. We discuss the advantages and limitations of these methods and illustrate their performance on simple models.
Sequential Monte Carlo algorithms (also known as particle filters) are popular methods to approximate filtering (and related) distributions of state-space models. However, they converge at the slow 1/√N rate, which may be an issue in real-time data-intensive scenarios. We give a brief outline of SQMC (Sequential Quasi-Monte Carlo), a variant of SMC based on low-discrepancy point sets proposed by Gerber and Chopin (2015), which converges at a faster rate, and we illustrate the greater performance of SQMC on autonomous positioning problems.
This article is a self-contained introduction to the Metropolis- Hastings algorithm, this ubiquitous tool for producing dependent simula- tions from an arbitrary distribution. The document illustrates the principles of the methodology on simple examples with R codes and provides entries to the recent extensions of the method.
The PAC-Bayesian approach is a powerful set of techniques to derive nonasymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately intractable. One may sample from it using Markov chain Monte Carlo, but this is often too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation has often the same rate of convergence as the original PAC-Bayesian procedure it approximates. We specialise our results to several learning tasks (classication, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.
Gerber and Chopin (2015) recently introduced Sequential quasi-Monte Carlo (SQMC) algorithms as an efficient way to perform filtering in state-space models. The basic idea is to replace random variables with low-discrepancy point sets, so as to obtain faster convergence than with standard particle filtering. Gerber and Chopin (2015) describe briefly several ways to extend SQMC to smoothing, but do not provide supporting theory for this extension. We discuss more thoroughly how smoothing may be performed within SQMC, and derive convergence results for the so-obtained smoothing algorithms. We consider in particular SQMC equivalents of forward smoothing and forward filtering backward sampling, which are the most well-known smoothing techniques. As a preliminary step, we provide a generalization of the classical result of Hlawka and Mück (1972) on the transformation of QMC point sets into low discrepancy point sets with respect to non uniform distributions. As a corollary of the latter, we note that we can slightly weaken the assumptions to prove the consistency of SQMC.