Bayesian Sparse Recovery
Why a Probabilistic Model for Sparsity?
Sections 14.1-14.3 treated sparsity as a deterministic structural constraint. A richer viewpoint places a prior on the signal $x$ that expresses sparsity probabilistically. The point is that a prior gives us more than a single point estimate: it yields a posterior whose mean is the MMSE estimator, whose mode is the MAP estimator, and whose marginals quantify support uncertainty. In Bayesian language, LASSO is the MAP under a Laplace prior, $\ell_0$-minimization is the MAP under a spike-and-slab prior, and sparse Bayesian learning supplies a tractable approximation to the otherwise combinatorial marginalisation over supports.
Definition: Spike-and-Slab Prior
Spike-and-Slab Prior
The spike-and-slab prior on $x_i$ is the mixture $p(x_i) = (1-\lambda)\,\delta_0(x_i) + \lambda\,\mathcal{N}(x_i;\,0,\sigma_x^2)$, where $\lambda \in (0,1)$ is the activity probability and $\sigma_x^2$ the nonzero-component variance. The $x_i$ are assumed independent across $i$.
The Dirac mass at zero (the "spike") encodes exact sparsity; the Gaussian $\mathcal{N}(0,\sigma_x^2)$ (the "slab") models nonzero magnitudes. Marginalising over the latent activity indicator $s_i \in \{0,1\}$ recovers the spike-and-slab distribution.
Definition: Bernoulli-Gaussian Model
Bernoulli-Gaussian Model
The Bernoulli-Gaussian (BG) model is the hierarchical formulation of the spike-and-slab: $x_i = s_i\,\theta_i$ with $s_i \sim \mathrm{Bernoulli}(\lambda)$ and $\theta_i \sim \mathcal{N}(0,\sigma_x^2)$, with observation $y = Ax + w$, $w \sim \mathcal{N}(0,\sigma^2 I)$.
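For concreteness, the following minimal sketch draws a signal from the Bernoulli-Gaussian model and forms noisy linear observations; the dimensions, activity probability, and variances are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the text).
N, M = 200, 60          # signal length, number of observations
lam = 0.05              # activity probability lambda
sigma_x = 1.0           # slab (nonzero-component) standard deviation
sigma = 0.05            # observation-noise standard deviation

# Bernoulli-Gaussian draw: latent indicators s_i, then Gaussian slab values.
s = rng.random(N) < lam                       # s_i ~ Bernoulli(lambda)
x = np.where(s, sigma_x * rng.standard_normal(N), 0.0)

# Linear observations y = A x + w with an i.i.d. Gaussian sensing matrix.
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ x + sigma * rng.standard_normal(M)

print(f"nonzeros: {s.sum()} of {N}")
```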
Theorem: MAP under Spike-and-Slab Recovers $\ell_0$-Regularized Estimation
For the Bernoulli-Gaussian model with observation noise variance $\sigma^2$, the MAP estimator solves $\hat{x}_{\mathrm{MAP}} \in \arg\min_{x}\; \frac{1}{2\sigma^2}\|y - Ax\|_2^2 + \frac{1}{2\sigma_x^2}\|x\|_2^2 + \rho\,\|x\|_0$, where $\rho = \log\frac{1-\lambda}{\lambda} + \frac{1}{2}\log(2\pi\sigma_x^2)$. In the weak-prior limit $\sigma_x^2 \to \infty$, the quadratic term vanishes and the penalty reduces to the pure $\ell_0$ penalty $\rho\,\|x\|_0$.
Sparser priors (smaller $\lambda$) increase $\rho$ and force more entries to zero. MAP under a Laplace prior gives LASSO; MAP under spike-and-slab gives the NP-hard $\ell_0$ problem. The probabilistic hierarchy is what separates computationally tractable (Laplace) from intractable (spike-and-slab) MAP.
Write the log-posterior
By Bayes' rule, $\log p(x \mid y) = \log p(y \mid x) + \log p(x) + \mathrm{const}$. The Gaussian likelihood gives $\log p(y \mid x) = -\frac{1}{2\sigma^2}\|y - Ax\|_2^2 + \mathrm{const}$.
Evaluate the prior on the nonzero set
For each $i$, $-\log p(x_i) = \frac{x_i^2}{2\sigma_x^2} + \frac{1}{2}\log(2\pi\sigma_x^2) - \log\lambda$ if $x_i \neq 0$ and $-\log p(x_i) = -\log(1-\lambda)$ otherwise. Summing over $i$ and absorbing the constant $-N\log(1-\lambda)$, $-\log p(x) = \frac{1}{2\sigma_x^2}\|x\|_2^2 + \rho\,\|x\|_0 + \mathrm{const}$, with $\rho = \log\frac{1-\lambda}{\lambda} + \frac{1}{2}\log(2\pi\sigma_x^2)$.
Identify the regularizer
Maximizing the log-posterior is equivalent to minimizing $\frac{1}{2\sigma^2}\|y - Ax\|_2^2 + \frac{1}{2\sigma_x^2}\|x\|_2^2 + \rho\,\|x\|_0$, with $\rho$ as stated. In the weak-prior limit $\sigma_x^2 \to \infty$, the $\frac{1}{2\sigma_x^2}\|x\|_2^2$ term vanishes and we obtain the $\ell_0$-regularized least-squares program.
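To make the theorem concrete, the sketch below solves the MAP problem exactly by enumerating supports and solving the ridge subproblem on each, using the penalty weight $\rho$ derived above. The toy dimensions and hyperparameter values are assumptions chosen only so that enumeration is feasible.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

# Tiny illustrative instance (assumed values) so every support can be enumerated.
N, M, lam, sigma_x, sigma = 6, 4, 0.2, 1.0, 0.1
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[0, 3]] = [1.0, -1.5]
y = A @ x_true + sigma * rng.standard_normal(M)

# Per-entry l0 weight rho from the theorem.
rho = np.log((1 - lam) / lam) + 0.5 * np.log(2 * np.pi * sigma_x**2)

best, best_x = np.inf, np.zeros(N)
for size in range(N + 1):
    for S in combinations(range(N), size):
        x = np.zeros(N)
        if S:
            AS = A[:, list(S)]
            # Ridge solve on the support: the quadratic part of the MAP objective.
            xS = np.linalg.solve(AS.T @ AS + (sigma / sigma_x) ** 2 * np.eye(size), AS.T @ y)
            x[list(S)] = xS
        obj = (np.sum((y - A @ x) ** 2) / (2 * sigma**2)
               + np.sum(x**2) / (2 * sigma_x**2) + rho * size)
        if obj < best:
            best, best_x = obj, x

print("MAP support:", np.flatnonzero(best_x))
```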
Posterior Mean is the MMSE Estimator
Under the Bernoulli-Gaussian model, the posterior mean $\hat{x}_{\mathrm{MMSE}} = \mathbb{E}[x \mid y]$ is the MMSE estimator: it minimizes the Bayesian MSE $\mathbb{E}\,\|\hat{x} - x\|_2^2$ over all estimators $\hat{x}(y)$. Unlike MAP, the posterior mean averages over all support configurations weighted by posterior probability, producing a soft estimate that is generally not exactly sparse. For recovery tasks where the MSE is the operational metric (e.g.\ channel estimation), the posterior mean outperforms MAP; for support identification, MAP or thresholded posterior mean is preferred.
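The distinction is easiest to see in a scalar toy problem. The sketch below (assumed parameter values, direct observation $y = x + w$) computes the exact posterior inclusion probability and posterior mean under a spike-and-slab prior; the mean is a soft, nonzero value even when the spike carries much of the posterior mass.

```python
import numpy as np
from scipy.stats import norm

# Illustrative scalar example: y = x + w, spike-and-slab prior on x (assumed values).
lam, sigma_x, sigma = 0.1, 1.0, 0.3
y = 0.8                                        # a single scalar observation

# Evidence of each mixture component.
ev_slab = lam * norm.pdf(y, 0.0, np.sqrt(sigma_x**2 + sigma**2))
ev_spike = (1 - lam) * norm.pdf(y, 0.0, sigma)
p_active = ev_slab / (ev_slab + ev_spike)      # posterior inclusion probability

# Conditional on being active, x | y is Gaussian with a Wiener-style mean.
mean_active = (sigma_x**2 / (sigma_x**2 + sigma**2)) * y
posterior_mean = p_active * mean_active        # MMSE estimate: soft, not exactly 0

print(f"P(x != 0 | y) = {p_active:.3f},  E[x | y] = {posterior_mean:.3f}")
```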
Definition: Sparse Bayesian Learning (SBL)
Sparse Bayesian Learning (SBL)
SBL replaces the Bernoulli mixture with a continuous automatic-relevance-determination (ARD) prior: each $x_i$ has its own prior variance $\gamma_i \ge 0$, i.e.\ $x_i \sim \mathcal{N}(0,\gamma_i)$, and SBL estimates $\gamma = (\gamma_1,\dots,\gamma_N)$ by type-II maximum likelihood (marginal-likelihood maximization), then computes the posterior mean of $x$ conditioned on $\hat\gamma$. Components with $\hat\gamma_i = 0$ are pruned from the model.
SBL via EM
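A minimal sketch of the standard EM updates for SBL (in the style of Tipping 2001): the E-step computes the Gaussian posterior of $x$ given the current $\gamma$, the M-step sets each $\gamma_i$ to the posterior second moment of $x_i$. The fixed noise variance `sigma2` and pruning threshold `prune_tol` are simplifying assumptions, not values specified in the text.

```python
import numpy as np

def sbl_em(A, y, sigma2, n_iter=200, prune_tol=1e-6):
    """Minimal SBL via EM; returns the posterior mean and the learned ARD variances."""
    M, N = A.shape
    gamma = np.ones(N)                    # ARD variances, one per component
    keep = np.arange(N)                   # components still active in the model
    mu = np.zeros(N)
    for _ in range(n_iter):
        Ak = A[:, keep]
        # E-step: Gaussian posterior of the active coefficients given y and gamma.
        Sigma = np.linalg.inv(Ak.T @ Ak / sigma2 + np.diag(1.0 / gamma[keep]))
        mu_k = Sigma @ (Ak.T @ y) / sigma2
        # M-step (EM update): gamma_i <- E[x_i^2 | y] = mu_i^2 + Sigma_ii.
        gamma[keep] = mu_k ** 2 + np.diag(Sigma)
        # Prune components whose variance collapses; they are treated as exactly zero.
        alive = gamma[keep] > prune_tol
        keep, mu_k = keep[alive], mu_k[alive]
        mu = np.zeros(N)
        mu[keep] = mu_k
    return mu, gamma
```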
Complexity: per-iteration cost is $O(N^3)$ for the posterior covariance; it can be reduced to $O(M^2 N)$ via the Woodbury identity when $M \ll N$.
Theorem: SBL Fixed Points are Sparse
Every fixed point of the SBL EM iteration has at most $M$ nonzero hyperparameters $\gamma_i$, where $M$ is the number of observations. Equivalently, the recovered $\hat{x}$ is supported on at most $M$ indices.
The reader should verify this against the LASSO solution: LASSO too has at most $M$ nonzero entries (under mild genericity). SBL inherits the same "at-most-$M$ active components" structure because both reduce to a rank-$M$ subspace selection.
Characterize the type-II likelihood
The marginal log-likelihood of $\gamma$ under the ARD prior is, up to constants, $\mathcal{L}(\gamma) = -\tfrac{1}{2}\log\det\Sigma_y - \tfrac{1}{2}\,y^{\top}\Sigma_y^{-1}y$, with $\Sigma_y = \sigma^2 I + A\,\mathrm{diag}(\gamma)\,A^{\top}$.
Exploit rank arguments
Any stationary point $\hat\gamma$ satisfies $\partial\mathcal{L}/\partial\gamma_i = 0$ for every $i$ with $\hat\gamma_i > 0$. Wipf & Rao show that $\mathcal{L}$ is convex on faces of the positive orthant and that its unconstrained maxima lie on faces where at most $M$ of the coordinates $\gamma_i$ are positive.
Support reduction
Indices with $\hat\gamma_i = 0$ force $\hat{x}_i = 0$ exactly. Thus $\|\hat{x}\|_0 \le M$.
Example: SBL vs LASSO on a Small Problem
Consider a small compressed-sensing instance with $M$ observations of an $N$-dimensional signal ($M < N$), true sparsity $k$, an i.i.d.\ Gaussian sensing matrix, and moderate SNR. Compare support recovery and MSE of SBL versus LASSO (cross-validated regularization weight) over 50 trials.
Experimental setup
Generate $A \in \mathbb{R}^{M \times N}$ with i.i.d.\ Gaussian entries. Draw a random $k$-sparse $x^\star$ with randomly drawn nonzero entries, and form $y = Ax^\star + w$, $w \sim \mathcal{N}(0,\sigma^2 I)$.
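One way to run such a comparison is sketched below, using scikit-learn's ARDRegression (an ARD / type-II maximum-likelihood estimator in the SBL family) against cross-validated LASSO. The dimensions, sparsity, noise level, and support-detection threshold are illustrative assumptions, not the chapter's reference values; the same loop can be reused for the sweep over $k$, $M/N$, and SNR in the exercise further below.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, LassoCV

rng = np.random.default_rng(1)

# Illustrative placeholders for (M, N, k) and the noise level.
M, N, k = 60, 200, 8
sigma = 0.05

def one_trial(rng):
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    x = np.zeros(N)
    support = rng.choice(N, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    y = A @ x + sigma * rng.standard_normal(M)

    # SBL-flavored ARD regression vs cross-validated LASSO.
    sbl = ARDRegression(fit_intercept=False).fit(A, y)
    lasso = LassoCV(fit_intercept=False, cv=5).fit(A, y)

    def metrics(xhat):
        mse = np.mean((xhat - x) ** 2)
        exact = set(np.flatnonzero(np.abs(xhat) > 1e-3)) == set(support)
        return mse, exact

    return metrics(sbl.coef_), metrics(lasso.coef_)

results = [one_trial(rng) for _ in range(50)]
print(f"mean MSE  SBL: {np.mean([r[0][0] for r in results]):.4g}"
      f"   LASSO: {np.mean([r[1][0] for r in results]):.4g}")
print(f"exact support  SBL: {np.mean([r[0][1] for r in results]):.2f}"
      f"   LASSO: {np.mean([r[1][1] for r in results]):.2f}")
```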
Observed outcome
In typical runs, SBL identifies the true support with higher probability than LASSO and achieves a noticeably lower reconstruction MSE. The gap widens as the sparsity level grows.
Interpretation
LASSO's bias toward shrinking large coefficients (a consequence of the Laplace prior) hurts magnitude reconstruction. SBL's ARD prior is approximately scale-invariant once a component is deemed relevant, avoiding the shrinkage penalty on active entries.
SBL vs LASSO: Support Recovery and MSE
Vary the sparsity level $k$, the undersampling ratio $M/N$, and the SNR. Observe the gap between SBL and LASSO in probability of exact support recovery and in MSE.
Parameters
Posterior Marginals for a Toy Bernoulli-Gaussian Model
Exhaustively compute the posterior over supports for a Bernoulli-Gaussian instance small enough that all $2^N$ supports can be enumerated, and display the posterior inclusion probability $\Pr\{x_i \neq 0 \mid y\}$ for each index $i$.
Parameters
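A minimal sketch of such an exhaustive computation; all dimensions, hyperparameters, and the ground-truth support below are illustrative assumptions. Each candidate support $S$ is weighted by its prior $\lambda^{|S|}(1-\lambda)^{N-|S|}$ times the Gaussian evidence $\mathcal{N}(y;\,0,\,\sigma^2 I + \sigma_x^2 A_S A_S^\top)$.

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Small assumed sizes so that all 2^N supports can be enumerated.
N, M = 8, 5
lam, sigma_x, sigma = 0.2, 1.0, 0.1

A = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[[1, 4]] = [1.2, -0.8]                     # an assumed 2-sparse ground truth
y = A @ x + sigma * rng.standard_normal(M)

log_post, supports = [], []
for size in range(N + 1):
    for S in combinations(range(N), size):
        cov = sigma**2 * np.eye(M)
        if S:
            AS = A[:, list(S)]
            cov = cov + sigma_x**2 * (AS @ AS.T)
        log_prior = size * np.log(lam) + (N - size) * np.log(1 - lam)
        log_post.append(log_prior + multivariate_normal.logpdf(y, mean=np.zeros(M), cov=cov))
        supports.append(S)

# Normalize in a numerically stable way, then marginalize per index.
log_post = np.asarray(log_post)
w = np.exp(log_post - log_post.max())
w /= w.sum()
incl = np.array([sum(wi for wi, S in zip(w, supports) if i in S) for i in range(N)])
print(np.round(incl, 3))    # posterior inclusion probability per index
```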
When to Use SBL in Practice
SBL is most attractive when: (i) the sensing matrix is highly coherent (LASSO's guarantees degrade, SBL's do not), (ii) the signal has decaying-magnitude structure where shrinkage is harmful, (iii) per-component uncertainty quantification is needed. It is less attractive when $N$ is very large (cubic per-iteration cost) or when a single convex solve suffices. For massive-MIMO channel estimation with coherent pilot sequences, SBL consistently beats LASSO (see Ch 15).
Common Mistake: Confusing MAP with Posterior Mean
Mistake:
Reporting the MAP estimate as "the Bayesian answer" when the task requires minimum MSE.
Correction:
MAP minimizes the 0-1 loss (in the limit of a vanishing tolerance region); the posterior mean minimizes the squared-error loss. They coincide only for symmetric, unimodal posteriors (e.g.\ Gaussian). Under a spike-and-slab posterior, the two can differ substantially: MAP prefers a single support, while the mean averages over supports.
Spike-and-slab prior
A two-component mixture with a point mass at zero (the spike) and a continuous component (the slab). The canonical prior for exact-sparsity Bayesian models.
Related: Spike-and-Slab Prior
Sparse Bayesian Learning (SBL)
A hierarchical Bayesian approach using per-component variance hyperparameters (ARD prior) estimated by type-II maximum likelihood. Tipping (2001); Wipf-Rao (2004).
Related: Sparse Bayesian Learning (SBL)
Historical Note: From RVM to SBL to Deep Sparse Models
1996-2020: Tipping's relevance-vector machine (2001) introduced the ARD prior as a sparsity-inducing mechanism in Bayesian regression. Wipf and Rao (2004) showed that SBL outperforms LASSO in support recovery under adverse dictionary conditions, an observation that motivated two decades of follow-up work (T-MSBL for temporally correlated sources, PC-SBL for pattern-coupled sparsity, and their unrolling into deep-learning architectures). The method also reappears in massive-random-access detectors (Ch 15).
Key Takeaway
The Bernoulli-Gaussian / spike-and-slab framework unifies sparse estimation under a probabilistic lens: MAP under a Laplace prior gives LASSO; MAP under spike-and-slab gives $\ell_0$-regularized least squares; the posterior mean is the MMSE estimator. Sparse Bayesian Learning replaces the intractable support marginalization with an ARD hyperparameter optimization solved via EM. SBL delivers sparser, less-biased, and typically more accurate estimates than LASSO under coherent dictionaries, at the price of cubic per-iteration cost.