Sparsity-Promoting Priors
Beyond Gaussianity — Why Sparse Scenes Need Sparse Priors
Gaussian priors produce smooth reconstructions and correspond to Tikhonov (quadratic) penalties. Most RF imaging scenes, however, are sparse: a small number of point targets against a largely empty background. Radar scenes of vehicles or aircraft, SAR images of buildings, and indoor localization maps are all dominated by zeros. Gaussian priors apply equal shrinkage to every component, making large (true target) coefficients smaller than they should be and failing to produce exact zeros.
Sparsity-promoting priors assign higher probability to sparse configurations. The MAP estimate then corresponds to $\ell_1$-type regularization (connecting to §Variational Regularization and Sparsity), but the full Bayesian posterior additionally provides uncertainty quantification — knowing not just where the target is but how confident we are.
Definition: Laplace Prior and the LASSO
The Laplace prior (double exponential) on the scene vector $\mathbf{x} \in \mathbb{R}^N$ is

$$p(\mathbf{x}) = \prod_{i=1}^{N} \frac{\lambda}{2}\, e^{-\lambda \lvert x_i \rvert} = \left(\frac{\lambda}{2}\right)^{N} e^{-\lambda \lVert \mathbf{x} \rVert_1}.$$

The negative log-prior is $\lambda \lVert \mathbf{x} \rVert_1 + \text{const}$. Under Gaussian noise the MAP estimate therefore solves

$$\hat{\mathbf{x}}_{\text{MAP}} = \arg\min_{\mathbf{x}} \; \frac{1}{2\sigma^2} \lVert \mathbf{y} - \mathbf{A}\mathbf{x} \rVert_2^2 + \lambda \lVert \mathbf{x} \rVert_1,$$
which is the LASSO (Least Absolute Shrinkage and Selection Operator). The Laplace prior concentrates mass at zero and has heavier tails than the Gaussian, encouraging sparse solutions with exact zeros in the MAP estimate.
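As a concrete illustration, here is a minimal sketch of solving this MAP problem by iterative soft thresholding (ISTA); the solver, the step-size choice, and the test data are illustrative assumptions rather than a reference implementation:

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding operator: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=500):
    """LASSO / Laplace-prior MAP via iterative soft thresholding (ISTA).

    Minimizes 0.5 * ||y - A x||_2^2 + lam * ||x||_1.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)           # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Tiny sparse-recovery demo (all values illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))
A /= np.linalg.norm(A, axis=0)             # normalized columns
x_true = np.zeros(64)
x_true[[5, 40]] = [2.0, -1.5]
y = A @ x_true + 0.05 * rng.standard_normal(32)
x_map = ista_lasso(A, y, lam=0.1)
print("estimated support:", np.nonzero(np.abs(x_map) > 1e-8)[0])
```

The exact zeros in `x_map` come directly from the soft-thresholding step, which is the scalar MAP operator applied at each iteration.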
Gaussian vs Laplace Prior — Key Differences
| Property | Gaussian | Laplace |
|---|---|---|
| Density at $x = 0$ | Finite, smooth maximum | Finite, non-differentiable peak (cusp) |
| Tail decay | $\propto e^{-x^2 / 2\tau^2}$ (super-exponential) | $\propto e^{-\lambda \lvert x \rvert}$ (exponential) |
| MAP penalty | $\lVert \mathbf{x} \rVert_2^2$ (ridge) | $\lVert \mathbf{x} \rVert_1$ (LASSO) |
| MAP solution | Linear shrinkage (no exact zeros) | Soft thresholding (exact zeros) |
| MMSE solution | Linear (same as MAP) | Nonlinear, smooth (no exact zeros) |
| Tail behavior | Light-tailed (aggressively shrinks large signals) | Heavier-tailed (allows large signals to survive) |
Definition: Bernoulli-Gaussian Prior
The Bernoulli-Gaussian (BG) prior models a sparse scene as a mixture of a point mass at zero and a Gaussian component:

$$p(x_i) = (1 - \rho)\,\delta(x_i) + \rho\,\mathcal{N}(x_i;\, 0, \sigma_x^2),$$

where $\rho \in (0, 1)$ is the sparsity rate (prior probability of a non-zero component) and $\sigma_x^2$ is the variance of the active component.
The BG prior is the standard model for sparse RF scenes: each pixel is either empty (with probability $1 - \rho$) or contains a scatterer of random reflectivity (with probability $\rho$). The posterior probability $P(x_i \neq 0 \mid \mathbf{y})$ that pixel $i$ is active provides a natural detection score at each pixel.
Conditional on the support $S = \{\, i : x_i \neq 0 \,\}$, the posterior is Gaussian restricted to $S$, with exact formulas from the theorem "Gaussian Prior Yields Gaussian Posterior".
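In the direct-observation case $y_i = x_i + n_i$ the activity posterior has a closed form via Bayes' rule on the two mixture hypotheses. A minimal sketch, with illustrative function names and parameter values:

```python
import numpy as np

def bg_activity_posterior(y, rho, sigma_x2, sigma2):
    """P(x != 0 | y) for the scalar model y = x + n with a
    Bernoulli-Gaussian prior: x = 0 w.p. 1 - rho, else N(0, sigma_x2);
    noise n ~ N(0, sigma2).  This is the per-pixel detection score."""
    def gauss(v, var):
        return np.exp(-v**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Marginal likelihood of y under each hypothesis.
    like_active = gauss(y, sigma_x2 + sigma2)   # scatterer present
    like_empty = gauss(y, sigma2)               # empty pixel
    num = rho * like_active
    return num / (num + (1 - rho) * like_empty)

# A strong observation is almost certainly a target; a weak one is not.
for y in [0.1, 1.0, 3.0]:
    print(y, bg_activity_posterior(y, rho=0.05, sigma_x2=4.0, sigma2=0.25))
```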
Definition: Spike-and-Slab Prior
The spike-and-slab prior is a two-component mixture where the "spike" is a point mass at zero and the "slab" is a diffuse distribution:

$$p(x_i \mid s_i) = (1 - s_i)\,\delta(x_i) + s_i\, p_{\text{slab}}(x_i),$$

where $s_i \in \{0, 1\}$ is a latent binary indicator with $P(s_i = 1) = \rho$.
The spike-and-slab is the "gold standard" sparsity prior: it exactly encodes the $\ell_0$ sparsity model. However, exact posterior inference requires summing over all $2^N$ support configurations — computationally intractable for all but small $N$. Approximate inference (variational Bayes, MCMC on $\mathbf{s}$) is required in practice.
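To make both the exactness and the combinatorial cost concrete, the sketch below assumes a Gaussian slab (so each support's marginal likelihood is available in closed form) and enumerates all $2^N$ supports for a small $N$; all names and values are illustrative:

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

def spike_slab_support_posterior(A, y, rho, sigma_x2, sigma2):
    """Exact posterior over supports for a spike-and-slab prior with a
    Gaussian slab, by brute-force enumeration of all 2^N supports.
    Only feasible for small N -- the intractability in action."""
    M, N = A.shape
    logps, supports = [], []
    for k in range(N + 1):
        for S in combinations(range(N), k):
            S = list(S)
            # Marginal: y | S ~ N(0, sigma_x2 * A_S A_S^T + sigma2 * I)
            cov = sigma2 * np.eye(M)
            if S:
                cov += sigma_x2 * A[:, S] @ A[:, S].T
            logp = (multivariate_normal.logpdf(y, mean=np.zeros(M), cov=cov)
                    + k * np.log(rho) + (N - k) * np.log(1 - rho))
            logps.append(logp)
            supports.append(tuple(S))
    logps = np.array(logps)
    p = np.exp(logps - logps.max())
    return supports, p / p.sum()

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 8))
A /= np.linalg.norm(A, axis=0)
x = np.zeros(8)
x[[2, 6]] = [1.5, -2.0]
y = A @ x + 0.1 * rng.standard_normal(6)
supports, p = spike_slab_support_posterior(A, y, 0.2, 4.0, 0.01)
print("most probable support:", supports[np.argmax(p)])   # likely (2, 6)
```

Here $2^8 = 256$ supports are manageable; at $N = 10^4$ pixels the same loop is hopeless, which is exactly why approximate inference or continuous relaxations like the horseshoe are used.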
Definition: Horseshoe Prior
The horseshoe prior (Carvalho, Polson, and Scott, 2010) is a continuous shrinkage prior that approximates spike-and-slab behavior while remaining computationally tractable:

$$x_i \mid \lambda_i, \tau \sim \mathcal{N}(0,\, \lambda_i^2 \tau^2), \qquad \lambda_i \sim \mathcal{C}^{+}(0, 1),$$

where $\tau$ is a global shrinkage parameter and the $\lambda_i$ are local scales. The half-Cauchy hyperprior on $\lambda_i$ produces:
- Heavy tails: large signals are barely shrunk (robustness to strong scatterers).
- Infinite spike at zero: small signals are aggressively shrunk.
The marginal prior $p(x_i)$ has a pole at zero and Cauchy-like tails; the name "horseshoe" refers to the horseshoe-shaped $\mathrm{Beta}(1/2, 1/2)$ density induced on the shrinkage coefficient $\kappa_i = 1/(1 + \lambda_i^2)$.
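A short sampling sketch (illustrative values) shows both behaviors, and the horseshoe-shaped law of the induced shrinkage weight $\kappa_i = 1/(1 + \lambda_i^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
tau = 1.0                                  # global shrinkage (illustrative)

# Local scales lam_i ~ C+(0, 1): absolute value of a standard Cauchy draw.
lam = np.abs(rng.standard_cauchy(100_000))
x = rng.normal(0.0, lam * tau)             # x_i | lam_i, tau ~ N(0, lam_i^2 tau^2)
print("P(|x| > 5) =", np.mean(np.abs(x) > 5))   # heavy tails (~0 for a Gaussian)

# Shrinkage weight kappa = 1 / (1 + lam^2) follows a Beta(1/2, 1/2) law:
# mass piles up near 0 (signals survive) and near 1 (noise is killed),
# the horseshoe-shaped density that gives the prior its name.
kappa = 1.0 / (1.0 + lam**2)
hist, _ = np.histogram(kappa, bins=10, range=(0, 1), density=True)
print(np.round(hist, 2))                   # U-shaped: high at both ends
```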
Comparing Shrinkage Profiles for a Scalar Problem
For the scalar model $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2)$, each prior induces a different shrinkage profile — the posterior point estimate $\hat{x}(y)$ as a function of the observation $y$:
| Prior | Shrinkage profile | Key property |
|---|---|---|
| Gaussian | Linear: $\hat{x}(y) = \frac{\tau^2}{\tau^2 + \sigma^2}\, y$ | Uniform shrinkage across all magnitudes |
| Laplace | Soft thresholding (MAP); nonlinear shrinkage (MMSE) | Dead zone near zero in MAP |
| Spike-and-slab | Hard thresholding (MAP) | Exact zeros; combinatorial inference |
| Horseshoe | Nearly no shrinkage for large $\lvert y \rvert$, aggressive for small $\lvert y \rvert$ | Adaptive, near-minimax |
The horseshoe achieves near-optimal performance across sparse and dense signal regimes simultaneously — a property not shared by Laplace or Gaussian priors.
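This adaptivity can be checked numerically for the scalar model: conditional on $\lambda_i$ the posterior is Gaussian, so the posterior mean is a weighted Monte Carlo average over half-Cauchy draws. A sketch, assuming $\tau = 1$ and illustrative values:

```python
import numpy as np

# Horseshoe posterior mean E[x | y] for the scalar model y = x + n,
# via Monte Carlo over the half-Cauchy local scale (tau = 1 assumed).
rng = np.random.default_rng(3)
sigma = 1.0
lam = np.abs(rng.standard_cauchy(200_000))         # lam ~ C+(0, 1)

def horseshoe_mean(y):
    # Conditional on lam, the posterior mean is lam^2/(lam^2 + sigma^2) * y,
    # and the evidence p(y | lam) is N(y; 0, lam^2 + sigma^2) -- so average
    # the conditional means with evidence weights (importance sampling
    # from the prior).
    var = lam**2 + sigma**2
    w = np.exp(-y**2 / (2 * var)) / np.sqrt(var)
    return np.sum(w * lam**2 / var * y) / np.sum(w)

for y in [0.5, 1.0, 2.0, 4.0, 8.0]:
    xhat = horseshoe_mean(y)
    print(f"y = {y}: x_hat = {xhat:.3f}, shrinkage = {1 - xhat / y:.1%}")
```

The printed shrinkage fraction drops toward zero as $y$ grows, while small observations are pulled almost entirely to zero.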
Theorem: Laplace MAP Equals Soft Thresholding
For the scalar model $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $p(x) = \frac{\lambda}{2} e^{-\lambda \lvert x \rvert}$, the MAP estimate is the soft-thresholding operator:

$$\hat{x}_{\text{MAP}}(y) = \operatorname{soft}_{\lambda\sigma^2}(y) = \operatorname{sign}(y)\,\max\bigl(\lvert y \rvert - \lambda\sigma^2,\, 0\bigr).$$

In contrast, the MMSE estimate (posterior mean) satisfies $\lvert \hat{x}_{\text{MMSE}}(y) \rvert < \lvert y \rvert$ for all $y \neq 0$ but is never exactly zero.
Differentiating the log-posterior
The log-posterior is $\log p(x \mid y) = -\dfrac{(y - x)^2}{2\sigma^2} - \lambda \lvert x \rvert + \text{const}$.
Setting the subgradient to zero: $0 \in \dfrac{x - y}{\sigma^2} + \lambda\,\partial \lvert x \rvert$.
- If $x > 0$: $x = y - \lambda\sigma^2$, which requires $y > \lambda\sigma^2$.
- If $x < 0$: $x = y + \lambda\sigma^2$, which requires $y < -\lambda\sigma^2$.
- If $\lvert y \rvert \le \lambda\sigma^2$: $x = 0$ satisfies the subgradient condition, since $\partial \lvert x \rvert\,\big|_{x=0} = [-1, 1]$.
Therefore $\hat{x}_{\text{MAP}}(y) = \operatorname{sign}(y)\,\max(\lvert y \rvert - \lambda\sigma^2,\, 0)$.
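A quick numerical sanity check of this result: minimize the scalar objective by brute force on a grid and compare with the closed-form soft threshold (values illustrative):

```python
import numpy as np

sigma2, lam = 1.0, 0.8                     # illustrative values
xs = np.linspace(-10, 10, 200_001)         # dense grid, step 1e-4

def map_numeric(y):
    """Brute-force minimizer of (y - x)^2 / (2 sigma2) + lam * |x|."""
    obj = (y - xs)**2 / (2 * sigma2) + lam * np.abs(xs)
    return xs[np.argmin(obj)]

def map_closed(y):
    """Soft thresholding at level lam * sigma2."""
    return np.sign(y) * max(abs(y) - lam * sigma2, 0.0)

for y in [-2.0, -0.5, 0.3, 1.0, 3.0]:
    print(f"y = {y}: numeric = {map_numeric(y):.4f}, closed = {map_closed(y):.4f}")
```

The two columns agree to the grid resolution, including the exact zeros inside the dead zone $\lvert y \rvert \le \lambda\sigma^2$.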
Example: Prior Comparison on a Sparse RF Scene
Consider $N = 8$ pixels with $K = 2$ active scatterers at positions 2 and 6. The sensing matrix $\mathbf{A} \in \mathbb{R}^{M \times N}$ has random Gaussian entries (normalized columns), and the measurements are corrupted by Gaussian noise of variance $\sigma^2$. Compare the MAP estimates under Gaussian, Laplace, and horseshoe priors.
Gaussian prior MAP (Tikhonov)
With a Gaussian prior $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \tau^2 \mathbf{I})$, the MAP estimate is the Tikhonov solution $\hat{\mathbf{x}} = (\mathbf{A}^\top \mathbf{A} + \alpha \mathbf{I})^{-1} \mathbf{A}^\top \mathbf{y}$ with $\alpha = \sigma^2 / \tau^2$. All components are nonzero with small magnitudes; the two true targets appear as local maxima but are surrounded by nonzero noise artifacts.
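The closed form is one line of linear algebra; a minimal sketch with an illustrative helper name:

```python
import numpy as np

def tikhonov_map(A, y, alpha):
    """Gaussian-prior MAP: solve (A^T A + alpha I) x = A^T y,
    with alpha = sigma^2 / tau^2.  Dense output, never exactly zero."""
    N = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(N), A.T @ y)
```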
Laplace prior MAP (LASSO)
With a suitably chosen regularization weight $\lambda$, only components 2 and 6 exceed the soft threshold; all others are exactly zero. The reconstruction is sparser but the active estimates are slightly biased toward zero due to shrinkage.
Horseshoe prior (MMSE via MCMC)
The posterior mean under the horseshoe prior recovers the two active reflectivities nearly unshrunk, with near-zero posterior variance for these active components. The six inactive components have posterior means near zero with correctly narrow credible intervals. The horseshoe achieves the best bias-variance tradeoff by adaptively shrinking based on signal strength.
Sparse Bayesian Inference — Prior Comparison
Compare MAP and MMSE estimates under different sparsity priors for a 1D sparse signal recovery problem with a random sensing matrix.
Left panel: Prior density for a single component (log scale), showing concentration at zero and tail behavior.
Center panel: True sparse signal (black stems), MAP reconstruction (blue), and MMSE reconstruction (orange) with credible bands.
Right panel: Shrinkage profile $\hat{x}(y)$ vs $y$, illustrating the nonlinear, adaptive nature of sparsity-promoting priors.
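A minimal matplotlib sketch along these lines covers the left and right panels (the center panel requires posterior sampling and is omitted here); all parameter values are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

sigma, lam, tau = 1.0, 1.0, 2.0        # noise std, Laplace rate, Gaussian std
x = np.linspace(-6, 6, 600)            # even count avoids x = 0 (horseshoe pole)

# --- Left panel: prior densities on a log scale ----------------------
gauss = np.exp(-x**2 / (2 * tau**2)) / np.sqrt(2 * np.pi * tau**2)
laplace = 0.5 * lam * np.exp(-lam * np.abs(x))
# Horseshoe marginal has no closed form: Monte Carlo over half-Cauchy scales.
lam_hs = np.abs(np.random.default_rng(0).standard_cauchy(10_000))
horseshoe = np.mean(
    np.exp(-x[:, None]**2 / (2 * lam_hs**2)) / np.sqrt(2 * np.pi * lam_hs**2),
    axis=1)

# --- Right panel: shrinkage profiles x_hat(y) ------------------------
ys = np.linspace(-6, 6, 121)
grid = np.linspace(-30, 30, 6001)      # integration grid for the MMSE mean

def laplace_mmse(y):
    logw = -(y - grid)**2 / (2 * sigma**2) - lam * np.abs(grid)
    w = np.exp(logw - logw.max())
    return np.sum(grid * w) / np.sum(w)

ridge = tau**2 / (tau**2 + sigma**2) * ys                        # Gaussian
soft = np.sign(ys) * np.maximum(np.abs(ys) - lam * sigma**2, 0)  # Laplace MAP
mmse = np.array([laplace_mmse(y) for y in ys])                   # Laplace MMSE

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.semilogy(x, gauss, label="Gaussian")
ax1.semilogy(x, laplace, label="Laplace")
ax1.semilogy(x, horseshoe, label="Horseshoe (MC)")
ax1.set(title="Prior density (log scale)", xlabel="x")
ax1.legend()
ax2.plot(ys, ys, "k:", lw=0.8, label="no shrinkage")
ax2.plot(ys, ridge, label="Gaussian (linear)")
ax2.plot(ys, soft, label="Laplace MAP (soft threshold)")
ax2.plot(ys, mmse, label="Laplace MMSE (smooth)")
ax2.set(title="Shrinkage profile", xlabel="y", ylabel="x_hat(y)")
ax2.legend()
plt.tight_layout()
plt.show()
```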
Common Mistake: MAP Shrinkage Bias Under Sparse Priors
Mistake:
Practitioners often report the LASSO (Laplace MAP) estimate as their final reconstruction and treat it as an unbiased estimate of the active coefficients.
Correction:
The LASSO MAP estimate suffers from shrinkage bias: even active (true nonzero) components are shrunk toward zero by the regularization penalty. The MMSE estimate under the same Laplace prior does not produce exact zeros but has smaller bias on large coefficients. For quantitative reconstruction (not just support recovery), a two-stage approach is standard: first use LASSO to identify the support, then re-estimate on the support via least squares (the "debiased LASSO" or "post-LASSO").
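A sketch of the two-stage refit, assuming the LASSO estimate comes from any standard solver (for instance the ISTA sketch above); the helper name `post_lasso` is illustrative:

```python
import numpy as np

def post_lasso(A, y, x_lasso, tol=1e-8):
    """Two-stage 'post-LASSO' refit: keep the LASSO support, then
    re-estimate the nonzero coefficients by unpenalized least squares.
    Removes the shrinkage bias on the active coefficients, assuming
    the support was identified correctly."""
    S = np.nonzero(np.abs(x_lasso) > tol)[0]
    x_debiased = np.zeros_like(x_lasso)
    if S.size:
        x_debiased[S], *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
    return x_debiased

# Usage with an illustrative (already shrunk) LASSO output.
rng = np.random.default_rng(4)
A = rng.standard_normal((32, 64))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(64)
x_true[[5, 40]] = [2.0, -1.5]
y = A @ x_true + 0.05 * rng.standard_normal(32)
x_lasso = np.zeros(64)
x_lasso[[5, 40]] = [1.8, -1.3]             # biased toward zero (illustrative)
print(post_lasso(A, y, x_lasso)[[5, 40]])  # refit values close to [2.0, -1.5]
```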
Shrinkage (Bayesian)
The tendency of a Bayesian estimator to pull estimates toward the prior mean (typically zero). The shrinkage profile $\hat{x}(y)$ vs $y$ characterizes how much each observation is pulled toward zero. Linear shrinkage (Gaussian prior) applies uniform pull; nonlinear shrinkage (Laplace, horseshoe) applies adaptive pull that depends on signal strength.
Related: Laplace Prior and the LASSO, Horseshoe Prior
Key Takeaway
- The Laplace prior yields MAP estimates equivalent to the LASSO ($\ell_1$ regularization), promoting exact zeros via soft thresholding.
- The Bernoulli-Gaussian prior is the standard model for sparse RF scenes: it directly provides a posterior probability of target presence at each pixel.
- The spike-and-slab prior is theoretically ideal ($\ell_0$ sparsity) but computationally intractable for large $N$; the horseshoe prior provides a tractable continuous approximation.
- The horseshoe prior applies adaptive shrinkage — nearly unshrunk large signals, aggressively shrunk small signals — and attains near-minimax estimation rates.
- Full Bayesian inference with sparsity priors provides uncertainty quantification beyond point estimates — essential for deciding whether a detected target is real or an artifact.
Quick Check
For the scalar model $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2)$ and Laplace prior with rate $\lambda$, and an observation $y$ with $\lvert y \rvert > \lambda\sigma^2$, the MAP estimate is:
Correct. The soft-threshold level is $\lambda\sigma^2$, so $\hat{x}_{\text{MAP}} = \operatorname{sign}(y)\,(\lvert y \rvert - \lambda\sigma^2)$; any observation with $\lvert y \rvert \le \lambda\sigma^2$ is mapped exactly to zero.