Sparsity-Promoting Priors

Beyond Gaussianity — Why Sparse Scenes Need Sparse Priors

Gaussian priors produce smooth reconstructions and correspond to Tikhonov (quadratic) penalties. Most RF imaging scenes, however, are sparse: a small number of point targets against a largely empty background. Radar scenes of vehicles or aircraft, SAR images of buildings, and indoor localization maps are all dominated by zeros. Gaussian priors apply equal shrinkage to every component, making large (true target) coefficients smaller than they should be and failing to produce exact zeros.

Sparsity-promoting priors assign higher probability to sparse configurations. The MAP estimate then corresponds to $\ell_1$-type regularization (connecting to §Variational Regularization and Sparsity), but the full Bayesian posterior additionally provides uncertainty quantification: knowing not just where the target is but how confident we are.

Definition: Laplace Prior and the LASSO

The Laplace prior (double exponential) on the scene vector $\boldsymbol{\gamma} \in \mathbb{R}^n$ is

$$\pi(\boldsymbol{\gamma}) = \prod_{i=1}^n \frac{\lambda}{2}\exp(-\lambda|\gamma_i|), \qquad \lambda > 0.$$

The negative log-prior is $-\log\pi(\boldsymbol{\gamma}) = \lambda\|\boldsymbol{\gamma}\|_1 + \text{const}$. Under Gaussian noise the MAP estimate therefore solves

$$\hat{\boldsymbol{\gamma}}_{\text{MAP}} = \arg\min_{\boldsymbol{\gamma}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2 + \lambda\|\boldsymbol{\gamma}\|_1\right\},$$

which is the LASSO (Least Absolute Shrinkage and Selection Operator). The Laplace prior concentrates mass at zero and has heavier tails than the Gaussian, encouraging sparse solutions with exact zeros in the MAP estimate.
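
As a concrete illustration, here is a minimal numpy sketch of the LASSO MAP estimate computed by iterative soft thresholding (ISTA); the step-size rule, iteration count, and function names are illustrative choices, not a reference implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator S_t(x) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_map_ista(A, y, sigma2, lam, n_iter=500):
    """MAP estimate under a Laplace prior (LASSO) via ISTA.

    Minimizes (1/(2*sigma2)) * ||A g - y||^2 + lam * ||g||_1.
    """
    # Lipschitz constant of the data-fit gradient sets the step size 1/L.
    L = np.linalg.norm(A, 2) ** 2 / sigma2
    g = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ g - y) / sigma2          # gradient of the quadratic term
        g = soft_threshold(g - grad / L, lam / L)  # proximal (soft-threshold) step
    return g
```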

Gaussian vs Laplace Prior — Key Differences

| Property | Gaussian $\mathcal{N}(0, \tau^2)$ | Laplace $\text{Lap}(0, 1/\lambda)$ |
|---|---|---|
| Density at $\gamma_i = 0$ | Finite, $\frac{1}{\sqrt{2\pi}\,\tau}$ | Maximum, $\frac{\lambda}{2}$ |
| Tail decay | $\exp(-\gamma_i^2/2\tau^2)$ (super-exponential) | $\exp(-\lambda\lvert\gamma_i\rvert)$ (exponential) |
| MAP penalty | $\frac{1}{2\tau^2}\lVert\boldsymbol{\gamma}\rVert^2$ (ridge) | $\lambda\lVert\boldsymbol{\gamma}\rVert_1$ (LASSO) |
| MAP solution | Linear shrinkage (no exact zeros) | Soft thresholding (exact zeros) |
| MMSE solution | Linear (same as MAP) | Nonlinear, smooth (no exact zeros) |
| Large-signal behavior | Light-tailed: aggressively shrinks large signals | Heavier-tailed: lets large signals survive |
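
The "MAP solution" row can be verified directly on the scalar model $y = \gamma + w$; a small sketch, with illustrative values for $\tau^2$, $\lambda$, and $\sigma^2$:

```python
import numpy as np

tau2, lam, sigma2 = 1.0, 2.0, 0.25   # illustrative prior/noise parameters
y = np.array([-3.0, -0.4, 0.1, 0.6, 3.0])

# Gaussian prior: linear shrinkage toward zero, never exactly zero.
ridge_map = (tau2 / (tau2 + sigma2)) * y

# Laplace prior: soft thresholding, exact zeros inside the dead zone.
lasso_map = np.sign(y) * np.maximum(np.abs(y) - lam * sigma2, 0.0)

print(ridge_map)   # every entry shrunk by the same factor 0.8
print(lasso_map)   # entries with |y| below lam*sigma2 = 0.5 map exactly to zero
```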

Definition: Bernoulli-Gaussian Prior

The Bernoulli-Gaussian (BG) prior models a sparse scene as a mixture of a point mass at zero and a Gaussian component:

$$\pi(\gamma_i) = (1 - w)\,\delta_0(\gamma_i) + w\,\mathcal{N}(0, \tau^2),$$

where $w \in (0,1)$ is the sparsity rate (prior probability of a non-zero component) and $\tau^2$ is the variance of the active component.

The BG prior is the standard model for sparse RF scenes: each pixel is either empty (with probability $1-w$) or contains a scatterer of random reflectivity (with probability $w$). The posterior probability that pixel $i$ is active then provides a natural per-pixel detection score.

Conditional on the support $\mathcal{S} = \{i : \gamma_i \neq 0\}$, the posterior is Gaussian restricted to $\mathcal{S}$: $p(\boldsymbol{\gamma}_\mathcal{S} \mid \mathbf{y}, \mathcal{S}) = \mathcal{N}(\hat{\boldsymbol{\gamma}}_\mathcal{S},\,\mathbf{\Gamma}_{\mathcal{S}|\mathbf{y}})$, with exact formulas from Theorem (Gaussian Prior Yields Gaussian Posterior).
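
As a sketch of those formulas, the conditional posterior on a candidate support takes only a few lines; the Gaussian slab $\mathcal{N}(0, \tau^2)$ on active entries matches the BG model above, while the function name and interface are assumptions for illustration.

```python
import numpy as np

def posterior_on_support(A, y, support, sigma2, tau2):
    """Gaussian posterior of the active coefficients given the support.

    Model: y ~ N(A_S gamma_S, sigma2 * I), prior gamma_S ~ N(0, tau2 * I).
    """
    A_S = A[:, support]
    # Standard Gaussian-prior / Gaussian-likelihood posterior formulas.
    cov = np.linalg.inv(A_S.T @ A_S / sigma2 + np.eye(len(support)) / tau2)
    mean = cov @ (A_S.T @ y) / sigma2
    return mean, cov
```

The diagonal of the returned covariance gives per-scatterer credible intervals, conditional on the detected support.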

Definition: Spike-and-Slab Prior

The spike-and-slab prior is a two-component mixture where the "spike" is a point mass at zero and the "slab" is a diffuse distribution:

$$\pi(\gamma_i \mid \theta_i) = (1 - \theta_i)\,\delta_0(\gamma_i) + \theta_i\,\mathcal{N}(0, \tau^2),$$

where $\theta_i \in \{0, 1\}$ is a latent binary indicator with $\theta_i \sim \text{Bernoulli}(w)$.

The spike-and-slab is the "gold standard" sparsity prior: it exactly encodes the $\ell_0$ sparsity model. However, exact posterior inference requires summing over $2^n$ support configurations, which is computationally intractable for $n \gtrsim 30$. Approximate inference (variational Bayes, MCMC on $\boldsymbol{\theta}$) is required in practice.
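
For very small $n$ the $2^n$ sum can be carried out exactly, which makes the combinatorial cost concrete. A brute-force sketch, assuming the Gaussian-slab marginal likelihood $\mathbf{y} \mid \mathcal{S} \sim \mathcal{N}(\mathbf{0},\, \sigma^2\mathbf{I} + \tau^2\mathbf{A}_\mathcal{S}\mathbf{A}_\mathcal{S}^\top)$:

```python
import numpy as np
from itertools import combinations

def exact_support_posterior(A, y, sigma2, tau2, w):
    """Posterior over all 2^n supports under the spike-and-slab model.

    Feasible only for small n: the loop visits every subset of {0, ..., n-1}.
    """
    m, n = A.shape
    supports, logp = [], []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            # Marginal covariance of y given support S (slab integrated out).
            C = sigma2 * np.eye(m)
            if S:
                A_S = A[:, list(S)]
                C = C + tau2 * (A_S @ A_S.T)
            _, logdet = np.linalg.slogdet(C)
            loglik = -0.5 * (logdet + y @ np.linalg.solve(C, y))
            logprior = k * np.log(w) + (n - k) * np.log(1 - w)
            supports.append(S)
            logp.append(loglik + logprior)
    logp = np.asarray(logp)
    p = np.exp(logp - logp.max())        # normalize in a numerically safe way
    return supports, p / p.sum()
```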

Definition: Horseshoe Prior

The horseshoe prior (Carvalho, Polson, and Scott, 2010) is a continuous shrinkage prior that approximates spike-and-slab behavior while remaining computationally tractable:

$$\gamma_i \mid \lambda_i \sim \mathcal{N}(0, \lambda_i^2 \tau^2), \qquad \lambda_i \sim \text{Half-Cauchy}(0, 1),$$

where $\tau > 0$ is a global shrinkage parameter. The Half-Cauchy hyperprior on $\lambda_i$ produces:

  • Heavy tails: large signals are barely shrunk (robustness to strong scatterers).
  • Infinite spike at zero: small signals are aggressively shrunk.

The marginal prior $\pi(\gamma_i)$ has a pole at zero and Cauchy-like tails; the "horseshoe" refers to the U-shaped (for $\tau = 1$, $\text{Beta}(1/2, 1/2)$) prior density induced on the shrinkage coefficient $\kappa_i = 1/(1 + \lambda_i^2\tau^2)$, which piles up mass near both $\kappa_i = 0$ (no shrinkage) and $\kappa_i = 1$ (total shrinkage).
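
Both behaviors are easy to see by Monte Carlo. A sketch with an illustrative $\tau$; since numpy has no built-in Half-Cauchy sampler, the draws use the inverse-CDF transform $\lambda_i = \tan(\pi u / 2)$ with $u \sim \text{Uniform}(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n = 0.5, 100_000                       # illustrative global scale, sample size

# Half-Cauchy(0, 1) draws via the inverse CDF: lam = tan(pi * u / 2).
lam = np.tan(np.pi * rng.uniform(size=n) / 2)
gamma = rng.normal(0.0, lam * tau)          # marginal horseshoe draws

# Shrinkage coefficient kappa = 1 / (1 + lam^2 tau^2): kappa near 1 means
# "shrink to zero", kappa near 0 means "leave the signal alone".
kappa = 1.0 / (1.0 + lam**2 * tau**2)

print(np.quantile(np.abs(gamma), [0.5, 0.999]))    # small median, huge extreme tail
print(np.mean(kappa > 0.9), np.mean(kappa < 0.1))  # mass piles up at both ends
```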

Comparing Shrinkage Profiles for a Scalar Problem

For the scalar model $y_i = \gamma_i + w_i$, $w_i \sim \mathcal{N}(0, 1)$, each prior induces a different shrinkage profile: the posterior mean $\mathbb{E}[\gamma_i \mid y_i]$ as a function of $y_i$:

| Prior | Shrinkage profile | Key property |
|---|---|---|
| Gaussian | $\hat{\gamma}_i = \frac{\tau^2}{\tau^2 + 1} y_i$ | Linear, uniform across all magnitudes |
| Laplace | Soft thresholding $\mathcal{S}_\lambda(y_i)$ (MAP); nonlinear shrinkage (MMSE) | Dead zone near zero in MAP |
| Spike-and-slab | Hard thresholding (MAP) | Exact zeros; combinatorial inference |
| Horseshoe | Nearly no shrinkage for large $\lvert y_i\rvert$, aggressive for small $\lvert y_i\rvert$ | Adaptive, near-minimax |

The horseshoe achieves near-optimal performance across sparse and dense signal regimes simultaneously — a property not shared by Laplace or Gaussian priors.
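
These profiles can be reproduced numerically: for any scalar prior, the posterior mean is a ratio of two one-dimensional integrals, which a grid sum approximates well. In the sketch below the grid is an illustrative choice, and since the horseshoe marginal has no closed form, a Cauchy prior stands in as a heavy-tailed proxy:

```python
import numpy as np

def mmse_shrinkage(y, log_prior, grid=np.linspace(-30.0, 30.0, 20001)):
    """Posterior mean E[gamma | y] for y = gamma + w, w ~ N(0, 1), by grid sum.

    E[gamma | y] = int g exp(-(y-g)^2/2) pi(g) dg / int exp(-(y-g)^2/2) pi(g) dg.
    """
    log_post = -0.5 * (y - grid) ** 2 + log_prior(grid)
    w = np.exp(log_post - log_post.max())   # unnormalized posterior on the grid
    return np.sum(grid * w) / np.sum(w)     # uniform grid: the spacing cancels

log_gaussian = lambda g: -0.5 * g**2       # N(0, 1) prior, up to a constant
log_laplace = lambda g: -np.abs(g)         # Laplace with rate 1, up to a constant
log_cauchy = lambda g: -np.log1p(g**2)     # heavy-tailed stand-in for the horseshoe

for y in (0.5, 2.0, 5.0):
    print(y, [round(mmse_shrinkage(y, lp), 3)
              for lp in (log_gaussian, log_laplace, log_cauchy)])
# The Gaussian shrinks y = 5 to 2.5; the Laplace to about 4; the Cauchy least of all.
```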

Theorem: Laplace MAP Equals Soft Thresholding

For the scalar model $y = \gamma + w$, $w \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $\pi(\gamma) = \frac{\lambda}{2}\exp(-\lambda|\gamma|)$, the MAP estimate is the soft-thresholding operator:

$$\hat{\gamma}_{\text{MAP}} = \mathcal{S}_{\lambda\sigma^2}(y) = \operatorname{sign}(y)\,\max\bigl(|y| - \lambda\sigma^2,\, 0\bigr).$$

In contrast, the MMSE estimate (posterior mean) satisfies $|\hat{\gamma}_{\text{MMSE}}| < |y|$ for all $y \neq 0$ but never equals exactly zero for nonzero $y$.
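
A quick numerical check of the theorem, minimizing the scalar MAP objective on a grid (the values of $\lambda$, $\sigma^2$, and $y$ are arbitrary illustrative picks):

```python
import numpy as np

lam, sigma2, y = 1.5, 0.4, 1.0
g = np.linspace(-5.0, 5.0, 200001)

# Scalar MAP objective: (1/(2 sigma^2)) (y - g)^2 + lam * |g|.
obj = (y - g) ** 2 / (2 * sigma2) + lam * np.abs(g)
grid_min = g[np.argmin(obj)]

soft = np.sign(y) * max(abs(y) - lam * sigma2, 0.0)  # closed form S_{lam*sigma2}(y)
print(grid_min, soft)   # agree up to grid resolution (both ~0.4 here)
```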

Example: Prior Comparison on a Sparse RF Scene

Consider $n = 8$ pixels with $k = 2$ active scatterers at positions $i = 2, 6$ with reflectivities $\gamma_2 = 2$, $\gamma_6 = -1.5$. The sensing matrix $\mathbf{A} \in \mathbb{R}^{5 \times 8}$ has random Gaussian entries (normalized columns). The noise variance is $\sigma^2 = 0.1$. Compare the MAP estimates under Gaussian, Laplace, and horseshoe priors.

[Figure: Sparse Bayesian Inference — Prior Comparison. MAP and MMSE estimates under different sparsity priors for a 1D sparse-recovery problem with a random sensing matrix. Left panel: prior density $\pi(\gamma_i)$ for a single component (log scale), showing concentration at zero and tail behavior. Center panel: true sparse signal (black stems), MAP reconstruction (blue), and MMSE reconstruction (orange) with $\pm 2\sigma$ credible bands. Right panel: shrinkage profile $\mathbb{E}[\gamma_i \mid y_i]$ vs. $y_i$, illustrating the nonlinear, adaptive shrinkage of sparsity-promoting priors.]
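
A sketch of this example comparing ridge (Gaussian MAP) against ISTA (Laplace MAP); the regularization weights and random seed are untuned, illustrative choices, and the pixel indices are taken as 0-based. The horseshoe MAP has no convenient closed form and is left to the figure above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Scene from the example: n = 8 pixels, k = 2 scatterers (0-based indices assumed).
n, m, sigma2 = 8, 5, 0.1
gamma_true = np.zeros(n)
gamma_true[2], gamma_true[6] = 2.0, -1.5

A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=0)                 # normalized columns
y = A @ gamma_true + rng.normal(0, np.sqrt(sigma2), size=m)

# Gaussian-prior MAP (ridge): closed form, shrinks but never zeros.
tau2 = 1.0
ridge = np.linalg.solve(A.T @ A / sigma2 + np.eye(n) / tau2, A.T @ y / sigma2)

# Laplace-prior MAP (LASSO) via ISTA, as in the sketch earlier in this section.
lam, L = 2.0, np.linalg.norm(A, 2) ** 2 / sigma2
g = np.zeros(n)
for _ in range(2000):
    z = g - A.T @ (A @ g - y) / (sigma2 * L)
    g = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print(np.round(ridge, 3))  # energy smeared over all eight pixels
print(np.round(g, 3))      # exact zeros off the true support
```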

Common Mistake: MAP Shrinkage Bias Under Sparse Priors

Mistake:

Practitioners often report the LASSO (Laplace MAP) estimate as their final reconstruction and treat it as an unbiased estimate of the active coefficients.

Correction:

The LASSO MAP estimate suffers from shrinkage bias: even active (truly nonzero) components are shrunk toward zero by the regularization penalty. The MMSE estimate under the same Laplace prior does not produce exact zeros but has smaller bias on large coefficients. For quantitative reconstruction (not just support recovery), a two-stage approach is standard: first use the LASSO to identify the support, then re-estimate on that support via least squares (the "post-LASSO" refit, often described as debiasing the LASSO), as sketched below.
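
A sketch of that two-stage refit; `lasso_estimate` stands in for any LASSO solution, such as the ISTA output sketched earlier.

```python
import numpy as np

def post_lasso_refit(A, y, lasso_estimate, tol=1e-8):
    """Debias a LASSO solution: keep its support, refit by least squares.

    Stage 1 (support recovery) is the LASSO; stage 2 removes the shrinkage
    bias on the surviving coefficients.
    """
    support = np.flatnonzero(np.abs(lasso_estimate) > tol)
    refit = np.zeros_like(lasso_estimate)
    if support.size:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        refit[support] = coef
    return refit
```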

Shrinkage (Bayesian)

The tendency of a Bayesian estimator to pull estimates toward the prior mean (typically zero). The shrinkage profile $\mathbb{E}[\gamma_i \mid y_i]$ vs. $y_i$ characterizes how much each observation is pulled toward zero. Linear shrinkage (Gaussian prior) applies uniform pull; nonlinear shrinkage (Laplace, horseshoe) applies adaptive pull that depends on signal strength.

Related: Laplace Prior and the LASSO, Horseshoe Prior

Key Takeaway

  1. The Laplace prior yields MAP estimates equivalent to the LASSO ($\ell_1$ regularization), promoting exact zeros via soft thresholding.

  2. The Bernoulli-Gaussian prior is the standard model for sparse RF scenes: it directly provides a posterior probability of target presence at each pixel.

  3. The spike-and-slab prior is theoretically ideal ($\ell_0$ sparsity) but computationally intractable for $n \gtrsim 30$; the horseshoe prior provides a tractable continuous approximation.

  4. The horseshoe prior applies adaptive shrinkage (large signals nearly unshrunk, small signals aggressively shrunk) and achieves near-minimax estimation rates.

  5. Full Bayesian inference with sparsity priors provides uncertainty quantification beyond point estimates — essential for deciding whether a detected target is real or an artifact.

Quick Check

For the scalar model $y = \gamma + w$ with $w \sim \mathcal{N}(0, 1)$ and Laplace prior $\pi(\gamma) \propto \exp(-\lambda|\gamma|)$ with $\lambda = 0.5$, given the observation $y = 0.3$, the MAP estimate $\hat{\gamma}_{\text{MAP}}$ is:

  • $0.3$
  • $0.15$
  • $0$
  • $-0.2$