Sparsity-Promoting Priors

Beyond Gaussianity — Why Sparse Scenes Need Sparse Priors

Gaussian priors produce smooth reconstructions and correspond to Tikhonov (quadratic) penalties. Most RF imaging scenes, however, are sparse: a small number of point targets against a largely empty background. Radar scenes of vehicles or aircraft, SAR images of buildings, and indoor localization maps are all dominated by zeros. Gaussian priors apply equal shrinkage to every component, making large (true target) coefficients smaller than they should be and failing to produce exact zeros.

Sparsity-promoting priors assign higher probability to sparse configurations. The MAP estimate then corresponds to $\ell_1$-type regularization (connecting to §Variational Regularization and Sparsity), but the full Bayesian posterior additionally provides uncertainty quantification: knowing not just where the target is but how confident we are.

Definition: Laplace Prior and the LASSO

The Laplace prior (double exponential) on the scene vector $\boldsymbol{\gamma} \in \mathbb{R}^n$ is

$$\pi(\boldsymbol{\gamma}) = \prod_{i=1}^n \frac{\lambda}{2}\exp(-\lambda|\gamma_i|), \qquad \lambda > 0.$$

The negative log-prior is $-\log\pi(\boldsymbol{\gamma}) = \lambda\|\boldsymbol{\gamma}\|_1 + \text{const}$. Under Gaussian noise the MAP estimate therefore solves

$$\hat{\boldsymbol{\gamma}}_{\text{MAP}} = \arg\min_{\boldsymbol{\gamma}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2 + \lambda\|\boldsymbol{\gamma}\|_1\right\},$$

which is the LASSO (Least Absolute Shrinkage and Selection Operator). The Laplace prior concentrates mass at zero and has heavier tails than the Gaussian, encouraging sparse solutions with exact zeros in the MAP estimate.
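
As a concrete illustration, here is a minimal numpy sketch of the LASSO MAP estimate computed by iterative soft thresholding (ISTA); the step-size rule, iteration count, and function names are illustrative choices, not a reference implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator S_t(x) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_map_ista(A, y, sigma2, lam, n_iter=500):
    """MAP estimate under a Laplace prior (LASSO) via ISTA.

    Minimizes (1/(2*sigma2)) * ||A g - y||^2 + lam * ||g||_1.
    """
    # Lipschitz constant of the data-fit gradient sets the step size 1/L.
    L = np.linalg.norm(A, 2) ** 2 / sigma2
    g = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ g - y) / sigma2          # gradient of the quadratic term
        g = soft_threshold(g - grad / L, lam / L)  # proximal (soft-threshold) step
    return g
```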

Gaussian vs Laplace Prior — Key Differences

| Property | Gaussian $\mathcal{N}(0, \tau^2)$ | Laplace $\text{Lap}(0, 1/\lambda)$ |
|---|---|---|
| Density at $\gamma_i = 0$ | Finite, $\frac{1}{\sqrt{2\pi}\,\tau}$ | Maximum, $\frac{\lambda}{2}$ |
| Tail decay | $\exp(-\gamma_i^2/2\tau^2)$ (super-exponential) | $\exp(-\lambda\lvert\gamma_i\rvert)$ (exponential) |
| MAP penalty | $\frac{1}{2\tau^2}\lVert\boldsymbol{\gamma}\rVert^2$ (ridge) | $\lambda\lVert\boldsymbol{\gamma}\rVert_1$ (LASSO) |
| MAP solution | Linear shrinkage (no exact zeros) | Soft thresholding (exact zeros) |
| MMSE solution | Linear (same as MAP) | Nonlinear, smooth (no exact zeros) |
| Large-signal behavior | Light-tailed: aggressively shrinks large signals | Heavier-tailed: lets large signals survive |
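
The "MAP solution" row can be verified directly on the scalar model $y = \gamma + w$; a small sketch, with illustrative values for $\tau^2$, $\lambda$, and $\sigma^2$:

```python
import numpy as np

tau2, lam, sigma2 = 1.0, 2.0, 0.25   # illustrative prior/noise parameters
y = np.array([-3.0, -0.4, 0.1, 0.6, 3.0])

# Gaussian prior: linear shrinkage toward zero, never exactly zero.
ridge_map = (tau2 / (tau2 + sigma2)) * y

# Laplace prior: soft thresholding, exact zeros inside the dead zone.
lasso_map = np.sign(y) * np.maximum(np.abs(y) - lam * sigma2, 0.0)

print(ridge_map)   # every entry shrunk by the same factor 0.8
print(lasso_map)   # entries with |y| below lam*sigma2 = 0.5 map exactly to zero
```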

Definition: Bernoulli-Gaussian Prior

The Bernoulli-Gaussian (BG) prior models a sparse scene as a mixture of a point mass at zero and a Gaussian component:

$$\pi(\gamma_i) = (1 - w)\,\delta_0(\gamma_i) + w\,\mathcal{N}(0, \tau^2),$$

where $w \in (0,1)$ is the sparsity rate (prior probability of a non-zero component) and $\tau^2$ is the variance of the active component.

The BG prior is the standard model for sparse RF scenes: each pixel is either empty (with probability $1-w$) or contains a scatterer of random reflectivity (with probability $w$). The posterior probability that pixel $i$ is active then provides a natural per-pixel detection score.

Conditional on the support $\mathcal{S} = \{i : \gamma_i \neq 0\}$, the posterior is Gaussian restricted to $\mathcal{S}$: $p(\boldsymbol{\gamma}_\mathcal{S} \mid \mathbf{y}, \mathcal{S}) = \mathcal{N}(\hat{\boldsymbol{\gamma}}_\mathcal{S},\,\mathbf{\Gamma}_{\mathcal{S}|\mathbf{y}})$, with exact formulas from Theorem (Gaussian Prior Yields Gaussian Posterior).
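
As a sketch of those formulas, the conditional posterior on a candidate support takes only a few lines; the Gaussian slab $\mathcal{N}(0, \tau^2)$ on active entries matches the BG model above, while the function name and interface are assumptions for illustration.

```python
import numpy as np

def posterior_on_support(A, y, support, sigma2, tau2):
    """Gaussian posterior of the active coefficients given the support.

    Model: y ~ N(A_S gamma_S, sigma2 * I), prior gamma_S ~ N(0, tau2 * I).
    """
    A_S = A[:, support]
    # Standard Gaussian-prior / Gaussian-likelihood posterior formulas.
    cov = np.linalg.inv(A_S.T @ A_S / sigma2 + np.eye(len(support)) / tau2)
    mean = cov @ (A_S.T @ y) / sigma2
    return mean, cov
```

The diagonal of the returned covariance gives per-scatterer credible intervals, conditional on the detected support.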

Definition: Spike-and-Slab Prior

The spike-and-slab prior is a two-component mixture where the "spike" is a point mass at zero and the "slab" is a diffuse distribution:

$$\pi(\gamma_i \mid \theta_i) = (1 - \theta_i)\,\delta_0(\gamma_i) + \theta_i\,\mathcal{N}(0, \tau^2),$$

where $\theta_i \in \{0, 1\}$ is a latent binary indicator with $\theta_i \sim \text{Bernoulli}(w)$.

The spike-and-slab is the "gold standard" sparsity prior: it exactly encodes the $\ell_0$ sparsity model. However, exact posterior inference requires summing over $2^n$ support configurations, which is computationally intractable for $n \gtrsim 30$. Approximate inference (variational Bayes, MCMC on $\boldsymbol{\theta}$) is required in practice.
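
For very small $n$ the $2^n$ sum can be carried out exactly, which makes the combinatorial cost concrete. A brute-force sketch, assuming the Gaussian-slab marginal likelihood $\mathbf{y} \mid \mathcal{S} \sim \mathcal{N}(\mathbf{0},\, \sigma^2\mathbf{I} + \tau^2\mathbf{A}_\mathcal{S}\mathbf{A}_\mathcal{S}^\top)$:

```python
import numpy as np
from itertools import combinations

def exact_support_posterior(A, y, sigma2, tau2, w):
    """Posterior over all 2^n supports under the spike-and-slab model.

    Feasible only for small n: the loop visits every subset of {0, ..., n-1}.
    """
    m, n = A.shape
    supports, logp = [], []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            # Marginal covariance of y given support S (slab integrated out).
            C = sigma2 * np.eye(m)
            if S:
                A_S = A[:, list(S)]
                C = C + tau2 * (A_S @ A_S.T)
            _, logdet = np.linalg.slogdet(C)
            loglik = -0.5 * (logdet + y @ np.linalg.solve(C, y))
            logprior = k * np.log(w) + (n - k) * np.log(1 - w)
            supports.append(S)
            logp.append(loglik + logprior)
    logp = np.asarray(logp)
    p = np.exp(logp - logp.max())        # normalize in a numerically safe way
    return supports, p / p.sum()
```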

Definition: Horseshoe Prior

The horseshoe prior (Carvalho, Polson, and Scott, 2010) is a continuous shrinkage prior that approximates spike-and-slab behavior while remaining computationally tractable:

$$\gamma_i \mid \lambda_i \sim \mathcal{N}(0, \lambda_i^2 \tau^2), \qquad \lambda_i \sim \text{Half-Cauchy}(0, 1),$$

where $\tau > 0$ is a global shrinkage parameter. The Half-Cauchy hyperprior on $\lambda_i$ produces:

  • Heavy tails: large signals are barely shrunk (robustness to strong scatterers).
  • Infinite spike at zero: small signals are aggressively shrunk.

The marginal prior $\pi(\gamma_i)$ has a pole at zero and Cauchy-like tails; the "horseshoe" refers to the U-shaped (for $\tau = 1$, $\text{Beta}(1/2, 1/2)$) prior density induced on the shrinkage coefficient $\kappa_i = 1/(1 + \lambda_i^2\tau^2)$, which piles up mass near both $\kappa_i = 0$ (no shrinkage) and $\kappa_i = 1$ (total shrinkage).
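
Both behaviors are easy to see by Monte Carlo. A sketch with an illustrative $\tau$; since numpy has no built-in Half-Cauchy sampler, the draws use the inverse-CDF transform $\lambda_i = \tan(\pi u / 2)$ with $u \sim \text{Uniform}(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n = 0.5, 100_000                       # illustrative global scale, sample size

# Half-Cauchy(0, 1) draws via the inverse CDF: lam = tan(pi * u / 2).
lam = np.tan(np.pi * rng.uniform(size=n) / 2)
gamma = rng.normal(0.0, lam * tau)          # marginal horseshoe draws

# Shrinkage coefficient kappa = 1 / (1 + lam^2 tau^2): kappa near 1 means
# "shrink to zero", kappa near 0 means "leave the signal alone".
kappa = 1.0 / (1.0 + lam**2 * tau**2)

print(np.quantile(np.abs(gamma), [0.5, 0.999]))    # small median, huge extreme tail
print(np.mean(kappa > 0.9), np.mean(kappa < 0.1))  # mass piles up at both ends
```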

Comparing Shrinkage Profiles for a Scalar Problem

For the scalar model $y_i = \gamma_i + w_i$, $w_i \sim \mathcal{N}(0, 1)$, each prior induces a different shrinkage profile: the posterior mean $\mathbb{E}[\gamma_i \mid y_i]$ as a function of $y_i$:

| Prior | Shrinkage profile | Key property |
|---|---|---|
| Gaussian | $\hat{\gamma}_i = \frac{\tau^2}{\tau^2 + 1} y_i$ | Linear, uniform across all magnitudes |
| Laplace | Soft thresholding $\mathcal{S}_\lambda(y_i)$ (MAP); nonlinear shrinkage (MMSE) | Dead zone near zero in MAP |
| Spike-and-slab | Hard thresholding (MAP) | Exact zeros; combinatorial inference |
| Horseshoe | Nearly no shrinkage for large $\lvert y_i\rvert$, aggressive for small $\lvert y_i\rvert$ | Adaptive, near-minimax |

The horseshoe achieves near-optimal performance across sparse and dense signal regimes simultaneously — a property not shared by Laplace or Gaussian priors.
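
These profiles can be reproduced numerically: for any scalar prior, the posterior mean is a ratio of two one-dimensional integrals, which a grid sum approximates well. In the sketch below the grid is an illustrative choice, and since the horseshoe marginal has no closed form, a Cauchy prior stands in as a heavy-tailed proxy:

```python
import numpy as np

def mmse_shrinkage(y, log_prior, grid=np.linspace(-30.0, 30.0, 20001)):
    """Posterior mean E[gamma | y] for y = gamma + w, w ~ N(0, 1), by grid sum.

    E[gamma | y] = int g exp(-(y-g)^2/2) pi(g) dg / int exp(-(y-g)^2/2) pi(g) dg.
    """
    log_post = -0.5 * (y - grid) ** 2 + log_prior(grid)
    w = np.exp(log_post - log_post.max())   # unnormalized posterior on the grid
    return np.sum(grid * w) / np.sum(w)     # uniform grid: the spacing cancels

log_gaussian = lambda g: -0.5 * g**2       # N(0, 1) prior, up to a constant
log_laplace = lambda g: -np.abs(g)         # Laplace with rate 1, up to a constant
log_cauchy = lambda g: -np.log1p(g**2)     # heavy-tailed stand-in for the horseshoe

for y in (0.5, 2.0, 5.0):
    print(y, [round(mmse_shrinkage(y, lp), 3)
              for lp in (log_gaussian, log_laplace, log_cauchy)])
# The Gaussian shrinks y = 5 to 2.5; the Laplace to about 4; the Cauchy least of all.
```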

Theorem: Laplace MAP Equals Soft Thresholding

For the scalar model $y = \gamma + w$, $w \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $\pi(\gamma) = \frac{\lambda}{2}\exp(-\lambda|\gamma|)$, the MAP estimate is the soft-thresholding operator:

$$\hat{\gamma}_{\text{MAP}} = \mathcal{S}_{\lambda\sigma^2}(y) = \operatorname{sign}(y)\,\max\bigl(|y| - \lambda\sigma^2,\, 0\bigr).$$

In contrast, the MMSE estimate (posterior mean) satisfies $|\hat{\gamma}_{\text{MMSE}}| < |y|$ for all $y \neq 0$ but never equals exactly zero for nonzero $y$.
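
A quick numerical check of the theorem, minimizing the scalar MAP objective on a grid (the values of $\lambda$, $\sigma^2$, and $y$ are arbitrary illustrative picks):

```python
import numpy as np

lam, sigma2, y = 1.5, 0.4, 1.0
g = np.linspace(-5.0, 5.0, 200001)

# Scalar MAP objective: (1/(2 sigma^2)) (y - g)^2 + lam * |g|.
obj = (y - g) ** 2 / (2 * sigma2) + lam * np.abs(g)
grid_min = g[np.argmin(obj)]

soft = np.sign(y) * max(abs(y) - lam * sigma2, 0.0)  # closed form S_{lam*sigma2}(y)
print(grid_min, soft)   # agree up to grid resolution (both ~0.4 here)
```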

Example: Prior Comparison on a Sparse RF Scene

Consider $n = 8$ pixels with $k = 2$ active scatterers at positions $i = 2, 6$ with reflectivities $\gamma_2 = 2$, $\gamma_6 = -1.5$. The sensing matrix $\mathbf{A} \in \mathbb{R}^{5 \times 8}$ has random Gaussian entries (normalized columns). The noise variance is $\sigma^2 = 0.1$. Compare the MAP estimates under Gaussian, Laplace, and horseshoe priors.

[Figure: Sparse Bayesian Inference — Prior Comparison. MAP and MMSE estimates under different sparsity priors for a 1D sparse-recovery problem with a random sensing matrix. Left panel: prior density $\pi(\gamma_i)$ for a single component (log scale), showing concentration at zero and tail behavior. Center panel: true sparse signal (black stems), MAP reconstruction (blue), and MMSE reconstruction (orange) with $\pm 2\sigma$ credible bands. Right panel: shrinkage profile $\mathbb{E}[\gamma_i \mid y_i]$ vs. $y_i$, illustrating the nonlinear, adaptive shrinkage of sparsity-promoting priors.]
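
A sketch of this example comparing ridge (Gaussian MAP) against ISTA (Laplace MAP); the regularization weights and random seed are untuned, illustrative choices, and the pixel indices are taken as 0-based. The horseshoe MAP has no convenient closed form and is left to the figure above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Scene from the example: n = 8 pixels, k = 2 scatterers (0-based indices assumed).
n, m, sigma2 = 8, 5, 0.1
gamma_true = np.zeros(n)
gamma_true[2], gamma_true[6] = 2.0, -1.5

A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=0)                 # normalized columns
y = A @ gamma_true + rng.normal(0, np.sqrt(sigma2), size=m)

# Gaussian-prior MAP (ridge): closed form, shrinks but never zeros.
tau2 = 1.0
ridge = np.linalg.solve(A.T @ A / sigma2 + np.eye(n) / tau2, A.T @ y / sigma2)

# Laplace-prior MAP (LASSO) via ISTA, as in the sketch earlier in this section.
lam, L = 2.0, np.linalg.norm(A, 2) ** 2 / sigma2
g = np.zeros(n)
for _ in range(2000):
    z = g - A.T @ (A @ g - y) / (sigma2 * L)
    g = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

print(np.round(ridge, 3))  # energy smeared over all eight pixels
print(np.round(g, 3))      # exact zeros off the true support
```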

Common Mistake: MAP Shrinkage Bias Under Sparse Priors

Mistake:

Practitioners often report the LASSO (Laplace MAP) estimate as their final reconstruction and treat it as an unbiased estimate of the active coefficients.

Correction:

The LASSO MAP estimate suffers from shrinkage bias: even active (truly nonzero) components are shrunk toward zero by the regularization penalty. The MMSE estimate under the same Laplace prior does not produce exact zeros but has smaller bias on large coefficients. For quantitative reconstruction (not just support recovery), a two-stage approach is standard: first use the LASSO to identify the support, then re-estimate on that support via least squares (the "post-LASSO" refit, often described as debiasing the LASSO), as sketched below.
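
A sketch of that two-stage refit; `lasso_estimate` stands in for any LASSO solution, such as the ISTA output sketched earlier.

```python
import numpy as np

def post_lasso_refit(A, y, lasso_estimate, tol=1e-8):
    """Debias a LASSO solution: keep its support, refit by least squares.

    Stage 1 (support recovery) is the LASSO; stage 2 removes the shrinkage
    bias on the surviving coefficients.
    """
    support = np.flatnonzero(np.abs(lasso_estimate) > tol)
    refit = np.zeros_like(lasso_estimate)
    if support.size:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        refit[support] = coef
    return refit
```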

Shrinkage (Bayesian)

The tendency of a Bayesian estimator to pull estimates toward the prior mean (typically zero). The shrinkage profile $\mathbb{E}[\gamma_i \mid y_i]$ vs. $y_i$ characterizes how much each observation is pulled toward zero. Linear shrinkage (Gaussian prior) applies uniform pull; nonlinear shrinkage (Laplace, horseshoe) applies adaptive pull that depends on signal strength.

Related: Laplace Prior and the LASSO, Horseshoe Prior

Key Takeaway

  1. The Laplace prior yields MAP estimates equivalent to the LASSO ($\ell_1$ regularization), promoting exact zeros via soft thresholding.

  2. The Bernoulli-Gaussian prior is the standard model for sparse RF scenes: it directly provides a posterior probability of target presence at each pixel.

  3. The spike-and-slab prior is theoretically ideal ($\ell_0$ sparsity) but computationally intractable for $n \gtrsim 30$; the horseshoe prior provides a tractable continuous approximation.

  4. The horseshoe prior applies adaptive shrinkage (large signals nearly unshrunk, small signals aggressively shrunk) and achieves near-minimax estimation rates.

  5. Full Bayesian inference with sparsity priors provides uncertainty quantification beyond point estimates — essential for deciding whether a detected target is real or an artifact.

Quick Check

For the scalar model $y = \gamma + w$ with $w \sim \mathcal{N}(0, 1)$ and Laplace prior $\pi(\gamma) \propto \exp(-\lambda|\gamma|)$ with $\lambda = 0.5$, given the observation $y = 0.3$, the MAP estimate $\hat{\gamma}_{\text{MAP}}$ is:

  • $0.3$
  • $0.15$
  • $0$
  • $-0.2$