Variational Regularization and Sparsity

Beyond Quadratic Penalties — Sparsity and Edges

Tikhonov regularization uses the quadratic penalty $\|x\|^2$, which encodes a Gaussian prior: the reconstructed image should have small $\ell_2$ norm. This is sensible for smooth images, but many RF imaging scenes are not smooth:

  • Radar scenes consist of point scatterers, extended edges, or piecewise-constant regions — the wrong prior for $\ell_2$.
  • The scattering coefficients in a sparse scene are zero everywhere except at target locations — a sparse signal, not a small-norm one.

The variational regularization framework replaces $\|x\|^2$ with an arbitrary convex penalty $R(x)$ chosen to encode problem-specific prior knowledge. Choosing $R(x) = \|x\|_1$ promotes sparsity; choosing $R(x) = \mathrm{TV}(x)$ promotes piecewise-constant images with sharp edges. Both have a Bayesian interpretation: $R(x) = \|x\|_1$ corresponds to a Laplace prior; $R(x) = \mathrm{TV}(x)$ to a total-variation prior.

Definition:

Variational Regularization

Given a forward operator $\mathcal{A}$, noisy data $y^\delta$, and a convex penalty $R \in \Gamma_0(\mathcal{X})$, the variational regularization problem is

$$\hat{x} = \arg\min_{x \in \mathcal{X}} \left\{\frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\, R(x)\right\},$$

where $\lambda > 0$ is the regularization parameter.

More generally, the data fidelity term can be replaced by any convex loss $D(\mathcal{A}x, y^\delta)$ (e.g., the Kullback–Leibler divergence for Poisson noise, or the $\ell_1$ fidelity for impulsive noise).

The choice of $R$ encodes prior knowledge:

  • $R(x) = \frac{1}{2}\|x\|^2$: Tikhonov (smooth, Gaussian prior).
  • $R(x) = \|x\|_1$: Sparsity in the canonical basis (Laplace prior).
  • $R(x) = \|\Phi x\|_1$: Sparsity in a transform domain $\Phi$ (e.g., wavelets).
  • $R(x) = \mathrm{TV}(x)$: Piecewise constancy (edge-preserving).
  • $R(x) = \iota_C(x)$: Hard constraint $x \in C$ (support, non-negativity).
  • $R(x) = \|x\|_{2,1}$: Group sparsity (multi-frequency imaging).
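
To make the list concrete, here is a minimal Python sketch (toy signals invented for illustration) evaluating three of these penalties on a spiky signal and a step signal. Note how TV barely charges the step, while the $\ell_2$ and $\ell_1$ penalties do not distinguish the two structures:

```python
import numpy as np

# Toy signals, made up for illustration.
x_sparse = np.zeros(100)
x_sparse[[10, 40, 75]] = [3.0, -2.0, 1.5]                  # point scatterers
x_step = np.concatenate([np.zeros(50), np.ones(50)])       # one sharp edge

def tikhonov(x):  return 0.5 * np.sum(x ** 2)              # R(x) = ||x||^2 / 2
def lasso_pen(x): return np.sum(np.abs(x))                 # R(x) = ||x||_1
def tv_1d(x):     return np.sum(np.abs(np.diff(x)))        # 1-D anisotropic TV

for name, x in [("sparse", x_sparse), ("step", x_step)]:
    print(f"{name:>6}: l2 = {tikhonov(x):6.2f}  l1 = {lasso_pen(x):6.2f}  TV = {tv_1d(x):6.2f}")
```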

Theorem: Existence of Minimizers for Variational Regularization

Let $\mathcal{A} \colon \mathbb{R}^n \to \mathbb{R}^m$ be linear and let $R \in \Gamma_0(\mathbb{R}^n)$ be coercive (i.e., $R(x) \to +\infty$ as $\|x\| \to \infty$). Then the functional

$$J(x) = \frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\, R(x)$$

attains its minimum. If either $\mathcal{A}$ is injective or $R$ is strictly convex, the minimizer is unique.

Definition:

LASSO — $\ell_1$ Regularization

The LASSO (Least Absolute Shrinkage and Selection Operator) is the variational problem

$$\hat{x} = \arg\min_{x \in \mathbb{R}^n} \left\{\frac{1}{2}\|\mathcal{A}x - y^\delta\|_2^2 + \lambda\|x\|_1\right\}.$$

The $\ell_1$ ball $\{x : \|x\|_1 \leq \eta\}$ has corners on the coordinate axes, so the constraint set pushes the solution toward the axes — i.e., toward sparse solutions.

Bayesian interpretation: The LASSO solution is the MAP estimate under independent Laplace priors: $p(x) \propto \exp(-\lambda\|x\|_1/\sigma_n^2)$.

The LASSO was introduced by Tibshirani (1996) in statistics and independently as Basis Pursuit Denoising by Chen, Donoho, and Saunders (1998) in signal processing. The LASSO has no closed-form solution in general and requires iterative algorithms (ISTA/FISTA/ADMM — see Telecom Ch 03), but a proximal step (soft thresholding) efficiently handles the $\ell_1$ term.
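
A minimal sketch of that proximal iteration (ISTA), assuming a dense real NumPy matrix, a fixed step size $1/L$, and an iteration count chosen for illustration:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1: componentwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||A x - y||_2^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient: ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)       # gradient of the quadratic fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

FISTA adds Nesterov momentum to the same iteration; ADMM splits the problem differently. Both are developed in Telecom Ch 03.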


Theorem: Sparsity of LASSO Solutions — Optimality Conditions

Let $\hat{x}$ be a LASSO solution with $\lambda > 0$, and let $r = \mathcal{A}\hat{x} - y^\delta$ be the residual. The optimality (KKT) conditions are:

$$[\mathcal{A}^* r]_i + \lambda s_i = 0, \qquad s_i \in \partial|\hat{x}_i|,$$

i.e., $s_i = \mathrm{sign}(\hat{x}_i)$ if $\hat{x}_i \neq 0$ and $s_i \in [-1,1]$ if $\hat{x}_i = 0$.

Therefore:

  • If $|[\mathcal{A}^* r]_i| < \lambda$, then $\hat{x}_i = 0$.
  • There is always a LASSO solution with $|\mathrm{supp}(\hat{x})| \leq m$ (at most as many non-zeros as measurements).
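
These conditions are easy to verify numerically. A quick check, reusing the `ista` sketch above (problem sizes, seed, and `lam` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[[5, 50, 90]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(30)

lam = 0.05
x_hat = ista(A, y, lam, n_iter=5000)
corr = A.T @ (A @ x_hat - y)                  # [A* r]_i for every i
on = np.abs(x_hat) > 1e-8                     # support of the computed solution

# On the support, [A* r]_i + lam*sign(x_i) should vanish (up to solver accuracy);
# off the support, |[A* r]_i| must not exceed lam.
print(np.abs(corr[on] + lam * np.sign(x_hat[on])).max())
print(np.abs(corr[~on]).max(), "<=", lam)
```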

Theorem: Exact Recovery via 1\ell_1 Minimization

Let $x^\dagger \in \mathbb{R}^n$ be $s$-sparse with support $S = \mathrm{supp}(x^\dagger)$. If $\mathcal{A}$ satisfies the Restricted Isometry Property (RIP) of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1$, then:

(a) (Noiseless) Basis pursuit recovers $x^\dagger$ exactly: $\hat{x}_{\mathrm{BP}} = x^\dagger$.

(b) (Noisy) If the noise level satisfies $\|y^\delta - \mathcal{A}x^\dagger\| \leq \delta$, the BPDN solution satisfies

$$\|\hat{x}_{\mathrm{BPDN}} - x^\dagger\|_2 \leq C\,\delta,$$

where $C$ depends only on $\delta_{2s}$.

The RIP says $\mathcal{A}$ acts approximately as an isometry on all $2s$-sparse vectors:

$$(1 - \delta_{2s})\|x\|^2 \leq \|\mathcal{A}x\|^2 \leq (1 + \delta_{2s})\|x\|^2.$$

This prevents any two distinct $s$-sparse vectors from producing the same measurements. Random matrices (Gaussian, Bernoulli, partial Fourier) satisfy the RIP with high probability when $m \gtrsim s \log(n/s)$ — far fewer measurements than the full $n$. The compressed sensing guarantee is covered in depth in FSI Ch 13.
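
A small noiseless recovery experiment illustrates the scaling, again reusing the `ista` sketch above with a tiny `lam` as a stand-in for basis pursuit (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n, s = 400, 8
m = int(4 * s * np.log(n / s))                # ~ s log(n/s) scaling, constant 4 ad hoc
A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian: RIP with high probability

x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true                                # noiseless data

x_hat = ista(A, y, lam=1e-3, n_iter=3000)     # small lam approximates basis pursuit
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```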


Definition:

Total Variation Regularization

For a discrete image $x \in \mathbb{R}^{n_1 \times n_2}$, the discrete total variation is:

Isotropic TV:
$$\mathrm{TV}_{\mathrm{iso}}(x) = \sum_{i,j} \sqrt{(\nabla_1 x)_{i,j}^2 + (\nabla_2 x)_{i,j}^2},$$

where $(\nabla_1 x)_{i,j} = x_{i+1,j} - x_{i,j}$ and $(\nabla_2 x)_{i,j} = x_{i,j+1} - x_{i,j}$.

Anisotropic TV:
$$\mathrm{TV}_{\mathrm{aniso}}(x) = \sum_{i,j} \bigl(|(\nabla_1 x)_{i,j}| + |(\nabla_2 x)_{i,j}|\bigr).$$

The TV-regularized reconstruction is then

$$\hat{x} = \arg\min_x \;\frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\,\mathrm{TV}(x).$$

TV promotes piecewise constant images with sharp edges: it allows large gradients at a few locations while penalising gradients everywhere else. This contrasts with Tikhonov, which penalises gradients everywhere uniformly (leading to blurred edges) and with LASSO, which promotes sparsity in the pixel domain (wrong for extended features).

The ROF denoising model (Rudin–Osher–Fatemi, 1992) is the special case $\mathcal{A} = I$.
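
A direct translation of these definitions into NumPy (boundary handling, here forward differences with no wrap-around, is a convention chosen for this sketch):

```python
import numpy as np

def total_variation(x, isotropic=True):
    """Discrete TV of a 2-D image using forward differences."""
    d1 = np.diff(x, axis=0)    # (nabla_1 x)_{i,j} = x[i+1, j] - x[i, j]
    d2 = np.diff(x, axis=1)    # (nabla_2 x)_{i,j} = x[i, j+1] - x[i, j]
    if isotropic:
        # combine the two differences only where both are defined
        return np.sum(np.sqrt(d1[:, :-1] ** 2 + d2[:-1, :] ** 2))
    return np.sum(np.abs(d1)) + np.sum(np.abs(d2))

# A piecewise-constant image has small TV no matter how large its jumps are few:
img = np.zeros((64, 64))
img[16:48, 16:48] = 5.0                       # one bright square
print(total_variation(img), total_variation(img, isotropic=False))
```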


Definition:

Group Sparsity for Multi-Frequency Imaging

In multi-frequency RF imaging, measurements are taken at $Q$ different frequencies. The measurement model at each frequency $q$ is $y_q = \mathcal{A}_q x_q + \eta_q$, where $x_q$ is the complex reflectivity map at frequency $q$.

If the scatterer positions are the same at all frequencies (a sparse scene), the group-sparse (joint sparsity) model imposes

$$R(x) = \|x\|_{2,1} = \sum_{i=1}^n \|(x_1(i), \ldots, x_Q(i))\|_2,$$

where $x_q(i)$ is the $i$-th pixel of the $q$-th frequency channel. The $\ell_{2,1}$ norm promotes solutions where entire groups of components (same pixel across all frequencies) are simultaneously zero or non-zero — joint support recovery.

Group sparsity generalises LASSO ($\ell_1$) to the multi-measurement case. The proximal operator of the $\ell_{2,1}$ norm is group soft thresholding. When the group sizes are 1, group sparsity reduces to standard LASSO.
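
A sketch of that group soft-thresholding prox, assuming the variables are stored as an $n \times Q$ array whose rows are the groups:

```python
import numpy as np

def group_soft_threshold(X, tau):
    """Prox of tau*||.||_{2,1} for X of shape (n, Q); row i collects pixel i
    across all Q frequency channels."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)           # per-group l2 norms
    shrink = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * shrink                # whole rows survive or vanish together
```

With $Q = 1$ this reduces to componentwise soft thresholding, matching the scalar LASSO prox above.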


Example: LASSO for Sparse Radar Imaging

A radar system illuminates a scene containing $s$ point scatterers at unknown positions. The measurement model is $y = \mathcal{A}g + \epsilon$, where $\mathcal{A}$ is the discretized radar forward operator and $g$ is the reflectivity image. Formulate the reconstruction as a LASSO problem and discuss the choice of $\lambda$.
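
One possible concrete formulation, under a hypothetical setup assumed for illustration (stepped-frequency range profiling, so each measurement is a Fourier sample of a 1-D reflectivity profile; this is not the chapter's specific radar geometry), solved by ISTA with the complex soft-thresholding prox:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 256, 64, 4                           # grid size, measurements, scatterers
rows = rng.choice(n, size=m, replace=False)    # randomly selected frequency bins
A = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n) / np.sqrt(m)

g_true = np.zeros(n, dtype=complex)
g_true[rng.choice(n, size=s, replace=False)] = (
    rng.standard_normal(s) + 1j * rng.standard_normal(s))
y = A @ g_true + 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

def complex_soft_threshold(v, tau):
    """Prox of tau*||.||_1 for complex v: shrink magnitudes, keep phases."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

# ISTA with the complex prox. lam is set ad hoc here; in practice it is tuned
# to the noise level, e.g. via the discrepancy principle or cross-validation.
lam = 0.02
L = np.linalg.norm(A, 2) ** 2
g_hat = np.zeros(n, dtype=complex)
for _ in range(2000):
    g_hat = complex_soft_threshold(g_hat - A.conj().T @ (A @ g_hat - y) / L, lam / L)

print("true support:     ", sorted(np.flatnonzero(np.abs(g_true) > 0)))
print("recovered support:", sorted(np.flatnonzero(np.abs(g_hat) > 0.1)))
```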


Example: MAP Interpretation of Variational Regularization

Show that the variational problem $\min_x \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \lambda R(x)$ is the MAP estimate under the prior $p(x) \propto e^{-\lambda R(x)}$ and Gaussian likelihood $y|x \sim \mathcal{N}(\mathcal{A}x, \sigma_n^2 I)$. Identify the prior for the cases $R(x) = \frac{1}{2}\|x\|^2$ and $R(x) = \|x\|_1$.
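
One way to carry out the verification (a sketch of the standard argument):

```latex
\hat{x}_{\mathrm{MAP}}
  = \arg\max_x \, p(x \mid y)
  = \arg\min_x \, \bigl[ -\log p(y \mid x) - \log p(x) \bigr]
  = \arg\min_x \, \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \lambda R(x),
```

using $-\log p(y \mid x) = \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \mathrm{const}$ from the Gaussian likelihood and $-\log p(x) = \lambda R(x) + \mathrm{const}$ from the prior. For $R(x) = \frac{1}{2}\|x\|^2$ the prior is i.i.d. Gaussian, $p(x) \propto e^{-\lambda\|x\|^2/2}$; for $R(x) = \|x\|_1$ it is i.i.d. Laplace, $p(x) \propto e^{-\lambda\|x\|_1}$.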


Variational Regularization: Tikhonov vs. LASSO vs. TV

Compares three regularization penalties for 1D signal reconstruction. The true signal alternates between sparse (point impulses) and piecewise-constant (step) sections to highlight the strengths and weaknesses of each method.

Left panel: True signal (blue), noisy data (gray), and three reconstructions (red = Tikhonov, green = LASSO, orange = TV).

Right panel: Reconstruction error for each method vs. $\lambda$.

Observe: Tikhonov blurs all features equally. LASSO recovers the sparse impulses sharply but distorts the piecewise-constant regions (extended regions are not pixel-sparse). TV recovers the step edges but may produce staircase artefacts on smooth sections.


Common Mistake: LASSO Is Wrong for Extended Targets

Mistake:

Applying LASSO (or $\ell_1$ regularization) to an RF imaging problem where the scene contains extended objects (walls, vehicles, terrain) rather than point scatterers.

Correction:

LASSO promotes solutions that are pixel-sparse: most pixels are zero, with a few large non-zero pixels. Extended objects violate this model — they have many non-zero pixels, and the $\ell_1$ penalty will try to collapse them onto a few concentrated pixels.

For extended objects:

  • Use TV regularization for piecewise-constant objects with well-defined boundaries.
  • Use group sparsity ($\ell_{2,1}$) if multi-frequency data is available and objects are spatially coherent across frequencies.
  • Use wavelet sparsity (analysis or synthesis) if objects are smooth in a wavelet basis.

LASSO

The Least Absolute Shrinkage and Selection Operator: the variational problem $\min_x \frac{1}{2}\|\mathcal{A}x - y\|^2 + \lambda\|x\|_1$. It promotes sparse solutions and corresponds to MAP estimation with a Laplace prior.

Related: Picard Condition

Total Variation

The total variation of a discrete image $x$ is the $\ell_1$ norm of its discrete gradient: $\mathrm{TV}(x) = \|\nabla x\|_{2,1}$ (isotropic) or $\|\nabla x\|_1$ (anisotropic). TV regularization promotes piecewise-constant images with preserved edges.

Related: LASSO

Why This Matters: Sparsity-Driven Imaging in Modern Radar Systems

The sparse recovery framework has had significant practical impact on radar and RF imaging:

  • ISAR (Inverse SAR): Ship and aircraft images in the range-Doppler domain are approximately sparse. LASSO-based ISAR achieves super-resolution by exploiting this sparsity, recovering targets from fewer pulses than traditional matched-filter methods require.

  • Compressed sensing radar: By designing waveforms whose sensing matrix $\mathcal{A}$ satisfies the RIP (randomised pulse coding, frequency hopping), one can reduce the number of required measurements from $O(n)$ to $O(s\log n)$ while preserving target recovery.

  • Through-wall imaging: Sparse recovery allows target detection behind walls even with very limited aperture, exploiting the sparsity of the interior scene.

The optimization algorithms for solving these LASSO and TV problems (ISTA, FISTA, ADMM, Chambolle–Pock) are developed in Telecom Ch 03. This section focuses on the modeling choices; their efficient solution is covered there.

See full treatment in The Matched Filter / Backpropagation Estimator

Key Takeaway

Variational regularization replaces the quadratic Tikhonov penalty with a task-specific functional $R(x)$, yielding the MAP estimate under the prior $p(x) \propto e^{-\lambda R(x)}$. The LASSO ($R = \|\cdot\|_1$) promotes sparse solutions and recovers point scatterers sharply under RIP conditions ($m \gtrsim s\log n$ measurements suffice). Total variation ($R = \mathrm{TV}(\cdot)$) promotes piecewise-constant images with preserved edges — the method of choice for extended targets in RF imaging. Group sparsity ($R = \|\cdot\|_{2,1}$) extends LASSO to multi-frequency imaging with joint support recovery. The optimization algorithms for these non-smooth problems are in Telecom Ch 03.