Variational Regularization and Sparsity
Beyond Quadratic Penalties — Sparsity and Edges
Tikhonov regularization uses the quadratic penalty $\lambda \|x\|_2^2$, which encodes a Gaussian prior: the reconstructed image should have small $\ell_2$ norm. This is sensible for smooth images, but many RF imaging scenes are not smooth:
- Radar scenes consist of point scatterers, extended edges, or piecewise-constant regions — structures for which the Gaussian small-norm prior on $x$ is wrong.
- The scattering coefficients in a sparse scene are zero everywhere except at target locations — a sparse signal, not a small-norm one.
The variational regularization framework replaces $\|x\|_2^2$ with an arbitrary convex penalty $R(x)$ chosen to encode problem-specific prior knowledge. Choosing $R(x) = \|x\|_1$ promotes sparsity; choosing $R(x) = \mathrm{TV}(x)$ promotes piecewise-constant images with sharp edges. Both have a Bayesian interpretation: $\|x\|_1$ corresponds to a Laplace prior; $\mathrm{TV}(x)$ to a total-variation prior.
Definition: Variational Regularization
Given a forward operator $A \in \mathbb{R}^{m \times n}$, noisy data $y = Ax + \varepsilon$, and a convex penalty $R$, the variational regularization problem is
$$\hat{x} \in \arg\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x),$$
where $\lambda > 0$ is the regularization parameter.
More generally, the $\ell_2$ data fidelity term can be replaced by any convex loss (e.g., the Kullback–Leibler divergence for Poisson noise, or the $\ell_1$ fidelity $\|Ax - y\|_1$ for impulsive noise).
The choice of $R$ encodes prior knowledge (a numerical sketch follows the list):
- $R(x) = \|x\|_2^2$: Tikhonov (smooth, Gaussian prior).
- $R(x) = \|x\|_1$: Sparsity in the canonical basis (Laplace prior).
- $R(x) = \|Wx\|_1$: Sparsity in a transform domain (e.g., wavelets).
- $R(x) = \mathrm{TV}(x)$: Piecewise constancy (edge-preserving).
- $R(x) = \iota_C(x)$, the indicator function of a convex set $C$: Hard constraint (support, non-negativity).
- $R(x) = \sum_g \|x_g\|_2$ (mixed $\ell_{2,1}$ norm): Group sparsity (multi-frequency imaging).
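To make these choices concrete, here is a minimal NumPy sketch evaluating several of the penalties above; the sizes and the test scene are illustrative assumptions, not specifics from this section:

```python
import numpy as np

def tikhonov(x):                     # R(x) = ||x||_2^2
    return np.sum(x**2)

def l1(x):                           # R(x) = ||x||_1 (pixel-domain sparsity)
    return np.sum(np.abs(x))

def tv_aniso(x):                     # R(x) = TV(x), anisotropic variant
    dh = np.diff(x, axis=0)          # horizontal differences (D_h x)
    dv = np.diff(x, axis=1)          # vertical differences (D_v x)
    return np.sum(np.abs(dh)) + np.sum(np.abs(dv))

def group_l21(X):                    # R(X) = sum_i ||X[:, i]||_2 (mixed l_{2,1})
    return np.sum(np.linalg.norm(X, axis=0))

x = np.zeros((8, 8)); x[2:5, 3:6] = 1.0    # a small piecewise-constant "scene"
print(tikhonov(x), l1(x), tv_aniso(x))     # TV is small: few gradient jumps
print(group_l21(np.ones((4, 3))))          # 3 groups of size 4, each of norm 2
```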
Theorem: Existence of Minimizers for Variational Regularization
Let $A$ be linear and let the convex penalty $R$ be lower semicontinuous and coercive (i.e., $R(x) \to \infty$ as $\|x\|_2 \to \infty$). Then the functional
$$J(x) = \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$$
attains its minimum. If either $A$ is injective or $R$ is strictly convex, the minimizer is unique.
Coercivity of $J$
Since $R$ is coercive and the data fidelity term is non-negative, $J(x) \to \infty$ as $\|x\|_2 \to \infty$.
Lower semicontinuity
$J$ is the sum of a continuous function ($x \mapsto \frac{1}{2}\|Ax - y\|_2^2$) and a lower semicontinuous function ($\lambda R$), hence lower semicontinuous.
Existence and uniqueness
A coercive, lower semicontinuous function on $\mathbb{R}^n$ attains its infimum (by compactness of sublevel sets). If $A$ is injective, the data fidelity $x \mapsto \frac{1}{2}\|Ax - y\|_2^2$ is strictly convex, making $J$ strictly convex and the minimizer unique.
Definition: LASSO — $\ell_1$ Regularization
The LASSO (Least Absolute Shrinkage and Selection Operator) is the variational problem
$$\hat{x} \in \arg\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1.$$
The $\ell_1$ ball has corners on the coordinate axes, so the constraint set pushes the solution toward the axes — i.e., toward sparse solutions.
Bayesian interpretation: The LASSO solution is the MAP estimate under independent Laplace priors: $p(x) \propto \prod_{i=1}^{n} \exp(-\lambda |x_i| / \sigma^2)$.
The LASSO was introduced by Tibshirani (1996) in statistics and independently as Basis Pursuit Denoising by Chen, Donoho, and Saunders (1998) in signal processing. The LASSO has no closed-form solution in general and requires iterative algorithms (ISTA/FISTA/ADMM — see Telecom Ch 03), but a proximal step (soft thresholding) efficiently handles the $\ell_1$ term.
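A minimal sketch of that proximal step and of one ISTA iteration, assuming a generic dense matrix `A`; the function names and the Lipschitz argument `L` are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrinks each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_step(x, A, y, lam, L):
    """One ISTA iteration: a gradient step on the data fidelity, then the
    l1 prox. L is a Lipschitz constant of the gradient, e.g. ||A||_2^2."""
    grad = A.T @ (A @ x - y)
    return soft_threshold(x - grad / L, lam / L)
```

Because the prox of the $\ell_1$ norm is separable and closed-form, each iteration costs only two matrix-vector products.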
Theorem: Sparsity of LASSO Solutions — Optimality Conditions
Let $\hat{x}$ be a LASSO solution with $\lambda > 0$, and let $r = y - A\hat{x}$ be the residual. The optimality (KKT) conditions are:
$$A^T r = \lambda s \quad \text{for some } s \in \partial \|\hat{x}\|_1,$$
i.e., $(A^T r)_i = \lambda \operatorname{sign}(\hat{x}_i)$ if $\hat{x}_i \neq 0$ and $|(A^T r)_i| \leq \lambda$ if $\hat{x}_i = 0$.
Therefore:
- If $|(A^T r)_i| < \lambda$, then $\hat{x}_i = 0$.
- The support satisfies $|\operatorname{supp}(\hat{x})| \leq m$ (at most as many non-zeros as measurements).
Optimality condition
From the Fermat rule: $0 \in A^T(A\hat{x} - y) + \lambda \, \partial\|\hat{x}\|_1$. Component $i$: $(A^T r)_i = \lambda s_i$, where $s_i = \operatorname{sign}(\hat{x}_i)$ if $\hat{x}_i \neq 0$ and $s_i \in [-1, 1]$ if $\hat{x}_i = 0$.
Sparsity mechanism
If $\hat{x}_i \neq 0$, then we need $|(A^T r)_i| = \lambda$. Conversely, if $|(A^T r)_i| < \lambda$, we must have $\hat{x}_i = 0$. Only components whose correlation with the residual reaches the threshold $\lambda$ can be non-zero. This is the threshold effect of $\ell_1$ regularization.
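This threshold effect can be checked numerically. The sketch below builds a small synthetic LASSO problem (all sizes assumed), solves it by plain ISTA, and verifies the KKT conditions on and off the support:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 60
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[[5, 17, 40]] = [2.0, -1.5, 3.0]
y = A @ x_true + 0.01 * rng.standard_normal(m)
lam = 0.05

L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(20000):                   # plain ISTA, run to high accuracy
    v = x - A.T @ (A @ x - y) / L
    x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)

c = A.T @ (y - A @ x)                    # correlations with the residual
S = np.abs(x) > 1e-8                     # estimated support
print(np.abs(c[S]))                      # each entry ~ lam (active constraints)
print(np.max(np.abs(c[~S])))             # <= lam, up to numerical tolerance
print(np.count_nonzero(S) <= m)          # at most m non-zeros
```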
Theorem: Exact Recovery via $\ell_1$ Minimization
Let $x^\star$ be $s$-sparse with support $S$. If $A$ satisfies the Restricted Isometry Property (RIP) of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1$, then:
(a) (Noiseless) Basis pursuit recovers $x^\star$ exactly: $\arg\min_x \{ \|x\|_1 : Ax = Ax^\star \} = \{x^\star\}$.
(b) (Noisy) With noise level $\|\varepsilon\|_2 \leq \eta$, the BPDN solution $\hat{x} = \arg\min_x \{ \|x\|_1 : \|Ax - y\|_2 \leq \eta \}$ satisfies
$$\|\hat{x} - x^\star\|_2 \leq C \eta,$$
where $C$ depends only on $\delta_{2s}$.
The RIP says $A$ acts approximately as an isometry on all $s$-sparse vectors:
$$(1 - \delta_s)\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_s)\|x\|_2^2 \quad \text{for all } s\text{-sparse } x.$$
This prevents any two distinct $s$-sparse vectors from producing the same measurements. Random matrices (Gaussian, Bernoulli, partial Fourier) satisfy the RIP with high probability when $m \gtrsim s \log(n/s)$ — far fewer measurements than the full $n$. The compressed sensing guarantee is covered in depth in FSI Ch 13.
Invoke the compressed sensing guarantee
The proof uses the RIP to show that any feasible deviation $h = \hat{x} - x^\star$ must satisfy $\|h_{S^c}\|_1 \leq \|h_S\|_1$ (the off-support mass is bounded by the on-support mass).
Conclude
Combined with the null space property implied by the RIP, the constraint $Ah = 0$ forces $h = 0$. The noisy case follows from the same argument, with the residual bound $\|Ah\|_2 \leq 2\eta$ contributing the $C\eta$ error term.
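Verifying the RIP exactly is combinatorial, but a Monte Carlo experiment illustrates the near-isometry of a Gaussian matrix on randomly drawn $s$-sparse vectors. The sizes below are arbitrary illustrative choices; this samples the bound, it does not certify it:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, s = 80, 256, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaling makes E||Ax||^2 = ||x||^2

ratios = []
for _ in range(2000):                          # random s-sparse test vectors
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    ratios.append(np.sum((A @ x) ** 2) / np.sum(x ** 2))

# Empirically the ratios stay in a band [1 - delta, 1 + delta] around 1.
print(min(ratios), max(ratios))
```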
Definition: Total Variation Regularization
For a discrete image $x \in \mathbb{R}^{N_1 \times N_2}$, the discrete total variation is:
Isotropic TV:
$$\mathrm{TV}_{\mathrm{iso}}(x) = \sum_{i,j} \sqrt{(D_h x)_{i,j}^2 + (D_v x)_{i,j}^2},$$
where $(D_h x)_{i,j} = x_{i+1,j} - x_{i,j}$ and $(D_v x)_{i,j} = x_{i,j+1} - x_{i,j}$.
Anisotropic TV:
$$\mathrm{TV}_{\mathrm{aniso}}(x) = \sum_{i,j} \left( |(D_h x)_{i,j}| + |(D_v x)_{i,j}| \right).$$
The TV-regularized reconstruction is then
$$\hat{x} \in \arg\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda \, \mathrm{TV}(x).$$
TV promotes piecewise constant images with sharp edges: it allows large gradients at a few locations while penalising gradients everywhere else. This contrasts with Tikhonov, which penalises gradients everywhere uniformly (leading to blurred edges) and with LASSO, which promotes sparsity in the pixel domain (wrong for extended features).
The ROF denoising model (Rudin–Osher–Fatemi, 1992) is the special case $A = I$.
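A minimal sketch of the discrete TV, assuming forward differences with a replicated (Neumann) boundary — one common convention among several:

```python
import numpy as np

def tv(x, isotropic=True):
    """Discrete total variation of a 2D image, forward differences with a
    replicated last row/column so the gradient field matches the image size."""
    dh = np.diff(x, axis=0, append=x[-1:, :])  # (D_h x)_{i,j} = x_{i+1,j} - x_{i,j}
    dv = np.diff(x, axis=1, append=x[:, -1:])  # (D_v x)_{i,j} = x_{i,j+1} - x_{i,j}
    if isotropic:
        return np.sum(np.sqrt(dh**2 + dv**2))
    return np.sum(np.abs(dh)) + np.sum(np.abs(dv))

x = np.zeros((32, 32)); x[8:24, 8:24] = 5.0    # a sharp-edged square
print(tv(x), tv(x, isotropic=False))
```

Scaling the square's amplitude scales its TV linearly, whereas the Tikhonov penalty grows quadratically — this is why TV tolerates a few large jumps.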
Definition: Group Sparsity for Multi-Frequency Imaging
In multi-frequency RF imaging, measurements are taken at $K$ different frequencies. The measurement model at each frequency is $y_k = A_k x_k + \varepsilon_k$, $k = 1, \dots, K$, where $x_k \in \mathbb{C}^n$ is the complex reflectivity map at frequency $k$.
If the scatterer positions are the same at all frequencies (a sparse scene), the group-sparse (joint sparsity) model imposes
$$\min_{x_1, \dots, x_K} \sum_{k=1}^{K} \frac{1}{2}\|A_k x_k - y_k\|_2^2 + \lambda \sum_{i=1}^{n} \sqrt{\sum_{k=1}^{K} |x_{k,i}|^2},$$
where $x_{k,i}$ is the $i$-th pixel of the $k$-th frequency channel. The mixed $\ell_{2,1}$ norm promotes solutions where entire groups of components (same pixel across all frequencies) are simultaneously zero or non-zero — joint support recovery.
Group sparsity generalises LASSO ($K = 1$) to the multi-measurement case. The proximal operator of the $\ell_{2,1}$ norm is group soft thresholding (sketched below). When the group sizes are 1, group sparsity reduces to standard LASSO.
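A minimal sketch of group soft thresholding, assuming the $K$ frequency channels are stacked as rows of a $K \times n$ array so that each pixel's group is a column; the same scaling formula applies verbatim to complex reflectivities:

```python
import numpy as np

def group_soft_threshold(X, tau):
    """Prox of tau * sum_i ||X[:, i]||_2: shrinks each column's norm by tau,
    zeroing entire columns (pixel groups) whose norm falls below tau."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)   # per-pixel group norms
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

X = np.array([[0.1, 3.0, -0.2],
              [0.2, -4.0, 0.1]])                # K = 2 channels, n = 3 pixels
print(group_soft_threshold(X, tau=0.5))        # weak pixel groups zeroed jointly
```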
Example: LASSO for Sparse Radar Imaging
A radar system illuminates a scene containing point scatterers at unknown positions. The measurement model is $y = Ax + \varepsilon$, where $A$ is the discretized radar forward operator and $x$ is the reflectivity image. Formulate the reconstruction as a LASSO problem and discuss the choice of $\lambda$.
Formulation
The LASSO reconstruction is
$$\hat{x} \in \arg\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1.$$
This promotes a sparse reflectivity map — appropriate when the scene consists of isolated point scatterers.
Parameter selection
A principled choice is $\lambda = \sigma \sqrt{2 \log n}$, where $\sigma$ is the noise standard deviation and $n$ is the number of pixels. This threshold ensures that noise-only components are suppressed with high probability (the universal threshold of Donoho and Johnstone, 1994).
Comparison with Tikhonov and TV
Unlike Tikhonov regularization (Definition: Tikhonov Regularization), which spreads energy over all components, LASSO concentrates energy on a few components, giving sharper point-scatterer localisation at the cost of potential model mismatch when the scene is not truly sparse.
TV regularization (Definition: Total Variation Regularization) is preferred when the scene consists of extended objects with well-defined boundaries rather than isolated point scatterers.
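An end-to-end sketch of this example under stated assumptions: the "radar" operator below is a random Gaussian stand-in for a physical forward model, the scene and noise level are synthetic, and the solver is FISTA with the universal threshold:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma = 100, 400, 0.05
A = rng.standard_normal((m, n)) / np.sqrt(m)   # stand-in forward operator
x_true = np.zeros(n); x_true[[50, 180, 310]] = [1.0, -0.8, 1.3]
y = A @ x_true + sigma * rng.standard_normal(m)

lam = sigma * np.sqrt(2 * np.log(n))           # universal threshold
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant

x, z, t = np.zeros(n), np.zeros(n), 1.0
for _ in range(500):                           # FISTA (accelerated ISTA)
    v = z - A.T @ (A @ z - y) / L
    x_new = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2
    z = x_new + (t - 1) / t_new * (x_new - x)
    x, t = x_new, t_new

print(np.flatnonzero(np.abs(x) > 0.1))         # recovered support, ideally [50 180 310]
```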
Example: MAP Interpretation of Variational Regularization
Show that the variational problem $\hat{x} = \arg\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$ is the MAP estimate under the prior $p(x) \propto \exp(-\lambda R(x)/\sigma^2)$ and Gaussian likelihood $y \mid x \sim \mathcal{N}(Ax, \sigma^2 I)$. Identify the prior for the cases $R(x) = \|x\|_2^2$, $R(x) = \|x\|_1$, and $R(x) = \mathrm{TV}(x)$.
Posterior maximization
The log-posterior is (up to constants):
$$\log p(x \mid y) = -\frac{1}{2\sigma^2}\|Ax - y\|_2^2 - \frac{\lambda}{\sigma^2} R(x) + \text{const}.$$
Maximising $\log p(x \mid y)$ is equivalent to minimising $\frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$, which is the variational problem.
Identify priors
- $R(x) = \|x\|_2^2$: $p(x) \propto \exp(-\lambda \|x\|_2^2 / \sigma^2)$ — Gaussian prior. MAP = Tikhonov solution.
- $R(x) = \|x\|_1$: $p(x) \propto \exp(-\lambda \|x\|_1 / \sigma^2)$ — Laplace prior (product of independent Laplace distributions). MAP = LASSO solution.
- $R(x) = \mathrm{TV}(x)$: $p(x) \propto \exp(-\lambda \, \mathrm{TV}(x) / \sigma^2)$ — TV prior (Markov Random Field with Laplace clique potentials on adjacent pixel differences).
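A quick scalar sanity check of the Laplace-prior case: for $A = I$ and a single pixel, the MAP estimate should coincide with soft thresholding of the datum (the grid and values are purely illustrative):

```python
import numpy as np

y, lam = 1.3, 0.5
grid = np.linspace(-5, 5, 200001)              # brute-force MAP over a fine grid
x_map = grid[np.argmin(0.5 * (grid - y) ** 2 + lam * np.abs(grid))]
x_soft = np.sign(y) * max(abs(y) - lam, 0.0)   # closed-form soft threshold
print(x_map, x_soft)                           # both ~ 0.8
```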
Variational Regularization: Tikhonov vs. LASSO vs. TV
Compares three regularization penalties for 1D signal reconstruction. The true signal alternates between sparse (point impulses) and piecewise-constant (step) sections to highlight the strengths and weaknesses of each method.
Left panel: True signal (blue), noisy data (gray), and three reconstructions (red = Tikhonov, green = LASSO, orange = TV).
Right panel: Reconstruction error for each method vs. $\lambda$.
Observe: Tikhonov blurs all features equally. LASSO recovers the sparse impulses sharply but mishandles the piecewise-constant regions (pixel sparsity is the wrong model for an extended plateau). TV recovers the step edges but may produce staircase artefacts on smooth sections.
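A sketch in the spirit of this comparison for the denoising case $A = I$, where each method admits a simple solver: Tikhonov in closed form, LASSO by soft thresholding, and 1D TV by projected gradient on its dual (Chambolle-style). The signal layout and parameters are illustrative stand-ins for the figure's:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x_true = np.zeros(n)
x_true[[20, 35]] = [3.0, -2.0]                 # sparse section: point impulses
x_true[100:140] = 1.5                          # piecewise-constant section: a step
y = x_true + 0.3 * rng.standard_normal(n)
lam = 0.5

x_tik = y / (1 + 2 * lam)                      # Tikhonov denoising, closed form
x_l1 = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)  # LASSO denoising = soft threshold

def Dt(p):                                     # adjoint of the forward difference
    return np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

p = np.zeros(n - 1)                            # dual variable, confined to [-lam, lam]
for _ in range(2000):                          # projected gradient on the TV dual
    p = np.clip(p + 0.25 * np.diff(y - Dt(p)), -lam, lam)
x_tv = y - Dt(p)

for name, xh in [("Tikhonov", x_tik), ("LASSO", x_l1), ("TV", x_tv)]:
    print(name, np.linalg.norm(xh - x_true))
```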
Common Mistake: LASSO Is Wrong for Extended Targets
Mistake:
Applying LASSO (or regularization) to an RF imaging problem where the scene contains extended objects (walls, vehicles, terrain) rather than point scatterers.
Correction:
LASSO promotes solutions that are pixel-sparse: most pixels are zero, with a few large non-zero pixels. Extended objects violate this model — they have many non-zero pixels, and the $\ell_1$ penalty will try to collapse them onto a few concentrated pixels.
For extended objects:
- Use TV regularization for piecewise-constant objects with well-defined boundaries.
- Use group sparsity ($\ell_{2,1}$) if multi-frequency data is available and objects are spatially coherent across frequencies.
- Use wavelet sparsity ($R(x) = \|Wx\|_1$, analysis or synthesis form) if objects are compressible in a wavelet basis.
LASSO
The Least Absolute Shrinkage and Selection Operator: the variational problem $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1$. It promotes sparse solutions and corresponds to MAP estimation with a Laplace prior.
Related: Picard Condition
Total Variation
The total variation of a discrete image is the $\ell_1$ norm of its discrete gradient: $\sum_{i,j} \sqrt{(D_h x)_{i,j}^2 + (D_v x)_{i,j}^2}$ (isotropic) or $\sum_{i,j} \left( |(D_h x)_{i,j}| + |(D_v x)_{i,j}| \right)$ (anisotropic). TV regularization promotes piecewise-constant images with preserved edges.
Related: LASSO
Why This Matters: Sparsity-Driven Imaging in Modern Radar Systems
The sparse recovery framework has had significant practical impact on radar and RF imaging:
- ISAR (Inverse SAR): Ship and aircraft images in the range-Doppler domain are approximately sparse. LASSO-based ISAR achieves super-resolution by exploiting this sparsity, recovering targets from fewer pulses than traditional matched-filter methods require.
- Compressed sensing radar: By designing waveforms whose sensing matrix satisfies the RIP (randomised pulse coding, frequency hopping), one can reduce the number of required measurements from $n$ to $O(s \log(n/s))$ while preserving target recovery.
- Through-wall imaging: Sparse recovery allows target detection behind walls even with very limited aperture, exploiting the sparsity of the interior scene.
The optimization algorithms for solving these LASSO and TV problems (ISTA, FISTA, ADMM, Chambolle–Pock) are developed in Telecom Ch 03. This section focuses on the modeling choices; their efficient solution is covered there.
See the full treatment in The Matched Filter / Backpropagation Estimator.
Key Takeaway
Variational regularization replaces the quadratic Tikhonov penalty with a task-specific functional $R(x)$, yielding the MAP estimate under the prior $p(x) \propto \exp(-\lambda R(x)/\sigma^2)$. The LASSO ($R(x) = \|x\|_1$) promotes sparse solutions and recovers point scatterers sharply under RIP conditions ($m \gtrsim s \log(n/s)$ measurements suffice). Total variation ($R(x) = \mathrm{TV}(x)$) promotes piecewise-constant images with preserved edges — the method of choice for extended targets in RF imaging. Group sparsity ($\ell_{2,1}$) extends LASSO to multi-frequency imaging with joint support recovery. The optimization algorithms for these non-smooth problems are in Telecom Ch 03.