Variational Regularization and Sparsity

Beyond Quadratic Penalties — Sparsity and Edges

Tikhonov regularization uses the quadratic penalty $\|x\|^2$, which encodes a Gaussian prior: the reconstructed image should have small $\ell_2$ norm. This is sensible for smooth images, but many RF imaging scenes are not smooth:

  • Radar scenes consist of point scatterers, extended edges, or piecewise-constant regions — the wrong prior for $\ell_2$.
  • The scattering coefficients in a sparse scene are zero everywhere except at target locations — a sparse signal, not a small-norm one.

The variational regularization framework replaces $\|x\|^2$ with an arbitrary convex penalty $R(x)$ chosen to encode problem-specific prior knowledge. Choosing $R(x) = \|x\|_1$ promotes sparsity; choosing $R(x) = \mathrm{TV}(x)$ promotes piecewise-constant images with sharp edges. Both have a Bayesian interpretation: $R(x) = \|x\|_1$ corresponds to a Laplace prior; $R(x) = \mathrm{TV}(x)$ to a total-variation prior.

Definition:

Variational Regularization

Given a forward operator $\mathcal{A}$, noisy data $y^\delta$, and a convex penalty $R \in \Gamma_0(\mathcal{X})$, the variational regularization problem is

$$\hat{x} = \arg\min_{x \in \mathcal{X}} \left\{\frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\, R(x)\right\},$$

where $\lambda > 0$ is the regularization parameter.

More generally, the data fidelity term can be replaced by any convex loss $D(\mathcal{A}x, y^\delta)$ (e.g., the Kullback–Leibler divergence for Poisson noise, or the $\ell_1$ fidelity for impulsive noise).

The choice of $R$ encodes prior knowledge:

  • $R(x) = \frac{1}{2}\|x\|^2$: Tikhonov (smooth, Gaussian prior).
  • $R(x) = \|x\|_1$: Sparsity in the canonical basis (Laplace prior).
  • $R(x) = \|\Phi x\|_1$: Sparsity in a transform domain $\Phi$ (e.g., wavelets).
  • $R(x) = \mathrm{TV}(x)$: Piecewise constancy (edge-preserving).
  • $R(x) = \iota_C(x)$: Hard constraint $x \in C$ (support, non-negativity).
  • $R(x) = \|x\|_{2,1}$: Group sparsity (multi-frequency imaging).
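
To make the list concrete, here is a minimal Python sketch (toy signals invented for illustration) evaluating three of these penalties on a spiky signal and a step signal. Note how TV barely charges the step, while the $\ell_2$ and $\ell_1$ penalties do not distinguish the two structures:

```python
import numpy as np

# Toy signals, made up for illustration.
x_sparse = np.zeros(100)
x_sparse[[10, 40, 75]] = [3.0, -2.0, 1.5]                  # point scatterers
x_step = np.concatenate([np.zeros(50), np.ones(50)])       # one sharp edge

def tikhonov(x):  return 0.5 * np.sum(x ** 2)              # R(x) = ||x||^2 / 2
def lasso_pen(x): return np.sum(np.abs(x))                 # R(x) = ||x||_1
def tv_1d(x):     return np.sum(np.abs(np.diff(x)))        # 1-D anisotropic TV

for name, x in [("sparse", x_sparse), ("step", x_step)]:
    print(f"{name:>6}: l2 = {tikhonov(x):6.2f}  l1 = {lasso_pen(x):6.2f}  TV = {tv_1d(x):6.2f}")
```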

Theorem: Existence of Minimizers for Variational Regularization

Let $\mathcal{A} \colon \mathbb{R}^n \to \mathbb{R}^m$ be linear and let $R \in \Gamma_0(\mathbb{R}^n)$ be coercive (i.e., $R(x) \to +\infty$ as $\|x\| \to \infty$). Then the functional

$$J(x) = \frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\, R(x)$$

attains its minimum. If either $\mathcal{A}$ is injective or $R$ is strictly convex, the minimizer is unique.

Definition:

LASSO — $\ell_1$ Regularization

The LASSO (Least Absolute Shrinkage and Selection Operator) is the variational problem

$$\hat{x} = \arg\min_{x \in \mathbb{R}^n} \left\{\frac{1}{2}\|\mathcal{A}x - y^\delta\|_2^2 + \lambda\|x\|_1\right\}.$$

The $\ell_1$ ball $\{x : \|x\|_1 \leq \eta\}$ has corners on the coordinate axes, so the constraint set pushes the solution toward the axes — i.e., toward sparse solutions.

Bayesian interpretation: The LASSO solution is the MAP estimate under independent Laplace priors: $p(x) \propto \exp(-\lambda\|x\|_1/\sigma_n^2)$.

The LASSO was introduced by Tibshirani (1996) in statistics and independently as Basis Pursuit Denoising by Chen, Donoho, and Saunders (1998) in signal processing. The LASSO has no closed-form solution in general and requires iterative algorithms (ISTA/FISTA/ADMM — see Telecom Ch 03), but a proximal step (soft thresholding) efficiently handles the $\ell_1$ term.
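
A minimal sketch of that proximal iteration (ISTA), assuming a dense real NumPy matrix, a fixed step size $1/L$, and an iteration count chosen for illustration:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1: componentwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||A x - y||_2^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient: ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)       # gradient of the quadratic fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

FISTA adds Nesterov momentum to the same iteration; ADMM splits the problem differently. Both are developed in Telecom Ch 03.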


Theorem: Sparsity of LASSO Solutions — Optimality Conditions

Let $\hat{x}$ be a LASSO solution with $\lambda > 0$, and let $r = \mathcal{A}\hat{x} - y^\delta$ be the residual. The optimality (KKT) conditions are:

$$[\mathcal{A}^* r]_i + \lambda s_i = 0, \qquad s_i \in \partial|\hat{x}_i|,$$

i.e., $s_i = \mathrm{sign}(\hat{x}_i)$ if $\hat{x}_i \neq 0$ and $s_i \in [-1,1]$ if $\hat{x}_i = 0$.

Therefore:

  • If $|[\mathcal{A}^* r]_i| < \lambda$, then $\hat{x}_i = 0$.
  • There is always a LASSO solution with $|\mathrm{supp}(\hat{x})| \leq m$ (at most as many non-zeros as measurements).
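
These conditions are easy to verify numerically. A quick check, reusing the `ista` sketch above (problem sizes, seed, and `lam` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[[5, 50, 90]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(30)

lam = 0.05
x_hat = ista(A, y, lam, n_iter=5000)
corr = A.T @ (A @ x_hat - y)                  # [A* r]_i for every i
on = np.abs(x_hat) > 1e-8                     # support of the computed solution

# On the support, [A* r]_i + lam*sign(x_i) should vanish (up to solver accuracy);
# off the support, |[A* r]_i| must not exceed lam.
print(np.abs(corr[on] + lam * np.sign(x_hat[on])).max())
print(np.abs(corr[~on]).max(), "<=", lam)
```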

Theorem: Exact Recovery via 1\ell_1 Minimization

Let $x^\dagger \in \mathbb{R}^n$ be $s$-sparse with support $S = \mathrm{supp}(x^\dagger)$. If $\mathcal{A}$ satisfies the Restricted Isometry Property (RIP) of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1$, then:

(a) (Noiseless) Basis pursuit recovers $x^\dagger$ exactly: $\hat{x}_{\mathrm{BP}} = x^\dagger$.

(b) (Noisy) If the noise level satisfies $\|y^\delta - \mathcal{A}x^\dagger\| \leq \delta$, the BPDN solution satisfies

$$\|\hat{x}_{\mathrm{BPDN}} - x^\dagger\|_2 \leq C\,\delta,$$

where $C$ depends only on $\delta_{2s}$.

The RIP says $\mathcal{A}$ acts approximately as an isometry on all $2s$-sparse vectors:

$$(1 - \delta_{2s})\|x\|^2 \leq \|\mathcal{A}x\|^2 \leq (1 + \delta_{2s})\|x\|^2.$$

This prevents any two distinct $s$-sparse vectors from producing the same measurements. Random matrices (Gaussian, Bernoulli, partial Fourier) satisfy the RIP with high probability when $m \gtrsim s \log(n/s)$ — far fewer measurements than the full $n$. The compressed sensing guarantee is covered in depth in FSI Ch 13.
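
A small noiseless recovery experiment illustrates the scaling, again reusing the `ista` sketch above with a tiny `lam` as a stand-in for basis pursuit (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n, s = 400, 8
m = int(4 * s * np.log(n / s))                # ~ s log(n/s) scaling, constant 4 ad hoc
A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian: RIP with high probability

x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true                                # noiseless data

x_hat = ista(A, y, lam=1e-3, n_iter=3000)     # small lam approximates basis pursuit
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```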


Definition:

Total Variation Regularization

For a discrete image $x \in \mathbb{R}^{n_1 \times n_2}$, the discrete total variation is:

Isotropic TV:
$$\mathrm{TV}_{\mathrm{iso}}(x) = \sum_{i,j} \sqrt{(\nabla_1 x)_{i,j}^2 + (\nabla_2 x)_{i,j}^2},$$

where $(\nabla_1 x)_{i,j} = x_{i+1,j} - x_{i,j}$ and $(\nabla_2 x)_{i,j} = x_{i,j+1} - x_{i,j}$.

Anisotropic TV:
$$\mathrm{TV}_{\mathrm{aniso}}(x) = \sum_{i,j} \bigl(|(\nabla_1 x)_{i,j}| + |(\nabla_2 x)_{i,j}|\bigr).$$

The TV-regularized reconstruction is then

$$\hat{x} = \arg\min_x \;\frac{1}{2}\|\mathcal{A}x - y^\delta\|^2 + \lambda\,\mathrm{TV}(x).$$

TV promotes piecewise constant images with sharp edges: it allows large gradients at a few locations while penalising gradients everywhere else. This contrasts with Tikhonov, which penalises gradients everywhere uniformly (leading to blurred edges) and with LASSO, which promotes sparsity in the pixel domain (wrong for extended features).

The ROF denoising model (Rudin–Osher–Fatemi, 1992) is the special case $\mathcal{A} = I$.
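
A direct translation of these definitions into NumPy (boundary handling, here forward differences with no wrap-around, is a convention chosen for this sketch):

```python
import numpy as np

def total_variation(x, isotropic=True):
    """Discrete TV of a 2-D image using forward differences."""
    d1 = np.diff(x, axis=0)    # (nabla_1 x)_{i,j} = x[i+1, j] - x[i, j]
    d2 = np.diff(x, axis=1)    # (nabla_2 x)_{i,j} = x[i, j+1] - x[i, j]
    if isotropic:
        # combine the two differences only where both are defined
        return np.sum(np.sqrt(d1[:, :-1] ** 2 + d2[:-1, :] ** 2))
    return np.sum(np.abs(d1)) + np.sum(np.abs(d2))

# A piecewise-constant image has small TV no matter how large its jumps are few:
img = np.zeros((64, 64))
img[16:48, 16:48] = 5.0                       # one bright square
print(total_variation(img), total_variation(img, isotropic=False))
```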


Definition:

Group Sparsity for Multi-Frequency Imaging

In multi-frequency RF imaging, measurements are taken at $Q$ different frequencies. The measurement model at each frequency $q$ is $y_q = \mathcal{A}_q x_q + \eta_q$, where $x_q$ is the complex reflectivity map at frequency $q$.

If the scatterer positions are the same at all frequencies (a sparse scene), the group-sparse (joint sparsity) model imposes

$$R(x) = \|x\|_{2,1} = \sum_{i=1}^n \|(x_1(i), \ldots, x_Q(i))\|_2,$$

where $x_q(i)$ is the $i$-th pixel of the $q$-th frequency channel. The $\ell_{2,1}$ norm promotes solutions where entire groups of components (same pixel across all frequencies) are simultaneously zero or non-zero — joint support recovery.

Group sparsity generalises LASSO ($\ell_1$) to the multi-measurement case. The proximal operator of the $\ell_{2,1}$ norm is group soft thresholding. When the group sizes are 1, group sparsity reduces to standard LASSO.
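
A sketch of that group soft-thresholding prox, assuming the variables are stored as an $n \times Q$ array whose rows are the groups:

```python
import numpy as np

def group_soft_threshold(X, tau):
    """Prox of tau*||.||_{2,1} for X of shape (n, Q); row i collects pixel i
    across all Q frequency channels."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)           # per-group l2 norms
    shrink = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * shrink                # whole rows survive or vanish together
```

With $Q = 1$ this reduces to componentwise soft thresholding, matching the scalar LASSO prox above.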


Example: LASSO for Sparse Radar Imaging

A radar system illuminates a scene containing $s$ point scatterers at unknown positions. The measurement model is $y = \mathcal{A}g + \epsilon$, where $\mathcal{A}$ is the discretized radar forward operator and $g$ is the reflectivity image. Formulate the reconstruction as a LASSO problem and discuss the choice of $\lambda$.
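
One possible concrete formulation, under a hypothetical setup assumed for illustration (stepped-frequency range profiling, so each measurement is a Fourier sample of a 1-D reflectivity profile; this is not the chapter's specific radar geometry), solved by ISTA with the complex soft-thresholding prox:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 256, 64, 4                           # grid size, measurements, scatterers
rows = rng.choice(n, size=m, replace=False)    # randomly selected frequency bins
A = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n) / np.sqrt(m)

g_true = np.zeros(n, dtype=complex)
g_true[rng.choice(n, size=s, replace=False)] = (
    rng.standard_normal(s) + 1j * rng.standard_normal(s))
y = A @ g_true + 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

def complex_soft_threshold(v, tau):
    """Prox of tau*||.||_1 for complex v: shrink magnitudes, keep phases."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

# ISTA with the complex prox. lam is set ad hoc here; in practice it is tuned
# to the noise level, e.g. via the discrepancy principle or cross-validation.
lam = 0.02
L = np.linalg.norm(A, 2) ** 2
g_hat = np.zeros(n, dtype=complex)
for _ in range(2000):
    g_hat = complex_soft_threshold(g_hat - A.conj().T @ (A @ g_hat - y) / L, lam / L)

print("true support:     ", sorted(np.flatnonzero(np.abs(g_true) > 0)))
print("recovered support:", sorted(np.flatnonzero(np.abs(g_hat) > 0.1)))
```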


Example: MAP Interpretation of Variational Regularization

Show that the variational problem $\min_x \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \lambda R(x)$ is the MAP estimate under the prior $p(x) \propto e^{-\lambda R(x)}$ and Gaussian likelihood $y|x \sim \mathcal{N}(\mathcal{A}x, \sigma_n^2 I)$. Identify the prior for the cases $R(x) = \frac{1}{2}\|x\|^2$ and $R(x) = \|x\|_1$.
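
One way to carry out the verification (a sketch of the standard argument):

```latex
\hat{x}_{\mathrm{MAP}}
  = \arg\max_x \, p(x \mid y)
  = \arg\min_x \, \bigl[ -\log p(y \mid x) - \log p(x) \bigr]
  = \arg\min_x \, \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \lambda R(x),
```

using $-\log p(y \mid x) = \frac{1}{2\sigma_n^2}\|\mathcal{A}x - y\|^2 + \mathrm{const}$ from the Gaussian likelihood and $-\log p(x) = \lambda R(x) + \mathrm{const}$ from the prior. For $R(x) = \frac{1}{2}\|x\|^2$ the prior is i.i.d. Gaussian, $p(x) \propto e^{-\lambda\|x\|^2/2}$; for $R(x) = \|x\|_1$ it is i.i.d. Laplace, $p(x) \propto e^{-\lambda\|x\|_1}$.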


Variational Regularization: Tikhonov vs. LASSO vs. TV

Compares three regularization penalties for 1D signal reconstruction. The true signal alternates between sparse (point impulses) and piecewise-constant (step) sections to highlight the strengths and weaknesses of each method.

Left panel: True signal (blue), noisy data (gray), and three reconstructions (red = Tikhonov, green = LASSO, orange = TV).

Right panel: Reconstruction error for each method vs. $\lambda$.

Observe: Tikhonov blurs all features equally. LASSO recovers the sparse impulses sharply but distorts the piecewise-constant regions (extended regions are not pixel-sparse). TV recovers the step edges but may produce staircase artefacts on smooth sections.


Common Mistake: LASSO Is Wrong for Extended Targets

Mistake:

Applying LASSO (or $\ell_1$ regularization) to an RF imaging problem where the scene contains extended objects (walls, vehicles, terrain) rather than point scatterers.

Correction:

LASSO promotes solutions that are pixel-sparse: most pixels are zero, with a few large non-zero pixels. Extended objects violate this model — they have many non-zero pixels, and the $\ell_1$ penalty will try to collapse them onto a few concentrated pixels.

For extended objects:

  • Use TV regularization for piecewise-constant objects with well-defined boundaries.
  • Use group sparsity ($\ell_{2,1}$) if multi-frequency data is available and objects are spatially coherent across frequencies.
  • Use wavelet sparsity (analysis or synthesis) if objects are smooth in a wavelet basis.

LASSO

The Least Absolute Shrinkage and Selection Operator: the variational problem $\min_x \frac{1}{2}\|\mathcal{A}x - y\|^2 + \lambda\|x\|_1$. It promotes sparse solutions and corresponds to MAP estimation with a Laplace prior.

Related: Picard Condition

Total Variation

The total variation of a discrete image $x$ is the $\ell_1$ norm of its discrete gradient: $\mathrm{TV}(x) = \|\nabla x\|_{2,1}$ (isotropic) or $\|\nabla x\|_1$ (anisotropic). TV regularization promotes piecewise-constant images with preserved edges.

Related: LASSO

Why This Matters: Sparsity-Driven Imaging in Modern Radar Systems

The sparse recovery framework has had significant practical impact on radar and RF imaging:

  • ISAR (Inverse SAR): Ship and aircraft images in the range-Doppler domain are approximately sparse. LASSO-based ISAR achieves super-resolution by exploiting this sparsity, recovering targets from fewer pulses than traditional matched-filter methods require.

  • Compressed sensing radar: By designing waveforms whose sensing matrix $\mathcal{A}$ satisfies the RIP (randomised pulse coding, frequency hopping), one can reduce the number of required measurements from $O(n)$ to $O(s\log n)$ while preserving target recovery.

  • Through-wall imaging: Sparse recovery allows target detection behind walls even with very limited aperture, exploiting the sparsity of the interior scene.

The optimization algorithms for solving these LASSO and TV problems (ISTA, FISTA, ADMM, Chambolle–Pock) are developed in Telecom Ch 03. This section focuses on the modeling choices; their efficient solution is covered there.

See full treatment in The Matched Filter / Backpropagation Estimator

Key Takeaway

Variational regularization replaces the quadratic Tikhonov penalty with a task-specific functional $R(x)$, yielding the MAP estimate under the prior $p(x) \propto e^{-\lambda R(x)}$. The LASSO ($R = \|\cdot\|_1$) promotes sparse solutions and recovers point scatterers sharply under RIP conditions ($m \gtrsim s\log n$ measurements suffice). Total variation ($R = \mathrm{TV}(\cdot)$) promotes piecewise-constant images with preserved edges — the method of choice for extended targets in RF imaging. Group sparsity ($R = \|\cdot\|_{2,1}$) extends LASSO to multi-frequency imaging with joint support recovery. The optimization algorithms for these non-smooth problems are in Telecom Ch 03.