LASSO and Basis Pursuit for Imaging

LASSO and Basis Pursuit for RF Imaging

This section connects the LASSO and Basis Pursuit recovery programs to practical RF image reconstruction. The algorithm derivations (ISTA/FISTA, ADMM) live in RFI Ch 04 and Telecom Ch 03; here we focus on the imaging-specific aspects: setting up the optimization for the RF sensing matrix A\mathbf{A}, choosing the regularization parameter λ\lambda, handling complex-valued reflectivity, and debiasing the solution on the estimated support.

Definition:

Basis Pursuit for Noiseless RF Imaging

When the noise level is negligible (high SNR\text{SNR}), solve the Basis Pursuit problem:

c^BP=argmincc1s.t.Ac=y.\hat{\mathbf{c}}_{\text{BP}} = \arg\min_{\mathbf{c}} \|\mathbf{c}\|_1 \quad \text{s.t.} \quad \mathbf{A}\mathbf{c} = \mathbf{y}.

This is a linear program (for real signals) or a second-order cone program (SOCP, for complex signals) solvable by interior-point methods in O(N3)O(N^3).

When to use BP:

  • Known exact forward model (no model mismatch).
  • Very high SNR\text{SNR} (>40> 40 dB).
  • Moderate problem size (N104N \lesssim 10^4).

For large-scale imaging, BP is too expensive; the LASSO solved by FISTA or ADMM is preferred.

Definition:

LASSO and BPDN for Noisy RF Imaging

BPDN (constrained form):

c^=argmincc1s.t.Acy2ϵ,\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \|\mathbf{c}\|_1 \quad \text{s.t.} \quad \|\mathbf{A}\mathbf{c} - \mathbf{y}\|_2 \leq \epsilon,

where ϵ\epsilon is the noise budget (ϵMσ\epsilon \approx \sqrt{M}\sigma).

LASSO (penalized form):

c^=argminc12Acy22+λc1.\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{y}\|_2^2 + \lambda\|\mathbf{c}\|_1.

Equivalence: For every ϵ>0\epsilon > 0, there exists a λ>0\lambda > 0 such that the BPDN and LASSO solutions coincide (and vice versa). The mapping ϵλ\epsilon \leftrightarrow \lambda depends on the data.

For RF imaging: The LASSO is almost always preferred because:

  • Efficient first-order solvers (FISTA, ADMM) are available.
  • λ\lambda is easier to tune than ϵ\epsilon.
  • Same theoretical guarantees (§The MIMO Sensing Matrix).

Theorem: Complex Soft-Thresholding for RF Imaging

RF measurements are complex-valued. The proximal operator of λ1\lambda\|\cdot\|_1 for complex vectors is the complex soft-thresholding:

Sλ(z)={z(1λ/z)z>λ,0zλ.\mathcal{S}_{\lambda}(z) = \begin{cases} z\,(1 - \lambda/|z|) & |z| > \lambda, \\ 0 & |z| \leq \lambda. \end{cases}

This shrinks the magnitude and preserves the phase.

Phase preservation is essential for coherent RF imaging where phase carries spatial information. Unlike real soft-thresholding (which simply shifts toward zero), complex soft-thresholding shrinks the complex magnitude while keeping the argument unchanged.

Definition:

Regularization Parameter Selection for RF Imaging

The choice of λ\lambda critically affects reconstruction quality. Three principled selection strategies:

1. Discrepancy principle (known σ\sigma): Choose λ\lambda such that Ac^y2=Mσ\|\mathbf{A}\hat{\mathbf{c}} - \mathbf{y}\|_2 = \sqrt{M}\sigma. This matches the residual to the expected noise level.

2. Cross-validation (unknown σ\sigma): Hold out a random subset of measurements I\mathcal{I}, solve on Ic\mathcal{I}^c, evaluate the prediction error on I\mathcal{I}. Repeat for K=5K = 5 folds and select λ\lambda minimizing the average prediction error.

3. SURE (Stein's Unbiased Risk Estimate): For the LASSO with Gaussian noise, SURE(λ)=Ac^y22Mσ2+2σ2df(λ)\text{SURE}(\lambda) = \|\mathbf{A}\hat{\mathbf{c}} - \mathbf{y}\|_2^2 - M\sigma^2 + 2\sigma^2\,\text{df}(\lambda), where df(λ)=supp(c^)\text{df}(\lambda) = |\text{supp}(\hat{\mathbf{c}})| is the number of nonzero entries.

For radar imaging: The discrepancy principle is most reliable because the noise level σ\sigma is typically known from the receiver noise figure.

,

Definition:

Debiasing on the Estimated Support

The LASSO solution c^LASSO\hat{\mathbf{c}}_{\text{LASSO}} is biased: the 1\ell_1 penalty shrinks all nonzero amplitudes toward zero.

Debiasing procedure:

  1. Estimate the support S^={n:c^n>ϵthr}\hat{S} = \{n : |\hat{c}_n| > \epsilon_{\text{thr}}\}.
  2. Solve the least-squares problem restricted to S^\hat{S}:

c^debias=argminc:supp(c)S^Acy22.\hat{\mathbf{c}}_{\text{debias}} = \arg\min_{\mathbf{c}:\,\text{supp}(\mathbf{c}) \subseteq \hat{S}} \|\mathbf{A}\mathbf{c} - \mathbf{y}\|_2^2.

  1. This is equivalent to c^S^=AS^y\hat{\mathbf{c}}_{\hat{S}} = \mathbf{A}_{\hat{S}}^\dagger\mathbf{y} where AS^\mathbf{A}_{\hat{S}} contains only the columns indexed by S^\hat{S}.

When to debias: Always for amplitude-critical applications (radar cross-section estimation, quantitative imaging). Not needed for pure detection (support recovery).

LASSO Reconstruction of Point Scatterers

Demonstrates LASSO reconstruction for an RF imaging scene with point scatterers. Vary λ\lambda to observe the bias-variance tradeoff: too large kills weak scatterers (over-regularization); too small retains noise (under-regularization).

Left: True scene and LASSO reconstruction. Right: Support recovery and amplitude error vs. λ\lambda.

Parameters
0.05
20
8

Example: LASSO Reconstruction for Compressed SAR

Setup: Stripmap SAR with 50% random frequency subsampling. N=128×128=16384N = 128 \times 128 = 16384 pixels, M=8192M = 8192 measurements. Scene: 25 point scatterers + 5 extended targets (buildings).

Results:

Method NMSE (dB) SSIM Time (s)
Matched filter 6.3-6.3 0.42 0.01
LASSO (FISTA, 100 iter) 18.7-18.7 0.89 2.1
LASSO (ADMM, 40 iter) 19.1-19.1 0.90 2.5
BP (SOCP) 19.3-19.3 0.91 180

FISTA and ADMM achieve nearly the same quality as BP at \sim100×\times lower computational cost. The matched filter provides a quick preview but with significant sidelobe artifacts.

FISTA vs. ADMM vs. Coordinate Descent for Imaging

The imaging-specific sensing matrix A\mathbf{A} has structure (Kronecker, partial DFT) that makes some solvers more natural:

Solver Cost/iter Best when
FISTA 2×2 \times matvec A\mathbf{A} has fast matvec (Kronecker/FFT)
ADMM Linear system + prox Composite penalties (1\ell_1 + TV)
Coordinate descent O(M)O(M) per coordinate A\mathbf{A} stored explicitly, moderate NN

For RF imaging with Kronecker A\mathbf{A}, FISTA is typically fastest for the pure LASSO. ADMM dominates for composite penalties because FISTA would require an inner proximal loop. Coordinate descent is rarely used because the matvec with A\mathbf{A} does not decompose coordinate-wise.

Common Mistake: λ\lambda Selection Pitfalls in RF Imaging

Mistake:

Setting λ\lambda by visual inspection or by using the universal threshold λ=σ2logN\lambda = \sigma\sqrt{2\log N} without accounting for the sensing matrix structure. The universal threshold assumes an orthonormal A\mathbf{A}, but the RF sensing matrix has non-uniform column norms and high coherence, so the effective threshold is different.

Correction:

Use the discrepancy principle when σ\sigma is known (standard in radar). If σ\sigma is unknown, use 5-fold cross-validation. Always validate the chosen λ\lambda by inspecting the residual image Ac^y\mathbf{A}\hat{\mathbf{c}} - \mathbf{y} for systematic structure (which indicates under-regularization).

Choosing the Sparsifying Basis for RF Imaging

The LASSO assumes sparsity in some domain. For RF imaging:

Basis Ψ\boldsymbol{\Psi} Sparse for LASSO becomes
Identity Point scatterers min12Acy2+λc1\min\frac{1}{2}\|\mathbf{A}\mathbf{c}-\mathbf{y}\|^2 + \lambda\|\mathbf{c}\|_1
Wavelet Natural scenes with edges min12AΨθy2+λθ1\min\frac{1}{2}\|\mathbf{A}\boldsymbol{\Psi}\boldsymbol{\theta}-\mathbf{y}\|^2 + \lambda\|\boldsymbol{\theta}\|_1
DCT Smooth scenes Same form with DCT Ψ\boldsymbol{\Psi}
Learned dictionary Complex scenes Requires dictionary learning

Practical advice:

  • Start with the identity (canonical sparsity) for scenes dominated by point scatterers.
  • Use wavelets for scenes with extended targets.
  • Use TV (analysis sparsity, §Total Variation Reconstruction) for piecewise-constant scenes.
  • For mixed scenes, combine 1\ell_1 + TV via ADMM.

Quick Check

After solving the LASSO for an RF imaging problem, you obtain c^LASSO\hat{\mathbf{c}}_{\text{LASSO}}. Why might the estimated scatterer amplitudes be systematically too small?

The 1\ell_1 penalty shrinks all nonzero entries toward zero (shrinkage bias).

The sensing matrix A\mathbf{A} has too few rows.

The noise level is too high.

The grid resolution is too coarse.

Historical Note: From Statistics to Radar Imaging

1996--2006

The LASSO was introduced by Robert Tibshirani in 1996 as a statistical regularization method for linear regression. Its adoption in signal processing and imaging was catalyzed by the compressed sensing revolution (Candes, Romberg, Tao, 2006; Donoho, 2006), which provided the theoretical guarantees showing that 1\ell_1 minimization could exactly recover sparse signals from incomplete measurements. Cetin and Karl (2001) were among the first to apply sparsity-promoting regularization to SAR imaging, demonstrating dramatic improvements over matched filter reconstruction.

,

LASSO

Least Absolute Shrinkage and Selection Operator: the penalized least-squares problem min12Acy22+λc1\min \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{y}\|_2^2 + \lambda\|\mathbf{c}\|_1. Promotes sparsity through the 1\ell_1 penalty.

Related: Basis Pursuit, FISTA

Basis Pursuit

The 1\ell_1 minimization problem under equality constraints: minc1\min \|\mathbf{c}\|_1 s.t. Ac=y\mathbf{A}\mathbf{c} = \mathbf{y}. Equivalent to the LASSO with λ0\lambda \to 0 in the noiseless case.

Related: LASSO

FISTA

Fast Iterative Shrinkage-Thresholding Algorithm: Nesterov- accelerated proximal gradient for the LASSO, achieving O(1/t2)O(1/t^2) convergence vs. O(1/t)O(1/t) for ISTA.

Related: LASSO

Debiasing

Post-processing step that removes the 1\ell_1 shrinkage bias by solving least-squares restricted to the estimated support.

Related: LASSO

Discrepancy Principle

Regularization parameter selection rule that chooses λ\lambda so the residual norm matches the expected noise level: Ac^y2Mσ\|\mathbf{A}\hat{\mathbf{c}} - \mathbf{y}\|_2 \approx \sqrt{M}\sigma.

Key Takeaway

Basis Pursuit is the gold standard for noiseless recovery but too expensive (O(N3)O(N^3)) for large-scale imaging. The LASSO solved by FISTA or ADMM achieves near-BP quality at orders-of-magnitude lower cost. Complex soft-thresholding preserves phase, essential for coherent RF imaging. The discrepancy principle is the recommended strategy for choosing λ\lambda in radar imaging where the noise level is known. Always debias on the detected support when amplitude accuracy matters.

Why This Matters: LASSO in Wireless Channel Estimation

The same LASSO formulation used for RF image reconstruction appears in mmWave and sub-THz channel estimation (Telecom Ch 35), where the channel is sparse in the angle-delay domain. The sensing matrix is a partial DFT (pilot tones), and the LASSO recovers the channel taps. The debiasing step is equally important there to obtain accurate channel gain estimates for beamforming.

See full treatment in Chapter 35