Recovery Guarantees

From RIP to Recovery

Having established that random sensing matrices satisfy RIP with $M = O(s \log(N/s))$ measurements, we now complete the story: RIP implies that $\ell_1$ minimization recovers $s$-sparse signals exactly in the noiseless regime, and stably in the noisy regime. The punchline is a deterministic, instance-independent guarantee: one checks RIP once, and every $s$-sparse signal is recovered from $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$. The same guarantee extends to compressible (not exactly sparse) vectors via the best $s$-term approximation error $\sigma_s(\mathbf{x}^\star)_1$, yielding a beautiful oracle inequality: the LASSO behaves as if an oracle had told us the true support in advance, up to logarithmic factors.

Theorem: Exact Recovery via $\ell_1$ Minimization (Noiseless)

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ satisfy the RIP of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1 \approx 0.414$. For every $s$-sparse $\mathbf{x}^\star$ and measurements $\mathbf{y} = \mathbf{A}\mathbf{x}^\star$, the Basis Pursuit solution $\hat{\mathbf{x}} = \arg\min \|\mathbf{x}\|_1 \ \text{s.t.}\ \mathbf{A}\mathbf{x} = \mathbf{y}$ is unique and equals $\mathbf{x}^\star$.

If BP returned some other vector $\hat{\mathbf{x}} \neq \mathbf{x}^\star$, the difference $\mathbf{h} = \hat{\mathbf{x}} - \mathbf{x}^\star$ would lie in $\ker(\mathbf{A})$. RIP bounds the $\ell_2$ norm of $\mathbf{h}$ restricted to any small support, while $\ell_1$ optimality of $\hat{\mathbf{x}}$ bounds the energy of $\mathbf{h}$ off the support of $\mathbf{x}^\star$. Combining these forces $\mathbf{h} = \mathbf{0}$.
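To see the off-support bound explicitly, here is the standard cone-condition step (a sketch, with $S = \operatorname{supp}(\mathbf{x}^\star)$; the full proof then applies RIP to $\mathbf{h}$ over $S$ and the largest blocks of $S^c$):

```latex
\begin{align*}
\|\mathbf{x}^\star\|_1 \;\ge\; \|\hat{\mathbf{x}}\|_1
  &= \|\mathbf{x}^\star_S + \mathbf{h}_S\|_1 + \|\mathbf{h}_{S^c}\|_1 \\
  &\ge \|\mathbf{x}^\star_S\|_1 - \|\mathbf{h}_S\|_1 + \|\mathbf{h}_{S^c}\|_1
  \quad\Longrightarrow\quad \|\mathbf{h}_{S^c}\|_1 \le \|\mathbf{h}_S\|_1 .
\end{align*}
```

With the error thus confined near $S$, applying the RIP inequality to $2s$-sparse restrictions of $\mathbf{h} \in \ker(\mathbf{A})$ yields $\|\mathbf{h}_S\|_2 \le \rho\,\|\mathbf{h}_S\|_2$ for some $\rho < 1$ whenever $\delta_{2s} < \sqrt{2} - 1$, which forces $\mathbf{h} = \mathbf{0}$.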


Theorem: Stable Recovery under Bounded Noise

Let $\mathbf{A}$ satisfy RIP with $\delta_{2s} < \sqrt{2} - 1$. For any $\mathbf{x}^\star \in \mathbb{R}^N$ (not necessarily sparse) and $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$ with $\|\mathbf{w}\|_2 \leq \eta$, the BPDN solution $\hat{\mathbf{x}}$ (minimizing $\|\mathbf{x}\|_1$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \leq \eta$) satisfies
$$\|\hat{\mathbf{x}} - \mathbf{x}^\star\|_2 \leq C_0 \eta + C_1 \frac{\sigma_s(\mathbf{x}^\star)_1}{\sqrt{s}},$$
where $\sigma_s(\mathbf{x}^\star)_1 = \min_{|\mathcal{T}| \leq s}\|\mathbf{x}^\star - \mathbf{x}^\star_\mathcal{T}\|_1$ is the best $s$-term approximation error in $\ell_1$, and $C_0, C_1$ depend only on $\delta_{2s}$.

Two error sources appear: noise, contributing $C_0 \eta$ (linear in the noise level), and model mismatch, contributing $C_1 \sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ (zero when $\mathbf{x}^\star$ is exactly $s$-sparse). The bound is deterministic: a single RIP check implies recovery guarantees for every signal.
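A minimal numerical illustration of the linear-in-$\eta$ behavior, sketched with cvxpy as the conic solver (the dimensions and seed below are illustrative choices, not part of the theorem):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M, N, s = 80, 256, 5

# Gaussian sensing matrix with (approximately) unit-norm columns.
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_star = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

for eta in [1e-3, 1e-2, 1e-1]:
    w = rng.standard_normal(M)
    w *= eta / np.linalg.norm(w)          # noise scaled so ||w||_2 = eta
    y = A @ x_star + w

    # BPDN: min ||x||_1  s.t.  ||Ax - y||_2 <= eta
    x = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm1(x)),
               [cp.norm2(A @ x - y) <= eta]).solve()
    print(f"eta = {eta:.0e}   error = {np.linalg.norm(x.value - x_star):.2e}")
```

Since $\mathbf{x}^\star$ here is exactly $s$-sparse, $\sigma_s(\mathbf{x}^\star)_1 = 0$ and the printed error should track $C_0 \eta$.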


Theorem: LASSO Oracle Inequality

Suppose $\mathbf{A}$ has columns with $\|\mathbf{A}_{j}\|_2^2 \leq 1$ and satisfies RIP of order $2s$ with $\delta_{2s}$ small enough (e.g., $\delta_{2s} < 0.1$). Let $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. Choose $\lambda = c \sigma \sqrt{2 \log N}$ for a universal constant $c > 2$. Then the LASSO estimator satisfies, with probability at least $1 - 2 N^{1 - c^2/8}$,
$$\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2 \leq C \cdot s \, \sigma^2 \log N = O\!\left(s \sigma^2 \log N\right).$$

An oracle that knew the support $\mathcal{S}$ of size $s$ in advance would solve least squares on $s$ columns, incurring MSE $\approx s \sigma^2$. The LASSO pays only an extra $\log N$ factor for not knowing $\mathcal{S}$: the price of searching over $\binom{N}{s}$ supports, paid logarithmically.
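To make the oracle baseline concrete, a small sketch: least squares restricted to the known support $\mathcal{S}$, whose MSE is $\approx s\sigma^2$ when $\mathbf{A}_{\mathcal{S}}^T\mathbf{A}_{\mathcal{S}} \approx \mathbf{I}$ (dimensions below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, s, sigma = 128, 512, 8, 0.1

A = rng.standard_normal((M, N)) / np.sqrt(M)   # ~unit-norm columns
support = rng.choice(N, size=s, replace=False)
x_star = np.zeros(N)
x_star[support] = rng.standard_normal(s)
y = A @ x_star + sigma * rng.standard_normal(M)

# Oracle: ordinary least squares on the s known columns only.
x_oracle = np.zeros(N)
x_oracle[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)

print(f"oracle MSE  = {np.linalg.norm(x_oracle - x_star)**2:.4f}")
print(f"s * sigma^2 = {s * sigma**2:.4f}")
```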


Example: How Many Measurements Suffice?

We want to recover an $s = 10$-sparse vector in $\mathbb{R}^{1024}$ from Gaussian measurements with failure probability at most $10^{-6}$. Assuming the universal constant in the RIP theorem is $c_2 = 30$ and we want $\delta_{2s} \leq 0.4$, estimate the required $M$.
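A back-of-the-envelope sketch, assuming the RIP theorem takes the common form $M \ge (c_2/\delta_{2s}^2)\bigl(s\ln(N/s) + \ln(2/\epsilon)\bigr)$; the exact log terms depend on the statement in the previous section:

```python
import numpy as np

# Assumed form of the Gaussian RIP sample-complexity bound:
#   M >= (c2 / delta^2) * (s * ln(N/s) + ln(2/eps))
N, s = 1024, 10
delta, eps, c2 = 0.4, 1e-6, 30.0

M = (c2 / delta**2) * (s * np.log(N / s) + np.log(2 / eps))
print(f"M >= {M:.0f}")   # ~11400
```

The bound comes out near $1.1 \times 10^4$, which exceeds $N = 1024$: worst-case RIP constants are notoriously pessimistic. Empirically, a few hundred Gaussian measurements suffice at this sparsity level, as the phase-transition experiment below illustrates.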

Phase Transition of $\ell_1$ Recovery

For each $(M/N, s/M)$ pair, run multiple trials of noiseless BP on Gaussian $\mathbf{A}$ and measure the empirical success probability. The 2D heatmap reveals the Donoho-Tanner phase transition: a sharp boundary separating success from failure. Below the curve, $\ell_1$ recovery works; above, it fails.

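A minimal Monte-Carlo sketch of the experiment, solving BP as a linear program via scipy.optimize.linprog (problem size, grid, and trial count are kept small illustrative choices so it runs in seconds):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 s.t. Ax = y, posed as an LP over x = u - v, u, v >= 0."""
    N = A.shape[1]
    res = linprog(np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
N, trials = 64, 10
deltas = np.linspace(0.1, 0.9, 9)   # undersampling ratio M/N
rhos = np.linspace(0.1, 0.9, 9)     # sparsity ratio s/M

success = np.zeros((len(rhos), len(deltas)))
for j, d in enumerate(deltas):
    M = max(1, round(d * N))
    for i, r in enumerate(rhos):
        s = max(1, round(r * M))
        for _ in range(trials):
            A = rng.standard_normal((M, N)) / np.sqrt(M)
            x = np.zeros(N)
            x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
            x_hat = basis_pursuit(A, A @ x)
            success[i, j] += np.linalg.norm(x_hat - x) < 1e-4
success /= trials
print(success)  # rows: rho = s/M, cols: delta = M/N; plot as a heatmap
```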

Stable Recovery: $\ell_2$ Error vs Noise Level

Sweep the noise standard deviation $\sigma$ and plot the empirical LASSO reconstruction error $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2$. Overlay the predicted $O(\sqrt{s \log N})\,\sigma$ scaling. For comparison, plot the oracle least-squares error (which knows the support): $O(\sqrt{s})\,\sigma$. The logarithmic gap is the LASSO's price for support-blindness.

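A sketch of the sweep, using a hand-rolled ISTA (proximal-gradient) LASSO solver and the theorem's $\lambda = \sigma\sqrt{2\log N}$ prescription; dimensions, noise grid, and iteration budget are illustrative assumptions:

```python
import numpy as np

def ista(A, y, lam, iters=500):
    """Proximal gradient for  min 0.5*||Ax - y||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
M, N, s = 80, 256, 6
A = rng.standard_normal((M, N)) / np.sqrt(M)
support = rng.choice(N, size=s, replace=False)
x_star = np.zeros(N)
x_star[support] = rng.standard_normal(s)

for sigma in [0.01, 0.03, 0.1, 0.3, 1.0]:
    y = A @ x_star + sigma * rng.standard_normal(M)
    x_lasso = ista(A, y, lam=sigma * np.sqrt(2 * np.log(N)))
    x_orac = np.zeros(N)                   # oracle: LS on the true support
    x_orac[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    print(f"sigma={sigma:5.2f}  lasso={np.linalg.norm(x_lasso - x_star):.3f}  "
          f"oracle={np.linalg.norm(x_orac - x_star):.3f}  "
          f"predicted~{sigma * np.sqrt(s * np.log(N)):.3f}")
```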
⚠️ Engineering Note

Setting the LASSO Parameter in Practice

The theory prescribes $\lambda \asymp \sigma\sqrt{2\log N}$, but $\sigma$ is typically unknown. Practical recipes:

  1. Cross-validation: hold out a measurement subset and tune $\lambda$ to minimize the held-out residual. Safe and nonparametric, but more expensive.
  2. SURE (Stein's unbiased risk estimate): for Gaussian noise with known $\sigma^2$, SURE provides a closed-form unbiased estimate of $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2$ that can be minimized over $\lambda$.
  3. BIC / AIC information criteria: select $\lambda$ minimizing $\|\mathbf{y} - \mathbf{A}\,\hat{\mathbf{z}}_{\text{LASSO}}\|_2^2 + \kappa \|\hat{\mathbf{z}}_{\text{LASSO}}\|_0 \log N$.
  4. Square-root LASSO (Belloni-Chernozhukov-Wang, 2011): replaces the quadratic loss with its square root, making the optimal $\lambda$ independent of $\sigma$.

Rule of thumb: start with $\lambda = 0.1 \|\mathbf{A}^{T}\mathbf{y}\|_\infty$ and sweep by factors of 2 on either side, choosing the knee of the held-out error curve, as in the sketch below.
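A sketch of this recipe, assuming scikit-learn's Lasso as the solver; note the scaling comment, since sklearn minimizes $\tfrac{1}{2n}\lVert\mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2 + \alpha\lVert\mathbf{x}\rVert_1$ while the convention here is $\tfrac12\lVert\mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2 + \lambda\lVert\mathbf{x}\rVert_1$:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
M, N, s, sigma = 120, 400, 8, 0.1
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_star = np.zeros(N)
x_star[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_star + sigma * rng.standard_normal(M)

idx = rng.permutation(M)                 # hold out 1/4 of the measurements
va, tr = idx[:M // 4], idx[M // 4:]

lam0 = 0.1 * np.linalg.norm(A.T @ y, np.inf)   # rule-of-thumb center
best = None
for lam in lam0 * 2.0 ** np.arange(-4, 5):     # sweep factors of 2
    # sklearn's alpha = lam / n_train maps its objective onto lam*||x||_1.
    model = Lasso(alpha=lam / len(tr), fit_intercept=False, max_iter=50_000)
    model.fit(A[tr], y[tr])
    resid = np.linalg.norm(A[va] @ model.coef_ - y[va])
    if best is None or resid < best[0]:
        best = (resid, lam, model.coef_)

print(f"chosen lambda  = {best[1]:.4f} (center {lam0:.4f})")
print(f"recovery error = {np.linalg.norm(best[2] - x_star):.4f}")
```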

Practical Constraints

  • The optimal $\lambda$ scales as $\sigma\sqrt{\log N}$.
  • Too small $\lambda$: overfitting, dense solution.
  • Too large $\lambda$: shrinkage bias, missed supports.

Recovery Regimes

| Regime | Signal model | Noise | Program | Guarantee |
| --- | --- | --- | --- | --- |
| Exact | $s$-sparse | $\mathbf{w} = \mathbf{0}$ | BP | $\hat{\mathbf{x}} = \mathbf{x}^\star$ |
| Stable | Compressible | $\lVert\mathbf{w}\rVert_2 \leq \eta$ | BPDN | $\lVert\hat{\mathbf{x}} - \mathbf{x}^\star\rVert_2 \leq C_0 \eta + C_1 \sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ |
| Oracle (LASSO) | $s$-sparse | $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ | LASSO, $\lambda \propto \sigma\sqrt{\log N}$ | $\lVert\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\rVert_2^2 \leq C s \sigma^2 \log N$ |
πŸŽ“ CommIT Contribution (2024)

RIS-Assisted Compressed Sensing for Near-Field Localization

G. Caire, CommIT group β€” IEEE Trans. Signal Processing (internal CommIT working paper)

The CommIT group has investigated how reconfigurable intelligent surfaces (RIS) can be used to synthesize sensing matrices tailored to sparse near-field targets. By programming the RIS phases to form random or deterministic patterns, the effective $\mathbf{A}$ can be made to satisfy RIP of order $s$ with $M = O(s \log(N/s))$ pilot snapshots (the standard CS rate), despite the physical array having only $N_r \ll N$ antennas. The near-field regime requires a curved-wavefront dictionary, for which coherence is substantially higher than in the far field and $\ell_1$-only recovery fails; CommIT's work combines $\ell_1$ with atomic-norm regularization to recover target positions directly.


Common Mistake: Beware Non-Random $\mathbf{A}$

Mistake:

Applying the Gaussian RIP theorem to a hand-designed (deterministic) measurement matrix without verifying that it satisfies RIP.

Correction:

The $M = O(s \log(N/s))$ sample-complexity theorem is specific to the random ensemble. Deterministic matrices may or may not satisfy RIP, and certifying them is NP-hard. Deterministic constructions (e.g., partial DFT, algebraic codes over finite fields) typically incur a $\log^c N$ penalty in sample complexity or only satisfy weaker "statistical RIP" guarantees.

Common Mistake: Ξ΄2s<1/2\delta_{2s} < 1/2 is Not Enough for Classical β„“1\ell_1 Recovery

Mistake:

Citing $\delta_{2s} < 1/2$ as the classical sufficient condition for Basis Pursuit recovery. This condition is less restrictive than the CandΓ¨s bound $\delta_{2s} < \sqrt{2} - 1$, so the classical theorem does not cover it.

Correction:

The classical CandΓ¨s (2008) bound is $\delta_{2s} < \sqrt{2} - 1 \approx 0.414$. Cai, Wang, and Xu (2010) later refined this to $\delta_{2s} < 2/(3 + \sqrt{7/4}) \approx 0.463$, and work in 2013 reached the sharp threshold $\delta_{2s} < 1/\sqrt{2} \approx 0.707$. Under the 2013 bound, $\delta_{2s} < 1/2$ does suffice, but the claim must cite that refinement rather than the classical theorem. The $\sqrt{2} - 1$ threshold remains historically canonical and is the best achievable via the original block-decomposition proof.

Quick Check

The LASSO oracle inequality states that $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2 \lesssim s \sigma^2 \log N$. Compared to an oracle that knows the support of size $s$, how much extra error does LASSO pay?

A factor $\log N$.

A factor $N$.

No extra error β€” LASSO achieves the oracle rate exactly.

A factor $s$.

Quick Check

In the stable recovery theorem, what does the term $\sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ capture?

The noise standard deviation.

The model-mismatch / compressibility error of $\mathbf{x}^\star$.

The condition number of $\mathbf{A}$.

The RIP constant $\delta_{2s}$.

Historical Note: The Oracle Inequality Program

1994–2009

The term "oracle inequality" was coined by Donoho and Johnstone (1994) in the context of wavelet thresholding: an estimator is "near-oracle" if it matches the error of a hypothetical oracle that knows the best model parameter. CandΓ¨s and Tao (2007) and Bickel, Ritov, and Tsybakov (2009) adapted the framework to the LASSO in the high-dimensional setting, establishing the $s \log N$ rate as the fundamental limit for sparse regression. This line of work launched the subfield of high-dimensional statistics, which now includes the group LASSO, nuclear-norm minimization, and matrix completion.


Key Takeaway

RIP unifies compressed sensing: $\delta_{2s} < \sqrt{2} - 1$ implies exact $\ell_1$ recovery (noiseless), stable recovery under bounded noise, and near-oracle performance of the LASSO under Gaussian noise. The logarithmic price $\log N$ for support-blindness is the hallmark result of high-dimensional statistics, and it appears across the group LASSO, matrix completion, and graphical models. All of this rests on a single RIP check.

Best $s$-term approximation error

Οƒs(x)p=inf⁑{βˆ₯xβˆ’zβˆ₯p:βˆ₯zβˆ₯0≀s}\sigma_s(\mathbf{x})_p = \inf\{\|\mathbf{x} - \mathbf{z}\|_p : \|\mathbf{z}\|_0 \leq s\}. It is zero iff x\mathbf{x} is ss-sparse and decays in ss for compressible vectors.

Related: Compressible Vector, Sparse Recovery Problem
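Since the best $s$-term approximation keeps the $s$ largest-magnitude entries, $\sigma_s(\mathbf{x})_1$ is just the sum of the $N - s$ smallest magnitudes; a one-line sketch:

```python
import numpy as np

def sigma_s_l1(x, s):
    """Best s-term approximation error in l1: sum of the N-s smallest |x_i|."""
    mags = np.sort(np.abs(x))
    return mags[:len(x) - s].sum()

x = np.array([5.0, -0.1, 3.0, 0.02, -0.01])
print(sigma_s_l1(x, 2))  # 0.13 -> x is approximately 2-sparse
```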

Donoho-Tanner phase transition

The sharp boundary in the $(M/N, s/M)$ plane separating success and failure of $\ell_1$ recovery for large Gaussian random matrices. The curve is explicit and matches empirical observations to within 1% at moderate $N$.

Related: Restricted Isometry Property (RIP), L1 Minimization