Recovery Guarantees

From RIP to Recovery

Having established that random sensing matrices satisfy RIP with $M = O(s \log(N/s))$ measurements, we now complete the story: RIP implies that $\ell_1$ minimization recovers $s$-sparse signals exactly in the noiseless regime, and stably in the noisy regime. The punchline is a deterministic, instance-independent guarantee: one checks RIP once, and every $s$-sparse signal is recovered from $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$. The same guarantee extends to compressible (not exactly sparse) vectors via the best $s$-term approximation error $\sigma_s(\mathbf{x}^\star)_1$, yielding a beautiful oracle inequality: the LASSO behaves as if an oracle had told us the true support in advance, up to logarithmic factors.

Theorem: Exact Recovery via $\ell_1$ Minimization (Noiseless)

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ satisfy the RIP of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1 \approx 0.414$. For every $s$-sparse $\mathbf{x}^\star$ and measurements $\mathbf{y} = \mathbf{A}\mathbf{x}^\star$, the Basis Pursuit solution $\hat{\mathbf{x}} = \arg\min \|\mathbf{x}\|_1 \ \text{s.t.}\ \mathbf{A}\mathbf{x} = \mathbf{y}$ is unique and equals $\mathbf{x}^\star$.

If BP returned some other vector $\hat{\mathbf{x}} \neq \mathbf{x}^\star$, the difference $\mathbf{h} = \hat{\mathbf{x}} - \mathbf{x}^\star$ would lie in $\ker(\mathbf{A})$. RIP bounds the $\ell_2$ norm of $\mathbf{h}$ restricted to any small support, while $\ell_1$ optimality of $\hat{\mathbf{x}}$ bounds the energy of $\mathbf{h}$ off the support of $\mathbf{x}^\star$. Combining these forces $\mathbf{h} = \mathbf{0}$.
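To see the off-support bound explicitly, here is the standard cone-condition step (a sketch, with $S = \operatorname{supp}(\mathbf{x}^\star)$; the full proof then applies RIP to $\mathbf{h}$ over $S$ and the largest blocks of $S^c$):

```latex
\begin{align*}
\|\mathbf{x}^\star\|_1 \;\ge\; \|\hat{\mathbf{x}}\|_1
  &= \|\mathbf{x}^\star_S + \mathbf{h}_S\|_1 + \|\mathbf{h}_{S^c}\|_1 \\
  &\ge \|\mathbf{x}^\star_S\|_1 - \|\mathbf{h}_S\|_1 + \|\mathbf{h}_{S^c}\|_1
  \quad\Longrightarrow\quad \|\mathbf{h}_{S^c}\|_1 \le \|\mathbf{h}_S\|_1 .
\end{align*}
```

With the error thus confined near $S$, applying the RIP inequality to $2s$-sparse restrictions of $\mathbf{h} \in \ker(\mathbf{A})$ yields $\|\mathbf{h}_S\|_2 \le \rho\,\|\mathbf{h}_S\|_2$ for some $\rho < 1$ whenever $\delta_{2s} < \sqrt{2} - 1$, which forces $\mathbf{h} = \mathbf{0}$.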


Theorem: Stable Recovery under Bounded Noise

Let $\mathbf{A}$ satisfy RIP with $\delta_{2s} < \sqrt{2} - 1$. For any $\mathbf{x}^\star \in \mathbb{R}^N$ (not necessarily sparse) and $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$ with $\|\mathbf{w}\|_2 \leq \eta$, the BPDN solution $\hat{\mathbf{x}}$ (minimizing $\|\mathbf{x}\|_1$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \leq \eta$) satisfies
$$\|\hat{\mathbf{x}} - \mathbf{x}^\star\|_2 \leq C_0 \eta + C_1 \frac{\sigma_s(\mathbf{x}^\star)_1}{\sqrt{s}},$$
where $\sigma_s(\mathbf{x}^\star)_1 = \min_{|\mathcal{T}| \leq s}\|\mathbf{x}^\star - \mathbf{x}^\star_\mathcal{T}\|_1$ is the best $s$-term approximation error in $\ell_1$, and $C_0, C_1$ depend only on $\delta_{2s}$.

Two error sources appear: noise, contributing $C_0 \eta$ (linear in the noise level), and model mismatch, contributing $C_1 \sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ (zero when $\mathbf{x}^\star$ is exactly $s$-sparse). The bound is deterministic: a single RIP check implies recovery guarantees for every signal.
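A minimal numerical illustration of the linear-in-$\eta$ behavior, sketched with cvxpy as the conic solver (the dimensions and seed below are illustrative choices, not part of the theorem):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M, N, s = 80, 256, 5

# Gaussian sensing matrix with (approximately) unit-norm columns.
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_star = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

for eta in [1e-3, 1e-2, 1e-1]:
    w = rng.standard_normal(M)
    w *= eta / np.linalg.norm(w)          # noise scaled so ||w||_2 = eta
    y = A @ x_star + w

    # BPDN: min ||x||_1  s.t.  ||Ax - y||_2 <= eta
    x = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm1(x)),
               [cp.norm2(A @ x - y) <= eta]).solve()
    print(f"eta = {eta:.0e}   error = {np.linalg.norm(x.value - x_star):.2e}")
```

Since $\mathbf{x}^\star$ here is exactly $s$-sparse, $\sigma_s(\mathbf{x}^\star)_1 = 0$ and the printed error should track $C_0 \eta$.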


Theorem: LASSO Oracle Inequality

Suppose $\mathbf{A}$ has columns with $\|\mathbf{A}_{j}\|_2^2 \leq 1$ and satisfies RIP of order $2s$ with $\delta_{2s}$ small enough (e.g., $\delta_{2s} < 0.1$). Let $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. Choose $\lambda = c \sigma \sqrt{2 \log N}$ for a universal constant $c > 2$. Then the LASSO estimator satisfies, with probability at least $1 - 2 N^{1 - c^2/8}$,
$$\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2 \leq C \cdot s \, \sigma^2 \log N = O\!\left(s \sigma^2 \log N\right).$$

An oracle that knew the support $\mathcal{S}$ of size $s$ in advance would solve least squares on $s$ columns, incurring MSE $\approx s \sigma^2$. The LASSO pays only an extra $\log N$ factor for not knowing $\mathcal{S}$: the price of searching over $\binom{N}{s}$ supports, paid logarithmically.
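To make the oracle baseline concrete, a small sketch: least squares restricted to the known support $\mathcal{S}$, whose MSE is $\approx s\sigma^2$ when $\mathbf{A}_{\mathcal{S}}^T\mathbf{A}_{\mathcal{S}} \approx \mathbf{I}$ (dimensions below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, s, sigma = 128, 512, 8, 0.1

A = rng.standard_normal((M, N)) / np.sqrt(M)   # ~unit-norm columns
support = rng.choice(N, size=s, replace=False)
x_star = np.zeros(N)
x_star[support] = rng.standard_normal(s)
y = A @ x_star + sigma * rng.standard_normal(M)

# Oracle: ordinary least squares on the s known columns only.
x_oracle = np.zeros(N)
x_oracle[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)

print(f"oracle MSE  = {np.linalg.norm(x_oracle - x_star)**2:.4f}")
print(f"s * sigma^2 = {s * sigma**2:.4f}")
```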


Example: How Many Measurements Suffice?

We want to recover an $s = 10$-sparse vector in $\mathbb{R}^{1024}$ from Gaussian measurements with failure probability at most $10^{-6}$. Assuming the universal constant in the RIP theorem is $c_2 = 30$ and we want $\delta_{2s} \leq 0.4$, estimate the required $M$.
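A back-of-the-envelope sketch, assuming the RIP theorem takes the common form $M \ge (c_2/\delta_{2s}^2)\bigl(s\ln(N/s) + \ln(2/\epsilon)\bigr)$; the exact log terms depend on the statement in the previous section:

```python
import numpy as np

# Assumed form of the Gaussian RIP sample-complexity bound:
#   M >= (c2 / delta^2) * (s * ln(N/s) + ln(2/eps))
N, s = 1024, 10
delta, eps, c2 = 0.4, 1e-6, 30.0

M = (c2 / delta**2) * (s * np.log(N / s) + np.log(2 / eps))
print(f"M >= {M:.0f}")   # ~11400
```

The bound comes out near $1.1 \times 10^4$, which exceeds $N = 1024$: worst-case RIP constants are notoriously pessimistic. Empirically, a few hundred Gaussian measurements suffice at this sparsity level, as the phase-transition experiment below illustrates.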

Phase Transition of $\ell_1$ Recovery

For each $(M/N, s/M)$ pair, run multiple trials of noiseless BP on Gaussian $\mathbf{A}$ and measure the empirical success probability. The 2D heatmap reveals the Donoho-Tanner phase transition: a sharp boundary separating success from failure. Below the curve, $\ell_1$ recovery works; above, it fails.

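A minimal Monte-Carlo sketch of the experiment, solving BP as a linear program via scipy.optimize.linprog (problem size, grid, and trial count are kept small illustrative choices so it runs in seconds):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 s.t. Ax = y, posed as an LP over x = u - v, u, v >= 0."""
    N = A.shape[1]
    res = linprog(np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
N, trials = 64, 10
deltas = np.linspace(0.1, 0.9, 9)   # undersampling ratio M/N
rhos = np.linspace(0.1, 0.9, 9)     # sparsity ratio s/M

success = np.zeros((len(rhos), len(deltas)))
for j, d in enumerate(deltas):
    M = max(1, round(d * N))
    for i, r in enumerate(rhos):
        s = max(1, round(r * M))
        for _ in range(trials):
            A = rng.standard_normal((M, N)) / np.sqrt(M)
            x = np.zeros(N)
            x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
            x_hat = basis_pursuit(A, A @ x)
            success[i, j] += np.linalg.norm(x_hat - x) < 1e-4
success /= trials
print(success)  # rows: rho = s/M, cols: delta = M/N; plot as a heatmap
```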

Stable Recovery: $\ell_2$ Error vs Noise Level

Sweep the noise standard deviation $\sigma$ and plot the empirical LASSO reconstruction error $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2$. Overlay the predicted $O(\sqrt{s \log N})\,\sigma$ scaling. For comparison, plot the oracle least-squares error (which knows the support): $O(\sqrt{s})\,\sigma$. The logarithmic gap is the LASSO's price for support-blindness.

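A sketch of the sweep, using a hand-rolled ISTA (proximal-gradient) LASSO solver and the theorem's $\lambda = \sigma\sqrt{2\log N}$ prescription; dimensions, noise grid, and iteration budget are illustrative assumptions:

```python
import numpy as np

def ista(A, y, lam, iters=500):
    """Proximal gradient for  min 0.5*||Ax - y||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
M, N, s = 80, 256, 6
A = rng.standard_normal((M, N)) / np.sqrt(M)
support = rng.choice(N, size=s, replace=False)
x_star = np.zeros(N)
x_star[support] = rng.standard_normal(s)

for sigma in [0.01, 0.03, 0.1, 0.3, 1.0]:
    y = A @ x_star + sigma * rng.standard_normal(M)
    x_lasso = ista(A, y, lam=sigma * np.sqrt(2 * np.log(N)))
    x_orac = np.zeros(N)                   # oracle: LS on the true support
    x_orac[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    print(f"sigma={sigma:5.2f}  lasso={np.linalg.norm(x_lasso - x_star):.3f}  "
          f"oracle={np.linalg.norm(x_orac - x_star):.3f}  "
          f"predicted~{sigma * np.sqrt(s * np.log(N)):.3f}")
```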
⚠️ Engineering Note

Setting the LASSO Parameter in Practice

The theory prescribes $\lambda \asymp \sigma\sqrt{2\log N}$, but $\sigma$ is typically unknown. Practical recipes:

  1. Cross-validation: hold out a measurement subset and tune $\lambda$ to minimize the held-out residual. Safe and nonparametric, but more expensive.
  2. SURE (Stein's unbiased risk estimate): for Gaussian noise with known $\sigma^2$, SURE provides a closed-form unbiased estimate of $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2$ that can be minimized over $\lambda$.
  3. BIC / AIC information criteria: select $\lambda$ minimizing $\|\mathbf{y} - \mathbf{A}\,\hat{\mathbf{z}}_{\text{LASSO}}\|_2^2 + \kappa \|\hat{\mathbf{z}}_{\text{LASSO}}\|_0 \log N$.
  4. Square-root LASSO (Belloni-Chernozhukov-Wang, 2011): replaces the quadratic loss with its square root, making the optimal $\lambda$ independent of $\sigma$.

Rule of thumb: start with $\lambda = 0.1 \|\mathbf{A}^{T}\mathbf{y}\|_\infty$ and sweep by factors of 2 on either side, choosing the knee of the held-out error curve, as in the sketch below.
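A sketch of this recipe, assuming scikit-learn's Lasso as the solver; note the scaling comment, since sklearn minimizes $\tfrac{1}{2n}\lVert\mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2 + \alpha\lVert\mathbf{x}\rVert_1$ while the convention here is $\tfrac12\lVert\mathbf{y}-\mathbf{A}\mathbf{x}\rVert_2^2 + \lambda\lVert\mathbf{x}\rVert_1$:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
M, N, s, sigma = 120, 400, 8, 0.1
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_star = np.zeros(N)
x_star[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_star + sigma * rng.standard_normal(M)

idx = rng.permutation(M)                 # hold out 1/4 of the measurements
va, tr = idx[:M // 4], idx[M // 4:]

lam0 = 0.1 * np.linalg.norm(A.T @ y, np.inf)   # rule-of-thumb center
best = None
for lam in lam0 * 2.0 ** np.arange(-4, 5):     # sweep factors of 2
    # sklearn's alpha = lam / n_train maps its objective onto lam*||x||_1.
    model = Lasso(alpha=lam / len(tr), fit_intercept=False, max_iter=50_000)
    model.fit(A[tr], y[tr])
    resid = np.linalg.norm(A[va] @ model.coef_ - y[va])
    if best is None or resid < best[0]:
        best = (resid, lam, model.coef_)

print(f"chosen lambda  = {best[1]:.4f} (center {lam0:.4f})")
print(f"recovery error = {np.linalg.norm(best[2] - x_star):.4f}")
```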

Practical Constraints

  • The optimal $\lambda$ scales as $\sigma\sqrt{\log N}$.
  • Too small $\lambda$: overfitting, dense solution.
  • Too large $\lambda$: shrinkage bias, missed supports.

Recovery Regimes

| Regime | Signal model | Noise | Program | Guarantee |
| --- | --- | --- | --- | --- |
| Exact | $s$-sparse | $\mathbf{w} = \mathbf{0}$ | BP | $\hat{\mathbf{x}} = \mathbf{x}^\star$ |
| Stable | Compressible | $\lVert\mathbf{w}\rVert_2 \leq \eta$ | BPDN | $\lVert\hat{\mathbf{x}} - \mathbf{x}^\star\rVert_2 \leq C_0 \eta + C_1 \sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ |
| Oracle (LASSO) | $s$-sparse | $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ | LASSO, $\lambda \propto \sigma\sqrt{\log N}$ | $\lVert\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\rVert_2^2 \leq C s \sigma^2 \log N$ |
πŸŽ“ CommIT Contribution (2024)

RIS-Assisted Compressed Sensing for Near-Field Localization

G. Caire, CommIT group β€” IEEE Trans. Signal Processing (internal CommIT working paper)

The CommIT group has investigated how reconfigurable intelligent surfaces (RIS) can be used to synthesize sensing matrices tailored to sparse near-field targets. By programming the RIS phases to form random or deterministic patterns, the effective $\mathbf{A}$ can be made to satisfy RIP of order $s$ with $M = O(s \log(N/s))$ pilot snapshots (the standard CS rate), despite the physical array having only $N_r \ll N$ antennas. The near-field regime requires a curved-wavefront dictionary, for which coherence is substantially higher than in the far field and $\ell_1$-only recovery fails; CommIT's work combines $\ell_1$ with atomic-norm regularization to recover target positions directly.


Common Mistake: Beware Non-Random $\mathbf{A}$

Mistake:

Applying the Gaussian RIP theorem to a hand-designed (deterministic) measurement matrix without verifying that it satisfies RIP.

Correction:

The $M = O(s \log(N/s))$ sample-complexity theorem is specific to the random ensemble. Deterministic matrices may or may not satisfy RIP, and certifying them is NP-hard. Deterministic constructions (e.g., partial DFT, algebraic codes over finite fields) typically incur a $\log^c N$ penalty in sample complexity or only satisfy weaker "statistical RIP" guarantees.

Common Mistake: Ξ΄2s<1/2\delta_{2s} < 1/2 is Not Enough for Classical β„“1\ell_1 Recovery

Mistake:

Citing $\delta_{2s} < 1/2$ as the classical sufficient condition for Basis Pursuit recovery. This condition is less restrictive than the CandΓ¨s bound $\delta_{2s} < \sqrt{2} - 1$, so the classical theorem does not cover it.

Correction:

The classical CandΓ¨s (2008) bound is $\delta_{2s} < \sqrt{2} - 1 \approx 0.414$. Cai, Wang, and Xu (2010) later refined this to $\delta_{2s} < 2/(3 + \sqrt{7/4}) \approx 0.463$, and work in 2013 reached the sharp threshold $\delta_{2s} < 1/\sqrt{2} \approx 0.707$. Under the 2013 bound, $\delta_{2s} < 1/2$ does suffice, but the claim must cite that refinement rather than the classical theorem. The $\sqrt{2} - 1$ threshold remains historically canonical and is the best achievable via the original block-decomposition proof.

Quick Check

The LASSO oracle inequality states that $\|\hat{\mathbf{z}}_{\text{LASSO}} - \mathbf{x}^\star\|_2^2 \lesssim s \sigma^2 \log N$. Compared to an oracle that knows the support of size $s$, how much extra error does LASSO pay?

A factor $\log N$.

A factor $N$.

No extra error β€” LASSO achieves the oracle rate exactly.

A factor $s$.

Quick Check

In the stable recovery theorem, what does the term $\sigma_s(\mathbf{x}^\star)_1/\sqrt{s}$ capture?

The noise standard deviation.

The model-mismatch / compressibility error of $\mathbf{x}^\star$.

The condition number of $\mathbf{A}$.

The RIP constant $\delta_{2s}$.

Historical Note: The Oracle Inequality Program

1994–2009

The term "oracle inequality" was coined by Donoho and Johnstone (1994) in the context of wavelet thresholding: an estimator is "near-oracle" if it matches the error of a hypothetical oracle that knows the best model parameter. CandΓ¨s and Tao (2007) and Bickel, Ritov, and Tsybakov (2009) adapted the framework to the LASSO in the high-dimensional setting, establishing the $s \log N$ rate as the fundamental limit for sparse regression. This line of work launched the subfield of high-dimensional statistics, which now includes the group LASSO, nuclear-norm minimization, and matrix completion.


Key Takeaway

RIP unifies compressed sensing: $\delta_{2s} < \sqrt{2} - 1$ implies exact $\ell_1$ recovery (noiseless), stable recovery under bounded noise, and near-oracle performance of the LASSO under Gaussian noise. The logarithmic price $\log N$ for support-blindness is the hallmark result of high-dimensional statistics, and it appears across the group LASSO, matrix completion, and graphical models. All of this rests on a single RIP check.

Best $s$-term approximation error

Οƒs(x)p=inf⁑{βˆ₯xβˆ’zβˆ₯p:βˆ₯zβˆ₯0≀s}\sigma_s(\mathbf{x})_p = \inf\{\|\mathbf{x} - \mathbf{z}\|_p : \|\mathbf{z}\|_0 \leq s\}. It is zero iff x\mathbf{x} is ss-sparse and decays in ss for compressible vectors.

Related: Compressible Vector, Sparse Recovery Problem
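Since the best $s$-term approximation keeps the $s$ largest-magnitude entries, $\sigma_s(\mathbf{x})_1$ is just the sum of the $N - s$ smallest magnitudes; a one-line sketch:

```python
import numpy as np

def sigma_s_l1(x, s):
    """Best s-term approximation error in l1: sum of the N-s smallest |x_i|."""
    mags = np.sort(np.abs(x))
    return mags[:len(x) - s].sum()

x = np.array([5.0, -0.1, 3.0, 0.02, -0.01])
print(sigma_s_l1(x, 2))  # 0.13 -> x is approximately 2-sparse
```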

Donoho-Tanner phase transition

The sharp boundary in the $(M/N, s/M)$ plane separating success and failure of $\ell_1$ recovery for large Gaussian random matrices. The curve is explicit and matches empirical observations to within 1% at moderate $N$.

Related: Restricted Isometry Property (RIP), L1 Minimization