The Restricted Isometry Property

What Makes $\mathbf{A}$ Good for Sparse Recovery?

We have argued that $\ell_1$ minimization is a convex surrogate for $\ell_0$. But when does the surrogate succeed? A sensing matrix must act as an approximate isometry — not on all of $\mathbb{R}^N$ (impossible when $M < N$) but on the subset of sparse vectors we care about. Candès and Tao captured this idea in the Restricted Isometry Property (RIP): $\mathbf{A}$ should approximately preserve the $\ell_2$ norm of every $s$-sparse vector. The RIP is the central quantitative condition of compressed sensing: it implies exact recovery in the noiseless case, stable recovery under noise, and (via random-matrix concentration) holds with overwhelming probability for Gaussian, Bernoulli, and partial-Fourier designs with $M = O(s \log(N/s))$ measurements.

Definition:

Restricted Isometry Property (RIP)

A matrix $\mathbf{A} \in \mathbb{R}^{M \times N}$ satisfies the restricted isometry property of order $s$ with constant $\delta_s \in [0, 1)$ if

$$(1 - \delta_s) \|\mathbf{x}\|_2^2 \leq \|\mathbf{A}\mathbf{x}\|_2^2 \leq (1 + \delta_s) \|\mathbf{x}\|_2^2$$

holds for every $\mathbf{x} \in \Sigma_s$ (every $s$-sparse vector). The restricted isometry constant $\delta_s$ is the smallest such number.

Equivalently, every $M \times s$ submatrix $\mathbf{A}_\mathcal{S}$ (columns indexed by $\mathcal{S}$, $|\mathcal{S}| = s$) has all singular values in $[\sqrt{1-\delta_s}, \sqrt{1+\delta_s}]$. Thus the RIP is a uniform control on the conditioning of every $s$-subset of columns.
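The submatrix characterization suggests a brute-force check for small problems. The sketch below (our own helper, `rip_constant`, in NumPy) enumerates every $s$-column submatrix and reports the worst deviation of a squared singular value from 1; the cost is $\binom{N}{s}$ SVDs, which is exactly why RIP certification does not scale.

```python
import itertools

import numpy as np

def rip_constant(A, s):
    """Exact delta_s: worst deviation of squared singular values from 1,
    over all s-column submatrices (combinatorial -- small N only)."""
    N = A.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), s):
        sv = np.linalg.svd(A[:, list(S)], compute_uv=False)
        delta = max(delta, sv.max() ** 2 - 1.0, 1.0 - sv.min() ** 2)
    return delta

rng = np.random.default_rng(0)
M, N, s = 12, 20, 2
A = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
print(rip_constant(A, s))   # modest for M comfortably larger than s
```

For an orthonormal matrix the constant is zero, and $\delta_s$ is nondecreasing in $s$, both of which make quick sanity checks.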

Definition:

Mutual Coherence

The mutual coherence of $\mathbf{A}$ with $\ell_2$-normalized columns $\mathbf{a}_1, \ldots, \mathbf{a}_N$ ($\|\mathbf{a}_i\|_2 = 1$) is

$$\mu(\mathbf{A}) = \max_{1 \leq i \neq j \leq N} |\mathbf{a}_i^T \mathbf{a}_j| \in [0, 1].$$

Coherence measures the worst-case linear dependence between pairs of columns.

Coherence is easy to compute (quadratic in $N$), unlike RIP constants. It upper-bounds the order-2 restricted isometry constant: $\delta_2 \leq \mu$. But coherence-based guarantees are typically much weaker than RIP-based ones — coherence captures only pairwise correlations.
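Since coherence needs only the Gram matrix of the normalized columns, it is a few lines of NumPy. A minimal sketch (the helper name `mutual_coherence` is ours):

```python
import numpy as np

def mutual_coherence(A):
    """Largest |inner product| between distinct l2-normalized columns."""
    A = A / np.linalg.norm(A, axis=0)   # normalize columns
    G = np.abs(A.T @ A)                 # N x N Gram matrix: O(M N^2) work
    np.fill_diagonal(G, 0.0)            # exclude trivial self-products
    return float(G.max())

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 256))
print(mutual_coherence(A))   # typically a small multiple of 1/sqrt(M)
```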

Theorem: Welch Bound

For any $\mathbf{A} \in \mathbb{R}^{M \times N}$ with unit-norm columns and $N > M$, the mutual coherence satisfies

$$\mu(\mathbf{A}) \geq \sqrt{\frac{N - M}{M(N - 1)}}.$$

Equality holds if and only if $\mathbf{A}$ forms an equiangular tight frame (ETF).

We cannot pack too many unit vectors in $\mathbb{R}^M$ before some pair becomes close. The Welch bound is the sharpest version of this geometric fact. For $N \gg M$, it gives $\mu \gtrsim 1/\sqrt{M}$.


Theorem: Coherence Implies RIP

If $\mathbf{A}$ has unit-norm columns and mutual coherence $\mu$, then for every $s \leq 1 + 1/\mu$, $\delta_s \leq (s - 1) \mu$.

Pairwise correlations at most $\mu$ let us bound the Gram matrix of any $s$-subset of columns by Gershgorin's disk theorem, and hence bound the deviation of its eigenvalues from $1$.
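The Gershgorin argument can be checked numerically: every eigenvalue of $\mathbf{A}_\mathcal{S}^T \mathbf{A}_\mathcal{S}$ lies within $(s-1)\mu$ of its unit diagonal entry. A small NumPy experiment over sampled (not exhaustive) supports:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, s = 64, 128, 4
A = rng.normal(size=(M, N))
A /= np.linalg.norm(A, axis=0)            # unit-norm columns

G = np.abs(A.T @ A)
np.fill_diagonal(G, 0.0)
mu = G.max()

# Gershgorin: each eigenvalue of A_S^T A_S lies in a disk of radius
# at most (s-1)*mu around 1, so the spectral deviation is <= (s-1)*mu.
worst = 0.0
for _ in range(300):
    S = rng.choice(N, size=s, replace=False)
    w = np.linalg.eigvalsh(A[:, S].T @ A[:, S])
    worst = max(worst, abs(w[0] - 1.0), abs(w[-1] - 1.0))

print(worst, (s - 1) * mu)   # sampled deviation never exceeds the bound
```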

Theorem: Gaussian Matrices Satisfy RIP with $M = O(s \log(N/s))$

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ have i.i.d. entries $\mathbf{A}_{ij} \sim \mathcal{N}(0, 1/M)$. Fix $\delta \in (0, 1)$. There exist universal constants $c_1, c_2 > 0$ such that, with probability at least $1 - 2 e^{-c_1 M}$, $\mathbf{A}$ satisfies the RIP of order $s$ with constant $\delta_s \leq \delta$ whenever $M \geq c_2 \, \delta^{-2} \, s \log(N/s)$.

Each $s$-subset of columns behaves like an $M \times s$ Gaussian submatrix, which is well conditioned when $M \gtrsim s$. A union bound over the $\binom{N}{s} \leq (eN/s)^s$ subsets pays a logarithmic price, giving the $s \log(N/s)$ scaling.


Bernoulli, Sub-Gaussian, and Partial Fourier

The same sample complexity $M = O(s \log(N/s))$ holds for any sub-Gaussian sensing matrix (e.g., $\mathbf{A}_{ij} \in \{\pm 1/\sqrt{M}\}$ independently). For structured ensembles like the partial DFT (random rows of the discrete Fourier matrix), the bound weakens slightly to $M = O(s \log^4 N)$ — the logarithmic factor is the price of deterministic structure. For imaging and MRI, partial Fourier is the relevant model.

⚠️ Engineering Note

RIP Constants are Hard to Verify

Given an arbitrary $M \times N$ matrix, computing $\delta_s$ requires inspecting every $s$-column submatrix — combinatorial effort. In fact, certifying RIP is NP-hard in general (Bandeira et al., 2013). In practice, we use random constructions for which RIP holds with high probability by design, or we replace RIP with the weaker (but computable) coherence bound $\delta_s \leq (s-1)\mu$.

Implication for practice: When building a CS system, randomness is not an aesthetic choice — it is the engineering device that makes the sensing guarantee possible. A carefully hand-designed $\mathbf{A}$ cannot be certified without a random model.

Practical Constraints
  • Certifying RIP for a given matrix is NP-hard.

  • Random designs give RIP with exponentially small failure probability.

  • Coherence-based guarantees are polynomial-time but sub-optimal.

Empirical RIP Constants for Gaussian Sensing

We draw $\mathbf{A} \in \mathbb{R}^{M \times N}$ with i.i.d. $\mathcal{N}(0, 1/M)$ entries and estimate $\delta_s$ by sampling many random supports of size $s$, computing the extremal eigenvalues of $\mathbf{A}_\mathcal{S}^T \mathbf{A}_\mathcal{S}$, and reporting $\max_\mathcal{S} \max(|\lambda_{\max} - 1|, |\lambda_{\min} - 1|)$. Observe how $\delta_s$ shrinks as $M$ grows relative to $s$.

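The experiment described above can be reproduced in a few lines of NumPy. The helper `estimate_delta` is our own Monte Carlo sketch; random supports give only a lower bound on the true $\delta_s$:

```python
import numpy as np

def estimate_delta(A, s, trials=500, rng=None):
    """Monte Carlo lower bound on delta_s from random size-s supports."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(N, size=s, replace=False)
        w = np.linalg.eigvalsh(A[:, S].T @ A[:, S])
        delta = max(delta, abs(w[-1] - 1.0), abs(w[0] - 1.0))
    return delta

rng = np.random.default_rng(0)
N, s = 256, 4
deltas = {}
for M in (32, 64, 128):
    A = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
    deltas[M] = estimate_delta(A, s, rng=rng)
    print(M, round(deltas[M], 3))   # estimated delta_s shrinks as M grows
```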

Mutual Coherence vs $(M, N)$ and the Welch Bound

For Gaussian $\mathbf{A}$ with unit-normalized columns, we plot empirical mutual coherence as a function of $M$ (for fixed $N$) and overlay the Welch lower bound $\sqrt{(N-M)/(M(N-1))}$. Gaussian designs are within a constant of the Welch bound but do not attain it.

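The comparison is easy to reproduce without the interactive plot; the sketch below (plain NumPy, parameters chosen only for illustration) prints the empirical coherence of a Gaussian design next to the Welch bound:

```python
import numpy as np

def welch_bound(M, N):
    """Lower bound on the coherence of N unit-norm vectors in R^M."""
    return np.sqrt((N - M) / (M * (N - 1)))

rng = np.random.default_rng(0)
N = 200
pairs = []
for M in (20, 50, 100):
    A = rng.normal(size=(M, N))
    A /= np.linalg.norm(A, axis=0)   # unit-norm columns
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    pairs.append((float(G.max()), float(welch_bound(M, N))))
    print(M, round(pairs[-1][0], 3), round(pairs[-1][1], 3))
    # empirical coherence always sits above the Welch lower bound
```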

RIP as Near-Isometry on Sparse Vectors

Visual: the unit sphere of an $s$-sparse subspace is mapped by $\mathbf{A}$ into a (slightly squashed) ellipsoid in $\mathbb{R}^M$. As $M$ grows, the ellipsoid becomes more spherical — $\delta_s$ shrinks. The RIP bounds the distortion uniformly across all $\binom{N}{s}$ subspaces.

Example: Coherence of the Hadamard-Spike Dictionary

Let $\mathbf{A} \in \mathbb{R}^{M \times 2M}$ be the concatenation of the $M \times M$ identity (spike basis) and a normalized Hadamard matrix $\mathbf{H}/\sqrt{M}$. Compute the mutual coherence.
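Worked answer: inner products within the spike block and within the Hadamard block vanish (each block is orthogonal), while a spike column against a Hadamard column gives $|\pm 1/\sqrt{M}| = 1/\sqrt{M}$, so $\mu = 1/\sqrt{M}$. A NumPy verification using a Sylvester-construction Hadamard matrix (helper names are ours):

```python
import numpy as np

def sylvester_hadamard(M):
    """Hadamard matrix of size M (a power of 2) via Sylvester doubling."""
    H = np.array([[1.0]])
    while H.shape[0] < M:
        H = np.block([[H, H], [H, -H]])
    return H

M = 16
A = np.hstack([np.eye(M), sylvester_hadamard(M) / np.sqrt(M)])  # M x 2M
G = np.abs(A.T @ A)
np.fill_diagonal(G, 0.0)
print(G.max(), 1 / np.sqrt(M))   # both 0.25: coherence equals 1/sqrt(M)
```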

Quick Check

Which of the following does the RIP of order $2s$ with $\delta_{2s} < 1$ directly imply?

  • $\mathbf{A}$ approximately preserves pairwise Euclidean distances between every pair of $s$-sparse vectors.

  • $\mathbf{A}^{T}\mathbf{A} = \mathbf{I}$ on all of $\mathbb{R}^N$.

  • $\mathbf{A}$ has coherence $\mu = 0$.

  • Every column of $\mathbf{A}$ has unit $\ell_2$ norm.

Quick Check

A Gaussian $\mathbf{A}$ of size $M \times N$ with entries $\mathcal{N}(0, 1/M)$ satisfies RIP of order $s$ with constant $\delta$ with high probability when $M$ is at least proportional to:

  • $s$

  • $s \log(N/s)$

  • $N \log s$

  • $s^2$

Common Mistake: Low Coherence Does Not Imply Sharp RIP

Mistake:

Using the bound $\delta_s \leq (s-1)\mu$ as the operational definition of RIP, and therefore believing that $M \gtrsim s^2$ is needed (since the Welch bound forces $\mu \gtrsim 1/\sqrt{M}$, the best the coherence bound can give is $\delta_s \lesssim (s-1)/\sqrt{M}$, so $\delta_s < c$ appears to need $M \gtrsim s^2$).

Correction:

The coherence bound is a worst-case (pairwise) estimate; random designs satisfy the RIP at the much sharper $M \gtrsim s \log(N/s)$ rate. The coherence-to-RIP implication is one-way: it is a useful check for deterministic matrices, not a tight theory for random ones. This is the "square-root bottleneck" of coherence-only analysis.
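To see the gap concretely, compare the measurement counts the two routes suggest for a few sparsity levels. The constants `c` and `C` below are illustrative assumptions, not values from the theorems:

```python
import numpy as np

# Coherence route: Welch forces mu >~ 1/sqrt(M), so delta_s <= (s-1)*mu < c
# at best when M >~ ((s-1)/c)^2.  RIP route: M >~ C*s*log(N/s).
N, c, C = 10_000, 0.5, 10.0   # c, C are illustrative constants (assumed)
rows = {}
for s in (10, 50, 100):
    m_coherence = ((s - 1) / c) ** 2      # quadratic in s
    m_rip = C * s * np.log(N / s)         # near-linear in s
    rows[s] = (m_coherence, m_rip)
    print(s, int(m_coherence), int(m_rip))
```

The coherence-route count grows quadratically and overtakes the RIP-route count as $s$ grows, which is the square-root bottleneck in numbers.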

Why This Matters: Sparse Channel Estimation in mmWave MIMO

Millimeter-wave channels have a small number of dominant propagation paths (typically $s = 2$–$5$) relative to the large angular/delay grid used for beamforming. Representing the channel as a sparse vector in an angle-delay dictionary and measuring via random pilot beams yields a CS problem with structured $\mathbf{A}$ (partial DFT across angles). The RIP theory of this chapter justifies using $\ell_1$-regularized channel estimators with far fewer pilots than the grid size would suggest.

Historical Note: The RIP and the Square-Root Bottleneck

2005-2008

Before Candès and Tao introduced the RIP in 2005, the standard guarantees for $\ell_1$ recovery relied on coherence and required $s \lesssim \sqrt{M}$ — the "square-root bottleneck" noted by Donoho and Elad (2003). RIP broke this bottleneck by analyzing sparse subspaces directly, not just pairs of columns. The $s \log(N/s)$ rate for Gaussian matrices was shown by Candès, Romberg, and Tao (2006) and given a simpler concentration-based proof by Baraniuk et al. (2008).


Key Takeaway

The RIP quantifies how well $\mathbf{A}$ preserves norms on sparse subspaces. Random Gaussian and sub-Gaussian matrices satisfy RIP of order $s$ with $M = O(s \log(N/s))$ — the fundamental sample complexity of compressed sensing. Mutual coherence is an easier-to-compute surrogate via $\delta_s \leq (s-1)\mu$, but it is sharp only up to the square-root bottleneck and misses the logarithmic scaling.

Restricted Isometry Property (RIP)

The condition $(1-\delta_s)\|\mathbf{x}\|_2^2 \leq \|\mathbf{A}\mathbf{x}\|_2^2 \leq (1+\delta_s)\|\mathbf{x}\|_2^2$ for all $s$-sparse $\mathbf{x}$. Controls the conditioning of every $M \times s$ submatrix of $\mathbf{A}$.

Related: Mutual Coherence, Welch Bound

Equiangular Tight Frame (ETF)

A collection of $N$ unit vectors in $\mathbb{R}^M$ whose pairwise absolute inner products are all equal. ETFs attain the Welch bound with equality.

Related: Welch Bound, Mutual Coherence