The Sparse Recovery Problem
Why Bother with Underdetermined Systems?
Classical linear estimation asks: given noisy observations $y = Ax + w$ with $A \in \mathbb{R}^{m \times n}$, how well can we recover $x$? The least-squares answer is complete, and for $m \ge n$ with $A^\top A$ invertible we recover $x$ exactly in the noiseless case.
The point is that many modern problems fall on the opposite side: $m \ll n$. A camera takes 10-megapixel images, but the underlying scene has a few dozen edges. A medical MRI scan could take an hour if all $k$-space samples were acquired; patients cannot hold still that long. An RF imager observes a few hundred pilot responses but must locate scatterers on a grid of voxels. In each case the system is massively underdetermined: infinitely many $x$ fit the data.
What saves us is that the true $x$ is sparse, or nearly so. Sparsity is a structural prior that collapses the infinite solution set to (hopefully) a unique point. This chapter makes that intuition rigorous. We prove that, under certain random designs, a number of measurements proportional to $s \log(n/s)$, far fewer than $n$, suffices to recover an $s$-sparse vector exactly via a tractable convex program.
Definition: $s$-Sparse Vector
A vector $x \in \mathbb{R}^n$ is called $s$-sparse if it has at most $s$ nonzero entries, i.e., $\|x\|_0 := |\{i : x_i \neq 0\}| \le s$. The set of all $s$-sparse vectors is denoted $\Sigma_s := \{x \in \mathbb{R}^n : \|x\|_0 \le s\}$.
The quantity $\|x\|_0$ is called the "$\ell_0$ norm" by convention, but it is not a norm: it fails homogeneity ($\|\alpha x\|_0 = \|x\|_0$ for all $\alpha \neq 0$) and the set $\Sigma_s$ is a non-convex union of linear subspaces.
Definition: Compressible Vector
A vector $x \in \mathbb{R}^n$ is compressible if its sorted magnitudes decay rapidly. Formally, if $|x|_{(1)} \ge |x|_{(2)} \ge \cdots \ge |x|_{(n)}$ denote the order statistics of $|x_1|, \dots, |x_n|$, then $x$ is compressible with exponent $p$ if $|x|_{(k)} \le C\, k^{-1/p}$ for some constant $C > 0$.
Real signals (images in wavelet bases, audio in the DCT) are rarely exactly sparse but are typically compressible. The $s$-term approximation error $\|x - x_s\|_2$, where $x_s$ keeps the $s$ largest entries, decays like $s^{1/2 - 1/p}$ for $p < 2$.
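The decay estimate is easy to check numerically. The following sketch (the dimension, decay exponent, and random seed are illustrative choices, not values from the text) builds a compressible vector with $|x|_{(k)} \propto k^{-1/p}$ and compares the $s$-term approximation error against the $s^{1/2 - 1/p}$ rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.5                       # p < 2, so the s^(1/2 - 1/p) rate applies
k = np.arange(1, n + 1, dtype=float)
x = rng.permutation(k ** (-1.0 / p))   # magnitudes ~ k^(-1/p), in random positions

for s in (10, 50, 200):
    xs = x.copy()
    drop = np.argsort(np.abs(x))[: n - s]   # indices of all but the s largest entries
    xs[drop] = 0.0                          # best s-term approximation x_s
    err = np.linalg.norm(x - xs)
    rate = s ** (0.5 - 1.0 / p)
    print(f"s = {s:4d}   ||x - x_s||_2 = {err:.5f}   s^(1/2 - 1/p) = {rate:.5f}")
```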
Definition: Sparse Recovery Problem
Let $A \in \mathbb{R}^{m \times n}$ with $m < n$, and let $x \in \Sigma_s$. The sparse recovery problem is: given $y = Ax + w$, where $w$ is a noise vector with $\|w\|_2 \le \varepsilon$, produce an estimate $\hat{x}$ such that $\|\hat{x} - x\|_2$ is small.
In the noiseless case ($w = 0$, $\varepsilon = 0$) we seek exact recovery: $\hat{x} = x$.
Infinitely Many Candidates
Without any prior, the equation $Ax = y$ with $m < n$ has either no solutions (inconsistent) or an affine subspace of solutions of dimension $n - \operatorname{rank}(A) \ge n - m$. Every $x$ in that affine subspace is consistent with the data. Sparsity picks out one.
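To see the ambiguity concretely, the following sketch (sizes and seed chosen for illustration) parameterizes the full solution set of an underdetermined system: the minimum-norm least-squares solution plus any null-space vector is equally consistent with $y$, and the least-squares solution is generically dense rather than sparse.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10
A = rng.standard_normal((m, n))                       # full row rank with probability one
x_true = np.zeros(n); x_true[[2, 7]] = [1.5, -2.0]    # a 2-sparse ground truth
y = A @ x_true

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)          # minimum-norm solution: consistent, but dense
N = np.linalg.svd(A)[2][m:].T                         # columns span null(A); shape (n, n - m)

x_other = x_ls + N @ rng.standard_normal(n - m)       # any null-space shift is also feasible
print(np.allclose(A @ x_ls, y), np.allclose(A @ x_other, y))   # True True
print("nonzeros in least-squares solution:", np.count_nonzero(np.abs(x_ls) > 1e-8))
```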
Theorem: $\ell_0$ Minimization: The Ideal but Intractable Recovery
Consider the noiseless problem $y = Ax^\star$ with $x^\star \in \Sigma_s$. If every $2s$ columns of $A$ are linearly independent (equivalently, $\operatorname{spark}(A) > 2s$), then $x^\star$ is the unique solution of
$$
(P_0): \qquad \min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{subject to} \quad Ax = y.
$$
If two different $s$-sparse vectors both satisfied $Ax = y$, then their difference would be $2s$-sparse and lie in the null space of $A$. If no $2s$ columns are linearly dependent, the only such vector is zero.
Suppose two $s$-sparse solutions exist
Let $x_1, x_2 \in \Sigma_s$ both satisfy $Ax_1 = Ax_2 = y$. Then $A(x_1 - x_2) = 0$ and $\|x_1 - x_2\|_0 \le 2s$.
Use the spark condition
By hypothesis, every collection of $2s$ columns of $A$ is linearly independent. This means the only vector in $\ker(A)$ with support of size at most $2s$ is $0$. Hence $x_1 - x_2 = 0$, i.e., $x_1 = x_2$.
Uniqueness in $(P_0)$
Any minimizer $\hat{x}$ of $(P_0)$ satisfies $\|\hat{x}\|_0 \le \|x^\star\|_0 \le s$, so it is $s$-sparse. By the previous step, it must equal $x^\star$. Hence $x^\star$ is the unique optimum.
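The uniqueness argument suggests a (hopelessly slow) algorithm: enumerate supports in order of increasing size and return the first feasible one. The sketch below (the dimensions and ground truth are illustrative assumptions) does exactly that, recovering an $s$-sparse vector exactly when $m \ge 2s$ and a Gaussian $A$ satisfies the spark condition, while also making the $\binom{n}{s}$ combinatorial cost plain.

```python
import numpy as np
from itertools import combinations

def l0_recover(A, y, s, tol=1e-10):
    """Brute-force (P0): return the first feasible x with the smallest support."""
    m, n = A.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(n)
    for k in range(1, s + 1):                      # sparsity levels 1, ..., s
        for S in combinations(range(n), k):        # C(n, k) candidate supports
            AS = A[:, list(S)]
            xS, *_ = np.linalg.lstsq(AS, y, rcond=None)
            if np.linalg.norm(AS @ xS - y) <= tol: # support S is feasible
                x = np.zeros(n)
                x[list(S)] = xS
                return x
    return None

rng = np.random.default_rng(2)
m, n, s = 8, 20, 2                                 # m >= 2s: Gaussian A has spark > 2s a.s.
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[[3, 11]] = [2.0, -1.0]
x_hat = l0_recover(A, A @ x_true, s)
print("exact recovery:", np.allclose(x_hat, x_true))
```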
Theorem: $\ell_0$ Minimization is NP-Hard
The problem $(P_0)$ (minimizing $\|x\|_0$ subject to $Ax = y$) is NP-hard in general. More precisely, no algorithm solves all instances in time polynomial in $(m, n)$ unless $\mathsf{P} = \mathsf{NP}$.
The feasible set $\{x : Ax = y\}$ is an affine subspace; the objective counts nonzero entries. Solving $(P_0)$ by exhaustive search means examining all $\binom{n}{s}$ supports of size $s$ and checking feasibility on each, which is combinatorial.
The proof proceeds by reduction from the NP-hard problem of exact cover by 3-sets.
Reduction from exact cover by 3-sets (sketch)
Given an instance of exact-3-cover (a ground set $U$ with $|U| = 3q$ elements and a collection of 3-element subsets $S_1, \dots, S_N \subseteq U$), construct $A \in \{0,1\}^{|U| \times N}$ with one column per subset, whose rows are incidence indicators. Let $y = \mathbf{1}$ (the all-ones vector on $U$). A vector $x$ with $Ax = \mathbf{1}$ and $\|x\|_0 \le q$ corresponds to an exact cover. Hence solving $(P_0)$ answers the NP-hard decision problem, so $(P_0)$ is at least as hard.
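A toy instance makes the construction concrete. The sketch below (the ground set and subsets are a made-up example, not from the text) builds the incidence matrix, sets $y = \mathbf{1}$, and finds by enumeration a $q$-sparse solution that reads off an exact cover.

```python
import numpy as np
from itertools import combinations

U = range(6)                                     # ground set {0, ..., 5}, so q = |U| / 3 = 2
subsets = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 4, 5), (2, 3, 4)]

A = np.zeros((len(U), len(subsets)))
for j, S in enumerate(subsets):
    A[list(S), j] = 1.0                          # column j = incidence vector of subset j
y = np.ones(len(U))                              # the all-ones vector on U

q = len(U) // 3
cover = None
for T in combinations(range(len(subsets)), q):   # any q-sparse solution of A x = 1 ...
    xT, *_ = np.linalg.lstsq(A[:, list(T)], y, rcond=None)
    if np.allclose(A[:, list(T)] @ xT, y):
        cover = [subsets[j] for j in T]          # ... is an exact cover of U
        break
print("exact cover:", cover)                     # [(0, 1, 2), (3, 4, 5)]
```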
Consequence
We must either restrict the problem class (so that $(P_0)$ becomes tractable) or replace the $\ell_0$ objective with a convex surrogate. Sections 13.2–13.4 pursue the second route.
Example: Sparse Recovery in Dimension $n = 2$
Let $n = 2$, $m = 1$, and $A = \begin{bmatrix} a_1 & a_2 \end{bmatrix}$ with $a_1, a_2 \neq 0$. Suppose we observe $y \neq 0$ and know $x$ is $1$-sparse. Recover $x$.
Enumerate sparse candidates
A $1$-sparse vector in $\mathbb{R}^2$ has the form $(t, 0)$ or $(0, t)$ for some $t \in \mathbb{R}$.
Impose the constraint $Ax = y$
Case 1: $x = (t, 0)$, so $a_1 t = y$, giving $x = (y/a_1, 0)$.
Case 2: $x = (0, t)$, so $a_2 t = y$, giving $x = (0, y/a_2)$.
Both are $1$-sparse solutions
Both candidates satisfy the equation and are $1$-sparse. The problem $(P_0)$ has two solutions; the minimizer is not unique. This violates the spark condition: any two columns of $A$ are (trivially) dependent in $\mathbb{R}^1$, so $\operatorname{spark}(A) = 2 \le 2s$. We would need $m \ge 2$ and non-proportional columns to guarantee uniqueness for $s = 1$.
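A quick numeric check of this example (the values $a_1 = 1$, $a_2 = 2$, $y = 1$ are arbitrary illustrative choices, not from the text): both $1$-sparse candidates fit the single measurement, so $(P_0)$ has no unique solution, exactly the spark failure described above.

```python
import numpy as np

a1, a2, y = 1.0, 2.0, 1.0                # illustrative values (assumptions)
A = np.array([[a1, a2]])                 # one measurement, two unknowns

cand1 = np.array([y / a1, 0.0])          # support {1}
cand2 = np.array([0.0, y / a2])          # support {2}
for x in (cand1, cand2):
    print(x, " feasible:", np.allclose(A @ x, [y]), " ||x||_0 =", np.count_nonzero(x))
```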
Solutions: Counting Feasible Sparse Vectors
For a small system $y = A x^\star$ with random Gaussian $A \in \mathbb{R}^{m \times n}$, we vary $m$ and count how many distinct supports $S$ of size exactly $s$ yield a feasible $x$ (restricted to support $S$). Observe the transition from "many solutions" to "unique solution" as $m$ grows.
Parameters
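A sketch of the experiment is given below; the parameter values are placeholder assumptions, not the book's settings. For very small $m$, almost every size-$s$ support can explain the data; once $m$ is large enough, only the true support remains feasible.

```python
import math
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, s = 12, 2                                     # ambient dimension and sparsity (assumed)
x_true = np.zeros(n); x_true[[1, 7]] = [1.0, -1.0]
total = math.comb(n, s)                          # number of candidate supports

for m in range(1, 9):                            # sweep the number of measurements
    A = rng.standard_normal((m, n))
    y = A @ x_true
    feasible = 0
    for S in combinations(range(n), s):          # all supports of size exactly s
        AS = A[:, list(S)]
        xS, *_ = np.linalg.lstsq(AS, y, rcond=None)
        if np.linalg.norm(AS @ xS - y) <= 1e-10:
            feasible += 1
    print(f"m = {m}:  feasible supports of size {s}: {feasible} / {total}")
```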
Common Mistake: $\|\cdot\|_0$ is Not a Norm
Mistake:
Students often invoke the triangle inequality for $\|\cdot\|_0$ or treat it as a continuous objective, e.g., "the gradient of $\|x\|_0$".
Correction:
$\|\cdot\|_0$ is neither homogeneous ($\|\alpha x\|_0 = \|x\|_0$ for all $\alpha \neq 0$) nor continuous: it is discontinuous everywhere on the coordinate hyperplanes and has no useful subgradient. Any algorithmic approach requires a surrogate (the $\ell_1$ norm, iteratively reweighted $\ell_1$, or greedy methods like OMP).
Common Mistake: $A$ is the Sensing Matrix, not the Channel
Mistake:
Writing the CS model as $y = Hx + w$, confusing it with MIMO channel notation.
Correction:
In this book we reserve $H$ for wireless channels and $A$ for sensing/measurement matrices in compressed sensing and imaging. This follows Caire's convention and prevents notational collisions when both appear in the same analysis (e.g., a CS problem posed in the OFDM frequency domain).
Key Takeaway
Sparse recovery replaces the question "which $x$ fits the data?" with "which sparse $x$ fits the data?" Combinatorial $\ell_0$ minimization yields the right answer when every $2s$ columns of $A$ are linearly independent, but it is NP-hard. The art of compressed sensing is to find tractable surrogates that succeed on the same problem instances.
Historical Note: The Birth of Compressed Sensing (2004–2006)
Although sparse recovery had been studied in geophysics (seismic deconvolution) and statistics (the LASSO, 1996) for decades, compressed sensing as a general theory crystallized in a remarkable 2004–2006 period. Candès, Romberg, and Tao proved that random Fourier samples could recover sparse signals via $\ell_1$ minimization with overwhelming probability. Simultaneously, Donoho introduced the term "compressed sensing" and developed the phase-transition analysis. The confluence of these results reframed sampling theory: Nyquist is a sufficient but not necessary rate when structure (sparsity) is available.
Spark of a matrix
The smallest number of columns of $A$ that are linearly dependent: $\operatorname{spark}(A) := \min\{\|z\|_0 : Az = 0,\ z \neq 0\}$. Unlike rank, spark is hard to compute. $\operatorname{spark}(A) > 2s$ is equivalent to uniqueness of $s$-sparse solutions.
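For tiny matrices the spark can be computed by brute force, which makes the uniqueness condition easy to check experimentally. A minimal sketch (the dimensions and sparsity level are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns (brute force; tiny A only)."""
    m, n = A.shape
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(S)], tol=tol) < k:
                return k
    return n + 1                                  # all columns independent

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 8))
s = 2
print("spark(A) =", spark(A))                     # generically m + 1 = 5 for Gaussian A
print("unique s-sparse solutions guaranteed (spark > 2s):", spark(A) > 2 * s)
```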