Greedy Algorithms: OMP, CoSaMP, IHT
The Case for Greediness
Convex relaxation solves $s$-sparse recovery by replacing the combinatorial $\ell_0$ penalty with $\ell_1$ and invoking ISTA/FISTA/ADMM. This works, but in three important regimes greedy algorithms are faster, simpler, and often more accurate: (i) known sparsity level $s$, so we can stop after exactly $s$ selections; (ii) very sparse signals ($s \ll n$), where the greedy search touches only a handful of columns of $A$; (iii) speed-critical applications like real-time channel estimation or radar, where even FISTA is too expensive. This section develops the three canonical greedy methods: OMP, CoSaMP, and IHT. All three share one idea: identify a candidate support, solve a restricted least-squares on it, update. They differ in how they build the support.
Definition: Restricted Least-Squares
Restricted Least-Squares
Given a support set $S \subseteq \{1, \dots, n\}$ with $|S| = k \le m$, the restricted least-squares estimate is $\hat{\mathbf{x}}_S = A_S^\dagger \mathbf{y}$, where $A_S \in \mathbb{R}^{m \times k}$ is the submatrix containing only the columns of $A$ indexed by $S$, and $A_S^\dagger = (A_S^\top A_S)^{-1} A_S^\top$ is its pseudo-inverse. The full estimate $\hat{\mathbf{x}} \in \mathbb{R}^n$ places these values on $S$ and zeros elsewhere.
All three greedy algorithms produce their final iterate by a restricted least-squares solve on their estimated support. The cost is a QR or Cholesky factorization of $A_S$, cheap when $|S|$ is small.
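As a concrete reference point, here is a minimal numpy sketch of this estimator (the function name `restricted_ls` is ours, not a standard library API):

```python
import numpy as np

def restricted_ls(A, y, S):
    """Least-squares restricted to the columns of A indexed by S."""
    S = sorted(S)
    x = np.zeros(A.shape[1])
    # lstsq factorizes the m-by-|S| submatrix -- cheap when |S| is small.
    coeffs, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
    x[S] = coeffs               # place LS values on S, zeros elsewhere
    return x
```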
OMP (Orthogonal Matching Pursuit)
Complexity: $O(mn)$ per iteration for the correlations $A^\top \mathbf{r}$, plus $O(m|S|)$ for the incremental QR of $A_S$. Much cheaper than FISTA when $s \ll n$. The name orthogonal refers to the least-squares step: after each selection we re-solve LS over the current support, which makes the residual orthogonal to every column in $A_{S_t}$. Plain matching pursuit (Mallat-Zhang 1993) skipped this orthogonalization and was forced to revisit coordinates; OMP never revisits a chosen index.
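A sketch of the full OMP loop, reusing the `restricted_ls` sketch from above. For clarity it re-solves the LS from scratch each iteration rather than maintaining an incremental QR; the residual tolerance `tol` is an illustrative stopping choice:

```python
import numpy as np

def omp(A, y, s, tol=1e-10):
    """Orthogonal Matching Pursuit: s greedy picks + restricted LS."""
    r = y.copy()
    S = []
    x = np.zeros(A.shape[1])
    for _ in range(s):
        # Identification: column most correlated with the residual.
        i = int(np.argmax(np.abs(A.T @ r)))
        S.append(i)
        # Orthogonalization: LS on the current support makes the
        # residual orthogonal to every selected column.
        x = restricted_ls(A, y, S)
        r = y - A @ x
        if np.linalg.norm(r) < tol:   # residual at the noise floor
            break
    return x, S
```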
Theorem: OMP Exact Recovery under Coherence
Let $A \in \mathbb{R}^{m \times n}$ have unit-norm columns and coherence $\mu = \max_{i \neq j} |\langle \mathbf{a}_i, \mathbf{a}_j \rangle|$. If $\mathbf{x}_\star$ is $s$-sparse with $\mu(2s-1) < 1$ and $\mathbf{y} = A\mathbf{x}_\star$ (noiseless), then OMP recovers the true support and $\mathbf{x}_\star$ exactly in $s$ steps.
Low coherence means the columns of $A$ are nearly orthogonal, so the inner product $|\langle \mathbf{a}_i, \mathbf{r} \rangle|$ peaks at a true-support index. OMP then identifies one correct index per iteration. The condition $\mu(2s-1) < 1$ ensures the peak is always on a correct index: the "interference" from other active coordinates cannot overpower the signal of the unselected true index.
Induction on the selected support
Claim: at iteration $t$, $S_t \subseteq \mathrm{supp}(\mathbf{x}_\star)$. Base case $t = 0$: $S_0 = \emptyset$, trivially true. Assume true for $t$.
Residual in terms of remaining support
Because LS is solved on $S_t$, the residual satisfies $\mathbf{r}_t = \mathbf{y} - A_{S_t} \hat{\mathbf{x}}_{S_t} = A \mathbf{z}_t$, where $\mathbf{z}_t$ is supported on $\mathrm{supp}(\mathbf{x}_\star)$ (and possibly scaled; details in Tropp 2004).
Identification peaks on true support
For a true index $i \in \mathrm{supp}(\mathbf{z}_t)$, $\max_i |\langle \mathbf{a}_i, \mathbf{r}_t \rangle| \ge \big(1 - (s-1)\mu\big) \|\mathbf{z}_t\|_\infty$. For an off-support index $j$, $|\langle \mathbf{a}_j, \mathbf{r}_t \rangle| \le s\mu \|\mathbf{z}_t\|_\infty$. Under $\mu(2s-1) < 1$, the true peak strictly dominates, so the next selected index is on the true support.
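In one line, the bookkeeping behind this step:

$$
\max_{i \in \mathrm{supp}(\mathbf{z}_t)} |\langle \mathbf{a}_i, \mathbf{r}_t \rangle|
\;\ge\; \big(1 - (s-1)\mu\big) \|\mathbf{z}_t\|_\infty
\;>\; s\mu\, \|\mathbf{z}_t\|_\infty
\;\ge\; \max_{j \notin \mathrm{supp}(\mathbf{x}_\star)} |\langle \mathbf{a}_j, \mathbf{r}_t \rangle| ,
$$

where the strict middle inequality is exactly the condition $\mu(2s-1) < 1$.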
Termination
After $s$ iterations, $S_s = \mathrm{supp}(\mathbf{x}_\star)$, and LS on the exact support recovers $\mathbf{x}_\star$ exactly.
CoSaMP (Compressive Sampling Matching Pursuit)
Complexity: per iteration, $O(mn)$ for the correlation $A^\top \mathbf{r}$ plus an LS solve on the merged support of size at most $3s$ (roughly $O(ms^2)$ via QR). Linear convergence under RIP: reaches accuracy $\epsilon$ in $O(\log(1/\epsilon))$ iterations. CoSaMP (Needell-Tropp 2008) differs from OMP in two ways: it adds $2s$ candidates per iteration (not 1), and it can remove wrongly-selected indices via the final hard threshold $H_s(\cdot)$. This makes CoSaMP robust to noise and RIP-friendly. It converges in logarithmically many iterations, while OMP needs $s$.
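A compact sketch of the CoSaMP loop, following the identify-merge-prune recipe just described (`max_iter` and `tol` are illustrative choices):

```python
import numpy as np

def cosamp(A, y, s, max_iter=30, tol=1e-10):
    """CoSaMP: merge 2s candidates, LS on <= 3s indices, prune to s."""
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()
    for _ in range(max_iter):
        proxy = A.T @ r
        Omega = np.argsort(np.abs(proxy))[-2 * s:]   # 2s new candidates
        T = np.union1d(Omega, np.flatnonzero(x))     # merged support, <= 3s
        # LS on the merged support, then hard-threshold back to s entries.
        b = np.zeros(n)
        coeffs, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
        b[T] = coeffs
        keep = np.argsort(np.abs(b))[-s:]            # hard threshold H_s
        x = np.zeros(n)
        x[keep] = b[keep]
        r = y - A @ x
        if np.linalg.norm(r) < tol:
            break
    return x
```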
IHT (Iterative Hard Thresholding)
Complexity: $O(mn)$ per iteration (one mat-vec each with $A$ and $A^\top$). Linear convergence under RIP: $\|\mathbf{x}_t - \mathbf{x}_\star\|_2 \le \rho^t \|\mathbf{x}_0 - \mathbf{x}_\star\|_2 + c\,\|\mathbf{e}\|_2$ for some $\rho < 1$. IHT is the combinatorial sibling of ISTA: replace the soft-threshold with the hard-threshold $H_s$ at sparsity $s$. It attacks the non-convex problem directly, $\min_{\mathbf{x}} \|\mathbf{y} - A\mathbf{x}\|_2^2$ s.t. $\|\mathbf{x}\|_0 \le s$, via the iteration $\mathbf{x}_{t+1} = H_s\big(\mathbf{x}_t + \eta\, A^\top(\mathbf{y} - A\mathbf{x}_t)\big)$. Convergence to the global minimum is guaranteed under RIP (Blumensath-Davies 2009).
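A sketch of the IHT iteration (the default $\eta = 1$ assumes unit-norm columns, per the convergence theorem below; `max_iter` and `tol` are illustrative):

```python
import numpy as np

def iht(A, y, s, eta=1.0, max_iter=200, tol=1e-10):
    """Iterative Hard Thresholding: gradient step, keep s largest entries."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        g = x + eta * (A.T @ (y - A @ x))   # gradient step on 0.5*||y - Ax||^2
        keep = np.argsort(np.abs(g))[-s:]   # hard threshold H_s: s largest
        x = np.zeros_like(g)
        x[keep] = g[keep]
        if np.linalg.norm(y - A @ x) < tol:
            break
    return x
```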
Theorem: IHT Convergence under RIP
Let $A$ satisfy RIP of order $3s$ with constant $\delta_{3s} < 1/\sqrt{32}$. Let $\mathbf{x}_\star$ be $s$-sparse and $\mathbf{y} = A\mathbf{x}_\star + \mathbf{e}$. Then the IHT iterates with step size $\eta = 1$ satisfy $\|\mathbf{x}_t - \mathbf{x}_\star\|_2 \le 2^{-t} \|\mathbf{x}_\star\|_2 + 5\,\|\mathbf{e}\|_2$.
Linear convergence to a noise-floor plateau. Each iteration contracts the error by a factor of $1/2$; the residual noise leaves an unavoidable noise floor proportional to $\|\mathbf{e}\|_2$. Same qualitative story as CoSaMP.
Error recursion
Let $\mathbf{d}_t = \mathbf{x}_t - \mathbf{x}_\star$. After the gradient step (with $\eta = 1$): $\mathbf{x}_t + A^\top(\mathbf{y} - A\mathbf{x}_t) = \mathbf{x}_t + A^\top A(\mathbf{x}_\star - \mathbf{x}_t) + A^\top \mathbf{e} = \mathbf{x}_\star + (I - A^\top A)\,\mathbf{d}_t + A^\top \mathbf{e}$.
Use RIP on 3s-sparse vectors
$H_s$ projects onto a support of size $s$. The error $\mathbf{d}_t$ is supported on $\mathrm{supp}(\mathbf{x}_t) \cup \mathrm{supp}(\mathbf{x}_\star)$: at most $2s$ nonzeros. The "interference" matrix $I - A^\top A$ acts on vectors of support size at most $3s$, so by RIP its operator norm on these is at most $\delta_{3s}$. The hard-threshold loses at most a factor of $2$ by standard best-$s$-approximation arguments, yielding the recursion $\|\mathbf{d}_{t+1}\|_2 \le \sqrt{8}\,\delta_{3s}\,\|\mathbf{d}_t\|_2 + c\,\|\mathbf{e}\|_2$; with $\delta_{3s} < 1/\sqrt{32}$, the contraction factor is below $1/2$.
Unfold recursion
Iterating gives the geometric-series bound in the statement.
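Explicitly, with contraction factor $\rho = \sqrt{8}\,\delta_{3s} < 1/2$ and initialization $\mathbf{x}_0 = \mathbf{0}$ (so $\mathbf{d}_0 = -\mathbf{x}_\star$):

$$
\|\mathbf{d}_t\|_2 \;\le\; \rho^t \|\mathbf{d}_0\|_2 + c \sum_{k=0}^{t-1} \rho^k\, \|\mathbf{e}\|_2
\;\le\; 2^{-t} \|\mathbf{x}_\star\|_2 + \frac{c}{1-\rho}\, \|\mathbf{e}\|_2 .
$$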
OMP: Residual Energy vs. Iteration
Watch the residual drop as OMP greedily picks columns. With low coherence, the residual hits zero exactly at iteration $s$. With high coherence (or noise), the residual plateaus at the noise floor.
Support Recovery: Greedy vs. LASSO
Sweep the true sparsity level $s$ and plot the probability that each algorithm recovers the exact support. Compares OMP, CoSaMP, IHT, and LASSO/FISTA on random Gaussian sensing matrices.
OMP: Selecting Columns of $A$ One at a Time
Example: OMP on a 3-Column Problem
Let $A \in \mathbb{R}^{3 \times 3}$ with unit-norm columns $\mathbf{a}_1 = (1, 0, 0)^\top$, $\mathbf{a}_2 = (0, 1, 0)^\top$, $\mathbf{a}_3 = (0.6, 0, 0.8)^\top$. Let $\mathbf{x}_\star = (2, 0, 1)^\top$ (so $s = 2$, with support $\{1, 3\}$) and $\mathbf{y} = A\mathbf{x}_\star = (2.6, 0, 0.8)^\top$. Execute two iterations of OMP.
Iteration 1: correlations
$\mathbf{r}_0 = \mathbf{y}$. Correlations $c_i = \langle \mathbf{a}_i, \mathbf{r}_0 \rangle$. Compute each: $c_1 = 2.6$; $c_2 = 0$; $c_3 = 0.6 \cdot 2.6 + 0.8 \cdot 0.8 = 2.2$. Largest magnitude: $|c_1| = 2.6$, so $i_1 = 1$, $S_1 = \{1\}$. Correct true-support pick.
Iteration 1: residual
LS on $S_1 = \{1\}$: $\hat{x}_1 = \langle \mathbf{a}_1, \mathbf{y} \rangle = 2.6$. $\mathbf{r}_1 = \mathbf{y} - 2.6\,\mathbf{a}_1 = (0, 0, 0.8)^\top$. Note the LS on $S_1$ gave $\hat{x}_1 = 2.6$, not the true $x_{\star,1} = 2$, because column 3 was non-orthogonal to column 1.
Iteration 2: correlations
Compute $c_i = \langle \mathbf{a}_i, \mathbf{r}_1 \rangle$. $c_1$: $0$ (residual orthogonal to selected column, as it should be). $c_2$: $0$. $c_3$: $0.8 \cdot 0.8 = 0.64$. Largest: $|c_3| = 0.64$, so $i_2 = 3$, $S_2 = \{1, 3\}$. Both true-support indices now selected.
Iteration 2: final LS recovers $\mathbf{x}_\star$
LS on $S_2 = \{1, 3\}$ with exact support gives $(\hat{x}_1, \hat{x}_3) = (2, 1)$ exactly (the true coefficients), hence $\hat{\mathbf{x}} = \mathbf{x}_\star$. The residual is zero, and OMP terminates with exact recovery after $s = 2$ steps.
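The two iterations can be verified numerically with the `omp` sketch from earlier (note numpy indices are zero-based, so support $\{1, 3\}$ appears as `[0, 2]`):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.8]])      # unit-norm columns, <a1, a3> = 0.6
x_star = np.array([2.0, 0.0, 1.0])   # 2-sparse, support {1, 3}
y = A @ x_star                       # (2.6, 0, 0.8)

x_hat, S = omp(A, y, s=2)            # omp() as sketched above
print(S)        # [0, 2] -- the true support, zero-based
print(x_hat)    # [2. 0. 1.] -- exact recovery, zero residual
```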
When to Pick Greedy Over LASSO
Rule of thumb: if $s$ is known (a priori or from a reliable estimator) and the measurement matrix has low coherence or satisfies strong RIP, greedy methods are faster and simpler. If $s$ is unknown, or the measurement matrix has high coherence, LASSO/FISTA is more forgiving. A practical compromise: run OMP first to identify a support, then refine with a debiased LS solve on that support (the "OMP-debiasing" trick), combining the speed of greedy selection with the unbiased coefficients of LS; see the sketch below.
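A two-line sketch of the compromise, reusing the earlier `omp` and `restricted_ls` sketches (`s_hat` is a hypothetical sparsity estimate, assumed in scope along with `A` and `y`):

```python
# Greedy support identification, then an unbiased LS re-fit on that support.
_, S = omp(A, y, s_hat)
x_debiased = restricted_ls(A, y, S)
```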
- β’
OMP is not robust to measurement noise once the residual drops below the noise floor β stop early.
- β’
CoSaMP and IHT need RIP, which is hard to verify but holds w.h.p. for random Gaussian/Bernoulli matrices.
- β’
For streaming data, IHT with small step size acts as an online sparse estimator.
OMP vs. CoSaMP vs. IHT vs. LASSO
| Algorithm | Needs $s$? | Per-Iter Cost | Iterations | Guarantee |
|---|---|---|---|---|
| OMP | yes (usually) | $O(mn)$ | $s$ | Coherence / ERC |
| CoSaMP | yes | $O(mn)$ | $O(\log(1/\epsilon))$ | RIP-$4s$ |
| IHT | yes | $O(mn)$ | $O(\log(1/\epsilon))$ | RIP-$3s$ |
| LASSO / FISTA | no | $O(mn)$ | $O(1/\sqrt{\epsilon})$ | RIP-$2s$ |
Common Mistake: Running OMP on a High-Coherence Dictionary
Mistake:
Applying OMP to an overcomplete dictionary with coherence $\mu$ close to $1$ and expecting exact support recovery.
Correction:
The ERC guarantee degrades rapidly as $\mu \to 1$. For $\mu = 0.5$, only $s = 1$ is safe ($\mu(2s-1) < 1$ fails at $s = 2$), so OMP will almost certainly pick a wrong index at iteration 2. Either pre-process with dictionary decoherence (e.g. Gram-Schmidt), or use LASSO, which is forgiving of coherence when the signal has distinguishable magnitudes.
Common Mistake: IHT Step Size Too Small
Mistake:
Choosing $\eta = 1/\lambda_{\max}(A^\top A)$ with $\lambda_{\max} \gg 1$, expecting ISTA-like convergence.
Correction:
IHT's RIP convergence analysis requires $\eta = 1$, assuming $A$ has columns normalized to unit $\ell_2$ norm. With $\eta \ll 1$, convergence is typically much slower. Rescale $A$ so that its columns have unit norm, then use $\eta = 1$, as sketched below.
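A sketch of the rescaling, assuming `A`, `y`, `s` are in scope and `iht` is the sketch from earlier:

```python
import numpy as np

norms = np.linalg.norm(A, axis=0)    # column l2 norms
A_unit = A / norms                   # unit-norm columns
x_unit = iht(A_unit, y, s, eta=1.0)  # the theorem's step size now applies
x_hat = x_unit / norms               # undo the column rescaling
```

This works because $\mathbf{y} = A\mathbf{x} = A_{\text{unit}} (\text{norms} \odot \mathbf{x})$, so dividing the recovered coefficients by the column norms restores the original scale.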
Historical Note: Matching Pursuit, Orthogonal MP, and Compressed Sensing
1993-2009. Matching Pursuit was introduced by Mallat and Zhang (1993) in the context of wavelet dictionaries. Pati, Rezaiifar and Krishnaprasad (1993) and independently Davis, Mallat and Avellaneda (1994) added the orthogonalization step, producing OMP. Its role exploded in the 2000s with compressed sensing: Tropp's "Greed is Good" (2004) gave the first modern coherence analysis. CoSaMP (Needell-Tropp 2008), Subspace Pursuit (Dai-Milenkovic 2009), and IHT (Blumensath-Davies 2009) emerged almost simultaneously, all motivated by the need for provable RIP-based guarantees matching those of BPDN/LASSO but with lower per-iteration cost.
OMP
Orthogonal Matching Pursuit. At each step, picks the column of $A$ most correlated with the current residual, adds it to the support, and solves restricted least-squares. Terminates after $s$ steps.
Related: OMP (Orthogonal Matching Pursuit)
CoSaMP
Compressive Sampling Matching Pursuit. Adds $2s$ candidates per iteration and uses a final hard-threshold to prune back to $s$. Linear convergence under RIP.
IHT
Iterative Hard Thresholding. Projected gradient descent onto the set of $s$-sparse vectors. Solves the non-convex $\ell_0$ problem and converges linearly under RIP.
Related: IHT (Iterative Hard Thresholding)
Quick Check
Which scenario favours OMP over LASSO?
Unknown sparsity level and high measurement noise.
Known small sparsity, low coherence, ultra-fast inference required.
Overcomplete dictionary with high mutual coherence.
Distributed data across multiple servers.
Greedy methods shine when $s$ is known and small: they touch only $s$ columns of $A$, far cheaper than FISTA's full mat-vecs.
Quick Check
IHT converges at what rate under RIP?
Sublinear $O(1/t)$
Sublinear $O(1/t^2)$
Linear: $\|\mathbf{x}_t - \mathbf{x}_\star\|_2 \le \rho^t \|\mathbf{x}_0 - \mathbf{x}_\star\|_2 + c\,\|\mathbf{e}\|_2$ for some $\rho < 1$
No convergence guarantee
Key Takeaway
Greedy algorithms trade the convex-optimization guarantee for combinatorial directness. OMP picks one column per iteration (total runtime $O(mns)$); CoSaMP and IHT pick / adjust many indices per iteration and converge in $O(\log(1/\epsilon))$ steps under RIP. When you know $s$, the sensing matrix is well-conditioned, and the SNR is moderate to high, greedy methods are often the fastest path to a sparse estimate.