Chapter 14 Summary
Key Points
1. ISTA is proximal gradient descent on the LASSO objective. The update combines a gradient step on the smooth quadratic data term with the proximal operator of the $\ell_1$ penalty: componentwise soft thresholding. Convergence is $O(1/k)$ in objective value, requires a step size $t \le 1/L$ with $L = \|A\|_2^2$, and the per-iteration cost is dominated by matrix-vector products with $A$ and $A^{\mathsf T}$ (see the ISTA sketch after this list).
2. FISTA achieves $O(1/k^2)$ convergence via Nesterov momentum. Adding a cheap momentum extrapolation before the ISTA step improves the rate from $O(1/k)$ to $O(1/k^2)$ at the cost of one extra stored vector and no extra gradient evaluations. This matches the Nesterov lower bound for smooth convex optimization, so FISTA is optimal for its problem class (see the FISTA sketch below).
3. ADMM splits the problem to decouple the smooth and non-smooth terms. Introducing an auxiliary variable $z$ with the constraint $x = z$ and alternating the $x$-update (a linear least-squares solve), the $z$-update (soft thresholding), and the dual update yields a robust solver with primal/dual residual stopping criteria. ADMM converges for any penalty parameter $\rho > 0$; it trades this universality for a sometimes-slower rate than well-tuned FISTA (see the ADMM sketch below).
4. Greedy methods are cheap when the sparsity level is known. OMP selects one column per iteration (the one most correlated with the current residual), solves a least-squares problem on the active set, and updates the residual. CoSaMP and IHT maintain sparse iterates by hard thresholding. For very sparse signals, well-conditioned dictionaries, and tight sparsity budgets, greedy methods can beat convex relaxations in both speed and accuracy (see the OMP sketch below).
5. Bayesian sparse recovery unifies these algorithms under one probabilistic umbrella. MAP estimation under a Laplace prior recovers LASSO; MAP under a spike-and-slab prior recovers $\ell_0$-penalized regression; the posterior mean is the Bayesian MMSE estimate. Sparse Bayesian Learning replaces the intractable support enumeration with an ARD hyperparameter fit solved via EM, producing sparser, less-biased estimates than LASSO at cubic per-iteration cost (see the SBL sketch below).
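Algorithm Sketches
As a concrete reference, here is a minimal NumPy sketch of the ISTA iteration from Key Point 1, assuming the LASSO objective $\tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$; the names `ista` and `soft_threshold` are illustrative, not notation from the chapter.

```python
import numpy as np

def soft_threshold(v, tau):
    """Componentwise soft thresholding: the prox of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by proximal gradient descent."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth term's gradient
    t = 1.0 / L                          # step size t <= 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)         # gradient step on the quadratic data term
        x = soft_threshold(x - t * grad, t * lam)
    return x
```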
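Under the same assumptions, a sketch of FISTA from Key Point 2; the only changes relative to ISTA are the extrapolated point `z` and the standard momentum-parameter update, and `fista` is again an illustrative name.

```python
import numpy as np

def soft_threshold(v, tau):               # as in the ISTA sketch above
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, y, lam, n_iter=500):
    """ISTA plus Nesterov momentum: O(1/k^2) convergence in objective value."""
    L = np.linalg.norm(A, 2) ** 2
    t = 1.0 / L
    x = np.zeros(A.shape[1])
    z = x.copy()                           # extrapolated point fed to the ISTA step
    s = 1.0                                # momentum parameter
    for _ in range(n_iter):
        x_new = soft_threshold(z - t * (A.T @ (A @ z - y)), t * lam)
        s_new = (1.0 + np.sqrt(1.0 + 4.0 * s ** 2)) / 2.0
        z = x_new + ((s - 1.0) / s_new) * (x_new - x)   # cheap momentum extrapolation
        x, s = x_new, s_new
    return x
```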
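A sketch of the ADMM solver from Key Point 3 in scaled dual form, assuming the split $x = z$. Caching the Cholesky factor of $A^{\mathsf T}A + \rho I$ is one common design choice, since the $x$-update solves the same normal equations every iteration.

```python
import numpy as np

def soft_threshold(v, tau):               # as in the ISTA sketch above
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_lasso(A, y, lam, rho=1.0, n_iter=200, eps=1e-6):
    """Scaled-form ADMM for LASSO with the split x = z."""
    n = A.shape[1]
    Aty = A.T @ y
    # Factor (A^T A + rho I) once and reuse it in every x-update.
    Lc = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                        # scaled dual variable
    for _ in range(n_iter):
        rhs = Aty + rho * (z - u)
        x = np.linalg.solve(Lc.T, np.linalg.solve(Lc, rhs))  # x-update: least squares
        z_old = z
        z = soft_threshold(x + u, lam / rho)                 # z-update: soft thresholding
        u = u + x - z                                        # dual update
        # primal/dual residual stopping criteria
        if np.linalg.norm(x - z) < eps and np.linalg.norm(rho * (z - z_old)) < eps:
            break
    return z
```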
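A sketch of OMP from Key Point 4, assuming the sparsity level $k$ is known; each pass runs the select/solve/update cycle described above.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit for a known sparsity level k."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # select the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # least-squares solve restricted to the active set
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```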
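A sketch of the EM loop for Sparse Bayesian Learning from Key Point 5, assuming a known noise variance; the pruning threshold `prune_above` is an illustrative choice, and the $n \times n$ matrix inverse in the E-step is the cubic per-iteration cost noted above.

```python
import numpy as np

def sbl_em(A, y, noise_var=1e-2, n_iter=50, prune_above=1e8):
    """Sparse Bayesian Learning: EM updates of per-coefficient ARD precisions."""
    n = A.shape[1]
    alpha = np.ones(n)                     # ARD precision hyperparameters, one per coefficient
    mu = np.zeros(n)
    for _ in range(n_iter):
        # E-step: Gaussian posterior over x given the current hyperparameters;
        # the n x n inverse is the cubic per-iteration cost.
        Sigma = np.linalg.inv(A.T @ A / noise_var + np.diag(alpha))
        mu = Sigma @ (A.T @ y) / noise_var
        # M-step: re-estimate each precision from the posterior moments
        alpha = 1.0 / (mu ** 2 + np.diag(Sigma))
    mu[alpha > prune_above] = 0.0          # zero out coefficients whose precision diverged
    return mu
```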
Looking Ahead
Chapter 15 puts these algorithms to work on wireless problems (sparse channel estimation, massive random access, sparse DOA estimation), where the choice of algorithm depends critically on coherence, real-time constraints, and signal structure. Later chapters (18 and 20) return to these algorithms as unrolled message-passing schemes and as the starting points for deep unfolding.