Chapter 14 Summary

Key Points

1. ISTA is proximal gradient descent on the LASSO objective. The update $\mathbf{x}^{(t+1)} = \mathrm{soft}_{\lambda/L}\big(\mathbf{x}^{(t)} + L^{-1}\mathbf{A}^{H}(\mathbf{y}-\mathbf{A}\mathbf{x}^{(t)})\big)$ combines a gradient step on the smooth quadratic data term with the proximal operator of $\lambda\|\cdot\|_1$: componentwise soft thresholding. Convergence is $O(1/k)$ in objective value, the step size $1/L$ requires $L \geq \lambda_{\max}(\mathbf{A}^{H}\mathbf{A})$, and each iteration costs one pair of matrix-vector products, with $\mathbf{A}$ and with $\mathbf{A}^{H}$. (A minimal sketch follows this list.)

2. FISTA achieves $O(1/k^2)$ via Nesterov momentum. Adding a cheap momentum extrapolation $\mathbf{z}^{(t)} = \mathbf{x}^{(t)} + \tfrac{t-1}{t+2}(\mathbf{x}^{(t)}-\mathbf{x}^{(t-1)})$ before the ISTA step improves the rate from $O(1/k)$ to $O(1/k^2)$, for the cost of storing one previous iterate and no extra gradient evaluations. This matches Nesterov's lower bound for first-order methods on smooth convex problems: FISTA is optimal for its problem class. (See the FISTA sketch after this list.)

3. ADMM splits the problem to decouple the smooth and non-smooth terms. Introducing the constraint $\mathbf{z}=\mathbf{x}$ and alternating the $\mathbf{x}$-update (a linear least-squares solve), the $\mathbf{z}$-update (soft thresholding), and the dual update yields a robust solver with natural primal/dual residual stopping criteria. ADMM converges for any penalty parameter $\rho>0$; it trades this universality for a sometimes-slower rate than well-tuned FISTA. (See the ADMM sketch after this list.)

4. Greedy methods are cheap when sparsity is known. OMP selects one column per iteration (the one best correlated with the current residual), solves a least-squares problem on the active set, and updates the residual. CoSaMP and IHT maintain sparse iterates by hard thresholding. For very sparse signals, well-conditioned dictionaries, and tight sparsity budgets, greedy methods beat convex relaxations in both speed and accuracy. (See the OMP sketch after this list.)

5. Bayesian sparse recovery unifies these algorithms under one probabilistic umbrella. MAP estimation under a Laplace prior recovers LASSO; MAP under a spike-and-slab prior recovers the $\ell_0$ problem; the posterior mean is the MMSE estimator. Sparse Bayesian Learning replaces the intractable support enumeration with an ARD hyperparameter fit solved via EM, producing sparser, less-biased estimates than LASSO at cubic per-iteration cost. (See the SBL sketch after this list.)
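
As a concrete reference for point 1, here is a minimal NumPy sketch of ISTA on the LASSO objective. The names (`soft_threshold`, `ista`) and the default iteration count are illustrative choices, not taken from the chapter.

```python
import numpy as np

def soft_threshold(x, tau):
    # Prox of tau*||.||_1: shrink magnitudes toward zero, keep the sign/phase.
    # Written so it works for real and complex vectors alike.
    mag = np.abs(x)
    return np.maximum(mag - tau, 0.0) / np.where(mag > 0, mag, 1.0) * x

def ista(A, y, lam, n_iter=500):
    # Proximal gradient descent on 0.5*||y - A x||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # L = lambda_max(A^H A)
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)    # gradient of the smooth data term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Estimating $L$ via the spectral norm is a one-off $O(mn\min(m,n))$ cost; in practice a few power iterations give a cheaper upper bound.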
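
For point 2, a FISTA sketch under the same assumptions, using the $\tfrac{t-1}{t+2}$ momentum weights quoted in the summary; it reuses `soft_threshold` from the ISTA sketch above.

```python
def fista(A, y, lam, n_iter=500):
    # ISTA plus Nesterov momentum: O(1/k^2) in objective value.
    # Reuses soft_threshold() from the ISTA sketch above.
    L = np.linalg.norm(A, 2) ** 2
    x = x_prev = np.zeros(A.shape[1], dtype=A.dtype)
    for t in range(1, n_iter + 1):
        z = x + (t - 1) / (t + 2) * (x - x_prev)   # momentum extrapolation
        x_prev = x
        grad = A.conj().T @ (A @ z - y)            # gradient at the extrapolated point
        x = soft_threshold(z - grad / L, lam / L)
    return x
```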
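
For point 3, a minimal ADMM sketch with a scaled dual variable; the penalty `rho` and the stopping tolerance are illustrative defaults, and `soft_threshold` is again reused from above.

```python
def admm_lasso(A, y, lam, rho=1.0, n_iter=200, tol=1e-6):
    # Splitting min 0.5*||y - A x||^2 + lam*||z||_1 subject to x = z.
    n = A.shape[1]
    M = A.conj().T @ A + rho * np.eye(n)   # x-update system, fixed across iterations
    Ahy = A.conj().T @ y
    z = u = np.zeros(n, dtype=A.dtype)
    for _ in range(n_iter):
        x = np.linalg.solve(M, Ahy + rho * (z - u))   # linear least-squares solve
        z_old = z
        z = soft_threshold(x + u, lam / rho)          # prox of the l1 term
        u = u + x - z                                 # scaled dual update
        # Stop when both the primal and the dual residual are small.
        if np.linalg.norm(x - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

A production solver would pre-factor `M` (e.g. Cholesky) instead of re-solving each iteration; the plain `solve` keeps the sketch short.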
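
For point 4, an OMP sketch assuming the sparsity level `k` is known and the columns of `A` are (approximately) unit-norm so the correlation test is fair.

```python
def omp(A, y, k):
    # Orthogonal Matching Pursuit: grow the support one column at a time.
    support = []
    residual = y
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(j)
        # Least-squares fit on the active set, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=A.dtype)
    x[support] = coef
    return x
```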
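
Finally, for point 5, an EM sketch of Sparse Bayesian Learning with ARD hyperparameters $\gamma$. The noise variance `sigma2`, iteration count, and pruning threshold are assumptions; the $n \times n$ inverse in the E-step is what gives the cubic per-iteration cost the summary mentions.

```python
def sbl(A, y, sigma2=1e-2, n_iter=50, prune=1e-8):
    # EM for the ARD model: x_i ~ N(0, gamma_i), y = A x + Gaussian noise.
    # A hyperparameter gamma_i shrinking to zero prunes coefficient i.
    n = A.shape[1]
    gamma = np.ones(n)
    for _ in range(n_iter):
        # E-step: Gaussian posterior over x (one n x n inverse -> cubic cost).
        Sigma = np.linalg.inv(A.conj().T @ A / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ A.conj().T @ y / sigma2
        # M-step: EM update of the ARD hyperparameters.
        gamma = np.maximum(np.abs(mu) ** 2 + np.real(np.diag(Sigma)), prune)
    mu[gamma <= prune] = 0.0   # zero out effectively-pruned coefficients
    return mu
```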

Looking Ahead

Chapter 15 puts these algorithms to work on wireless problems (sparse channel estimation, massive random access, sparse DOA estimation), where the choice of algorithm depends critically on coherence, real-time constraints, and signal structure. Later chapters (18, 20) return to these algorithms as unrolled message-passing schemes and as the starting points for deep unfolding.