Chapter 14 Summary

Key Points

1. ISTA is proximal gradient descent on the LASSO objective. The update $\mathbf{x}^{(t+1)} = \mathrm{soft}_{\lambda/L}\big(\mathbf{x}^{(t)} + L^{-1}\mathbf{A}^{H}(\mathbf{y}-\mathbf{A}\mathbf{x}^{(t)})\big)$ combines a gradient step on the smooth quadratic data term with the proximal operator of $\lambda\|\cdot\|_1$: componentwise soft thresholding. Convergence is $O(1/k)$ in objective value, the step size $1/L$ requires $L \geq \lambda_{\max}(\mathbf{A}^{H}\mathbf{A})$, and each iteration costs one pair of matrix-vector products, with $\mathbf{A}$ and with $\mathbf{A}^{H}$. (A minimal sketch follows this list.)

2. FISTA achieves $O(1/k^2)$ via Nesterov momentum. Adding a cheap momentum extrapolation $\mathbf{z}^{(t)} = \mathbf{x}^{(t)} + \tfrac{t-1}{t+2}(\mathbf{x}^{(t)}-\mathbf{x}^{(t-1)})$ before the ISTA step improves the rate from $O(1/k)$ to $O(1/k^2)$, for the cost of storing one previous iterate and no extra gradient evaluations. This matches Nesterov's lower bound for first-order methods on smooth convex problems: FISTA is optimal for its problem class. (See the FISTA sketch after this list.)

3. ADMM splits the problem to decouple the smooth and non-smooth terms. Introducing the constraint $\mathbf{z}=\mathbf{x}$ and alternating the $\mathbf{x}$-update (a linear least-squares solve), the $\mathbf{z}$-update (soft thresholding), and the dual update yields a robust solver with natural primal/dual residual stopping criteria. ADMM converges for any penalty parameter $\rho>0$; it trades this universality for a sometimes-slower rate than well-tuned FISTA. (See the ADMM sketch after this list.)

4. Greedy methods are cheap when sparsity is known. OMP selects one column per iteration (the one best correlated with the current residual), solves a least-squares problem on the active set, and updates the residual. CoSaMP and IHT maintain sparse iterates by hard thresholding. For very sparse signals, well-conditioned dictionaries, and tight sparsity budgets, greedy methods beat convex relaxations in both speed and accuracy. (See the OMP sketch after this list.)

5. Bayesian sparse recovery unifies these algorithms under one probabilistic umbrella. MAP estimation under a Laplace prior recovers LASSO; MAP under a spike-and-slab prior recovers the $\ell_0$ problem; the posterior mean is the MMSE estimator. Sparse Bayesian Learning replaces the intractable support enumeration with an ARD hyperparameter fit solved via EM, producing sparser, less-biased estimates than LASSO at cubic per-iteration cost. (See the SBL sketch after this list.)
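
As a concrete reference for point 1, here is a minimal NumPy sketch of ISTA on the LASSO objective. The names (`soft_threshold`, `ista`) and the default iteration count are illustrative choices, not taken from the chapter.

```python
import numpy as np

def soft_threshold(x, tau):
    # Prox of tau*||.||_1: shrink magnitudes toward zero, keep the sign/phase.
    # Written so it works for real and complex vectors alike.
    mag = np.abs(x)
    return np.maximum(mag - tau, 0.0) / np.where(mag > 0, mag, 1.0) * x

def ista(A, y, lam, n_iter=500):
    # Proximal gradient descent on 0.5*||y - A x||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # L = lambda_max(A^H A)
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)    # gradient of the smooth data term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Estimating $L$ via the spectral norm is a one-off $O(mn\min(m,n))$ cost; in practice a few power iterations give a cheaper upper bound.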
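
For point 2, a FISTA sketch under the same assumptions, using the $\tfrac{t-1}{t+2}$ momentum weights quoted in the summary; it reuses `soft_threshold` from the ISTA sketch above.

```python
def fista(A, y, lam, n_iter=500):
    # ISTA plus Nesterov momentum: O(1/k^2) in objective value.
    # Reuses soft_threshold() from the ISTA sketch above.
    L = np.linalg.norm(A, 2) ** 2
    x = x_prev = np.zeros(A.shape[1], dtype=A.dtype)
    for t in range(1, n_iter + 1):
        z = x + (t - 1) / (t + 2) * (x - x_prev)   # momentum extrapolation
        x_prev = x
        grad = A.conj().T @ (A @ z - y)            # gradient at the extrapolated point
        x = soft_threshold(z - grad / L, lam / L)
    return x
```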
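
For point 3, a minimal ADMM sketch with a scaled dual variable; the penalty `rho` and the stopping tolerance are illustrative defaults, and `soft_threshold` is again reused from above.

```python
def admm_lasso(A, y, lam, rho=1.0, n_iter=200, tol=1e-6):
    # Splitting min 0.5*||y - A x||^2 + lam*||z||_1 subject to x = z.
    n = A.shape[1]
    M = A.conj().T @ A + rho * np.eye(n)   # x-update system, fixed across iterations
    Ahy = A.conj().T @ y
    z = u = np.zeros(n, dtype=A.dtype)
    for _ in range(n_iter):
        x = np.linalg.solve(M, Ahy + rho * (z - u))   # linear least-squares solve
        z_old = z
        z = soft_threshold(x + u, lam / rho)          # prox of the l1 term
        u = u + x - z                                 # scaled dual update
        # Stop when both the primal and the dual residual are small.
        if np.linalg.norm(x - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

A production solver would pre-factor `M` (e.g. Cholesky) instead of re-solving each iteration; the plain `solve` keeps the sketch short.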
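
For point 4, an OMP sketch assuming the sparsity level `k` is known and the columns of `A` are (approximately) unit-norm so the correlation test is fair.

```python
def omp(A, y, k):
    # Orthogonal Matching Pursuit: grow the support one column at a time.
    support = []
    residual = y
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(j)
        # Least-squares fit on the active set, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=A.dtype)
    x[support] = coef
    return x
```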
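
Finally, for point 5, an EM sketch of Sparse Bayesian Learning with ARD hyperparameters $\gamma$. The noise variance `sigma2`, iteration count, and pruning threshold are assumptions; the $n \times n$ inverse in the E-step is what gives the cubic per-iteration cost the summary mentions.

```python
def sbl(A, y, sigma2=1e-2, n_iter=50, prune=1e-8):
    # EM for the ARD model: x_i ~ N(0, gamma_i), y = A x + Gaussian noise.
    # A hyperparameter gamma_i shrinking to zero prunes coefficient i.
    n = A.shape[1]
    gamma = np.ones(n)
    for _ in range(n_iter):
        # E-step: Gaussian posterior over x (one n x n inverse -> cubic cost).
        Sigma = np.linalg.inv(A.conj().T @ A / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ A.conj().T @ y / sigma2
        # M-step: EM update of the ARD hyperparameters.
        gamma = np.maximum(np.abs(mu) ** 2 + np.real(np.diag(Sigma)), prune)
    mu[gamma <= prune] = 0.0   # zero out effectively-pruned coefficients
    return mu
```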

Looking Ahead

Chapter 15 puts these algorithms to work on wireless problems (sparse channel estimation, massive random access, sparse DOA estimation), where the choice of algorithm depends critically on coherence, real-time constraints, and signal structure. Later chapters (18, 20) return to these algorithms as unrolled message-passing schemes and as the starting points for deep unfolding.