Prerequisites & Notation

Before You Begin

This chapter assumes mastery of the sparse estimation theory developed in Chapter 13 (LASSO, BPDN, $\ell_1$ relaxation, RIP and coherence guarantees). Here we turn from "does a sparse solution exist?" to "how do we actually compute it?", a shift that demands first-order convex optimization, proximal operators, and some taste for Bayesian modeling.

  • LASSO and basis pursuit denoising (Ch 13)

    Self-check: Can you write the LASSO objective $\tfrac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_2^2 + \lambda\|\mathbf{x}\|_1$ and explain why $\lambda$ trades off sparsity against fit?

  • Restricted Isometry Property and coherence (Ch 13)

    Self-check: When does a random Gaussian $\mathbf{A}$ satisfy RIP-$s$ with $M=O(s\log(N/s))$ measurements?

  • Gradient descent and Lipschitz continuity of gradients

    Self-check: For $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_2^2$, what is $\nabla f$ and what is its Lipschitz constant $L$?

  • Convex functions, subdifferentials, and KKT conditions

    Self-check: Can you compute $\partial \|\mathbf{x}\|_1$ at $\mathbf{x}=\mathbf{0}$?

  • MAP and MMSE Bayesian estimation (Ch 8)

    Self-check: When do MAP and MMSE coincide, and when do they differ qualitatively?
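The gradient self-check above can be verified numerically. The sketch below (illustrative, with assumed problem sizes) confirms that $\nabla f(\mathbf{x}) = \mathbf{A}^\top(\mathbf{A}\mathbf{x}-\mathbf{y})$ and that $L = \sigma_{\max}^2(\mathbf{A})$ bounds the gradient's Lipschitz constant:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50  # assumed sizes for illustration
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def grad_f(x):
    # Gradient of f(x) = 0.5 * ||A x - y||_2^2
    return A.T @ (A @ x - y)

# Lipschitz constant of grad_f: L = ||A||_2^2 = sigma_max(A)^2
L = np.linalg.norm(A, 2) ** 2

# Check ||grad_f(x1) - grad_f(x2)|| <= L * ||x1 - x2|| on random points;
# here grad_f(x1) - grad_f(x2) = A^T A (x1 - x2), so the bound is tight
# in the worst case over directions.
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
lhs = np.linalg.norm(grad_f(x1) - grad_f(x2))
rhs = L * np.linalg.norm(x1 - x2)
assert lhs <= rhs + 1e-10
```

If the assertion passes for every pair of points, the step size $1/L$ used by the gradient methods later in this chapter is safe.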

Notation for This Chapter

Symbols used throughout Chapter 14. Most were introduced in Chapter 13; the algorithmic quantities (step size, momentum, residual) are new here.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{A}\in\mathbb{R}^{M\times N}$ | Sensing (measurement) matrix; $M$ observations of an $N$-dimensional signal | s01 |
| $\mathbf{x}\in\mathbb{R}^N$ | Unknown sparse signal (the optimization variable) | s01 |
| $\mathbf{y}\in\mathbb{R}^M$ | Measurements; $\mathbf{y} = \mathbf{A}\mathbf{x}_\star + \mathbf{w}$ | s01 |
| $\lambda$ | Regularization parameter (controls sparsity vs. fit) | s01 |
| $S_\tau(u)$ | Soft-threshold operator with threshold $\tau$ | s01 |
| $L$ | Lipschitz constant of $\nabla f$; here $L=\|\mathbf{A}\|_2^2=\sigma_{\max}^2(\mathbf{A})$ | s01 |
| $t_k$ | FISTA momentum sequence: $t_{k+1}=(1+\sqrt{1+4t_k^2})/2$ | s01 |
| $\mathbf{r}^{(k)}$ | Primal residual $\mathbf{x}^{(k)}-\mathbf{z}^{(k)}$ in ADMM | s02 |
| $\rho$ | ADMM penalty parameter (augmented-Lagrangian weight); also Bernoulli activation probability in s04 | s02 |
| $\mathbf{u}$ | Scaled dual variable in ADMM | s02 |
| $\mathcal{S}^{(k)}$ | Support estimate at iteration $k$ (greedy algorithms) | s03 |
| $H_s(\cdot)$ | Hard-thresholding operator: keep the $s$ largest magnitudes | s03 |
| $s$ | Target sparsity level (number of nonzeros) | s03 |
| $\pi_i = \mathbb{P}(x_i\neq 0)$ | Bernoulli activation probability in spike-and-slab prior | s04 |
| $\gamma_i$ | SBL hyper-parameter (per-coefficient prior variance) | s04 |
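To make the algorithmic symbols concrete before they reappear in the chapter, here is a minimal NumPy sketch of the soft-threshold operator $S_\tau$, the hard-thresholding operator $H_s$, and the FISTA momentum recursion for $t_k$. The function names are illustrative, not from the text:

```python
import numpy as np

def soft_threshold(u, tau):
    # S_tau(u): shrink each entry toward zero by tau (prox of tau * ||.||_1)
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def hard_threshold(u, s):
    # H_s(u): keep the s largest-magnitude entries, zero out the rest
    out = np.zeros_like(u)
    idx = np.argsort(np.abs(u))[-s:]
    out[idx] = u[idx]
    return out

def fista_momentum(num_iters):
    # t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, starting from t_1 = 1
    t = 1.0
    ts = [t]
    for _ in range(num_iters):
        t = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        ts.append(t)
    return ts
```

For example, `soft_threshold(np.array([3.0, -0.5, 1.5]), 1.0)` shrinks every entry by 1 and zeroes the entry whose magnitude falls below the threshold, whereas `hard_threshold` is a projection onto the set of $s$-sparse vectors: it preserves the surviving entries exactly.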