Prerequisites & Notation

Before You Begin

This chapter assumes mastery of the sparse estimation theory developed in Chapter 13 (LASSO, BPDN, $\ell_1$ relaxation, RIP and coherence guarantees). Here we turn from "does a sparse solution exist?" to "how do we actually compute it?", a shift that demands first-order convex optimization, proximal operators, and some taste for Bayesian modeling.

  • LASSO and basis pursuit denoising (Ch 13)

    Self-check: Can you write the LASSO objective $\tfrac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_2^2 + \lambda\|\mathbf{x}\|_1$ and explain why $\lambda$ trades off sparsity against fit?

  • Restricted Isometry Property and coherence (Ch 13)

    Self-check: When does a random Gaussian $\mathbf{A}$ satisfy RIP-$s$ with $M=O(s\log(N/s))$ measurements?

  • Gradient descent and Lipschitz continuity of gradients

    Self-check: For $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_2^2$, what is $\nabla f$ and what is its Lipschitz constant $L$?

  • Convex functions, subdifferentials, and KKT conditions

    Self-check: Can you compute $\partial \|\mathbf{x}\|_1$ at $\mathbf{x}=\mathbf{0}$?

  • MAP and MMSE Bayesian estimation (Ch 8)

    Self-check: When do MAP and MMSE coincide, and when do they differ qualitatively?
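The gradient self-check above can be verified numerically. The sketch below (illustrative, with assumed problem sizes) confirms that $\nabla f(\mathbf{x}) = \mathbf{A}^\top(\mathbf{A}\mathbf{x}-\mathbf{y})$ and that $L = \sigma_{\max}^2(\mathbf{A})$ bounds the gradient's Lipschitz constant:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50  # assumed sizes for illustration
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def grad_f(x):
    # Gradient of f(x) = 0.5 * ||A x - y||_2^2
    return A.T @ (A @ x - y)

# Lipschitz constant of grad_f: L = ||A||_2^2 = sigma_max(A)^2
L = np.linalg.norm(A, 2) ** 2

# Check ||grad_f(x1) - grad_f(x2)|| <= L * ||x1 - x2|| on random points;
# here grad_f(x1) - grad_f(x2) = A^T A (x1 - x2), so the bound is tight
# in the worst case over directions.
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
lhs = np.linalg.norm(grad_f(x1) - grad_f(x2))
rhs = L * np.linalg.norm(x1 - x2)
assert lhs <= rhs + 1e-10
```

If the assertion passes for every pair of points, the step size $1/L$ used by the gradient methods later in this chapter is safe.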

Notation for This Chapter

Symbols used throughout Chapter 14. Most were introduced in Chapter 13; the algorithmic quantities (step size, momentum, residual) are new here.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{A}\in\mathbb{R}^{M\times N}$ | Sensing (measurement) matrix; $M$ observations of an $N$-dimensional signal | s01 |
| $\mathbf{x}\in\mathbb{R}^N$ | Unknown sparse signal (the optimization variable) | s01 |
| $\mathbf{y}\in\mathbb{R}^M$ | Measurements; $\mathbf{y} = \mathbf{A}\mathbf{x}_\star + \mathbf{w}$ | s01 |
| $\lambda$ | Regularization parameter (controls sparsity vs. fit) | s01 |
| $S_\tau(u)$ | Soft-threshold operator with threshold $\tau$ | s01 |
| $L$ | Lipschitz constant of $\nabla f$; here $L=\|\mathbf{A}\|_2^2=\sigma_{\max}^2(\mathbf{A})$ | s01 |
| $t_k$ | FISTA momentum sequence: $t_{k+1}=(1+\sqrt{1+4t_k^2})/2$ | s01 |
| $\mathbf{r}^{(k)}$ | Primal residual $\mathbf{x}^{(k)}-\mathbf{z}^{(k)}$ in ADMM | s02 |
| $\rho$ | ADMM penalty parameter (augmented-Lagrangian weight); also Bernoulli activation probability in s04 | s02 |
| $\mathbf{u}$ | Scaled dual variable in ADMM | s02 |
| $\mathcal{S}^{(k)}$ | Support estimate at iteration $k$ (greedy algorithms) | s03 |
| $H_s(\cdot)$ | Hard-thresholding operator: keep the $s$ largest magnitudes | s03 |
| $s$ | Target sparsity level (number of nonzeros) | s03 |
| $\pi_i = \mathbb{P}(x_i\neq 0)$ | Bernoulli activation probability in spike-and-slab prior | s04 |
| $\gamma_i$ | SBL hyper-parameter (per-coefficient prior variance) | s04 |
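To make the algorithmic symbols concrete before they reappear in the chapter, here is a minimal NumPy sketch of the soft-threshold operator $S_\tau$, the hard-thresholding operator $H_s$, and the FISTA momentum recursion for $t_k$. The function names are illustrative, not from the text:

```python
import numpy as np

def soft_threshold(u, tau):
    # S_tau(u): shrink each entry toward zero by tau (prox of tau * ||.||_1)
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def hard_threshold(u, s):
    # H_s(u): keep the s largest-magnitude entries, zero out the rest
    out = np.zeros_like(u)
    idx = np.argsort(np.abs(u))[-s:]
    out[idx] = u[idx]
    return out

def fista_momentum(num_iters):
    # t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, starting from t_1 = 1
    t = 1.0
    ts = [t]
    for _ in range(num_iters):
        t = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        ts.append(t)
    return ts
```

For example, `soft_threshold(np.array([3.0, -0.5, 1.5]), 1.0)` shrinks every entry by 1 and zeroes the entry whose magnitude falls below the threshold, whereas `hard_threshold` is a projection onto the set of $s$-sparse vectors: it preserves the surviving entries exactly.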