Chapter Summary
Key Points
1. AMP = ISTA + Onsager correction. The iteration $x^{t+1} = \eta_t\big(x^t + A^{\mathsf T} z^t\big)$, $z^t = y - A x^t + \tfrac{1}{\delta}\, z^{t-1} \big\langle \eta'_{t-1}\big(x^{t-1} + A^{\mathsf T} z^{t-1}\big)\big\rangle$, differs from proximal gradient by a single additive term, the Onsager correction, which removes the bias introduced by feedback through $A$. It is the defining feature of AMP (see the AMP sketch following this list).
2. Gaussianity of pseudo-data. In the large-$n$ limit with i.i.d. Gaussian $A$, the denoiser input $x^t + A^{\mathsf T} z^t$ is distributed as the true signal plus independent AWGN of variance $\tau_t^2$. This justifies calling $\eta_t$ a denoiser and underpins all downstream analysis.
3. State evolution. AMP's per-iteration MSE is governed by the scalar recursion $\tau_{t+1}^2 = \sigma^2 + \tfrac{1}{\delta}\, \mathbb{E}\big[\big(\eta_t(X_0 + \tau_t Z) - X_0\big)^2\big]$, where $X_0$ is drawn from the signal prior and $Z \sim \mathcal{N}(0,1)$. Fixed points of this recursion determine the terminal MSE; their stability determines convergence. This one-dimensional description is the most striking analytical feature of AMP (see the state-evolution sketch following this list).
4. Phase transitions and Bayes-optimality. Plotting SE fixed points over the undersampling-sparsity plane $(\delta, \rho)$ reveals the Donoho--Tanner curve separating recovery from failure. With the MMSE denoiser matched to the prior, the AMP fixed point coincides with the replica-symmetric prediction of the Bayes MMSE; that is, AMP is asymptotically Bayes-optimal.
5. Denoiser design is the central knob. Soft-thresholding yields LASSO (with a calibration formula); posterior means yield Bayes-optimal estimation; learned neural denoisers yield D-AMP and deep unfolding. All fit in the same AMP scaffold provided the denoiser is Lipschitz and its divergence can be computed.
6. AMP fails for structured matrices. The Onsager coefficient is calibrated to the Marchenko--Pastur spectrum. Sub-sampled DFT, Hadamard, ill-conditioned, or Kronecker-structured matrices break this calibration and produce divergence. Damping stabilises AMP but slows convergence and breaks state evolution; the principled fix is OAMP/VAMP (Chapter 21).
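As a concrete companion to points 1, 2, and 5, here is a minimal NumPy sketch of AMP with the soft-thresholding denoiser. The threshold ratio alpha, the problem sizes, and the empirical estimate tau = ||z||/sqrt(m) are illustrative assumptions rather than prescriptions from this chapter; the point to notice is that the only departure from ISTA is the single Onsager term added to the residual.

```python
import numpy as np

def soft_threshold(u, theta):
    """Soft-thresholding denoiser: eta(u; theta) = sign(u) * max(|u| - theta, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp_soft_threshold(y, A, alpha=1.4, n_iter=30):
    """AMP with soft thresholding (alpha and n_iter are assumed tuning choices).

    Identical to ISTA except for the Onsager term (1/delta) * z * <eta'>,
    which keeps the pseudo-data x + A^T z behaving like signal + AWGN.
    """
    m, n = A.shape
    delta = m / n
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        pseudo_data = x + A.T @ z                    # approx. x0 + tau * N(0, I)
        tau = np.linalg.norm(z) / np.sqrt(m)         # empirical estimate of tau_t
        x_new = soft_threshold(pseudo_data, alpha * tau)
        # divergence of the denoiser: fraction of coordinates above threshold
        eta_prime = np.mean(np.abs(pseudo_data) > alpha * tau)
        z = y - A @ x_new + (1.0 / delta) * z * eta_prime   # Onsager correction
        x = x_new
    return x

# Synthetic sparse-recovery instance (sizes chosen only for illustration)
rng = np.random.default_rng(0)
n, m, k, sigma = 1000, 400, 50, 0.01
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # i.i.d. Gaussian sensing matrix
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x0 + sigma * rng.normal(size=m)
x_hat = amp_soft_threshold(y, A)
print("relative MSE:", np.mean((x_hat - x0) ** 2) / np.mean(x0 ** 2))
```

Dropping the `(1.0 / delta) * z * eta_prime` term recovers plain ISTA with an adaptive threshold, which is a quick way to see the effect of the Onsager correction on convergence.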
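Similarly, for point 3, a sketch of the scalar state-evolution recursion with the expectation estimated by Monte Carlo. The Bernoulli--Gaussian prior, the sample size, and the threshold ratio alpha below are assumptions made only for the illustration.

```python
import numpy as np

def state_evolution(prior_sampler, eta, delta, sigma2, tau2_init,
                    n_iter=30, n_mc=200_000, seed=0):
    """Iterate tau_{t+1}^2 = sigma^2 + (1/delta) * E[(eta(X0 + tau_t*Z, tau_t) - X0)^2],
    estimating the expectation by Monte Carlo over the prior and Z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    tau2 = tau2_init
    history = [tau2]
    for _ in range(n_iter):
        x0 = prior_sampler(n_mc, rng)
        z = rng.normal(size=n_mc)
        mse = np.mean((eta(x0 + np.sqrt(tau2) * z, np.sqrt(tau2)) - x0) ** 2)
        tau2 = sigma2 + mse / delta
        history.append(tau2)
    return np.array(history)

# Example: Bernoulli-Gaussian prior (sparsity 0.05), soft thresholding at alpha * tau
alpha = 1.4
eta = lambda u, tau: np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)
prior = lambda size, rng: rng.normal(size=size) * (rng.random(size) < 0.05)
taus = state_evolution(prior, eta, delta=0.4, sigma2=1e-4, tau2_init=1.0)
print("fixed-point tau^2:", taus[-1])
```

Sweeping delta and the sparsity level in this recursion and recording whether the fixed point is small or large is how the phase-transition curve of point 4 can be traced numerically.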
Looking Ahead
Chapter 21 generalises the AMP framework to right-rotationally-invariant sensing matrices via OAMP (orthogonal AMP) and VAMP (vector AMP). The key idea is to replace the simple transpose with an LMMSE estimator and enforce orthogonality between the linear and prior-denoising steps. This restores the Gaussianity-of-pseudo-data property for a much larger class of matrices — including structured sensing operators typical of communications and imaging — at the cost of an inversion per iteration (amenable to Kronecker factorisation when the structure is known). Chapter 21 also introduces GAMP for generalised linear models (non-Gaussian likelihoods, e.g., 1-bit compressed sensing) and LAMP/LISTA for learned message passing via deep unfolding.