When AMP Fails: Structured Matrices
Structured Matrices Are the Norm, Not the Exception
The state-evolution theorem of Section 20.2 assumes $A$ has i.i.d. Gaussian entries. This assumption is mathematically convenient but physically unrealistic in most applications:
- In MRI and radar: $A$ is a sub-sampled Fourier matrix.
- In communications: $A$ comes from pilot/channel matrices with correlation and structure (Toeplitz, Kronecker, block-diagonal).
- In computational imaging: $A$ is a convolution with a specific kernel.
For such matrices AMP's Onsager correction is miscalibrated, the pseudo-data is no longer conditionally Gaussian, and the algorithm can diverge spectacularly — even if the problem is convex and has a perfectly well-defined LASSO solution.
Theorem: AMP Divergence for Non-i.i.d. Matrices
Let $A = U \Sigma V^{\top}$ be the SVD, with right-singular vectors $V$ sampled from the Haar measure on the orthogonal group $O(n)$ and singular-value profile $\{\sigma_i\}$ with condition number $\kappa = \sigma_{\max}/\sigma_{\min}$.
If the squared singular-value distribution differs from the Marchenko--Pastur density (which corresponds to i.i.d.\ Gaussian $A$), then there exist condition numbers $\kappa$ beyond which AMP with soft-thresholding diverges for any threshold $\lambda > 0$, regardless of sparsity.
The Onsager coefficient is derived assuming Marchenko--Pastur spectral statistics. If the actual $A$ has a different spectrum, the derived coefficient no longer cancels the feedback bias — residual correlation accumulates and the iteration amplifies its own errors.
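For reference, a standard form of the soft-thresholding AMP iteration is the following sketch (the notation here is indicative and may differ slightly from Section 20.2):
\[
x^{t+1} = \eta_{t}\!\left(x^{t} + A^{\top} z^{t}\right), \qquad
z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\, \big\langle \eta_{t-1}'\!\left(x^{t-1} + A^{\top} z^{t-1}\right) \big\rangle, \qquad \delta = m/n.
\]
The last term of the $z$-update is the Onsager correction; its coefficient $\frac{1}{\delta}\langle \eta' \rangle$ is precisely the quantity calibrated against Marchenko--Pastur statistics.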
Linear perturbation analysis
Linearise AMP around the noiseless fixed point. The one-step error $e^{t} = x^{t} - x^{\star}$ satisfies $e^{t+1} \approx M e^{t}$ to first order, where the matrix $M$ depends on the denoiser slope and the spectrum of $A$.
Spectral radius of the linearisation
For i.i.d.\ Gaussian $A$ (Marchenko--Pastur spectrum) the Onsager correction is tuned so that $\rho(M) < 1$ whenever state evolution contracts. For matrices with broader eigenvalue spread, the same Onsager coefficient gives $\rho(M) > 1$ for sufficiently ill-conditioned $A$, producing geometric divergence.
Constructing a counter-example
Take a singular-value profile $\sigma_i \propto i^{-\alpha}$ (polynomial decay). For $\alpha$ large enough the spectrum deviates sufficiently from Marchenko--Pastur that the linearised iteration has eigenvalues of modulus $> 1$. Numerical experiments (Rangan et al., 2014) exhibit explosive divergence even for moderate $\kappa$. A sketch of this construction follows.
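A minimal numerical sketch of the counter-example. All parameters (sizes, decay exponent $\alpha$, sparsity, noise level, threshold) are hypothetical choices for illustration, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft(x, t):
    """Soft-thresholding denoiser eta(x; t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(A, y, lam, iters=50, verbose=True):
    """Plain AMP with soft thresholding and the Marchenko--Pastur-calibrated
    Onsager coefficient (number of nonzeros / m)."""
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for t in range(iters):
        x = soft(x + A.T @ z, lam)
        z = y - A @ x + (np.count_nonzero(x) / m) * z   # Onsager correction
        if verbose and t % 10 == 0:
            print(f"iter {t:3d}  residual norm = {np.linalg.norm(z):.3e}")
        if not np.isfinite(z).all():
            print("diverged")
            break
    return x

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

m, n, alpha = 200, 400, 1.5                             # hypothetical
U, V = haar_orthogonal(m, rng), haar_orthogonal(n, rng)
sigmas = np.arange(1, m + 1, dtype=float) ** (-alpha)   # sigma_i ~ i^{-alpha}
A = U @ np.diag(sigmas) @ V[:, :m].T                    # A = U S V^T, m x n

x0 = rng.standard_normal(n) * (rng.random(n) < 0.1)     # Bernoulli--Gaussian
y = A @ x0 + 0.01 * rng.standard_normal(m)
amp(A, y, lam=0.1)   # per the theorem, expect blow-up for large alpha / kappa
```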
Definition: Damped AMP
Fix a damping parameter $\beta \in (0, 1]$. Replace the AMP updates by
\[
x^{t+1} = \beta\,\hat{x}^{t+1} + (1-\beta)\,x^{t}, \qquad
z^{t+1} = \beta\,\hat{z}^{t+1} + (1-\beta)\,z^{t},
\]
where $\hat{x}^{t+1}$ and $\hat{z}^{t+1}$ are the undamped AMP proposals. Damping averages each new iterate with the previous one, reducing the effective step size.
Damping is the simplest fix to stabilise AMP on moderately structured matrices. It is universally helpful and cheap, but comes with the caveats of the theorem "Damping Helps But Does Not Cure" below.
Theorem: Damping Helps But Does Not Cure
For damped AMP with parameter $\beta$:
(a) For any right-rotationally-invariant $A$ with bounded condition number $\kappa < \infty$, there exists $\beta^{\star}(\kappa) > 0$ such that damped AMP with $\beta \le \beta^{\star}$ converges.
(b) The required damping satisfies $\beta^{\star}(\kappa) \to 0$ as $\kappa \to \infty$.
(c) State evolution does not describe the per-iteration MSE of damped AMP exactly, except in the limit $\beta \to 0$ (infinitely many iterations).
Part (a) says: for any fixed non-pathological $A$, enough damping makes AMP converge. Part (b) says: the worse conditioned the matrix, the more damping is needed, and the convergence rate slows accordingly. Part (c) says: even when damping works, we lose the analytical tractability that made AMP attractive in the first place.
Stability from contraction mapping
Damped AMP is of the form $v^{t+1} = (1-\beta)\,v^{t} + \beta\,T(v^{t})$ for the undamped AMP operator $T$. If the linearisation $M$ of $T$ has eigenvalues $\lambda$ of bounded modulus with $\operatorname{Re}\,\lambda < 1$, then the damped linearisation $(1-\beta)I + \beta M$ has eigenvalues $(1-\beta) + \beta\lambda$, all of modulus $< 1$ for sufficiently small $\beta$. Hence convergence for sufficiently small $\beta$.
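A quick numerical check of this mechanism, using hypothetical eigenvalues (not derived from any particular $A$): with $\beta = 1$ the spectral radius exceeds 1, while modest damping pulls every eigenvalue inside the unit circle.

```python
import numpy as np

# Hypothetical eigenvalues of the linearised AMP map: moduli > 1 (undamped
# divergence) but real parts < 1, the regime where damping can stabilise.
lams = np.array([-0.5 + 1.8j, -1.2 + 0.9j, 0.95 + 0.0j])

for beta in [1.0, 0.5, 0.2, 0.05]:
    damped = (1 - beta) + beta * lams        # eigenvalues of (1-b)I + bM
    print(f"beta = {beta:4.2f}  spectral radius = {np.abs(damped).max():.3f}")
```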
Scaling with condition number
The Lipschitz constant $L$ of the AMP operator scales with the spectral spread of $A$: the worse conditioned the matrix, the larger $L$. The stabilising damping scales as $\beta^{\star} = O(1/L)$, so as $\kappa \to \infty$, $\beta^{\star} \to 0$, i.e., infinitely many iterations are required.
Breakdown of state evolution
State evolution was derived under the exact AMP dynamics. Damping introduces a memory term that changes the covariance structure of the pseudo-data over iterations — the conditional Gaussianity survives only in the $\beta \to 0$ limit.
AMP on i.i.d. vs Structured Matrices
Run AMP on a Bernoulli--Gaussian signal with two sensing matrices of the same dimensions, at matched undersampling ratio and matched noise: (i) i.i.d.\ Gaussian and (ii) a sub-sampled DFT matrix. The i.i.d.\ trajectory tracks the SE prediction; the structured-matrix trajectory diverges or stalls. A sketch of the setup follows.
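A sketch of the experiment with hypothetical parameters, using a sub-sampled Hadamard matrix as a real-valued stand-in for the sub-sampled DFT (a complex DFT would require a complex-aware threshold); `amp` and `soft` refer to the sketch in the counter-example above:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(1)
m, n = 256, 512                     # hypothetical; n must be a power of 2

# (i) i.i.d. Gaussian with (approximately) unit-norm columns
A_iid = rng.standard_normal((m, n)) / np.sqrt(m)

# (ii) randomly sub-sampled Hadamard matrix with orthonormal rows; all its
# singular values are equal, unlike the Marchenko--Pastur spread of (i)
rows = rng.choice(n, size=m, replace=False)
A_struct = hadamard(n)[rows].astype(float) / np.sqrt(n)

x0 = rng.standard_normal(n) * (rng.random(n) < 0.1)      # Bernoulli--Gaussian
for name, A in [("i.i.d. Gaussian", A_iid),
                ("sub-sampled Hadamard", A_struct)]:
    y = A @ x0 + 0.01 * rng.standard_normal(m)
    x_hat = amp(A, y, lam=0.1, iters=50, verbose=False)  # sketch from above
    print(f"{name:22s} final MSE = {np.mean((x_hat - x0) ** 2):.4f}")
```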
Damped AMP vs Undamped
Illustrate how damping stabilises AMP on an ill-conditioned matrix, at the cost of substantially slower convergence. For $\beta = 1$ (no damping) AMP diverges; for small $\beta$ it converges but may take $10\times$ more iterations than the i.i.d. baseline. A sketch follows.
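A sketch of damped AMP with a driver sweeping $\beta$, reusing the ill-conditioned instance `A`, `y` built in the counter-example sketch above (all parameters hypothetical):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def damped_amp(A, y, lam, beta, iters=2000):
    """AMP with damping factor beta on both updates (beta = 1: undamped)."""
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for t in range(iters):
        x_hat = soft(x + A.T @ z, lam)
        z_hat = y - A @ x_hat + (np.count_nonzero(x_hat) / m) * z
        x_new = beta * x_hat + (1 - beta) * x    # damped signal update
        z = beta * z_hat + (1 - beta) * z        # damped residual update
        if not np.isfinite(x_new).all():
            return None, t                       # diverged
        if np.linalg.norm(x_new - x) < 1e-8:     # crude stopping rule
            return x_new, t
        x = x_new
    return x, iters

# A, y: the ill-conditioned instance from the counter-example sketch above
for beta in [1.0, 0.5, 0.2, 0.05]:
    x_hat, t = damped_amp(A, y, lam=0.1, beta=beta)
    print(f"beta = {beta:4.2f}: " +
          ("diverged" if x_hat is None else f"stopped after {t} iterations"))
```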
Why This Matters: Why OAMP/VAMP? (Chapter 21)
Structured matrices are the default in communications: pilot matrices have Toeplitz/block structure, beamforming matrices are unitary by design, and precoder-channel products inherit Kronecker factorisation from the antenna geometry. Damped AMP often works but surrenders the sharp analytical characterisation of state evolution.
Orthogonal AMP (OAMP) and Vector AMP (VAMP) re-establish the Gaussianity-of-pseudo-data property for the broad class of right-rotationally-invariant matrices, which includes unitary/DFT matrices and arbitrarily conditioned Gaussian ensembles. They achieve this by replacing the simple transpose $A^{\top}$ in the residual step with an LMMSE operator and enforcing orthogonality between the linear estimator and the prior denoiser. The price is an $n \times n$ matrix inverse per iteration (mitigated by Kronecker structure, Chapter 21.3).
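As a preview (the precise updates are given in Chapter 21), the LMMSE linear stage has roughly the form
\[
\hat{x}_{\mathrm{LMMSE}}(r, \gamma) = \left(\tfrac{1}{\sigma^{2}} A^{\top} A + \gamma I\right)^{-1} \left(\tfrac{1}{\sigma^{2}} A^{\top} y + \gamma r\right),
\]
where $r$ is the incoming estimate and $\gamma$ its precision. The $n \times n$ inverse is the cost referred to above; with a one-off SVD of $A$ it can be reduced to elementwise operations on the singular values.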
Common Mistake: Deploying AMP Without Checking the Matrix
Mistake:
Implementing AMP for a real application (MRI reconstruction, radar, channel estimation) without first verifying that the sensing matrix is sufficiently close to i.i.d.\ Gaussian. AMP is then deployed with the belief that it enjoys the state-evolution guarantees, and mysterious divergence or stalling is blamed on numerics.
Correction:
Before deploying AMP: (i) inspect the singular-value distribution of $A$ and compare it to Marchenko--Pastur; (ii) if the match is poor, add damping and reduce expectations, or switch to OAMP/VAMP (Chapter 21); (iii) if divergence persists, fall back to provably-convergent proximal methods (FISTA with line search). A sketch of step (i) follows.
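A minimal sketch of step (i): instead of evaluating the analytic Marchenko--Pastur density, compare the singular values of $A$ against a freshly drawn i.i.d. Gaussian reference of the same shape (the helper name `spectrum_report` is hypothetical):

```python
import numpy as np

def spectrum_report(A, seed=0):
    """Compare the singular-value profile of A (assumed full rank) with an
    i.i.d. Gaussian reference of the same shape, whose squared singular
    values follow the Marchenko--Pastur law."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = np.linalg.svd(A, compute_uv=False)
    s_ref = np.linalg.svd(rng.standard_normal((m, n)) / np.sqrt(m),
                          compute_uv=False)
    print(f"condition number: A = {s.max() / s.min():.1f}, "
          f"i.i.d. reference = {s_ref.max() / s_ref.min():.1f}")
    q = np.linspace(0.1, 0.9, 5)
    print("squared-singular-value quantiles")
    print("  A:        ", np.round(np.quantile(s ** 2, q), 4))
    print("  reference:", np.round(np.quantile(s_ref ** 2, q), 4))
```

A large gap between the two condition numbers or quantile profiles is the warning sign that the state-evolution guarantees do not apply.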
Historical Note: The Great AMP Disappointment (2009--2014)
When AMP was introduced in 2009 it promised a revolution: single-digit-iteration compressed sensing with Bayes-optimal guarantees. Researchers in communications and imaging rushed to deploy it on their problems — only to discover that AMP diverged on sub-sampled Fourier, Hadamard, and most practical sensing operators.
This "disappointment" drove the development of OAMP (Ma--Ping 2017) and VAMP (Rangan--Schniter--Fletcher 2019), which extended the Gaussianity-of-pseudo-data guarantee to right-rotationally-invariant matrices. The episode is a salutary reminder that asymptotic theorems for random matrices do not automatically transfer to the structured matrices of engineering practice.
Key Takeaway
AMP's analytical magic — scalar state evolution, phase transitions, Bayes-optimality — requires i.i.d.\ Gaussian-like sensing. For the structured matrices of real applications the Onsager correction is miscalibrated and AMP diverges. Damping stabilises but slows convergence and breaks state evolution. The principled remedy is OAMP/VAMP (Chapter 21), which generalises the AMP framework to right-rotationally-invariant matrices at the cost of an LMMSE step per iteration.