When AMP Fails: Structured Matrices

Structured Matrices Are the Norm, Not the Exception

The state-evolution theorem of Section 20.2 assumes $\mathbf{A}$ has i.i.d. Gaussian entries. This assumption is mathematically convenient but physically unrealistic in most applications:

  • In MRI and radar: $\mathbf{A}$ is a sub-sampled Fourier matrix.
  • In communications: $\mathbf{A}$ comes from pilot/channel matrices with correlation and structure (Toeplitz, Kronecker, block-diagonal).
  • In computational imaging: $\mathbf{A}$ is a convolution with a specific kernel.

For such matrices, AMP's Onsager correction is miscalibrated, the pseudo-data is no longer conditionally Gaussian, and the algorithm can diverge spectacularly, even when the problem is convex and has a perfectly well-defined LASSO solution.

Theorem: AMP Divergence for Non-i.i.d. Matrices

Let $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ be the SVD, with right-singular vectors $\mathbf{V}$ sampled from the Haar measure on $O(N)$ and a singular-value profile $\{\sigma_i\}$ with condition number $\kappa = \sigma_{\max}/\sigma_{\min}$.

If the squared singular-value distribution differs from the Marchenko–Pastur density (which corresponds to i.i.d. Gaussian $\mathbf{A}$), then there exist condition numbers $\kappa$ beyond which AMP with soft-thresholding diverges for any $\lambda > 0$, regardless of sparsity.

The Onsager coefficient $\delta^{-1}\langle\eta'\rangle$ is derived assuming Marchenko–Pastur spectral statistics. If the actual $\mathbf{A}^{H}\mathbf{A}$ has a different spectrum, the derived coefficient no longer cancels the feedback bias; residual correlation accumulates and the iteration amplifies its own errors.

Definition: Damped AMP

Fix a damping parameter $\theta \in (0, 1]$. Replace the AMP updates by

$$\hat{\mathbf{x}}^{t+1} = (1-\theta)\,\hat{\mathbf{x}}^t + \theta\,\eta\bigl(\mathbf{A}^{H}\mathbf{r}^t + \hat{\mathbf{x}}^t;\,\theta_t\bigr),$$
$$\mathbf{r}^{t+1} = (1-\theta)\,\mathbf{r}^t + \theta\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}^{t+1} + \delta^{-1}\langle\eta'\rangle\,\mathbf{r}^t\bigr).$$

Damping averages each new iterate with the previous one, reducing the effective step size.
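
A minimal NumPy sketch of this update, assuming a soft-thresholding denoiser with a fixed threshold level tau standing in for the adaptive $\theta_t$ of the definition (function names and defaults are illustrative, not from the text):

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding denoiser eta(v; tau): shrink magnitudes by tau.
    Written via magnitudes so it works for real and complex inputs."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def damped_amp(y, A, theta=0.5, tau=0.1, n_iters=50):
    """Damped AMP with soft thresholding (sketch of the definition above).

    theta = 1 recovers plain AMP; smaller theta means heavier damping.
    tau is a fixed threshold standing in for the adaptive theta_t.
    """
    M, N = A.shape
    delta = M / N
    x_hat = np.zeros(N, dtype=complex)
    r = y.astype(complex)
    for _ in range(n_iters):
        pseudo = A.conj().T @ r + x_hat               # pseudo-data A^H r + x_hat
        x_new = soft_threshold(pseudo, tau)
        eta_prime = np.mean(np.abs(pseudo) > tau)     # <eta'> for soft thresholding
        x_hat = (1 - theta) * x_hat + theta * x_new   # damped denoiser update
        r = (1 - theta) * r + theta * (y - A @ x_hat
                                       + eta_prime / delta * r)  # damped residual
    return x_hat
```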

Damping is the simplest fix to stabilise AMP on moderately structured matrices. It is universally helpful and cheap, but comes with the caveat stated in the theorem below (Damping Helps But Does Not Cure).

Theorem: Damping Helps But Does Not Cure

For damped AMP with parameter $\theta \in (0,1]$:

(a) For any right-rotationally-invariant $\mathbf{A}$ with bounded condition number $\kappa$, there exists $\theta^\star(\kappa) > 0$ such that damped AMP with $\theta \le \theta^\star(\kappa)$ converges.

(b) The required damping $\theta^\star(\kappa)$ satisfies $\theta^\star(\kappa) \to 0$ as $\kappa \to \infty$.

(c) State evolution does not describe the per-iteration MSE of damped AMP exactly, except in the limit $\theta \to 0$ (infinitely many iterations).

Part (a) says: for any fixed non-pathological A\mathbf{A}, enough damping makes AMP converge. Part (b) says: the worse conditioned the matrix, the more damping needed, and the convergence rate slows accordingly. Part (c) says: even when damping works, we lose the analytical tractability that made AMP attractive in the first place.


AMP on i.i.d. vs Structured Matrices

Run AMP on a Bernoulli–Gaussian signal with two sensing matrices at matched $(M, N)$ and matched noise: (i) i.i.d. Gaussian and (ii) a sub-sampled DFT matrix of the same dimensions. The i.i.d. trajectory tracks the SE prediction; the structured-matrix trajectory diverges or stalls. A minimal sketch of the experiment follows.

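A sketch of this comparison, reusing damped_amp from the sketch above with $\theta = 1$ (plain AMP). The dimensions, sparsity, noise level, threshold, and iteration count below are illustrative stand-ins, not the demo's exact settings:

```python
rng = np.random.default_rng(0)
N, M, sparsity, sigma = 1024, 512, 0.1, 0.01     # illustrative values

# Bernoulli-Gaussian signal: each entry is nonzero with prob. `sparsity`
x = rng.standard_normal(N) * (rng.random(N) < sparsity)

# (i) i.i.d. Gaussian matrix, entry variance 1/M
A_iid = rng.standard_normal((M, N)) / np.sqrt(M)

# (ii) sub-sampled DFT: M randomly chosen rows of the unitary N x N DFT
F = np.fft.fft(np.eye(N), norm="ortho")
A_dft = F[rng.choice(N, size=M, replace=False), :]

for name, A in [("i.i.d. Gaussian", A_iid), ("sub-sampled DFT", A_dft)]:
    y = A @ x + sigma * rng.standard_normal(M)   # real noise, for simplicity
    x_hat = damped_amp(y, A, theta=1.0, tau=3 * sigma, n_iters=25)
    print(f"{name}: final MSE = {np.mean(np.abs(x_hat - x) ** 2):.3e}")
```

Tracking the MSE inside the loop of damped_amp, rather than only at the end, makes the structured trajectory's divergence or stalling visible against the SE prediction.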

Damped AMP vs Undamped

Illustrate how damping $\theta$ stabilises AMP on an ill-conditioned matrix, at the cost of substantially slower convergence. For $\theta = 1$ AMP diverges; for small $\theta$ it converges but may take 10× more iterations than the i.i.d. baseline. A minimal sketch follows.

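A sketch of this second experiment, continuing from the code above: a right-rotationally-invariant matrix is built from orthogonal factors and a geometric singular-value profile (the condition number and damping levels are illustrative):

```python
kappa = 100.0                                     # illustrative condition number
U, _ = np.linalg.qr(rng.standard_normal((M, M)))  # orthogonal factor (Haar up to signs)
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = kappa ** (-np.arange(M) / (M - 1))            # sigma_max / sigma_min = kappa
A_ill = U @ np.diag(s) @ V[:, :M].T
A_ill *= np.sqrt(N) / np.linalg.norm(A_ill, "fro")  # match the i.i.d. energy scale

y = A_ill @ x + sigma * rng.standard_normal(M)
for theta in (1.0, 0.6, 0.3):                     # undamped vs. two damping levels
    x_hat = damped_amp(y, A_ill, theta=theta, tau=3 * sigma, n_iters=500)
    print(f"theta = {theta}: MSE = {np.mean(np.abs(x_hat - x) ** 2):.3e}")
```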

Why This Matters: The Case for OAMP/VAMP (Chapter 21)

Structured $\mathbf{A}$ matrices are the default in communications: pilot matrices have Toeplitz/block structure, beamforming matrices are unitary by design, and precoder-channel products inherit Kronecker factorisation from the antenna geometry. Damped AMP often works but surrenders the sharp analytical characterisation of state evolution.

Orthogonal AMP (OAMP) and Vector AMP (VAMP) re-establish the Gaussianity-of-pseudo-data property for the broad class of right-rotationally-invariant matrices, which includes unitary/DFT matrices and arbitrarily conditioned Gaussian ensembles. They achieve this by replacing the simple transpose $\mathbf{A}^{H}$ in the residual step with an LMMSE operator and enforcing orthogonality between the linear estimator and the prior denoiser. The price is an $O(N^3)$ matrix inverse per iteration (mitigated by Kronecker structure, Chapter 21.3).
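
For orientation, one standard form of the LMMSE stage (a sketch, not the exact OAMP/VAMP update, which Chapter 21 derives): treating the current estimate $\mathbf{r}^t$ as a pseudo-prior $\mathbf{x} \sim \mathcal{N}(\mathbf{r}^t, v_t\mathbf{I})$ with likelihood $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$, $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, the linear step is

$$\hat{\mathbf{x}}^{t}_{\mathrm{LMMSE}} = \left(\sigma^{-2}\mathbf{A}^{H}\mathbf{A} + v_t^{-1}\mathbf{I}\right)^{-1}\left(\sigma^{-2}\mathbf{A}^{H}\mathbf{y} + v_t^{-1}\mathbf{r}^{t}\right),$$

whose $N \times N$ inverse is the source of the $O(N^3)$ per-iteration cost quoted above.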

Common Mistake: Deploying AMP Without Checking the Matrix

Mistake:

Implementing AMP for a real application (MRI reconstruction, radar, channel estimation) without first verifying that the sensing matrix is sufficiently close to i.i.d. Gaussian. AMP is then deployed in the belief that it enjoys the state-evolution guarantees, and the resulting mysterious divergence or stalling is blamed on numerics.

Correction:

Before deploying AMP: (i) inspect the singular-value distribution of $\mathbf{A}^{H}\mathbf{A}/M$ and compare it to Marchenko–Pastur; (ii) if the match is poor, add damping and reduce expectations, or switch to OAMP/VAMP (Chapter 21); (iii) if divergence persists, fall back to provably convergent proximal methods (FISTA with line search). A sketch of check (i) follows.
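
A rough numerical version of check (i), assuming NumPy; it compares the empirical eigenvalues only against the Marchenko–Pastur support edges rather than the full density, and the normalisation convention is an assumption:

```python
import numpy as np

def marchenko_pastur_check(A):
    """Fraction of eigenvalues of A^H A outside the Marchenko-Pastur
    support that an i.i.d. Gaussian matrix of the same shape would give."""
    M, N = A.shape
    A = A * np.sqrt(N) / np.linalg.norm(A, "fro")  # normalise: entry variance ~ 1/M
    eigs = np.linalg.eigvalsh(A.conj().T @ A)      # ascending, real
    eigs = eigs[-min(M, N):]                       # drop structural zeros when N > M
    gamma = N / M
    lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    frac_out = np.mean((eigs < lo) | (eigs > hi))
    print(f"MP support: [{lo:.3f}, {hi:.3f}]; "
          f"{100 * frac_out:.1f}% of eigenvalues outside")
    return frac_out
```

For an i.i.d. Gaussian matrix the fraction outside the support vanishes as $M, N$ grow; a substantial fraction outside, or extreme outliers, is a warning that plain AMP's state-evolution guarantees do not apply.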

Historical Note: The Great AMP Disappointment (2009–2014)


When AMP was introduced in 2009 it promised a revolution: single-digit-iteration compressed sensing with Bayes-optimal guarantees. Researchers in communications and imaging rushed to deploy it on their problems, only to discover that AMP diverged on sub-sampled Fourier, Hadamard, and most practical sensing operators.

This "disappointment" drove the development of OAMP (Ma--Ping 2017) and VAMP (Rangan--Schniter--Fletcher 2019), which extended the Gaussianity-of-pseudo-data guarantee to right-rotationally-invariant matrices. The episode is a salutary reminder that asymptotic theorems for random matrices do not automatically transfer to the structured matrices of engineering practice.

Key Takeaway

AMP's analytical magic (scalar state evolution, phase transitions, Bayes-optimality) requires i.i.d. Gaussian-like sensing. For the structured matrices of real applications the Onsager correction is miscalibrated and AMP diverges. Damping stabilises but slows convergence and breaks state evolution. The principled remedy is OAMP/VAMP (Chapter 21), which generalises the AMP framework to right-rotationally-invariant matrices at the cost of an LMMSE step per iteration.