When AMP Fails: Structured Matrices

Structured Matrices Are the Norm, Not the Exception

The state-evolution theorem of Section 20.2 assumes $\mathbf{A}$ has i.i.d. Gaussian entries. This assumption is mathematically convenient but physically unrealistic in most applications:

  • In MRI and radar: $\mathbf{A}$ is a sub-sampled Fourier matrix.
  • In communications: $\mathbf{A}$ comes from pilot/channel matrices with correlation and structure (Toeplitz, Kronecker, block-diagonal).
  • In computational imaging: $\mathbf{A}$ is a convolution with a specific kernel.

For such matrices, AMP's Onsager correction is miscalibrated, the pseudo-data is no longer conditionally Gaussian, and the algorithm can diverge spectacularly, even when the problem is convex and has a perfectly well-defined LASSO solution.

Theorem: AMP Divergence for Non-i.i.d. Matrices

Let $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ be the SVD, with right-singular vectors $\mathbf{V}$ sampled from the Haar measure on $O(N)$ and a singular-value profile $\{\sigma_i\}$ with condition number $\kappa = \sigma_{\max}/\sigma_{\min}$.

If the squared singular-value distribution differs from the Marchenko–Pastur density (which corresponds to i.i.d. Gaussian $\mathbf{A}$), then there exist condition numbers $\kappa$ beyond which AMP with soft-thresholding diverges for any $\lambda > 0$, regardless of sparsity.

The Onsager coefficient $\delta^{-1}\langle\eta'\rangle$ is derived assuming Marchenko–Pastur spectral statistics. If the actual $\mathbf{A}^{H}\mathbf{A}$ has a different spectrum, the derived coefficient no longer cancels the feedback bias; residual correlation accumulates and the iteration amplifies its own errors.

Definition: Damped AMP

Fix a damping parameter $\theta \in (0, 1]$. Replace the AMP updates by

$$\hat{\mathbf{x}}^{t+1} = (1-\theta)\,\hat{\mathbf{x}}^t + \theta\,\eta\bigl(\mathbf{A}^{H}\mathbf{r}^t + \hat{\mathbf{x}}^t;\,\theta_t\bigr),$$
$$\mathbf{r}^{t+1} = (1-\theta)\,\mathbf{r}^t + \theta\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}^{t+1} + \delta^{-1}\langle\eta'\rangle\,\mathbf{r}^t\bigr).$$

Damping averages each new iterate with the previous one, reducing the effective step size.
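
A minimal NumPy sketch of this update, assuming a soft-thresholding denoiser with a fixed threshold level tau standing in for the adaptive $\theta_t$ of the definition (function names and defaults are illustrative, not from the text):

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding denoiser eta(v; tau): shrink magnitudes by tau.
    Written via magnitudes so it works for real and complex inputs."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def damped_amp(y, A, theta=0.5, tau=0.1, n_iters=50):
    """Damped AMP with soft thresholding (sketch of the definition above).

    theta = 1 recovers plain AMP; smaller theta means heavier damping.
    tau is a fixed threshold standing in for the adaptive theta_t.
    """
    M, N = A.shape
    delta = M / N
    x_hat = np.zeros(N, dtype=complex)
    r = y.astype(complex)
    for _ in range(n_iters):
        pseudo = A.conj().T @ r + x_hat               # pseudo-data A^H r + x_hat
        x_new = soft_threshold(pseudo, tau)
        eta_prime = np.mean(np.abs(pseudo) > tau)     # <eta'> for soft thresholding
        x_hat = (1 - theta) * x_hat + theta * x_new   # damped denoiser update
        r = (1 - theta) * r + theta * (y - A @ x_hat
                                       + eta_prime / delta * r)  # damped residual
    return x_hat
```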

Damping is the simplest fix to stabilise AMP on moderately structured matrices. It is universally helpful and cheap, but comes with the caveat stated in the theorem below (Damping Helps But Does Not Cure).

Theorem: Damping Helps But Does Not Cure

For damped AMP with parameter $\theta \in (0,1]$:

(a) For any right-rotationally-invariant $\mathbf{A}$ with bounded condition number $\kappa$, there exists $\theta^\star(\kappa) > 0$ such that damped AMP with $\theta \le \theta^\star(\kappa)$ converges.

(b) The required damping $\theta^\star(\kappa)$ satisfies $\theta^\star(\kappa) \to 0$ as $\kappa \to \infty$.

(c) State evolution does not describe the per-iteration MSE of damped AMP exactly, except in the limit $\theta \to 0$ (infinitely many iterations).

Part (a) says: for any fixed non-pathological A\mathbf{A}, enough damping makes AMP converge. Part (b) says: the worse conditioned the matrix, the more damping needed, and the convergence rate slows accordingly. Part (c) says: even when damping works, we lose the analytical tractability that made AMP attractive in the first place.


AMP on i.i.d. vs Structured Matrices

Run AMP on a Bernoulli–Gaussian signal with two sensing matrices at matched $(M, N)$ and matched noise: (i) i.i.d. Gaussian and (ii) a sub-sampled DFT matrix of the same dimensions. The i.i.d. trajectory tracks the SE prediction; the structured-matrix trajectory diverges or stalls. A minimal sketch of the experiment follows.

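A sketch of this comparison, reusing damped_amp from the sketch above with $\theta = 1$ (plain AMP). The dimensions, sparsity, noise level, threshold, and iteration count below are illustrative stand-ins, not the demo's exact settings:

```python
rng = np.random.default_rng(0)
N, M, sparsity, sigma = 1024, 512, 0.1, 0.01     # illustrative values

# Bernoulli-Gaussian signal: each entry is nonzero with prob. `sparsity`
x = rng.standard_normal(N) * (rng.random(N) < sparsity)

# (i) i.i.d. Gaussian matrix, entry variance 1/M
A_iid = rng.standard_normal((M, N)) / np.sqrt(M)

# (ii) sub-sampled DFT: M randomly chosen rows of the unitary N x N DFT
F = np.fft.fft(np.eye(N), norm="ortho")
A_dft = F[rng.choice(N, size=M, replace=False), :]

for name, A in [("i.i.d. Gaussian", A_iid), ("sub-sampled DFT", A_dft)]:
    y = A @ x + sigma * rng.standard_normal(M)   # real noise, for simplicity
    x_hat = damped_amp(y, A, theta=1.0, tau=3 * sigma, n_iters=25)
    print(f"{name}: final MSE = {np.mean(np.abs(x_hat - x) ** 2):.3e}")
```

Tracking the MSE inside the loop of damped_amp, rather than only at the end, makes the structured trajectory's divergence or stalling visible against the SE prediction.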

Damped AMP vs Undamped

Illustrate how damping $\theta$ stabilises AMP on an ill-conditioned matrix, at the cost of substantially slower convergence. For $\theta = 1$ AMP diverges; for small $\theta$ it converges but may take 10× more iterations than the i.i.d. baseline. A minimal sketch follows.

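A sketch of this second experiment, continuing from the code above: a right-rotationally-invariant matrix is built from orthogonal factors and a geometric singular-value profile (the condition number and damping levels are illustrative):

```python
kappa = 100.0                                     # illustrative condition number
U, _ = np.linalg.qr(rng.standard_normal((M, M)))  # orthogonal factor (Haar up to signs)
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = kappa ** (-np.arange(M) / (M - 1))            # sigma_max / sigma_min = kappa
A_ill = U @ np.diag(s) @ V[:, :M].T
A_ill *= np.sqrt(N) / np.linalg.norm(A_ill, "fro")  # match the i.i.d. energy scale

y = A_ill @ x + sigma * rng.standard_normal(M)
for theta in (1.0, 0.6, 0.3):                     # undamped vs. two damping levels
    x_hat = damped_amp(y, A_ill, theta=theta, tau=3 * sigma, n_iters=500)
    print(f"theta = {theta}: MSE = {np.mean(np.abs(x_hat - x) ** 2):.3e}")
```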

Why This Matters: The Case for OAMP/VAMP (Chapter 21)

Structured $\mathbf{A}$ matrices are the default in communications: pilot matrices have Toeplitz/block structure, beamforming matrices are unitary by design, and precoder-channel products inherit Kronecker factorisation from the antenna geometry. Damped AMP often works but surrenders the sharp analytical characterisation of state evolution.

Orthogonal AMP (OAMP) and Vector AMP (VAMP) re-establish the Gaussianity-of-pseudo-data property for the broad class of right-rotationally-invariant matrices, which includes unitary/DFT matrices and arbitrarily conditioned Gaussian ensembles. They achieve this by replacing the simple transpose $\mathbf{A}^{H}$ in the residual step with an LMMSE operator and enforcing orthogonality between the linear estimator and the prior denoiser. The price is an $O(N^3)$ matrix inverse per iteration (mitigated by Kronecker structure, Chapter 21.3).
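
For orientation, one standard form of the LMMSE stage (a sketch, not the exact OAMP/VAMP update, which Chapter 21 derives): treating the current estimate $\mathbf{r}^t$ as a pseudo-prior $\mathbf{x} \sim \mathcal{N}(\mathbf{r}^t, v_t\mathbf{I})$ with likelihood $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$, $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, the linear step is

$$\hat{\mathbf{x}}^{t}_{\mathrm{LMMSE}} = \left(\sigma^{-2}\mathbf{A}^{H}\mathbf{A} + v_t^{-1}\mathbf{I}\right)^{-1}\left(\sigma^{-2}\mathbf{A}^{H}\mathbf{y} + v_t^{-1}\mathbf{r}^{t}\right),$$

whose $N \times N$ inverse is the source of the $O(N^3)$ per-iteration cost quoted above.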

Common Mistake: Deploying AMP Without Checking the Matrix

Mistake:

Implementing AMP for a real application (MRI reconstruction, radar, channel estimation) without first verifying that the sensing matrix is sufficiently close to i.i.d. Gaussian. AMP is then deployed in the belief that it enjoys the state-evolution guarantees, and the resulting mysterious divergence or stalling is blamed on numerics.

Correction:

Before deploying AMP: (i) inspect the singular-value distribution of $\mathbf{A}^{H}\mathbf{A}/M$ and compare it to Marchenko–Pastur; (ii) if the match is poor, add damping and reduce expectations, or switch to OAMP/VAMP (Chapter 21); (iii) if divergence persists, fall back to provably convergent proximal methods (FISTA with line search). A sketch of check (i) follows.
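
A rough numerical version of check (i), assuming NumPy; it compares the empirical eigenvalues only against the Marchenko–Pastur support edges rather than the full density, and the normalisation convention is an assumption:

```python
import numpy as np

def marchenko_pastur_check(A):
    """Fraction of eigenvalues of A^H A outside the Marchenko-Pastur
    support that an i.i.d. Gaussian matrix of the same shape would give."""
    M, N = A.shape
    A = A * np.sqrt(N) / np.linalg.norm(A, "fro")  # normalise: entry variance ~ 1/M
    eigs = np.linalg.eigvalsh(A.conj().T @ A)      # ascending, real
    eigs = eigs[-min(M, N):]                       # drop structural zeros when N > M
    gamma = N / M
    lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    frac_out = np.mean((eigs < lo) | (eigs > hi))
    print(f"MP support: [{lo:.3f}, {hi:.3f}]; "
          f"{100 * frac_out:.1f}% of eigenvalues outside")
    return frac_out
```

For an i.i.d. Gaussian matrix the fraction outside the support vanishes as $M, N$ grow; a substantial fraction outside, or extreme outliers, is a warning that plain AMP's state-evolution guarantees do not apply.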

Historical Note: The Great AMP Disappointment (2009–2014)


When AMP was introduced in 2009 it promised a revolution: single-digit-iteration compressed sensing with Bayes-optimal guarantees. Researchers in communications and imaging rushed to deploy it on their problems, only to discover that AMP diverged on sub-sampled Fourier, Hadamard, and most practical sensing operators.

This "disappointment" drove the development of OAMP (Ma--Ping 2017) and VAMP (Rangan--Schniter--Fletcher 2019), which extended the Gaussianity-of-pseudo-data guarantee to right-rotationally-invariant matrices. The episode is a salutary reminder that asymptotic theorems for random matrices do not automatically transfer to the structured matrices of engineering practice.

Key Takeaway

AMP's analytical magic (scalar state evolution, phase transitions, Bayes-optimality) requires i.i.d. Gaussian-like sensing. For the structured matrices of real applications the Onsager correction is miscalibrated and AMP diverges. Damping stabilises but slows convergence and breaks state evolution. The principled remedy is OAMP/VAMP (Chapter 21), which generalises the AMP framework to right-rotationally-invariant matrices at the cost of an LMMSE step per iteration.