AMP Recap and Its Failure for RF Imaging

Why AMP Fails for Imaging, and Why That Matters

Approximate Message Passing (AMP) is the most powerful algorithm for sparse recovery when the sensing matrix has i.i.d. Gaussian entries: it is computationally cheap, admits a precise scalar characterization (state evolution), and achieves Bayes-optimal performance with the right denoiser. Yet the RF imaging sensing matrix $\mathbf{A}$ is not i.i.d. Gaussian: it has Kronecker structure ($\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$), partial DFT rows, and highly non-uniform singular values.

When AMP is applied to such matrices, it diverges. This is not a minor numerical inconvenience: the Onsager correction that makes AMP work for Gaussian matrices produces a catastrophically wrong decorrelation for structured matrices. Understanding this failure is the first step toward OAMP, which we develop in the rest of this chapter.

Definition: AMP Iteration

Given the linear model $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with $\mathbf{A} \in \mathbb{R}^{M \times N}$ having i.i.d. $\mathcal{N}(0, 1/M)$ entries, AMP iterates:

$$\mathbf{r}^t = \mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^t + \frac{N}{M}\langle \eta'_{t-1} \rangle \, \mathbf{r}^{t-1},$$

$$\hat{\mathbf{c}}^{t+1} = \eta_t\!\bigl( \mathbf{A}^{\mathsf{T}} \mathbf{r}^t + \hat{\mathbf{c}}^t \bigr),$$

where $\eta_t : \mathbb{R}^N \to \mathbb{R}^N$ is a component-wise denoiser and $\langle \eta'_t \rangle = \frac{1}{N} \sum_{i=1}^N \eta'_t([\mathbf{A}^{\mathsf{T}}\mathbf{r}^t + \hat{\mathbf{c}}^t]_i)$ is the average derivative of the denoiser.

The term $\frac{N}{M}\langle \eta'_{t-1}\rangle \, \mathbf{r}^{t-1}$ is the Onsager correction. Without it, AMP reduces to the iterative soft-thresholding algorithm (ISTA).
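
For concreteness, the iteration can be sketched in a few lines of NumPy with a soft-thresholding denoiser. The names `amp` and `soft_threshold` are illustrative, and the threshold rule $\lambda_t = \alpha\sqrt{\tau_t}$ with $\tau_t \approx \|\mathbf{r}^t\|^2/M$ is a common heuristic rather than part of the definition:

```python
import numpy as np

def soft_threshold(x, lam):
    # eta(x) = sign(x) * max(|x| - lam, 0), applied component-wise
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp(y, A, alpha=2.0, n_iter=30):
    # AMP for y = A c + w with A having i.i.d. N(0, 1/M) entries.
    # The threshold alpha * sqrt(tau_t) is a heuristic tuning choice.
    M, N = A.shape
    c_hat = np.zeros(N)
    r = y.copy()
    for _ in range(n_iter):
        tau = np.sum(r ** 2) / M                # residual-based estimate of tau_t
        lam = alpha * np.sqrt(tau)
        q = A.T @ r + c_hat                     # effective AWGN observation of c
        b = (N / M) * np.mean(np.abs(q) > lam)  # (N/M) <eta'>: fraction active
        c_hat = soft_threshold(q, lam)
        r = y - A @ c_hat + b * r               # residual with Onsager correction
    return c_hat
```

Dropping the `b * r` term in the last update turns this sketch into ISTA.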

Definition: The Onsager Correction

The Onsager correction is the term

$$b^t \, \mathbf{r}^{t-1}, \qquad b^t = \frac{N}{M}\langle \eta'_{t-1} \rangle = \frac{1}{M}\sum_{i=1}^N \frac{\partial [\eta_{t-1}(\mathbf{q}^{t-1})]_i}{\partial q_i^{t-1}},$$

where $\mathbf{q}^{t-1} = \mathbf{A}^{\mathsf{T}}\mathbf{r}^{t-1} + \hat{\mathbf{c}}^{t-1}$.

Its role is to subtract the contribution of the current estimate to the residual, ensuring that the "effective observation" $\mathbf{A}^{\mathsf{T}}\mathbf{r}^t + \hat{\mathbf{c}}^t$ seen by the denoiser behaves as $\mathbf{c} + \mathcal{N}(\mathbf{0}, \tau_t \mathbf{I})$ in the large-system limit, i.e., as a scalar AWGN channel.

Theorem: State Evolution for AMP

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ have i.i.d. $\mathcal{N}(0, 1/M)$ entries, and let $\eta_t$ be a sequence of Lipschitz denoisers. In the limit $M, N \to \infty$ with $M/N \to \delta$, the per-component MSE of AMP is tracked by the deterministic recursion

$$\tau_{t+1} = \sigma^2 + \frac{1}{\delta}\,\mathbb{E}\bigl[ |\eta_t(C_0 + \sqrt{\tau_t}\,Z) - C_0|^2 \bigr],$$

where $C_0 \sim p_0$ is drawn from the prior and $Z \sim \mathcal{N}(0, 1)$ independently. Specifically,

$$\frac{1}{N}\|\hat{\mathbf{c}}^{t+1} - \mathbf{c}\|^2 \xrightarrow{\text{a.s.}} \mathbb{E}\bigl[|\eta_t(C_0 + \sqrt{\tau_t}\,Z) - C_0|^2\bigr].$$

State evolution collapses the entire high-dimensional iteration into a single scalar recursion. At each step, the denoiser faces an equivalent scalar AWGN problem with noise variance Ο„t\tau_t. This is why AMP is so powerful for Gaussian matrices: all the complexity of the measurement system is captured by one number.
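
The scalar recursion is easy to evaluate numerically. The sketch below runs it by Monte Carlo for a Bernoulli-Gaussian prior and a soft-thresholding denoiser with $\lambda_t = \alpha\sqrt{\tau_t}$; the prior, the tuning constant $\alpha$, and the function name are illustrative assumptions, not prescribed by the theorem:

```python
import numpy as np

def state_evolution(rho=0.1, delta=0.4, sigma2=1e-3, alpha=1.5,
                    n_iter=30, n_mc=200_000, seed=0):
    # Scalar state evolution for soft thresholding at lambda = alpha * sqrt(tau)
    # under a Bernoulli-Gaussian prior: C0 = B * N(0,1), B ~ Bernoulli(rho).
    rng = np.random.default_rng(seed)
    c0 = rng.normal(size=n_mc) * (rng.random(n_mc) < rho)
    z = rng.normal(size=n_mc)
    tau = sigma2 + np.mean(c0 ** 2) / delta     # tau_0 from the all-zero estimate
    taus = [tau]
    for _ in range(n_iter):
        lam = alpha * np.sqrt(tau)
        q = c0 + np.sqrt(tau) * z               # the equivalent scalar AWGN channel
        eta = np.sign(q) * np.maximum(np.abs(q) - lam, 0.0)
        tau = sigma2 + np.mean((eta - c0) ** 2) / delta
        taus.append(tau)
    return np.array(taus)
```

For these parameter values the sequence $\tau_0, \tau_1, \ldots$ contracts toward a fixed point, whose value predicts the asymptotic MSE of AMP for this prior and denoiser.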

Historical Note: From Spin Glasses to Compressed Sensing

1977–2009

AMP's Onsager correction traces back to the TAP equations (Thouless, Anderson, and Palmer, 1977) developed for the Sherrington-Kirkpatrick spin glass model. The connection to compressed sensing was made by Donoho, Maleki, and Montanari (2009), who recognized that the factor graph of $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with Gaussian $\mathbf{A}$ is structurally identical to the spin-glass factor graph. Kabashima (2003) had independently derived essentially the same algorithm for CDMA multiuser detection, demonstrating the deep connection between statistical physics and communications.

Why AMP Fails for Structured Matrices

The state evolution guarantee, and hence AMP's entire theoretical foundation, rests on the matrix $\mathbf{A}$ having i.i.d. entries. This ensures two properties:

  1. Self-averaging: $\mathbf{A}^{\mathsf{T}}\mathbf{A} \approx \mathbf{I}$ entrywise (the diagonal concentrates at 1 and the off-diagonal entries are $O(1/\sqrt{M})$), so $\mathbf{A}^{\mathsf{T}}\mathbf{r}^t$ is approximately an unbiased linear observation of $\mathbf{c}$.

  2. Onsager decorrelation: The scalar correction $\frac{N}{M}\langle \eta'_t \rangle$ exactly removes the correlation between $\mathbf{r}^t$ and $\hat{\mathbf{c}}^t$.

For structured matrices, such as partial DFT matrices, Kronecker products, or the RF imaging operator $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, neither property holds:

  • The eigenvalues of AHA\mathbf{A}^{H}\mathbf{A} are far from uniform; the Marchenko-Pastur law does not apply.
  • The scalar Onsager coefficient NM⟨ηtβ€²βŸ©\frac{N}{M}\langle\eta'_t \rangle is incorrect: the decorrelation depends on the full singular value distribution of A\mathbf{A}, not just the ratio N/MN/M.

As a result, AMP applied to structured matrices produces residuals $\mathbf{r}^t$ that are not decorrelated from $\hat{\mathbf{c}}^t$. The denoiser receives biased input, and the algorithm diverges.

Example: AMP Divergence on a Kronecker Sensing Matrix

Consider the RF imaging forward model with $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, where $\mathbf{A}_{1} \in \mathbb{C}^{20 \times 32}$ and $\mathbf{A}_{2} \in \mathbb{C}^{20 \times 32}$ are partial DFT matrices (random row selections from the DFT). The product dimensions are $M = 400$, $N = 1024$. The reflectivity $\mathbf{c}$ is Bernoulli-Gaussian with sparsity $\rho = 0.1$, and $\text{SNR} = 25\,\text{dB}$.

Run AMP with a soft-thresholding denoiser, and compare with ISTA (AMP without the Onsager term).
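
A sketch of this experiment is below. The helper names and the fixed threshold $\lambda = 0.05$ are illustrative simplifications (a more careful implementation would adapt the threshold to $\tau_t$), but they do not change the qualitative outcome:

```python
import numpy as np

def partial_dft(m, n, rng):
    # m randomly selected rows of the unitary n x n DFT matrix
    F = np.fft.fft(np.eye(n), norm="ortho")
    return F[rng.choice(n, size=m, replace=False)]

def csoft(x, lam):
    # complex soft thresholding: shrink magnitudes, keep phases
    return np.maximum(np.abs(x) - lam, 0.0) * np.exp(1j * np.angle(x))

def run(y, A, c, lam=0.05, n_iter=50, onsager=True):
    # AMP (onsager=True) or ISTA (onsager=False), tracking NMSE per iteration
    M, N = A.shape
    c_hat = np.zeros(N, dtype=complex)
    r = y.copy()
    nmse = []
    for _ in range(n_iter):
        q = A.conj().T @ r + c_hat
        b = (N / M) * np.mean(np.abs(q) > lam) if onsager else 0.0
        c_hat = csoft(q, lam)
        r = y - A @ c_hat + b * r
        nmse.append(np.sum(np.abs(c_hat - c) ** 2) / np.sum(np.abs(c) ** 2))
    return np.array(nmse)

rng = np.random.default_rng(0)
A = np.kron(partial_dft(20, 32, rng), partial_dft(20, 32, rng))  # 400 x 1024
M, N = A.shape
active = rng.random(N) < 0.1                                     # rho = 0.1
c = active * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
sigma2 = np.mean(np.abs(A @ c) ** 2) / 10 ** (25 / 10)           # SNR = 25 dB
y = A @ c + np.sqrt(sigma2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))

nmse_amp = run(y, A, c, onsager=True)
nmse_ista = run(y, A, c, onsager=False)
```

In this configuration the AMP trace grows without bound across iterations, while the ISTA trace remains bounded (the operator has unit spectral norm, so unit-step ISTA is stable), matching the behavior described above.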

AMP vs OAMP Convergence on RF Sensing Matrices

Compare the NMSE convergence of AMP and OAMP on an RF imaging operator with Kronecker structure. AMP diverges while OAMP converges to a good reconstruction.

[Interactive demo; parameters: $N_1 = 32$, $N_2 = 32$, $\delta = 0.4$, $\rho = 0.1$, $\text{SNR} = 25\,\text{dB}$.]

Common Mistake: Damping Does Not Fix AMP for Structured Matrices

Mistake:

A common attempt to fix AMP divergence is damping: replacing the update $\hat{\mathbf{c}}^{t+1} = \eta_t(\mathbf{q}^t)$ with

$$\hat{\mathbf{c}}^{t+1} = \alpha\,\eta_t(\mathbf{q}^t) + (1 - \alpha)\,\hat{\mathbf{c}}^t$$

for some $\alpha \in (0, 1)$. This can stabilize the iterations and prevent outright divergence.

Correction:

Damping addresses the symptom (divergence) but not the cause (incorrect decorrelation). Damped AMP converges to a suboptimal fixed point: the effective noise seen by the denoiser is not Gaussian with known variance, so state evolution does not hold and the denoiser cannot be optimally tuned.

For non-i.i.d. matrices, one must replace the Onsager correction mechanism entirely. This is what OAMP/VAMP does.

Common Mistake: Complex-Valued AMP Requires Wirtinger Derivatives

Mistake:

When extending AMP to complex-valued signals and measurements (as in RF imaging with $\mathbf{c} \in \mathbb{C}^N$), some implementations use the real-valued Onsager correction formula.

Correction:

The complex Onsager coefficient involves the Wirtinger derivative:

$$b^t = \frac{1}{M}\sum_{i=1}^N \frac{\partial [\eta_t(\mathbf{q}^t)]_i}{\partial \bar{q}_i^t}.$$

For component-wise denoisers like soft thresholding applied to magnitudes, this equals the fraction of active components, the same as the real case. But for more complex denoisers (e.g., those coupling real and imaginary parts), the distinction matters.
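
Wirtinger derivatives are easy to get wrong by hand, but they can be checked numerically via $\partial f/\partial \bar{q} = \tfrac{1}{2}(\partial f/\partial x + i\,\partial f/\partial y)$ with $q = x + iy$. The finite-difference sketch below (illustrative helper name) evaluates the coefficient $b^t$ above for an arbitrary component-wise complex denoiser:

```python
import numpy as np

def wirtinger_onsager(eta, q, M, eps=1e-6):
    # b = (1/M) * sum_i d[eta(q)]_i / d conj(q_i), computed by central
    # differences using d/d conj(q) = 0.5 * (d/dx + 1j * d/dy)
    dx = (eta(q + eps) - eta(q - eps)) / (2 * eps)
    dy = (eta(q + 1j * eps) - eta(q - 1j * eps)) / (2 * eps)
    return np.sum(0.5 * (dx + 1j * dy)) / M
```

Two sanity checks: for the holomorphic map $\eta(q) = q$ the coefficient is 0, while for $\eta(q) = \bar{q}$ it is $N/M$. Comparing this numerical value against the real-valued formula for a given denoiser reveals whether the distinction matters in a particular implementation.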

Common Mistake: Check Eigenvalue Distribution Before Running AMP

Mistake:

Running AMP on a measurement matrix without inspecting its singular value distribution, assuming that "close to Gaussian" is good enough.

Correction:

Compute the eigenvalues of $\mathbf{A}^{H}\mathbf{A}$ and compare with the Marchenko-Pastur distribution. If the empirical distribution deviates significantly, especially if there are eigenvalues near zero or far above the Marchenko-Pastur upper edge $(1 + 1/\sqrt{\delta})^2$, AMP will likely fail. Use OAMP/VAMP instead.
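
The check can be sketched as follows, on an i.i.d. Gaussian matrix and on the Kronecker partial-DFT operator from the example earlier in this section (dimensions carried over from there as assumptions). The Gaussian spectrum fills the Marchenko-Pastur bulk; the structured operator concentrates all nonzero eigenvalues at exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 400, 1024
delta = M / N
lower = (1 - 1 / np.sqrt(delta)) ** 2    # MP bulk edges for the
upper = (1 + 1 / np.sqrt(delta)) ** 2    # nonzero eigenvalues

# i.i.d. Gaussian reference: nonzero eigenvalues of A^T A fill [lower, upper]
G = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
eig_g = np.linalg.eigvalsh(G.T @ G)
eig_g = eig_g[eig_g > 1e-8]              # drop the N - M structural zeros

# Kronecker partial-DFT operator: each factor has orthonormal rows, so
# A A^H = I and every nonzero eigenvalue of A^H A equals 1
def partial_dft(m, n):
    F = np.fft.fft(np.eye(n), norm="ortho")
    return F[rng.choice(n, size=m, replace=False)]

A = np.kron(partial_dft(20, 32), partial_dft(20, 32))
eig_a = np.linalg.eigvalsh(A.conj().T @ A)
eig_a = eig_a[eig_a > 1e-8]
```

Note that for this operator the tell-tale sign is not eigenvalues escaping the bulk but the shape of the distribution itself: a point mass at 1 plus a point mass at 0 bears no resemblance to the Marchenko-Pastur bulk, and the scalar Onsager coefficient, which implicitly assumes that shape, is wrong for it.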

Quick Check

What happens to AMP if the Onsager correction term $\frac{N}{M}\langle\eta'_t\rangle\,\mathbf{r}^{t-1}$ is removed?

  • AMP becomes ISTA (iterative soft thresholding), which converges but to a suboptimal point

  • AMP still converges to the same fixed point but slower

  • AMP diverges for all matrices

Key Takeaway

AMP is the gold standard for sparse recovery with i.i.d. Gaussian sensing matrices, but it diverges for the structured matrices arising in RF imaging. The root cause is that the scalar Onsager correction cannot decorrelate the residual when the singular values of $\mathbf{A}$ deviate from the Marchenko-Pastur distribution. This motivates OAMP/VAMP, which replaces the scalar correction with a matrix-valued orthogonalization.

Onsager correction

The term in the AMP residual update that subtracts the correlation between the current residual and the previous denoiser output. For i.i.d. Gaussian matrices, this is a scalar multiple of the previous residual: $b^t \mathbf{r}^{t-1}$ with $b^t = \frac{N}{M}\langle\eta'_{t-1}\rangle$. Named after Lars Onsager's reciprocal relations in thermodynamics, via the TAP equations of spin glass theory.

Related: AMP Iteration, The Onsager Correction

State evolution

A deterministic scalar recursion that exactly predicts the per-iteration MSE of AMP (or OAMP/VAMP) in the large-system limit. Reduces algorithm analysis to a one-dimensional map.

Related: State Evolution for AMP

Marchenko-Pastur distribution

The limiting spectral distribution of $\frac{1}{M}\mathbf{A}^{H} \mathbf{A}$ when $\mathbf{A}$ has i.i.d. entries and $M/N \to \delta$. Supported on $[(1-1/\sqrt{\delta})^2, (1+1/\sqrt{\delta})^2]$ for $\delta \geq 1$, with an additional point mass at 0 for $\delta < 1$.

Related: Why AMP Fails for Structured Matrices