AMP Recap and Its Failure for RF Imaging

Why AMP Fails for Imaging, and Why That Matters

Approximate Message Passing (AMP) is the most powerful algorithm for sparse recovery when the sensing matrix has i.i.d. Gaussian entries: it is computationally cheap, admits a precise scalar characterization (state evolution), and achieves Bayes-optimal performance with the right denoiser. Yet the RF imaging sensing matrix $\mathbf{A}$ is not i.i.d. Gaussian: it has Kronecker structure ($\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$), partial DFT rows, and highly non-uniform singular values.

When AMP is applied to such matrices, it diverges. This is not a minor numerical inconvenience: the Onsager correction that makes AMP work for Gaussian matrices produces a catastrophically wrong decorrelation for structured matrices. Understanding this failure is the first step toward OAMP, which we develop in the rest of this chapter.

Definition: AMP Iteration

Given the linear model $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with $\mathbf{A} \in \mathbb{R}^{M \times N}$ having i.i.d. $\mathcal{N}(0, 1/M)$ entries, AMP iterates:

$$\mathbf{r}^t = \mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^t + \frac{N}{M}\langle \eta'_{t-1} \rangle \, \mathbf{r}^{t-1},$$

$$\hat{\mathbf{c}}^{t+1} = \eta_t\!\bigl( \mathbf{A}^{\mathsf{T}} \mathbf{r}^t + \hat{\mathbf{c}}^t \bigr),$$

where $\eta_t : \mathbb{R}^N \to \mathbb{R}^N$ is a component-wise denoiser and $\langle \eta'_t \rangle = \frac{1}{N} \sum_{i=1}^N \eta'_t([\mathbf{A}^{\mathsf{T}}\mathbf{r}^t + \hat{\mathbf{c}}^t]_i)$ is the average derivative of the denoiser.

The term $\frac{N}{M}\langle \eta'_{t-1}\rangle \, \mathbf{r}^{t-1}$ is the Onsager correction. Without it, AMP reduces to the iterative soft-thresholding algorithm (ISTA).
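
For concreteness, the iteration can be sketched in a few lines of NumPy with a soft-thresholding denoiser. The names `amp` and `soft_threshold` are illustrative, and the threshold rule $\lambda_t = \alpha\sqrt{\tau_t}$ with $\tau_t \approx \|\mathbf{r}^t\|^2/M$ is a common heuristic rather than part of the definition:

```python
import numpy as np

def soft_threshold(x, lam):
    # eta(x) = sign(x) * max(|x| - lam, 0), applied component-wise
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp(y, A, alpha=2.0, n_iter=30):
    # AMP for y = A c + w with A having i.i.d. N(0, 1/M) entries.
    # The threshold alpha * sqrt(tau_t) is a heuristic tuning choice.
    M, N = A.shape
    c_hat = np.zeros(N)
    r = y.copy()
    for _ in range(n_iter):
        tau = np.sum(r ** 2) / M                # residual-based estimate of tau_t
        lam = alpha * np.sqrt(tau)
        q = A.T @ r + c_hat                     # effective AWGN observation of c
        b = (N / M) * np.mean(np.abs(q) > lam)  # (N/M) <eta'>: fraction active
        c_hat = soft_threshold(q, lam)
        r = y - A @ c_hat + b * r               # residual with Onsager correction
    return c_hat
```

Dropping the `b * r` term in the last update turns this sketch into ISTA.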

Definition: The Onsager Correction

The Onsager correction is the term

$$b^t \, \mathbf{r}^{t-1}, \qquad b^t = \frac{N}{M}\langle \eta'_{t-1} \rangle = \frac{1}{M}\sum_{i=1}^N \frac{\partial [\eta_{t-1}(\mathbf{q}^{t-1})]_i}{\partial q_i^{t-1}},$$

where $\mathbf{q}^{t-1} = \mathbf{A}^{\mathsf{T}}\mathbf{r}^{t-1} + \hat{\mathbf{c}}^{t-1}$.

Its role is to subtract the contribution of the current estimate to the residual, ensuring that the "effective observation" $\mathbf{A}^{\mathsf{T}}\mathbf{r}^t + \hat{\mathbf{c}}^t$ seen by the denoiser behaves as $\mathbf{c} + \mathcal{N}(\mathbf{0}, \tau_t \mathbf{I})$ in the large-system limit, i.e., as a scalar AWGN channel.

Theorem: State Evolution for AMP

Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ have i.i.d. $\mathcal{N}(0, 1/M)$ entries, and let $\eta_t$ be a sequence of Lipschitz denoisers. In the limit $M, N \to \infty$ with $M/N \to \delta$, the per-component MSE of AMP is tracked by the deterministic recursion

$$\tau_{t+1} = \sigma^2 + \frac{1}{\delta}\,\mathbb{E}\bigl[ |\eta_t(C_0 + \sqrt{\tau_t}\,Z) - C_0|^2 \bigr],$$

where $C_0 \sim p_0$ is drawn from the prior and $Z \sim \mathcal{N}(0, 1)$ independently. Specifically,

$$\frac{1}{N}\|\hat{\mathbf{c}}^{t+1} - \mathbf{c}\|^2 \xrightarrow{\text{a.s.}} \mathbb{E}\bigl[|\eta_t(C_0 + \sqrt{\tau_t}\,Z) - C_0|^2\bigr].$$

State evolution collapses the entire high-dimensional iteration into a single scalar recursion. At each step, the denoiser faces an equivalent scalar AWGN problem with noise variance Ο„t\tau_t. This is why AMP is so powerful for Gaussian matrices: all the complexity of the measurement system is captured by one number.
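
The scalar recursion is easy to evaluate numerically. The sketch below runs it by Monte Carlo for a Bernoulli-Gaussian prior and a soft-thresholding denoiser with $\lambda_t = \alpha\sqrt{\tau_t}$; the prior, the tuning constant $\alpha$, and the function name are illustrative assumptions, not prescribed by the theorem:

```python
import numpy as np

def state_evolution(rho=0.1, delta=0.4, sigma2=1e-3, alpha=1.5,
                    n_iter=30, n_mc=200_000, seed=0):
    # Scalar state evolution for soft thresholding at lambda = alpha * sqrt(tau)
    # under a Bernoulli-Gaussian prior: C0 = B * N(0,1), B ~ Bernoulli(rho).
    rng = np.random.default_rng(seed)
    c0 = rng.normal(size=n_mc) * (rng.random(n_mc) < rho)
    z = rng.normal(size=n_mc)
    tau = sigma2 + np.mean(c0 ** 2) / delta     # tau_0 from the all-zero estimate
    taus = [tau]
    for _ in range(n_iter):
        lam = alpha * np.sqrt(tau)
        q = c0 + np.sqrt(tau) * z               # the equivalent scalar AWGN channel
        eta = np.sign(q) * np.maximum(np.abs(q) - lam, 0.0)
        tau = sigma2 + np.mean((eta - c0) ** 2) / delta
        taus.append(tau)
    return np.array(taus)
```

For these parameter values the sequence $\tau_0, \tau_1, \ldots$ contracts toward a fixed point, whose value predicts the asymptotic MSE of AMP for this prior and denoiser.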

Historical Note: From Spin Glasses to Compressed Sensing

1977–2009

AMP's Onsager correction traces back to the TAP equations (Thouless, Anderson, and Palmer, 1977) developed for the Sherrington-Kirkpatrick spin glass model. The connection to compressed sensing was made by Donoho, Maleki, and Montanari (2009), who recognized that the factor graph of $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with Gaussian $\mathbf{A}$ is structurally identical to the spin-glass factor graph. Kabashima (2003) had independently derived essentially the same algorithm for CDMA multiuser detection, demonstrating the deep connection between statistical physics and communications.

Why AMP Fails for Structured Matrices

The state evolution guarantee, and hence AMP's entire theoretical foundation, rests on the matrix $\mathbf{A}$ having i.i.d. entries. This ensures two properties:

  1. Self-averaging: $\mathbf{A}^{\mathsf{T}}\mathbf{A} \approx \mathbf{I}$ entrywise (the diagonal concentrates at 1 and the off-diagonal entries are $O(1/\sqrt{M})$), so $\mathbf{A}^{\mathsf{T}}\mathbf{r}^t$ is approximately an unbiased linear observation of $\mathbf{c}$.

  2. Onsager decorrelation: The scalar correction $\frac{N}{M}\langle \eta'_t \rangle$ exactly removes the correlation between $\mathbf{r}^t$ and $\hat{\mathbf{c}}^t$.

For structured matrices, such as partial DFT matrices, Kronecker products, or the RF imaging operator $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, neither property holds:

  • The eigenvalues of AHA\mathbf{A}^{H}\mathbf{A} are far from uniform; the Marchenko-Pastur law does not apply.
  • The scalar Onsager coefficient NM⟨ηtβ€²βŸ©\frac{N}{M}\langle\eta'_t \rangle is incorrect: the decorrelation depends on the full singular value distribution of A\mathbf{A}, not just the ratio N/MN/M.

As a result, AMP applied to structured matrices produces residuals $\mathbf{r}^t$ that are not decorrelated from $\hat{\mathbf{c}}^t$. The denoiser receives biased input, and the algorithm diverges.

Example: AMP Divergence on a Kronecker Sensing Matrix

Consider the RF imaging forward model with $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, where $\mathbf{A}_{1} \in \mathbb{C}^{20 \times 32}$ and $\mathbf{A}_{2} \in \mathbb{C}^{20 \times 32}$ are partial DFT matrices (random row selections from the DFT). The product dimensions are $M = 400$, $N = 1024$. The reflectivity $\mathbf{c}$ is Bernoulli-Gaussian with sparsity $\rho = 0.1$, and $\text{SNR} = 25\,\text{dB}$.

Run AMP with a soft-thresholding denoiser, and compare with ISTA (AMP without the Onsager term).
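
A sketch of this experiment is below. The helper names and the fixed threshold $\lambda = 0.05$ are illustrative simplifications (a more careful implementation would adapt the threshold to $\tau_t$), but they do not change the qualitative outcome:

```python
import numpy as np

def partial_dft(m, n, rng):
    # m randomly selected rows of the unitary n x n DFT matrix
    F = np.fft.fft(np.eye(n), norm="ortho")
    return F[rng.choice(n, size=m, replace=False)]

def csoft(x, lam):
    # complex soft thresholding: shrink magnitudes, keep phases
    return np.maximum(np.abs(x) - lam, 0.0) * np.exp(1j * np.angle(x))

def run(y, A, c, lam=0.05, n_iter=50, onsager=True):
    # AMP (onsager=True) or ISTA (onsager=False), tracking NMSE per iteration
    M, N = A.shape
    c_hat = np.zeros(N, dtype=complex)
    r = y.copy()
    nmse = []
    for _ in range(n_iter):
        q = A.conj().T @ r + c_hat
        b = (N / M) * np.mean(np.abs(q) > lam) if onsager else 0.0
        c_hat = csoft(q, lam)
        r = y - A @ c_hat + b * r
        nmse.append(np.sum(np.abs(c_hat - c) ** 2) / np.sum(np.abs(c) ** 2))
    return np.array(nmse)

rng = np.random.default_rng(0)
A = np.kron(partial_dft(20, 32, rng), partial_dft(20, 32, rng))  # 400 x 1024
M, N = A.shape
active = rng.random(N) < 0.1                                     # rho = 0.1
c = active * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
sigma2 = np.mean(np.abs(A @ c) ** 2) / 10 ** (25 / 10)           # SNR = 25 dB
y = A @ c + np.sqrt(sigma2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))

nmse_amp = run(y, A, c, onsager=True)
nmse_ista = run(y, A, c, onsager=False)
```

In this configuration the AMP trace grows without bound across iterations, while the ISTA trace remains bounded (the operator has unit spectral norm, so unit-step ISTA is stable), matching the behavior described above.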

AMP vs OAMP Convergence on RF Sensing Matrices

Compare the NMSE convergence of AMP and OAMP on an RF imaging operator with Kronecker structure. AMP diverges while OAMP converges to a good reconstruction.

[Interactive demo; parameters: $N_1 = 32$, $N_2 = 32$, $\delta = 0.4$, $\rho = 0.1$, $\text{SNR} = 25\,\text{dB}$.]

Common Mistake: Damping Does Not Fix AMP for Structured Matrices

Mistake:

A common attempt to fix AMP divergence is damping: replacing the update $\hat{\mathbf{c}}^{t+1} = \eta_t(\mathbf{q}^t)$ with

$$\hat{\mathbf{c}}^{t+1} = \alpha\,\eta_t(\mathbf{q}^t) + (1 - \alpha)\,\hat{\mathbf{c}}^t$$

for some $\alpha \in (0, 1)$. This can stabilize the iterations and prevent outright divergence.

Correction:

Damping addresses the symptom (divergence) but not the cause (incorrect decorrelation). Damped AMP converges to a suboptimal fixed point: the effective noise seen by the denoiser is not Gaussian with known variance, so state evolution does not hold and the denoiser cannot be optimally tuned.

For non-i.i.d. matrices, one must replace the Onsager correction mechanism entirely. This is what OAMP/VAMP does.

Common Mistake: Complex-Valued AMP Requires Wirtinger Derivatives

Mistake:

When extending AMP to complex-valued signals and measurements (as in RF imaging with $\mathbf{c} \in \mathbb{C}^N$), some implementations use the real-valued Onsager correction formula.

Correction:

The complex Onsager coefficient involves the Wirtinger derivative:

$$b^t = \frac{1}{M}\sum_{i=1}^N \frac{\partial [\eta_t(\mathbf{q}^t)]_i}{\partial \bar{q}_i^t}.$$

For component-wise denoisers like soft thresholding applied to magnitudes, this equals the fraction of active components, the same as the real case. But for more complex denoisers (e.g., those coupling real and imaginary parts), the distinction matters.
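
Wirtinger derivatives are easy to get wrong by hand, but they can be checked numerically via $\partial f/\partial \bar{q} = \tfrac{1}{2}(\partial f/\partial x + i\,\partial f/\partial y)$ with $q = x + iy$. The finite-difference sketch below (illustrative helper name) evaluates the coefficient $b^t$ above for an arbitrary component-wise complex denoiser:

```python
import numpy as np

def wirtinger_onsager(eta, q, M, eps=1e-6):
    # b = (1/M) * sum_i d[eta(q)]_i / d conj(q_i), computed by central
    # differences using d/d conj(q) = 0.5 * (d/dx + 1j * d/dy)
    dx = (eta(q + eps) - eta(q - eps)) / (2 * eps)
    dy = (eta(q + 1j * eps) - eta(q - 1j * eps)) / (2 * eps)
    return np.sum(0.5 * (dx + 1j * dy)) / M
```

Two sanity checks: for the holomorphic map $\eta(q) = q$ the coefficient is 0, while for $\eta(q) = \bar{q}$ it is $N/M$. Comparing this numerical value against the real-valued formula for a given denoiser reveals whether the distinction matters in a particular implementation.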

Common Mistake: Check Eigenvalue Distribution Before Running AMP

Mistake:

Running AMP on a measurement matrix without inspecting its singular value distribution, assuming that "close to Gaussian" is good enough.

Correction:

Compute the eigenvalues of $\mathbf{A}^{H}\mathbf{A}$ and compare with the Marchenko-Pastur distribution. If the empirical distribution deviates significantly, especially if there are eigenvalues near zero or far above the Marchenko-Pastur upper edge $(1 + 1/\sqrt{\delta})^2$, AMP will likely fail. Use OAMP/VAMP instead.
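
The check can be sketched as follows, on an i.i.d. Gaussian matrix and on the Kronecker partial-DFT operator from the example earlier in this section (dimensions carried over from there as assumptions). The Gaussian spectrum fills the Marchenko-Pastur bulk; the structured operator concentrates all nonzero eigenvalues at exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 400, 1024
delta = M / N
lower = (1 - 1 / np.sqrt(delta)) ** 2    # MP bulk edges for the
upper = (1 + 1 / np.sqrt(delta)) ** 2    # nonzero eigenvalues

# i.i.d. Gaussian reference: nonzero eigenvalues of A^T A fill [lower, upper]
G = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
eig_g = np.linalg.eigvalsh(G.T @ G)
eig_g = eig_g[eig_g > 1e-8]              # drop the N - M structural zeros

# Kronecker partial-DFT operator: each factor has orthonormal rows, so
# A A^H = I and every nonzero eigenvalue of A^H A equals 1
def partial_dft(m, n):
    F = np.fft.fft(np.eye(n), norm="ortho")
    return F[rng.choice(n, size=m, replace=False)]

A = np.kron(partial_dft(20, 32), partial_dft(20, 32))
eig_a = np.linalg.eigvalsh(A.conj().T @ A)
eig_a = eig_a[eig_a > 1e-8]
```

Note that for this operator the tell-tale sign is not eigenvalues escaping the bulk but the shape of the distribution itself: a point mass at 1 plus a point mass at 0 bears no resemblance to the Marchenko-Pastur bulk, and the scalar Onsager coefficient, which implicitly assumes that shape, is wrong for it.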

Quick Check

What happens to AMP if the Onsager correction term $\frac{N}{M}\langle\eta'_t\rangle\,\mathbf{r}^{t-1}$ is removed?

  • AMP becomes ISTA (iterative soft thresholding), which converges but to a suboptimal point

  • AMP still converges to the same fixed point but slower

  • AMP diverges for all matrices

Key Takeaway

AMP is the gold standard for sparse recovery with i.i.d. Gaussian sensing matrices, but it diverges for the structured matrices arising in RF imaging. The root cause is that the scalar Onsager correction cannot decorrelate the residual when the singular values of $\mathbf{A}$ deviate from the Marchenko-Pastur distribution. This motivates OAMP/VAMP, which replaces the scalar correction with a matrix-valued orthogonalization.

Onsager correction

The term in the AMP residual update that subtracts the correlation between the current residual and the previous denoiser output. For i.i.d. Gaussian matrices, this is a scalar multiple of the previous residual: $b^t \mathbf{r}^{t-1}$ with $b^t = \frac{N}{M}\langle\eta'_{t-1}\rangle$. Named after Lars Onsager's reciprocal relations in thermodynamics, via the TAP equations of spin glass theory.

Related: AMP Iteration, The Onsager Correction

State evolution

A deterministic scalar recursion that exactly predicts the per-iteration MSE of AMP (or OAMP/VAMP) in the large-system limit. Reduces algorithm analysis to a one-dimensional map.

Related: State Evolution for AMP

Marchenko-Pastur distribution

The limiting spectral distribution of $\frac{1}{M}\mathbf{A}^{H} \mathbf{A}$ when $\mathbf{A}$ has i.i.d. entries and $M/N \to \delta$. Supported on $[(1-1/\sqrt{\delta})^2, (1+1/\sqrt{\delta})^2]$ for $\delta \geq 1$, with an additional point mass at 0 for $\delta < 1$.

Related: Why AMP Fails for Structured Matrices