AMP Recap and Its Failure for RF Imaging
Why AMP Fails for Imaging, and Why That Matters
Approximate Message Passing (AMP) is the most powerful algorithm for sparse recovery when the sensing matrix has i.i.d. Gaussian entries: it is computationally cheap, admits a precise scalar characterization (state evolution), and achieves Bayes-optimal performance with the right denoiser. Yet the RF imaging sensing matrix is not i.i.d. Gaussian: it has Kronecker structure ($A = A_1 \otimes A_2$), partial DFT rows, and highly non-uniform singular values.
When AMP is applied to such matrices, it diverges. This is not a minor numerical inconvenience: the Onsager correction that makes AMP work for Gaussian matrices produces a catastrophically wrong decorrelation for structured matrices. Understanding this failure is the first step toward OAMP, which we develop in the rest of this chapter.
Definition: AMP Iteration
Given the linear model $y = Ax + w$ with $A \in \mathbb{R}^{m \times n}$ having i.i.d. $\mathcal{N}(0, 1/m)$ entries, AMP iterates:

$$x^{t+1} = \eta_t\!\left(x^t + A^\top z^t\right), \qquad z^t = y - A x^t + \frac{1}{\delta}\,\bigl\langle \eta'_{t-1}\!\left(x^{t-1} + A^\top z^{t-1}\right)\bigr\rangle\, z^{t-1},$$

where $\eta_t$ is a component-wise denoiser, $\delta = m/n$, and $\langle \cdot \rangle$ denotes the empirical average over components, so $\langle \eta'_{t-1}(\cdot) \rangle$ is the average derivative of the denoiser.

The term $\frac{1}{\delta}\langle \eta'_{t-1} \rangle\, z^{t-1}$ is the Onsager correction. Without it, AMP reduces to the iterative soft-thresholding algorithm (ISTA).
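To make the iteration concrete, here is a minimal NumPy sketch of AMP with a soft-thresholding denoiser. The problem sizes, threshold, and iteration count are illustrative choices, not the chapter's settings; in practice the threshold is usually scaled adaptively with the state-evolution noise level $\tau_t$.

```python
import numpy as np

def soft(r, lam):
    """Component-wise soft threshold: eta(r) = sign(r) * max(|r| - lam, 0)."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def amp(y, A, lam=0.1, n_iter=30):
    """AMP with a fixed soft threshold (a simplification; thresholds
    proportional to the SE noise level tau_t are standard)."""
    m, n = A.shape
    delta = m / n
    x, z = np.zeros(n), y.copy()
    for _ in range(n_iter):
        r = x + A.T @ z                    # effective observation r^t
        x = soft(r, lam)                   # denoise
        # <eta'> for soft thresholding is the fraction of active components
        b = np.mean(np.abs(x) > 0) / delta
        z = y - A @ x + b * z              # residual with Onsager correction
    return x

# Usage on a synthetic i.i.d. Gaussian instance (hypothetical sizes)
rng = np.random.default_rng(0)
m, n, k = 250, 500, 25
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # i.i.d. N(0, 1/m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x0 + 0.01 * rng.normal(size=m)
x_hat = amp(y, A)
print(f"NMSE: {10 * np.log10(np.sum((x_hat - x0)**2) / np.sum(x0**2)):.1f} dB")
```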
Definition: The Onsager Correction
The Onsager correction is the term

$$b_t\, z^{t-1}, \qquad b_t = \frac{1}{\delta}\,\bigl\langle \eta'_{t-1}\!\left(x^{t-1} + A^\top z^{t-1}\right)\bigr\rangle,$$

where $\delta = m/n$.

Its role is to subtract the contribution of the current estimate to the residual, ensuring that the effective observation $r^t = x^t + A^\top z^t$ seen by the denoiser behaves, in the large-system limit, like the output of a scalar AWGN channel: $r^t \approx x + \tau_t g$ with $g \sim \mathcal{N}(0, I)$.
Theorem: State Evolution for AMP
Let $A \in \mathbb{R}^{m \times n}$ have i.i.d. $\mathcal{N}(0, 1/m)$ entries, and let $\{\eta_t\}$ be a sequence of Lipschitz denoisers. In the limit $m, n \to \infty$ with $m/n \to \delta$, the per-component MSE of AMP is tracked by the deterministic recursion

$$\tau_{t+1}^2 = \sigma^2 + \frac{1}{\delta}\,\mathbb{E}\!\left[\bigl(\eta_t(X + \tau_t Z) - X\bigr)^2\right], \qquad \tau_0^2 = \sigma^2 + \frac{\mathbb{E}[X^2]}{\delta},$$

where $X$ is drawn from the prior $p_X$ and $Z \sim \mathcal{N}(0,1)$ independently. Specifically,

$$\lim_{n \to \infty} \frac{1}{n}\,\bigl\|x^{t+1} - x\bigr\|^2 = \mathbb{E}\!\left[\bigl(\eta_t(X + \tau_t Z) - X\bigr)^2\right] \quad \text{almost surely.}$$
State evolution collapses the entire high-dimensional iteration into a single scalar recursion. At each step, the denoiser faces an equivalent scalar AWGN problem with noise variance $\tau_t^2$. This is why AMP is so powerful for Gaussian matrices: all the complexity of the measurement system is captured by one number.
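A one-dimensional Monte Carlo implementation of this recursion makes the "single scalar" point tangible. The sketch below tracks $\tau_t^2$ for a Bernoulli-Gaussian prior; the sparsity, noise level, sampling ratio, and threshold rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma2, delta, alpha = 0.05, 1e-4, 0.5, 1.4   # illustrative parameters
N = 200_000                                        # Monte Carlo sample size

# X ~ rho * N(0,1) + (1 - rho) * delta_0,  Z ~ N(0,1) independent
X = rng.normal(size=N) * (rng.random(N) < rho)
Z = rng.normal(size=N)

def soft(r, t):
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

tau2 = sigma2 + np.mean(X**2) / delta              # tau_0^2
for t in range(12):
    # Scalar AWGN problem: denoise X + tau * Z with threshold alpha * tau
    mse = np.mean((soft(X + np.sqrt(tau2) * Z, alpha * np.sqrt(tau2)) - X)**2)
    tau2 = sigma2 + mse / delta                    # SE recursion
    print(f"t={t:2d}  tau^2={tau2:.3e}  scalar MSE={mse:.3e}")
```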
Key idea: conditional distribution of the effective observation
The proof (Bayati and Montanari, 2011) uses a conditioning technique. Define the effective observation $r^t = x^t + A^\top z^t$. The Onsager correction ensures that, conditioned on the iterates up to time $t$, the vector $r^t$ has the same empirical distribution as $x + \tau_t g$ where $g \sim \mathcal{N}(0, I_n)$.
Induction on the iteration count
The proof proceeds by induction on $t$. At $t = 0$, $x^0 = 0$ and $z^0 = y$, so $r^0 = A^\top y$. For Gaussian $A$, $A^\top y \approx x + \tau_0 g$ (up to a scaling), confirming the base case.
The role of the Onsager correction in the inductive step
At step $t$, the Onsager term subtracts the correlation between $z^t$ and the previous denoiser output $x^t = \eta_{t-1}(r^{t-1})$. Without this term, $r^t$ would have a bias that grows with each iteration. The specific coefficient $b_t = \frac{1}{\delta}\langle \eta'_{t-1}(r^{t-1}) \rangle$ is chosen so that the bias is exactly cancelled in the large-system limit.
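The decorrelation claim can be checked empirically. The sketch below runs the iteration with and without the Onsager term on a synthetic Gaussian instance (hypothetical sizes and threshold) and prints the normalized correlation between the effective noise $r^t - x$ and the current estimate: with the correction it hovers near zero, without it a bias accumulates.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, lam = 400, 800, 40, 0.05                 # illustrative sizes
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x0 + 0.01 * rng.normal(size=m)

def soft(r, t):
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

for use_onsager in (True, False):
    x, z = np.zeros(n), y.copy()
    for t in range(8):
        r = x + A.T @ z
        if t > 0:
            noise = r - x0                         # effective noise at the denoiser
            c = noise @ x / (np.linalg.norm(noise) * np.linalg.norm(x) + 1e-12)
            print(f"onsager={use_onsager!s:5}  t={t}  corr = {c:+.3f}")
        x = soft(r, lam)
        b = np.mean(np.abs(x) > 0) / (m / n) if use_onsager else 0.0
        z = y - A @ x + b * z
```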
Convergence of the SE recursion
The fixed points of the SE map characterize the achievable MSE. With the Bayes-optimal denoiser, the lowest fixed point equals the information-theoretic MMSE.
Historical Note: From Spin Glasses to Compressed Sensing
1977–2009. AMP's Onsager correction traces back to the TAP equations (Thouless, Anderson, and Palmer, 1977), developed for the Sherrington-Kirkpatrick spin glass model. The connection to compressed sensing was made by Donoho, Maleki, and Montanari (2009), who recognized that the factor graph of $y = Ax + w$ with Gaussian $A$ is structurally identical to the spin-glass factor graph. Kabashima (2003) had independently derived essentially the same algorithm for CDMA multiuser detection, demonstrating the deep connection between statistical physics and communications.
Why AMP Fails for Structured Matrices
The state evolution guarantee, and hence AMP's entire theoretical foundation, rests on the matrix $A$ having i.i.d. Gaussian entries. This ensures two properties:
- **Self-averaging:** $A^\top A$ concentrates around $I$ (column inner products self-average), so $r^t = x^t + A^\top z^t$ is approximately an unbiased linear observation of $x$.
- **Onsager decorrelation:** the scalar correction $b_t z^{t-1}$ exactly removes the correlation between the residual $z^t$ and the current estimate $x^t$.
For structured matrices β partial DFT, Kronecker products, or the RF imaging operator β neither property holds:
- The eigenvalues of $A^\top A$ are far from uniform; the Marchenko-Pastur law does not apply.
- The scalar Onsager coefficient is incorrect: the correct decorrelation depends on the full singular value distribution of $A$, not just the aspect ratio $\delta = m/n$.
As a result, AMP applied to structured matrices produces residuals that are not decorrelated from the current estimate $x^t$. The denoiser receives biased input, and the algorithm diverges.
Example: AMP Divergence on a Kronecker Sensing Matrix
Consider the RF imaging forward model $y = Ax + w$ with $A = A_1 \otimes A_2$, where $A_1$ and $A_2$ are partial DFT matrices (random row selections from the DFT), so the product has dimensions $m_1 m_2 \times n_1 n_2$. The reflectivity $x$ is Bernoulli-Gaussian with sparsity $\rho$, and $w$ is circularly symmetric complex Gaussian noise with variance $\sigma^2$.
Run AMP with a soft-thresholding denoiser and compare with ISTA (AMP without the Onsager term).
Setup
We generate $A_1$ and $A_2$ by selecting 20 random rows each from the DFT matrix, then form $A = A_1 \otimes A_2$. The reflectivity $x$ is drawn from the Bernoulli-Gaussian prior.
AMP diverges
When AMP is run for 100 iterations, the NMSE increases after the first few iterations:
(Table: NMSE versus iteration for AMP and ISTA at iterations 1, 5, 10, and 50. AMP's NMSE grows after the first few iterations and has diverged by iteration 50; ISTA's NMSE decreases steadily.)
AMP diverges because the Onsager correction is miscalibrated for the Kronecker DFT matrix. Ironically, ISTA (which has no Onsager correction) converges, although to a suboptimal point.
Root cause
The eigenvalues of $A^{\mathsf{H}} A$ range from 0 to values much larger than 1. AMP's Onsager coefficient is calibrated for the Marchenko-Pastur distribution (where eigenvalues cluster near 1 for $\delta$ close to 1). The mismatch causes the residual to grow rather than shrink.
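The following sketch reproduces the flavor of this experiment. Since the chapter's exact grid sizes, sparsity, and noise level are not restated here, the values below (32-point DFTs, 20 rows each, 5% sparsity) are illustrative stand-ins; ISTA is run with the usual $1/L$ step size.

```python
import numpy as np

rng = np.random.default_rng(3)
N1 = N2 = 32                                       # assumed DFT sizes
m1 = m2 = 20                                       # 20 random rows each, as in the text
F = lambda N: np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT
# Scale so columns have unit norm (matching the AMP normalization)
A1 = F(N1)[rng.choice(N1, m1, replace=False)] * np.sqrt(N1 / m1)
A2 = F(N2)[rng.choice(N2, m2, replace=False)] * np.sqrt(N2 / m2)
A = np.kron(A1, A2)                                # (m1*m2) x (N1*N2)
m, n = A.shape

rho, sigma, lam = 0.05, 0.01, 0.05                 # illustrative parameters
x0 = (rng.random(n) < rho) * (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
y = A @ x0 + sigma * (rng.normal(size=m) + 1j * rng.normal(size=m)) / np.sqrt(2)

def soft_c(r, t):
    """Complex soft threshold: shrink the magnitude, keep the phase."""
    return np.maximum(np.abs(r) - t, 0.0) * np.exp(1j * np.angle(r))

def nmse_db(x):
    return 10 * np.log10(np.sum(np.abs(x - x0)**2) / np.sum(np.abs(x0)**2))

# AMP: the Gaussian-calibrated Onsager term is wrong for this spectrum
x, z, delta = np.zeros(n, complex), y.copy(), m / n
for _ in range(50):
    x = soft_c(x + A.conj().T @ z, lam)
    z = y - A @ x + (np.mean(np.abs(x) > 0) / delta) * z
print(f"AMP  NMSE after 50 iters:  {nmse_db(x):+.1f} dB")

# ISTA: no Onsager term, step size 1/L with L = ||A||^2
L = np.linalg.norm(A, 2) ** 2
x = np.zeros(n, complex)
for _ in range(200):
    x = soft_c(x + A.conj().T @ (y - A @ x) / L, lam / L)
print(f"ISTA NMSE after 200 iters: {nmse_db(x):+.1f} dB")
```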
AMP vs OAMP Convergence on RF Sensing Matrices
Compare the NMSE convergence of AMP and OAMP on an RF imaging operator with Kronecker structure. AMP diverges while OAMP converges to a good reconstruction.
Common Mistake: Damping Does Not Fix AMP for Structured Matrices
Mistake:
A common attempt to fix AMP divergence is damping: replacing the update with

$$x^{t+1} = (1 - \beta)\,x^t + \beta\,\eta_t\!\left(x^t + A^\top z^t\right)$$

for some damping factor $\beta \in (0, 1)$. This can stabilize the iterations and prevent outright divergence.
Correction:
Damping addresses the symptom (divergence) but not the cause (incorrect decorrelation). Damped AMP converges to a suboptimal fixed point: the effective noise seen by the denoiser is not Gaussian with known variance, so state evolution does not hold and the denoiser cannot be optimally tuned.
For non-i.i.d. matrices, one must replace the Onsager correction mechanism entirely. This is what OAMP/VAMP does.
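For concreteness, a damped AMP step looks like the sketch below (a hypothetical helper; $\beta$ is an illustrative damping factor and soft_c is the complex soft threshold from the example above). The stabilization is visible in practice, but nothing in this update restores the Gaussianity of the effective noise.

```python
import numpy as np

def soft_c(r, t):
    """Complex soft threshold: shrink the magnitude, keep the phase."""
    return np.maximum(np.abs(r) - t, 0.0) * np.exp(1j * np.angle(r))

def damped_amp_step(x, z, y, A, lam, delta, beta=0.3):
    """One damped AMP iteration: blend the new iterates with the old ones.
    This can prevent blow-up, but state evolution no longer holds."""
    x_prop = soft_c(x + A.conj().T @ z, lam)
    z_prop = y - A @ x_prop + (np.mean(np.abs(x_prop) > 0) / delta) * z
    return (1 - beta) * x + beta * x_prop, (1 - beta) * z + beta * z_prop
```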
Common Mistake: Complex-Valued AMP Requires Wirtinger Derivatives
Mistake:
When extending AMP to complex-valued signals and measurements (as in RF imaging, where $A$, $x$, and $y$ are complex), some implementations reuse the real-valued Onsager correction formula.
Correction:
The complex Onsager coefficient involves the Wirtinger derivative:

$$b_t = \frac{1}{\delta}\,\Bigl\langle \frac{\partial \eta_{t-1}}{\partial r}\bigl(r^{t-1}\bigr) \Bigr\rangle, \qquad \frac{\partial}{\partial r} = \frac{1}{2}\!\left(\frac{\partial}{\partial\,\mathrm{Re}\,r} - i\,\frac{\partial}{\partial\,\mathrm{Im}\,r}\right).$$
For component-wise denoisers like soft thresholding applied to magnitudes, the average Wirtinger derivative is approximately the fraction of active components, as in the real case (where the equality is exact). But for richer denoisers, e.g., those coupling real and imaginary parts, the distinction matters.
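A quick numerical check (a sketch under the conventions above, with illustrative threshold and sample size) estimates the average Wirtinger derivative of the complex soft threshold by finite differences and compares it with the active-component fraction; the two agree to leading order when the active magnitudes sit well above the threshold.

```python
import numpy as np

rng = np.random.default_rng(4)

def soft_c(r, t):
    return np.maximum(np.abs(r) - t, 0.0) * np.exp(1j * np.angle(r))

lam, h = 0.2, 1e-6
r = rng.normal(size=100_000) + 1j * rng.normal(size=100_000)

# Wirtinger derivative d(eta)/dr = (1/2)(d/dRe - i d/dIm), central differences
dR = (soft_c(r + h, lam) - soft_c(r - h, lam)) / (2 * h)
dI = (soft_c(r + 1j * h, lam) - soft_c(r - 1j * h, lam)) / (2 * h)
wirtinger = 0.5 * (dR - 1j * dI)

print("average Wirtinger derivative:", np.round(np.mean(wirtinger), 3))
print("active fraction:             ", np.round(np.mean(np.abs(r) > lam), 3))
```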
Common Mistake: Not Checking the Eigenvalue Distribution Before Running AMP
Mistake:
Running AMP on a measurement matrix without inspecting its singular value distribution, assuming that "close to Gaussian" is good enough.
Correction:
Compute the eigenvalues of $A^\top A$ (or $A^{\mathsf{H}} A$ in the complex case) and compare them with the Marchenko-Pastur distribution. If the empirical distribution deviates significantly, especially if there are eigenvalues near zero or far above the Marchenko-Pastur upper edge, AMP will likely fail. Use OAMP/VAMP instead.
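The sketch below implements this diagnostic under assumed sizes: it prints the Marchenko-Pastur bulk predicted for an i.i.d. matrix of the same shape next to the empirical eigenvalue quantiles, first for a Gaussian matrix and then for a Kronecker partial-DFT matrix whose spectrum collapses onto two points.

```python
import numpy as np

rng = np.random.default_rng(5)

def spectrum_report(A, name):
    m, n = A.shape
    eigs = np.linalg.eigvalsh(A.conj().T @ A)
    c = n / m                                      # aspect ratio of A^H A
    lo, hi = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
    mass0 = f" + point mass {1 - 1/c:.2f} at 0" if c > 1 else ""
    print(f"{name}: MP bulk ≈ [{lo:.2f}, {hi:.2f}]{mass0}")
    print("  eigenvalue quantiles (0/25/50/75/100%):",
          np.round(np.quantile(eigs, [0, 0.25, 0.5, 0.75, 1.0]), 2))

# i.i.d. Gaussian with variance-1/m entries: spectrum fills the MP bulk
m, n = 400, 200
spectrum_report(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), "Gaussian")

# Kronecker partial DFT (unit-norm columns): spectrum is {0, N1*N2/(m1*m2)}
F = lambda N: np.fft.fft(np.eye(N)) / np.sqrt(N)
A1 = F(32)[rng.choice(32, 20, replace=False)] * np.sqrt(32 / 20)
A2 = F(32)[rng.choice(32, 20, replace=False)] * np.sqrt(32 / 20)
spectrum_report(np.kron(A1, A2), "Kronecker partial DFT")
```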
Quick Check
What happens to AMP if the Onsager correction term is removed?
- AMP becomes ISTA (iterative soft thresholding), which converges but to a suboptimal point. **(correct)**
- AMP still converges to the same fixed point, but more slowly.
- AMP diverges for all matrices.

Answer: Without the Onsager correction, AMP reduces to ISTA. The residual is then correlated with the estimate, so the denoiser receives biased input and cannot achieve Bayes-optimal performance.
Key Takeaway
AMP is the gold standard for sparse recovery with i.i.d. Gaussian sensing matrices, but it diverges for the structured matrices arising in RF imaging. The root cause is that the scalar Onsager correction cannot decorrelate the residual when the singular values of $A$ deviate from the Marchenko-Pastur prediction. This motivates OAMP/VAMP, which replaces the scalar correction with a matrix-valued orthogonalization.
Onsager correction
The term in the AMP residual update that subtracts the correlation between the current residual and the previous denoiser output. For i.i.d. Gaussian matrices, this is a scalar multiple of the previous residual: $b_t z^{t-1}$ with $b_t = \frac{1}{\delta}\langle \eta'_{t-1} \rangle$. Named after Lars Onsager's reciprocal relations in thermodynamics, via the TAP equations of spin glass theory.
Related: AMP Iteration, The Onsager Correction
State evolution
A deterministic scalar recursion that exactly predicts the per-iteration MSE of AMP (or OAMP/VAMP) in the large-system limit. Reduces algorithm analysis to a one-dimensional map.
Related: State Evolution for AMP
Marchenko-Pastur distribution
The limiting spectral distribution of $A^\top A$ when $A \in \mathbb{R}^{m \times n}$ has i.i.d. $\mathcal{N}(0, 1/m)$ entries and $m/n \to \delta$. Supported on $\bigl[(1 - 1/\sqrt{\delta})^2,\ (1 + 1/\sqrt{\delta})^2\bigr]$ for $\delta \ge 1$, with an additional point mass of weight $1 - \delta$ at 0 for $\delta < 1$.