AMP from First Principles
Why AMP?
Chapters 17 and 18 established ISTA (Iterative Soft-Thresholding Algorithm) as the canonical first-order method for LASSO-type sparse recovery. ISTA is simple and convergent, but in the proportional asymptotic regime ($m, n \to \infty$ with $\delta = m/n$ fixed) its per-iteration behaviour does not admit a sharp closed-form characterisation. Each iterate has a residual distribution that drifts away from Gaussianity because of feedback through $A$.
Approximate Message Passing (AMP), introduced by Donoho, Maleki, and Montanari (2009), repairs this defect with a single additive term: the Onsager correction. This term is inherited from the TAP (Thouless–Anderson–Palmer) equations of spin-glass physics. With the correction in place, the effective noise at the denoiser input at every iteration is exactly Gaussian in the large-system limit, which makes the algorithm analytically tractable through a one-dimensional recursion called state evolution.
This section derives AMP as a careful approximation to loopy belief propagation on the dense factor graph of the compressed-sensing problem, explains the role of the Onsager term, and states the algorithm precisely.
Definition: Linear Observation Model
We observe $y = A x_0 + w \in \mathbb{R}^m$ with $\delta = m/n$ fixed. The entries of $A \in \mathbb{R}^{m \times n}$ are i.i.d. $\mathcal{N}(0, 1/m)$, and the noise is $w \sim \mathcal{N}(0, \sigma^2 I_m)$. The signal $x_0 \in \mathbb{R}^n$ has i.i.d. entries from a prior $p_X$ (possibly sparse, e.g. Bernoulli–Gaussian). The goal is to recover $x_0$ from $(y, A)$.
The normalisation $A_{ij} \sim \mathcal{N}(0, 1/m)$ is the AMP convention and guarantees that the columns of $A$ have unit norm in expectation, $\|A e_j\|_2^2 \to 1$, by the law of large numbers. Different references scale differently; check before comparing formulas.
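A two-line numerical check of this convention (a minimal sketch; the dimensions are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 1000                          # delta = m/n = 0.5 (illustrative)
A = rng.normal(size=(m, n)) / np.sqrt(m)  # i.i.d. N(0, 1/m) entries

col_norms_sq = np.sum(A**2, axis=0)       # squared column norms
print(col_norms_sq.mean())                # ~ 1.0 by the law of large numbers
```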
Definition: Denoiser
A denoiser is a scalar function $\eta : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}$, $(v, \tau) \mapsto \eta(v; \tau)$, applied componentwise to a vector input. We require $\eta$ to be Lipschitz and piecewise differentiable in its first argument. The derivative $\eta'(v; \tau) = \partial \eta(v; \tau) / \partial v$ is the sensitivity of the denoiser.
Common choices include soft-thresholding $\eta(v; \tau) = \mathrm{sign}(v) \max(|v| - \tau, 0)$, the posterior mean $\eta(v; \tau) = \mathbb{E}[X \mid X + \tau Z = v]$ with $Z \sim \mathcal{N}(0,1)$, and, more recently, neural networks trained for denoising in AWGN.
Onsager correction
A memory term, $\frac{1}{\delta} z^{t-1} \langle \eta'_{t-1} \rangle$, added to the AMP residual. It cancels the bias that loopy feedback through $A$ would otherwise inject into the residual, thereby Gaussianising the denoiser input in the large-system limit.
Related: AMP iteration, State Evolution
AMP iteration
The two-line update $x^{t+1} = \eta_t(x^t + A^\top z^t)$ and $z^t = y - A x^t + \frac{1}{\delta} z^{t-1} \langle \eta'_{t-1}(x^{t-1} + A^\top z^{t-1}) \rangle$ that jointly constitute approximate message passing.
Related: Onsager correction, State Evolution
Sketch: From Loopy BP to AMP
Write the posterior as $p(x \mid y) \propto \prod_{a=1}^{m} p(y_a \mid (Ax)_a) \prod_{i=1}^{n} p_X(x_i)$ and run sum-product on the bipartite factor graph with $n$ variable nodes and $m$ factor nodes. Each factor-to-variable message is a convolution of $n-1$ independent distributions; by the CLT, in the large-$n$ regime it is approximately Gaussian and is characterised by its mean and variance alone.
Tracking all $2mn$ Gaussian messages is still too expensive. The relaxed-BP simplification replaces each message by its Taylor expansion around the marginal means, which produces an algorithm of complexity $O(mn)$ per iteration. However, relaxed BP retains a bias that grows with the effective correlation between iterations.
AMP is obtained by a further, careful cancellation: the Onsager term removes precisely the bias that relaxed BP would accumulate. The full derivation is algebraic and is spelled out in Bayati–Montanari (2011, App. A) and in Rangan's GAMP paper.
Approximate Message Passing (AMP)
Initialise $x^0 = 0$, $z^0 = y$. For $t = 0, 1, 2, \ldots$ iterate
$x^{t+1} = \eta_t(x^t + A^\top z^t), \qquad z^{t+1} = y - A x^{t+1} + \frac{1}{\delta}\, z^t \, \langle \eta'_t(x^t + A^\top z^t) \rangle,$
where $\langle v \rangle = \frac{1}{n} \sum_{i=1}^{n} v_i$ and $\delta = m/n$.
Complexity: per iteration, one matrix-vector product $A x^{t+1}$ and one product $A^\top z^t$. Total: $O(mn)$ flops, identical to ISTA. The only structural difference from ISTA is the Onsager term $\frac{1}{\delta} z^t \langle \eta'_t \rangle$ in the residual. Drop it and you recover ISTA with gradient step size $1$.
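This update translates nearly line-for-line into NumPy. A minimal sketch, assuming the $\mathcal{N}(0, 1/m)$ normalisation above; the names `eta`, `eta_prime`, and `tau_of` are illustrative placeholders supplied by the caller, not a fixed API:

```python
import numpy as np

def amp(y, A, eta, eta_prime, tau_of, iters=30):
    """Generic AMP under the N(0, 1/m) normalisation.

    eta, eta_prime : componentwise denoiser and its derivative
    tau_of         : callable z -> threshold/noise parameter for this iteration
    """
    m, n = A.shape
    delta = m / n
    x = np.zeros(n)
    z = y.copy()                    # with x = 0 the residual is just y
    for _ in range(iters):
        tau = tau_of(z)
        v = x + A.T @ z             # pseudo-data: ~ x0 + Gaussian noise in the limit
        x = eta(v, tau)
        b = np.mean(eta_prime(v, tau)) / delta   # Onsager coefficient, recomputed each pass
        z = y - A @ x + b * z       # the 'z' on the right is still the previous residual
    return x
```

For L1-AMP one would pass the soft-threshold pair from the worked example below together with, say, `tau_of = lambda z: 2.0 * np.linalg.norm(z) / np.sqrt(len(z))`.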
What the Onsager Term Actually Does
The key observation is that the residual $y - A x^t$ is not white Gaussian noise added to $A(x_0 - x^t)$: it is correlated with $x^t$ through the history of iterates. ISTA ignores this correlation, which is why its per-iteration distribution is not Gaussian and why sharp analytical predictions for ISTA are hard to obtain.
The Onsager term is precisely the first-order correction that removes this correlation. After its inclusion, one can prove (see Section 20.2) that $x^t + A^\top z^t \approx x_0 + \tau_t g$ with $g \sim \mathcal{N}(0, I_n)$, for a deterministic scalar $\tau_t$: the pseudo-data looks like the signal plus independent Gaussian noise. This is the property that justifies calling $\eta_t$ a denoiser.
Common Mistake: Forgetting the Onsager Correction
Mistake:
A common mistake when implementing AMP from scratch is to drop the term $\frac{1}{\delta} z^{t-1} \langle \eta'_{t-1} \rangle$ ("it looks like ISTA anyway"). The resulting algorithm is ISTA in disguise and does not exhibit the sharp phase-transition behaviour that makes AMP theoretically attractive.
Correction:
The Onsager term is the defining feature of AMP, not a bookkeeping detail. It costs one vector scale-and-add per iteration and changes the asymptotic behaviour from loose to tight. Always verify that $\langle \eta'_t \rangle$ is recomputed at every iteration as the empirical mean of $\eta'_t$ on the current pseudo-data $x^t + A^\top z^t$, as in the sketch below.
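The two variants of the residual update, side by side, as a sketch (variable names follow the generic code above and are illustrative):

```python
import numpy as np

def residual_update(y, A, x_new, v, tau, z_prev, eta_prime, delta, onsager=True):
    """One residual update; onsager=False is the 'ISTA in disguise' mistake."""
    z = y - A @ x_new
    if onsager:
        # coefficient recomputed from the CURRENT pseudo-data v, every iteration
        b = np.mean(eta_prime(v, tau)) / delta
        z = z + b * z_prev
    return z
```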
Example: AMP with Soft-Thresholding (L1-AMP)
Specialise AMP to the soft-threshold denoiser $\eta(\,\cdot\,; \tau_t)$ and write out the resulting iteration. Compute $\langle \eta' \rangle$ in closed form.
Denoiser and its derivative
The soft-threshold denoiser is $\eta(v; \tau) = \mathrm{sign}(v) \max(|v| - \tau, 0)$. Its derivative is the indicator function $\eta'(v; \tau) = \mathbf{1}\{|v| > \tau\}$, so $\langle \eta'(v; \tau) \rangle = \frac{1}{n} \#\{i : |v_i| > \tau\}$. This is exactly the fraction of components AMP declares non-zero at the current iteration.
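In code, a minimal sketch (function names are illustrative):

```python
import numpy as np

def soft_threshold(v, tau):
    """eta(v; tau) = sign(v) * max(|v| - tau, 0), applied componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def soft_threshold_prime(v, tau):
    """eta'(v; tau) = 1{|v| > tau}; its empirical mean is the fraction of non-zeros."""
    return (np.abs(v) > tau).astype(float)
```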
L1-AMP iteration
Plugging $\eta(\,\cdot\,; \tau_t)$ into the generic AMP iteration:
$x^{t+1} = \eta(x^t + A^\top z^t; \tau_t), \qquad z^{t+1} = y - A x^{t+1} + \frac{\|x^{t+1}\|_0}{m}\, z^t.$
The Onsager coefficient is simply the sparsity level $\|x^{t+1}\|_0 / n$ of the current estimate, divided by $\delta$.
Tuning the threshold
A common choice is $\tau_t = \alpha\, \hat{\sigma}_t$, where $\hat{\sigma}_t^2 = \|z^t\|_2^2 / m$ is an empirical estimate of the effective noise variance and $\alpha$ is tuned to match the expected sparsity of $x_0$. Section 20.2 shows how state evolution determines the optimal $\alpha$ analytically.
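As a sketch, with `alpha=2.0` an illustrative default rather than a value prescribed by the text:

```python
import numpy as np

def threshold_schedule(z, alpha=2.0):
    """tau_t = alpha * sigma_hat_t, with sigma_hat_t^2 = ||z^t||^2 / m."""
    m = len(z)
    sigma_hat = np.linalg.norm(z) / np.sqrt(m)   # effective noise estimate
    return alpha * sigma_hat
```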
AMP Iteration Trajectory
Run AMP with a soft-threshold denoiser on a synthetic Bernoulli–Gaussian signal and plot the normalised MSE $\|x^t - x_0\|_2^2 / \|x_0\|_2^2$ versus iteration $t$. Compare against ISTA (same update, Onsager term removed). For $\delta$ large enough relative to the sparsity $\rho$, AMP converges in a handful of iterations; ISTA needs hundreds. A runnable sketch follows the parameter list.
Parameters: signal dimension $n$; undersampling ratio $\delta = m/n$; sparsity $\rho$ (fraction of non-zeros); signal-to-noise ratio.
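A minimal end-to-end sketch of this experiment; the parameter values are illustrative stand-ins for the sliders above, and the threshold rule is the $\tau_t = \alpha \hat{\sigma}_t$ schedule from the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta, rho = 2000, 0.5, 0.1        # illustrative stand-ins for the sliders
sigma = 0.01                          # noise level standing in for the SNR slider
m = int(delta * n)

x0 = rng.normal(size=n) * (rng.random(n) < rho)   # Bernoulli-Gaussian signal
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x0 + sigma * rng.normal(size=m)

def soft(v, tau):  return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
def dsoft(v, tau): return (np.abs(v) > tau).astype(float)

def run(onsager, iters=50, alpha=2.0):
    x, z = np.zeros(n), y.copy()
    nmse = []
    for _ in range(iters):
        tau = alpha * np.linalg.norm(z) / np.sqrt(m)
        v = x + A.T @ z
        x = soft(v, tau)
        b = np.mean(dsoft(v, tau)) / delta
        z = y - A @ x + (b * z if onsager else 0.0)   # drop the Onsager term -> ISTA
        nmse.append(np.sum((x - x0) ** 2) / np.sum(x0 ** 2))
    return nmse

print("AMP  NMSE after 20 iterations:", run(True)[19])
print("ISTA NMSE after 20 iterations:", run(False)[19])
```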
Soft-Thresholding Denoiser
Visualise $\eta(v; \tau)$ together with its derivative $\eta'(v; \tau)$. Adjust $\tau$ to see how the dead zone widens and the Onsager coefficient changes accordingly. A plotting sketch follows the parameter list.
Parameters: threshold $\tau$.
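A matplotlib sketch of the same picture; the threshold value is an illustrative stand-in for the slider:

```python
import numpy as np
import matplotlib.pyplot as plt

tau = 1.0                                    # threshold (illustrative)
v = np.linspace(-4.0, 4.0, 401)
eta = np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
eta_prime = (np.abs(v) > tau).astype(float)

fig, ax = plt.subplots()
ax.plot(v, eta, label="eta(v; tau)")
ax.plot(v, eta_prime, label="eta'(v; tau)")
ax.axvspan(-tau, tau, alpha=0.15, label="dead zone")
ax.set_xlabel("v")
ax.legend()
plt.show()
```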
Implementation Notes for AMP
In practice, AMP is startlingly simple to implement — a clean MATLAB/Python version is about 15 lines. The subtle implementation points are:
- Matrix normalisation. Use $A_{ij} \sim \mathcal{N}(0, 1/m)$. If you use a different normalisation, the Onsager coefficient and the state-evolution formulas change accordingly.
- Threshold schedule. Tuning $\tau_t$ matters more than the exact denoiser form. Parameter-free AMP (Montanari 2012) sets $\tau_t = \alpha\, \hat{\sigma}_t$ with $\alpha$ fixed at the minimax-optimal value for equal-power sparse vectors.
- Divergence monitoring. Track the residual energy $\|z^t\|_2^2 / m$ iteration by iteration. If it stops monotonically decreasing, the matrix is not sufficiently i.i.d. and AMP is diverging; consider damping (see the sketch after this list) or switch to OAMP/VAMP (Chapter 21).
- Finite-sample effects. The Gaussianity of the pseudo-data is an asymptotic prediction. For small $n$ expect noticeable deviations; as $n$ grows, the state-evolution prediction tracks the empirical MSE to within 1-2%.
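A sketch of the damping fallback mentioned in the list above; the factor `gamma=0.7` is an illustrative choice, and the names follow the earlier sketches:

```python
import numpy as np

def damped_residual(y, A, x_new, v, tau, z_prev, eta_prime, delta, gamma=0.7):
    """Damped AMP residual: gamma = 1 recovers plain AMP; smaller gamma
    trades convergence speed for robustness when A departs from the
    i.i.d. Gaussian model."""
    b = np.mean(eta_prime(v, tau)) / delta
    z_new = y - A @ x_new + b * z_prev
    return gamma * z_new + (1.0 - gamma) * z_prev
```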
AMP's operating assumptions:
- Dense $A$ with i.i.d. (approximately) Gaussian entries
- Per-iteration cost $O(mn)$
- A Lipschitz, piecewise-differentiable denoiser
Historical Note: TAP Equations and Onsager in Spin Glasses
1936–2009. The Onsager correction term is named after Lars Onsager, who first identified a reaction-field correction in his 1936 work on polar liquids. Thouless, Anderson, and Palmer re-derived an analogous term in 1977 for the Sherrington–Kirkpatrick (SK) spin-glass model, where it is essential for getting the right mean-field equations on dense, random interaction graphs.
Donoho, Maleki, and Montanari (2009) observed that the compressed-sensing problem with i.i.d. Gaussian $A$ has exactly the structure of an SK model — each observation couples to all unknowns with independent random strengths — and imported the TAP construction into signal processing. The name "Approximate Message Passing" comes from the alternative derivation as an approximation to loopy belief propagation on the dense factor graph; the two perspectives give the same algorithm.
Quick Check
What is the primary role of the Onsager term in AMP?
Accelerate convergence by Nesterov-style momentum
Ensure the denoiser input is distributed as the true signal plus independent Gaussian noise
Regularise the estimate to enforce sparsity
Guarantee global convergence even for ill-conditioned $A$
Correct. Without the Onsager term the feedback through $A$ creates correlation between the residual and the iterate, and the denoiser input is no longer Gaussian. Adding $\frac{1}{\delta} z^{t-1} \langle \eta'_{t-1} \rangle$ cancels this correlation to leading order.
Quick Check
What is the per-iteration computational complexity of AMP?
$O(mn)$, identical to ISTA
Higher than ISTA's, because of the Onsager correction
$O(n^3)$ because of a matrix inversion
$O(n \log n)$ using FFT
AMP requires only two matrix-vector products per iteration, the same as ISTA. The Onsager coefficient is a single scalar, negligible in comparison.
Key Takeaway
AMP is ISTA plus one extra term. The extra term — the Onsager correction — is what turns a generic proximal-gradient iteration into an algorithm with a tight, one-dimensional, predictable trajectory in the large-system limit.