VAMP: Vector Approximate Message Passing
VAMP: AMP from a Graph-Theoretic Viewpoint
OAMP was derived by fixing AMP's linear step. VAMP arrives at the same algorithm from the opposite direction: it starts from the factor graph of the estimation problem and writes down the expectation-consistency equations that two local estimators must satisfy to agree on the posterior moments.
Why two derivations of the same algorithm? Because each viewpoint makes a different property transparent. OAMP explains why orthogonality is enforced. VAMP explains what the algorithm is doing: passing messages between a "linear" node (which knows the measurement model $y = Ax + w$) and a "prior" node (which knows $p(x)$). Each node computes a local posterior, matches moments, and sends the result back. The fixed point of this exchange is the OAMP estimate.
This view also makes the generalization to GAMP, learned VAMP, and multi-layer VAMP very natural: you just change the nodes.
Definition: Two-Node Factorization
VAMP represents the posterior as a product of two factors and introduces an auxiliary variable $z$ with the constraint $z = x$. Writing the joint density as

$$p(x, z \mid y) \;\propto\; p(x)\, \delta(z - x)\, p(y \mid z),$$
we get a chain with two factor nodes (prior and likelihood) and one equality constraint. VAMP passes Gaussian messages between these nodes and enforces expectation consistency: the two local posterior beliefs must share the same mean and variance.
The delta function looks contrived but it is the standard trick that turns a monolithic inference problem into a two-node message-passing problem, enabling separate treatment of the prior and the linear observation.
Definition: Expectation Consistency (EC)
Given two candidate approximate posteriors $q_1(x)$ (from the prior node) and $q_2(x)$ (from the likelihood node), both Gaussian with means $\mu_1, \mu_2$ and covariances $\gamma_1^{-1} I, \gamma_2^{-1} I$, expectation consistency requires

$$\mathbb{E}_{q_1}[x] = \mathbb{E}_{q_2}[x], \qquad \operatorname{Cov}_{q_1}[x] = \operatorname{Cov}_{q_2}[x].$$
The combined belief has precision $\gamma = \gamma_1 + \gamma_2$ (the precisions add) and mean $\mu = (\gamma_1 \mu_1 + \gamma_2 \mu_2)/\gamma$; it is the moment-matched target for the next message update.
EC is the principle that links the two derivations: OAMP's orthogonality is the precision-addition rule under the Gaussian approximation. The two "errors", one from each node, are independent by construction because each node ignores the information that will come from the other side.
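To make the precision-addition rule concrete, here is a minimal Python sketch; the function names `combine_beliefs` and `extrinsic` are illustrative, not from any library:

```python
def combine_beliefs(mu1, gamma1, mu2, gamma2):
    """Moment-match two Gaussian beliefs N(mu1, 1/gamma1) and N(mu2, 1/gamma2).

    Under expectation consistency the combined belief has precision
    gamma = gamma1 + gamma2 and a precision-weighted mean."""
    gamma = gamma1 + gamma2
    mu = (gamma1 * mu1 + gamma2 * mu2) / gamma
    return mu, gamma

def extrinsic(mu_post, gamma_post, mu_in, gamma_in):
    """Gaussian message subtraction: remove the incoming belief from the
    posterior so the outgoing message carries only new information."""
    gamma_out = gamma_post - gamma_in
    mu_out = (gamma_post * mu_post - gamma_in * mu_in) / gamma_out
    return mu_out, gamma_out
```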
Definition: VAMP Algorithm
Let $A = U \operatorname{diag}(s) V^\top$ be the SVD of the sensing matrix (used to compute the LMMSE step efficiently). Initialize $r_1^0 = 0$ and $\gamma_1^0 > 0$. For $t = 0, 1, 2, \dots$:

$$
\begin{aligned}
\hat{x}_1^t &= g_1(r_1^t, \gamma_1^t), \qquad \alpha_1^t = \big\langle \partial g_1(r_1^t, \gamma_1^t)/\partial r_1 \big\rangle, \\
r_2^t &= \frac{\hat{x}_1^t - \alpha_1^t r_1^t}{1 - \alpha_1^t}, \qquad \gamma_2^t = \gamma_1^t\,\frac{1 - \alpha_1^t}{\alpha_1^t}, \\
\hat{x}_2^t &= g_2(r_2^t, \gamma_2^t), \qquad \alpha_2^t = \frac{\gamma_2^t}{N}\operatorname{tr}\!\big[(\gamma_w A^\top A + \gamma_2^t I)^{-1}\big], \\
r_1^{t+1} &= \frac{\hat{x}_2^t - \alpha_2^t r_2^t}{1 - \alpha_2^t}, \qquad \gamma_1^{t+1} = \gamma_2^t\,\frac{1 - \alpha_2^t}{\alpha_2^t}.
\end{aligned}
$$
The two local estimators are:
- $g_1(r_1, \gamma_1) = \mathbb{E}\big[x \,\big|\, r_1 = x + \mathcal{N}(0, \gamma_1^{-1} I)\big]$ (prior MMSE denoiser);
- $g_2(r_2, \gamma_2) = \big(\gamma_w A^\top A + \gamma_2 I\big)^{-1}\big(\gamma_w A^\top y + \gamma_2 r_2\big)$ (LMMSE with pseudo-prior $x \sim \mathcal{N}(r_2, \gamma_2^{-1} I)$).
The scalar $\alpha_i^t$ is the average sensitivity (divergence) of the local estimator $g_i$, and the formulas for $r_2^t, \gamma_2^t$ and $r_1^{t+1}, \gamma_1^{t+1}$ implement the Gaussian message-subtraction step that makes the next input extrinsic.
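A minimal NumPy sketch of this iteration, assuming a dense $A$ and a user-supplied `denoiser(r, gamma)` callback (both the callback signature and the initialization are illustrative choices); a production implementation would reuse the SVD instead of forming an explicit inverse:

```python
import numpy as np

def vamp(y, A, gamma_w, denoiser, n_iter=50, gamma1=1e-3):
    """Minimal dense-matrix VAMP sketch.

    denoiser(r, gamma) must return (xhat, alpha): the MMSE estimate under
    the pseudo-measurement r = x + N(0, I/gamma) and its average
    sensitivity alpha = <d xhat / d r>."""
    N = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    r1 = np.zeros(N)
    for _ in range(n_iter):
        # Denoising step at the prior node
        xhat1, alpha1 = denoiser(r1, gamma1)
        # Gaussian message subtraction: extrinsic message to the LMMSE node
        r2 = (xhat1 - alpha1 * r1) / (1 - alpha1)
        gamma2 = gamma1 * (1 - alpha1) / alpha1
        # LMMSE step at the likelihood node
        C = np.linalg.inv(gamma_w * AtA + gamma2 * np.eye(N))
        xhat2 = C @ (gamma_w * Aty + gamma2 * r2)
        alpha2 = gamma2 * np.trace(C) / N
        # Message subtraction back to the prior node
        r1 = (xhat2 - alpha2 * r2) / (1 - alpha2)
        gamma1 = gamma2 * (1 - alpha2) / alpha2
    return xhat1
```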
Theorem: VAMP State Evolution
For a right-rotationally-invariant sensing matrix $A$ whose spectrum of $A^\top A$ converges to a limiting distribution $\lambda$, a separable prior with finite variance, and a Lipschitz-continuous denoiser $g_1$, the VAMP iterates satisfy, in the large-system limit,

$$ r_1^t \;\overset{d}{=}\; x + \mathcal{N}(0, \tau_1^t I), \qquad r_2^t \;\overset{d}{=}\; x + \mathcal{N}(0, \tau_2^t I), $$

with scalar variances $(\tau_1^t, \tau_2^t)$ evolving according to a deterministic recursion depending only on the spectrum $\lambda$, the noise level $\sigma^2$, and the denoiser MSE curve $\mathcal{E}_1(\tau) = \mathbb{E}\big[(g_1(x + \sqrt{\tau}\,Z, \tau^{-1}) - x)^2\big]$. The fixed point of this recursion coincides with the replica-predicted Bayes-optimal MMSE when $g_1$ is the posterior-mean denoiser.
State evolution for VAMP is again a one-dimensional recursion; that is the whole point. The RRI assumption gives us the rotational symmetry needed to reduce a high-dimensional iterative algorithm to a scalar fixed-point equation, which then predicts the algorithm's performance without running it.
Reduce to scalar coordinates via SVD
Express every iterate in the SVD basis of $A$. Because $V$ is Haar-distributed, the prior-node messages arrive in a uniformly random basis relative to the likelihood-node coordinates; the problem decouples into independent scalar channels along the singular directions.
Track precision propagation
At each stage, the outgoing precision is $\gamma_{\text{out}} = \gamma_{\text{post}} - \gamma_{\text{in}}$, which is the Gaussian-message subtraction formula. Summing contributions across singular values gives the scalar MSE recursion.
Close the loop with the denoiser
The denoiser MSE curve $\mathcal{E}_1(\tau)$ closes the fixed-point equation. Lipschitz continuity of $g_1$ plus the Haar property of $V$ are precisely the hypotheses needed for rigorous convergence of the empirical trajectory to the state-evolution prediction.
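The matched (MMSE-denoiser) form of the recursion is compact enough to sketch; `mse_curve` stands in for the denoiser MSE curve $\mathcal{E}_1(\tau)$, typically estimated by Monte Carlo, and the function name is illustrative:

```python
import numpy as np

def vamp_state_evolution(lam, gamma_w, mse_curve, n_iter=50, gamma1=1e-3):
    """Scalar state evolution for MMSE-matched VAMP (a sketch).

    lam       -- eigenvalues of A^T A, zeros included when N > M
    mse_curve -- function tau -> E[(g1(x + sqrt(tau)*Z, 1/tau) - x)^2]
    Returns the predicted denoiser-output MSE at each iteration."""
    mses = []
    for _ in range(n_iter):
        # Prior node: posterior precision is 1/MSE; subtract the input precision
        eps1 = mse_curve(1.0 / gamma1)
        gamma2 = 1.0 / eps1 - gamma1
        # Likelihood node: per-component LMMSE error averaged over the spectrum
        eps2 = np.mean(1.0 / (gamma_w * lam + gamma2))
        gamma1 = 1.0 / eps2 - gamma2
        mses.append(eps1)
    return np.array(mses)
```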
Example: VAMP as Message Passing on a Scalar Channel
Take the simplest scalar case: $N = M = 1$, $A = 1$, $x \sim \mathcal{N}(0, 1)$, $w \sim \mathcal{N}(0, \sigma^2)$. Show that VAMP reduces to the standard MMSE scalar denoiser and that its two-step message passing is exactly Bayes' rule applied twice.
Identify the two factors
The prior node carries $p(x) = \mathcal{N}(x; 0, 1)$, and the likelihood node carries $p(y \mid x) = \mathcal{N}(y; x, \sigma^2)$. Expectation consistency requires matching the two local posteriors' mean and variance, which in this one-dimensional case gives Bayes' rule exactly: $p(x \mid y) = \mathcal{N}\!\big(x;\, \tfrac{y}{1 + \sigma^2},\, \tfrac{\sigma^2}{1 + \sigma^2}\big)$.
Run the VAMP update
Starting from an uninformative message ($r_1^0 = 0$, $\gamma_1^0 \to 0$), the denoiser produces the prior mean $\hat{x}_1 = 0$ with posterior precision $1$, the precision-subtraction gives the extrinsic message $\mathcal{N}(0, 1)$ (the prior itself), and the LMMSE node combines this with the scalar measurement to give $\hat{x}_2 = \frac{y/\sigma^2}{1/\sigma^2 + 1} = \frac{y}{1 + \sigma^2}$.
Verify the fixed point
At convergence the two posteriors agree, so $\hat{x}_1 = \hat{x}_2 = \frac{y}{1 + \sigma^2}$. In this scalar problem convergence is immediate and the fixed point is the MMSE estimate. The two-node dance is unnecessary here, but it shows that VAMP is a consistent Bayes engine that reduces correctly in the easy case.
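A few lines of NumPy confirm the scalar fixed point; the numerical values are arbitrary:

```python
import numpy as np

sigma2, y = 0.5, 1.3                       # arbitrary noise variance and observation
# Prior node: with an uninformative input, the extrinsic message is the prior N(0, 1)
r2, gamma2, gamma_w = 0.0, 1.0, 1.0 / sigma2
# LMMSE node: combine the pseudo-prior with the measurement y = x + w
xhat = (gamma_w * y + gamma2 * r2) / (gamma_w + gamma2)
assert np.isclose(xhat, y / (1 + sigma2))  # the scalar MMSE estimate
```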
VAMP State Evolution vs Empirical MSE
Run VAMP on a synthetic right-rotationally-invariant sensing matrix with a Bernoulli-Gaussian prior and compare the empirical MSE trajectory to the scalar state-evolution prediction. The two curves should overlay; this is the headline guarantee of the chapter.
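A sketch of this experiment, reusing the `vamp` and `vamp_state_evolution` sketches from earlier in this section; all numerical choices below (sizes, sparsity, noise level, singular-value range) are illustrative rather than prescribed:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, rho, sigma2 = 512, 256, 0.1, 1e-2     # illustrative problem parameters

# Right-rotationally-invariant A: Haar U, V via QR, arbitrary singular values
s = rng.uniform(0.5, 1.5, M)
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = U @ np.hstack([np.diag(s), np.zeros((M, N - M))]) @ V.T

# Bernoulli-Gaussian signal and noisy measurements
x = rng.standard_normal(N) * (rng.random(N) < rho)
y = A @ x + np.sqrt(sigma2) * rng.standard_normal(M)

def bg_denoiser(r, gamma):
    """MMSE denoiser for the prior rho*N(0,1) + (1-rho)*delta_0;
    the sensitivity alpha is computed by central differences."""
    def xhat(r):
        v = 1.0 / gamma
        on = rho / np.sqrt(1 + v) * np.exp(-0.5 * r**2 / (1 + v))
        off = (1 - rho) / np.sqrt(v) * np.exp(-0.5 * r**2 / v)
        return on / (on + off) * r / (1 + v)
    eps = 1e-6
    alpha = np.mean((xhat(r + eps) - xhat(r - eps)) / (2 * eps))
    return xhat(r), alpha

def mse_curve(tau, n=200_000):
    """Monte Carlo estimate of the denoiser MSE on a scalar Gaussian channel."""
    xs = rng.standard_normal(n) * (rng.random(n) < rho)
    xh, _ = bg_denoiser(xs + np.sqrt(tau) * rng.standard_normal(n), 1.0 / tau)
    return np.mean((xh - xs) ** 2)

lam = np.concatenate([s**2, np.zeros(N - M)])   # spectrum of A^T A
mse_se = vamp_state_evolution(lam, 1.0 / sigma2, mse_curve)
xhat = vamp(y, A, 1.0 / sigma2, bg_denoiser)
# Tracking the per-iteration MSE inside vamp() yields the full overlay plot;
# here we just compare the fixed points.
print("empirical MSE :", np.mean((xhat - x) ** 2))
print("SE prediction :", mse_se[-1])
```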
OAMP and VAMP Are the Same Algorithm
Up to rescaling conventions, OAMP and VAMP produce identical iterates. OAMP derives the LMMSE filter from orthogonality; VAMP derives it from expectation consistency under a Gaussian approximation. The two derivations terminate at the same fixed-point equations because orthogonality of two zero-mean Gaussian errors is equivalent to additivity of their precisions.
The practical consequence: any implementation that computes the LMMSE solve $(\gamma_w A^\top A + \gamma_2 I)^{-1}(\gamma_w A^\top y + \gamma_2 r_2)$ efficiently is simultaneously an OAMP and a VAMP implementation. Choose the derivation that is more convenient for the extension you want to build: orthogonality for physics-motivated analysis, or message passing for graph-structured models and learned variants.
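As an illustration of what "efficiently" can mean here: with a precomputed economy SVD, the LMMSE solve diagonalizes in the right singular basis and no matrix inverse is ever formed (the function below is a sketch, not a library API):

```python
import numpy as np

def lmmse_svd(y, U, s, Vm, gamma_w, gamma2, r2):
    """LMMSE step using a precomputed economy SVD A = U @ np.diag(s) @ Vm.T,
    with Vm of shape (N, M). Each call costs O(MN)."""
    # q = gamma_w * A^T y + gamma2 * r2, the right-hand side of the solve
    q = gamma_w * (Vm * s) @ (U.T @ y) + gamma2 * r2
    d = 1.0 / (gamma_w * s**2 + gamma2)   # resolvent eigenvalues on the row space
    proj = Vm.T @ q
    # Row-space part gets the diagonal resolvent; null-space part gets 1/gamma2
    xhat2 = Vm @ (d * proj) + (q - Vm @ proj) / gamma2
    N = Vm.shape[0]
    alpha2 = gamma2 * (np.sum(d) + (N - len(s)) / gamma2) / N
    return xhat2, alpha2
```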
Damping and Numerical Stability
Even though VAMP is provably convergent under the RRI assumption, finite-$N$ implementations can oscillate, particularly when the LMMSE precision $\gamma_2$ becomes very small (flat directions of $A$) or very large (well-observed directions). Standard practice is to damp the precision updates:
$$\gamma^{t+1} \leftarrow \rho\,\gamma_{\text{new}} + (1 - \rho)\,\gamma^{t}$$

with $\rho \in (0, 1]$. One also clamps $\gamma$ from below by a small positive floor $\gamma_{\min}$ to avoid division blowup when the denoiser is nearly constant. These tweaks do not change the fixed point but vastly improve the transient behavior on finite-dimensional instances; a short code sketch follows the list below.
- Precision clamping: $\gamma \ge \gamma_{\min} > 0$
- Damping factor: $\rho \in (0, 1]$
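A sketch of the damped, clamped update with illustrative constants:

```python
GAMMA_MIN = 1e-8   # illustrative precision floor
RHO = 0.7          # illustrative damping factor

def damp_precision(gamma_new, gamma_old, rho=RHO, gamma_min=GAMMA_MIN):
    """Damped, clamped precision update: a convex combination of the raw
    update and the previous value, floored away from zero."""
    return max(rho * gamma_new + (1.0 - rho) * gamma_old, gamma_min)
```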
AMP vs OAMP vs VAMP
| Property | AMP | OAMP / VAMP |
|---|---|---|
| Linear step | Matched filter | LMMSE |
| Per-iter cost (dense $A$) | $O(MN)$ | $O(N^3)$, or $O(MN)$ via SVD |
| Matrix class (provable) | i.i.d. sub-Gaussian | Right-rotationally invariant |
| State evolution | Valid (i.i.d. case) | Valid (RRI case) |
| Requires SVD / inverse | No | Yes (or iterative solve) |
| Bayes-optimal fixed point | Yes (if SE converges) | Yes (if SE converges) |
| Onsager correction | Explicit term | Absorbed into normalization |
| Typical imaging use | Demos on i.i.d. Gaussian | Structured / physical |
Common Mistake: Assuming Any Structured Matrix Is RRI
Mistake:
Assuming that because VAMP works for partial-DFT and random-unitary sensing, it will work for any structured matrix, including, say, near-field imaging operators with strong spatial correlations.
Correction:
RRI is a specific statistical assumption on the right singular basis. Imaging operators constructed from physical propagation typically have highly structured right singular vectors that are not Haar-distributed. On such operators, VAMP's state evolution is no longer exact, and empirical MSE can deviate noticeably from the predicted curve. When this happens, either (a) apply a random unitary pre-rotation to make the effective operator closer to RRI, or (b) use multi-layer VAMP / learned VAMP, which tolerates structured mismatches.
VAMP (Vector AMP)
A message-passing algorithm for linear inverse problems that alternates between a denoising step (prior node) and an LMMSE step (likelihood node), enforcing expectation consistency between the two local beliefs. Equivalent to OAMP and provably correct for the right-rotationally-invariant class of sensing matrices.
Related: OAMP (Orthogonal AMP), Expectation consistency
Expectation consistency
A principle for approximate inference in factor graphs that requires all local Gaussian beliefs on a shared variable to agree on mean and variance. When satisfied, the resulting fixed point is a stationary point of a free-energy functional and, under the right symmetry assumptions, coincides with the Bayes-optimal posterior.
Related: VAMP (Vector AMP)
Historical Note: Expectation Consistency and the VAMP Genealogy
2005-2017: Opper and Winther introduced expectation consistency as a refinement of expectation propagation around 2005. Manfred Opper observed that the precision of a belief is additive across messages in a Gaussian approximation, a consequence of the entropy functional being quadratic in the sufficient statistics.
The Schniter-Rangan-Fletcher team's 2017 VAMP paper took this principle and combined it with the AMP framework, producing an algorithm that was immediately recognized as equivalent to Ma and Ping's OAMP. Sundeep Rangan later remarked that "the two derivations don't look alike, but the iteration is the same to the last trigonometric identity", a common phenomenon in message passing where multiple paths lead to the same fixed-point equations.
Quick Check
In VAMP, what role do the auxiliary variable $z$ and the constraint $z = x$ play?
It introduces additional measurements to over-determine the system.
It splits the inference problem into two local subproblems that exchange Gaussian messages.
It converts a non-convex problem into a convex one.
It enforces sparsity on $z$ to match the signal's prior.
The delta-function constraint decouples the prior from the likelihood, allowing each node to compute a local posterior. Expectation consistency then reconciles the two beliefs.