VAMP: Vector Approximate Message Passing

VAMP: AMP from a Graph-Theoretic Viewpoint

OAMP was derived by fixing AMP's linear step. VAMP arrives at the same algorithm from the opposite direction: it starts from the factor graph of the estimation problem and writes down the expectation-consistency equations that two local estimators must satisfy to agree on the posterior moments.

Why two derivations of the same algorithm? Because each viewpoint makes a different property transparent. OAMP explains why orthogonality is enforced. VAMP explains what the algorithm is doing: passing messages between a "linear" node (which knows the measurement model $\ntn{obs} = \mathbf{A}\mathbf{x} + \mathbf{w}$) and a "prior" node (which knows $p_X$). Each node computes a local posterior, matches moments, and sends the result back. The fixed point of this exchange is the OAMP estimate.

This view also makes the generalization to GAMP, learned VAMP, and multi-layer VAMP very natural: you just change the nodes.

Definition: Two-Node Factorization

VAMP represents the posterior $p(\mathbf{x}|\ntn{obs}) \propto p_X(\mathbf{x})\,p(\ntn{obs}|\mathbf{x})$ as a product of two factors and introduces an auxiliary variable $\mathbf{x}_2$ with the constraint $\mathbf{x}_1 = \mathbf{x}_2$. Writing the joint density as

$$p(\mathbf{x}_1,\mathbf{x}_2|\ntn{obs}) \propto p_X(\mathbf{x}_1) \, \delta(\mathbf{x}_1 - \mathbf{x}_2)\, p(\ntn{obs}|\mathbf{x}_2),$$

we get a chain with two factor nodes (prior and likelihood) and one equality constraint. VAMP passes Gaussian messages between these nodes and enforces expectation consistency: the two local posterior beliefs must share the same mean and variance.

The delta function looks contrived, but it is the standard trick that turns a monolithic inference problem into a two-node message-passing problem, enabling separate treatment of the prior and the linear observation.

Definition: Expectation Consistency (EC)

Given two candidate approximate posteriors $q_1(\mathbf{x})$ (from the prior node) and $q_2(\mathbf{x})$ (from the likelihood node), both Gaussian with means $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and covariances $\gamma_1^{-1}\mathbf{I}, \gamma_2^{-1}\mathbf{I}$, expectation consistency requires

$$\mathbb{E}_{q_1}[\mathbf{x}] = \mathbb{E}_{q_2}[\mathbf{x}], \quad \mathbb{E}_{q_1}[\|\mathbf{x}\|^2] = \mathbb{E}_{q_2}[\|\mathbf{x}\|^2].$$

The combined belief $q(\mathbf{x}) \propto q_1(\mathbf{x})\,q_2(\mathbf{x})/\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\gamma^{-1}\mathbf{I})$ has precision $\gamma_1 + \gamma_2$ (the precisions add) and is the moment-matched target for the next message update.

EC is the principle that links the two derivations: OAMP's orthogonality is the precision-addition rule under the Gaussian approximation. The two "errors", one from each node, are independent by construction because each node ignores the information that will come from the other side.
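As a small sanity check on the precision-addition rule, the NumPy snippet below multiplies two isotropic Gaussian beliefs and verifies numerically that the product Gaussian has precision $\gamma_1 + \gamma_2$ and a precision-weighted mean. This is only an illustration; the numbers are arbitrary and not tied to any problem instance.

```python
import numpy as np

# Two one-dimensional Gaussian beliefs N(mu1, 1/gamma1) and N(mu2, 1/gamma2).
mu1, gamma1 = 0.8, 2.0     # belief from the prior node (illustrative numbers)
mu2, gamma2 = 1.1, 5.0     # belief from the likelihood node

# Closed form for the product: precisions add, mean is precision-weighted.
gamma = gamma1 + gamma2
mu = (gamma1 * mu1 + gamma2 * mu2) / gamma
print(f"combined belief: mean {mu:.4f}, precision {gamma:.2f}")

# Brute-force check on a grid: pointwise product of the two densities,
# renormalized, has the same mean and precision.
x = np.linspace(-5.0, 5.0, 200_001)
p = np.exp(-0.5 * gamma1 * (x - mu1) ** 2) * np.exp(-0.5 * gamma2 * (x - mu2) ** 2)
p /= p.sum()
mean = np.sum(x * p)
var = np.sum((x - mean) ** 2 * p)
print(f"grid check:      mean {mean:.4f}, precision {1.0 / var:.2f}")
```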

Definition: VAMP Algorithm

Let $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$ be the SVD of the sensing matrix (used to implement the LMMSE step efficiently). Initialize $\mathbf{r}_1^{(0)} = \mathbf{0}$, $\gamma_1^{(0)} = 1/\mathrm{Var}(X)$. For $t=0,1,\ldots$:

$$
\begin{aligned}
&\text{(denoiser)} & & \hat{\mathbf{x}}_1^{(t)} = g_1(\mathbf{r}_1^{(t)},\gamma_1^{(t)}), \quad \alpha_1^{(t)} = \langle g_1'(\mathbf{r}_1^{(t)},\gamma_1^{(t)}) \rangle \\
& & & \gamma_2^{(t)} = \gamma_1^{(t)}\left(\frac{1}{\alpha_1^{(t)}} - 1\right), \quad \mathbf{r}_2^{(t)} = \frac{\hat{\mathbf{x}}_1^{(t)} - \alpha_1^{(t)}\mathbf{r}_1^{(t)}}{1 - \alpha_1^{(t)}} \\
&\text{(LMMSE)} & & \hat{\mathbf{x}}_2^{(t)} = g_2(\mathbf{r}_2^{(t)},\gamma_2^{(t)}), \quad \alpha_2^{(t)} = \langle g_2'(\mathbf{r}_2^{(t)},\gamma_2^{(t)}) \rangle \\
& & & \gamma_1^{(t+1)} = \gamma_2^{(t)}\left(\frac{1}{\alpha_2^{(t)}} - 1\right), \quad \mathbf{r}_1^{(t+1)} = \frac{\hat{\mathbf{x}}_2^{(t)} - \alpha_2^{(t)}\mathbf{r}_2^{(t)}}{1 - \alpha_2^{(t)}}
\end{aligned}
$$

The two local estimators are:

  • $g_1(\mathbf{r},\gamma) = \mathbb{E}[\mathbf{x} \,|\, \mathbf{x} + \mathcal{N}(\mathbf{0},\gamma^{-1}\mathbf{I}) = \mathbf{r}]$ (prior MMSE denoiser);
  • $g_2(\mathbf{r},\gamma) = (\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2 + \gamma \mathbf{I})^{-1}(\mathbf{A}^{\mathsf{H}}\ntn{obs}/\sigma^2 + \gamma \mathbf{r})$ (LMMSE with pseudo-prior $\mathcal{N}(\mathbf{r},\gamma^{-1}\mathbf{I})$).

The scalar $\alpha_i = \langle g_i' \rangle$ is the average sensitivity of the local estimator, and the formulas for $(\mathbf{r}_j,\gamma_j)$ implement the Gaussian message-subtraction step that makes the next input extrinsic.
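The loop below is a minimal NumPy sketch of this recursion, assuming a user-supplied separable denoiser `denoise(r, gamma)` that returns the estimate and its average sensitivity. The helper names, the direct-solve LMMSE step, and the small clamping of $\alpha_i$ are illustrative choices for this sketch, not a reference implementation; the per-iteration LMMSE estimates are recorded so they can be compared against state evolution later in the section.

```python
import numpy as np

def lmmse(r, gamma, A, y, sigma2):
    """g2: LMMSE estimate under the pseudo-prior N(r, gamma^{-1} I), by direct solve."""
    N = A.shape[1]
    H = A.conj().T @ A / sigma2 + gamma * np.eye(N)
    xhat = np.linalg.solve(H, A.conj().T @ y / sigma2 + gamma * r)
    alpha = gamma * np.trace(np.linalg.inv(H)).real / N   # <g2'> = (gamma/N) tr(H^{-1})
    return xhat, alpha

def vamp(y, A, sigma2, denoise, var_x, n_iter=20, eps=1e-6):
    """Minimal VAMP loop following the two half-steps in the definition above."""
    N = A.shape[1]
    r1, gamma1 = np.zeros(N), 1.0 / var_x
    trajectory = []                                  # LMMSE estimates, one per iteration
    for _ in range(n_iter):
        # denoiser half-step
        xhat1, alpha1 = denoise(r1, gamma1)
        alpha1 = np.clip(alpha1, eps, 1.0 - eps)
        gamma2 = gamma1 * (1.0 / alpha1 - 1.0)
        r2 = (xhat1 - alpha1 * r1) / (1.0 - alpha1)
        # LMMSE half-step
        xhat2, alpha2 = lmmse(r2, gamma2, A, y, sigma2)
        alpha2 = np.clip(alpha2, eps, 1.0 - eps)
        gamma1 = gamma2 * (1.0 / alpha2 - 1.0)
        r1 = (xhat2 - alpha2 * r2) / (1.0 - alpha2)
        trajectory.append(xhat2)
    return xhat2, trajectory
```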

Theorem: VAMP State Evolution

For right-rotationally-invariant $\mathbf{A}$ with limiting spectrum $\mu$, separable prior $p_X$ with finite variance, and Lipschitz-continuous denoiser $g_1$, the VAMP iterates satisfy, in the large-system limit,

$$\mathbf{r}_i^{(t)} = \mathbf{x} + \mathcal{N}(\mathbf{0},\,\tau_i^{(t)}\mathbf{I}), \qquad i \in \{1,2\},$$

with scalar variances $\tau_i^{(t)}$ evolving according to a deterministic recursion depending only on the spectrum $\mu$, the noise level $\sigma^2$, and the denoiser MSE curve. The fixed point of this recursion coincides with the replica-predicted Bayes-optimal MMSE when $g_1$ is the posterior-mean denoiser.

State evolution for VAMP is again a one-dimensional recursion; that is the whole point. The RRI assumption gives us the rotational symmetry needed to reduce a high-dimensional iterative algorithm to a scalar fixed-point equation, which then predicts the algorithm's performance without running it.
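For concreteness, here is a sketch of the matched form of that recursion (assuming $g_1$ is the posterior-mean denoiser, so its sensitivity equals the normalized posterior variance) for a Bernoulli-Gaussian prior. The closed-form denoiser, the Monte Carlo estimate of its MSE, and the function names are illustrative choices for this sketch: the recursion alternates precision subtraction at the denoiser with a spectral average that gives the LMMSE error.

```python
import numpy as np

def bg_posterior(r, tau, rho=0.15, vx=1.0):
    """Posterior mean/variance of x ~ rho*N(0,vx) + (1-rho)*delta_0 from r = x + N(0, tau)."""
    s = vx + tau
    on = rho * np.exp(-0.5 * r**2 / s) / np.sqrt(s)           # evidence of the 'on' component
    off = (1.0 - rho) * np.exp(-0.5 * r**2 / tau) / np.sqrt(tau)
    pi = on / np.maximum(on + off, 1e-300)                    # responsibility of the 'on' component
    m, v = vx / s * r, vx * tau / s                           # 'on' posterior mean and variance
    xhat = pi * m
    return xhat, pi * (v + m**2) - xhat**2

def denoiser_mse(tau, rho=0.15, vx=1.0, n_mc=200_000, seed=0):
    """Monte Carlo estimate of the scalar MMSE of the Bernoulli-Gaussian denoiser."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(vx), n_mc) * (rng.random(n_mc) < rho)
    xhat, _ = bg_posterior(x + rng.normal(0.0, np.sqrt(tau), n_mc), tau, rho, vx)
    return np.mean((xhat - x) ** 2)

def vamp_se(sv_squared, N, sigma2, tau1, n_iter=15, rho=0.15, vx=1.0):
    """Scalar state evolution: denoiser MSE -> extrinsic precision -> LMMSE error -> repeat."""
    lam = np.zeros(N)
    lam[:len(sv_squared)] = sv_squared                        # eigenvalues of A^H A, zero-padded
    mse_pred = []
    for _ in range(n_iter):
        e1 = denoiser_mse(tau1, rho, vx)
        gamma2 = 1.0 / e1 - 1.0 / tau1                        # precision subtraction (extrinsic)
        e2 = np.mean(1.0 / (lam / sigma2 + gamma2))           # LMMSE error from the spectrum
        tau1 = 1.0 / (1.0 / e2 - gamma2)                      # back to the denoiser channel
        mse_pred.append(e2)
    return mse_pred
```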

Example: VAMP as Message Passing on a Scalar Channel

Take the simplest scalar case: $N = M = 1$, $y = x + w$, $w \sim \mathcal{N}(0,\sigma^2)$, $x \sim p_X$. Show that VAMP reduces to the standard MMSE scalar denoiser and that its two-step message passing is exactly Bayes' rule applied twice.
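One way to check the claim numerically is to take a Gaussian prior $x \sim \mathcal{N}(0, v)$, where the exact posterior mean is $v\,y/(v+\sigma^2)$. The short sketch below (illustrative variable names, not part of the exercise statement) runs the two half-steps and confirms that they land on that value.

```python
import numpy as np

rng = np.random.default_rng(0)
v, sigma2 = 2.0, 0.5                      # prior variance and noise variance (illustrative)
x = rng.normal(0.0, np.sqrt(v))
y = x + rng.normal(0.0, np.sqrt(sigma2))

def g1(r, gamma):
    """MMSE denoiser for x ~ N(0, v) observed through N(0, 1/gamma) noise."""
    shrink = v / (v + 1.0 / gamma)
    return shrink * r, shrink             # alpha1 = dg1/dr = shrink

def g2(r, gamma):
    """Scalar LMMSE with pseudo-prior N(r, 1/gamma)."""
    post_var = 1.0 / (1.0 / sigma2 + gamma)
    return post_var * (y / sigma2 + gamma * r), post_var * gamma   # alpha2

r1, gamma1 = 0.0, 1.0 / v
for _ in range(5):
    x1, a1 = g1(r1, gamma1)
    gamma2, r2 = gamma1 * (1 / a1 - 1), (x1 - a1 * r1) / (1 - a1)
    x2, a2 = g2(r2, gamma2)
    gamma1, r1 = gamma2 * (1 / a2 - 1), (x2 - a2 * r2) / (1 - a2)

print(x2, v * y / (v + sigma2))           # VAMP estimate vs exact posterior mean: equal
```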

VAMP State Evolution vs Empirical MSE

Run VAMP on a synthetic right-rotationally-invariant sensing matrix with a Bernoulli-Gaussian prior and compare the empirical MSE trajectory to the scalar state-evolution prediction. The two curves should overlay; this is the headline guarantee of the chapter.

Parameters (from the interactive demo): 800, 0.5, 0.15, 25, 15
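A possible end-to-end comparison is sketched below, reusing the hypothetical `vamp`, `bg_posterior`, and `vamp_se` helpers from the earlier sketches. The Haar-random factors, the geometric singular-value profile, the noise level, and the sizes (chosen to loosely match the parameter list above) are illustrative stand-ins for the demo's actual settings.

```python
import numpy as np
# Reuses vamp(), bg_posterior(), and vamp_se() from the sketches above.

def haar(n, rng):
    """Haar-distributed orthogonal matrix via QR with a sign fix."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(1)
N, M, rho, vx, sigma2, n_iter = 800, 400, 0.15, 1.0, 1e-3, 15

# Right-rotationally-invariant operator: Haar singular bases, chosen spectrum.
s = np.logspace(0.0, -0.5, M)                       # mild condition number
A = haar(M, rng) @ np.diag(s) @ haar(N, rng)[:, :M].T

x = rng.normal(0.0, np.sqrt(vx), N) * (rng.random(N) < rho)
y = A @ x + rng.normal(0.0, np.sqrt(sigma2), M)

def denoise(r, gamma):                               # wraps the Bernoulli-Gaussian denoiser
    xhat, post_var = bg_posterior(r, 1.0 / gamma, rho, vx)
    return xhat, gamma * np.mean(post_var)           # alpha1 = gamma * average posterior variance

_, traj = vamp(y, A, sigma2, denoise, var_x=rho * vx, n_iter=n_iter)
mse_emp = [np.mean((xh - x) ** 2) for xh in traj]
mse_se = vamp_se(s**2, N, sigma2, tau1=rho * vx, n_iter=n_iter, rho=rho, vx=vx)

for t, (a, b) in enumerate(zip(mse_emp, mse_se)):
    print(f"iter {t:2d}   empirical {a:.3e}   state evolution {b:.3e}")
```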

OAMP and VAMP Are the Same Algorithm

Up to rescaling conventions, OAMP and VAMP produce identical iterates. OAMP derives the LMMSE filter from orthogonality; VAMP derives it from expectation consistency under a Gaussian approximation. The two derivations arrive at the same fixed-point equations because orthogonality of two zero-mean Gaussian errors is equivalent to additivity of their precisions.

The practical consequence: any implementation that computes $(\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2 + \gamma \mathbf{I})^{-1}$ efficiently is simultaneously an OAMP and a VAMP implementation. Choose the derivation that is more convenient for the extension you want to build: orthogonality for physics-motivated analysis, or message passing for graph-structured models and learned variants.
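When the SVD $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$ is precomputed, the inverse above reduces to a diagonal rescaling in the right singular basis, which is where the "$O(MN)$ via SVD" per-iteration cost in the comparison table below comes from. The helper here is an illustrative sketch of that step, following NumPy's economy-SVD layout; it is not taken from any particular implementation.

```python
import numpy as np

def lmmse_svd(r, gamma, U, s, Vh, y, sigma2):
    """g2 via a precomputed economy SVD A = U @ diag(s) @ Vh, with K = min(M, N).
    The matrix inverse becomes a diagonal rescaling in the right singular basis,
    so each call costs O((M + N) K) instead of O(N^3)."""
    N, K = Vh.shape[1], len(s)
    b = (Vh.conj().T @ (s * (U.conj().T @ y))) / sigma2 + gamma * r   # A^H y / sigma2 + gamma r
    c = Vh @ b                                                        # coordinates along the right singular vectors
    d = 1.0 / (s**2 / sigma2 + gamma)                                 # inverse spectrum on that subspace
    xhat = Vh.conj().T @ (d * c) + (b - Vh.conj().T @ c) / gamma      # null directions only see gamma
    alpha = gamma * (d.sum() + (N - K) / gamma) / N                   # <g2'> = (gamma/N) tr of the inverse
    return xhat, alpha
```

With `U, s, Vh = np.linalg.svd(A, full_matrices=False)` computed once up front, this can stand in for the direct-solve `lmmse` used in the loop sketched earlier.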

⚠️ Engineering Note: Damping and Numerical Stability

Even though VAMP is provably convergent under the RRI assumption, finite-$N$ implementations can oscillate, particularly when the LMMSE precision $\gamma_2$ becomes very small (flat directions of $\mathbf{A}$) or very large (well-observed directions). Standard practice is to damp the precision updates:

$$\gamma^{(t+1)} \leftarrow (1-\beta)\,\gamma^{(t)} + \beta\,\gamma^{(t+1)}_{\text{new}},$$

with $\beta \in [0.5, 0.9]$. One also clamps $\alpha_i$ to $[\epsilon, 1-\epsilon]$ with a small $\epsilon$ (e.g., $10^{-6}$) to avoid division blow-ups when the denoiser is nearly constant ($\alpha_i \to 0$) or nearly the identity ($\alpha_i \to 1$). These tweaks do not change the fixed point but vastly improve the transient behavior on finite-dimensional instances.

Practical Constraints

  • Sensitivity clamping: $\alpha_i \in [\epsilon, 1-\epsilon]$
  • Damping factor: $\beta \in [0.5, 0.9]$
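A minimal sketch of these two safeguards as helper functions (the names are illustrative); inside the loop sketched earlier, the raw precision update would be passed through `damped` and the sensitivities through `safe_alpha`.

```python
import numpy as np

def damped(gamma_old, gamma_new, beta=0.7):
    """Damped precision update: blend the new precision with the previous iterate."""
    return (1.0 - beta) * gamma_old + beta * gamma_new

def safe_alpha(alpha, eps=1e-6):
    """Clamp the average sensitivity away from 0 and 1 to avoid division blow-ups."""
    return float(np.clip(alpha, eps, 1.0 - eps))
```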

AMP vs OAMP vs VAMP

| Property | AMP | OAMP / VAMP |
| --- | --- | --- |
| Linear step | Matched filter $\mathbf{A}^{\mathsf{H}}$ | LMMSE $(\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2+\gamma\mathbf{I})^{-1}$ |
| Per-iteration cost (dense $\mathbf{A}$) | $O(MN)$ | $O(N^3)$, or $O(MN)$ via SVD |
| Matrix class (provable) | i.i.d. sub-Gaussian | Right-rotationally invariant |
| State evolution | Valid (i.i.d. case) | Valid (RRI case) |
| Requires SVD / inverse | No | Yes (or iterative solve) |
| Bayes-optimal fixed point | Yes (if SE converges) | Yes (if SE converges) |
| Onsager correction | Explicit $b_t \mathbf{r}_{t-1}$ term | Absorbed into normalization |
| Typical imaging use | Demos on i.i.d. Gaussian $\mathbf{A}$ | Structured / physical $\mathbf{A}$ |

Common Mistake: Assuming Any Structured Matrix Is RRI

Mistake:

Assuming that because VAMP works for partial-DFT and random-unitary sensing, it will work for any structured matrix, including, say, near-field imaging operators with strong spatial correlations.

Correction:

RRI is a specific statistical assumption on the right singular basis. Imaging operators constructed from physical propagation typically have highly structured right singular vectors that are not Haar-distributed. On such operators, VAMP's state evolution is no longer exact, and empirical MSE can deviate noticeably from the predicted curve. When this happens, either (a) apply a random unitary pre-rotation to make the effective operator closer to RRI, or (b) use multi-layer VAMP or learned VAMP, which are more tolerant of this kind of structural mismatch.

VAMP (Vector AMP)

A message-passing algorithm for linear inverse problems that alternates between a denoising step (prior node) and an LMMSE step (likelihood node), enforcing expectation consistency between the two local beliefs. Equivalent to OAMP, with a state evolution that is provably exact for the right-rotationally-invariant class of sensing matrices.

Related: OAMP (Orthogonal AMP), Expectation consistency

Expectation consistency

A principle for approximate inference in factor graphs that requires all local Gaussian beliefs on a shared variable to agree on mean and variance. When satisfied, the resulting fixed point is a stationary point of a free-energy functional and, under the right symmetry assumptions, coincides with the Bayes-optimal posterior.

Related: VAMP (Vector AMP)

Historical Note: Expectation Consistency and the VAMP Genealogy

2005-2017

Opper and Winther introduced expectation consistency as a refinement of expectation propagation around 2005. Manfred Opper observed that the precision of a belief is additive across messages in a Gaussian approximation, a consequence of the entropy functional being quadratic in the sufficient statistics.

The Rangan–Schniter–Fletcher team's 2017 VAMP paper took this principle and combined it with the AMP framework, producing an algorithm that was immediately recognized as equivalent to Ma and Ping's OAMP. Sundeep Rangan later remarked that "the two derivations don't look alike, but the iteration is the same to the last trigonometric identity", a common phenomenon in message passing where multiple paths lead to the same fixed-point equations.

Quick Check

In VAMP, what role does the auxiliary variable $\mathbf{x}_2$, together with the constraint $\mathbf{x}_1 = \mathbf{x}_2$, play?

  • It introduces additional measurements to over-determine the system.
  • It splits the inference problem into two local subproblems that exchange Gaussian messages.
  • It converts a non-convex problem into a convex one.
  • It enforces sparsity on $\mathbf{x}_2$ to match the signal's prior.