VAMP: Vector Approximate Message Passing

VAMP: AMP from a Graph-Theoretic Viewpoint

OAMP was derived by fixing AMP's linear step. VAMP arrives at the same algorithm from the opposite direction: it starts from the factor graph of the estimation problem and writes down the expectation-consistency equations that two local estimators must satisfy to agree on the posterior moments.

Why two derivations of the same algorithm? Because each viewpoint makes a different property transparent. OAMP explains why orthogonality is enforced. VAMP explains what the algorithm is doing: passing messages between a "linear" node (which knows the measurement model $\ntn{obs} = \mathbf{A}\mathbf{x} + \mathbf{w}$) and a "prior" node (which knows $p_X$). Each node computes a local posterior, matches moments, and sends the result back. The fixed point of this exchange is the OAMP estimate.

This view also makes the generalization to GAMP, learned VAMP, and multi-layer VAMP very natural: you just change the nodes.

Definition: Two-Node Factorization

VAMP represents the posterior $p(\mathbf{x}|\ntn{obs}) \propto p_X(\mathbf{x})\,p(\ntn{obs}|\mathbf{x})$ as a product of two factors and introduces an auxiliary variable $\mathbf{x}_2$ with the constraint $\mathbf{x}_1 = \mathbf{x}_2$. Writing the joint density as

$$p(\mathbf{x}_1,\mathbf{x}_2|\ntn{obs}) \propto p_X(\mathbf{x}_1) \, \delta(\mathbf{x}_1 - \mathbf{x}_2)\, p(\ntn{obs}|\mathbf{x}_2),$$

we get a chain with two factor nodes (prior and likelihood) and one equality constraint. VAMP passes Gaussian messages between these nodes and enforces expectation consistency: the two local posterior beliefs must share the same mean and variance.

The delta function looks contrived, but it is the standard trick that turns a monolithic inference problem into a two-node message-passing problem, enabling separate treatment of the prior and the linear observation.

Definition: Expectation Consistency (EC)

Given two candidate approximate posteriors $q_1(\mathbf{x})$ (from the prior node) and $q_2(\mathbf{x})$ (from the likelihood node), both Gaussian with means $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and covariances $\gamma_1^{-1}\mathbf{I}, \gamma_2^{-1}\mathbf{I}$, expectation consistency requires

$$\mathbb{E}_{q_1}[\mathbf{x}] = \mathbb{E}_{q_2}[\mathbf{x}], \quad \mathbb{E}_{q_1}[\|\mathbf{x}\|^2] = \mathbb{E}_{q_2}[\|\mathbf{x}\|^2].$$

The combined belief $q(\mathbf{x}) \propto q_1(\mathbf{x})\,q_2(\mathbf{x})/\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\gamma^{-1}\mathbf{I})$ has precision $\gamma_1 + \gamma_2$ (the precisions add) and is the moment-matched target for the next message update.

EC is the principle that links the two derivations: OAMP's orthogonality is the precision-addition rule under the Gaussian approximation. The two "errors", one from each node, are independent by construction because each node ignores the information that will come from the other side.
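As a small sanity check on the precision-addition rule, the NumPy snippet below multiplies two isotropic Gaussian beliefs and verifies numerically that the product Gaussian has precision $\gamma_1 + \gamma_2$ and a precision-weighted mean. This is only an illustration; the numbers are arbitrary and not tied to any problem instance.

```python
import numpy as np

# Two one-dimensional Gaussian beliefs N(mu1, 1/gamma1) and N(mu2, 1/gamma2).
mu1, gamma1 = 0.8, 2.0     # belief from the prior node (illustrative numbers)
mu2, gamma2 = 1.1, 5.0     # belief from the likelihood node

# Closed form for the product: precisions add, mean is precision-weighted.
gamma = gamma1 + gamma2
mu = (gamma1 * mu1 + gamma2 * mu2) / gamma
print(f"combined belief: mean {mu:.4f}, precision {gamma:.2f}")

# Brute-force check on a grid: pointwise product of the two densities,
# renormalized, has the same mean and precision.
x = np.linspace(-5.0, 5.0, 200_001)
p = np.exp(-0.5 * gamma1 * (x - mu1) ** 2) * np.exp(-0.5 * gamma2 * (x - mu2) ** 2)
p /= p.sum()
mean = np.sum(x * p)
var = np.sum((x - mean) ** 2 * p)
print(f"grid check:      mean {mean:.4f}, precision {1.0 / var:.2f}")
```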

Definition: VAMP Algorithm

Let $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$ be the SVD of the sensing matrix (used to implement the LMMSE step efficiently). Initialize $\mathbf{r}_1^{(0)} = \mathbf{0}$, $\gamma_1^{(0)} = 1/\mathrm{Var}(X)$. For $t=0,1,\ldots$:

$$
\begin{aligned}
&\text{(denoiser)} & & \hat{\mathbf{x}}_1^{(t)} = g_1(\mathbf{r}_1^{(t)},\gamma_1^{(t)}), \quad \alpha_1^{(t)} = \langle g_1'(\mathbf{r}_1^{(t)},\gamma_1^{(t)}) \rangle \\
& & & \gamma_2^{(t)} = \gamma_1^{(t)}\left(\frac{1}{\alpha_1^{(t)}} - 1\right), \quad \mathbf{r}_2^{(t)} = \frac{\hat{\mathbf{x}}_1^{(t)} - \alpha_1^{(t)}\mathbf{r}_1^{(t)}}{1 - \alpha_1^{(t)}} \\
&\text{(LMMSE)} & & \hat{\mathbf{x}}_2^{(t)} = g_2(\mathbf{r}_2^{(t)},\gamma_2^{(t)}), \quad \alpha_2^{(t)} = \langle g_2'(\mathbf{r}_2^{(t)},\gamma_2^{(t)}) \rangle \\
& & & \gamma_1^{(t+1)} = \gamma_2^{(t)}\left(\frac{1}{\alpha_2^{(t)}} - 1\right), \quad \mathbf{r}_1^{(t+1)} = \frac{\hat{\mathbf{x}}_2^{(t)} - \alpha_2^{(t)}\mathbf{r}_2^{(t)}}{1 - \alpha_2^{(t)}}
\end{aligned}
$$

The two local estimators are:

  • $g_1(\mathbf{r},\gamma) = \mathbb{E}[\mathbf{x} \,|\, \mathbf{x} + \mathcal{N}(\mathbf{0},\gamma^{-1}\mathbf{I}) = \mathbf{r}]$ (prior MMSE denoiser);
  • $g_2(\mathbf{r},\gamma) = (\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2 + \gamma \mathbf{I})^{-1}(\mathbf{A}^{\mathsf{H}}\ntn{obs}/\sigma^2 + \gamma \mathbf{r})$ (LMMSE with pseudo-prior $\mathcal{N}(\mathbf{r},\gamma^{-1}\mathbf{I})$).

The scalar $\alpha_i = \langle g_i' \rangle$ is the average sensitivity of the local estimator, and the formulas for $(\mathbf{r}_j,\gamma_j)$ implement the Gaussian message-subtraction step that makes the next input extrinsic.
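The loop below is a minimal NumPy sketch of this recursion, assuming a user-supplied separable denoiser `denoise(r, gamma)` that returns the estimate and its average sensitivity. The helper names, the direct-solve LMMSE step, and the small clamping of $\alpha_i$ are illustrative choices for this sketch, not a reference implementation; the per-iteration LMMSE estimates are recorded so they can be compared against state evolution later in the section.

```python
import numpy as np

def lmmse(r, gamma, A, y, sigma2):
    """g2: LMMSE estimate under the pseudo-prior N(r, gamma^{-1} I), by direct solve."""
    N = A.shape[1]
    H = A.conj().T @ A / sigma2 + gamma * np.eye(N)
    xhat = np.linalg.solve(H, A.conj().T @ y / sigma2 + gamma * r)
    alpha = gamma * np.trace(np.linalg.inv(H)).real / N   # <g2'> = (gamma/N) tr(H^{-1})
    return xhat, alpha

def vamp(y, A, sigma2, denoise, var_x, n_iter=20, eps=1e-6):
    """Minimal VAMP loop following the two half-steps in the definition above."""
    N = A.shape[1]
    r1, gamma1 = np.zeros(N), 1.0 / var_x
    trajectory = []                                  # LMMSE estimates, one per iteration
    for _ in range(n_iter):
        # denoiser half-step
        xhat1, alpha1 = denoise(r1, gamma1)
        alpha1 = np.clip(alpha1, eps, 1.0 - eps)
        gamma2 = gamma1 * (1.0 / alpha1 - 1.0)
        r2 = (xhat1 - alpha1 * r1) / (1.0 - alpha1)
        # LMMSE half-step
        xhat2, alpha2 = lmmse(r2, gamma2, A, y, sigma2)
        alpha2 = np.clip(alpha2, eps, 1.0 - eps)
        gamma1 = gamma2 * (1.0 / alpha2 - 1.0)
        r1 = (xhat2 - alpha2 * r2) / (1.0 - alpha2)
        trajectory.append(xhat2)
    return xhat2, trajectory
```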

Theorem: VAMP State Evolution

For right-rotationally-invariant $\mathbf{A}$ with limiting spectrum $\mu$, separable prior $p_X$ with finite variance, and Lipschitz-continuous denoiser $g_1$, the VAMP iterates satisfy, in the large-system limit,

$$\mathbf{r}_i^{(t)} = \mathbf{x} + \mathcal{N}(\mathbf{0},\,\tau_i^{(t)}\mathbf{I}), \qquad i \in \{1,2\},$$

with scalar variances $\tau_i^{(t)}$ evolving according to a deterministic recursion depending only on the spectrum $\mu$, the noise level $\sigma^2$, and the denoiser MSE curve. The fixed point of this recursion coincides with the replica-predicted Bayes-optimal MMSE when $g_1$ is the posterior-mean denoiser.

State evolution for VAMP is again a one-dimensional recursion; that is the whole point. The RRI assumption gives us the rotational symmetry needed to reduce a high-dimensional iterative algorithm to a scalar fixed-point equation, which then predicts the algorithm's performance without running it.
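For concreteness, here is a sketch of the matched form of that recursion (assuming $g_1$ is the posterior-mean denoiser, so its sensitivity equals the normalized posterior variance) for a Bernoulli-Gaussian prior. The closed-form denoiser, the Monte Carlo estimate of its MSE, and the function names are illustrative choices for this sketch: the recursion alternates precision subtraction at the denoiser with a spectral average that gives the LMMSE error.

```python
import numpy as np

def bg_posterior(r, tau, rho=0.15, vx=1.0):
    """Posterior mean/variance of x ~ rho*N(0,vx) + (1-rho)*delta_0 from r = x + N(0, tau)."""
    s = vx + tau
    on = rho * np.exp(-0.5 * r**2 / s) / np.sqrt(s)           # evidence of the 'on' component
    off = (1.0 - rho) * np.exp(-0.5 * r**2 / tau) / np.sqrt(tau)
    pi = on / np.maximum(on + off, 1e-300)                    # responsibility of the 'on' component
    m, v = vx / s * r, vx * tau / s                           # 'on' posterior mean and variance
    xhat = pi * m
    return xhat, pi * (v + m**2) - xhat**2

def denoiser_mse(tau, rho=0.15, vx=1.0, n_mc=200_000, seed=0):
    """Monte Carlo estimate of the scalar MMSE of the Bernoulli-Gaussian denoiser."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(vx), n_mc) * (rng.random(n_mc) < rho)
    xhat, _ = bg_posterior(x + rng.normal(0.0, np.sqrt(tau), n_mc), tau, rho, vx)
    return np.mean((xhat - x) ** 2)

def vamp_se(sv_squared, N, sigma2, tau1, n_iter=15, rho=0.15, vx=1.0):
    """Scalar state evolution: denoiser MSE -> extrinsic precision -> LMMSE error -> repeat."""
    lam = np.zeros(N)
    lam[:len(sv_squared)] = sv_squared                        # eigenvalues of A^H A, zero-padded
    mse_pred = []
    for _ in range(n_iter):
        e1 = denoiser_mse(tau1, rho, vx)
        gamma2 = 1.0 / e1 - 1.0 / tau1                        # precision subtraction (extrinsic)
        e2 = np.mean(1.0 / (lam / sigma2 + gamma2))           # LMMSE error from the spectrum
        tau1 = 1.0 / (1.0 / e2 - gamma2)                      # back to the denoiser channel
        mse_pred.append(e2)
    return mse_pred
```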

Example: VAMP as Message Passing on a Scalar Channel

Take the simplest scalar case: $N = M = 1$, $y = x + w$, $w \sim \mathcal{N}(0,\sigma^2)$, $x \sim p_X$. Show that VAMP reduces to the standard MMSE scalar denoiser and that its two-step message passing is exactly Bayes' rule applied twice.
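One way to check the claim numerically is to take a Gaussian prior $x \sim \mathcal{N}(0, v)$, where the exact posterior mean is $v\,y/(v+\sigma^2)$. The short sketch below (illustrative variable names, not part of the exercise statement) runs the two half-steps and confirms that they land on that value.

```python
import numpy as np

rng = np.random.default_rng(0)
v, sigma2 = 2.0, 0.5                      # prior variance and noise variance (illustrative)
x = rng.normal(0.0, np.sqrt(v))
y = x + rng.normal(0.0, np.sqrt(sigma2))

def g1(r, gamma):
    """MMSE denoiser for x ~ N(0, v) observed through N(0, 1/gamma) noise."""
    shrink = v / (v + 1.0 / gamma)
    return shrink * r, shrink             # alpha1 = dg1/dr = shrink

def g2(r, gamma):
    """Scalar LMMSE with pseudo-prior N(r, 1/gamma)."""
    post_var = 1.0 / (1.0 / sigma2 + gamma)
    return post_var * (y / sigma2 + gamma * r), post_var * gamma   # alpha2

r1, gamma1 = 0.0, 1.0 / v
for _ in range(5):
    x1, a1 = g1(r1, gamma1)
    gamma2, r2 = gamma1 * (1 / a1 - 1), (x1 - a1 * r1) / (1 - a1)
    x2, a2 = g2(r2, gamma2)
    gamma1, r1 = gamma2 * (1 / a2 - 1), (x2 - a2 * r2) / (1 - a2)

print(x2, v * y / (v + sigma2))           # VAMP estimate vs exact posterior mean: equal
```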

VAMP State Evolution vs Empirical MSE

Run VAMP on a synthetic right-rotationally-invariant sensing matrix with a Bernoulli-Gaussian prior and compare the empirical MSE trajectory to the scalar state-evolution prediction. The two curves should overlay; this is the headline guarantee of the chapter.

Parameters (from the interactive demo): 800, 0.5, 0.15, 25, 15
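A possible end-to-end comparison is sketched below, reusing the hypothetical `vamp`, `bg_posterior`, and `vamp_se` helpers from the earlier sketches. The Haar-random factors, the geometric singular-value profile, the noise level, and the sizes (chosen to loosely match the parameter list above) are illustrative stand-ins for the demo's actual settings.

```python
import numpy as np
# Reuses vamp(), bg_posterior(), and vamp_se() from the sketches above.

def haar(n, rng):
    """Haar-distributed orthogonal matrix via QR with a sign fix."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(1)
N, M, rho, vx, sigma2, n_iter = 800, 400, 0.15, 1.0, 1e-3, 15

# Right-rotationally-invariant operator: Haar singular bases, chosen spectrum.
s = np.logspace(0.0, -0.5, M)                       # mild condition number
A = haar(M, rng) @ np.diag(s) @ haar(N, rng)[:, :M].T

x = rng.normal(0.0, np.sqrt(vx), N) * (rng.random(N) < rho)
y = A @ x + rng.normal(0.0, np.sqrt(sigma2), M)

def denoise(r, gamma):                               # wraps the Bernoulli-Gaussian denoiser
    xhat, post_var = bg_posterior(r, 1.0 / gamma, rho, vx)
    return xhat, gamma * np.mean(post_var)           # alpha1 = gamma * average posterior variance

_, traj = vamp(y, A, sigma2, denoise, var_x=rho * vx, n_iter=n_iter)
mse_emp = [np.mean((xh - x) ** 2) for xh in traj]
mse_se = vamp_se(s**2, N, sigma2, tau1=rho * vx, n_iter=n_iter, rho=rho, vx=vx)

for t, (a, b) in enumerate(zip(mse_emp, mse_se)):
    print(f"iter {t:2d}   empirical {a:.3e}   state evolution {b:.3e}")
```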

OAMP and VAMP Are the Same Algorithm

Up to rescaling conventions, OAMP and VAMP produce identical iterates. OAMP derives the LMMSE filter from orthogonality; VAMP derives it from expectation consistency under a Gaussian approximation. The two derivations arrive at the same fixed-point equations because orthogonality of two zero-mean Gaussian errors is equivalent to additivity of their precisions.

The practical consequence: any implementation that computes $(\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2 + \gamma \mathbf{I})^{-1}$ efficiently is simultaneously an OAMP and a VAMP implementation. Choose the derivation that is more convenient for the extension you want to build: orthogonality for physics-motivated analysis, or message passing for graph-structured models and learned variants.
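When the SVD $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$ is precomputed, the inverse above reduces to a diagonal rescaling in the right singular basis, which is where the "$O(MN)$ via SVD" per-iteration cost in the comparison table below comes from. The helper here is an illustrative sketch of that step, following NumPy's economy-SVD layout; it is not taken from any particular implementation.

```python
import numpy as np

def lmmse_svd(r, gamma, U, s, Vh, y, sigma2):
    """g2 via a precomputed economy SVD A = U @ diag(s) @ Vh, with K = min(M, N).
    The matrix inverse becomes a diagonal rescaling in the right singular basis,
    so each call costs O((M + N) K) instead of O(N^3)."""
    N, K = Vh.shape[1], len(s)
    b = (Vh.conj().T @ (s * (U.conj().T @ y))) / sigma2 + gamma * r   # A^H y / sigma2 + gamma r
    c = Vh @ b                                                        # coordinates along the right singular vectors
    d = 1.0 / (s**2 / sigma2 + gamma)                                 # inverse spectrum on that subspace
    xhat = Vh.conj().T @ (d * c) + (b - Vh.conj().T @ c) / gamma      # null directions only see gamma
    alpha = gamma * (d.sum() + (N - K) / gamma) / N                   # <g2'> = (gamma/N) tr of the inverse
    return xhat, alpha
```

With `U, s, Vh = np.linalg.svd(A, full_matrices=False)` computed once up front, this can stand in for the direct-solve `lmmse` used in the loop sketched earlier.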

⚠️ Engineering Note: Damping and Numerical Stability

Even though VAMP is provably convergent under the RRI assumption, finite-$N$ implementations can oscillate, particularly when the LMMSE precision $\gamma_2$ becomes very small (flat directions of $\mathbf{A}$) or very large (well-observed directions). Standard practice is to damp the precision updates:

$$\gamma^{(t+1)} \leftarrow (1-\beta)\,\gamma^{(t)} + \beta\,\gamma^{(t+1)}_{\text{new}},$$

with $\beta \in [0.5, 0.9]$. One also clamps $\alpha_i$ to $[\epsilon, 1-\epsilon]$ with a small $\epsilon$ (e.g., $10^{-6}$) to avoid division blow-ups when the denoiser is nearly constant ($\alpha_i \to 0$) or nearly the identity ($\alpha_i \to 1$). These tweaks do not change the fixed point but vastly improve the transient behavior on finite-dimensional instances.

Practical Constraints

  • Sensitivity clamping: $\alpha_i \in [\epsilon, 1-\epsilon]$
  • Damping factor: $\beta \in [0.5, 0.9]$
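A minimal sketch of these two safeguards as helper functions (the names are illustrative); inside the loop sketched earlier, the raw precision update would be passed through `damped` and the sensitivities through `safe_alpha`.

```python
import numpy as np

def damped(gamma_old, gamma_new, beta=0.7):
    """Damped precision update: blend the new precision with the previous iterate."""
    return (1.0 - beta) * gamma_old + beta * gamma_new

def safe_alpha(alpha, eps=1e-6):
    """Clamp the average sensitivity away from 0 and 1 to avoid division blow-ups."""
    return float(np.clip(alpha, eps, 1.0 - eps))
```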

AMP vs OAMP vs VAMP

| Property | AMP | OAMP / VAMP |
| --- | --- | --- |
| Linear step | Matched filter $\mathbf{A}^{\mathsf{H}}$ | LMMSE $(\mathbf{A}^{\mathsf{H}}\mathbf{A}/\sigma^2+\gamma\mathbf{I})^{-1}$ |
| Per-iteration cost (dense $\mathbf{A}$) | $O(MN)$ | $O(N^3)$, or $O(MN)$ via SVD |
| Matrix class (provable) | i.i.d. sub-Gaussian | Right-rotationally invariant |
| State evolution | Valid (i.i.d. case) | Valid (RRI case) |
| Requires SVD / inverse | No | Yes (or iterative solve) |
| Bayes-optimal fixed point | Yes (if SE converges) | Yes (if SE converges) |
| Onsager correction | Explicit $b_t \mathbf{r}_{t-1}$ term | Absorbed into normalization |
| Typical imaging use | Demos on i.i.d. Gaussian $\mathbf{A}$ | Structured / physical $\mathbf{A}$ |

Common Mistake: Assuming Any Structured Matrix Is RRI

Mistake:

Assuming that because VAMP works for partial-DFT and random-unitary sensing, it will work for any structured matrix, including, say, near-field imaging operators with strong spatial correlations.

Correction:

RRI is a specific statistical assumption on the right singular basis. Imaging operators constructed from physical propagation typically have highly structured right singular vectors that are not Haar-distributed. On such operators, VAMP's state evolution is no longer exact, and empirical MSE can deviate noticeably from the predicted curve. When this happens, either (a) apply a random unitary pre-rotation to make the effective operator closer to RRI, or (b) use multi-layer VAMP or learned VAMP, which are more tolerant of this kind of structural mismatch.

VAMP (Vector AMP)

A message-passing algorithm for linear inverse problems that alternates between a denoising step (prior node) and an LMMSE step (likelihood node), enforcing expectation consistency between the two local beliefs. Equivalent to OAMP, with a state evolution that is provably exact for the right-rotationally-invariant class of sensing matrices.

Related: OAMP (Orthogonal AMP), Expectation consistency

Expectation consistency

A principle for approximate inference in factor graphs that requires all local Gaussian beliefs on a shared variable to agree on mean and variance. When satisfied, the resulting fixed point is a stationary point of a free-energy functional and, under the right symmetry assumptions, coincides with the Bayes-optimal posterior.

Related: VAMP (Vector AMP)

Historical Note: Expectation Consistency and the VAMP Genealogy

2005-2017

Opper and Winther introduced expectation consistency as a refinement of expectation propagation around 2005. Manfred Opper observed that the precision of a belief is additive across messages in a Gaussian approximation, a consequence of the entropy functional being quadratic in the sufficient statistics.

The Rangan–Schniter–Fletcher team's 2017 VAMP paper took this principle and combined it with the AMP framework, producing an algorithm that was immediately recognized as equivalent to Ma and Ping's OAMP. Sundeep Rangan later remarked that "the two derivations don't look alike, but the iteration is the same to the last trigonometric identity", a common phenomenon in message passing where multiple paths lead to the same fixed-point equations.

Quick Check

In VAMP, what role does the auxiliary variable $\mathbf{x}_2$, together with the constraint $\mathbf{x}_1 = \mathbf{x}_2$, play?

  • It introduces additional measurements to over-determine the system.
  • It splits the inference problem into two local subproblems that exchange Gaussian messages.
  • It converts a non-convex problem into a convex one.
  • It enforces sparsity on $\mathbf{x}_2$ to match the signal's prior.