OAMP — Orthogonal Approximate Message Passing

Why OAMP?

In Chapter 20 we saw that AMP is beautiful when it works — it tracks a one-dimensional state-evolution recursion and achieves the Bayes-optimal MSE for i.i.d. Gaussian sensing matrices. But the moment we step outside the i.i.d. regime — to partial Fourier matrices, to Kronecker-structured operators from MIMO and imaging, to any matrix with a non-flat spectrum — AMP diverges, sometimes dramatically. Damping helps but does not cure.

The diagnosis is simple. AMP relies on the Onsager correction to decorrelate the residual from the current estimate, so that the pseudo-observation $\mathbf{r}_t = \mathbf{x} + \tau_t \mathbf{z}_t$ really does look like signal-plus-white-Gaussian-noise. When $\mathbf{A}$ has non-i.i.d. entries, the Onsager correction no longer does this job, and the denoiser is fed a residual with structured correlations.

OAMP (Orthogonal AMP) fixes this by replacing the matched filter $\mathbf{A}^{\mathsf{H}}$ with a divergence-free linear estimator, most naturally the LMMSE filter. The orthogonality between the linear estimation error and the denoising error is enforced directly, rather than approximately. This widens the class of matrices for which the state-evolution analysis is correct — specifically, OAMP is provably correct for the right-rotationally-invariant class, a much larger family than i.i.d. Gaussian.

Definition:

Right-Rotationally-Invariant Matrix Ensemble

A random matrix $\mathbf{A} \in \mathbb{C}^{M \times N}$ is called right-rotationally-invariant (RRI) if its distribution is unchanged by right-multiplication by any deterministic unitary matrix $\mathbf{O} \in \mathbb{C}^{N \times N}$:

$$
\mathbf{A} \,\stackrel{d}{=}\, \mathbf{A}\,\mathbf{O} \quad \text{for all unitary } \mathbf{O}.
$$

Equivalently, writing the SVD as $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$, the right singular basis $\mathbf{V}$ is uniformly distributed on the unitary group, independently of the singular values $\{\lambda_i\}$ and of $\mathbf{U}$.

This class strictly contains the i.i.d. Gaussian ensemble (for which both $\mathbf{U}$ and $\mathbf{V}$ are Haar and the singular values follow the Marchenko–Pastur law), but it also contains orthogonal / partial DFT matrices, randomly subsampled unitary matrices, and many designed sensing operators that arise in MIMO and imaging.

Definition:

Divergence-Free Linear Estimator

Given a pseudo-observation $\mathbf{r} = \mathbf{x} + \tau \mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0},\mathbf{I})$, and a linear operator $\mathbf{W}$ mapping residuals back to the signal domain, the function $f(\mathbf{y}) = \mathbf{W}(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}})$ is divergence-free with respect to $\hat{\mathbf{x}}$ when

$$
\mathrm{tr}(\mathbf{W}\mathbf{A}) = 0 \quad \text{or, after normalization,} \quad \frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1.
$$

The first form gives a zero-mean residual correction; the second is the common unbiased normalization used in OAMP. In both cases the role is the same: the output of the linear step is uncorrelated with the input error, so the effective noise passed to the denoiser is truly fresh.

"Divergence-free" is language borrowed from the state-evolution derivation — it is the condition that cancels the Onsager term that plain AMP inserts by hand.
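The normalized condition is easy to check numerically. Below is a minimal sketch (sizes, noise level, and MSE value are illustrative assumptions, not from the text) that builds the LMMSE filter, applies the trace normalization, and verifies $\frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1$:

```python
import numpy as np

# Illustrative sizes and noise/MSE values (assumptions for this sketch).
rng = np.random.default_rng(1)
M, N = 64, 128
sigma2, v = 0.1, 0.5

# A generic complex i.i.d. sensing matrix (the scaling is arbitrary;
# the trace normalization removes any overall gain anyway).
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)

# LMMSE filter, then the trace normalization used in OAMP.
W_hat = A.conj().T @ np.linalg.inv(A @ A.conj().T + (sigma2 / v) * np.eye(M))
W = (N / np.trace(W_hat @ A)) * W_hat

# Normalized divergence-free condition: (1/N) tr(W A) = 1.
assert np.isclose((np.trace(W @ A) / N).real, 1.0)
assert abs((np.trace(W @ A) / N).imag) < 1e-10
```

By construction the normalization holds for any invertible regularized filter; what changes with the choice of $\mathbf{W}$ is the effective noise level it produces.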

Definition:

OAMP Iteration

Given the linear model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{CN}(\mathbf{0},\sigma^2\mathbf{I})$, the OAMP algorithm iterates, for $t=0,1,2,\dots$:

$$
\begin{aligned}
\text{linear step:} \quad & \mathbf{r}_t = \hat{\mathbf{x}}_t + \mathbf{W}_t (\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_t), \\
\text{denoise:} \quad & \hat{\mathbf{x}}_{t+1} = C_t \left[ \eta_t(\mathbf{r}_t) - \langle \eta_t'(\mathbf{r}_t) \rangle\, \mathbf{r}_t \right],
\end{aligned}
$$

where $\mathbf{W}_t$ is the divergence-free LMMSE filter, $\eta_t(\cdot)$ is a componentwise denoiser matched to the signal prior, $\langle \eta_t'(\mathbf{r}_t) \rangle = \frac{1}{N}\sum_i \eta_t'(r_{t,i})$ is the average divergence, and $C_t$ is a normalization constant that restores unit gain on $\mathbf{x}$.

The subtraction of $\langle \eta_t' \rangle \mathbf{r}_t$ from $\eta_t(\mathbf{r}_t)$ is the divergence-free correction on the denoiser side. Together with the divergence-free $\mathbf{W}_t$ on the linear side, it enforces orthogonality between the two error components — hence "Orthogonal" AMP.
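To make the denoiser-side correction concrete, here is a small sketch with soft-thresholding standing in for $\eta_t$ (an illustrative choice of denoiser, not the prior-matched MMSE denoiser of the definition). Feeding the corrected denoiser pure Gaussian noise shows its output is empirically uncorrelated with the input — the orthogonality the correction is designed to enforce:

```python
import numpy as np

def divergence_free_denoise(r, tau, thresh=1.5):
    """C_t * (eta(r) - <eta'> r) with soft-thresholding as eta (illustrative)."""
    eta = np.sign(r) * np.maximum(np.abs(r) - thresh * tau, 0.0)
    div = np.mean(np.abs(r) > thresh * tau)   # average divergence <eta'(r)>
    C = 1.0 / (1.0 - div)                     # unit-gain normalization C_t
    return C * (eta - div * r)

# Feed pure noise: the divergence-free output decorrelates from its input.
rng = np.random.default_rng(2)
z = rng.standard_normal(100_000)
out = divergence_free_denoise(z, tau=1.0)

# Stein's identity: E[f(z) z] = E[f'(z)], and the correction makes the
# average derivative of f vanish, so the empirical correlation is ~0.
assert abs(np.mean(out * z)) < 0.05
```

Without the `- div * r` term and the `C` rescaling, `np.mean(out * z)` would sit near the average divergence (about 0.13 here) instead of near zero.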

Theorem: OAMP State Evolution for RRI Matrices

Assume $\mathbf{A}$ is right-rotationally-invariant with asymptotic spectrum $\mu(\lambda^2)$ of $\mathbf{A}^{\mathsf{H}}\mathbf{A}$, and $\mathbf{x}$ has i.i.d. entries with prior $p_X$. Let $\mathbf{W}_t$ be the LMMSE filter

$$
\mathbf{W}_t = \frac{N}{\mathrm{tr}(\hat{\mathbf{W}}_t \mathbf{A})}\,\hat{\mathbf{W}}_t, \qquad \hat{\mathbf{W}}_t = \mathbf{A}^{\mathsf{H}}\left( \mathbf{A}\mathbf{A}^{\mathsf{H}} + \frac{\sigma^2}{v_t}\mathbf{I}\right)^{-1},
$$

and let $v_t = \mathbb{E}\|\hat{\mathbf{x}}_t - \mathbf{x}\|^2/N$ be the MSE at iteration $t$. Then, in the large-system limit $N \to \infty$ with $M/N = \delta$ fixed, the pseudo-observation satisfies

$$
\mathbf{r}_t = \mathbf{x} + \tau_t \mathbf{z}_t, \qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0},\mathbf{I}),
$$

with $\mathbf{z}_t$ independent of $\mathbf{x}$, and the scalar state $\tau_t^2$ evolves according to

$$
\tau_t^2 = \mathcal{F}(v_t; \mu, \sigma^2), \qquad v_{t+1} = \mathcal{E}(\tau_t^2; p_X),
$$

where $\mathcal{F}$ is the linear-step transfer function (determined by the spectrum of $\mathbf{A}$) and $\mathcal{E}$ is the MSE of the denoiser at noise level $\tau_t^2$.

The theorem says that OAMP's one-dimensional state-evolution description — the same kind of recursion that makes AMP tractable — survives for the entire right-rotationally-invariant class. You trade a matrix-vector product with $\mathbf{A}^{\mathsf{H}}$ for an LMMSE-style solve, and in return you get a much wider regime of validity.
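The recursion itself is cheap to run once $\mathcal{F}$ and $\mathcal{E}$ are known. The sketch below uses the simple row-orthogonal transfer $\mathcal{F}(v) = (1/\delta - 1)v + \sigma^2$ and a Monte Carlo estimate of the soft-threshold denoiser MSE as a stand-in for $\mathcal{E}$ — both are illustrative choices under assumed parameters, not the general formulas of the theorem:

```python
import numpy as np

rng = np.random.default_rng(3)
delta, rho, sigma2 = 0.5, 0.15, 1e-3      # assumed ratio, sparsity, noise
n = 200_000

# Bernoulli-Gaussian signal samples used to estimate the denoiser MSE.
x = (rng.random(n) < rho) * rng.standard_normal(n)

def denoiser_mse(tau2, thresh=1.4):
    """Monte Carlo MSE of soft-thresholding at noise level tau^2 (stand-in for E)."""
    tau = np.sqrt(tau2)
    r = x + tau * rng.standard_normal(n)
    eta = np.sign(r) * np.maximum(np.abs(r) - thresh * tau, 0.0)
    return np.mean((eta - x) ** 2)

v = np.mean(x ** 2)                        # v_0 = Var(X)
for t in range(20):
    tau2 = (1.0 / delta - 1.0) * v + sigma2   # linear-step transfer F(v_t)
    v = denoiser_mse(tau2)                    # denoiser MSE E(tau_t^2)

# For these parameters the recursion contracts to a small fixed-point MSE.
assert v < 1e-2
```

The fixed point of this two-line recursion is the predicted reconstruction MSE — no matrix iterations were run to obtain it, which is exactly what makes state evolution useful as a design tool.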


Key Takeaway

OAMP replaces AMP's matched filter with an LMMSE filter and enforces orthogonality between the linear-step error and the denoising error. Orthogonality is the statistical condition that keeps the scalar state-evolution recursion correct, so OAMP generalizes cleanly beyond the i.i.d. Gaussian class to right-rotationally-invariant matrices.

⚠️ Engineering Note

Computational Cost of the LMMSE Step

The LMMSE filter $\hat{\mathbf{W}}_t$ requires solving a linear system of the form $(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})^{-1}\mathbf{y}$. In general this costs $O(M^3)$ per iteration, which is the main practical barrier to deploying OAMP. For structured operators (Kronecker, partial DFT, subsampled unitary), the cost drops dramatically — we treat the Kronecker case in section 21.3.

In imaging pipelines where MM can be tens of thousands, the linear step typically dominates the runtime. Practical implementations use Woodbury identities, diagonalization in the SVD basis, or CG with a few inner iterations warm-started from the previous outer step.
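The CG option can be sketched in a few lines of plain NumPy — a matvec-only solver for the Hermitian positive-definite system $(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})\mathbf{u} = \mathbf{b}$, warm-startable via `x0` (function name and parameters are illustrative):

```python
import numpy as np

def solve_lmmse_cg(matvec, b, tol=1e-8, maxiter=200, x0=None):
    """Conjugate gradient for a Hermitian positive-definite system,
    given only a matvec; x0 allows warm-starting from the previous
    outer OAMP iteration."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - matvec(x)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Usage on a random instance of (A A^H + c I) u = b.
rng = np.random.default_rng(4)
M, N = 60, 120
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)
c = 0.3
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
matvec = lambda u: A @ (A.conj().T @ u) + c * u   # never forms A A^H explicitly
sol = solve_lmmse_cg(matvec, b)
assert np.linalg.norm(matvec(sol) - b) < 1e-6 * np.linalg.norm(b)
```

The matvec closure is where structure pays off: for a partial-DFT or Kronecker operator, `A @ v` and `A.conj().T @ v` become FFTs or small matrix products, and each CG iteration inherits that cost.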

Example: OAMP for a Partial Orthogonal Sensing Matrix

Consider $\mathbf{A} = \sqrt{N/M}\,\mathbf{S}\mathbf{F}$ where $\mathbf{F}$ is an $N \times N$ unitary DFT matrix and $\mathbf{S}$ selects $M$ rows at random. This is a classic compressed-sensing operator. Compute the effective noise variance $\tau_t^2$ after one OAMP iteration when $v_t$ is the current MSE and $\sigma^2$ is the measurement noise.
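One way to sanity-check a derivation of $\tau_t^2$ here: since $\mathbf{F}$ is unitary, $\mathbf{A}\mathbf{A}^{\mathsf{H}} = (N/M)\mathbf{I}$, the trace-normalized LMMSE filter collapses to $\mathbf{A}^{\mathsf{H}}$, and evaluating the linear-step transfer gives $\tau_t^2 = (1/\delta - 1)\,v_t + \sigma^2$ with $\delta = M/N$. The sketch below verifies both facts numerically under assumed illustrative values of $N$, $M$, $v_t$, $\sigma^2$:

```python
import numpy as np

# Illustrative parameters (assumptions for this check).
N, M = 256, 128
delta = M / N
v_t, sigma2 = 0.2, 0.05

# Partial orthogonal operator A = sqrt(N/M) * S * F with F the unitary DFT.
rng = np.random.default_rng(0)
F = np.fft.fft(np.eye(N), norm="ortho")            # unitary DFT matrix
A = np.sqrt(N / M) * F[rng.choice(N, size=M, replace=False), :]

# With A A^H = (N/M) I, the trace-normalized LMMSE filter collapses to A^H.
W_hat = A.conj().T @ np.linalg.inv(A @ A.conj().T + (sigma2 / v_t) * np.eye(M))
W = (N / np.trace(W_hat @ A)) * W_hat
assert np.allclose(W, A.conj().T)

# tau^2 = (v_t/N) tr((I - WA)(I - WA)^H) + (sigma^2/N) tr(W W^H).
I = np.eye(N)
tau2 = (v_t / N) * np.trace((I - W @ A) @ (I - W @ A).conj().T).real \
     + (sigma2 / N) * np.trace(W @ W.conj().T).real

assert abs(tau2 - ((1.0 / delta - 1.0) * v_t + sigma2)) < 1e-6
```

The $(1/\delta - 1)$ factor comes from the $N - M$ unobserved directions of the row space: only the projection of the error onto the measured subspace is corrected, while the measurement noise enters with unit weight under this normalization.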

OAMP vs AMP on Structured Matrices

Compare the MSE trajectories of AMP (matched-filter linear step) and OAMP (LMMSE linear step) on three sensing ensembles: i.i.d. Gaussian, partial DFT, and column-correlated Gaussian. AMP tracks state evolution only for i.i.d. Gaussian; OAMP tracks it for all three.

Parameters

- 0.5: ratio of measurements to signal dimension
- 0.15: fraction of non-zero entries in $\mathbf{x}$

OAMP with Bayes-Optimal Denoiser

Complexity: Dominated by the LMMSE solve: $O(M^3)$ per iteration in general, or $O(M \log M)$ for partial-DFT sensing, or $O(M^{3/2})$ for separable Kronecker sensing (section 21.3).
Input: measurements y, sensing matrix A, noise variance sigma^2,
       prior p_X, max iterations T, tolerance eps
Initialize: x_hat_0 = 0, v_0 = Var(X)
for t = 0, 1, ..., T-1:
    # ----- Linear step (divergence-free LMMSE) -----
    W_hat = A^H (A A^H + (sigma^2 / v_t) I)^(-1)
    C_lin = N / trace(W_hat A)
    W_t = C_lin * W_hat
    r_t = x_hat_t + W_t (y - A x_hat_t)
    tau_t_sq = computeTauSquared(v_t, spectrum(A), sigma^2)
    # ----- Denoising step (divergence-free MMSE denoiser) -----
    eta_r = MMSE_denoise(r_t, tau_t_sq, p_X)
    div_eta = mean(derivative_of_MMSE(r_t, tau_t_sq, p_X))
    C_dn = 1 / (1 - div_eta)
    x_hat_next = C_dn * (eta_r - div_eta * r_t)
    v_next = tau_t_sq * div_eta / (1 - div_eta)
    if |v_next - v_t| < eps: break
    v_t = v_next; x_hat_t = x_hat_next
return x_hat_t

The denominator $1 - \mathrm{div}\,\eta$ is exactly the "Onsager-free" rescaling that makes the denoised signal orthogonal to its input pseudo-observation.

Common Mistake: Forgetting the Divergence-Free Normalization

Mistake:

Implementing the LMMSE step as $\mathbf{r}_t = \hat{\mathbf{x}}_t + \hat{\mathbf{W}}_t(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_t)$ without the scalar $C_t = N/\mathrm{tr}(\hat{\mathbf{W}}_t \mathbf{A})$, or forgetting the analogous scaling on the denoiser side.

Correction:

Both normalizations are essential. Without $C_t$ on the linear side, the signal passes through the filter with a gain $\neq 1$, so the denoiser is matched to the wrong effective prior. Without the $1/(1 - \mathrm{div}\,\eta)$ scaling on the denoiser side, the orthogonality condition fails and state evolution no longer tracks the empirical dynamics. In practice both missing factors manifest as a state-evolution curve that diverges from the simulated MSE.

OAMP as AMP with a Better Linear Step

Comparing AMP and OAMP side by side makes the difference crisp. AMP uses $\mathbf{W}_t^{\text{AMP}} = \mathbf{A}^{\mathsf{H}}$ (the matched filter, or "adjoint"), and pays for its bias with the Onsager term. OAMP uses the LMMSE $\mathbf{W}_t^{\text{OAMP}}$, which is already approximately unbiased, and the residual bias is removed by the divergence-free normalization.

When $\mathbf{A}$ is i.i.d. Gaussian, the two filters are essentially equivalent in the large-system limit, which is why AMP works in that regime. When $\mathbf{A}$ has a non-flat spectrum, the matched filter introduces a coloring that AMP cannot undo, while OAMP's LMMSE filter whitens the residual. This is the full story.
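The gap is easy to exhibit numerically. The sketch below builds an ill-conditioned operator (an assumed log-spaced spectrum with condition number 100, illustrative $v$ and $\sigma^2$), trace-normalizes both the matched filter and the LMMSE filter so each satisfies $\frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1$, and compares the effective noise level each produces:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 100, 200
v, sigma2 = 0.2, 0.01        # assumed current MSE and measurement noise

# Ill-conditioned operator: orthogonal U, V with a log-spaced spectrum.
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = np.logspace(0, -2, M)    # condition number 100
A = U @ np.diag(s) @ V[:, :M].T

def normalize(W):
    """Trace normalization: (1/N) tr(W A) = 1."""
    return (N / np.trace(W @ A)) * W

W_mf = normalize(A.T)                                              # matched filter
W_lm = normalize(A.T @ np.linalg.inv(A @ A.T + (sigma2 / v) * np.eye(M)))

def tau2(W):
    """Effective noise level of the linear step for isotropic input error."""
    E = np.eye(N) - W @ A
    return (v / N) * np.trace(E @ E.T) + (sigma2 / N) * np.trace(W @ W.T)

# The trace-normalized LMMSE filter minimizes tau^2 in this class, so on a
# non-flat spectrum it strictly beats the normalized matched filter.
assert tau2(W_lm) < tau2(W_mf)
```

The same script with a flat spectrum (`s` constant) makes the two filters coincide up to scaling, which is the numerical face of the "essentially equivalent for i.i.d. Gaussian" statement above.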

Historical Note: From Turbo Equalization to OAMP

2017

The divergence-free-plus-denoiser architecture has deep roots in turbo equalization (Douillard, Jézéquel, Berrou, 1995) and in the concept of extrinsic information from iterative decoding: the output of each module must be statistically independent of its input, otherwise beliefs are reinforced rather than refreshed.

Junjie Ma and Li Ping's 2017 paper Orthogonal AMP carried this principle over to compressed sensing. Independently, Schniter, Rangan, and Fletcher arrived at essentially the same algorithm by a different route — expectation consistency — and called it VAMP (section 21.2). The two derivations are now understood as complementary views of a single algorithm. The name "OAMP" emphasizes orthogonality; "VAMP" emphasizes the message-passing factor structure. Both are correct and both are used.

OAMP (Orthogonal AMP)

A modification of AMP that replaces the matched-filter linear step with an LMMSE filter and enforces the divergence-free condition on both the linear estimator and the denoiser. OAMP converges and admits state evolution for the right-rotationally-invariant class of sensing matrices.

Related: Divergence-free estimator, Right-rotationally-invariant matrix

Divergence-free estimator

A linear or nonlinear estimator whose derivative (divergence) with respect to its input vanishes on average, ensuring that the output error is orthogonal to the input noise. OAMP enforces this by trace-normalizing the LMMSE filter and subtracting $\langle \eta' \rangle \mathbf{r}_t$ from the denoiser output.

Related: OAMP (Orthogonal AMP)

Right-rotationally-invariant matrix

A random matrix whose distribution is invariant under right-multiplication by any unitary matrix. Equivalently, its right singular basis is Haar-distributed. This class includes i.i.d. Gaussian matrices, random partial unitary / partial DFT matrices, and many other structured ensembles relevant to imaging.

Related: OAMP (Orthogonal AMP)

Quick Check

Which of the following statements about right-rotationally-invariant (RRI) sensing matrices is TRUE?

All i.i.d. Gaussian matrices are RRI, and the converse is also true.

RRI means the left singular basis $\mathbf{U}$ is Haar-distributed.

Partial DFT matrices are RRI because their right singular basis is uniformly random on the unitary group.

If $\mathbf{A}$ is RRI, then $\mathbf{A}^{\mathsf{H}}$ is also RRI.

Why This Matters: Why OAMP Matters for RF Imaging

Imaging pipelines in the CommIT group build sensing matrices $\mathbf{A}$ from physical propagation: Kronecker products of steering-vector dictionaries, subsampled DFT rows from OFDM pilots, and near-field operators with slowly decaying singular-value spectra. None of these are i.i.d. Gaussian. Running plain AMP on such operators is numerically unstable, regardless of how many damping heuristics are stacked on top.

OAMP is the natural algorithmic home for these problems: it tolerates the non-flat spectrum, it admits principled state-evolution analysis (so we can predict reconstruction quality from the operator's singular values alone), and its LMMSE step can be computed efficiently by exploiting the Kronecker structure. This connection is developed in section 21.3 and reappears in Book 2 Chapter 27 where unrolled OAMP becomes the backbone of the RF imaging network.

See full treatment in Chapter 27, Section sec-lista-imaging