OAMP — Orthogonal Approximate Message Passing

Why OAMP?

In Chapter 20 we saw that AMP is beautiful when it works — it tracks a one-dimensional state-evolution recursion and achieves the Bayes-optimal MSE for i.i.d. Gaussian sensing matrices. But the moment we step outside the i.i.d. regime — to partial Fourier matrices, to Kronecker-structured operators from MIMO and imaging, to any matrix with a non-flat spectrum — AMP diverges, sometimes dramatically. Damping helps but does not cure.

The diagnosis is simple. AMP relies on the Onsager correction to decorrelate the residual from the current estimate, so that the pseudo-observation $\mathbf{r}_t = \mathbf{x} + \tau_t \mathbf{z}_t$ really does look like signal-plus-white-Gaussian-noise. When $\mathbf{A}$ has non-i.i.d. entries, the Onsager correction no longer does this job, and the denoiser is fed a residual with structured correlations.

OAMP (Orthogonal AMP) fixes this by replacing the matched filter $\mathbf{A}^{\mathsf{H}}$ with a divergence-free linear estimator, most naturally the LMMSE filter. The orthogonality between the linear estimation error and the denoising error is enforced directly, rather than approximately. This widens the class of matrices for which the state-evolution analysis is correct — specifically, OAMP is provably correct for the right-rotationally-invariant class, a much larger family than i.i.d. Gaussian.

Definition:

Right-Rotationally-Invariant Matrix Ensemble

A random matrix $\mathbf{A} \in \mathbb{C}^{M \times N}$ is called right-rotationally-invariant (RRI) if its distribution is unchanged by right-multiplication by any deterministic unitary matrix $\mathbf{O} \in \mathbb{C}^{N \times N}$:

$$
\mathbf{A} \,\stackrel{d}{=}\, \mathbf{A}\,\mathbf{O} \quad \text{for all unitary } \mathbf{O}.
$$

Equivalently, writing the SVD as $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{H}}$, the right singular basis $\mathbf{V}$ is uniformly distributed on the unitary group, independently of the singular values $\{\lambda_i\}$ and of $\mathbf{U}$.

This class strictly contains the i.i.d. Gaussian ensemble (for which both $\mathbf{U}$ and $\mathbf{V}$ are Haar and the singular values follow the Marchenko–Pastur law), but it also contains orthogonal / partial DFT matrices, randomly subsampled unitary matrices, and many designed sensing operators that arise in MIMO and imaging.

Definition:

Divergence-Free Linear Estimator

Given a pseudo-observation $\mathbf{r} = \mathbf{x} + \tau \mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0},\mathbf{I})$, and a linear operator $\mathbf{W}$ mapping residuals back to the signal domain, the function $f(\mathbf{y}) = \mathbf{W}(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}})$ is divergence-free with respect to $\hat{\mathbf{x}}$ when

$$
\mathrm{tr}(\mathbf{W}\mathbf{A}) = 0 \quad \text{or, after normalization,} \quad \frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1.
$$

The first form gives a zero-mean residual correction; the second is the common unbiased normalization used in OAMP. In both cases the role is the same: the output of the linear step is uncorrelated with the input error, so the effective noise passed to the denoiser is truly fresh.

"Divergence-free" is language borrowed from the state-evolution derivation — it is the condition that cancels the Onsager term that plain AMP inserts by hand.
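The normalized condition is easy to check numerically. Below is a minimal sketch (sizes, noise level, and MSE value are illustrative assumptions, not from the text) that builds the LMMSE filter, applies the trace normalization, and verifies $\frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1$:

```python
import numpy as np

# Illustrative sizes and noise/MSE values (assumptions for this sketch).
rng = np.random.default_rng(1)
M, N = 64, 128
sigma2, v = 0.1, 0.5

# A generic complex i.i.d. sensing matrix (the scaling is arbitrary;
# the trace normalization removes any overall gain anyway).
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)

# LMMSE filter, then the trace normalization used in OAMP.
W_hat = A.conj().T @ np.linalg.inv(A @ A.conj().T + (sigma2 / v) * np.eye(M))
W = (N / np.trace(W_hat @ A)) * W_hat

# Normalized divergence-free condition: (1/N) tr(W A) = 1.
assert np.isclose((np.trace(W @ A) / N).real, 1.0)
assert abs((np.trace(W @ A) / N).imag) < 1e-10
```

By construction the normalization holds for any invertible regularized filter; what changes with the choice of $\mathbf{W}$ is the effective noise level it produces.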

Definition:

OAMP Iteration

Given the linear model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{CN}(\mathbf{0},\sigma^2\mathbf{I})$, the OAMP algorithm iterates, for $t=0,1,2,\dots$:

$$
\begin{aligned}
\text{linear step:} \quad & \mathbf{r}_t = \hat{\mathbf{x}}_t + \mathbf{W}_t (\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_t), \\
\text{denoise:} \quad & \hat{\mathbf{x}}_{t+1} = C_t \left[ \eta_t(\mathbf{r}_t) - \langle \eta_t'(\mathbf{r}_t) \rangle\, \mathbf{r}_t \right],
\end{aligned}
$$

where $\mathbf{W}_t$ is the divergence-free LMMSE filter, $\eta_t(\cdot)$ is a componentwise denoiser matched to the signal prior, $\langle \eta_t'(\mathbf{r}_t) \rangle = \frac{1}{N}\sum_i \eta_t'(r_{t,i})$ is the average divergence, and $C_t$ is a normalization constant that restores unit gain on $\mathbf{x}$.

The subtraction of $\langle \eta_t' \rangle \mathbf{r}_t$ from $\eta_t(\mathbf{r}_t)$ is the divergence-free correction on the denoiser side. Together with the divergence-free $\mathbf{W}_t$ on the linear side, it enforces orthogonality between the two error components — hence "Orthogonal" AMP.
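To make the denoiser-side correction concrete, here is a small sketch with soft-thresholding standing in for $\eta_t$ (an illustrative choice of denoiser, not the prior-matched MMSE denoiser of the definition). Feeding the corrected denoiser pure Gaussian noise shows its output is empirically uncorrelated with the input — the orthogonality the correction is designed to enforce:

```python
import numpy as np

def divergence_free_denoise(r, tau, thresh=1.5):
    """C_t * (eta(r) - <eta'> r) with soft-thresholding as eta (illustrative)."""
    eta = np.sign(r) * np.maximum(np.abs(r) - thresh * tau, 0.0)
    div = np.mean(np.abs(r) > thresh * tau)   # average divergence <eta'(r)>
    C = 1.0 / (1.0 - div)                     # unit-gain normalization C_t
    return C * (eta - div * r)

# Feed pure noise: the divergence-free output decorrelates from its input.
rng = np.random.default_rng(2)
z = rng.standard_normal(100_000)
out = divergence_free_denoise(z, tau=1.0)

# Stein's identity: E[f(z) z] = E[f'(z)], and the correction makes the
# average derivative of f vanish, so the empirical correlation is ~0.
assert abs(np.mean(out * z)) < 0.05
```

Without the `- div * r` term and the `C` rescaling, `np.mean(out * z)` would sit near the average divergence (about 0.13 here) instead of near zero.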

Theorem: OAMP State Evolution for RRI Matrices

Assume $\mathbf{A}$ is right-rotationally-invariant with asymptotic spectrum $\mu(\lambda^2)$ of $\mathbf{A}^{\mathsf{H}}\mathbf{A}$, and $\mathbf{x}$ has i.i.d. entries with prior $p_X$. Let $\mathbf{W}_t$ be the LMMSE filter

$$
\mathbf{W}_t = \frac{N}{\mathrm{tr}(\hat{\mathbf{W}}_t \mathbf{A})}\,\hat{\mathbf{W}}_t, \qquad \hat{\mathbf{W}}_t = \mathbf{A}^{\mathsf{H}}\left( \mathbf{A}\mathbf{A}^{\mathsf{H}} + \frac{\sigma^2}{v_t}\mathbf{I}\right)^{-1},
$$

and let $v_t = \mathbb{E}\|\hat{\mathbf{x}}_t - \mathbf{x}\|^2/N$ be the MSE at iteration $t$. Then, in the large-system limit $N \to \infty$ with $M/N = \delta$ fixed, the pseudo-observation satisfies

$$
\mathbf{r}_t = \mathbf{x} + \tau_t \mathbf{z}_t, \qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0},\mathbf{I}),
$$

with $\mathbf{z}_t$ independent of $\mathbf{x}$, and the scalar state $\tau_t^2$ evolves according to

$$
\tau_t^2 = \mathcal{F}(v_t; \mu, \sigma^2), \qquad v_{t+1} = \mathcal{E}(\tau_t^2; p_X),
$$

where $\mathcal{F}$ is the linear-step transfer function (determined by the spectrum of $\mathbf{A}$) and $\mathcal{E}$ is the MSE of the denoiser at noise level $\tau_t^2$.

The theorem says that OAMP's one-dimensional state-evolution description — the same kind of recursion that makes AMP tractable — survives for the entire right-rotationally-invariant class. You trade a matrix-vector product with $\mathbf{A}^{\mathsf{H}}$ for an LMMSE-style solve, and in return you get a much wider regime of validity.
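The recursion itself is cheap to run once $\mathcal{F}$ and $\mathcal{E}$ are known. The sketch below uses the simple row-orthogonal transfer $\mathcal{F}(v) = (1/\delta - 1)v + \sigma^2$ and a Monte Carlo estimate of the soft-threshold denoiser MSE as a stand-in for $\mathcal{E}$ — both are illustrative choices under assumed parameters, not the general formulas of the theorem:

```python
import numpy as np

rng = np.random.default_rng(3)
delta, rho, sigma2 = 0.5, 0.15, 1e-3      # assumed ratio, sparsity, noise
n = 200_000

# Bernoulli-Gaussian signal samples used to estimate the denoiser MSE.
x = (rng.random(n) < rho) * rng.standard_normal(n)

def denoiser_mse(tau2, thresh=1.4):
    """Monte Carlo MSE of soft-thresholding at noise level tau^2 (stand-in for E)."""
    tau = np.sqrt(tau2)
    r = x + tau * rng.standard_normal(n)
    eta = np.sign(r) * np.maximum(np.abs(r) - thresh * tau, 0.0)
    return np.mean((eta - x) ** 2)

v = np.mean(x ** 2)                        # v_0 = Var(X)
for t in range(20):
    tau2 = (1.0 / delta - 1.0) * v + sigma2   # linear-step transfer F(v_t)
    v = denoiser_mse(tau2)                    # denoiser MSE E(tau_t^2)

# For these parameters the recursion contracts to a small fixed-point MSE.
assert v < 1e-2
```

The fixed point of this two-line recursion is the predicted reconstruction MSE — no matrix iterations were run to obtain it, which is exactly what makes state evolution useful as a design tool.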


Key Takeaway

OAMP replaces AMP's matched filter with an LMMSE filter and enforces orthogonality between the linear-step error and the denoising error. Orthogonality is the statistical condition that keeps the scalar state-evolution recursion correct, so OAMP generalizes cleanly beyond the i.i.d. Gaussian class to right-rotationally-invariant matrices.

⚠️ Engineering Note

Computational Cost of the LMMSE Step

The LMMSE filter $\hat{\mathbf{W}}_t$ requires solving a linear system of the form $(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})^{-1}\mathbf{y}$. In general this costs $O(M^3)$ per iteration, which is the main practical barrier to deploying OAMP. For structured operators (Kronecker, partial DFT, subsampled unitary), the cost drops dramatically — we treat the Kronecker case in section 21.3.

In imaging pipelines where MM can be tens of thousands, the linear step typically dominates the runtime. Practical implementations use Woodbury identities, diagonalization in the SVD basis, or CG with a few inner iterations warm-started from the previous outer step.
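The CG option can be sketched in a few lines of plain NumPy — a matvec-only solver for the Hermitian positive-definite system $(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})\mathbf{u} = \mathbf{b}$, warm-startable via `x0` (function name and parameters are illustrative):

```python
import numpy as np

def solve_lmmse_cg(matvec, b, tol=1e-8, maxiter=200, x0=None):
    """Conjugate gradient for a Hermitian positive-definite system,
    given only a matvec; x0 allows warm-starting from the previous
    outer OAMP iteration."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - matvec(x)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Usage on a random instance of (A A^H + c I) u = b.
rng = np.random.default_rng(4)
M, N = 60, 120
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)
c = 0.3
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
matvec = lambda u: A @ (A.conj().T @ u) + c * u   # never forms A A^H explicitly
sol = solve_lmmse_cg(matvec, b)
assert np.linalg.norm(matvec(sol) - b) < 1e-6 * np.linalg.norm(b)
```

The matvec closure is where structure pays off: for a partial-DFT or Kronecker operator, `A @ v` and `A.conj().T @ v` become FFTs or small matrix products, and each CG iteration inherits that cost.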

Example: OAMP for a Partial Orthogonal Sensing Matrix

Consider $\mathbf{A} = \sqrt{N/M}\,\mathbf{S}\mathbf{F}$ where $\mathbf{F}$ is an $N \times N$ unitary DFT matrix and $\mathbf{S}$ selects $M$ rows at random. This is a classic compressed-sensing operator. Compute the effective noise variance $\tau_t^2$ after one OAMP iteration when $v_t$ is the current MSE and $\sigma^2$ is the measurement noise.
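One way to sanity-check a derivation of $\tau_t^2$ here: since $\mathbf{F}$ is unitary, $\mathbf{A}\mathbf{A}^{\mathsf{H}} = (N/M)\mathbf{I}$, the trace-normalized LMMSE filter collapses to $\mathbf{A}^{\mathsf{H}}$, and evaluating the linear-step transfer gives $\tau_t^2 = (1/\delta - 1)\,v_t + \sigma^2$ with $\delta = M/N$. The sketch below verifies both facts numerically under assumed illustrative values of $N$, $M$, $v_t$, $\sigma^2$:

```python
import numpy as np

# Illustrative parameters (assumptions for this check).
N, M = 256, 128
delta = M / N
v_t, sigma2 = 0.2, 0.05

# Partial orthogonal operator A = sqrt(N/M) * S * F with F the unitary DFT.
rng = np.random.default_rng(0)
F = np.fft.fft(np.eye(N), norm="ortho")            # unitary DFT matrix
A = np.sqrt(N / M) * F[rng.choice(N, size=M, replace=False), :]

# With A A^H = (N/M) I, the trace-normalized LMMSE filter collapses to A^H.
W_hat = A.conj().T @ np.linalg.inv(A @ A.conj().T + (sigma2 / v_t) * np.eye(M))
W = (N / np.trace(W_hat @ A)) * W_hat
assert np.allclose(W, A.conj().T)

# tau^2 = (v_t/N) tr((I - WA)(I - WA)^H) + (sigma^2/N) tr(W W^H).
I = np.eye(N)
tau2 = (v_t / N) * np.trace((I - W @ A) @ (I - W @ A).conj().T).real \
     + (sigma2 / N) * np.trace(W @ W.conj().T).real

assert abs(tau2 - ((1.0 / delta - 1.0) * v_t + sigma2)) < 1e-6
```

The $(1/\delta - 1)$ factor comes from the $N - M$ unobserved directions of the row space: only the projection of the error onto the measured subspace is corrected, while the measurement noise enters with unit weight under this normalization.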

OAMP vs AMP on Structured Matrices

Compare the MSE trajectories of AMP (matched-filter linear step) and OAMP (LMMSE linear step) on three sensing ensembles: i.i.d. Gaussian, partial DFT, and column-correlated Gaussian. AMP tracks state evolution only for i.i.d. Gaussian; OAMP tracks it for all three.

Parameters

- 0.5: ratio of measurements to signal dimension
- 0.15: fraction of non-zero entries in $\mathbf{x}$

OAMP with Bayes-Optimal Denoiser

Complexity: Dominated by the LMMSE solve: $O(M^3)$ per iteration in general, or $O(M \log M)$ for partial-DFT sensing, or $O(M^{3/2})$ for separable Kronecker sensing (section 21.3).
Input: measurements y, sensing matrix A, noise variance sigma^2,
       prior p_X, max iterations T, tolerance eps
Initialize: x_hat_0 = 0, v_0 = Var(X)
for t = 0, 1, ..., T-1:
    # ----- Linear step (divergence-free LMMSE) -----
    W_hat = A^H (A A^H + (sigma^2 / v_t) I)^(-1)
    C_lin = N / trace(W_hat A)
    W_t = C_lin * W_hat
    r_t = x_hat_t + W_t (y - A x_hat_t)
    tau_t_sq = computeTauSquared(v_t, spectrum(A), sigma^2)
    # ----- Denoising step (divergence-free MMSE denoiser) -----
    eta_r = MMSE_denoise(r_t, tau_t_sq, p_X)
    div_eta = mean(derivative_of_MMSE(r_t, tau_t_sq, p_X))
    C_dn = 1 / (1 - div_eta)
    x_hat_next = C_dn * (eta_r - div_eta * r_t)
    v_next = tau_t_sq * div_eta / (1 - div_eta)
    if |v_next - v_t| < eps: break
    v_t = v_next; x_hat_t = x_hat_next
return x_hat_t

The denominator $1 - \mathrm{div}\,\eta$ is exactly the "Onsager-free" rescaling that makes the denoised signal orthogonal to its input pseudo-observation.

Common Mistake: Forgetting the Divergence-Free Normalization

Mistake:

Implementing the LMMSE step as $\mathbf{r}_t = \hat{\mathbf{x}}_t + \hat{\mathbf{W}}_t(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_t)$ without the scalar $C_t = N/\mathrm{tr}(\hat{\mathbf{W}}_t \mathbf{A})$, or forgetting the analogous scaling on the denoiser side.

Correction:

Both normalizations are essential. Without $C_t$ on the linear side, the signal passes through the filter with a gain $\neq 1$, so the denoiser is matched to the wrong effective prior. Without the $1/(1 - \mathrm{div}\,\eta)$ scaling on the denoiser side, the orthogonality condition fails and state evolution no longer tracks the empirical dynamics. In practice both missing factors manifest as a state-evolution curve that diverges from the simulated MSE.

OAMP as AMP with a Better Linear Step

Comparing AMP and OAMP side by side makes the difference crisp. AMP uses $\mathbf{W}_t^{\text{AMP}} = \mathbf{A}^{\mathsf{H}}$ (the matched filter, or "adjoint"), and pays for its bias with the Onsager term. OAMP uses the LMMSE $\mathbf{W}_t^{\text{OAMP}}$, which is already approximately unbiased, and the residual bias is removed by the divergence-free normalization.

When $\mathbf{A}$ is i.i.d. Gaussian, the two filters are essentially equivalent in the large-system limit, which is why AMP works in that regime. When $\mathbf{A}$ has a non-flat spectrum, the matched filter introduces a coloring that AMP cannot undo, while OAMP's LMMSE filter whitens the residual. This is the full story.
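The gap is easy to exhibit numerically. The sketch below builds an ill-conditioned operator (an assumed log-spaced spectrum with condition number 100, illustrative $v$ and $\sigma^2$), trace-normalizes both the matched filter and the LMMSE filter so each satisfies $\frac{1}{N}\mathrm{tr}(\mathbf{W}\mathbf{A}) = 1$, and compares the effective noise level each produces:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 100, 200
v, sigma2 = 0.2, 0.01        # assumed current MSE and measurement noise

# Ill-conditioned operator: orthogonal U, V with a log-spaced spectrum.
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = np.logspace(0, -2, M)    # condition number 100
A = U @ np.diag(s) @ V[:, :M].T

def normalize(W):
    """Trace normalization: (1/N) tr(W A) = 1."""
    return (N / np.trace(W @ A)) * W

W_mf = normalize(A.T)                                              # matched filter
W_lm = normalize(A.T @ np.linalg.inv(A @ A.T + (sigma2 / v) * np.eye(M)))

def tau2(W):
    """Effective noise level of the linear step for isotropic input error."""
    E = np.eye(N) - W @ A
    return (v / N) * np.trace(E @ E.T) + (sigma2 / N) * np.trace(W @ W.T)

# The trace-normalized LMMSE filter minimizes tau^2 in this class, so on a
# non-flat spectrum it strictly beats the normalized matched filter.
assert tau2(W_lm) < tau2(W_mf)
```

The same script with a flat spectrum (`s` constant) makes the two filters coincide up to scaling, which is the numerical face of the "essentially equivalent for i.i.d. Gaussian" statement above.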

Historical Note: From Turbo Equalization to OAMP

2017

The divergence-free-plus-denoiser architecture has deep roots in turbo equalization (Douillard, Jézéquel, Berrou, 1995) and in the concept of extrinsic information from iterative decoding: the output of each module must be statistically independent of its input, otherwise beliefs are reinforced rather than refreshed.

Junjie Ma and Li Ping's 2017 paper Orthogonal AMP carried this principle over to compressed sensing. Independently, Schniter, Rangan, and Fletcher arrived at essentially the same algorithm by a different route — expectation consistency — and called it VAMP (section 21.2). The two derivations are now understood as complementary views of a single algorithm. The name "OAMP" emphasizes orthogonality; "VAMP" emphasizes the message-passing factor structure. Both are correct and both are used.

OAMP (Orthogonal AMP)

A modification of AMP that replaces the matched-filter linear step with an LMMSE filter and enforces the divergence-free condition on both the linear estimator and the denoiser. OAMP converges and admits state evolution for the right-rotationally-invariant class of sensing matrices.

Related: Divergence-free estimator, Right-rotationally-invariant matrix

Divergence-free estimator

A linear or nonlinear estimator whose derivative (divergence) with respect to its input vanishes on average, ensuring that the output error is orthogonal to the input noise. OAMP enforces this by trace-normalizing the LMMSE filter and subtracting $\langle \eta' \rangle \mathbf{r}_t$ from the denoiser output.

Related: OAMP (Orthogonal AMP)

Right-rotationally-invariant matrix

A random matrix whose distribution is invariant under right-multiplication by any unitary matrix. Equivalently, its right singular basis is Haar-distributed. This class includes i.i.d. Gaussian matrices, random partial unitary / partial DFT matrices, and many other structured ensembles relevant to imaging.

Related: OAMP (Orthogonal AMP)

Quick Check

Which of the following statements about right-rotationally-invariant (RRI) sensing matrices is TRUE?

All i.i.d. Gaussian matrices are RRI, and the converse is also true.

RRI means the left singular basis $\mathbf{U}$ is Haar-distributed.

Partial DFT matrices are RRI because their right singular basis is uniformly random on the unitary group.

If $\mathbf{A}$ is RRI, then $\mathbf{A}^{\mathsf{H}}$ is also RRI.

Why This Matters: Why OAMP Matters for RF Imaging

Imaging pipelines in the CommIT group build sensing matrices $\mathbf{A}$ from physical propagation: Kronecker products of steering-vector dictionaries, subsampled DFT rows from OFDM pilots, and near-field operators with slowly decaying singular-value spectra. None of these are i.i.d. Gaussian. Running plain AMP on such operators is numerically unstable, regardless of how many damping heuristics are stacked on top.

OAMP is the natural algorithmic home for these problems: it tolerates the non-flat spectrum, it admits principled state-evolution analysis (so we can predict reconstruction quality from the operator's singular values alone), and its LMMSE step can be computed efficiently by exploiting the Kronecker structure. This connection is developed in section 21.3 and reappears in Book 2 Chapter 27 where unrolled OAMP becomes the backbone of the RF imaging network.

See full treatment in Chapter 27, Section sec-lista-imaging