Practical OAMP for Structured Sensing Matrices

The LMMSE Step Is the Bottleneck

OAMP and VAMP are attractive in theory: state evolution holds for the RRI class, convergence is fast, and the fixed point is Bayes-optimal. The price is the LMMSE step, which in the generic dense case costs O(M^3) to set up and O(MN) per application. For imaging problems with M on the order of 10^4 to 10^6, this is prohibitive.

The good news is that imaging sensing matrices are almost never generic. They carry physical structure (delays, angles of arrival, subcarrier-antenna Kronecker products), and this structure can be exploited to make the LMMSE solve essentially free. In this section we develop three practical tools: the Kronecker solve, the Hutchinson trace estimator for the divergence, and a mismatch analysis that tells us how robust OAMP is to getting the prior wrong.

Definition: Kronecker-Structured Sensing Matrix

A sensing matrix \mathbf{A} is Kronecker-structured if

\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2, \qquad \mathbf{A}_1 \in \mathbb{C}^{M_1 \times N_1}, \; \mathbf{A}_2 \in \mathbb{C}^{M_2 \times N_2},

so that M = M_1 M_2 and N = N_1 N_2. Equivalently, the signal \mathbf{x} can be reshaped into an N_2 \times N_1 matrix \mathbf{X}, and the measurement is

\mathbf{Y} = \mathbf{A}_2 \mathbf{X} \mathbf{A}_1^{\mathsf{T}}.

This appears in RF imaging as a product of an angular dictionary and a delay dictionary, in MIMO-OFDM as a product of space and subcarrier operators, and more generally whenever the sensing process factors across two physical dimensions.

The key identity is (\mathbf{A}_1 \otimes \mathbf{A}_2)(\mathbf{B}_1 \otimes \mathbf{B}_2) = (\mathbf{A}_1 \mathbf{B}_1) \otimes (\mathbf{A}_2 \mathbf{B}_2), which propagates the Kronecker structure through products, inverses, and SVDs.
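
A quick numerical check makes the conventions concrete. The sketch below (NumPy, with column-major vectorization made explicit; the dimensions are arbitrary test values) verifies both the reshaped measurement model and the mixed-product identity:

import numpy as np

rng = np.random.default_rng(0)
M1, N1, M2, N2 = 3, 4, 2, 5
A1 = rng.normal(size=(M1, N1)) + 1j * rng.normal(size=(M1, N1))
A2 = rng.normal(size=(M2, N2)) + 1j * rng.normal(size=(M2, N2))
X = rng.normal(size=(N2, N1))

# vec(A2 X A1^T) = (A1 kron A2) vec(X) under column-major (Fortran) vec
vecF = lambda Z: Z.reshape(-1, order="F")
assert np.allclose(np.kron(A1, A2) @ vecF(X), vecF(A2 @ X @ A1.T))

# mixed-product identity: (A1 kron A2)(B1 kron B2) = (A1 B1) kron (A2 B2)
B1 = rng.normal(size=(N1, 2))
B2 = rng.normal(size=(N2, 3))
assert np.allclose(np.kron(A1, A2) @ np.kron(B1, B2),
                   np.kron(A1 @ B1, A2 @ B2))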

Theorem: Efficient LMMSE for Kronecker Sensing

Let \mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2 with SVDs \mathbf{A}_i = \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{V}_i^{\mathsf{H}}. Then the LMMSE filter

\mathbf{W} = \mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})^{-1}

can be applied to any vector in time

O(M_1 M_2 (N_1 + N_2) + M_1^2 M_2 + M_1 M_2^2),

which is dominated by two small matrix products and a diagonal scaling in the SVD basis. For M_1 \approx M_2 \approx \sqrt{M}, the cost is O(M^{3/2}) rather than O(M^3).

The singular values of \mathbf{A}_1 \otimes \mathbf{A}_2 are the pairwise products \lambda_i^{(1)} \lambda_j^{(2)}, and the right singular basis is \mathbf{V}_1 \otimes \mathbf{V}_2. So the LMMSE inverse is diagonal in the tensor-product basis, and the application reduces to small per-factor transforms.
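
As a concrete illustration, here is a minimal matrix-free sketch of the factorized filter, assuming thin per-factor SVDs and column-major vectorization (the function and variable names are ours, not from any library); it is checked against the dense solve on a toy instance:

import numpy as np

def kron_lmmse_apply(y, U1, s1, V1, U2, s2, V2, c):
    """Apply W = A^H (A A^H + c I)^{-1} for A = kron(A1, A2), matrix-free.
    U_i, s_i, V_i are the thin SVD factors A_i = U_i diag(s_i) V_i^H."""
    M1, M2 = U1.shape[0], U2.shape[0]
    Z = U2.conj().T @ y.reshape(M2, M1, order="F") @ U1.conj()  # into the SVD basis
    S = np.outer(s2, s1)                                        # pairwise singular values
    C = (S / (S**2 + c)) * Z                                    # diagonal LMMSE shrinkage
    return (V2 @ C @ V1.T).reshape(-1, order="F")               # back with V1 (x) V2

# check against the dense filter on a small instance
rng = np.random.default_rng(0)
A1 = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))
A2 = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))
U1, s1, V1h = np.linalg.svd(A1, full_matrices=False)
U2, s2, V2h = np.linalg.svd(A2, full_matrices=False)
A, c = np.kron(A1, A2), 0.3
y = rng.normal(size=A.shape[0]) + 1j * rng.normal(size=A.shape[0])
W = A.conj().T @ np.linalg.inv(A @ A.conj().T + c * np.eye(A.shape[0]))
xhat = kron_lmmse_apply(y, U1, s1, V1h.conj().T, U2, s2, V2h.conj().T, c)
assert np.allclose(W @ y, xhat)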

Definition: Hutchinson Trace Estimator

For any matrix \mathbf{M} \in \mathbb{C}^{N \times N}, the trace admits the stochastic identity

\mathrm{tr}(\mathbf{M}) = \mathbb{E}_{\boldsymbol{\epsilon}}[\boldsymbol{\epsilon}^{\mathsf{T}}\mathbf{M}\boldsymbol{\epsilon}],

where the entries of \boldsymbol{\epsilon} are i.i.d. with zero mean and unit variance (e.g., Rademacher or Gaussian). Averaging K independent realizations gives an unbiased estimator

\widehat{\mathrm{tr}(\mathbf{M})} = \frac{1}{K}\sum_{k=1}^{K} \boldsymbol{\epsilon}_k^{\mathsf{T}} \mathbf{M} \boldsymbol{\epsilon}_k,

whose variance scales as O(\|\mathbf{M}\|_F^2 / K) for Rademacher probes.

The estimator does not require access to the entries of \mathbf{M}, only the ability to compute matrix-vector products \mathbf{M}\boldsymbol{\epsilon}. This is decisive when \mathbf{M} is the composition \mathbf{W}_t \mathbf{A} with a matrix-free LMMSE solver.

Hutchinson Estimator for Denoiser Divergence

Complexity: K + 1 calls to the denoiser (one shared base evaluation plus one per probe) and O(KN) inner products. Since the denoiser is separable, each call is O(N), giving total O(KN), which is negligible compared to the linear step when K \leq 10.
import numpy as np

def hutchinson_divergence(r_t, eta, K=4, delta=1e-3, rng=None):
    """Estimate div_eta = (1/N) sum_i d eta_i / d r_i at the pseudo-observation r_t."""
    rng = np.random.default_rng() if rng is None else rng
    N = r_t.size
    eta_0 = eta(r_t)                                    # base evaluation, shared across probes
    acc = 0.0
    for _ in range(K):
        eps = rng.choice([-1.0, 1.0], size=N)           # Rademacher probe
        jvp = (eta(r_t + delta * eps) - eta_0) / delta  # finite-difference JVP
        acc += (eps @ jvp) / N
    return acc / K

For standard denoisers (soft-thresholding, the MMSE denoiser for a Bernoulli-Gaussian prior) a closed-form divergence is available and preferred. The Hutchinson estimator is the method of last resort for black-box or learned denoisers where no closed form exists.
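
For soft-thresholding the closed form is simply the fraction of coefficients that survive the threshold, which makes a convenient sanity check. A sketch reusing hutchinson_divergence from the listing above (the threshold value is arbitrary):

import numpy as np

def soft(r, t):
    # soft-thresholding denoiser
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

rng = np.random.default_rng(0)
r = rng.normal(size=50_000)
t = 0.8
div_exact = np.mean(np.abs(r) > t)                        # closed-form weak divergence
div_hat = hutchinson_divergence(r, lambda v: soft(v, t), K=8)
print(div_exact, div_hat)   # agreement limited only by the FD step straddling the kink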

Example: Variance of the Hutchinson Estimator

For a diagonal matrix \mathbf{M} = \mathrm{diag}(m_1, \ldots, m_N) with Rademacher probes, compute the mean and variance of the single-probe estimator T(\boldsymbol{\epsilon}) = \boldsymbol{\epsilon}^{\mathsf{T}}\mathbf{M}\boldsymbol{\epsilon}. What does this tell us about choosing K?
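
After working the example by hand, the answer can be checked by simulation; the sketch below contrasts Rademacher and Gaussian probes on the same diagonal matrix (sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
N = 256
m = rng.normal(size=N)                        # diagonal of M
rade = rng.choice([-1.0, 1.0], size=(5000, N))
gauss = rng.normal(size=(5000, N))
T_rade = (rade**2 * m).sum(axis=1)            # single-probe estimates, Rademacher
T_gauss = (gauss**2 * m).sum(axis=1)          # single-probe estimates, Gaussian
print(T_rade.var())                           # compare with your hand calculation
print(T_gauss.var(), 2 * (m**2).sum())        # Gaussian probes: Var = 2 sum_i m_i^2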

Theorem: Mismatch Penalty for OAMP

Let OAMP be run with a denoiser \eta_t^{\text{assumed}} matched to an assumed prior \tilde{p}_X, while the true prior is p_X^{\text{true}}. In the large-system limit, the MSE fixed point satisfies

\mathrm{MSE}_{\text{mismatch}} \geq \mathrm{MSE}_{\text{Bayes}} + \Delta,

where the penalty \Delta is non-negative and equals zero if and only if \tilde{p}_X = p_X^{\text{true}} almost everywhere. A first-order expansion in the KL discrepancy gives

\Delta \approx \frac{1}{2}\,\mathbb{E}_{\text{true}}\!\left[\left(\frac{\partial \log p_X^{\text{true}}}{\partial x} - \frac{\partial \log \tilde{p}_X}{\partial x}\right)^{\!2}\right] \cdot \tau_\star^2,

where \tau_\star^2 is the fixed-point effective noise variance.

Assumption mismatch costs MSE, and to first order the cost grows linearly in the squared score-function error. Overconfident sparsity (assuming \rho lower than the truth) is typically more damaging than underconfident sparsity, a folklore observation made precise by this bound.
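
The penalty formula can be evaluated numerically once the two score functions are in hand. The sketch below does this by Monte Carlo for two zero-mean Gaussian-mixture priors, a smoothed stand-in for Bernoulli-Gaussian (the small spike variance, the mixture weights, and the fixed-point value \tau_\star^2 are illustrative assumptions, not values from the text):

import numpy as np

def gm_score(x, w, v):
    # score d/dx log p(x) for a zero-mean Gaussian mixture with weights w, variances v
    comps = np.array([wk * np.exp(-x**2 / (2 * vk)) / np.sqrt(2 * np.pi * vk)
                      for wk, vk in zip(w, v)])
    dcomps = np.array([-(x / vk) * ck for vk, ck in zip(v, comps)])
    return dcomps.sum(axis=0) / comps.sum(axis=0)

rng = np.random.default_rng(0)
tau_star2 = 0.1                                 # illustrative fixed-point variance
w_true, v_true = [0.85, 0.15], [1e-3, 1.0]      # smoothed BG prior, true rho = 0.15
w_asmd, v_asmd = [0.95, 0.05], [1e-3, 1.0]      # overconfident sparsity, assumed rho = 0.05
k = rng.choice(2, size=200_000, p=w_true)       # sample x ~ p_true
x = rng.normal(size=k.size) * np.sqrt(np.array(v_true)[k])
gap2 = np.mean((gm_score(x, w_true, v_true) - gm_score(x, w_asmd, v_asmd))**2)
print(0.5 * gap2 * tau_star2)                   # first-order estimate of Delta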

OAMP Mismatch Analysis

Explore the MSE penalty of running OAMP with an assumed Bernoulli-Gaussian sparsity \tilde{\rho} when the true sparsity is \rho^\star. Overestimating sparsity (too small \tilde{\rho}) is typically worse than underestimating it.


Kronecker LMMSE Speedup

Compare the wall-clock cost of a dense LMMSE solve against the factorized Kronecker solve as a function of the problem dimension. The asymptotic M^3 \to M^{3/2} speedup is visible already for modest M.

⚠️ Engineering Note

Calibrating the Assumed Prior in Practice

In deployed imaging systems the true prior is never perfectly known. Two strategies mitigate mismatch:

  1. EM tuning (see Section 21.4): alternate OAMP steps with maximum-likelihood updates of (\tilde{\rho}, \tilde{\sigma}_x^2, \sigma^2), refining the assumed prior from the data; a sketch of the \tilde{\rho}-update follows below.
  2. Conservative default: if \tilde{\rho} is uncertain, choose it on the larger side. Underestimating sparsity degrades performance gracefully; overestimating it can cause premature thresholding and support errors.

Either strategy costs some Bayes-optimality but buys robustness against distributional drift between calibration and deployment.
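
For concreteness, a minimal sketch of the EM update for \tilde{\rho} alone, in the style of EM-tuned AMP (cf. Vila and Schniter's EM-BG-AMP); it assumes the OAMP pseudo-observation model r = x + noise of variance \tau^2 with a Bernoulli-Gaussian prior, and all names are ours:

import numpy as np

def em_update_rho(r, rho, sx2, tau2):
    """One EM refinement of the assumed sparsity rho from r = x + N(0, tau2)."""
    # marginal likelihood of each entry under the zero / nonzero mixture component
    p0 = np.exp(-r**2 / (2 * tau2)) / np.sqrt(2 * np.pi * tau2)
    p1 = np.exp(-r**2 / (2 * (sx2 + tau2))) / np.sqrt(2 * np.pi * (sx2 + tau2))
    # posterior support probabilities, then the M-step average
    pi = rho * p1 / (rho * p1 + (1 - rho) * p0)
    return pi.mean()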

🎓 CommIT Contribution (2023)

Kronecker-Structured OAMP for RF Imaging

M. Dehkordi, P. Jung, G. Caire, IEEE Trans. Wireless Commun. (submitted)

The CommIT group's RF-imaging pipeline uses an OAMP iteration in which the LMMSE step is implemented by exploiting the angular ⊗ delay ⊗ subcarrier factorization of the physical sensing operator. Together with a Hutchinson estimator for the divergence of a learned denoiser, this makes OAMP feasible at imaging scales (M \sim 10^5) where a dense solve would be infeasible. The resulting algorithm is the backbone of the unrolled-network reconstruction in Book 2, Chapter 27.


Common Mistake: Kronecker Row/Column Conventions

Mistake:

Writing \mathrm{vec}(\mathbf{Y}) = (\mathbf{A}_1 \otimes \mathbf{A}_2)\,\mathrm{vec}(\mathbf{X}) without checking the vectorization convention (column-major vs. row-major). A single index flip will silently swap the two Kronecker factors and produce a correct-looking but completely wrong result.

Correction:

The standard identity is \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^{\mathsf{T}} \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}) under column-major vectorization. NumPy and PyTorch default to row-major, so the identity becomes \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{A} \otimes \mathbf{B}^{\mathsf{T}})\,\mathrm{vec}(\mathbf{X}) in those libraries. Always write out the first three indices by hand for a 2 \times 2 test case before committing the code.
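
Both conventions are easy to confirm directly in NumPy; the following check (arbitrary small dimensions) is the thirty-second version of the hand test recommended above:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# column-major (Fortran) convention: vec(A X B) = (B^T kron A) vec(X)
vecF = lambda Z: Z.reshape(-1, order="F")
assert np.allclose(vecF(A @ X @ B), np.kron(B.T, A) @ vecF(X))

# row-major (C order, the NumPy default): vec(A X B) = (A kron B^T) vec(X)
vecC = lambda Z: Z.reshape(-1)
assert np.allclose(vecC(A @ X @ B), np.kron(A, B.T) @ vecC(X))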

When the Structure Is Only Approximate

In practice the sensing operator is rarely exactly Kronecker. A realistic imaging pipeline has small perturbations (mutual coupling between antennas, phase noise across subcarriers, calibration errors) that break the separability. Two approaches keep things tractable:

  1. Absorb the perturbation into the prior by modeling it as correlated additional noise, which keeps the LMMSE solve factorized.
  2. Use a Kronecker preconditioner for an iterative CG solve of the exact LMMSE system. CG typically converges in 5-20 inner iterations when the preconditioner captures the dominant spectral structure; a sketch follows below.

Neither approach is uniformly best; the trade-off depends on how severely the structure is broken. A healthy implementation allows the user to switch between them.
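
A minimal sketch of the second option, under the assumption of fat factors (M_i \leq N_i, so each U_i is square unitary) and a small additive perturbation; the PCG loop and all names are ours, and the dense matvec stands in for a matrix-free operator:

import numpy as np

def pcg(matvec, b, prec, tol=1e-8, maxiter=100):
    # preconditioned conjugate gradients for a Hermitian positive-definite system
    x = np.zeros_like(b)
    r = b.copy()
    z = prec(r)
    p = z.copy()
    rz = np.vdot(r, z)
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = prec(r)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
M1, N1, M2, N2, c = 4, 6, 3, 5, 0.1
A1 = rng.normal(size=(M1, N1))
A2 = rng.normal(size=(M2, N2))
A = np.kron(A1, A2) + 0.01 * rng.normal(size=(M1 * M2, N1 * N2))  # mildly broken structure
y = rng.normal(size=M1 * M2)

# Kronecker preconditioner: exact inverse of (A0 A0^T + c I) for A0 = kron(A1, A2)
U1, s1, _ = np.linalg.svd(A1)     # M_i <= N_i, so U_i is square and len(s_i) = M_i
U2, s2, _ = np.linalg.svd(A2)
S2 = np.outer(s2, s1) ** 2

def prec(v):
    Z = U2.T @ v.reshape(M2, M1, order="F") @ U1
    return (U2 @ (Z / (S2 + c)) @ U1.T).reshape(-1, order="F")

matvec = lambda v: A @ (A.T @ v) + c * v      # exact system (A A^T + c I) v
z = pcg(matvec, y, prec)
assert np.allclose(z, np.linalg.solve(A @ A.T + c * np.eye(M1 * M2), y))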

Quick Check

If \mathbf{A}_1 is 4 \times 4 with singular values (4, 2, 1, 0.5) and \mathbf{A}_2 is 3 \times 3 with singular values (3, 1, 0.5), what is the largest singular value of \mathbf{A}_1 \otimes \mathbf{A}_2?

7

12

4

3

Hutchinson trace estimator

An unbiased randomized estimator of \mathrm{tr}(\mathbf{M}) that requires only matrix-vector products with \mathbf{M}. Used in OAMP to estimate the denoiser divergence when no closed form is available.

Related: Divergence-free estimator