Practical OAMP for Structured Sensing Matrices

The LMMSE Step Is the Bottleneck

OAMP and VAMP are attractive in theory: state evolution holds for the RRI class, convergence is fast, and the fixed point is Bayes-optimal. The price is the LMMSE step, which in the generic dense case costs O(M^3) to set up and O(MN) per application. For imaging problems with M on the order of 10^4 to 10^6, this is prohibitive.

The good news is that imaging sensing matrices are almost never generic. They carry physical structure (delays, angles of arrival, subcarrier-antenna Kronecker products), and this structure can be exploited to make the LMMSE solve essentially free. In this section we develop three practical tools: the Kronecker solve, the Hutchinson trace estimator for the divergence, and a mismatch analysis that tells us how robust OAMP is to getting the prior wrong.

Definition: Kronecker-Structured Sensing Matrix

A sensing matrix \mathbf{A} is Kronecker-structured if

\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2, \qquad \mathbf{A}_1 \in \mathbb{C}^{M_1 \times N_1}, \; \mathbf{A}_2 \in \mathbb{C}^{M_2 \times N_2},

so that M = M_1 M_2 and N = N_1 N_2. Equivalently, the signal \mathbf{x} can be reshaped into an N_2 \times N_1 matrix \mathbf{X}, and the measurement is

\mathbf{Y} = \mathbf{A}_2 \mathbf{X} \mathbf{A}_1^{\mathsf{T}}.

This appears in RF imaging as a product of an angular dictionary and a delay dictionary, in MIMO-OFDM as a product of space and subcarrier operators, and more generally whenever the sensing process factors across two physical dimensions.

The key identity is (\mathbf{A}_1 \otimes \mathbf{A}_2)(\mathbf{B}_1 \otimes \mathbf{B}_2) = (\mathbf{A}_1 \mathbf{B}_1) \otimes (\mathbf{A}_2 \mathbf{B}_2), which propagates the Kronecker structure through products, inverses, and SVDs.
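
A quick numerical check makes the conventions concrete. The sketch below (NumPy, with column-major vectorization made explicit; the dimensions are arbitrary test values) verifies both the reshaped measurement model and the mixed-product identity:

import numpy as np

rng = np.random.default_rng(0)
M1, N1, M2, N2 = 3, 4, 2, 5
A1 = rng.normal(size=(M1, N1)) + 1j * rng.normal(size=(M1, N1))
A2 = rng.normal(size=(M2, N2)) + 1j * rng.normal(size=(M2, N2))
X = rng.normal(size=(N2, N1))

# vec(A2 X A1^T) = (A1 kron A2) vec(X) under column-major (Fortran) vec
vecF = lambda Z: Z.reshape(-1, order="F")
assert np.allclose(np.kron(A1, A2) @ vecF(X), vecF(A2 @ X @ A1.T))

# mixed-product identity: (A1 kron A2)(B1 kron B2) = (A1 B1) kron (A2 B2)
B1 = rng.normal(size=(N1, 2))
B2 = rng.normal(size=(N2, 3))
assert np.allclose(np.kron(A1, A2) @ np.kron(B1, B2),
                   np.kron(A1 @ B1, A2 @ B2))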

Theorem: Efficient LMMSE for Kronecker Sensing

Let \mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2 with SVDs \mathbf{A}_i = \mathbf{U}_i \boldsymbol{\Lambda}_i \mathbf{V}_i^{\mathsf{H}}. Then the LMMSE filter

\mathbf{W} = \mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{A}^{\mathsf{H}} + c\mathbf{I})^{-1}

can be applied to any vector in time

O(M_1 M_2 (N_1 + N_2) + M_1^2 M_2 + M_1 M_2^2),

which is dominated by two small matrix products and a diagonal scaling in the SVD basis. For M_1 \approx M_2 \approx \sqrt{M}, the cost is O(M^{3/2}) rather than O(M^3).

The singular values of \mathbf{A}_1 \otimes \mathbf{A}_2 are the pairwise products \lambda_i^{(1)} \lambda_j^{(2)}, and the right singular basis is \mathbf{V}_1 \otimes \mathbf{V}_2. So the LMMSE inverse is diagonal in the tensor-product basis, and the application reduces to small per-factor transforms.
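
As a concrete illustration, here is a minimal matrix-free sketch of the factorized filter, assuming thin per-factor SVDs and column-major vectorization (the function and variable names are ours, not from any library); it is checked against the dense solve on a toy instance:

import numpy as np

def kron_lmmse_apply(y, U1, s1, V1, U2, s2, V2, c):
    """Apply W = A^H (A A^H + c I)^{-1} for A = kron(A1, A2), matrix-free.
    U_i, s_i, V_i are the thin SVD factors A_i = U_i diag(s_i) V_i^H."""
    M1, M2 = U1.shape[0], U2.shape[0]
    Z = U2.conj().T @ y.reshape(M2, M1, order="F") @ U1.conj()  # into the SVD basis
    S = np.outer(s2, s1)                                        # pairwise singular values
    C = (S / (S**2 + c)) * Z                                    # diagonal LMMSE shrinkage
    return (V2 @ C @ V1.T).reshape(-1, order="F")               # back with V1 (x) V2

# check against the dense filter on a small instance
rng = np.random.default_rng(0)
A1 = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))
A2 = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))
U1, s1, V1h = np.linalg.svd(A1, full_matrices=False)
U2, s2, V2h = np.linalg.svd(A2, full_matrices=False)
A, c = np.kron(A1, A2), 0.3
y = rng.normal(size=A.shape[0]) + 1j * rng.normal(size=A.shape[0])
W = A.conj().T @ np.linalg.inv(A @ A.conj().T + c * np.eye(A.shape[0]))
xhat = kron_lmmse_apply(y, U1, s1, V1h.conj().T, U2, s2, V2h.conj().T, c)
assert np.allclose(W @ y, xhat)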

Definition: Hutchinson Trace Estimator

For any matrix \mathbf{M} \in \mathbb{C}^{N \times N}, the trace admits the stochastic identity

\mathrm{tr}(\mathbf{M}) = \mathbb{E}_{\boldsymbol{\epsilon}}[\boldsymbol{\epsilon}^{\mathsf{T}}\mathbf{M}\boldsymbol{\epsilon}],

where the entries of \boldsymbol{\epsilon} are i.i.d. with zero mean and unit variance (e.g., Rademacher or Gaussian). Averaging K independent realizations gives an unbiased estimator

\widehat{\mathrm{tr}(\mathbf{M})} = \frac{1}{K}\sum_{k=1}^{K} \boldsymbol{\epsilon}_k^{\mathsf{T}} \mathbf{M} \boldsymbol{\epsilon}_k,

whose variance scales as O(\|\mathbf{M}\|_F^2 / K) for Rademacher probes.

The estimator does not require access to the entries of \mathbf{M}, only the ability to compute matrix-vector products \mathbf{M}\boldsymbol{\epsilon}. This is decisive when \mathbf{M} is the composition \mathbf{W}_t \mathbf{A} with a matrix-free LMMSE solver.

Hutchinson Estimator for Denoiser Divergence

Complexity: K + 1 calls to the denoiser (one shared base evaluation plus one per probe) and O(KN) inner products. Since the denoiser is separable, each call is O(N), giving total O(KN), which is negligible compared to the linear step when K \leq 10.
import numpy as np

def hutchinson_divergence(r_t, eta, K=4, delta=1e-3, rng=None):
    """Estimate div_eta = (1/N) sum_i d eta_i / d r_i at the pseudo-observation r_t."""
    rng = np.random.default_rng() if rng is None else rng
    N = r_t.size
    eta_0 = eta(r_t)                                    # base evaluation, shared across probes
    acc = 0.0
    for _ in range(K):
        eps = rng.choice([-1.0, 1.0], size=N)           # Rademacher probe
        jvp = (eta(r_t + delta * eps) - eta_0) / delta  # finite-difference JVP
        acc += (eps @ jvp) / N
    return acc / K

For standard denoisers (soft-thresholding, the MMSE denoiser for a Bernoulli-Gaussian prior) a closed-form divergence is available and preferred. The Hutchinson estimator is the method of last resort for black-box or learned denoisers where no closed form exists.
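
For soft-thresholding the closed form is simply the fraction of coefficients that survive the threshold, which makes a convenient sanity check. A sketch reusing hutchinson_divergence from the listing above (the threshold value is arbitrary):

import numpy as np

def soft(r, t):
    # soft-thresholding denoiser
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

rng = np.random.default_rng(0)
r = rng.normal(size=50_000)
t = 0.8
div_exact = np.mean(np.abs(r) > t)                        # closed-form weak divergence
div_hat = hutchinson_divergence(r, lambda v: soft(v, t), K=8)
print(div_exact, div_hat)   # agreement limited only by the FD step straddling the kink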

Example: Variance of the Hutchinson Estimator

For a diagonal matrix \mathbf{M} = \mathrm{diag}(m_1, \ldots, m_N) with Rademacher probes, compute the mean and variance of the single-probe estimator T(\boldsymbol{\epsilon}) = \boldsymbol{\epsilon}^{\mathsf{T}}\mathbf{M}\boldsymbol{\epsilon}. What does this tell us about choosing K?
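
After working the example by hand, the answer can be checked by simulation; the sketch below contrasts Rademacher and Gaussian probes on the same diagonal matrix (sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
N = 256
m = rng.normal(size=N)                        # diagonal of M
rade = rng.choice([-1.0, 1.0], size=(5000, N))
gauss = rng.normal(size=(5000, N))
T_rade = (rade**2 * m).sum(axis=1)            # single-probe estimates, Rademacher
T_gauss = (gauss**2 * m).sum(axis=1)          # single-probe estimates, Gaussian
print(T_rade.var())                           # compare with your hand calculation
print(T_gauss.var(), 2 * (m**2).sum())        # Gaussian probes: Var = 2 sum_i m_i^2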

Theorem: Mismatch Penalty for OAMP

Let OAMP be run with a denoiser \eta_t^{\text{assumed}} matched to an assumed prior \tilde{p}_X, while the true prior is p_X^{\text{true}}. In the large-system limit, the MSE fixed point satisfies

\mathrm{MSE}_{\text{mismatch}} \geq \mathrm{MSE}_{\text{Bayes}} + \Delta,

where the penalty \Delta is non-negative and equals zero if and only if \tilde{p}_X = p_X^{\text{true}} almost everywhere. A first-order expansion in the KL discrepancy gives

\Delta \approx \frac{1}{2}\,\mathbb{E}_{\text{true}}\!\left[\left(\frac{\partial \log p_X^{\text{true}}}{\partial x} - \frac{\partial \log \tilde{p}_X}{\partial x}\right)^{\!2}\right] \cdot \tau_\star^2,

where \tau_\star^2 is the fixed-point effective noise variance.

Assumption mismatch costs MSE, and to first order the cost grows linearly in the squared score-function error. Overconfident sparsity (assuming \rho lower than the truth) is typically more damaging than underconfident sparsity, a folklore observation made precise by this bound.
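
The penalty formula can be evaluated numerically once the two score functions are in hand. The sketch below does this by Monte Carlo for two zero-mean Gaussian-mixture priors, a smoothed stand-in for Bernoulli-Gaussian (the small spike variance, the mixture weights, and the fixed-point value \tau_\star^2 are illustrative assumptions, not values from the text):

import numpy as np

def gm_score(x, w, v):
    # score d/dx log p(x) for a zero-mean Gaussian mixture with weights w, variances v
    comps = np.array([wk * np.exp(-x**2 / (2 * vk)) / np.sqrt(2 * np.pi * vk)
                      for wk, vk in zip(w, v)])
    dcomps = np.array([-(x / vk) * ck for vk, ck in zip(v, comps)])
    return dcomps.sum(axis=0) / comps.sum(axis=0)

rng = np.random.default_rng(0)
tau_star2 = 0.1                                 # illustrative fixed-point variance
w_true, v_true = [0.85, 0.15], [1e-3, 1.0]      # smoothed BG prior, true rho = 0.15
w_asmd, v_asmd = [0.95, 0.05], [1e-3, 1.0]      # overconfident sparsity, assumed rho = 0.05
k = rng.choice(2, size=200_000, p=w_true)       # sample x ~ p_true
x = rng.normal(size=k.size) * np.sqrt(np.array(v_true)[k])
gap2 = np.mean((gm_score(x, w_true, v_true) - gm_score(x, w_asmd, v_asmd))**2)
print(0.5 * gap2 * tau_star2)                   # first-order estimate of Delta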

OAMP Mismatch Analysis

Explore the MSE penalty of running OAMP with an assumed Bernoulli-Gaussian sparsity \tilde{\rho} when the true sparsity is \rho^\star. Overestimating sparsity (too small \tilde{\rho}) is typically worse than underestimating it.


Kronecker LMMSE Speedup

Compare the wall-clock cost of a dense LMMSE solve against the factorized Kronecker solve as a function of the problem dimension. The asymptotic M^3 \to M^{3/2} speedup is visible already for modest M.

⚠️ Engineering Note

Calibrating the Assumed Prior in Practice

In deployed imaging systems the true prior is never perfectly known. Two strategies mitigate mismatch:

  1. EM tuning (see Section 21.4): alternate OAMP steps with maximum-likelihood updates of (\tilde{\rho}, \tilde{\sigma}_x^2, \sigma^2), refining the assumed prior from the data; a sketch of the \tilde{\rho}-update follows below.
  2. Conservative default: if \tilde{\rho} is uncertain, choose it on the larger side. Underestimating sparsity degrades performance gracefully; overestimating it can cause premature thresholding and support errors.

Either strategy costs some Bayes-optimality but buys robustness against distributional drift between calibration and deployment.
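
For concreteness, a minimal sketch of the EM update for \tilde{\rho} alone, in the style of EM-tuned AMP (cf. Vila and Schniter's EM-BG-AMP); it assumes the OAMP pseudo-observation model r = x + noise of variance \tau^2 with a Bernoulli-Gaussian prior, and all names are ours:

import numpy as np

def em_update_rho(r, rho, sx2, tau2):
    """One EM refinement of the assumed sparsity rho from r = x + N(0, tau2)."""
    # marginal likelihood of each entry under the zero / nonzero mixture component
    p0 = np.exp(-r**2 / (2 * tau2)) / np.sqrt(2 * np.pi * tau2)
    p1 = np.exp(-r**2 / (2 * (sx2 + tau2))) / np.sqrt(2 * np.pi * (sx2 + tau2))
    # posterior support probabilities, then the M-step average
    pi = rho * p1 / (rho * p1 + (1 - rho) * p0)
    return pi.mean()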

🎓 CommIT Contribution (2023)

Kronecker-Structured OAMP for RF Imaging

M. Dehkordi, P. Jung, G. Caire, IEEE Trans. Wireless Commun. (submitted)

The CommIT group's RF-imaging pipeline uses an OAMP iteration in which the LMMSE step is implemented by exploiting the angular ⊗ delay ⊗ subcarrier factorization of the physical sensing operator. Together with a Hutchinson estimator for the divergence of a learned denoiser, this makes OAMP feasible at imaging scales (M \sim 10^5) where a dense solve would be infeasible. The resulting algorithm is the backbone of the unrolled-network reconstruction in Book 2, Chapter 27.


Common Mistake: Kronecker Row/Column Conventions

Mistake:

Writing \mathrm{vec}(\mathbf{Y}) = (\mathbf{A}_1 \otimes \mathbf{A}_2)\,\mathrm{vec}(\mathbf{X}) without checking the vectorization convention (column-major vs. row-major). A single index flip will silently swap the two Kronecker factors and produce a correct-looking but completely wrong result.

Correction:

The standard identity is \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^{\mathsf{T}} \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}) under column-major vectorization. NumPy and PyTorch default to row-major, so the identity becomes \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{A} \otimes \mathbf{B}^{\mathsf{T}})\,\mathrm{vec}(\mathbf{X}) in those libraries. Always write out the first three indices by hand for a 2 \times 2 test case before committing the code.
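
Both conventions are easy to confirm directly in NumPy; the following check (arbitrary small dimensions) is the thirty-second version of the hand test recommended above:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# column-major (Fortran) convention: vec(A X B) = (B^T kron A) vec(X)
vecF = lambda Z: Z.reshape(-1, order="F")
assert np.allclose(vecF(A @ X @ B), np.kron(B.T, A) @ vecF(X))

# row-major (C order, the NumPy default): vec(A X B) = (A kron B^T) vec(X)
vecC = lambda Z: Z.reshape(-1)
assert np.allclose(vecC(A @ X @ B), np.kron(A, B.T) @ vecC(X))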

When the Structure Is Only Approximate

In practice the sensing operator is rarely exactly Kronecker. A realistic imaging pipeline has small perturbations (mutual coupling between antennas, phase noise across subcarriers, calibration errors) that break the separability. Two approaches keep things tractable:

  1. Absorb the perturbation into the prior by modeling it as correlated additional noise, which keeps the LMMSE solve factorized.
  2. Use a Kronecker preconditioner for an iterative CG solve of the exact LMMSE system. CG typically converges in 5-20 inner iterations when the preconditioner captures the dominant spectral structure; a sketch follows below.

Neither approach is uniformly best; the trade-off depends on how severely the structure is broken. A healthy implementation allows the user to switch between them.
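
A minimal sketch of the second option, under the assumption of fat factors (M_i \leq N_i, so each U_i is square unitary) and a small additive perturbation; the PCG loop and all names are ours, and the dense matvec stands in for a matrix-free operator:

import numpy as np

def pcg(matvec, b, prec, tol=1e-8, maxiter=100):
    # preconditioned conjugate gradients for a Hermitian positive-definite system
    x = np.zeros_like(b)
    r = b.copy()
    z = prec(r)
    p = z.copy()
    rz = np.vdot(r, z)
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = prec(r)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
M1, N1, M2, N2, c = 4, 6, 3, 5, 0.1
A1 = rng.normal(size=(M1, N1))
A2 = rng.normal(size=(M2, N2))
A = np.kron(A1, A2) + 0.01 * rng.normal(size=(M1 * M2, N1 * N2))  # mildly broken structure
y = rng.normal(size=M1 * M2)

# Kronecker preconditioner: exact inverse of (A0 A0^T + c I) for A0 = kron(A1, A2)
U1, s1, _ = np.linalg.svd(A1)     # M_i <= N_i, so U_i is square and len(s_i) = M_i
U2, s2, _ = np.linalg.svd(A2)
S2 = np.outer(s2, s1) ** 2

def prec(v):
    Z = U2.T @ v.reshape(M2, M1, order="F") @ U1
    return (U2 @ (Z / (S2 + c)) @ U1.T).reshape(-1, order="F")

matvec = lambda v: A @ (A.T @ v) + c * v      # exact system (A A^T + c I) v
z = pcg(matvec, y, prec)
assert np.allclose(z, np.linalg.solve(A @ A.T + c * np.eye(M1 * M2), y))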

Quick Check

If \mathbf{A}_1 is 4 \times 4 with singular values (4, 2, 1, 0.5) and \mathbf{A}_2 is 3 \times 3 with singular values (3, 1, 0.5), what is the largest singular value of \mathbf{A}_1 \otimes \mathbf{A}_2?

7

12

4

3

Hutchinson trace estimator

An unbiased randomized estimator of \mathrm{tr}(\mathbf{M}) that requires only matrix-vector products with \mathbf{M}. Used in OAMP to estimate the denoiser divergence when no closed form is available.

Related: Divergence-free estimator