Multi-Layer Inference

Structured Scene Priors from Generative Models

The Bernoulli-Gaussian prior used in Chapters 17–19 captures coordinate-wise sparsity but ignores spatial structure: neighboring voxels in a real scene are correlated, extended objects have smooth boundaries, and urban environments have characteristic textures.

A more powerful approach is to model the scene $\mathbf{c}$ as the output of a deep generative model trained on realistic scene data:

$$\mathbf{c} = G(\mathbf{z}^{(L)}), \quad \mathbf{z}^{(\ell-1)} = f_\ell(\mathbf{A}^{(\ell)}\mathbf{z}^{(\ell)}) \text{ for } \ell = L, \ldots, 1,$$

where $\mathbf{z}^{(L)}$ is a low-dimensional latent code. ML-VAMP (Multi-Layer VAMP) integrates this generative prior directly into the message-passing inference framework, yielding a principled alternative to PnP (Chapter 21) that is jointly trained with the sensing model.
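To make the generator chain concrete, here is a minimal two-layer instance sampled with random stand-in matrices; a trained model would supply $\mathbf{A}^{(1)}$, $\mathbf{A}^{(2)}$, and the activations, so all names and sizes below are illustrative only:

```python
import numpy as np

# Two-layer instance of the chain: z^(2) ~ N(0, I) is the latent code,
# z^(1) = f_2(A^(2) z^(2)), and the scene is c = z^(0) = f_1(A^(1) z^(1)).
rng = np.random.default_rng(0)
N, K, J = 1024, 256, 64                          # scene, hidden, latent sizes
A2 = rng.standard_normal((K, J)) / np.sqrt(J)    # A^(2): latent -> hidden
A1 = rng.standard_normal((N, K)) / np.sqrt(K)    # A^(1): hidden -> scene

z2 = rng.standard_normal(J)        # latent code z^(L), here L = 2
z1 = np.maximum(A2 @ z2, 0.0)      # f_2 = ReLU
c = A1 @ z1                        # f_1 = identity (linear output layer)
```

A 64-dimensional code thus determines all 1024 scene voxels, which is the leverage ML-VAMP exploits.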


Definition: Multi-Layer Generalized Linear Model

A multi-layer GLM with $L$ layers has the form:

$$\mathbf{z}^{(0)} = \mathbf{c}, \qquad \mathbf{z}^{(\ell)} = g_\ell(\mathbf{A}^{(\ell)}\mathbf{z}^{(\ell-1)}), \quad \ell = 1, \ldots, L, \qquad \mathbf{y} \sim p(\mathbf{y} \mid \mathbf{z}^{(L)}),$$

where:

  • $\mathbf{A}^{(\ell)} \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ is the $\ell$-th layer's linear mixing matrix (possibly random or structured).
  • $g_\ell(\cdot)$ is an element-wise non-linear activation function (ReLU, sign, magnitude-squared, etc.).
  • $\mathbf{z}^{(0)} = \mathbf{c}$ is the chain's input, assigned a simple separable prior $p(\mathbf{z}^{(0)}) = \prod_i p_0(z_i^{(0)})$; when the layers implement a generative model, this input plays the role of the low-dimensional latent code.

Special cases:

  • $L = 1$: Standard GAMP (single-layer GLM).
  • $L = 2$: $\mathbf{y} = \mathbf{A}_{\text{obs}}\mathbf{z}^{(1)} + \mathbf{w}$ with $\mathbf{z}^{(1)} = \text{ReLU}(\mathbf{A}^{(1)}\mathbf{c})$ (one-hidden-layer generator).
  • Deep VAE/flow: $L = 5$–$20$, with the input dimension far below the scene dimension, $N_0 \ll N_L$.

ML-VAMP: Multi-Layer Vector AMP

Complexity: $O\!\left(\sum_{\ell=1}^{L} N_{\ell-1} N_\ell \cdot T\right)$ per full forward-backward pass. For deep networks with $N_\ell \sim N/2^\ell$, total cost $\approx 2\,MN\,T$.
Input: Measurements $\mathbf{y}$, sensing matrix $\mathbf{A}_{\text{obs}}$, layer matrices $\{\mathbf{A}^{(\ell)}\}$, activations $\{g_\ell\}$
Output: Posterior mean estimates $\{\hat{\mathbf{z}}^{(\ell)}\}$, $\hat{\mathbf{c}}$
Initialize: $\hat{\mathbf{z}}^{(\ell)} = \mathbf{0}$, $\tau^{(\ell)} = 1$ for all $\ell$
for $t = 1, 2, \ldots$ until convergence do
Forward pass (layer $\ell = 0 \to L$):
1. Compute linear output: $\hat{\mathbf{u}}^{(\ell)} = \mathbf{A}^{(\ell)}\hat{\mathbf{z}}^{(\ell-1)}$
2. Apply activation denoiser (MMSE under $p(\mathbf{z}^{(\ell)} \mid \hat{\mathbf{u}}^{(\ell)}, \tau^{(\ell-1)})$)
3. Compute extrinsic message to next layer: $\tau^{(\ell)}_{\text{fwd}}$
Backward pass (layer $\ell = L \to 0$):
4. Receive backward message from layer $\ell+1$
5. Update $\hat{\mathbf{z}}^{(\ell)}$ using the VAMP linear estimator:
$$\hat{\mathbf{z}}^{(\ell)} \leftarrow (\mathbf{A}^{(\ell)T}\mathbf{A}^{(\ell)} + \gamma^{(\ell)}\mathbf{I})^{-1} (\mathbf{A}^{(\ell)T}\hat{\mathbf{r}}^{(\ell)}_{\text{bwd}} + \gamma^{(\ell)}\hat{\mathbf{z}}^{(\ell)}_{\text{fwd}})$$
6. Update $\tau^{(\ell)}$ and pass backward extrinsic message
Apply observation likelihood (at layer $L$):
7. $g_{\text{out}}(y_m, \hat{p}_m, \tau_p)$ → update $\hat{\mathbf{z}}^{(L)}$
end for

The VAMP linear estimator at each layer requires solving an $N_\ell \times N_\ell$ linear system. When $\mathbf{A}^{(\ell)}$ has SVD-friendly structure (DFT, random orthonormal), this reduces to $O(N_\ell \log N_\ell)$ using the Kronecker techniques from Chapter 18.
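For a general dense matrix, the standard trick is to cache an economy SVD $\mathbf{A}^{(\ell)} = \mathbf{U}\,\mathrm{diag}(\mathbf{s})\,\mathbf{V}^T$ once, after which every linear step is matrix-vector products only. A minimal sketch (function name and test sizes are illustrative):

```python
import numpy as np

def vamp_linear_step(U, s, Vt, r_bwd, z_fwd, gamma):
    """Solve (A^T A + gamma I)^{-1} (A^T r_bwd + gamma z_fwd) with a cached
    economy SVD A = U diag(s) Vt. Components along the right singular vectors
    get eigenvalue s^2 + gamma; the orthogonal complement gets gamma alone."""
    rhs = Vt.T @ (s * (U.T @ r_bwd)) + gamma * z_fwd     # A^T r_bwd + gamma z_fwd
    return rhs / gamma + Vt.T @ ((Vt @ rhs) * (1.0 / (s**2 + gamma) - 1.0 / gamma))

# One-time setup per layer, then reuse the factors across all iterations:
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r_bwd = rng.standard_normal(30)
z_fwd = rng.standard_normal(50)
z_hat = vamp_linear_step(U, s, Vt, r_bwd, z_fwd, gamma=0.7)
```

Since $\gamma^{(\ell)}$ changes every iteration but the SVD does not, the per-iteration cost stays at a few matrix-vector products rather than a fresh $O(N_\ell^3)$ solve.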


Theorem: State Evolution for ML-VAMP

Under i.i.d. Gaussian $\mathbf{A}^{(\ell)}$ at each layer, the ML-VAMP state variables $(\tau_{\text{fwd}}^{(\ell,t)}, \tau_{\text{bwd}}^{(\ell,t)})$ satisfy a coupled system of state evolution recursions:

For each layer $\ell = 1, \ldots, L$:

$$\tau_{\text{fwd}}^{(\ell,t)} = \frac{\tau_{\text{fwd}}^{(\ell-1,t)}}{\delta_\ell}, \quad \tau_{\text{bwd}}^{(\ell-1,t+1)} = \frac{\tau_{\text{bwd}}^{(\ell,t+1)}}{\delta_\ell},$$

where $\delta_\ell = N_\ell / N_{\ell-1}$ is the compression ratio at layer $\ell$, and the activation MSE at each layer satisfies:

$$\mathbb{E}\big[(g_{\text{in},\ell}(X + \sqrt{\tau_{\text{bwd}}^{(\ell)}}\,Z, \tau_{\text{bwd}}^{(\ell)}) - X)^2\big] = f_\ell(\tau_{\text{fwd}}^{(\ell)}, \tau_{\text{bwd}}^{(\ell)}).$$

The system of SE equations couples all layers simultaneously.

Each layer in ML-VAMP acts as an independent VAMP module, passing Gaussian messages up and down. The compression ratios $\delta_\ell$ multiply the effective noise variance at each level: a deep generator with $\delta_\ell < 1$ at each layer progressively reduces the effective signal dimension, concentrating the prior information and improving reconstruction.
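The forward half of this coupling is simple enough to tabulate directly. A minimal sketch with hypothetical compression ratios; the full SE would also iterate the backward variances and the per-layer MSE maps $f_\ell$:

```python
def propagate_tau_forward(tau0, deltas):
    """Forward half of the coupled SE: tau_fwd^(l) = tau_fwd^(l-1) / delta_l,
    where delta_l = N_l / N_{l-1}. Each compressing layer (delta_l < 1)
    rescales the forward variance by 1/delta_l."""
    taus = [tau0]
    for delta in deltas:
        taus.append(taus[-1] / delta)
    return taus

# Three layers, each halving the dimension (delta_l = 1/2):
taus = propagate_tau_forward(1.0, [0.5, 0.5, 0.5])   # [1.0, 2.0, 4.0, 8.0]
```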


Example: ML-VAMP for Scene Recovery with VAE Prior

A two-layer generative model produces RF scenes: $\mathbf{c} = \mathbf{D}\,\text{ReLU}(\mathbf{A}^{(1)}\mathbf{z})$, where $\mathbf{D} \in \mathbb{R}^{N \times K}$ is a learned dictionary ($K < N$, $K/N = 0.25$), $\mathbf{A}^{(1)} \in \mathbb{R}^{K \times J}$ ($J < K$, $J/K = 0.25$), and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_J)$.

The observation is $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with $M/N = 0.3$ (highly compressed).

Compare ML-VAMP (exploits the generative structure) against: (a) GAMP with i.i.d. Gaussian prior (ignores structure), (b) GAMP with BG prior (correct sparsity, ignores spatial structure).
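A minimal synthetic version of this setup, with random stand-ins for the trained dictionary and generator, and baseline (a) implemented as a ridge estimate (the MMSE estimator under an i.i.d. Gaussian prior):

```python
import numpy as np

rng = np.random.default_rng(42)
N, K, J = 1024, 256, 64            # K/N = 0.25, J/K = 0.25
M = int(0.3 * N)                   # M/N = 0.3

D = rng.standard_normal((N, K)) / np.sqrt(K)    # stand-in for learned dictionary
A1 = rng.standard_normal((K, J)) / np.sqrt(J)
z = rng.standard_normal(J)                      # z ~ N(0, I_J)
c = D @ np.maximum(A1 @ z, 0.0)                 # scene c = D ReLU(A^(1) z)

A = rng.standard_normal((M, N)) / np.sqrt(M)    # sensing matrix
y = A @ c + 0.01 * rng.standard_normal(M)       # y = A c + w

# Baseline (a): i.i.d. Gaussian prior reduces to a ridge/pseudoinverse estimate
c_ridge = A.T @ np.linalg.solve(A @ A.T + 1e-4 * np.eye(M), y)
nmse = np.sum((c_ridge - c) ** 2) / np.sum(c ** 2)
```

Because the ridge estimate can only recover the component of $\mathbf{c}$ in the row space of $\mathbf{A}$, its NMSE sits near $1 - M/N$ for a random sensing matrix; exploiting the generative structure is what lets ML-VAMP go below that floor.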


ML-VAMP: NMSE vs Iteration for Different Layer Counts

NMSE convergence curves for ML-VAMP with $L = 1$ to $L = 4$ generative layers. More layers compress the latent representation, improving reconstruction at low oversampling ratios by exploiting the generative structure.

The oversampling ratio $M/N$ is set below the standard CS phase transition, where single-layer GAMP fails but multi-layer inference succeeds.


Definition: Bilinear GAMP (BiG-AMP) — Special Case

When the sensing matrix $\mathbf{A}$ is unknown (e.g., calibration parameters, unknown polarization response, or an uncharacterized channel), the imaging model becomes bilinear:

$$\mathbf{Y} = \mathbf{A}\mathbf{c} + \mathbf{w}, \quad \mathbf{A}\text{ unknown},\; \mathbf{c}\text{ unknown}.$$

BiG-AMP (Bilinear GAMP) treats this as a two-layer ML-GAMP, $\mathbf{Y} = \mathbf{F}(\mathbf{A}) \cdot G(\mathbf{c}) + \mathbf{w}$, and alternates between:

  1. Estimating $\mathbf{c}$ given $\hat{\mathbf{A}}$ (GAMP for sparse recovery).
  2. Estimating $\mathbf{A}$ given $\hat{\mathbf{c}}$ (GAMP for calibration).

Applications in RF imaging: blind calibration of antenna gain/phase errors, dictionary learning (learn basis functions from data), and matrix completion for missing-data recovery.
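The alternating structure (though not the variance tracking and Onsager corrections of full BiG-AMP message passing) can be caricatured with least-squares updates on a rank-1 bilinear model; everything below is an illustrative stand-in:

```python
import numpy as np

def alternating_bilinear(Y, n_iter=50, lam=1e-3):
    """Least-squares caricature of BiG-AMP's alternating structure for a
    rank-1 bilinear model Y ~ a c^T: fix one factor, solve for the other."""
    m, n = Y.shape
    a = np.ones(m)                       # crude initialization
    for _ in range(n_iter):
        c = Y.T @ a / (a @ a + lam)      # estimate c given a
        a = Y @ c / (c @ c + lam)        # estimate a given c
    return a, c

rng = np.random.default_rng(3)
a_true = rng.standard_normal(40)         # unknown "matrix" factor
c_true = rng.standard_normal(60)         # unknown signal factor
Y = np.outer(a_true, c_true) + 0.01 * rng.standard_normal((40, 60))
a_hat, c_hat = alternating_bilinear(Y)
```

As in all bilinear problems, the individual factors are only identifiable up to a scale/sign exchange; it is the product $\hat{\mathbf{a}}\hat{\mathbf{c}}^T$ that converges.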

Multi-Layer Inference Methods for RF Imaging

| Method | Prior Structure | Convergence | Best Use Case |
|---|---|---|---|
| Single-layer GAMP | i.i.d. Bernoulli-Gaussian | Fast (20–50 iter) | Random sensing, homogeneous scene |
| EM-GAMP (Ch. 19.1) | BG with unknown $(\sigma^2, \rho, \sigma_x^2)$ | 20–30 EM outer iterations | Any CS problem, unknown noise |
| ML-VAMP ($L=2$) | Dictionary + sparse codes | Moderate (50–100 iter) | Scenes with learned dictionary |
| ML-VAMP (deep VAE) | Deep generative prior | Slower (100–300 iter) | Under-determined problems ($M/N < 0.5$) |
| BiG-AMP | Unknown matrix + sparse signal | Alternating (100–200 iter) | Blind calibration, dictionary learning |

Generative Prior

A generative prior models the signal $\mathbf{c}$ as the output of a probabilistic generative model (e.g., VAE, normalizing flow, diffusion model) trained on realistic scene data. Unlike parametric priors (Gaussian, BG), a generative prior can capture complex multi-modal distributions and spatial correlations. In multi-layer VAMP, the generative model is explicitly integrated into the message-passing inference loop.

Related: VAE, Normalizing Flow, Plug-and-Play

Bilinear Inference

Bilinear inference refers to the problem of estimating two unknown matrices (or vectors) from their product, e.g., $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{N}$ where both $\mathbf{A}$ and $\mathbf{X}$ are unknown. Unlike linear inverse problems, bilinear problems are generally non-convex. BiG-AMP addresses them via alternating GAMP updates, which approximate the Bayesian posterior of both unknowns simultaneously.

Related: Dictionary Learning, Blind Calibration

Common Mistake: Generative Prior Mismatch Can Be Catastrophic

Mistake:

If the test scene distribution differs significantly from the training distribution of the generative model (prior mismatch), ML-VAMP will reconstruct the "most similar training sample" rather than the actual scene. This hallucination effect is worse than using a simple BG prior, which at least does not impose false scene structure.

Example: a VAE trained on urban scenes used to image a maritime scene will reconstruct ships that look like buildings.

Correction:

(1) Use diverse training sets that cover all expected scene types. (2) Regularize the generative prior with a data-consistency penalty: minimize $\|\mathbf{y} - \mathbf{A}G(\mathbf{z})\|^2$ directly in the latent space (projected gradient descent), using ML-VAMP only to initialize. (3) Quantify uncertainty: the posterior variance from ML-VAMP identifies regions where the scene is poorly constrained by both the measurements and the prior; flag these as unreliable.
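Correction (2) can be sketched for a two-layer ReLU generator: the gradient of $\|\mathbf{y} - \mathbf{A}G(\mathbf{z})\|^2$ follows from the chain rule, with the ReLU Jacobian given by its 0/1 activation mask. All names, sizes, and step sizes below are illustrative:

```python
import numpy as np

def latent_data_consistency(y, A, D, A1, z0, lr=1e-3, n_iter=300):
    """Gradient descent on f(z) = ||y - A G(z)||^2 with G(z) = D ReLU(A1 z),
    starting from z0 (e.g., an ML-VAMP latent estimate)."""
    z = z0.copy()
    for _ in range(n_iter):
        h = A1 @ z                                    # pre-activation
        r = A @ (D @ np.maximum(h, 0.0)) - y          # data residual
        grad = 2.0 * (A1.T @ ((h > 0) * (D.T @ (A.T @ r))))
        z -= lr * grad
    return z

rng = np.random.default_rng(7)
N, K, J, M = 40, 16, 8, 20
D = rng.standard_normal((N, K)) / np.sqrt(K)
A1 = rng.standard_normal((K, J)) / np.sqrt(J)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ (D @ np.maximum(A1 @ rng.standard_normal(J), 0.0))  # noiseless toy data
z0 = rng.standard_normal(J)                                  # initialization
z_hat = latent_data_consistency(y, A, D, A1, z0)
```

Each step only touches the latent coordinates, so the refined estimate $G(\hat{\mathbf{z}})$ stays on the generator's range while the data-fit term pulls it toward the measurements.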

⚠️ Engineering Note

Computational Complexity of ML-VAMP in RF Imaging

ML-VAMP's per-iteration cost scales as $O(MN + \sum_\ell N_\ell N_{\ell-1})$. For a two-layer model with $N = 1024$ voxels, $K = 256$ dictionary atoms, $M = 300$ measurements:

  • Layer 0 (sensing): $O(MN) = O(300 \times 1024) \approx 3 \times 10^5$ operations.
  • Layer 1 (dictionary): $O(NK) = O(1024 \times 256) \approx 2.6 \times 10^5$ operations.
  • Total per iteration: $\sim 6 \times 10^5$ floating-point operations.
  • Convergence: typically 50–200 iterations.
  • Wall time: $< 1$ second on a modern CPU for this problem size.
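The counts above can be reproduced directly:

```python
# Per-iteration matvec costs for the two-layer example
# (operations ~ number of matrix entries touched per matvec)
M, N, K = 300, 1024, 256
sensing = M * N          # layer 0: sensing-matrix matvec
dictionary = N * K       # layer 1: dictionary matvec
total = sensing + dictionary
print(sensing, dictionary, total)   # 307200 262144 569344  (~6e5 ops/iter)
```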

For larger scenes ($N > 10^5$), the Kronecker structure of the physical sensing matrix (Chapter 18) is essential: the VAMP linear step at the sensing layer reduces from $O(MN)$ to $O(N\log N)$.

Practical Constraints

  • VAMP linear step requires an SVD of $\mathbf{A}^{(\ell)}$: precompute and cache it
  • For structured $\mathbf{A}^{(\ell)}$ (DFT, Kronecker), use fast transforms (Ch. 18)
  • Deep generative priors ($L > 3$) increase per-iteration cost by $2$–$4\times$
  • GPU acceleration recommended for $N > 10^4$ voxels

Why This Matters: Connection to Diffusion Model Priors (Chapter 22)

ML-VAMP with a deep generative prior is the principled precursor to the diffusion-based imaging methods in Chapter 22. The key difference:

  • ML-VAMP: the generative model is explicit ($G: \mathbf{z} \mapsto \mathbf{c}$) and integrated directly into the message-passing loop.
  • Diffusion priors: the score function $\nabla_x \log p_t(x)$ of the diffusion model is used as a plug-in denoiser, without explicit latent structure.

Both approaches share the same goal: to exploit a learned prior over RF scenes while maintaining data consistency with the measurements.

See full treatment in Score-Based Diffusion Models Recap

Quick Check

ML-VAMP with a two-layer generative prior (scene size $N = 1024$, latent code dimension $K = 64$) achieves reliable reconstruction at $M/N = 0.3$ where single-layer GAMP fails. The primary reason is:

  • ML-VAMP uses a more accurate state evolution
  • The latent dimension $K = 64 \ll M = 307$ makes the effective problem well-posed
  • ML-VAMP uses a better linear estimator at each step
  • Multi-layer iteration provides more passes over the data

Key Takeaway

Multi-layer VAMP integrates deep generative priors into Bayesian message passing: the generative model defines a low-dimensional signal manifold, and ML-VAMP performs inference in the latent space rather than the full $N$-dimensional scene space. This enables reliable reconstruction at $M/N$ ratios well below the standard CS phase transition. The price is a more complex algorithm, slower convergence, and sensitivity to prior mismatch. BiG-AMP handles the related problem of unknown sensing matrices (blind calibration, dictionary learning) via alternating GAMP updates.