Multi-Layer Inference
Structured Scene Priors from Generative Models
The Bernoulli-Gaussian prior used in Chapters 17–19 captures coordinate-wise sparsity but ignores spatial structure: neighboring voxels in a real scene are correlated, extended objects have smooth boundaries, and urban environments have characteristic textures.
A more powerful approach is to model the scene as the output of a deep generative model trained on realistic scene data:
$$\mathbf{x} = G(\mathbf{z}), \qquad \mathbf{z} \sim p(\mathbf{z}),$$
where $\mathbf{z} \in \mathbb{R}^{k}$ with $k \ll N$ is a low-dimensional latent code. ML-VAMP (Multi-Layer VAMP) integrates this generative prior directly into the message-passing inference framework, yielding a principled alternative to PnP (Chapter 21) that is jointly trained with the sensing model.
Definition: Multi-Layer Generalized Linear Model
A multi-layer GLM with $L$ layers has the form:
$$\mathbf{z}_0 \sim p(\mathbf{z}_0), \qquad \mathbf{z}_\ell = \phi_\ell\!\left(\mathbf{A}_\ell \mathbf{z}_{\ell-1}\right), \quad \ell = 1, \dots, L, \qquad \mathbf{x} = \mathbf{z}_L,$$
where:
- $\mathbf{A}_\ell$ is the $\ell$-th layer's linear mixing matrix (possibly random or structured).
- $\phi_\ell(\cdot)$ is an element-wise non-linear activation function (ReLU, sign, magnitude-squared, etc.).
- $\mathbf{z}_0$ is the latent code with a simple prior $p(\mathbf{z}_0)$.
Special cases:
- $L = 1$: Standard GAMP (single-layer GLM).
- $L = 2$: $\mathbf{x} = \mathbf{A}_2\,\phi_1(\mathbf{A}_1 \mathbf{z}_0)$ (one-hidden-layer generator).
- Deep VAE/flow: $L \ge 3$, with learned weights $\{\mathbf{A}_\ell\}$ and activations $\{\phi_\ell\}$.
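To make the layered structure concrete, the following is a minimal NumPy sketch that samples a two-layer GLM and forms compressed measurements. The dimensions, matrices, the ReLU activation, and the sensing matrix `Phi` are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not taken from the chapter).
k, P, N, M = 64, 256, 1024, 128      # latent, hidden, scene, and measurement sizes

# Layer matrices: A1 maps latent -> hidden, A2 maps hidden -> scene (a "dictionary").
A1 = rng.standard_normal((P, k)) / np.sqrt(k)
A2 = rng.standard_normal((N, P)) / np.sqrt(P)
Phi = rng.standard_normal((M, N)) / np.sqrt(N)   # compressive sensing matrix

relu = lambda v: np.maximum(v, 0.0)              # element-wise activation phi_1

# Two-layer GLM: z0 ~ N(0, I), z1 = phi_1(A1 z0), x = A2 z1, y = Phi x + noise.
z0 = rng.standard_normal(k)
z1 = relu(A1 @ z0)
x = A2 @ z1
y = Phi @ x + 0.01 * rng.standard_normal(M)

print(f"latent dim {k}, scene dim {N}, measurements {M} (M/N = {M/N:.2f})")
```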
ML-VAMP: Multi-Layer Vector AMP
Complexity: $O\!\big(\sum_{\ell=1}^{L} N_\ell N_{\ell-1}\big)$ per full forward-backward pass, where $N_\ell$ is the width of layer $\ell$; for deep networks the total cost grows linearly with $L$. The VAMP linear estimator at each layer requires solving an $N_\ell \times N_\ell$ linear system. When $\mathbf{A}_\ell$ has SVD-friendly structure (DFT, random orthonormal), this reduces to $O(N_\ell \log N_\ell)$ using the Kronecker techniques from Chapter 18.
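To make the SVD caching concrete, here is a minimal NumPy sketch of the VAMP-style LMMSE solve $(\mathbf{A}^{\mathsf T}\mathbf{A}/\sigma^2 + \gamma\mathbf{I})^{-1}(\mathbf{A}^{\mathsf T}\mathbf{y}/\sigma^2 + \gamma\mathbf{r})$ using a precomputed SVD, avoiding a fresh $N \times N$ solve at every iteration. The function name `lmmse_step` and all dimensions are illustrative assumptions.

```python
import numpy as np

def lmmse_step(svd, y, r_prior, gamma, sigma2):
    # Solves (A^T A / sigma2 + gamma I) x = A^T y / sigma2 + gamma * r_prior
    # using a cached economy SVD A = U @ diag(s) @ Vt, at O(M N) cost per call.
    U, s, Vt = svd
    rhs = Vt.T @ (s * (U.T @ y)) / sigma2 + gamma * r_prior   # A^T y / sigma2 + gamma r
    coeff = Vt @ rhs                                          # component in the row space of A
    x_hat = Vt.T @ (coeff / (s**2 / sigma2 + gamma))          # scaled row-space part
    x_hat += (rhs - Vt.T @ coeff) / gamma                     # orthogonal-complement part
    return x_hat

# Toy check against a dense solve (dimensions are illustrative only).
rng = np.random.default_rng(1)
M, N = 128, 512
A = rng.standard_normal((M, N)) / np.sqrt(N)
svd = np.linalg.svd(A, full_matrices=False)      # computed once, reused every iteration
y = rng.standard_normal(M)
r_prior = rng.standard_normal(N)
gamma, sigma2 = 0.5, 0.01

x_fast = lmmse_step(svd, y, r_prior, gamma, sigma2)
x_dense = np.linalg.solve(A.T @ A / sigma2 + gamma * np.eye(N),
                          A.T @ y / sigma2 + gamma * r_prior)
print(np.allclose(x_fast, x_dense))
```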
Theorem: State Evolution for ML-VAMP
Under i.i.d. Gaussian $\mathbf{A}_\ell$ at each layer, the ML-VAMP state variables (the per-layer forward and backward effective noise variances $\tau_\ell^{+}$ and $\tau_\ell^{-}$) satisfy a coupled system of state evolution recursions. For each layer $\ell = 1, \dots, L$, the update of $(\tau_\ell^{+}, \tau_\ell^{-})$ depends only on the neighboring layers' variances, the compression ratio $\delta_\ell = N_\ell / N_{\ell-1}$ at layer $\ell$, and the activation MSE function $\mathcal{E}_\ell(\cdot)$ of that layer (the scalar MMSE induced by the non-linearity $\phi_\ell$).
The system of SE equations couples all layers simultaneously.
Each layer in ML-VAMP acts as an independent VAMP module, passing Gaussian messages up and down. The compression ratios multiply the effective noise variance at each level: a deep generator that expands the dimension at every layer progressively reduces the effective signal dimension that must be inferred, concentrating the prior information and improving reconstruction.
The proof extends the single-layer VAMP SE (Chapter 18) by induction over layers.
The key coupling is through the inter-layer messages: the backward extrinsic variance produced at one layer becomes the effective forward noise variance seen by the adjacent layer.
Reduction to independent VAMP modules
At each layer, the VAMP linear estimator sees an effective scalar channel with input noise variance $\tau_\ell^{+}$ and output noise variance $\tau_\ell^{-}$. By the VAMP SE analysis (Rangan et al. 2019), each module satisfies its own SE independently, conditioned on the boundary variances.
Coupled SE system
The boundary variances are linked by the activation MSE functions $\mathcal{E}_\ell(\cdot)$. Iterating the single-layer SE equations across layers yields the coupled system. Fixed points correspond to the Bayes-optimal MMSE estimator when the SE is well-posed.
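To show how a state-evolution recursion is actually iterated, here is a toy NumPy sketch of the classical single-layer SE (AMP form) for an i.i.d. Gaussian sensing matrix with compression ratio $\delta$ and a unit-variance Gaussian prior, whose scalar MMSE is $\tau/(1+\tau)$. This is a single-layer illustration only; the multi-layer SE of the theorem couples $L$ such recursions through the inter-layer extrinsic variances, and the parameter values below are assumptions for the demo.

```python
import numpy as np

def mmse_gauss(tau):
    # Exact scalar MMSE for estimating X ~ N(0, 1) from X + sqrt(tau) * Z, Z ~ N(0, 1).
    return tau / (1.0 + tau)

def single_layer_se(delta, sigma2, n_iter=100):
    # Classical AMP-style state evolution for y = A x + w with i.i.d. Gaussian A:
    #   tau_{t+1} = sigma2 + mmse(tau_t) / delta,   delta = M / N.
    tau = 1e3                     # start from a large effective noise variance
    for _ in range(n_iter):
        tau = sigma2 + mmse_gauss(tau) / delta
    return tau

# Fixed points for a few compression ratios (illustrative values).
for delta in (0.1, 0.3, 0.5):
    print(f"delta = {delta:.1f}: fixed-point effective noise variance = "
          f"{single_layer_se(delta, sigma2=0.01):.3f}")
```

Note that with a plain Gaussian prior and $\delta < 1$ the fixed-point variance stays large, which is precisely the scalar-prior failure mode that the layered prior is designed to overcome.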
Example: ML-VAMP for Scene Recovery with VAE Prior
A two-layer generative model produces RF scenes, $\mathbf{x} = \mathbf{D}\,\phi(\mathbf{W}\mathbf{z}_0)$, where $\mathbf{D}$ is a learned dictionary, $\mathbf{W}$ is a learned first-layer mixing matrix, and $\mathbf{z}_0 \in \mathbb{R}^{64}$ is the latent code.
The observation is $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with $M \ll N$ (highly compressed).
Compare ML-VAMP (exploits the generative structure) against: (a) GAMP with i.i.d. Gaussian prior (ignores structure), (b) GAMP with BG prior (correct sparsity, ignores spatial structure).
Problem dimensions
The true signal lies in a 64-dimensional latent space, while the number of measurements at this compression ratio falls well short of the $M \gtrsim K \log(N/K)$ that standard CS requires for reliable recovery. The problem is severely under-determined for scalar priors.
ML-VAMP exploits dimensionality reduction
ML-VAMP infers the 64-dimensional latent code from the measurements. Since the measurement count exceeds the latent dimension by a comfortable margin, the effective recovery problem is well-posed.
The two-layer SE quantifies the resulting MSE improvement over BG-GAMP.
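A quick dimension count makes the argument concrete. The sizes below are purely hypothetical stand-ins (the example's exact values are not reproduced here); the point is the counting, not the numbers.

```python
import numpy as np

# Hypothetical sizes for illustration only.
N = 4096        # scene voxels (ambient dimension)
delta = 0.05    # compression ratio M / N
k = 64          # latent dimension of the generative prior
K = 200         # active voxels a sparsity-only (BG) prior would need to locate

M = int(delta * N)
cs_rule_of_thumb = int(K * np.log(N / K))   # M >~ K log(N/K) for scalar sparse priors

print(f"measurements:           M = {M}")
print(f"scalar-prior CS needs: ~{cs_rule_of_thumb}  (under-determined: {M < cs_rule_of_thumb})")
print(f"latent unknowns:        k = {k}  ->  M / k = {M / k:.1f}")
```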
Numerical comparison
| Method | NMSE (dB) |
|---|---|
| GAMP (Gaussian prior) | |
| GAMP (BG prior, tuned) | |
| ML-VAMP (two-layer VAE prior) | |
ML-VAMP recovers the scene because it knows the signal lives in a 64-dimensional submanifold, far less than the ambient dimension.
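For reference, the NMSE figure of merit reported in the table can be computed as below; `nmse_db` is just an illustrative helper name.

```python
import numpy as np

def nmse_db(x_hat, x_true):
    # Normalized mean-squared error in dB: 10 log10( ||x_hat - x||^2 / ||x||^2 ).
    return 10.0 * np.log10(np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2))

x = np.ones(1000)
print(f"{nmse_db(x + 0.01, x):.1f} dB")   # a uniform 1% error gives -40.0 dB
```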
ML-VAMP: NMSE vs Iteration for Different Layer Counts
NMSE convergence curves for ML-VAMP with increasing numbers of generative layers $L$. More layers compress the latent representation, improving reconstruction at low oversampling ratios by exploiting the generative structure. The oversampling ratio $M/N$ is set below the standard CS phase transition, where single-layer GAMP fails but multi-layer inference succeeds.
Definition: Bilinear GAMP (BiG-AMP) as a Special Case
When the sensing matrix is unknown (e.g., calibration parameters, unknown polarization response, or an uncharacterized channel), the imaging model becomes bilinear:
$$\mathbf{y} = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{x} + \mathbf{w}, \qquad \text{with both } \boldsymbol{\theta} \text{ and } \mathbf{x} \text{ unknown}.$$
BiG-AMP (Bilinear GAMP) treats this as a two-layer ML-GAMP and alternates between (a simplified sketch follows the list):
- Estimating $\mathbf{x}$ given the current estimate of $\mathbf{A}(\boldsymbol{\theta})$ (GAMP for sparse recovery).
- Estimating $\boldsymbol{\theta}$ given the current estimate of $\mathbf{x}$ (GAMP for calibration).
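The following is a deliberately simplified alternating sketch for the bilinear model, using ridge least squares plus soft-thresholding in place of the two GAMP modules. It illustrates the alternation only and is not the BiG-AMP message-passing algorithm; all sizes and regularization constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy bilinear problem Y = A X + W with BOTH the mixing matrix A and sparse codes X unknown.
M, r, T, sparsity = 40, 8, 200, 0.2
A_true = rng.standard_normal((M, r))
X_true = rng.standard_normal((r, T)) * (rng.random((r, T)) < sparsity)
Y = A_true @ X_true + 0.01 * rng.standard_normal((M, T))

A = rng.standard_normal((M, r))     # random initialization of the unknown mixing matrix
lam = 0.05                          # soft-threshold level promoting sparsity in X
for _ in range(50):
    # X-step (stand-in for "GAMP for sparse recovery"): ridge solve given A, then soft-threshold.
    X = np.linalg.solve(A.T @ A + 1e-3 * np.eye(r), A.T @ Y)
    X = np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)
    # A-step (stand-in for "GAMP for calibration"): least-squares fit of A given X.
    A = Y @ X.T @ np.linalg.inv(X @ X.T + 1e-3 * np.eye(r))

residual = np.linalg.norm(Y - A @ X) / np.linalg.norm(Y)
print(f"relative data residual after alternating updates: {residual:.3f}")
```

As with BiG-AMP itself, the recovered pair is only identifiable up to scaling and permutation of the factors, so the data residual, not a factor-wise error, is reported.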
Applications in RF imaging: blind calibration of antenna gain/phase errors, dictionary learning (learn basis functions from data), and matrix completion for missing-data recovery.
Multi-Layer Inference Methods for RF Imaging
| Method | Prior Structure | Convergence | Best Use Case |
|---|---|---|---|
| Single-layer GAMP | i.i.d. Bernoulli-Gaussian | Fast (20–50 iter) | Random sensing, homogeneous scene |
| EM-GAMP (Ch. 19.1) | BG with unknown hyperparameters | 20–30 EM outer iterations | Any CS problem, unknown noise |
| ML-VAMP (L=2) | Dictionary + sparse codes | Moderate (50–100 iter) | Scenes with learned dictionary |
| ML-VAMP (deep VAE) | Deep generative prior | Slower (100–300 iter) | Under-determined problems ($M \ll N$) |
| BiG-AMP | Unknown matrix + sparse signal | Alternating (100–200 iter) | Blind calibration, dictionary learning |
Generative Prior
A generative prior models the signal as the output of a probabilistic generative model (e.g., VAE, normalizing flow, diffusion model) trained on realistic scene data. Unlike parametric priors (Gaussian, BG), a generative prior can capture complex multi-modal distributions and spatial correlations. In multi-layer VAMP, the generative model is explicitly integrated into the message-passing inference loop.
Related: VAE, Normalizing Flow, Plug-and-Play
Bilinear Inference
Bilinear inference refers to the problem of estimating two unknown matrices (or vectors) from their product, e.g., $\mathbf{Y} = \mathbf{A}\mathbf{X}$ where both $\mathbf{A}$ and $\mathbf{X}$ are unknown. Unlike linear inverse problems, bilinear problems are generally non-convex. BiG-AMP addresses them via alternating GAMP updates, which approximate the Bayesian posterior of both unknowns simultaneously.
Related: Dictionary Learning, Blind Calibration
Common Mistake: Generative Prior Mismatch Can Be Catastrophic
Mistake:
If the test scene distribution differs significantly from the training distribution of the generative model (prior mismatch), ML-VAMP will reconstruct the "most similar training sample" rather than the actual scene. This hallucination effect is worse than using a simple BG prior, which at least does not impose false scene structure.
Example: a VAE trained on urban scenes used to image a maritime scene will reconstruct ships that look like buildings.
Correction:
(1) Use diverse training sets that cover all expected scene types. (2) Regularize the generative prior with a data-consistency penalty: minimize $\|\mathbf{y} - \mathbf{A}\,G(\mathbf{z})\|_2^2$ directly in the latent space (projected gradient descent), using ML-VAMP only to initialize. (3) Quantify uncertainty: the posterior variance from ML-VAMP identifies regions where the scene is poorly constrained by both the measurements and the prior; flag these as unreliable.
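A minimal sketch of correction (2): gradient descent on the latent code to enforce data consistency, $\min_{\mathbf{z}} \|\mathbf{y} - \mathbf{A}\,G(\mathbf{z})\|_2^2$. The generator here is a random one-hidden-layer stand-in for a trained decoder, the step size comes from a crude Lipschitz estimate, and in practice $\mathbf{z}$ would be initialized from the ML-VAMP latent estimate rather than at random.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in generator G(z) = D tanh(W z) and sensing matrix A (all sizes illustrative).
k, P, N, M = 32, 128, 512, 64
W = rng.standard_normal((P, k)) / np.sqrt(k)
D = rng.standard_normal((N, P)) / np.sqrt(P)
A = rng.standard_normal((M, N)) / np.sqrt(N)
G = lambda z: D @ np.tanh(W @ z)

z_true = rng.standard_normal(k)
y = A @ G(z_true) + 0.001 * rng.standard_normal(M)

z = rng.standard_normal(k)                   # would be the ML-VAMP estimate in practice
step = 0.2 / (np.linalg.norm(A @ D, 2) * np.linalg.norm(W, 2)) ** 2   # crude 1/Lipschitz scaling
for _ in range(3000):
    h = W @ z
    resid = A @ (D @ np.tanh(h)) - y
    # Chain rule: grad ||A G(z) - y||^2 = 2 W^T [ (1 - tanh(h)^2) * (D^T A^T resid) ]
    z -= step * 2.0 * W.T @ ((1.0 - np.tanh(h) ** 2) * (D.T @ (A.T @ resid)))

print(f"relative data-consistency residual: {np.linalg.norm(A @ G(z) - y) / np.linalg.norm(y):.2e}")
```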
Computational Complexity of ML-VAMP in RF Imaging
ML-VAMP's per-iteration cost scales with the matrix-vector products at each layer, $O\!\big(\sum_\ell N_\ell N_{\ell-1}\big)$. For a two-layer model with $N$ voxels, $P$ dictionary atoms, and $M$ measurements:
- Layer 0 (sensing): $O(MN)$ operations.
- Layer 1 (dictionary): $O(NP)$ operations.
- Total per iteration: $O(MN + NP)$ floating-point operations.
- Convergence: typically 50–200 iterations.
- Wall time: on the order of a second on a modern CPU for this problem size.
For larger scenes, the Kronecker structure of the physical sensing matrix (Chapter 18) is essential: the VAMP linear step at the sensing layer drops from a dense $O(MN)$ product to a fast-transform cost of $O(N \log N)$.
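A rough order-of-magnitude tally of the per-iteration cost, using hypothetical sizes (not the chapter's exact values), shows why moderate two-layer problems run comfortably on a CPU:

```python
# Hypothetical two-layer problem sizes.
N, P, M = 4096, 1024, 205          # voxels, dictionary atoms, measurements
flops_sensing = 2 * M * N          # multiply-adds for the sensing-layer products
flops_dictionary = 2 * N * P       # multiply-adds for the dictionary-layer products
per_iteration = flops_sensing + flops_dictionary
iterations = 200

print(f"~{per_iteration / 1e6:.1f} MFLOPs per iteration, "
      f"~{iterations * per_iteration / 1e9:.1f} GFLOPs for {iterations} iterations")
```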
- VAMP linear step requires the SVD of each $\mathbf{A}_\ell$: precompute and cache it.
- For structured $\mathbf{A}_\ell$ (DFT, Kronecker), use fast transforms (Ch. 18).
- Deep generative priors (more layers) increase the per-iteration cost roughly in proportion to $L$.
- GPU acceleration is recommended for large voxel counts.
Why This Matters: Connection to Diffusion Model Priors (Chapter 22)
ML-VAMP with a deep generative prior is the principled precursor to the diffusion-based imaging methods in Chapter 22. The key difference:
- ML-VAMP: the generative model is explicit ($\mathbf{x} = G(\mathbf{z})$) and integrated directly into the message-passing loop.
- Diffusion priors: the score function of the diffusion model is used as a plug-in denoiser, without explicit latent structure.
Both approaches share the same goal: to exploit a learned prior over RF scenes while maintaining data consistency with the measurements.
See full treatment in Score-Based Diffusion Models Recap
Quick Check
ML-VAMP with a two-layer generative prior ($L = 2$, 64-dimensional latent code) achieves reliable reconstruction at compression ratios where single-layer GAMP fails. The primary reason is:
ML-VAMP uses a more accurate state evolution
The small latent dimension makes the effective problem well-posed
ML-VAMP uses a better linear estimator at each step
Multi-layer iteration provides more passes over the data
Correct. Single-layer GAMP must reconstruct all $N$ unknowns from $M \ll N$ measurements, which is severely under-determined. ML-VAMP reduces the effective problem to estimating the 64 latent variables from the same measurements, which is well-posed because $M$ comfortably exceeds the latent dimension.
Key Takeaway
Multi-layer VAMP integrates deep generative priors into Bayesian message passing: the generative model defines a low-dimensional signal manifold, and ML-VAMP performs inference in the latent space rather than the full $N$-dimensional scene space. This enables reliable reconstruction at $M/N$ ratios well below the standard CS phase transition. The price is a more complex algorithm, slower convergence, and sensitivity to prior mismatch. BiG-AMP handles the related problem of unknown sensing matrices (blind calibration, dictionary learning) via alternating GAMP updates.