Multi-Layer Inference

Structured Scene Priors from Generative Models

The Bernoulli-Gaussian prior used in Chapters 17–19 captures coordinate-wise sparsity but ignores spatial structure: neighboring voxels in a real scene are correlated, extended objects have smooth boundaries, and urban environments have characteristic textures.

A more powerful approach is to model the scene $\mathbf{c}$ as the output of a deep generative model trained on realistic scene data:

$$\mathbf{c} = G(\mathbf{z}^{(L)}), \quad \mathbf{z}^{(\ell-1)} = f_\ell(\mathbf{A}^{(\ell)}\mathbf{z}^{(\ell)}) \text{ for } \ell = L, \ldots, 1,$$

where $\mathbf{z}^{(L)}$ is a low-dimensional latent code. ML-VAMP (Multi-Layer VAMP) integrates this generative prior directly into the message-passing inference framework, yielding a principled alternative to PnP (Chapter 21) that is jointly trained with the sensing model.
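To make the generator chain concrete, here is a minimal two-layer instance sampled with random stand-in matrices; a trained model would supply $\mathbf{A}^{(1)}$, $\mathbf{A}^{(2)}$, and the activations, so all names and sizes below are illustrative only:

```python
import numpy as np

# Two-layer instance of the chain: z^(2) ~ N(0, I) is the latent code,
# z^(1) = f_2(A^(2) z^(2)), and the scene is c = z^(0) = f_1(A^(1) z^(1)).
rng = np.random.default_rng(0)
N, K, J = 1024, 256, 64                          # scene, hidden, latent sizes
A2 = rng.standard_normal((K, J)) / np.sqrt(J)    # A^(2): latent -> hidden
A1 = rng.standard_normal((N, K)) / np.sqrt(K)    # A^(1): hidden -> scene

z2 = rng.standard_normal(J)        # latent code z^(L), here L = 2
z1 = np.maximum(A2 @ z2, 0.0)      # f_2 = ReLU
c = A1 @ z1                        # f_1 = identity (linear output layer)
```

A 64-dimensional code thus determines all 1024 scene voxels, which is the leverage ML-VAMP exploits.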


Definition: Multi-Layer Generalized Linear Model

A multi-layer GLM with $L$ layers has the form:

$$\mathbf{z}^{(0)} = \mathbf{c}, \qquad \mathbf{z}^{(\ell)} = g_\ell(\mathbf{A}^{(\ell)}\mathbf{z}^{(\ell-1)}), \quad \ell = 1, \ldots, L, \qquad \mathbf{y} \sim p(\mathbf{y} \mid \mathbf{z}^{(L)}),$$

where:

  • $\mathbf{A}^{(\ell)} \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ is the $\ell$-th layer's linear mixing matrix (possibly random or structured).
  • $g_\ell(\cdot)$ is an element-wise non-linear activation function (ReLU, sign, magnitude-squared, etc.).
  • $\mathbf{z}^{(0)} = \mathbf{c}$ is the chain's input, assigned a simple separable prior $p(\mathbf{z}^{(0)}) = \prod_i p_0(z_i^{(0)})$; when the layers implement a generative model, this input plays the role of the low-dimensional latent code.

Special cases:

  • $L = 1$: Standard GAMP (single-layer GLM).
  • $L = 2$: $\mathbf{y} = \mathbf{A}_{\text{obs}}\mathbf{z}^{(1)} + \mathbf{w}$ with $\mathbf{z}^{(1)} = \text{ReLU}(\mathbf{A}^{(1)}\mathbf{c})$ (one-hidden-layer generator).
  • Deep VAE/flow: $L = 5$–$20$, with the input dimension far below the scene dimension, $N_0 \ll N_L$.

ML-VAMP: Multi-Layer Vector AMP

Complexity: $O\!\left(\sum_{\ell=1}^{L} N_{\ell-1} N_\ell \cdot T\right)$ per full forward-backward pass. For deep networks with $N_\ell \sim N/2^\ell$, total cost $\approx 2\,MN\,T$.
Input: Measurements $\mathbf{y}$, sensing matrix $\mathbf{A}_{\text{obs}}$, layer matrices $\{\mathbf{A}^{(\ell)}\}$, activations $\{g_\ell\}$
Output: Posterior mean estimates $\{\hat{\mathbf{z}}^{(\ell)}\}$, $\hat{\mathbf{c}}$
Initialize: $\hat{\mathbf{z}}^{(\ell)} = \mathbf{0}$, $\tau^{(\ell)} = 1$ for all $\ell$
for $t = 1, 2, \ldots$ until convergence do
Forward pass (layer $\ell = 0 \to L$):
1. Compute linear output: $\hat{\mathbf{u}}^{(\ell)} = \mathbf{A}^{(\ell)}\hat{\mathbf{z}}^{(\ell-1)}$
2. Apply activation denoiser (MMSE under $p(\mathbf{z}^{(\ell)} \mid \hat{\mathbf{u}}^{(\ell)}, \tau^{(\ell-1)})$)
3. Compute extrinsic message to next layer: $\tau^{(\ell)}_{\text{fwd}}$
Backward pass (layer $\ell = L \to 0$):
4. Receive backward message from layer $\ell+1$
5. Update $\hat{\mathbf{z}}^{(\ell)}$ using the VAMP linear estimator:
$$\hat{\mathbf{z}}^{(\ell)} \leftarrow (\mathbf{A}^{(\ell)T}\mathbf{A}^{(\ell)} + \gamma^{(\ell)}\mathbf{I})^{-1} (\mathbf{A}^{(\ell)T}\hat{\mathbf{r}}^{(\ell)}_{\text{bwd}} + \gamma^{(\ell)}\hat{\mathbf{z}}^{(\ell)}_{\text{fwd}})$$
6. Update $\tau^{(\ell)}$ and pass backward extrinsic message
Apply observation likelihood (at layer $L$):
7. $g_{\text{out}}(y_m, \hat{p}_m, \tau_p)$ → update $\hat{\mathbf{z}}^{(L)}$
end for

The VAMP linear estimator at each layer requires solving an $N_\ell \times N_\ell$ linear system. When $\mathbf{A}^{(\ell)}$ has SVD-friendly structure (DFT, random orthonormal), this reduces to $O(N_\ell \log N_\ell)$ using the Kronecker techniques from Chapter 18.
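For a general dense matrix, the standard trick is to cache an economy SVD $\mathbf{A}^{(\ell)} = \mathbf{U}\,\mathrm{diag}(\mathbf{s})\,\mathbf{V}^T$ once, after which every linear step is matrix-vector products only. A minimal sketch (function name and test sizes are illustrative):

```python
import numpy as np

def vamp_linear_step(U, s, Vt, r_bwd, z_fwd, gamma):
    """Solve (A^T A + gamma I)^{-1} (A^T r_bwd + gamma z_fwd) with a cached
    economy SVD A = U diag(s) Vt. Components along the right singular vectors
    get eigenvalue s^2 + gamma; the orthogonal complement gets gamma alone."""
    rhs = Vt.T @ (s * (U.T @ r_bwd)) + gamma * z_fwd     # A^T r_bwd + gamma z_fwd
    return rhs / gamma + Vt.T @ ((Vt @ rhs) * (1.0 / (s**2 + gamma) - 1.0 / gamma))

# One-time setup per layer, then reuse the factors across all iterations:
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r_bwd = rng.standard_normal(30)
z_fwd = rng.standard_normal(50)
z_hat = vamp_linear_step(U, s, Vt, r_bwd, z_fwd, gamma=0.7)
```

Since $\gamma^{(\ell)}$ changes every iteration but the SVD does not, the per-iteration cost stays at a few matrix-vector products rather than a fresh $O(N_\ell^3)$ solve.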


Theorem: State Evolution for ML-VAMP

Under i.i.d. Gaussian $\mathbf{A}^{(\ell)}$ at each layer, the ML-VAMP state variables $(\tau_{\text{fwd}}^{(\ell,t)}, \tau_{\text{bwd}}^{(\ell,t)})$ satisfy a coupled system of state evolution recursions:

For each layer $\ell = 1, \ldots, L$:

$$\tau_{\text{fwd}}^{(\ell,t)} = \frac{\tau_{\text{fwd}}^{(\ell-1,t)}}{\delta_\ell}, \quad \tau_{\text{bwd}}^{(\ell-1,t+1)} = \frac{\tau_{\text{bwd}}^{(\ell,t+1)}}{\delta_\ell},$$

where $\delta_\ell = N_\ell / N_{\ell-1}$ is the compression ratio at layer $\ell$, and the activation MSE at each layer satisfies:

$$\mathbb{E}\big[(g_{\text{in},\ell}(X + \sqrt{\tau_{\text{bwd}}^{(\ell)}}\,Z, \tau_{\text{bwd}}^{(\ell)}) - X)^2\big] = f_\ell(\tau_{\text{fwd}}^{(\ell)}, \tau_{\text{bwd}}^{(\ell)}).$$

The system of SE equations couples all layers simultaneously.

Each layer in ML-VAMP acts as an independent VAMP module, passing Gaussian messages up and down. The compression ratios $\delta_\ell$ multiply the effective noise variance at each level: a deep generator with $\delta_\ell < 1$ at each layer progressively reduces the effective signal dimension, concentrating the prior information and improving reconstruction.
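The forward half of this coupling is simple enough to tabulate directly. A minimal sketch with hypothetical compression ratios; the full SE would also iterate the backward variances and the per-layer MSE maps $f_\ell$:

```python
def propagate_tau_forward(tau0, deltas):
    """Forward half of the coupled SE: tau_fwd^(l) = tau_fwd^(l-1) / delta_l,
    where delta_l = N_l / N_{l-1}. Each compressing layer (delta_l < 1)
    rescales the forward variance by 1/delta_l."""
    taus = [tau0]
    for delta in deltas:
        taus.append(taus[-1] / delta)
    return taus

# Three layers, each halving the dimension (delta_l = 1/2):
taus = propagate_tau_forward(1.0, [0.5, 0.5, 0.5])   # [1.0, 2.0, 4.0, 8.0]
```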


Example: ML-VAMP for Scene Recovery with VAE Prior

A two-layer generative model produces RF scenes: $\mathbf{c} = \mathbf{D}\,\text{ReLU}(\mathbf{A}^{(1)}\mathbf{z})$, where $\mathbf{D} \in \mathbb{R}^{N \times K}$ is a learned dictionary ($K < N$, $K/N = 0.25$), $\mathbf{A}^{(1)} \in \mathbb{R}^{K \times J}$ ($J < K$, $J/K = 0.25$), and $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_J)$.

The observation is $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$ with $M/N = 0.3$ (highly compressed).

Compare ML-VAMP (exploits the generative structure) against: (a) GAMP with i.i.d. Gaussian prior (ignores structure), (b) GAMP with BG prior (correct sparsity, ignores spatial structure).
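A minimal synthetic version of this setup, with random stand-ins for the trained dictionary and generator, and baseline (a) implemented as a ridge estimate (the MMSE estimator under an i.i.d. Gaussian prior):

```python
import numpy as np

rng = np.random.default_rng(42)
N, K, J = 1024, 256, 64            # K/N = 0.25, J/K = 0.25
M = int(0.3 * N)                   # M/N = 0.3

D = rng.standard_normal((N, K)) / np.sqrt(K)    # stand-in for learned dictionary
A1 = rng.standard_normal((K, J)) / np.sqrt(J)
z = rng.standard_normal(J)                      # z ~ N(0, I_J)
c = D @ np.maximum(A1 @ z, 0.0)                 # scene c = D ReLU(A^(1) z)

A = rng.standard_normal((M, N)) / np.sqrt(M)    # sensing matrix
y = A @ c + 0.01 * rng.standard_normal(M)       # y = A c + w

# Baseline (a): i.i.d. Gaussian prior reduces to a ridge/pseudoinverse estimate
c_ridge = A.T @ np.linalg.solve(A @ A.T + 1e-4 * np.eye(M), y)
nmse = np.sum((c_ridge - c) ** 2) / np.sum(c ** 2)
```

Because the ridge estimate can only recover the component of $\mathbf{c}$ in the row space of $\mathbf{A}$, its NMSE sits near $1 - M/N$ for a random sensing matrix; exploiting the generative structure is what lets ML-VAMP go below that floor.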


ML-VAMP: NMSE vs Iteration for Different Layer Counts

NMSE convergence curves for ML-VAMP with $L = 1$ to $L = 4$ generative layers. More layers compress the latent representation, improving reconstruction at low oversampling ratios by exploiting the generative structure.

The oversampling ratio $M/N$ is set below the standard CS phase transition, where single-layer GAMP fails but multi-layer inference succeeds.


Definition: Bilinear GAMP (BiG-AMP) — Special Case

When the sensing matrix $\mathbf{A}$ is unknown (e.g., calibration parameters, unknown polarization response, or an uncharacterized channel), the imaging model becomes bilinear:

$$\mathbf{Y} = \mathbf{A}\mathbf{c} + \mathbf{w}, \quad \mathbf{A}\text{ unknown},\; \mathbf{c}\text{ unknown}.$$

BiG-AMP (Bilinear GAMP) treats this as a two-layer ML-GAMP, $\mathbf{Y} = \mathbf{F}(\mathbf{A}) \cdot G(\mathbf{c}) + \mathbf{w}$, and alternates between:

  1. Estimating $\mathbf{c}$ given $\hat{\mathbf{A}}$ (GAMP for sparse recovery).
  2. Estimating $\mathbf{A}$ given $\hat{\mathbf{c}}$ (GAMP for calibration).

Applications in RF imaging: blind calibration of antenna gain/phase errors, dictionary learning (learn basis functions from data), and matrix completion for missing-data recovery.
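The alternating structure (though not the variance tracking and Onsager corrections of full BiG-AMP message passing) can be caricatured with least-squares updates on a rank-1 bilinear model; everything below is an illustrative stand-in:

```python
import numpy as np

def alternating_bilinear(Y, n_iter=50, lam=1e-3):
    """Least-squares caricature of BiG-AMP's alternating structure for a
    rank-1 bilinear model Y ~ a c^T: fix one factor, solve for the other."""
    m, n = Y.shape
    a = np.ones(m)                       # crude initialization
    for _ in range(n_iter):
        c = Y.T @ a / (a @ a + lam)      # estimate c given a
        a = Y @ c / (c @ c + lam)        # estimate a given c
    return a, c

rng = np.random.default_rng(3)
a_true = rng.standard_normal(40)         # unknown "matrix" factor
c_true = rng.standard_normal(60)         # unknown signal factor
Y = np.outer(a_true, c_true) + 0.01 * rng.standard_normal((40, 60))
a_hat, c_hat = alternating_bilinear(Y)
```

As in all bilinear problems, the individual factors are only identifiable up to a scale/sign exchange; it is the product $\hat{\mathbf{a}}\hat{\mathbf{c}}^T$ that converges.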

Multi-Layer Inference Methods for RF Imaging

| Method | Prior Structure | Convergence | Best Use Case |
|---|---|---|---|
| Single-layer GAMP | i.i.d. Bernoulli-Gaussian | Fast (20–50 iter) | Random sensing, homogeneous scene |
| EM-GAMP (Ch. 19.1) | BG with unknown $(\sigma^2, \rho, \sigma_x^2)$ | 20–30 EM outer iterations | Any CS problem, unknown noise |
| ML-VAMP ($L=2$) | Dictionary + sparse codes | Moderate (50–100 iter) | Scenes with learned dictionary |
| ML-VAMP (deep VAE) | Deep generative prior | Slower (100–300 iter) | Under-determined problems ($M/N < 0.5$) |
| BiG-AMP | Unknown matrix + sparse signal | Alternating (100–200 iter) | Blind calibration, dictionary learning |

Generative Prior

A generative prior models the signal $\mathbf{c}$ as the output of a probabilistic generative model (e.g., VAE, normalizing flow, diffusion model) trained on realistic scene data. Unlike parametric priors (Gaussian, BG), a generative prior can capture complex multi-modal distributions and spatial correlations. In multi-layer VAMP, the generative model is explicitly integrated into the message-passing inference loop.

Related: VAE, Normalizing Flow, Plug-and-Play

Bilinear Inference

Bilinear inference refers to the problem of estimating two unknown matrices (or vectors) from their product, e.g., $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{N}$ where both $\mathbf{A}$ and $\mathbf{X}$ are unknown. Unlike linear inverse problems, bilinear problems are generally non-convex. BiG-AMP addresses them via alternating GAMP updates, which approximate the Bayesian posterior of both unknowns simultaneously.

Related: Dictionary Learning, Blind Calibration

Common Mistake: Generative Prior Mismatch Can Be Catastrophic

Mistake:

If the test scene distribution differs significantly from the training distribution of the generative model (prior mismatch), ML-VAMP will reconstruct the "most similar training sample" rather than the actual scene. This hallucination effect is worse than using a simple BG prior, which at least does not impose false scene structure.

Example: a VAE trained on urban scenes used to image a maritime scene will reconstruct ships that look like buildings.

Correction:

(1) Use diverse training sets that cover all expected scene types. (2) Regularize the generative prior with a data-consistency penalty: minimize $\|\mathbf{y} - \mathbf{A}G(\mathbf{z})\|^2$ directly in the latent space (projected gradient descent), using ML-VAMP only to initialize. (3) Quantify uncertainty: the posterior variance from ML-VAMP identifies regions where the scene is poorly constrained by both the measurements and the prior; flag these as unreliable.
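Correction (2) can be sketched for a two-layer ReLU generator: the gradient of $\|\mathbf{y} - \mathbf{A}G(\mathbf{z})\|^2$ follows from the chain rule, with the ReLU Jacobian given by its 0/1 activation mask. All names, sizes, and step sizes below are illustrative:

```python
import numpy as np

def latent_data_consistency(y, A, D, A1, z0, lr=1e-3, n_iter=300):
    """Gradient descent on f(z) = ||y - A G(z)||^2 with G(z) = D ReLU(A1 z),
    starting from z0 (e.g., an ML-VAMP latent estimate)."""
    z = z0.copy()
    for _ in range(n_iter):
        h = A1 @ z                                    # pre-activation
        r = A @ (D @ np.maximum(h, 0.0)) - y          # data residual
        grad = 2.0 * (A1.T @ ((h > 0) * (D.T @ (A.T @ r))))
        z -= lr * grad
    return z

rng = np.random.default_rng(7)
N, K, J, M = 40, 16, 8, 20
D = rng.standard_normal((N, K)) / np.sqrt(K)
A1 = rng.standard_normal((K, J)) / np.sqrt(J)
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ (D @ np.maximum(A1 @ rng.standard_normal(J), 0.0))  # noiseless toy data
z0 = rng.standard_normal(J)                                  # initialization
z_hat = latent_data_consistency(y, A, D, A1, z0)
```

Each step only touches the latent coordinates, so the refined estimate $G(\hat{\mathbf{z}})$ stays on the generator's range while the data-fit term pulls it toward the measurements.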

⚠️ Engineering Note

Computational Complexity of ML-VAMP in RF Imaging

ML-VAMP's per-iteration cost scales as $O(MN + \sum_\ell N_\ell N_{\ell-1})$. For a two-layer model with $N = 1024$ voxels, $K = 256$ dictionary atoms, $M = 300$ measurements:

  • Layer 0 (sensing): $O(MN) = O(300 \times 1024) \approx 3 \times 10^5$ operations.
  • Layer 1 (dictionary): $O(NK) = O(1024 \times 256) \approx 2.6 \times 10^5$ operations.
  • Total per iteration: $\sim 6 \times 10^5$ floating-point operations.
  • Convergence: typically 50–200 iterations.
  • Wall time: $< 1$ second on a modern CPU for this problem size.
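The counts above can be reproduced directly:

```python
# Per-iteration matvec costs for the two-layer example
# (operations ~ number of matrix entries touched per matvec)
M, N, K = 300, 1024, 256
sensing = M * N          # layer 0: sensing-matrix matvec
dictionary = N * K       # layer 1: dictionary matvec
total = sensing + dictionary
print(sensing, dictionary, total)   # 307200 262144 569344  (~6e5 ops/iter)
```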

For larger scenes ($N > 10^5$), the Kronecker structure of the physical sensing matrix (Chapter 18) is essential: the VAMP linear step at the sensing layer reduces from $O(MN)$ to $O(N\log N)$.

Practical Constraints

  • VAMP linear step requires an SVD of $\mathbf{A}^{(\ell)}$: precompute and cache it
  • For structured $\mathbf{A}^{(\ell)}$ (DFT, Kronecker), use fast transforms (Ch. 18)
  • Deep generative priors ($L > 3$) increase per-iteration cost by $2$–$4\times$
  • GPU acceleration recommended for $N > 10^4$ voxels

Why This Matters: Connection to Diffusion Model Priors (Chapter 22)

ML-VAMP with a deep generative prior is the principled precursor to the diffusion-based imaging methods in Chapter 22. The key difference:

  • ML-VAMP: the generative model is explicit ($G: \mathbf{z} \mapsto \mathbf{c}$) and integrated directly into the message-passing loop.
  • Diffusion priors: the score function $\nabla_x \log p_t(x)$ of the diffusion model is used as a plug-in denoiser, without explicit latent structure.

Both approaches share the same goal: to exploit a learned prior over RF scenes while maintaining data consistency with the measurements.

See full treatment in Score-Based Diffusion Models Recap

Quick Check

ML-VAMP with a two-layer generative prior (scene size $N = 1024$, latent code dimension $K = 64$) achieves reliable reconstruction at $M/N = 0.3$ where single-layer GAMP fails. The primary reason is:

  • ML-VAMP uses a more accurate state evolution
  • The latent dimension $K = 64 \ll M = 307$ makes the effective problem well-posed
  • ML-VAMP uses a better linear estimator at each step
  • Multi-layer iteration provides more passes over the data

Key Takeaway

Multi-layer VAMP integrates deep generative priors into Bayesian message passing: the generative model defines a low-dimensional signal manifold, and ML-VAMP performs inference in the latent space rather than the full $N$-dimensional scene space. This enables reliable reconstruction at $M/N$ ratios well below the standard CS phase transition. The price is a more complex algorithm, slower convergence, and sensitivity to prior mismatch. BiG-AMP handles the related problem of unknown sensing matrices (blind calibration, dictionary learning) via alternating GAMP updates.