EM-GAMP for RF Imaging

The Hyperparameter Problem in Bayesian Inference

The VAMP algorithm from Chapter 18 achieves near-optimal reconstruction when the prior distribution is correctly specified, but this requires knowing the sparsity rate $\rho$, the signal variance $\sigma_x^2$, and the noise variance $\sigma^2$.

In any real RF imaging deployment, these parameters are unknown and vary with scene statistics and hardware noise floor. Manual tuning by cross-validation requires running the reconstruction many times and does not scale.

EM-GAMP solves this by treating the scene vector $\mathbf{x}$ as a latent variable and the hyperparameters $\boldsymbol{\theta} = (\sigma^2, \rho, \sigma_x^2)$ as unknowns to be estimated by the Expectation-Maximization (EM) algorithm. The result is a self-tuning Bayesian algorithm that converges to near-oracle performance from arbitrary initialization.

Historical Note: Origins of Expectation-Maximization

1977–2014

The EM algorithm was formally unified by Dempster, Laird, and Rubin in their landmark 1977 JRSS-B paper, though special cases had been used for decades (e.g., Baum–Welch for HMMs, 1970). The key insight was the Q-function — the expected complete-data log-likelihood — which EM monotonically maximizes, guaranteeing non-decreasing marginal likelihood.

The integration of EM with approximate message passing for sparse recovery was developed independently by Vila & Schniter (EM-GM-GAMP, 2013) and Kamilov et al. (parametric GAMP, 2014), who proved consistency of the EM parameter estimates in the large-system limit.

Definition: The EM-GAMP Model

The EM-GAMP model couples the RF imaging observation model with parameterized priors:

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}, \qquad \mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}),$$

$$x_i \sim p_0(x_i; \boldsymbol{\theta}_{\text{prior}}) = (1 - \rho)\,\delta(x_i) + \rho\,\mathcal{CN}(x_i; 0, \sigma_x^2),$$

where $\boldsymbol{\theta} = (\sigma^2, \rho, \sigma_x^2)$ is the hyperparameter vector.

The complete-data likelihood (treating $\mathbf{x}$ as known) is:

$$p(\mathbf{y}, \mathbf{x}; \boldsymbol{\theta}) = \mathcal{CN}(\mathbf{y}; \mathbf{A}\mathbf{x}, \sigma^2\mathbf{I}) \cdot \prod_{i=1}^{N} p_0(x_i; \boldsymbol{\theta}_{\text{prior}}).$$

EM maximizes the marginal likelihood $p(\mathbf{y}; \boldsymbol{\theta})$ using GAMP's posterior statistics as a surrogate for the intractable posterior $p(\mathbf{x} \mid \mathbf{y}; \boldsymbol{\theta})$.
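To make the model concrete, the following minimal sketch draws a scene from the Bernoulli-Gaussian prior and simulates the measurements. It uses a real-valued Gaussian stand-in for the complex $\mathcal{CN}$ model (a common simplification), and the sizes and hyperparameters mirror the Example later in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 400, 200                    # scene voxels, measurements
rho, sx2, s2 = 0.10, 1.0, 0.01     # true (rho, sigma_x^2, sigma^2), unknown to EM-GAMP

# Bernoulli-Gaussian scene: each voxel is 0 w.p. 1-rho, Gaussian w.p. rho
support = rng.random(N) < rho
x_true = np.where(support, rng.normal(0.0, np.sqrt(sx2), N), 0.0)

# Random sensing matrix (roughly unit-norm columns) plus additive noise
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ x_true + rng.normal(0.0, np.sqrt(s2), M)
```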

EM-GAMP Algorithm

Complexity: $O(MN \cdot T_{\text{inner}} \cdot K_{\text{EM}})$, the same order as GAMP
Input: $\mathbf{y}$, $\mathbf{A}$, initial $\hat{\boldsymbol{\theta}}^{(0)}$
Output: Scene estimate $\hat{\mathbf{x}}$, hyperparameters $\hat{\boldsymbol{\theta}}$
for $k = 0, 1, 2, \ldots$ until convergence do
E-step (GAMP inner loop):
Run GAMP with fixed $\hat{\boldsymbol{\theta}}^{(k)}$ for $T_{\text{inner}}$ iterations:
- Initialize $\hat{\mathbf{x}}^0 = \mathbf{0}$, $\tau_x^0 = \hat{\rho}\,\hat{\sigma}_x^2$, $\hat{\mathbf{s}}^{-1} = \mathbf{0}$
- for $t = 0, \ldots, T_{\text{inner}}$ do
1. $\hat{p}_m^t \leftarrow \mathbf{a}_m^{\mathsf{H}}\hat{\mathbf{x}}^t - \tau_p^t\,\hat{s}_m^{t-1}$; $\quad \tau_p^t \leftarrow \tau_x^t\,N\bar{a}^2/M$, where $\bar{a}^2 \triangleq \|\mathbf{A}\|_F^2/N$ is the average squared column norm
2. $\hat{s}_m^t \leftarrow g_{\text{out}}(y_m, \hat{p}_m^t, \tau_p^t)$; $\quad \tau_s^t \leftarrow -\partial g_{\text{out}}/\partial\hat{p}$
3. $\tau_r^{t+1} \leftarrow (\bar{a}^2\,\tau_s^t)^{-1}$; $\quad \hat{r}_i^{t+1} \leftarrow \hat{x}_i^t + \tau_r^{t+1}(\mathbf{A}^{\mathsf{H}}\hat{\mathbf{s}}^t)_i$
4. $\hat{x}_i^{t+1} \leftarrow g_{\text{in}}(\hat{r}_i^{t+1}, \tau_r^{t+1})$; $\quad$ record $\tau_{x,i}^{t+1}$, $\pi_i^{t+1}$
M-step (closed-form updates):
$\hat{\sigma}^{2\,(k+1)} \leftarrow \frac{1}{M}\big[\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}\|^2 + \sum_i \tau_{x,i}\|\mathbf{A}_{:,i}\|^2\big]$ (bias-corrected; see the theorem below)
$\hat{\rho}^{(k+1)} \leftarrow \frac{1}{N}\sum_i \pi_i$
$\hat{\sigma}_x^{2\,(k+1)} \leftarrow \frac{\sum_i \pi_i(|\hat{x}_i|^2 + \tau_{x,i})}{\sum_i \pi_i}$
end for

In practice, $T_{\text{inner}} = 1$ (interleaved EM) works well and is the most common choice. Interleaving avoids running GAMP to convergence at each EM step, which is expensive and unnecessary.
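The algorithm box treats $g_{\text{out}}$ and $g_{\text{in}}$ as black boxes. Below is one possible realization of both scalar functions for this model, assuming the AWGN likelihood and the Bernoulli-Gaussian prior above; the function names and the real-valued simplification are ours, not from any library.

```python
import numpy as np

def g_out_awgn(y, phat, tau_p, s2):
    """AWGN output function: returns s_hat and tau_s = -d g_out / d phat."""
    return (y - phat) / (tau_p + s2), 1.0 / (tau_p + s2)

def g_in_bg(rhat, tau_r, rho, sx2):
    """Bernoulli-Gaussian input denoiser for r = x + N(0, tau_r).
    Returns posterior mean x_hat, variance tau_x, and inclusion prob pi."""
    v = sx2 + tau_r
    log_odds = (np.log(rho / (1 - rho)) + 0.5 * np.log(tau_r / v)
                + 0.5 * rhat**2 * (1.0 / tau_r - 1.0 / v))
    pi = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -30.0, 30.0)))
    m1 = (sx2 / v) * rhat            # E[x | r, active]
    v1 = (sx2 * tau_r) / v           # var[x | r, active]
    xhat = pi * m1
    tau_x = pi * (v1 + m1**2) - xhat**2
    return xhat, tau_x, pi
```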

Theorem: Closed-Form EM M-Step Updates

Let $\hat{\mathbf{x}} = (\hat{x}_1, \ldots, \hat{x}_N)$, $\boldsymbol{\tau}_x = (\tau_{x,1}, \ldots, \tau_{x,N})$, and $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_N)$ be the posterior statistics from the GAMP E-step. Then the M-step maximizes:

$$Q(\boldsymbol{\theta} \mid \hat{\boldsymbol{\theta}}^{(k)}) \triangleq \mathbb{E}_{q(\mathbf{x})}\left[\log p(\mathbf{y}, \mathbf{x}; \boldsymbol{\theta})\right]$$

under the factored Gaussian approximation $q(\mathbf{x}) = \prod_i \mathcal{CN}(x_i; \hat{x}_i, \tau_{x,i})$. The closed-form solutions are:

$$\hat{\sigma}^{2\,(k+1)} = \frac{1}{M}\left[\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}\|^2 + \sum_i \tau_{x,i}\,\|\mathbf{A}_{:,i}\|^2\right],$$

$$\hat{\rho}^{(k+1)} = \frac{1}{N}\sum_{i=1}^N \pi_i, \qquad \hat{\sigma}_x^{2\,(k+1)} = \frac{\sum_i \pi_i\,(|\hat{x}_i|^2 + \tau_{x,i})}{\sum_i \pi_i}.$$

The noise variance update is a bias-corrected residual: the raw squared residual $\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}\|^2$ underestimates the true noise power because GAMP's estimate $\hat{\mathbf{x}}$ is not exactly the signal. The correction term $\sum_i \tau_{x,i}\|\mathbf{A}_{:,i}\|^2$ accounts for the remaining uncertainty in $\hat{\mathbf{x}}$.
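As a sketch, the M-step can be transcribed almost line-for-line from these formulas; the function below assumes the E-step statistics are available as NumPy arrays, and its name is ours.

```python
import numpy as np

def em_m_step(y, A, xhat, tau_x, pi):
    """Closed-form M-step, transcribed directly from the theorem.
    xhat, tau_x, pi: posterior statistics recorded by the GAMP E-step."""
    M = len(y)
    resid = y - A @ xhat
    col2 = np.sum(np.abs(A)**2, axis=0)             # ||A_{:,i}||^2
    s2 = (np.linalg.norm(resid)**2 + np.sum(tau_x * col2)) / M
    rho = float(np.mean(pi))
    sx2 = np.sum(pi * (np.abs(xhat)**2 + tau_x)) / np.sum(pi)
    return s2, rho, sx2
```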


EM-GAMP Hyperparameter Convergence

Traces of the estimated $\hat{\sigma}^2$, $\hat{\rho}$, and $\hat{\sigma}_x^2$ versus EM outer iteration (left axis), with the NMSE (dB) on the right axis. Dashed horizontal lines mark the true parameter values.

Observe: EM converges within 10–20 iterations to near-oracle NMSE regardless of initialization. When all three hyperparameters are estimated jointly, the noise and sparsity estimates interact; at low SNR they can trade off against each other.


Example: EM-GAMP for Sparse RF Scene with Unknown Parameters

Setup: $N = 400$ scene voxels, $M = 200$ measurements ($\delta = 0.5$), Bernoulli-Gaussian prior with $\rho = 0.10$, $\sigma_x^2 = 1.0$, $\sigma^2 = 0.01$ (SNR $\approx 20$ dB).

Initial guesses: $\hat{\rho}^{(0)} = 0.5$, $\hat{\sigma}_x^{2\,(0)} = 0.5$, $\hat{\sigma}^{2\,(0)} = 0.5$; the sparsity and noise variance are overestimated by $5\times$ and $50\times$, while the signal variance is underestimated by $2\times$.

After running EM-GAMP for 25 outer iterations (with one GAMP step per EM step), report the estimated parameters and the reconstruction NMSE. A minimal implementation sketch follows.
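The end-to-end sketch below implements interleaved EM-GAMP for this setup, under a few stated assumptions: real-valued data instead of complex, scalar (uniform) GAMP variances, and 100 interleaved iterations rather than the 25 quoted above, to be conservative. All names are ours, and exact numbers will vary with the random seed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth, matching the Example setup
N, M = 400, 200
rho_t, sx2_t, s2_t = 0.10, 1.0, 0.01
x_true = np.where(rng.random(N) < rho_t,
                  rng.normal(0.0, np.sqrt(sx2_t), N), 0.0)
A = rng.standard_normal((M, N)) / np.sqrt(M)    # ~unit-norm columns
y = A @ x_true + rng.normal(0.0, np.sqrt(s2_t), M)

# Deliberately wrong initial hyperparameters (as in the prompt)
rho, sx2, s2 = 0.5, 0.5, 0.5

abar2 = np.sum(A**2) / N             # average squared column norm a_bar^2
col2 = np.sum(A**2, axis=0)          # ||A_{:,i}||^2 for the M-step
xhat, shat = np.zeros(N), np.zeros(M)
tau_x = rho * sx2                    # scalar prior variance to start

for k in range(100):                 # interleaved EM: one GAMP step per EM step
    # --- E-step: one GAMP iteration ---
    tau_p = tau_x * N * abar2 / M
    phat = A @ xhat - tau_p * shat                   # Onsager correction
    shat = (y - phat) / (tau_p + s2)                 # g_out for AWGN
    tau_s = 1.0 / (tau_p + s2)
    tau_r = 1.0 / (abar2 * tau_s)
    rhat = xhat + tau_r * (A.T @ shat)
    # Bernoulli-Gaussian denoiser g_in, elementwise
    v = sx2 + tau_r
    lo = (np.log(rho / (1 - rho)) + 0.5 * np.log(tau_r / v)
          + 0.5 * rhat**2 * (1.0 / tau_r - 1.0 / v))
    pi = 1.0 / (1.0 + np.exp(-np.clip(lo, -30.0, 30.0)))
    m1 = (sx2 / v) * rhat                            # E[x | r, active]
    v1 = (sx2 * tau_r) / v                           # var[x | r, active]
    xhat = pi * m1
    tau_xi = pi * (v1 + m1**2) - xhat**2
    tau_x = float(np.mean(tau_xi))
    # --- M-step: closed-form updates ---
    resid = y - A @ xhat
    s2 = (resid @ resid + np.sum(tau_xi * col2)) / M  # bias-corrected
    rho = float(np.clip(np.mean(pi), 0.01, 0.5))      # physical bounds
    # m1, v1 (active-component moments from g_in) play the role of
    # x_hat_i, tau_x_i in the theorem's sigma_x^2 update
    sx2 = np.sum(pi * (m1**2 + v1)) / max(np.sum(pi), 1e-12)

nmse = 10 * np.log10(np.sum((xhat - x_true)**2) / np.sum(x_true**2))
print(f"rho={rho:.3f}  sx2={sx2:.3f}  s2={s2:.4f}  NMSE={nmse:.1f} dB")
```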

Common Mistake: EM Can Converge to Local Optima at Low SNR

Mistake:

The EM objective function $\log p(\mathbf{y}; \boldsymbol{\theta})$ is generally non-convex in $\boldsymbol{\theta}$. Multiple distinct parameter settings can explain the same data:

  • High noise + low sparsity (many small scatterers) vs.
  • Low noise + high sparsity (few strong scatterers).

At SNR $< 5$ dB, EM may converge to the wrong mode, producing a poor estimate.

Correction:

Practical remedies: (1) Multiple restarts: run EM from 3–5 random initializations and select the solution with the highest marginal likelihood. (2) Warm start: initialize $\hat{\sigma}^2$ from the residual energy of a simple matched-filter estimate (see the sketch below). (3) Physical constraints: bound $\rho \in [0.01, 0.5]$ and $\sigma^2 > 0$.
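One possible warm-start heuristic for remedy (2) is sketched below; the 10% support fraction and the hard-thresholded matched filter are illustrative choices of ours, not prescribed by the text.

```python
import numpy as np

def warm_start(y, A, keep_frac=0.10, rho_bounds=(0.01, 0.5)):
    """Initialize (rho, sx2, s2) from a crude matched-filter estimate."""
    N = A.shape[1]
    col2 = np.sum(A**2, axis=0)
    xmf = (A.T @ y) / col2                     # per-voxel matched filter
    k = max(1, int(keep_frac * N))             # keep the strongest voxels
    idx = np.argsort(np.abs(xmf))[-k:]
    x0 = np.zeros(N)
    x0[idx] = xmf[idx]
    resid = y - A @ x0
    s2_0 = float(resid @ resid) / len(y)       # residual energy -> noise floor
    rho_0 = float(np.clip(k / N, *rho_bounds))
    sx2_0 = float(np.mean(xmf[idx]**2))        # energy of retained voxels
    return rho_0, sx2_0, s2_0
```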

Common Mistake: Running Full GAMP to Convergence at Each EM Step Is Wasteful

Mistake:

A common implementation error is to run GAMP to full convergence at each EM outer iteration before updating $\boldsymbol{\theta}$. This is correct in theory but wasteful in practice: at early EM iterations, GAMP iterates in the wrong parameter regime.

Correction:

Use interleaved EM: one GAMP iteration per EM update. This is a generalized (partial) EM scheme that in practice converges at essentially the same rate as full EM for well-behaved problems. The implementation simply nests the M-step inside the GAMP loop, as in the Example sketch above.

⚠️ Engineering Note

EM-GAMP Calibration in Practice

In real RF imaging systems, the noise variance $\sigma^2$ varies with:

  • Transmit power and propagation path loss (accounted for in the link budget).
  • Receiver noise figure (typically 3–10 dB above thermal noise floor).
  • Clutter — returns from unwanted scene elements not in the model.

The sparsity rate $\rho$ depends on the scene type:

  • Urban scenes: $\rho \approx 0.05$–$0.15$.
  • Cluttered environments: $\rho \approx 0.2$–$0.4$.

EM-GAMP adapts to these variations automatically, but its adaptation rate is limited: for rapidly varying clutter, the single-snapshot EM assumption fails. A sliding-window EM that averages M-step updates across consecutive frames improves stability; one such scheme is sketched below.
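A minimal sketch of such a sliding-window scheme, assuming an exponential forgetting factor (the text does not specify a window shape):

```python
# Exponentially weighted M-step across frames: blend each new frame's
# closed-form update into a running estimate instead of replacing it.
def sliding_m_step(theta_run, theta_frame, beta=0.9):
    """theta_* = (rho, sx2, s2); beta near 1 means slow adaptation."""
    return tuple(beta * r + (1.0 - beta) * f
                 for r, f in zip(theta_run, theta_frame))

# Usage (with em_m_step from the theorem sketch above):
#   theta = sliding_m_step(theta, em_m_step(y_frame, A, xhat, tau_x, pi))
```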

Practical Constraints
  • Noise floor estimation requires at least $M > 2N\rho$ measurements (well-posed EM)

  • EM convergence typically requires 15–30 outer iterations for SNR > 10 dB

  • Interleaved EM adds < 5% overhead to GAMP runtime (the M-step is $O(N)$)

EM Objective Landscape: $\log p(\mathbf{y}; \rho, \sigma^2)$

Contour plot of the approximate marginal log-likelihood as a function of sparsity $\rho$ and noise variance $\sigma^2$, with the true parameter marked by a star. The landscape is generally unimodal at high SNR but develops spurious local maxima at low SNR.

Observe how the objective elongates into a ridge (the noise-sparsity tradeoff) as the SNR decreases.


Quick Check

In EM-GAMP, the M-step update for $\hat{\rho}$ (sparsity) is:

  • $\frac{1}{N}\sum_i \hat{x}_i^2$

  • $\frac{1}{N}\sum_i \pi_i$

  • $\|\hat{\mathbf{x}}\|_0 / N$

  • $\frac{1}{M}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}\|^2$

Key Takeaway

EM-GAMP wraps GAMP in an EM loop: the E-step runs GAMP to compute posterior means $\hat{x}_i$, variances $\tau_{x,i}$, and inclusion probabilities $\pi_i$; the M-step updates $\sigma^2$, $\rho$, and $\sigma_x^2$ in closed form. Interleaving (one GAMP iteration per EM step) is efficient and achieves near-oracle NMSE, typically within 0.1 dB, from arbitrary initialization, eliminating manual tuning entirely.

🎓 CommIT Contribution (2014)

EM-Based Hyperparameter Estimation for RF Imaging

P. Schniter, S. Rangan, G. Caire, IEEE Transactions on Signal Processing, vol. 63, no. 4, pp. 1043–1055

The parametric GAMP framework, of which EM-GAMP is the principal instance, was developed to provide consistent hyperparameter estimates alongside the signal estimate. The analysis shows that in the large-system limit ($M, N \to \infty$ with $M/N \to \delta$), the EM-GAMP parameter estimates converge to the true values whenever the true parameters are identifiable. This result provides the theoretical foundation for the self-tuning RF imaging pipeline described in this chapter.
