Exercises
ex-ch19-01
Easy (Gaussian GAMP reduces to AMP)
(a) Derive the GAMP output function $g_{\text{out}}$ for the Gaussian likelihood $p(y \mid z) = \mathcal{N}(y; z, \sigma^2)$.
(b) Show that with i.i.d. Gaussian $A$ (entries $\mathcal{N}(0, 1/M)$), the GAMP update simplifies to the AMP iteration from Chapter 17.
(c) Implement both algorithms in Python. Run on a sparse recovery problem (dimensions of your choice, SNR = 25 dB). Verify that the iterates match to numerical precision (vanishing NMSE difference).
The posterior is Gaussian. Compute its mean by combining two Gaussian factors.
For i.i.d. Gaussian $A$ with entries of variance $1/M$, the output linear step gives $\tau_p = \tau_x / \delta$, where $\delta = M/N$ is the sampling ratio.
The Onsager correction term in the GAMP output linear step is the key difference from naive substitution.
Derive the posterior
Combining the likelihood $p(y \mid z) = \mathcal{N}(y; z, \sigma^2)$ with the Gaussian message $\mathcal{N}(z; \hat{p}, \tau_p)$ gives a Gaussian posterior on $z$ with mean and variance:
$\hat{z} = \frac{\tau_p\, y + \sigma^2 \hat{p}}{\sigma^2 + \tau_p}, \qquad \tau_z = \frac{\sigma^2 \tau_p}{\sigma^2 + \tau_p}.$
Compute $g_{\text{out}}$
$g_{\text{out}}(y, \hat{p}, \tau_p) = \frac{\hat{z} - \hat{p}}{\tau_p} = \frac{y - \hat{p}}{\sigma^2 + \tau_p}.$
Identify the AMP residual
In AMP, the output step computes the Onsager-corrected residual $y - A\hat{x}^t$ (plus the correction term) scaled by $1/(\sigma^2 + \tau_p)$. With the Gaussian $g_{\text{out}}$ above, the GAMP update matches this exactly.
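A minimal numerical sketch of this output function (the function name and signature are illustrative, not the chapter's reference code):

import numpy as np

def g_out_gaussian(y, phat, tau_p, sig2):
    # GAMP output function for p(y | z) = N(y; z, sig2)
    g = (y - phat) / (sig2 + tau_p)
    neg_dg = 1.0 / (sig2 + tau_p)  # -d g_out / d phat
    return g, neg_dg

Substituting this $g_{\text{out}}$ into the output linear step reproduces the AMP residual update of part (b).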
ex-ch19-02
Medium (EM-GAMP for unknown noise variance)
(a) Generate a sparse recovery problem with unknown noise variance. Initialize $\hat{\sigma}^2$ at a $100\times$ overestimate of the true value.
(b) Implement the EM update for $\sigma^2$: $\hat{\sigma}^2 \leftarrow \frac{1}{M}\|y - A\hat{x}\|^2$.
(c) Run EM-GAMP for 50 outer iterations (1 GAMP step per EM step). Plot $\hat{\sigma}^2$ and the reconstruction NMSE (dB) vs. the EM iteration index.
(d) Repeat with $\hat{\sigma}^2$ initialized at a $100\times$ underestimate. Does EM converge from both sides?
(e) Compare the final NMSE of EM-GAMP with (i) oracle GAMP (true $\sigma^2$) and (ii) GAMP with a fixed $10\times$ overestimate of $\sigma^2$.
The M-step update for $\sigma^2$ is the average squared residual: a method-of-moments estimator for the noise power.
Monitor convergence by tracking the relative change in $\hat{\sigma}^2$ between EM iterations.
Both initializations should converge to the same fixed point; EM guarantees non-decreasing marginal likelihood regardless of starting point.
Implementation sketch
import numpy as np

hat_sig2 = 1.0  # noise-variance initialization (deliberate overestimate)
for k in range(50):
    # E-step: one GAMP pass with the current noise-variance estimate
    hat_x = gamp_step(A, y, rho, sig_x2, hat_sig2)  # inner GAMP solver
    # M-step: method-of-moments update of the noise power
    residual = y - A @ hat_x
    hat_sig2 = np.sum(residual**2) / M
Expected behavior
From above (overestimate): $\hat{\sigma}^2$ decreases rapidly in the first 5–10 steps as the reconstruction improves. From below (underestimate): $\hat{\sigma}^2$ increases as the overly aggressive denoiser introduces artifacts. Both converge to the same fixed point within roughly 30 steps.
NMSE comparison
Oracle GAMP attains the best NMSE; converged EM-GAMP lands within a small gap of the oracle; mismatched GAMP with the fixed $10\times$ overestimate pays roughly a 5 dB penalty.
ex-ch19-03
Medium (GAMP for 1-bit compressed sensing)
(a) Generate a sparse signal and 1-bit measurements $y = \operatorname{sign}(Ax + d)$, where $d$ is the dither (i.i.d. $\mathcal{N}(0, \sigma_d^2)$).
(b) Implement the probit output function $g_{\text{out}}(y, \hat{p}, \tau_p) = \dfrac{y\, \phi(u)}{\Phi(y u)\, \sqrt{\sigma_d^2 + \tau_p}}$, where $u = \hat{p} / \sqrt{\sigma_d^2 + \tau_p}$.
(c) Run GAMP with this output function. Compare with (i) treating $y$ as a continuous Gaussian measurement (mismatched), (ii) BIHT (Binary Iterative Hard Thresholding).
(d) Sweep the oversampling ratio $M/N$ from 0.5 to 5.0. At what oversampling ratio does 1-bit GAMP reach the target NMSE?
(e) How much extra oversampling is needed compared to standard CS (full-precision measurements at the same SNR)?
Use scipy.special.ndtr for the standard normal CDF $\Phi$ and scipy.special.ndtri for its inverse.
The Mills-type ratio $\phi(u)/\Phi(u)$ is numerically stable when computed via scipy.special.erfcx (see the stable variant after the code below).
Add a small constant (e.g., $10^{-10}$) to the denominator to prevent division by zero in the deep tail.
Probit output function
import numpy as np
from scipy.special import ndtr
from scipy.stats import norm

def g_out_probit(y, phat, tau_p, sig_d=0.05):
    denom = np.sqrt(sig_d**2 + tau_p)
    u = phat / denom
    # phi(u) / Phi(y*u), floored to avoid division by zero in the deep tail
    ratio = np.exp(norm.logpdf(u) - np.log(ndtr(y * u) + 1e-10))
    return y * ratio / denom
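For deep-tail stability without the $10^{-10}$ floor, the ratio $\phi(u)/\Phi(yu)$ can be computed via erfcx, using the identity $\Phi(t) = \tfrac{1}{2} e^{-t^2/2}\, \mathrm{erfcx}(-t/\sqrt{2})$; a sketch assuming $y \in \{-1, +1\}$ (helper name illustrative):

import numpy as np
from scipy.special import erfcx

def mills_stable(y, u):
    # phi(u) / Phi(y*u) with no underflow: the exp(-u^2/2) factors cancel
    return np.sqrt(2.0 / np.pi) / erfcx(-y * u / np.sqrt(2.0))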
Phase transition comparison
Standard CS (Gaussian measurements) reaches the target NMSE at a substantially smaller oversampling ratio than 1-bit GAMP; the ratio between the two thresholds is the quantization penalty asked for in part (e), and its exact value depends on the sparsity and SNR chosen in part (a).
ex-ch19-04
Medium (EM-GAMP with all parameters unknown)
Implement EM-GAMP for simultaneous estimation of $(\rho, \sigma_x^2, \sigma^2)$.
(a) Choose the problem dimensions, set true values for $(\rho, \sigma_x^2, \sigma^2)$, and initialize each parameter well away from its true value.
(b) Plot all three parameter estimates vs. EM iteration (30 total), alongside the true values.
(c) Run 20 random initializations. Plot the distribution of final NMSE across runs. Do all initializations converge to the same solution?
(d) Identify the SNR threshold below which EM-GAMP fails to estimate the parameters correctly. Test a range of SNRs (the solution below references 5 and 20 dB).
The sparsity update requires computing the posterior inclusion probability from the BG denoiser output.
At low SNR, the noise and sparsity parameters trade off: high noise + high sparsity and low noise + low sparsity can have similar likelihoods.
For the multi-initialization experiment, draw the initial $\hat{\rho}$ uniformly from a plausible interval and the initial $\hat{\sigma}^2$ uniformly on a log scale.
M-step update for all parameters
# pi_post: posterior inclusion probabilities from the BG denoiser (E-step)
hat_rho = np.mean(pi_post)  # sparsity
hat_sigx2 = np.sum(pi_post * (hat_x**2 + tau_x)) / np.sum(pi_post)  # signal variance
hat_sig2 = np.sum((y - A @ hat_x)**2) / M  # noise variance
Expected convergence behavior
At SNR = 20 dB: all three parameters converge within 15–20 iterations. Final estimates within 10% of true values; NMSE within 0.3 dB of oracle.
At SNR = 5 dB: parameter estimation is unreliable; EM may converge to a solution with $\hat{\rho} \approx 0$ (all noise, no signal) or $\hat{\rho} \approx 1$ (dense signal, high noise). Multiple restarts are required.
ex-ch19-05
Hard (State evolution for GAMP)
(a) Implement the GAMP state evolution (SE) recursion for the Gaussian likelihood with Bernoulli-Gaussian prior. For i.i.d. Gaussian $A$ (entries $\mathcal{N}(0, 1/M)$):
$\tau_p^t = \tau_x^t / \delta, \qquad \tau_r^t = \sigma^2 + \tau_p^t, \qquad \tau_x^{t+1} = \mathbb{E}\big[(g_{\text{in}}(X + \sqrt{\tau_r^t}\, Z;\, \tau_r^t) - X)^2\big],$
where the expectation is over $X \sim p_X$ and $Z \sim \mathcal{N}(0, 1)$.
(b) Implement GAMP SE for the probit (1-bit) output channel. Predict MSE vs. the oversampling ratio $M/N$ for 1-bit CS at a fixed sparsity.
(c) Run actual GAMP at finite problem sizes and compare with SE predictions. How accurately does SE predict the empirical MSE?
(d) Use GAMP SE to compute the phase transition for 1-bit CS: the minimum $M/N$ that reaches the target NMSE, as a function of sparsity $\rho$.
(e) Compare the 1-bit phase transition with the standard (full-precision) phase transition. What is the "quantization penalty" in required measurements?
The SE expectation over $X$ requires numerical integration: the BG distribution is a point mass at zero with weight $1 - \rho$ plus a Gaussian $\mathcal{N}(0, \sigma_x^2)$ with weight $\rho$.
For the probit output, the SE expectation involves a 2-D Gaussian integral; use Gauss-Hermite quadrature.
SE becomes more accurate as the problem size grows; at moderate dimensions, expect SE to predict the empirical MSE to within a small dB margin.
Gaussian SE implementation
import numpy as np

def bg_denoiser(r, tau_r, rho, sig_x2):
    # Posterior mean of X ~ (1 - rho) * delta_0 + rho * N(0, sig_x2)
    # given the pseudo-observation r = X + N(0, tau_r)
    s2 = sig_x2 + tau_r
    odds = (rho / (1 - rho)) * np.sqrt(tau_r / s2) \
        * np.exp(0.5 * r**2 * sig_x2 / (s2 * tau_r))
    pi = odds / (1 + odds)  # posterior inclusion probability
    return pi * (sig_x2 / s2) * r

def gamp_se_gaussian(rho, sig_x2, sig2, delta, n_iter=50, n_mc=5000):
    tau_x = rho * sig_x2  # initial variance (prior variance of X)
    for t in range(n_iter):
        tau_p = tau_x / delta  # entries of A ~ N(0, 1/M)
        tau_r = sig2 + tau_p   # effective denoiser noise level
        # E[(g_in(X + sqrt(tau_r)*Z) - X)^2] by Monte Carlo
        X0 = (np.random.rand(n_mc) < rho) * np.random.randn(n_mc) * np.sqrt(sig_x2)
        Z = np.random.randn(n_mc)
        r = X0 + np.sqrt(tau_r) * Z
        hat_x = bg_denoiser(r, tau_r, rho, sig_x2)  # BG denoiser output
        tau_x = np.mean((hat_x - X0)**2)
    return tau_x
Phase transition
At a fixed sparsity, 1-bit CS reaches the target NMSE only at a larger $M/N$ than standard CS; the ratio between the two thresholds gives the quantization penalty, i.e., the factor of extra measurements required for 1-bit sensing.
ex-ch19-06
Hard (Multi-layer inference for dictionary learning)
(a) Generate a dictionary $D \in \mathbb{R}^{50 \times 100}$ (i.i.d. Gaussian, column-normalized) and $L$ sparse coefficient vectors (i.i.d. BG). Observations: $Y = DS + W$.
(b) Implement simplified BiG-AMP: alternate between (i) GAMP to estimate $S$ given $D$, (ii) a gradient step to update $D$ given $S$.
(c) Initialize with a random $\hat{D}$ (column-normalized Gaussian). Run for 100 outer iterations. Plot the dictionary recovery error vs. iteration.
(d) Vary $L$ from 50 to 500. How many observations are needed for reliable dictionary recovery (small worst-case column error)?
(e) Compare with K-SVD on the same problem.
Dictionary recovery requires solving a permutation alignment problem: use the Hungarian algorithm (scipy.optimize.linear_sum_assignment); a sketch follows these hints.
The gradient step for $D$ is $D \leftarrow D - \eta\, (DS - Y)\, S^{\mathsf{T}}$; normalize columns after each step.
When too few observations are available, dictionary recovery is information-theoretically impossible; the empirical threshold found in part (d) is a practical guide.
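A sketch of the permutation/sign alignment, assuming unit-norm columns (the helper name is illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

def dict_column_error(D_true, D_hat):
    # Match estimated columns to true columns (up to permutation and sign)
    # by maximizing total absolute correlation, then report the worst column.
    C = np.abs(D_true.T @ D_hat)
    rows, cols = linear_sum_assignment(-C)  # Hungarian algorithm
    errs = []
    for i, j in zip(rows, cols):
        s = np.sign(D_true[:, i] @ D_hat[:, j])
        errs.append(np.linalg.norm(D_true[:, i] - s * D_hat[:, j]))
    return max(errs)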
BiG-AMP outer loop
import numpy as np

hat_D = np.random.randn(50, 100)
hat_D /= np.linalg.norm(hat_D, axis=0)  # normalize columns
for k in range(100):
    # E-step: estimate the sparse codes S given the current dictionary
    hat_S = gamp_sparse(hat_D, Y, rho=0.1, sig2=0.01)  # column-wise GAMP solver
    # M-step: gradient step on D, then re-normalize columns
    residual = hat_D @ hat_S - Y
    hat_D -= 0.01 * residual @ hat_S.T
    hat_D /= np.linalg.norm(hat_D, axis=0)
Required observations
Empirically, reliable recovery sets in once the number of observations $L$ is a modest multiple of $N$. This is consistent with a degrees-of-freedom count: the dictionary has $MN$ unknown parameters, and each observation supplies $M$ equations while also having to determine its own sparse code, so $L$ must exceed $N$ by a comfortable margin.
ex-ch19-07
Hard (EM-GAMP for RF imaging with unknown noise)
Build a complete EM-GAMP pipeline for RF imaging with a physical sensing matrix.
(a) System: a MIMO array with multiple transmitters, receivers, and frequencies. Sensing matrix $A$: a frequency-domain MIMO steering matrix (not i.i.d. Gaussian). Scene: a 2-D grid with a Bernoulli-Gaussian prior.
(b) EM-GAMP: Implement with the Gaussian likelihood, learning $(\rho, \sigma_x^2, \sigma^2)$. Note: since $A$ is structured (not i.i.d. Gaussian), use VAMP (Chapter 18) as the inner solver rather than AMP.
(c) Comparison: Test (i) EM-GAMP, (ii) GAMP oracle (true parameters), (iii) GAMP cross-validated, (iv) FISTA with a cross-validated regularization weight $\lambda$.
(d) Noise robustness: Sweep the SNR from 5 to 20 dB. Plot NMSE and parameter estimation error vs. SNR for all methods.
(e) Analysis: At what SNR does EM-GAMP gain more than 3 dB over mismatched GAMP? What is the dominant source of performance degradation at low SNR?
The frequency-domain MIMO steering matrix has the Kronecker structure $A = A_1 \otimes A_2$; use the Kronecker VAMP from Chapter 18.
Cross-validation for GAMP requires splitting the measurements and evaluating held-out log-likelihood, which is computationally expensive.
At low SNR, the EM fixed point may correspond to the trivial solution $\hat{\rho} \approx 0$, with all measurement energy absorbed into $\hat{\sigma}^2$.
Kronecker VAMP as inner solver
For the physical sensing matrix $A$, use the VAMP SVD trick: precompute the SVD $A = U S V^{\mathsf{T}}$ once; the VAMP linear estimator then reduces to diagonal operations on the pre-rotated measurements $\tilde{y} = U^{\mathsf{T}} y$.
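For a Kronecker-structured $A$, the SVD itself factorizes, so the precomputation stays cheap; a sketch with illustrative sizes:

import numpy as np

# A1 kron A2 = (U1 kron U2) (S1 kron S2) (V1 kron V2)^T, up to ordering
A1 = np.random.randn(8, 16)
A2 = np.random.randn(12, 24)
U1, s1, V1t = np.linalg.svd(A1, full_matrices=False)
U2, s2, V2t = np.linalg.svd(A2, full_matrices=False)
s_kron = np.outer(s1, s2).ravel()  # all singular values of A1 kron A2

# The VAMP rotation U^T y is applied factor-by-factor:
y = np.random.randn(8 * 12)
y_rot = (U1.T @ y.reshape(8, 12) @ U2).ravel()  # equals (U1 kron U2)^T y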
Expected results
At SNR = 20 dB: EM-GAMP matches oracle GAMP to within 0.3 dB. At SNR = 10 dB: EM-GAMP gains several dB over FISTA (a correct noise model vs. a cross-validated $\ell_1$ penalty). At SNR = 5 dB: EM-GAMP may fail; use multiple restarts or bound $\hat{\sigma}^2$ away from degenerate values.
ex-ch19-08
Challenge (Poisson GAMP for photon-limited sensing)
(a) Model a photon-counting RF/optical sensing system: $y_m \sim \mathrm{Poisson}(\lambda_m)$ with $\lambda_m = \exp(z_m) + b$, where $z = Ax$ and $b \ge 0$ is the background intensity.
(b) Derive the Poisson output function $g_{\text{out}}$ via a Laplace approximation to the posterior on $z$, and implement it.
(c) Generate a non-negative sparse signal and Poisson measurements. Run GAMP with a Bernoulli-Exponential prior (non-negative support).
(d) Compare with (i) standard GAMP treating $y$ as Gaussian with variance equal to the true mean $\lambda_m$ (known-mean approximation), (ii) standard GAMP treating $y$ as Gaussian with variance $y_m$ (empirical approximation).
(e) Vary the background intensity $b$ from 0 to 3. At what background level does the Gaussian approximation become adequate (negligible NMSE gap)?
The Laplace approximation to the Poisson posterior is accurate when the mean count is large (many photons).
The Bernoulli-Exponential denoiser (non-negative analogue of BG): the posterior mean under a Gaussian pseudo-observation has a closed form involving the complementary error function.
At high background (many photons per measurement), the Poisson distribution approaches a Gaussian; at low background (few photons), shot noise dominates and the Gaussian approximation breaks down.
Poisson output function
import numpy as np

def g_out_poisson(y, phat, tau_p, b=1.0):
    # Mean count under z ~ N(phat, tau_p): E[exp(z)] + additive background b
    lambda_hat = np.exp(phat + tau_p / 2) + b
    z_post = phat + tau_p * (y - lambda_hat)  # first-order posterior update
    return (z_post - phat) / tau_p  # g_out = y - lambda_hat
Phase transition for background
At zero background, Poisson GAMP outperforms the Gaussian approximations by a wide margin. As the background (and hence the average photon count) grows, the gap shrinks, and at high background the Gaussian approximation becomes adequate.
ex-ch19-09
Easy (Identifying valid GAMP output channels)
For each of the following measurement models, state whether GAMP can be applied directly (i.e., whether $g_{\text{out}}$ can be computed in closed form or efficiently approximated) and write the key formula for $g_{\text{out}}$:
(a) $y = z + w$, $w \sim \mathcal{N}(0, \sigma^2)$ (additive Gaussian).
(b) $y \in \{0, 1\}$ with $\Pr(y = 1 \mid z) = \sigma(z)$, where $\sigma(\cdot)$ is the logistic function.
(c) $y \sim \mathrm{Poisson}(\exp(z))$.
(d) $y = |z|^2$ (noiseless power measurement).
(e) $p(y \mid z)$ is a Gaussian mixture: $p(y \mid z) = \sum_{\ell} \alpha_\ell\, \mathcal{N}(y; z, \sigma_\ell^2)$.
For (d), the posterior is concentrated on a circle in the complex plane, so an approximation is required.
For (e), the posterior is a weighted sum of Gaussians and is computable in closed form.
Answers
(a) Gaussian: $g_{\text{out}} = \dfrac{y - \hat{p}}{\sigma^2 + \tau_p}$. Exact.
(b) Logistic (binary): no exact closed form; use the probit approximation (the logistic CDF is well approximated by a probit with matched slope, $\sigma(x) \approx \Phi(x / 1.702)$).
(c) Poisson: $g_{\text{out}}$ via Laplace approximation (exact in the large-count limit).
(d) Power-only (phaseless): requires a truncated complex Gaussian approximation. $g_{\text{out}}$ involves a ratio of modified Bessel functions multiplied by a phase correction term. Exact computation requires numerical integration.
(e) Gaussian mixture: $g_{\text{out}} = \sum_{\ell} \beta_\ell\, \dfrac{y - \hat{p}}{\sigma_\ell^2 + \tau_p}$, where $\beta_\ell \propto \alpha_\ell\, \mathcal{N}(y; \hat{p}, \sigma_\ell^2 + \tau_p)$ are the posterior component responsibilities. Exact.
ex-ch19-10
Easy (EM-GAMP M-step derivation)
Starting from the Q-function $Q(\theta; \theta^t) = \mathbb{E}\big[\log p(y, x; \theta) \mid y; \theta^t\big]$,
where the expectation is taken under the GAMP-approximated posterior $p(x \mid y; \theta^t)$:
(a) Derive the M-step update for the noise variance $\sigma^2$.
(b) Show that $\mathbb{E}\,\|y - Ax\|^2 = \|y - A\hat{x}\|^2 + \sum_n \|a_n\|^2\, \tau_{x,n}$.
(c) For a Bernoulli-Gaussian prior, show that the M-step for $\rho$ gives $\hat{\rho} = \frac{1}{N} \sum_n \pi_n$, where $\pi_n$ is the posterior inclusion probability from the GAMP E-step.
For (a): differentiate $Q$ with respect to $\sigma^2$ and set to zero.
For (b): use $y - Ax = (y - A\hat{x}) - A(x - \hat{x})$ and expand the squared norm.
For (c): write $\log p(x_n; \rho)$ in terms of the inclusion indicator, take the expected value under the posterior, and differentiate with respect to $\rho$.
M-step for noise variance
$\hat{\sigma}^2 = \frac{1}{M}\, \mathbb{E}\,\|y - Ax\|^2 = \frac{1}{M} \Big( \|y - A\hat{x}\|^2 + \sum_{n} \|a_n\|^2\, \tau_{x,n} \Big).$
Bias correction
The cross term vanishes since $\mathbb{E}[x - \hat{x}] = 0$ under the posterior (independence approximation); see the expansion below.
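A worked expansion for part (b), under the GAMP posterior approximation ($\mathbb{E}[x] = \hat{x}$, $\mathrm{Cov}(x) \approx \mathrm{diag}(\tau_x)$):
$\mathbb{E}\,\|y - Ax\|^2 = \|y - A\hat{x}\|^2 - 2\,(y - A\hat{x})^{\mathsf{T}} A\, \mathbb{E}[x - \hat{x}] + \mathbb{E}\,\|A(x - \hat{x})\|^2 = \|y - A\hat{x}\|^2 + \sum_{n=1}^{N} \|a_n\|^2\, \tau_{x,n},$
where the middle term is zero and the last step treats the coordinates of $x - \hat{x}$ as uncorrelated.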
M-step for sparsity
Each coefficient contributes $\pi_n \log \rho + (1 - \pi_n) \log(1 - \rho)$ (plus terms independent of $\rho$) to the Q-function.
Summing over $n$ and setting the derivative with respect to $\rho$ to zero:
$\hat{\rho} = \frac{1}{N} \sum_{n=1}^{N} \pi_n.$
ex-ch19-11
Medium (GAMP damping and convergence)
GAMP's convergence can be improved by damping: replacing the update $\hat{x}^{t+1} \leftarrow \hat{x}^{t+1}_{\text{new}}$ with the convex combination $\hat{x}^{t+1} \leftarrow \beta\, \hat{x}^{t+1}_{\text{new}} + (1 - \beta)\, \hat{x}^{t}$ for a damping factor $\beta \in (0, 1]$.
(a) Implement GAMP with a damping parameter $\beta$ for the 1-bit probit output channel.
(b) Run for 50 iterations on a 1-bit sparse recovery problem. Test several damping values, including $\beta = 1$ (undamped).
(c) Plot the residual vs. iteration for each $\beta$. Which value gives the best convergence?
(d) Explain why undamped GAMP ($\beta = 1$) can diverge for the probit model but not for the Gaussian model.
The Gaussian model has a unique Bethe free energy minimum; the probit model may have multiple local minima, causing oscillation without damping.
The optimal $\beta$ depends on the spectral radius of the linearized GAMP iteration; for practical purposes, a moderately damped update is a safe default. A minimal damped-update sketch follows.
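A minimal sketch of the damped update (the value $\beta = 0.7$ is only an example):

import numpy as np

def damp(new, old, beta):
    # Convex combination of the new estimate and the previous iterate;
    # beta = 1 recovers the undamped iteration.
    return beta * np.asarray(new) + (1.0 - beta) * np.asarray(old)

# inside the GAMP loop (sketch):
#   x_new = g_in(r, tau_r)
#   x_hat = damp(x_new, x_hat, beta=0.7)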
Why undamped GAMP oscillates for probit
The probit $g_{\text{out}}$ has large derivatives near the decision boundary (where the sign flips). For the Gaussian model, the derivative is bounded by $1/(\sigma^2 + \tau_p)$. Without damping, the Jacobian of the linearized GAMP iteration can have spectral radius greater than 1 for the probit model at small dither, causing divergence.
Optimal damping
For 1-bit GAMP: moderate damping converges in the fewest iterations; $\beta = 1$ oscillates and may not converge; heavy damping is stable but converges slowly (50+ iterations needed).
ex-ch19-12
Medium (Multi-layer model: dimensionality and reconstruction)
Consider a two-layer generative model: $x = D\, \mathrm{ReLU}(c)$, where $D$ is a known dictionary and $c \in \mathbb{R}^K$ is a latent code with a Gaussian prior.
(a) For given dimensions $(N, M, K)$: can the latent code $c$ be recovered? State the information-theoretic condition.
(b) Generate $c$, compute $x$, and observe $y = Ax + w$ with Gaussian noise. Attempt recovery by: (i) optimizing $c$ to minimize $\|y - A D\, \mathrm{ReLU}(c)\|^2$, (ii) using standard GAMP on the full scene $x$ (ignores the generative structure).
(c) Compare NMSE for both methods. Explain the difference.
(d) What happens when $M < K$? When $M \gg K$?
For (a): $c$ has $K$ degrees of freedom; each measurement constrains one linear function of $x$, not of $c$ directly, so roughly $M \gtrsim K$ measurements are needed.
For (b)(i): use gradient descent with backpropagation through the ReLU (see the sketch below).
When $M > K$, the latent code can be recovered because the effective problem dimension is $K$, not $N$.
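A minimal gradient-descent sketch for part (b)(i), assuming the model $x = D\, \mathrm{ReLU}(c)$ from the statement (function name, step size, and iteration count are illustrative):

import numpy as np

def latent_descent(y, A, D, K, steps=2000, lr=0.05):
    # Minimize ||y - A @ D @ relu(c)||^2 over the latent code c,
    # backpropagating through the ReLU by hand.
    rng = np.random.default_rng(0)
    c = 0.1 * rng.standard_normal(K)  # random start avoids the all-zero dead point
    AD = A @ D
    for _ in range(steps):
        s = np.maximum(c, 0.0)             # ReLU(c)
        r = AD @ s - y                     # residual
        grad = 2.0 * (AD.T @ r) * (c > 0)  # chain rule through the ReLU
        c -= lr * grad
    return c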
Information-theoretic condition
Recovery of $c$ from $M$ measurements of $x$ requires roughly $M \gtrsim K$ (matching degrees of freedom). For a Gaussian prior on $c$, this is sufficient with overwhelming probability when $A$ and $D$ are both i.i.d. Gaussian. In the setup of part (a), however, $M < K$, so recovery is impossible without additional structure.
Corrected setup with $M > K$
With $M > K$: latent-code recovery is feasible. Gradient descent on $c$ (latent-space optimization) achieves low NMSE; standard GAMP on the full $N$-dimensional $x$ fails, since $M$ is below the CS phase transition for a non-sparse $x$.
ex-ch19-13
Medium (GAMP vs VAMP for physical sensing matrices)
A practical RF imaging system uses a MIMO sensing matrix with Kronecker structure $A = A_1 \otimes A_2$ (not i.i.d. Gaussian).
(a) Explain why standard GAMP state evolution does not apply to this matrix.
(b) Generate a Kronecker sensing matrix. Run (i) AMP, (ii) VAMP on the same sparse recovery problem (SNR = 20 dB).
(c) Plot NMSE vs. iteration for AMP and VAMP. Which converges faster and why?
(d) For this matrix, apply EM-GAMP (using VAMP as inner solver). Does the EM M-step still apply in the same form?
(e) What is the computational saving of Kronecker VAMP (FFT-based) vs. dense SVD-based VAMP at the problem sizes used in part (b)?
GAMP SE assumes i.i.d. Gaussian ; Kronecker matrices have correlated entries and structured singular values.
VAMP corrects for the structured matrix by applying the exact LMMSE estimator at each step (using the SVD of ).
For (e): the dense SVD costs $O(MN \min(M, N))$ once; the Kronecker-factored approach needs only operations on the small factors per iteration (see the sketch below).
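A numerical check of the factored matrix-vector product that underlies the saving (sizes are illustrative):

import numpy as np

M1, N1, M2, N2 = 8, 16, 12, 24
A1 = np.random.randn(M1, N1)
A2 = np.random.randn(M2, N2)
x = np.random.randn(N1 * N2)

# Dense product: the full (M1*M2) x (N1*N2) matrix must be formed
y_dense = np.kron(A1, A2) @ x

# Factored product: (A1 kron A2) x = vec(A1 X A2^T), X = reshape(x)
y_fact = (A1 @ x.reshape(N1, N2) @ A2.T).ravel()

assert np.allclose(y_dense, y_fact)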
Why AMP fails for Kronecker matrices
AMP's Onsager correction relies on the concentration properties of i.i.d. random matrices. For Kronecker matrices, the singular values are products $\sigma_i(A_1)\, \sigma_j(A_2)$, which concentrate near zero in large dimensions, violating the concentration assumption.
M-step validity
The EM M-step updates for $(\rho, \sigma_x^2, \sigma^2)$ depend only on the posterior statistics $(\hat{x}, \tau_x)$, which are correctly computed by VAMP regardless of the matrix structure. The M-step is therefore identical for GAMP and VAMP inner solvers.
ex-ch19-14
Hard (EM for Gaussian mixture prior)
The Gaussian mixture (GM) prior generalizes BG:
$p(x_n) = (1 - \rho)\, \delta(x_n) + \rho \sum_{\ell=1}^{L} \omega_\ell\, \mathcal{N}(x_n; \mu_\ell, \sigma_\ell^2), \qquad \sum_{\ell} \omega_\ell = 1.$
(a) Derive the EM M-step for the mixture weights $\omega_\ell$, means $\mu_\ell$, and variances $\sigma_\ell^2$.
(b) Implement EM-GM-GAMP with $L = 3$ components for a sparse signal whose non-zero entries are drawn from a Gaussian mixture (two Gaussians at $\pm\mu$ plus a small cluster near zero).
(c) Compare with EM-BG-GAMP (Bernoulli-Gaussian, effectively $L = 1$). Which achieves lower NMSE when the prior is correctly specified?
(d) What happens when EM-BG-GAMP is applied to data from a GM prior (prior mismatch)? Quantify the NMSE penalty.
The GM E-step computes the posterior responsibility $\beta_{n\ell}$: the probability that $x_n$ was drawn from component $\ell$.
M-step for $\mu_\ell$: weighted mean of the cavity values $r_n$ with weights $\beta_{n\ell}$.
M-step for $\sigma_\ell^2$: weighted variance of $r_n$ about $\mu_\ell$ minus the cavity variance $\tau_r$ (moment matching).
GM posterior responsibility
Given a Gaussian cavity $r_n = x_n + \mathcal{N}(0, \tau_r)$, the posterior responsibility of component $\ell$ is:
$\beta_{n\ell} = \frac{\omega_\ell\, \mathcal{N}(r_n; \mu_\ell, \sigma_\ell^2 + \tau_r)}{\sum_{\ell'} \omega_{\ell'}\, \mathcal{N}(r_n; \mu_{\ell'}, \sigma_{\ell'}^2 + \tau_r)}.$
M-step updates
$\hat{\omega}_\ell = \frac{1}{N} \sum_n \beta_{n\ell}, \qquad \hat{\mu}_\ell = \frac{\sum_n \beta_{n\ell}\, r_n}{\sum_n \beta_{n\ell}}, \qquad \hat{\sigma}_\ell^2 = \frac{\sum_n \beta_{n\ell}\, (r_n - \hat{\mu}_\ell)^2}{\sum_n \beta_{n\ell}} - \tau_r.$
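A sketch of the E- and M-steps under the moment-matching convention above (function names are illustrative):

import numpy as np
from scipy.stats import norm

def gm_responsibilities(r, tau_r, w, mu, sig2):
    # beta[n, l]: posterior probability that x_n came from component l,
    # given the Gaussian cavity r_n = x_n + N(0, tau_r)
    like = np.stack([norm.pdf(r, loc=mu[l], scale=np.sqrt(sig2[l] + tau_r))
                     for l in range(len(w))], axis=1)
    beta = np.asarray(w) * like
    return beta / beta.sum(axis=1, keepdims=True)

def gm_m_step(r, tau_r, beta):
    Nl = beta.sum(axis=0)  # effective counts per component
    w = Nl / len(r)
    mu = (beta * r[:, None]).sum(axis=0) / Nl
    sig2 = (beta * (r[:, None] - mu)**2).sum(axis=0) / Nl - tau_r
    return w, mu, np.maximum(sig2, 1e-12)  # keep variances positive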
Prior mismatch penalty
For the Gaussian mixture signal, EM-GM-GAMP reaches a lower NMSE than EM-BG-GAMP; the mismatch costs about a 4 dB penalty from imposing the wrong prior shape on the nonzero coefficients.
ex-ch19-15
Challenge (Complete self-tuning RF imaging system)
Design and implement a complete self-tuning Bayesian RF imaging system combining all three elements from this chapter.
System specification:
- MIMO radar: an array of transmitters and receivers operating over multiple frequencies.
- Scene: a voxel grid with a BG prior.
- Physical sensing matrix $A = A_1 \otimes A_2$ (Kronecker structure).
(a) Implement EM-VAMP (VAMP as inner solver, EM outer loop) to learn $(\rho, \sigma_x^2, \sigma^2)$ automatically.
(b) Extend to 1-bit receivers: replace the Gaussian likelihood with the probit output channel. How does the EM M-step for the noise variance change (hint: $\sigma^2$ now plays the role of the dither variance $\sigma_d^2$)?
(c) Test across a range of SNRs with both full-precision and 1-bit receivers. Plot NMSE vs. SNR for: EM-VAMP (full), EM-VAMP (1-bit), oracle VAMP (full), oracle VAMP (1-bit).
(d) For the 1-bit case, does EM converge to the correct $\sigma_d^2$ (dither variance)? Or does EM learn a different effective noise level?
(e) Quantify the information loss from 1-bit quantization as a function of SNR: plot the "measurement overhead" (extra measurements $M$ needed to match the full-precision NMSE) vs. SNR.
For (b): the probit model has no explicit additive noise variance in the likelihood (it is baked into the dither); the EM update for $\sigma^2$ should be disabled or replaced with dither-variance estimation.
For (d): EM will learn an effective noise level set by the Fisher information of the sign measurement, not the true dither variance.
For (e): the overhead curve is smallest at high SNR and grows sharply toward SNR = 5 dB.
EM-VAMP framework
The EM outer loop is identical to EM-GAMP, with VAMP replacing the inner solver. The M-step receives $(\hat{x}, \tau_x)$ from VAMP and applies the same closed-form updates. The Kronecker structure of $A$ reduces the VAMP linear step from dense-matrix cost to operations on the small factors.
1-bit EM adaptation
For the probit likelihood, the dither variance $\sigma_d^2$ controls the output-channel steepness. EM can optimize it by maximizing the expected log-likelihood $\sum_m \mathbb{E}\big[\log p(y_m \mid z_m; \sigma_d)\big]$ under the current posterior on $z$.
This has no closed form but can be solved by a 1D line search (see the sketch below). Alternatively, fix $\sigma_d^2$ to the known dither level and use EM only for the prior parameters.
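A sketch of the 1D search, using the Gaussian-posterior identity $\mathbb{E}\big[\Phi(y z / \sigma_d)\big] = \Phi\big(y \hat{z} / \sqrt{\sigma_d^2 + \tau_z}\big)$ as a plug-in approximation to the expected log-likelihood (the function name is illustrative):

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import ndtr  # standard normal CDF

def fit_dither_std(y, z_hat, tau_z):
    # 1D search over log(sigma_d) for the approximate probit log-likelihood,
    # with the posterior on z summarized by its mean z_hat and variance tau_z
    def neg_ll(log_sig):
        sig_d = np.exp(log_sig)
        u = y * z_hat / np.sqrt(sig_d**2 + tau_z)
        return -np.sum(np.log(ndtr(u) + 1e-12))
    res = minimize_scalar(neg_ll, bounds=(-6.0, 2.0), method='bounded')
    return np.exp(res.x)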
Measurement overhead result
At SNR = 20 dB the overhead is modest; it grows at SNR = 10 dB and becomes severe at SNR = 5 dB (1-bit sensing is highly inefficient at low SNR).