Exercises
ex-ch19-01
Easy (Gaussian GAMP reduces to AMP)
(a) Derive the GAMP output function $g_{\text{out}}$ for the Gaussian likelihood $p(y \mid z) = \mathcal{N}(y; z, \sigma^2)$.
(b) Show that with i.i.d. Gaussian $A$ (entries $\mathcal{N}(0, 1/M)$), the GAMP update simplifies to the AMP iteration from Chapter 17.
(c) Implement both algorithms in Python. Run on a sparse recovery problem (dimensions of your choice, SNR = 25 dB). Verify that the iterates match to numerical precision (vanishing NMSE difference).
The posterior is Gaussian. Compute its mean by combining two Gaussian factors.
For i.i.d. Gaussian $A$ with entries of variance $1/M$, the output linear step gives $\tau_p = \tau_x / \delta$, where $\delta = M/N$ is the sampling ratio.
The Onsager correction term in the GAMP output linear step is the key difference from naive substitution.
Derive the posterior
Combining the likelihood $p(y \mid z) = \mathcal{N}(y; z, \sigma^2)$ with the Gaussian message $\mathcal{N}(z; \hat{p}, \tau_p)$ gives a Gaussian posterior on $z$ with mean and variance:
$\hat{z} = \frac{\tau_p\, y + \sigma^2 \hat{p}}{\sigma^2 + \tau_p}, \qquad \tau_z = \frac{\sigma^2 \tau_p}{\sigma^2 + \tau_p}.$
Compute $g_{\text{out}}$
$g_{\text{out}}(y, \hat{p}, \tau_p) = \frac{\hat{z} - \hat{p}}{\tau_p} = \frac{y - \hat{p}}{\sigma^2 + \tau_p}.$
Identify the AMP residual
In AMP, the output step computes the Onsager-corrected residual $y - A\hat{x}^t$ (plus the correction term) scaled by $1/(\sigma^2 + \tau_p)$. With the Gaussian $g_{\text{out}}$ above, the GAMP update matches this exactly.
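A minimal numerical sketch of this output function (the function name and signature are illustrative, not the chapter's reference code):

import numpy as np

def g_out_gaussian(y, phat, tau_p, sig2):
    # GAMP output function for p(y | z) = N(y; z, sig2)
    g = (y - phat) / (sig2 + tau_p)
    neg_dg = 1.0 / (sig2 + tau_p)  # -d g_out / d phat
    return g, neg_dg

Substituting this $g_{\text{out}}$ into the output linear step reproduces the AMP residual update of part (b).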
ex-ch19-02
Medium (EM-GAMP for unknown noise variance)
(a) Generate a sparse recovery problem with unknown noise variance. Initialize $\hat{\sigma}^2$ at a $100\times$ overestimate of the true value.
(b) Implement the EM update for $\sigma^2$: $\hat{\sigma}^2 \leftarrow \frac{1}{M}\|y - A\hat{x}\|^2$.
(c) Run EM-GAMP for 50 outer iterations (1 GAMP step per EM step). Plot $\hat{\sigma}^2$ and the reconstruction NMSE (dB) vs. the EM iteration index.
(d) Repeat with $\hat{\sigma}^2$ initialized at a $100\times$ underestimate. Does EM converge from both sides?
(e) Compare the final NMSE of EM-GAMP with (i) oracle GAMP (true $\sigma^2$) and (ii) GAMP with a fixed $10\times$ overestimate of $\sigma^2$.
The M-step update for $\sigma^2$ is the average squared residual: a method-of-moments estimator for the noise power.
Monitor convergence by tracking the relative change in $\hat{\sigma}^2$ between EM iterations.
Both initializations should converge to the same fixed point; EM guarantees non-decreasing marginal likelihood regardless of starting point.
Implementation sketch
import numpy as np

hat_sig2 = 1.0  # noise-variance initialization (deliberate overestimate)
for k in range(50):
    # E-step: one GAMP pass with the current noise-variance estimate
    hat_x = gamp_step(A, y, rho, sig_x2, hat_sig2)  # inner GAMP solver
    # M-step: method-of-moments update of the noise power
    residual = y - A @ hat_x
    hat_sig2 = np.sum(residual**2) / M
Expected behavior
From above (overestimate): $\hat{\sigma}^2$ decreases rapidly in the first 5–10 steps as the reconstruction improves. From below (underestimate): $\hat{\sigma}^2$ increases as the overly aggressive denoiser introduces artifacts. Both converge to the same fixed point within roughly 30 steps.
NMSE comparison
Oracle GAMP attains the best NMSE; converged EM-GAMP lands within a small gap of the oracle; mismatched GAMP with the fixed $10\times$ overestimate pays roughly a 5 dB penalty.
ex-ch19-03
Medium (GAMP for 1-bit compressed sensing)
(a) Generate a sparse signal and 1-bit measurements $y = \operatorname{sign}(Ax + d)$, where $d$ is the dither (i.i.d. $\mathcal{N}(0, \sigma_d^2)$).
(b) Implement the probit output function $g_{\text{out}}(y, \hat{p}, \tau_p) = \dfrac{y\, \phi(u)}{\Phi(y u)\, \sqrt{\sigma_d^2 + \tau_p}}$, where $u = \hat{p} / \sqrt{\sigma_d^2 + \tau_p}$.
(c) Run GAMP with this output function. Compare with (i) treating $y$ as a continuous Gaussian measurement (mismatched), (ii) BIHT (Binary Iterative Hard Thresholding).
(d) Sweep the oversampling ratio $M/N$ from 0.5 to 5.0. At what oversampling ratio does 1-bit GAMP reach the target NMSE?
(e) How much extra oversampling is needed compared to standard CS (full-precision measurements at the same SNR)?
Use scipy.special.ndtr for the standard normal CDF $\Phi$ and scipy.special.ndtri for its inverse.
The Mills-type ratio $\phi(u)/\Phi(u)$ is numerically stable when computed via scipy.special.erfcx (see the stable variant after the code below).
Add a small constant (e.g., $10^{-10}$) to the denominator to prevent division by zero in the deep tail.
Probit output function
import numpy as np
from scipy.special import ndtr
from scipy.stats import norm

def g_out_probit(y, phat, tau_p, sig_d=0.05):
    denom = np.sqrt(sig_d**2 + tau_p)
    u = phat / denom
    # phi(u) / Phi(y*u), floored to avoid division by zero in the deep tail
    ratio = np.exp(norm.logpdf(u) - np.log(ndtr(y * u) + 1e-10))
    return y * ratio / denom
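For deep-tail stability without the $10^{-10}$ floor, the ratio $\phi(u)/\Phi(yu)$ can be computed via erfcx, using the identity $\Phi(t) = \tfrac{1}{2} e^{-t^2/2}\, \mathrm{erfcx}(-t/\sqrt{2})$; a sketch assuming $y \in \{-1, +1\}$ (helper name illustrative):

import numpy as np
from scipy.special import erfcx

def mills_stable(y, u):
    # phi(u) / Phi(y*u) with no underflow: the exp(-u^2/2) factors cancel
    return np.sqrt(2.0 / np.pi) / erfcx(-y * u / np.sqrt(2.0))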
Phase transition comparison
Standard CS (Gaussian measurements) reaches the target NMSE at a substantially smaller oversampling ratio than 1-bit GAMP; the ratio between the two thresholds is the quantization penalty asked for in part (e), and its exact value depends on the sparsity and SNR chosen in part (a).
ex-ch19-04
Medium (EM-GAMP with all parameters unknown)
Implement EM-GAMP for simultaneous estimation of $(\rho, \sigma_x^2, \sigma^2)$.
(a) Choose the problem dimensions, set true values for $(\rho, \sigma_x^2, \sigma^2)$, and initialize each parameter well away from its true value.
(b) Plot all three parameter estimates vs. EM iteration (30 total), alongside the true values.
(c) Run 20 random initializations. Plot the distribution of final NMSE across runs. Do all initializations converge to the same solution?
(d) Identify the SNR threshold below which EM-GAMP fails to estimate the parameters correctly. Test a range of SNRs (the solution below references 5 and 20 dB).
The sparsity update requires computing the posterior inclusion probability from the BG denoiser output.
At low SNR, the noise and sparsity parameters trade off: high noise + high sparsity and low noise + low sparsity can have similar likelihoods.
For the multi-initialization experiment, draw the initial $\hat{\rho}$ uniformly from a plausible interval and the initial $\hat{\sigma}^2$ uniformly on a log scale.
M-step update for all parameters
# pi_post: posterior inclusion probabilities from the BG denoiser (E-step)
hat_rho = np.mean(pi_post)  # sparsity
hat_sigx2 = np.sum(pi_post * (hat_x**2 + tau_x)) / np.sum(pi_post)  # signal variance
hat_sig2 = np.sum((y - A @ hat_x)**2) / M  # noise variance
Expected convergence behavior
At SNR = 20 dB: all three parameters converge within 15–20 iterations. Final estimates within 10% of true values; NMSE within 0.3 dB of oracle.
At SNR = 5 dB: parameter estimation is unreliable; EM may converge to a solution with $\hat{\rho} \approx 0$ (all noise, no signal) or $\hat{\rho} \approx 1$ (dense signal, high noise). Multiple restarts are required.
ex-ch19-05
Hard (State evolution for GAMP)
(a) Implement the GAMP state evolution (SE) recursion for the Gaussian likelihood with Bernoulli-Gaussian prior. For i.i.d. Gaussian $A$ (entries $\mathcal{N}(0, 1/M)$):
$\tau_p^t = \tau_x^t / \delta, \qquad \tau_r^t = \sigma^2 + \tau_p^t, \qquad \tau_x^{t+1} = \mathbb{E}\big[(g_{\text{in}}(X + \sqrt{\tau_r^t}\, Z;\, \tau_r^t) - X)^2\big],$
where the expectation is over $X \sim p_X$ and $Z \sim \mathcal{N}(0, 1)$.
(b) Implement GAMP SE for the probit (1-bit) output channel. Predict MSE vs. the oversampling ratio $M/N$ for 1-bit CS at a fixed sparsity.
(c) Run actual GAMP at finite problem sizes and compare with SE predictions. How accurately does SE predict the empirical MSE?
(d) Use GAMP SE to compute the phase transition for 1-bit CS: the minimum $M/N$ that reaches the target NMSE, as a function of sparsity $\rho$.
(e) Compare the 1-bit phase transition with the standard (full-precision) phase transition. What is the "quantization penalty" in required measurements?
The SE expectation over $X$ requires numerical integration: the BG distribution is a point mass at zero with weight $1 - \rho$ plus a Gaussian $\mathcal{N}(0, \sigma_x^2)$ with weight $\rho$.
For the probit output, the SE expectation involves a 2-D Gaussian integral; use Gauss-Hermite quadrature.
SE becomes more accurate as the problem size grows; at moderate dimensions, expect SE to predict the empirical MSE to within a small dB margin.
Gaussian SE implementation
import numpy as np

def bg_denoiser(r, tau_r, rho, sig_x2):
    # Posterior mean of X ~ (1 - rho) * delta_0 + rho * N(0, sig_x2)
    # given the pseudo-observation r = X + N(0, tau_r)
    s2 = sig_x2 + tau_r
    odds = (rho / (1 - rho)) * np.sqrt(tau_r / s2) \
        * np.exp(0.5 * r**2 * sig_x2 / (s2 * tau_r))
    pi = odds / (1 + odds)  # posterior inclusion probability
    return pi * (sig_x2 / s2) * r

def gamp_se_gaussian(rho, sig_x2, sig2, delta, n_iter=50, n_mc=5000):
    tau_x = rho * sig_x2  # initial variance (prior variance of X)
    for t in range(n_iter):
        tau_p = tau_x / delta  # entries of A ~ N(0, 1/M)
        tau_r = sig2 + tau_p   # effective denoiser noise level
        # E[(g_in(X + sqrt(tau_r)*Z) - X)^2] by Monte Carlo
        X0 = (np.random.rand(n_mc) < rho) * np.random.randn(n_mc) * np.sqrt(sig_x2)
        Z = np.random.randn(n_mc)
        r = X0 + np.sqrt(tau_r) * Z
        hat_x = bg_denoiser(r, tau_r, rho, sig_x2)  # BG denoiser output
        tau_x = np.mean((hat_x - X0)**2)
    return tau_x
Phase transition
At a fixed sparsity, 1-bit CS reaches the target NMSE only at a larger $M/N$ than standard CS; the ratio between the two thresholds gives the quantization penalty, i.e., the factor of extra measurements required for 1-bit sensing.
ex-ch19-06
Hard (Multi-layer inference for dictionary learning)
(a) Generate a dictionary $D \in \mathbb{R}^{50 \times 100}$ (i.i.d. Gaussian, column-normalized) and $L$ sparse coefficient vectors (i.i.d. BG). Observations: $Y = DS + W$.
(b) Implement simplified BiG-AMP: alternate between (i) GAMP to estimate $S$ given $D$, (ii) a gradient step to update $D$ given $S$.
(c) Initialize with a random $\hat{D}$ (column-normalized Gaussian). Run for 100 outer iterations. Plot the dictionary recovery error vs. iteration.
(d) Vary $L$ from 50 to 500. How many observations are needed for reliable dictionary recovery (small worst-case column error)?
(e) Compare with K-SVD on the same problem.
Dictionary recovery requires solving a permutation alignment problem: use the Hungarian algorithm (scipy.optimize.linear_sum_assignment); a sketch follows these hints.
The gradient step for $D$ is $D \leftarrow D - \eta\, (DS - Y)\, S^{\mathsf{T}}$; normalize columns after each step.
When too few observations are available, dictionary recovery is information-theoretically impossible; the empirical threshold found in part (d) is a practical guide.
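A sketch of the permutation/sign alignment, assuming unit-norm columns (the helper name is illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

def dict_column_error(D_true, D_hat):
    # Match estimated columns to true columns (up to permutation and sign)
    # by maximizing total absolute correlation, then report the worst column.
    C = np.abs(D_true.T @ D_hat)
    rows, cols = linear_sum_assignment(-C)  # Hungarian algorithm
    errs = []
    for i, j in zip(rows, cols):
        s = np.sign(D_true[:, i] @ D_hat[:, j])
        errs.append(np.linalg.norm(D_true[:, i] - s * D_hat[:, j]))
    return max(errs)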
BiG-AMP outer loop
import numpy as np

hat_D = np.random.randn(50, 100)
hat_D /= np.linalg.norm(hat_D, axis=0)  # normalize columns
for k in range(100):
    # E-step: estimate the sparse codes S given the current dictionary
    hat_S = gamp_sparse(hat_D, Y, rho=0.1, sig2=0.01)  # column-wise GAMP solver
    # M-step: gradient step on D, then re-normalize columns
    residual = hat_D @ hat_S - Y
    hat_D -= 0.01 * residual @ hat_S.T
    hat_D /= np.linalg.norm(hat_D, axis=0)
Required observations
Empirically, reliable recovery sets in once the number of observations $L$ is a modest multiple of $N$. This is consistent with a degrees-of-freedom count: the dictionary has $MN$ unknown parameters, and each observation supplies $M$ equations while also having to determine its own sparse code, so $L$ must exceed $N$ by a comfortable margin.
ex-ch19-07
Hard (EM-GAMP for RF imaging with unknown noise)
Build a complete EM-GAMP pipeline for RF imaging with a physical sensing matrix.
(a) System: a MIMO array with multiple transmitters, receivers, and frequencies. Sensing matrix $A$: a frequency-domain MIMO steering matrix (not i.i.d. Gaussian). Scene: a 2-D grid with a Bernoulli-Gaussian prior.
(b) EM-GAMP: Implement with the Gaussian likelihood, learning $(\rho, \sigma_x^2, \sigma^2)$. Note: since $A$ is structured (not i.i.d. Gaussian), use VAMP (Chapter 18) as the inner solver rather than AMP.
(c) Comparison: Test (i) EM-GAMP, (ii) GAMP oracle (true parameters), (iii) GAMP cross-validated, (iv) FISTA with a cross-validated regularization weight $\lambda$.
(d) Noise robustness: Sweep the SNR from 5 to 20 dB. Plot NMSE and parameter estimation error vs. SNR for all methods.
(e) Analysis: At what SNR does EM-GAMP gain more than 3 dB over mismatched GAMP? What is the dominant source of performance degradation at low SNR?
The frequency-domain MIMO steering matrix has the Kronecker structure $A = A_1 \otimes A_2$; use the Kronecker VAMP from Chapter 18.
Cross-validation for GAMP requires splitting the measurements and evaluating held-out log-likelihood, which is computationally expensive.
At low SNR, the EM fixed point may correspond to the trivial solution $\hat{\rho} \approx 0$, with all measurement energy absorbed into $\hat{\sigma}^2$.
Kronecker VAMP as inner solver
For the physical sensing matrix $A$, use the VAMP SVD trick: precompute the SVD $A = U S V^{\mathsf{T}}$ once; the VAMP linear estimator then reduces to diagonal operations on the pre-rotated measurements $\tilde{y} = U^{\mathsf{T}} y$.
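For a Kronecker-structured $A$, the SVD itself factorizes, so the precomputation stays cheap; a sketch with illustrative sizes:

import numpy as np

# A1 kron A2 = (U1 kron U2) (S1 kron S2) (V1 kron V2)^T, up to ordering
A1 = np.random.randn(8, 16)
A2 = np.random.randn(12, 24)
U1, s1, V1t = np.linalg.svd(A1, full_matrices=False)
U2, s2, V2t = np.linalg.svd(A2, full_matrices=False)
s_kron = np.outer(s1, s2).ravel()  # all singular values of A1 kron A2

# The VAMP rotation U^T y is applied factor-by-factor:
y = np.random.randn(8 * 12)
y_rot = (U1.T @ y.reshape(8, 12) @ U2).ravel()  # equals (U1 kron U2)^T y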
Expected results
At SNR = 20 dB: EM-GAMP matches oracle GAMP to within 0.3 dB. At SNR = 10 dB: EM-GAMP gains several dB over FISTA (a correct noise model vs. a cross-validated $\ell_1$ penalty). At SNR = 5 dB: EM-GAMP may fail; use multiple restarts or bound $\hat{\sigma}^2$ away from degenerate values.
ex-ch19-08
Challenge (Poisson GAMP for photon-limited sensing)
(a) Model a photon-counting RF/optical sensing system: $y_m \sim \mathrm{Poisson}(\lambda_m)$ with $\lambda_m = \exp(z_m) + b$, where $z = Ax$ and $b \ge 0$ is the background intensity.
(b) Derive the Poisson output function $g_{\text{out}}$ via a Laplace approximation to the posterior on $z$, and implement it.
(c) Generate a non-negative sparse signal and Poisson measurements. Run GAMP with a Bernoulli-Exponential prior (non-negative support).
(d) Compare with (i) standard GAMP treating $y$ as Gaussian with variance equal to the true mean $\lambda_m$ (known-mean approximation), (ii) standard GAMP treating $y$ as Gaussian with variance $y_m$ (empirical approximation).
(e) Vary the background intensity $b$ from 0 to 3. At what background level does the Gaussian approximation become adequate (negligible NMSE gap)?
The Laplace approximation to the Poisson posterior is accurate when the mean count is large (many photons).
The Bernoulli-Exponential denoiser (non-negative analogue of BG): the posterior mean under a Gaussian pseudo-observation has a closed form involving the complementary error function.
At high background (many photons per measurement), the Poisson distribution approaches a Gaussian; at low background (few photons), shot noise dominates and the Gaussian approximation breaks down.
Poisson output function
import numpy as np

def g_out_poisson(y, phat, tau_p, b=1.0):
    # Mean count under z ~ N(phat, tau_p): E[exp(z)] + additive background b
    lambda_hat = np.exp(phat + tau_p / 2) + b
    z_post = phat + tau_p * (y - lambda_hat)  # first-order posterior update
    return (z_post - phat) / tau_p  # g_out = y - lambda_hat
Phase transition for background
At zero background, Poisson GAMP outperforms the Gaussian approximations by a wide margin. As the background (and hence the average photon count) grows, the gap shrinks, and at high background the Gaussian approximation becomes adequate.
ex-ch19-09
Easy (Identifying valid GAMP output channels)
For each of the following measurement models, state whether GAMP can be applied directly (i.e., whether $g_{\text{out}}$ can be computed in closed form or efficiently approximated) and write the key formula for $g_{\text{out}}$:
(a) $y = z + w$, $w \sim \mathcal{N}(0, \sigma^2)$ (additive Gaussian).
(b) $y \in \{0, 1\}$ with $\Pr(y = 1 \mid z) = \sigma(z)$, where $\sigma(\cdot)$ is the logistic function.
(c) $y \sim \mathrm{Poisson}(\exp(z))$.
(d) $y = |z|^2$ (noiseless power measurement).
(e) $p(y \mid z)$ is a Gaussian mixture: $p(y \mid z) = \sum_{\ell} \alpha_\ell\, \mathcal{N}(y; z, \sigma_\ell^2)$.
For (d), the posterior is concentrated on a circle in the complex plane, so an approximation is required.
For (e), the posterior is a weighted sum of Gaussians and is computable in closed form.
Answers
(a) Gaussian: $g_{\text{out}} = \dfrac{y - \hat{p}}{\sigma^2 + \tau_p}$. Exact.
(b) Logistic (binary): no exact closed form; use the probit approximation (the logistic CDF is well approximated by a probit with matched slope, $\sigma(x) \approx \Phi(x / 1.702)$).
(c) Poisson: $g_{\text{out}}$ via Laplace approximation (exact in the large-count limit).
(d) Power-only (phaseless): requires a truncated complex Gaussian approximation. $g_{\text{out}}$ involves a ratio of modified Bessel functions multiplied by a phase correction term. Exact computation requires numerical integration.
(e) Gaussian mixture: $g_{\text{out}} = \sum_{\ell} \beta_\ell\, \dfrac{y - \hat{p}}{\sigma_\ell^2 + \tau_p}$, where $\beta_\ell \propto \alpha_\ell\, \mathcal{N}(y; \hat{p}, \sigma_\ell^2 + \tau_p)$ are the posterior component responsibilities. Exact.
ex-ch19-10
Easy (EM-GAMP M-step derivation)
Starting from the Q-function $Q(\theta; \theta^t) = \mathbb{E}\big[\log p(y, x; \theta) \mid y; \theta^t\big]$,
where the expectation is taken under the GAMP-approximated posterior $p(x \mid y; \theta^t)$:
(a) Derive the M-step update for the noise variance $\sigma^2$.
(b) Show that $\mathbb{E}\,\|y - Ax\|^2 = \|y - A\hat{x}\|^2 + \sum_n \|a_n\|^2\, \tau_{x,n}$.
(c) For a Bernoulli-Gaussian prior, show that the M-step for $\rho$ gives $\hat{\rho} = \frac{1}{N} \sum_n \pi_n$, where $\pi_n$ is the posterior inclusion probability from the GAMP E-step.
For (a): differentiate $Q$ with respect to $\sigma^2$ and set to zero.
For (b): use $y - Ax = (y - A\hat{x}) - A(x - \hat{x})$ and expand the squared norm.
For (c): write $\log p(x_n; \rho)$ in terms of the inclusion indicator, take the expected value under the posterior, and differentiate with respect to $\rho$.
M-step for noise variance
$\hat{\sigma}^2 = \frac{1}{M}\, \mathbb{E}\,\|y - Ax\|^2 = \frac{1}{M} \Big( \|y - A\hat{x}\|^2 + \sum_{n} \|a_n\|^2\, \tau_{x,n} \Big).$
Bias correction
The cross term vanishes since $\mathbb{E}[x - \hat{x}] = 0$ under the posterior (independence approximation); see the expansion below.
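A worked expansion for part (b), under the GAMP posterior approximation ($\mathbb{E}[x] = \hat{x}$, $\mathrm{Cov}(x) \approx \mathrm{diag}(\tau_x)$):
$\mathbb{E}\,\|y - Ax\|^2 = \|y - A\hat{x}\|^2 - 2\,(y - A\hat{x})^{\mathsf{T}} A\, \mathbb{E}[x - \hat{x}] + \mathbb{E}\,\|A(x - \hat{x})\|^2 = \|y - A\hat{x}\|^2 + \sum_{n=1}^{N} \|a_n\|^2\, \tau_{x,n},$
where the middle term is zero and the last step treats the coordinates of $x - \hat{x}$ as uncorrelated.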
M-step for sparsity
Each coefficient contributes $\pi_n \log \rho + (1 - \pi_n) \log(1 - \rho)$ (plus terms independent of $\rho$) to the Q-function.
Summing over $n$ and setting the derivative with respect to $\rho$ to zero:
$\hat{\rho} = \frac{1}{N} \sum_{n=1}^{N} \pi_n.$
ex-ch19-11
Medium (GAMP damping and convergence)
GAMP's convergence can be improved by damping: replacing the update $\hat{x}^{t+1} \leftarrow \hat{x}^{t+1}_{\text{new}}$ with the convex combination $\hat{x}^{t+1} \leftarrow \beta\, \hat{x}^{t+1}_{\text{new}} + (1 - \beta)\, \hat{x}^{t}$ for a damping factor $\beta \in (0, 1]$.
(a) Implement GAMP with a damping parameter $\beta$ for the 1-bit probit output channel.
(b) Run for 50 iterations on a 1-bit sparse recovery problem. Test several damping values, including $\beta = 1$ (undamped).
(c) Plot the residual vs. iteration for each $\beta$. Which value gives the best convergence?
(d) Explain why undamped GAMP ($\beta = 1$) can diverge for the probit model but not for the Gaussian model.
The Gaussian model has a unique Bethe free energy minimum; the probit model may have multiple local minima, causing oscillation without damping.
The optimal $\beta$ depends on the spectral radius of the linearized GAMP iteration; for practical purposes, a moderately damped update is a safe default. A minimal damped-update sketch follows.
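A minimal sketch of the damped update (the value $\beta = 0.7$ is only an example):

import numpy as np

def damp(new, old, beta):
    # Convex combination of the new estimate and the previous iterate;
    # beta = 1 recovers the undamped iteration.
    return beta * np.asarray(new) + (1.0 - beta) * np.asarray(old)

# inside the GAMP loop (sketch):
#   x_new = g_in(r, tau_r)
#   x_hat = damp(x_new, x_hat, beta=0.7)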
Why undamped GAMP oscillates for probit
The probit $g_{\text{out}}$ has large derivatives near the decision boundary (where the sign flips). For the Gaussian model, the derivative is bounded by $1/(\sigma^2 + \tau_p)$. Without damping, the Jacobian of the linearized GAMP iteration can have spectral radius greater than 1 for the probit model at small dither, causing divergence.
Optimal damping
For 1-bit GAMP: moderate damping converges in the fewest iterations; $\beta = 1$ oscillates and may not converge; heavy damping is stable but converges slowly (50+ iterations needed).
ex-ch19-12
Medium (Multi-layer model: dimensionality and reconstruction)
Consider a two-layer generative model: $x = D\, \mathrm{ReLU}(c)$, where $D$ is a known dictionary and $c \in \mathbb{R}^K$ is a latent code with a Gaussian prior.
(a) For given dimensions $(N, M, K)$: can the latent code $c$ be recovered? State the information-theoretic condition.
(b) Generate $c$, compute $x$, and observe $y = Ax + w$ with Gaussian noise. Attempt recovery by: (i) optimizing $c$ to minimize $\|y - A D\, \mathrm{ReLU}(c)\|^2$, (ii) using standard GAMP on the full scene $x$ (ignores the generative structure).
(c) Compare NMSE for both methods. Explain the difference.
(d) What happens when $M < K$? When $M \gg K$?
For (a): $c$ has $K$ degrees of freedom; each measurement constrains one linear function of $x$, not of $c$ directly, so roughly $M \gtrsim K$ measurements are needed.
For (b)(i): use gradient descent with backpropagation through the ReLU (see the sketch below).
When $M > K$, the latent code can be recovered because the effective problem dimension is $K$, not $N$.
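A minimal gradient-descent sketch for part (b)(i), assuming the model $x = D\, \mathrm{ReLU}(c)$ from the statement (function name, step size, and iteration count are illustrative):

import numpy as np

def latent_descent(y, A, D, K, steps=2000, lr=0.05):
    # Minimize ||y - A @ D @ relu(c)||^2 over the latent code c,
    # backpropagating through the ReLU by hand.
    rng = np.random.default_rng(0)
    c = 0.1 * rng.standard_normal(K)  # random start avoids the all-zero dead point
    AD = A @ D
    for _ in range(steps):
        s = np.maximum(c, 0.0)             # ReLU(c)
        r = AD @ s - y                     # residual
        grad = 2.0 * (AD.T @ r) * (c > 0)  # chain rule through the ReLU
        c -= lr * grad
    return c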
Information-theoretic condition
Recovery of $c$ from $M$ measurements of $x$ requires roughly $M \gtrsim K$ (matching degrees of freedom). For a Gaussian prior on $c$, this is sufficient with overwhelming probability when $A$ and $D$ are both i.i.d. Gaussian. In the setup of part (a), however, $M < K$, so recovery is impossible without additional structure.
Corrected setup with $M > K$
With $M > K$: latent-code recovery is feasible. Gradient descent on $c$ (latent-space optimization) achieves low NMSE; standard GAMP on the full $N$-dimensional $x$ fails, since $M$ is below the CS phase transition for a non-sparse $x$.
ex-ch19-13
Medium (GAMP vs VAMP for physical sensing matrices)
A practical RF imaging system uses a MIMO sensing matrix with Kronecker structure $A = A_1 \otimes A_2$ (not i.i.d. Gaussian).
(a) Explain why standard GAMP state evolution does not apply to this matrix.
(b) Generate a Kronecker sensing matrix. Run (i) AMP, (ii) VAMP on the same sparse recovery problem (SNR = 20 dB).
(c) Plot NMSE vs. iteration for AMP and VAMP. Which converges faster and why?
(d) For this matrix, apply EM-GAMP (using VAMP as inner solver). Does the EM M-step still apply in the same form?
(e) What is the computational saving of Kronecker VAMP (FFT-based) vs. dense SVD-based VAMP at the problem sizes used in part (b)?
GAMP SE assumes i.i.d. Gaussian ; Kronecker matrices have correlated entries and structured singular values.
VAMP corrects for the structured matrix by applying the exact LMMSE estimator at each step (using the SVD of ).
For (e): the dense SVD costs $O(MN \min(M, N))$ once; the Kronecker-factored approach needs only operations on the small factors per iteration (see the sketch below).
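A numerical check of the factored matrix-vector product that underlies the saving (sizes are illustrative):

import numpy as np

M1, N1, M2, N2 = 8, 16, 12, 24
A1 = np.random.randn(M1, N1)
A2 = np.random.randn(M2, N2)
x = np.random.randn(N1 * N2)

# Dense product: the full (M1*M2) x (N1*N2) matrix must be formed
y_dense = np.kron(A1, A2) @ x

# Factored product: (A1 kron A2) x = vec(A1 X A2^T), X = reshape(x)
y_fact = (A1 @ x.reshape(N1, N2) @ A2.T).ravel()

assert np.allclose(y_dense, y_fact)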
Why AMP fails for Kronecker matrices
AMP's Onsager correction relies on the concentration properties of i.i.d. random matrices. For Kronecker matrices, the singular values are products $\sigma_i(A_1)\, \sigma_j(A_2)$, which concentrate near zero in large dimensions, violating the concentration assumption.
M-step validity
The EM M-step updates for $(\rho, \sigma_x^2, \sigma^2)$ depend only on the posterior statistics $(\hat{x}, \tau_x)$, which are correctly computed by VAMP regardless of the matrix structure. The M-step is therefore identical for GAMP and VAMP inner solvers.
ex-ch19-14
Hard (EM for Gaussian mixture prior)
The Gaussian mixture (GM) prior generalizes BG:
$p(x_n) = (1 - \rho)\, \delta(x_n) + \rho \sum_{\ell=1}^{L} \omega_\ell\, \mathcal{N}(x_n; \mu_\ell, \sigma_\ell^2), \qquad \sum_{\ell} \omega_\ell = 1.$
(a) Derive the EM M-step for the mixture weights $\omega_\ell$, means $\mu_\ell$, and variances $\sigma_\ell^2$.
(b) Implement EM-GM-GAMP with $L = 3$ components for a sparse signal whose non-zero entries are drawn from a Gaussian mixture (two Gaussians at $\pm\mu$ plus a small cluster near zero).
(c) Compare with EM-BG-GAMP (Bernoulli-Gaussian, effectively $L = 1$). Which achieves lower NMSE when the prior is correctly specified?
(d) What happens when EM-BG-GAMP is applied to data from a GM prior (prior mismatch)? Quantify the NMSE penalty.
The GM E-step computes the posterior responsibility $\beta_{n\ell}$: the probability that $x_n$ was drawn from component $\ell$.
M-step for $\mu_\ell$: weighted mean of the cavity values $r_n$ with weights $\beta_{n\ell}$.
M-step for $\sigma_\ell^2$: weighted variance of $r_n$ about $\mu_\ell$ minus the cavity variance $\tau_r$ (moment matching).
GM posterior responsibility
Given a Gaussian cavity $r_n = x_n + \mathcal{N}(0, \tau_r)$, the posterior responsibility of component $\ell$ is:
$\beta_{n\ell} = \frac{\omega_\ell\, \mathcal{N}(r_n; \mu_\ell, \sigma_\ell^2 + \tau_r)}{\sum_{\ell'} \omega_{\ell'}\, \mathcal{N}(r_n; \mu_{\ell'}, \sigma_{\ell'}^2 + \tau_r)}.$
M-step updates
$\hat{\omega}_\ell = \frac{1}{N} \sum_n \beta_{n\ell}, \qquad \hat{\mu}_\ell = \frac{\sum_n \beta_{n\ell}\, r_n}{\sum_n \beta_{n\ell}}, \qquad \hat{\sigma}_\ell^2 = \frac{\sum_n \beta_{n\ell}\, (r_n - \hat{\mu}_\ell)^2}{\sum_n \beta_{n\ell}} - \tau_r.$
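A sketch of the E- and M-steps under the moment-matching convention above (function names are illustrative):

import numpy as np
from scipy.stats import norm

def gm_responsibilities(r, tau_r, w, mu, sig2):
    # beta[n, l]: posterior probability that x_n came from component l,
    # given the Gaussian cavity r_n = x_n + N(0, tau_r)
    like = np.stack([norm.pdf(r, loc=mu[l], scale=np.sqrt(sig2[l] + tau_r))
                     for l in range(len(w))], axis=1)
    beta = np.asarray(w) * like
    return beta / beta.sum(axis=1, keepdims=True)

def gm_m_step(r, tau_r, beta):
    Nl = beta.sum(axis=0)  # effective counts per component
    w = Nl / len(r)
    mu = (beta * r[:, None]).sum(axis=0) / Nl
    sig2 = (beta * (r[:, None] - mu)**2).sum(axis=0) / Nl - tau_r
    return w, mu, np.maximum(sig2, 1e-12)  # keep variances positive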
Prior mismatch penalty
For the Gaussian mixture signal, EM-GM-GAMP reaches a lower NMSE than EM-BG-GAMP; the mismatch costs about a 4 dB penalty from imposing the wrong prior shape on the nonzero coefficients.
ex-ch19-15
Challenge (Complete self-tuning RF imaging system)
Design and implement a complete self-tuning Bayesian RF imaging system combining all three elements from this chapter.
System specification:
- MIMO radar: an array of transmitters and receivers operating over multiple frequencies.
- Scene: a voxel grid with a BG prior.
- Physical sensing matrix $A = A_1 \otimes A_2$ (Kronecker structure).
(a) Implement EM-VAMP (VAMP as inner solver, EM outer loop) to learn $(\rho, \sigma_x^2, \sigma^2)$ automatically.
(b) Extend to 1-bit receivers: replace the Gaussian likelihood with the probit output channel. How does the EM M-step for the noise variance change (hint: $\sigma^2$ now plays the role of the dither variance $\sigma_d^2$)?
(c) Test across a range of SNRs with both full-precision and 1-bit receivers. Plot NMSE vs. SNR for: EM-VAMP (full), EM-VAMP (1-bit), oracle VAMP (full), oracle VAMP (1-bit).
(d) For the 1-bit case, does EM converge to the correct $\sigma_d^2$ (dither variance)? Or does EM learn a different effective noise level?
(e) Quantify the information loss from 1-bit quantization as a function of SNR: plot the "measurement overhead" (extra measurements $M$ needed to match the full-precision NMSE) vs. SNR.
For (b): the probit model has no explicit additive noise variance in the likelihood (it is baked into the dither); the EM update for $\sigma^2$ should be disabled or replaced with dither-variance estimation.
For (d): EM will learn an effective noise level set by the Fisher information of the sign measurement, not the true dither variance.
For (e): the overhead curve is smallest at high SNR and grows sharply toward SNR = 5 dB.
EM-VAMP framework
The EM outer loop is identical to EM-GAMP, with VAMP replacing the inner solver. The M-step receives $(\hat{x}, \tau_x)$ from VAMP and applies the same closed-form updates. The Kronecker structure of $A$ reduces the VAMP linear step from dense-matrix cost to operations on the small factors.
1-bit EM adaptation
For the probit likelihood, the dither variance $\sigma_d^2$ controls the output-channel steepness. EM can optimize it by maximizing the expected log-likelihood $\sum_m \mathbb{E}\big[\log p(y_m \mid z_m; \sigma_d)\big]$ under the current posterior on $z$.
This has no closed form but can be solved by a 1D line search (see the sketch below). Alternatively, fix $\sigma_d^2$ to the known dither level and use EM only for the prior parameters.
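A sketch of the 1D search, using the Gaussian-posterior identity $\mathbb{E}\big[\Phi(y z / \sigma_d)\big] = \Phi\big(y \hat{z} / \sqrt{\sigma_d^2 + \tau_z}\big)$ as a plug-in approximation to the expected log-likelihood (the function name is illustrative):

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import ndtr  # standard normal CDF

def fit_dither_std(y, z_hat, tau_z):
    # 1D search over log(sigma_d) for the approximate probit log-likelihood,
    # with the posterior on z summarized by its mean z_hat and variance tau_z
    def neg_ll(log_sig):
        sig_d = np.exp(log_sig)
        u = y * z_hat / np.sqrt(sig_d**2 + tau_z)
        return -np.sum(np.log(ndtr(u) + 1e-12))
    res = minimize_scalar(neg_ll, bounds=(-6.0, 2.0), method='bounded')
    return np.exp(res.x)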
Measurement overhead result
At SNR = 20 dB the overhead is modest; it grows at SNR = 10 dB and becomes severe at SNR = 5 dB (1-bit sensing is highly inefficient at low SNR).