Exercises
ex-ch03-01
(Easy) Consider the scalar model $y = x + n$ with known noise variance $\sigma^2$, $n \sim \mathcal{N}(0, \sigma^2)$, and prior $x \sim \mathcal{N}(\mu_0, \tau^2)$.
(a) Write the likelihood $p(y \mid x)$ and prior $p(x)$.
(b) Derive the posterior $p(x \mid y)$ by completing the square in $x$. Verify it is Gaussian and identify the posterior mean $\mu_{\mathrm{post}}$ and variance $v_{\mathrm{post}}$.
(c) Show that $\mu_{\mathrm{post}} = \kappa y + (1-\kappa)\mu_0$ and interpret the shrinkage factor $\kappa$ as a function of the SNR $\tau^2/\sigma^2$.
Hint: The exponent of the unnormalized posterior is $-\frac{(y-x)^2}{2\sigma^2} - \frac{(x-\mu_0)^2}{2\tau^2}$. Expand and collect terms quadratic and linear in $x$.
Hint: The posterior variance satisfies $v_{\mathrm{post}}^{-1} = \sigma^{-2} + \tau^{-2}$.
Likelihood and prior
$p(y \mid x) = \mathcal{N}(y; x, \sigma^2) \propto \exp\big(-\tfrac{(y-x)^2}{2\sigma^2}\big)$, $p(x) = \mathcal{N}(x; \mu_0, \tau^2) \propto \exp\big(-\tfrac{(x-\mu_0)^2}{2\tau^2}\big)$.
Completing the square
$\log p(x \mid y) = -\frac{(y-x)^2}{2\sigma^2} - \frac{(x-\mu_0)^2}{2\tau^2} + \mathrm{const} = -\frac{1}{2}\Big(\frac{1}{\sigma^2} + \frac{1}{\tau^2}\Big)x^2 + \Big(\frac{y}{\sigma^2} + \frac{\mu_0}{\tau^2}\Big)x + \mathrm{const}$.
This is $\mathcal{N}(x; \mu_{\mathrm{post}}, v_{\mathrm{post}})$ with $v_{\mathrm{post}} = \big(\sigma^{-2} + \tau^{-2}\big)^{-1} = \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}$ and $\mu_{\mathrm{post}} = v_{\mathrm{post}}\big(\frac{y}{\sigma^2} + \frac{\mu_0}{\tau^2}\big)$.
Simplification and SNR interpretation
Substituting: $\mu_{\mathrm{post}} = \kappa y + (1-\kappa)\mu_0$ where $\kappa = \frac{\tau^2}{\tau^2 + \sigma^2} = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}$ with $\mathrm{SNR} = \tau^2/\sigma^2$.
As SNR $\to \infty$: $\kappa \to 1$, $\mu_{\mathrm{post}} \to y$ (ML estimate). As SNR $\to 0$: $\kappa \to 0$, $\mu_{\mathrm{post}} \to \mu_0$ (prior mean).
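A tiny numeric illustration of the shrinkage behavior in (c); the values of $y$, $\mu_0$, and the SNR grid below are arbitrary choices, not taken from the exercise.

```python
y, mu0 = 2.0, 0.0                       # observation and prior mean (arbitrary)
for snr in (0.1, 1.0, 10.0):            # snr = tau^2 / sigma^2
    kappa = snr / (1.0 + snr)           # shrinkage factor from (c)
    post_mean = kappa * y + (1 - kappa) * mu0
    print(f"SNR={snr:5.1f}: kappa={kappa:.3f}, posterior mean={post_mean:.3f}")
```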
ex-ch03-02
(Easy) For the forward model $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and prior $x \sim \mathcal{N}(0, \tau^2 I)$:
(a) Show that the MAP estimate satisfies $(A^\top A + \lambda I)\hat{x}_{\mathrm{MAP}} = A^\top y$ with $\lambda = \sigma^2/\tau^2$.
(b) Express $\hat{x}_{\mathrm{MAP}}$ in the SVD basis of $A$ and show it equals the Tikhonov spectral filter from §Spectral Regularization Methods.
(c) What happens to $\hat{x}_{\mathrm{MAP}}$ as $\tau^2 \to \infty$ (uninformative prior)? As $\tau^2 \to 0$ (very informative prior)?
Hint: The Tikhonov filter factors are $\varphi_i = \frac{s_i^2}{s_i^2 + \lambda}$.
Hint: In the SVD basis, $A^\top A$ has eigenvalues $s_i^2$.
MAP optimization
$\hat{x}_{\mathrm{MAP}} = \arg\min_x \frac{1}{2\sigma^2}\|y - Ax\|^2 + \frac{1}{2\tau^2}\|x\|^2$. Setting the gradient to zero: $\frac{1}{\sigma^2}A^\top(A\hat{x} - y) + \frac{1}{\tau^2}\hat{x} = 0$, giving $(A^\top A + \lambda I)\hat{x} = A^\top y$ with $\lambda = \sigma^2/\tau^2$.
SVD basis
Let $A = USV^\top$. Then in the SVD basis: $\hat{x}_{\mathrm{MAP}} = \sum_i \frac{s_i^2}{s_i^2 + \lambda}\,\frac{u_i^\top y}{s_i}\,v_i$, with filter factors $\varphi_i = \frac{s_i^2}{s_i^2 + \lambda}$: exactly Tikhonov spectral regularization.
Limiting cases
As $\tau^2 \to \infty$: $\lambda \to 0$, $\varphi_i \to 1$; the MAP approaches the pseudoinverse solution $A^\dagger y$ (no regularization). As $\tau^2 \to 0$: $\lambda \to \infty$, $\varphi_i \to 0$; the MAP approaches $0$, the prior mean (prior completely dominates).
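As a quick check of (a)-(b), the sketch below (problem sizes and noise levels are illustrative assumptions) verifies numerically that the normal-equations MAP solution coincides with the SVD filter-factor formula.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma, tau = 30, 20, 0.1, 1.0             # illustrative sizes (assumptions)
A, y = rng.standard_normal((M, N)), rng.standard_normal(M)
lam = sigma**2 / tau**2

# MAP via the normal equations (A^T A + lam I) x = A^T y
x_map = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)

# Same estimate via the SVD filter factors phi_i = s_i^2 / (s_i^2 + lam)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + lam)
x_svd = Vt.T @ (phi * (U.T @ y) / s)

print(np.allclose(x_map, x_svd))                # True
```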
ex-ch03-03
(Easy) Let $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and prior $x \sim \mathcal{N}(0, \tau^2 I)$.
(a) Compute the posterior covariance $\Sigma_{\mathrm{post}}$ in terms of the SVD of $A$.
(b) Show that for null-space directions ($s_i = 0$), the posterior variance equals the prior variance $\tau^2$.
(c) For $s_i \gg \sigma/\tau$, show that the posterior variance is approximately $\sigma^2/s_i^2$. Interpret: data reduces uncertainty only in the range space of $A^\top$.
Hint: In the SVD basis, $\Sigma_{\mathrm{post}}^{-1}$ is diagonal with entries $\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}$.
Posterior covariance in SVD basis
$\Sigma_{\mathrm{post}} = \big(\sigma^{-2}A^\top A + \tau^{-2} I\big)^{-1}$. In the SVD basis of $A = USV^\top$, $\Sigma_{\mathrm{post}}^{-1}$ is diagonal with entries $\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}$. Therefore $v_i^\top \Sigma_{\mathrm{post}} v_i = \big(\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}\big)^{-1} = \frac{\sigma^2\tau^2}{\tau^2 s_i^2 + \sigma^2}$.
Null space
For null-space directions: $s_i = 0$, so $v_i^\top \Sigma_{\mathrm{post}} v_i = \tau^2$. The posterior variance equals the prior variance; data provides no information about null-space components.
Data-dominated regime
For $s_i \gg \sigma/\tau$: $v_i^\top \Sigma_{\mathrm{post}} v_i \approx \sigma^2/s_i^2$. Uncertainty in mode $i$ is determined by the noise level relative to the singular value $s_i$.
ex-ch03-04
(Medium) For a scalar observation $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $p(x) = \frac{\lambda}{2}e^{-\lambda|x|}$:
(a) Show that the MAP estimate is the soft-thresholding operator $\hat{x}_{\mathrm{MAP}} = \mathrm{sign}(y)\max(|y| - \lambda\sigma^2,\, 0)$.
(b) Show that the MMSE estimate $\mathbb{E}[x \mid y]$ does not produce exact zeros. Compute it numerically for representative values of $y$, $\sigma^2$, and $\lambda$.
(c) Explain when MAP and MMSE estimates differ significantly and which is preferable for sparse signal recovery.
Hint: The MAP estimate minimizes $\frac{(y-x)^2}{2\sigma^2} + \lambda|x|$. Differentiate separately for $x > 0$ and $x < 0$.
Hint: The posterior under a Laplace prior is a mixture of two truncated Gaussians on $[0, \infty)$ and $(-\infty, 0)$.
MAP via subgradient
$\hat{x} = \arg\min_x \frac{(y-x)^2}{2\sigma^2} + \lambda|x|$. Subgradient optimality: $0 \in \frac{x - y}{\sigma^2} + \lambda\,\partial|x|$. For $x > 0$: $x = y - \lambda\sigma^2$, valid iff $y > \lambda\sigma^2$. For $x < 0$: $x = y + \lambda\sigma^2$, valid iff $y < -\lambda\sigma^2$. For $|y| \le \lambda\sigma^2$: $x = 0$ satisfies the inclusion since $\partial|0| = [-1, 1]$. Therefore $\hat{x}_{\mathrm{MAP}} = \mathrm{sign}(y)\max(|y| - \lambda\sigma^2,\, 0)$.
MMSE (posterior mean)
The unnormalized posterior is $\exp\big(-\frac{(y-x)^2}{2\sigma^2} - \lambda|x|\big)$. Splitting on $x \ge 0$ and $x < 0$ gives a mixture of two truncated Gaussians with branch means $\mu_\pm = y \mp \lambda\sigma^2$ and weights $w_+ = e^{-\lambda y}\Phi(\mu_+/\sigma)$, $w_- = e^{\lambda y}\Phi(-\mu_-/\sigma)$:
$\mathbb{E}[x \mid y] = \frac{w_+\big(\mu_+ + \sigma\,\frac{\phi(\mu_+/\sigma)}{\Phi(\mu_+/\sigma)}\big) + w_-\big(\mu_- - \sigma\,\frac{\phi(\mu_-/\sigma)}{\Phi(-\mu_-/\sigma)}\big)}{w_+ + w_-}.$
Both weights are strictly positive for every finite $y$, so the posterior mean is never exactly zero.
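A minimal sketch of the numerical part of (b), assuming $\sigma = 1$, $\lambda = 1$, and a few test values of $y$ (the exercise's original numeric values were not preserved); it evaluates the truncated-Gaussian mixture formula above with scipy.

```python
import numpy as np
from scipy.stats import norm

def mmse_laplace(y, sigma=1.0, lam=1.0):
    """Posterior mean for y = x + n, n ~ N(0, sigma^2), x ~ Laplace(lam).

    The posterior is a mixture of two truncated Gaussians (see above)."""
    mu_p, mu_m = y - lam * sigma**2, y + lam * sigma**2   # branch means
    # Mixture weights (a common factor exp(lam^2 sigma^2 / 2) cancels).
    w_p = np.exp(-lam * y) * norm.cdf(mu_p / sigma)       # x >= 0 branch
    w_m = np.exp(lam * y) * norm.cdf(-mu_m / sigma)       # x < 0 branch
    # Means of the two truncated branches (Mills-ratio corrections).
    m_p = mu_p + sigma * norm.pdf(mu_p / sigma) / norm.cdf(mu_p / sigma)
    m_m = mu_m - sigma * norm.pdf(mu_m / sigma) / norm.cdf(-mu_m / sigma)
    return (w_p * m_p + w_m * m_m) / (w_p + w_m)

def map_laplace(y, sigma=1.0, lam=1.0):
    return np.sign(y) * np.maximum(np.abs(y) - lam * sigma**2, 0.0)

for y in [0.5, 1.0, 2.0, 3.0]:
    print(f"y={y}: MAP={map_laplace(y):.4f}, MMSE={mmse_laplace(y):.4f}")
```

For $y = 0.5$ the MAP is exactly $0$ while the MMSE is a small positive value, illustrating (b).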
Comparison
MAP and MMSE agree for large $|y|$ (both are approximately $y - \lambda\sigma^2\,\mathrm{sign}(y)$ for $|y| \gg \lambda\sigma^2$). They differ near the threshold: MAP gives exact zeros for $|y| \le \lambda\sigma^2$, while MMSE gives small nonzero values. For support identification, MAP is preferred (exact sparsity). For prediction/reconstruction accuracy (MSE), MMSE is optimal by construction.
ex-ch03-05
(Medium) For $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and $x \sim \mathcal{N}(0, \alpha^{-1} I)$, $x \in \mathbb{R}^N$:
(a) Show that the marginal (evidence) is $p(y \mid \alpha) = \mathcal{N}(y; 0,\, \sigma^2 I + \alpha^{-1}AA^\top)$.
(b) Derive the log-evidence $\log p(y \mid \alpha)$ in terms of the singular values $s_i$ of $A$.
(c) Differentiate the log-evidence with respect to $\alpha$ and show that the maximizer satisfies the implicit equation $\alpha = \gamma/\|\mu\|^2$ with $\gamma = N - \alpha\,\mathrm{tr}(\Sigma)$, where $\mu$, $\Sigma$ are the posterior mean and covariance.
Hint: For linear Gaussian models: if $x \sim \mathcal{N}(0, C)$ and $n \sim \mathcal{N}(0, \Gamma)$ independently, then $y = Ax + n \sim \mathcal{N}(0,\, ACA^\top + \Gamma)$.
Hint: Use the matrix determinant lemma: $\det(\Gamma + ACA^\top) = \det(\Gamma)\det(I + CA^\top\Gamma^{-1}A)$.
Marginal distribution
$Ax$ and $n$ are independent Gaussians, so $y$ has covariance $\mathrm{Cov}(Ax) + \mathrm{Cov}(n) = \alpha^{-1}AA^\top + \sigma^2 I$. The marginal is $p(y \mid \alpha) = \mathcal{N}(y; 0, C_y)$ with $C_y = \sigma^2 I + \alpha^{-1}AA^\top$.
Log-evidence
$\log p(y \mid \alpha) = -\frac{1}{2}\log\det(2\pi C_y) - \frac{1}{2}y^\top C_y^{-1} y$. Using the determinant lemma and the SVD $A = USV^\top$: $\log\det C_y = M\log\sigma^2 + \sum_i \log\big(1 + \frac{s_i^2}{\alpha\sigma^2}\big)$ and $y^\top C_y^{-1} y = \sum_i \frac{(u_i^\top y)^2}{\sigma^2 + s_i^2/\alpha} + \frac{\|P^\perp y\|^2}{\sigma^2}$, where $P^\perp$ projects onto the orthogonal complement of the range of $A$.
Differentiating and implicit equation
Differentiating w.r.t. $\alpha$ and setting to zero yields the MacKay update: $\alpha = \frac{\gamma}{\|\mu\|^2}$ where $\gamma = \sum_i \frac{s_i^2/\sigma^2}{\alpha + s_i^2/\sigma^2} = N - \alpha\,\mathrm{tr}(\Sigma)$ is the effective number of "well-determined" parameters (the "gamma number" in the original evidence approximation).
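A short sketch of the resulting fixed-point iteration; the problem sizes and the true $\alpha$ below are illustrative assumptions. It alternates the posterior update with the MacKay update until $\alpha$ stabilizes.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma, alpha_true = 50, 30, 0.1, 2.0        # illustrative (assumptions)
A = rng.standard_normal((M, N)) / np.sqrt(M)
x = rng.standard_normal(N) / np.sqrt(alpha_true)
y = A @ x + sigma * rng.standard_normal(M)

alpha = 1.0                                       # initial guess
for it in range(100):
    Sigma = np.linalg.inv(A.T @ A / sigma**2 + alpha * np.eye(N))  # posterior cov
    mu = Sigma @ A.T @ y / sigma**2                                # posterior mean
    gamma = N - alpha * np.trace(Sigma)           # effective number of parameters
    alpha_new = gamma / (mu @ mu)                 # MacKay update
    if abs(alpha_new - alpha) < 1e-8 * alpha:
        break
    alpha = alpha_new
# The estimate fluctuates around the truth for finite data.
print(f"estimated alpha = {alpha:.3f} (true {alpha_true})")
```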
ex-ch03-06
(Medium) Consider the hierarchical model (SBL): $y = Ax + n$, $n \sim \mathcal{N}(0, \sigma^2 I)$, $x_i \sim \mathcal{N}(0, \gamma_i)$ independently, with hyperparameters $\gamma = (\gamma_1, \dots, \gamma_N)$.
(a) Derive the E-step of the EM algorithm: the posterior mean $\mu$ and covariance $\Sigma$ given the current $\gamma^{(t)}$.
(b) Derive the M-step update formula $\gamma_i^{(t+1)} = \mu_i^2 + \Sigma_{ii}$.
(c) Show that if $\mu_i^2 + \Sigma_{ii} < \gamma_i^{(t)}$, then $\gamma_i^{(t+1)} < \gamma_i^{(t)}$; if this persists across iterations, the component should be pruned ($\gamma_i \to 0$).
Hint: The E-step follows from Theorem (Gaussian Prior-Posterior) with prior covariance $\Gamma = \mathrm{diag}(\gamma^{(t)})$.
Hint: The M-step maximizes $Q(\gamma) = \mathbb{E}_{x \mid y, \gamma^{(t)}}[\log p(y, x \mid \gamma)]$ over $\gamma$.
E-step
From Theorem (Gaussian Prior-Posterior): $p(x \mid y, \gamma^{(t)}) = \mathcal{N}(\mu, \Sigma)$ where $\Sigma = \big(\sigma^{-2}A^\top A + \Gamma^{-1}\big)^{-1}$, $\mu = \sigma^{-2}\Sigma A^\top y$, and $\Gamma = \mathrm{diag}(\gamma^{(t)})$.
M-step derivation
$Q(\gamma) = \mathbb{E}[\log p(x \mid \gamma)] + \mathrm{const} = -\frac{1}{2}\sum_i\big(\log\gamma_i + \frac{\mathbb{E}[x_i^2]}{\gamma_i}\big) + \mathrm{const}$. Differentiating w.r.t. $\gamma_i$ and setting to zero: $-\frac{1}{2\gamma_i} + \frac{\mathbb{E}[x_i^2]}{2\gamma_i^2} = 0$, giving $\gamma_i = \mathbb{E}[x_i^2]$. Rearranging using $\mathbb{E}[x_i^2] = \mu_i^2 + \Sigma_{ii}$ yields the stated formula.
Pruning condition
$\gamma_i^{(t+1)} = \mu_i^2 + \Sigma_{ii}$. Note $\Sigma_{ii} \le \gamma_i$, and the ratio $\Sigma_{ii}/\gamma_i \in (0, 1]$ is the fraction of posterior variance contributed by the prior. If $\mu_i^2 > \gamma_i - \Sigma_{ii}$ then $\gamma_i^{(t+1)} > \gamma_i^{(t)}$ (grows). If instead $\mu_i^2 < \gamma_i - \Sigma_{ii}$ at every iteration, the sequence $\gamma_i^{(t)}$ decreases monotonically and its only fixed point is $\gamma_i = 0$: the EM update drives $\gamma_i \to 0$ (prune).
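A compact EM-SBL loop illustrating (a)-(c) on a synthetic sparse problem; the sizes, noise level, iteration count, and pruning threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma = 40, 80, 0.05                   # illustrative sizes (assumptions)
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[[5, 23, 60]] = [1.0, -0.8, 0.6]
y = A @ x_true + sigma * rng.standard_normal(M)

gamma = np.ones(N)
active = np.ones(N, dtype=bool)              # components not yet pruned
for it in range(100):
    Aa, ga = A[:, active], gamma[active]
    Sigma = np.linalg.inv(Aa.T @ Aa / sigma**2 + np.diag(1.0 / ga))  # E-step
    mu = Sigma @ Aa.T @ y / sigma**2
    gamma[active] = mu**2 + np.diag(Sigma)   # M-step
    active &= ~(gamma < 1e-8)                # pruning threshold (assumption)
    gamma[~active] = 0.0
print("large-gamma components:", np.flatnonzero(gamma > 1e-2))  # should match support
```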
ex-ch03-07
(Medium) For the scalar horseshoe model: $y = x + n$, $n \sim \mathcal{N}(0, 1)$, $x \mid \lambda \sim \mathcal{N}(0, \lambda^2)$, $\lambda \sim C^+(0, 1)$.
(a) Show that the conditional posterior mean (given $\lambda$) is $\mathbb{E}[x \mid y, \lambda] = (1 - \kappa)\,y$ where $\kappa = \frac{1}{1 + \lambda^2}$.
(b) Show that the shrinkage coefficient $\kappa$ has a prior density $p(\kappa) = \frac{1}{\pi}\kappa^{-1/2}(1 - \kappa)^{-1/2}$ with poles at $\kappa = 0$ and $\kappa = 1$: the "horseshoe" shape.
(c) Numerically compute the marginal posterior mean $\mathbb{E}[x \mid y] = \int \mathbb{E}[x \mid y, \lambda]\,p(\lambda \mid y)\,d\lambda$ by integrating over $\lambda$ (use quadrature on a fine grid), and compare with the Laplace soft-threshold over a range of $y$.
Hint: Given $\lambda$, $x \sim \mathcal{N}(0, \lambda^2)$ and $y \mid x \sim \mathcal{N}(x, 1)$, so $\mathbb{E}[x \mid y, \lambda] = \frac{\lambda^2}{1 + \lambda^2}\,y$ from the Gaussian conjugate formula.
Hint: $\lambda \sim C^+(0, 1)$ implies $p(\lambda) = \frac{2}{\pi(1 + \lambda^2)}$ for $\lambda > 0$. Change variables from $\lambda$ to $\kappa = \frac{1}{1 + \lambda^2}$.
Conditional posterior mean
With unit noise variance and prior variance $\lambda^2$, the conjugate formula gives: posterior variance $= \frac{\lambda^2}{1 + \lambda^2}$, posterior mean $= \frac{\lambda^2}{1 + \lambda^2}\,y = (1 - \kappa)\,y$.
Prior on $\kappa$
$\lambda \sim C^+(0, 1)$ has density $p(\lambda) = \frac{2}{\pi(1 + \lambda^2)}$ for $\lambda > 0$. Change variables: $\kappa = \frac{1}{1 + \lambda^2}$, $\lambda = \sqrt{(1 - \kappa)/\kappa}$, $\big|\frac{d\lambda}{d\kappa}\big| = \frac{1}{2}\kappa^{-3/2}(1 - \kappa)^{-1/2}$. Then $p(\kappa) = \frac{2\kappa}{\pi}\cdot\frac{1}{2}\kappa^{-3/2}(1 - \kappa)^{-1/2} = \frac{1}{\pi}\kappa^{-1/2}(1 - \kappa)^{-1/2}$, a Beta$(\tfrac12, \tfrac12)$ density with poles at $\kappa = 0$ and $\kappa = 1$.
Numerical comparison
$\mathbb{E}[x \mid y] = y\,\mathbb{E}[1 - \kappa \mid y]$ with $p(\kappa \mid y) \propto e^{-\kappa y^2/2}(1 - \kappa)^{-1/2}$. For small $|y|$: the posterior on $\kappa$ concentrates near $1$, so the horseshoe shrinks aggressively toward $0$, while the Laplace MAP gives exactly $0$ only below its threshold. For large $|y|$: the posterior on $\kappa$ concentrates near $0$, so the horseshoe applies minimal shrinkage ($\mathbb{E}[x \mid y] \approx y$), while the Laplace MAP still subtracts the constant $\lambda\sigma^2$ (bias on large signals).
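A sketch of the quadrature in (c), using the $\kappa$ parametrization derived above; the grid size and the soft-threshold level $\lambda\sigma^2 = 1$ for the Laplace comparison are arbitrary choices.

```python
import numpy as np

def horseshoe_mean(y, n_grid=4000):
    """E[x|y] for y = x + n, n ~ N(0,1), x|lam ~ N(0, lam^2), lam ~ C+(0,1).

    Quadrature in kappa = 1/(1 + lam^2), where
    p(kappa | y) is proportional to exp(-kappa y^2 / 2) (1 - kappa)^(-1/2)."""
    k = (np.arange(n_grid) + 0.5) / n_grid      # midpoint rule avoids the endpoints
    w = np.exp(-k * y**2 / 2) / np.sqrt(1 - k)  # integrable singularity at kappa = 1
    return y * np.sum((1 - k) * w) / np.sum(w)

def soft_threshold(y, t=1.0):                   # Laplace MAP, threshold t (assumed)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

for y in [0.5, 1.0, 2.0, 4.0]:
    print(f"y={y}: horseshoe={horseshoe_mean(y):.4f}, "
          f"laplace_map={soft_threshold(y):.4f}")
```

For large $y$ the horseshoe estimate approaches $y$ while the soft threshold remains biased by the constant $1$; for small $y$ the horseshoe shrinks heavily but never returns exactly zero.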
ex-ch03-08
(Hard) Consider the Whittle-Matérn covariance operator on $[0, 1]$: $C = \tau^2(\kappa^2 I - \Delta)^{-s}$ with periodic boundary conditions.
(a) Compute the eigenvalues of $C$ in terms of $\tau$, $\kappa$, $s$.
(b) Show that $C$ is trace class if and only if $s > 1/2$ (in 1D).
(c) Identify the Cameron-Martin space $E = C^{1/2}(L^2)$ as a Sobolev space and compute the Cameron-Martin norm $\|h\|_E$.
(d) For $s = 1$ (exponential covariance) and $s = 2$, generate 3 sample draws from $\mathcal{N}(0, C)$ (via the KL expansion truncated to the first 100 terms) and describe the smoothness difference.
Hint: The eigenfunctions of $-\Delta$ with periodic BC are $\varphi_k(x) = e^{2\pi i k x}$ with eigenvalue $(2\pi k)^2$.
Hint: Trace class requires $\sum_k \lambda_k < \infty$. Count the decay rate.
Eigenvalues
The operator $\kappa^2 I - \Delta$ on $[0, 1]$ with periodic BC has eigenfunctions $\varphi_k$ with eigenvalues $\kappa^2 + 4\pi^2 k^2$, $k \in \mathbb{Z}$. Therefore $C\varphi_k = \lambda_k\varphi_k$, so $\lambda_k = \tau^2\big(\kappa^2 + 4\pi^2 k^2\big)^{-s}$.
Trace class condition
$\mathrm{tr}(C) = \sum_{k \in \mathbb{Z}} \lambda_k$. For large $|k|$, $\lambda_k \sim \tau^2(2\pi|k|)^{-2s}$. The series converges iff $2s > 1$, which requires $s > 1/2$. Hence $C$ is trace class iff $s > 1/2$.
Cameron-Martin space
$E = C^{1/2}(L^2)$ consists of functions $h = \sum_k h_k\varphi_k$ with $\sum_k \lambda_k^{-1}|h_k|^2 < \infty$. This is the Sobolev space $H^s([0, 1])$. The Cameron-Martin norm is $\|h\|_E^2 = \tau^{-2}\sum_k \big(\kappa^2 + 4\pi^2 k^2\big)^{s}|h_k|^2$, an $H^s$ norm weighted by $\kappa$ and $\tau$.
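A sketch of part (d): KL draws truncated at 100 modes, with $\tau$ and $\kappa$ chosen arbitrarily. Draws with $s = 1$ should look visibly rougher (nowhere differentiable, exponential covariance) than the once-differentiable $s = 2$ draws.

```python
import numpy as np
import matplotlib.pyplot as plt

def kl_draw(s, tau=1.0, kappa=10.0, K=100, n_x=1024, rng=None):
    """One draw from N(0, C), C = tau^2 (kappa^2 - Laplacian)^(-s) on [0,1]
    with periodic BC, via the KL expansion truncated at |k| <= K."""
    rng = rng or np.random.default_rng()
    x = np.linspace(0.0, 1.0, n_x, endpoint=False)
    k = np.arange(1, K + 1)
    lam = tau**2 * (kappa**2 + 4 * np.pi**2 * k**2) ** (-s)  # eigenvalues, k >= 1
    lam0 = tau**2 * kappa ** (-2 * s)                        # k = 0 mode
    a, b = rng.standard_normal(K), rng.standard_normal(K)
    phase = 2 * np.pi * np.outer(k, x)
    # sqrt(2) cos / sin are the real orthonormal eigenfunctions for k >= 1
    u = (np.sqrt(lam0) * rng.standard_normal()
         + np.sqrt(2 * lam) @ (a[:, None] * np.cos(phase))
         + np.sqrt(2 * lam) @ (b[:, None] * np.sin(phase)))
    return x, u

rng = np.random.default_rng(4)
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
for ax, s in zip(axes, (1, 2)):       # s=1: exponential (rough); s=2: smoother
    for _ in range(3):
        x, u = kl_draw(s, rng=rng)
        ax.plot(x, u, lw=0.8)
    ax.set_title(f"s = {s}")
plt.show()
```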
ex-ch03-09
(Hard) Implement the pCN sampler for a 1D deblurring problem: $y = k * x + n$ where $k$ is a Gaussian convolution kernel of width $w$ pixels, $n \sim \mathcal{N}(0, \sigma^2 I)$, and the prior is $\mu_0 = \mathcal{N}(0, C)$ with Matérn-3/2 covariance (choose the amplitude, the correlation length, and the grid size in pixels).
(a) Implement pCN with prior samples drawn via the truncated KL expansion (first 50 modes).
(b) Run the chain at several step sizes $\beta$ for a fixed iteration budget. Plot acceptance rate vs $\beta$ and identify the near-optimal step size.
(c) Compute the posterior mean and pointwise credible bands from the samples after discarding the first 1000 as burn-in. Verify that the true signal is within the bands.
Hint: Generate prior samples in the frequency domain: draw $\xi_k \sim \mathcal{N}(0, 1)$, scale by $\sqrt{\lambda_k}$, then IFFT.
Hint: The potential $\Phi(x) = \frac{1}{2\sigma^2}\|y - k * x\|^2$ can be computed via FFT convolution in $O(n\log n)$.
Setup
Matérn-3/2 eigenvalues: $\lambda_k \propto \big(\kappa^2 + 4\pi^2 k^2\big)^{-2}$ (discrete Fourier modes). Prior sample: draw $\xi_k \sim \mathcal{N}(0, 1)$ for $|k| \le 50$, compute $\sqrt{\lambda_k}\,\xi_k$, pad with zeros for $|k| > 50$, IFFT to get $x$.
pCN loop
At each step: propose $x' = \sqrt{1 - \beta^2}\,x + \beta\,\xi$ with $\xi \sim \mathcal{N}(0, C)$. Compute $a = \min\{1,\, \exp(\Phi(x) - \Phi(x'))\}$. Accept if $u < a$, $u \sim \mathcal{U}(0, 1)$. Near-optimal $\beta$ typically gives a moderate acceptance rate (on the order of 20-30%).
Posterior statistics
From samples $\{x^{(j)}\}_{j=1}^J$: $\bar{x} = \frac{1}{J}\sum_j x^{(j)}$ and $\widehat{\mathrm{sd}}_i$, the sample standard deviation of $\{x_i^{(j)}\}$. Credible band: $\bar{x}_i \pm 1.96\,\widehat{\mathrm{sd}}_i$ (pointwise 95%). With a correct model and sufficient samples, about 95% of the true pixel values should be covered.
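A self-contained pCN sketch under assumed parameters (grid size, noise level, prior scale, blur width, and a single fixed $\beta$; the exercise's original values were not preserved). The prior is sampled through its circulant Fourier spectrum, which is equivalent to the truncated KL construction above.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed parameters: grid size, noise std, inverse correlation length,
# blur width in pixels, pCN step size.
n, sigma, kappa, w, beta = 256, 0.05, 20.0, 4.0, 0.2
k = np.fft.fftfreq(n, d=1.0 / n)                     # integer frequencies
lam = (kappa**2 + (2 * np.pi * k) ** 2) ** (-2.0)    # Matern-3/2 spectrum (s = 2)
lam[np.abs(k) > 50] = 0.0                            # KL truncation: first 50 modes
lam *= n / lam.sum()                                 # unit pointwise prior variance
blur_f = np.exp(-2 * (np.pi * k * w / n) ** 2)       # Gaussian kernel (Fourier side)

def prior_draw():
    """Draw from N(0, C) via the circulant spectrum lam."""
    return np.real(np.fft.ifft(np.sqrt(lam) * np.fft.fft(rng.standard_normal(n))))

def forward(x):                                      # periodic convolution k * x
    return np.real(np.fft.ifft(blur_f * np.fft.fft(x)))

x_true = prior_draw()
y = forward(x_true) + sigma * rng.standard_normal(n)

def Phi(x):                                          # data-misfit potential
    return 0.5 * np.sum((y - forward(x)) ** 2) / sigma**2

x = prior_draw()
phi_x, accepted, samples = Phi(x), 0, []
for it in range(10_000):
    xp = np.sqrt(1 - beta**2) * x + beta * prior_draw()   # pCN proposal
    phi_p = Phi(xp)
    if np.log(rng.uniform()) < phi_x - phi_p:             # pCN acceptance rule
        x, phi_x, accepted = xp, phi_p, accepted + 1
    if it >= 1000:                                        # discard burn-in
        samples.append(x)
S = np.asarray(samples)
mean, sd = S.mean(axis=0), S.std(axis=0)
coverage = np.mean(np.abs(x_true - mean) < 1.96 * sd)
print(f"acceptance = {accepted / 10_000:.2f}, 95% band coverage = {coverage:.2f}")
```

Sweeping `beta` over a grid and recording the acceptance rate reproduces part (b).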
ex-ch03-10
(Hard) Compare HMC and random-walk MH on Gaussian posteriors of increasing dimension $d$.
(a) For each $d$ in an increasing sequence, construct a Gaussian posterior $\mathcal{N}(0, I_d)$ (isotropic for simplicity). Run both samplers for 5000 samples and compute the effective sample size (ESS) per gradient evaluation.
(b) Plot ESS/evaluation vs $d$ on a log-log scale. Verify the theoretical scaling: random-walk MH $O(d^{-1})$, HMC $O(d^{-1/4})$.
(c) Tune the random-walk MH step size to an acceptance rate of $\approx 0.234$ (optimal for isotropic Gaussian) and the HMC leapfrog step to achieve $\approx 65\%$ acceptance. Comment on the practical difficulty of tuning each algorithm.
Hint: ESS $= \frac{J}{1 + 2\sum_{t \ge 1}\hat\rho_t}$ where $\hat\rho_t$ is the estimated lag-$t$ autocorrelation. Use arviz.ess() or implement directly.
Hint: For HMC on $\mathcal{N}(0, I_d)$: the negative log-posterior is $U(x) = \frac{1}{2}\|x\|^2$, so $\nabla U(x) = x$.
Random-walk MH
Proposal: $x' = x + \delta\,\xi$, $\xi \sim \mathcal{N}(0, I_d)$. Acceptance: $\min\{1, e^{U(x) - U(x')}\}$. With $\delta = 2.38/\sqrt{d}$: acceptance $\approx 0.234$ and ESS per step $= O(1/d)$, scaling as predicted.
HMC
Introduce momentum $p \sim \mathcal{N}(0, I_d)$. Leapfrog with $L$ steps, step size $\epsilon \propto d^{-1/4}$ tuned for 65% acceptance. ESS per gradient evaluation $= O(d^{-1/4})$, confirmed numerically. Each HMC proposal costs $L$ gradient evaluations.
Discussion
As $d$ grows, random-walk MH's ESS per evaluation decays like $d^{-1}$ while HMC's decays only like $d^{-1/4}$, so HMC is far more efficient in high dimension. The tuning difficulty: MH requires only $\delta$; HMC requires both $\epsilon$ and $L$ (mitigated by NUTS, which auto-tunes $L$).
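A minimal sketch of the comparison. The step-size constants follow the canonical scalings ($\delta = 2.38/\sqrt{d}$, $\epsilon \propto d^{-1/4}$) rather than being tuned to the exact target acceptance rates, and ESS is estimated with a simple initial-positive-sequence rule instead of arviz.

```python
import numpy as np

rng = np.random.default_rng(7)

def ess(chain):
    """ESS of a 1D chain: truncate the autocorrelation sum at the first negative lag."""
    x = chain - chain.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:] / (x @ x)
    tau = 1.0
    for t in range(1, x.size):
        if acf[t] < 0:
            break
        tau += 2 * acf[t]
    return x.size / tau

def rwmh(d, n_iter=5000):
    x, delta, out = np.zeros(d), 2.38 / np.sqrt(d), []
    u = 0.5 * x @ x
    for _ in range(n_iter):
        xp = x + delta * rng.standard_normal(d)
        up = 0.5 * xp @ xp
        if np.log(rng.uniform()) < u - up:
            x, u = xp, up
        out.append(x[0])
    return np.array(out), n_iter                 # one evaluation per step

def hmc(d, n_iter=5000, L=10):
    eps, x, out = d ** -0.25, np.zeros(d), []
    for _ in range(n_iter):
        p = rng.standard_normal(d)
        xp, pp = x.copy(), p.copy()
        for _ in range(L):                        # leapfrog for U(x) = ||x||^2 / 2
            pp -= 0.5 * eps * xp                  # grad U(x) = x
            xp += eps * pp
            pp -= 0.5 * eps * xp
        dh = 0.5 * (x @ x + p @ p) - 0.5 * (xp @ xp + pp @ pp)
        if np.log(rng.uniform()) < dh:            # Metropolis correction
            x = xp
        out.append(x[0])
    return np.array(out), n_iter * L              # L gradient evals per step

for d in [2, 16, 128]:
    c_r, n_r = rwmh(d); c_h, n_h = hmc(d)
    print(f"d={d:4d}: RWMH ESS/eval={ess(c_r)/n_r:.2e}, "
          f"HMC ESS/eval={ess(c_h)/n_h:.2e}")
```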
ex-optimal-design
(Challenge) Consider a linear imaging system $y = Ax + n$ where $A$ consists of rows selected from a DFT matrix (1D Fourier sampling), $n \sim \mathcal{N}(0, \sigma^2 I)$, and prior $x \sim \mathcal{N}(0, C)$.
(a) For a given selection $S$ of Fourier frequencies, derive the posterior covariance $\Sigma_{\mathrm{post}}(S)$ as a function of the selected rows.
(b) Formulate the A-optimal design problem: select $m$ frequencies to minimize $\mathrm{tr}(\Sigma_{\mathrm{post}})$ (average posterior variance).
(c) Implement a greedy algorithm: at each step, add the frequency that maximally reduces $\mathrm{tr}(\Sigma_{\mathrm{post}})$. Compare with random frequency selection for a Shepp-Logan phantom (1D slice).
(d) Plot the posterior standard deviation maps for greedy-optimal vs random frequency selection. Quantify the reduction in average uncertainty.
Hint: For DFT rows, $A^*A = \sum_{k \in S} a_k a_k^*$ where $a_k$ is the conjugated $k$-th DFT row as a column vector.
Hint: The rank-1 update formula $(M + aa^*)^{-1} = M^{-1} - \frac{M^{-1}aa^*M^{-1}}{1 + a^*M^{-1}a}$ enables efficient greedy updates.
Posterior variance
$\Sigma_{\mathrm{post}}(S) = \big(C^{-1} + \sigma^{-2}\sum_{k \in S} a_k a_k^*\big)^{-1}$. For DFT rows and a stationary prior, all terms have a circulant structure exploitable via FFT.
A-optimal formulation
$\min_{|S| = m} \mathrm{tr}\big(\Sigma_{\mathrm{post}}(S)\big)$. This is NP-hard in general; the greedy algorithm has a $(1 - 1/e)$ approximation guarantee for the related D-optimal (log-determinant, submodular) criterion.
Greedy algorithm and comparison
Greedy: initialize $\Sigma = C$; at step $t$, add the frequency $k^\star = \arg\max_{k \notin S} \frac{a_k^*\Sigma^2 a_k}{\sigma^2 + a_k^*\Sigma a_k}$ (the exact trace reduction) and update $\Sigma$ via the rank-1 formula. In phantom experiments, greedy selection achieves a substantially lower average posterior variance than random selection at the same measurement budget, confirming the practical value of optimal design.
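A sketch of the greedy A-optimal loop, assuming a generic stationary (circulant) prior spectrum in place of the phantom experiment; the sizes, noise level, and prior spectrum are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, sigma = 64, 16, 0.1                     # illustrative sizes (assumptions)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # orthonormal DFT matrix (rows a_k^*)
k_int = np.fft.fftfreq(n, d=1.0 / n)
spec = (1.0 + (2 * np.pi * k_int / 8.0) ** 2) ** (-2.0)  # assumed prior spectrum
C = np.real(F.conj().T @ (spec[:, None] * F))            # circulant prior covariance

Sigma, sel = C.astype(complex), []
for _ in range(m):                            # greedy A-optimal selection
    best_k, best_gain = -1, -np.inf
    for kk in range(n):
        if kk in sel:
            continue
        a = F[kk].conj()                      # measurement vector, column form
        sa = Sigma @ a
        den = sigma**2 + np.real(a.conj() @ sa)
        gain = np.real(sa.conj() @ sa) / den  # exact trace reduction a* S^2 a / den
        if gain > best_gain:
            best_k, best_gain, best_sa, best_den = kk, gain, sa, den
    Sigma = Sigma - np.outer(best_sa, best_sa.conj()) / best_den  # rank-1 update
    sel.append(best_k)

rand_sel = rng.choice(n, m, replace=False)    # random baseline
A = F[rand_sel]
Sig_rand = np.linalg.inv(np.linalg.inv(C) + (A.conj().T @ A) / sigma**2)
print(f"tr greedy = {np.real(np.trace(Sigma)):.4f}, "
      f"tr random = {np.real(np.trace(Sig_rand)):.4f}")
```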
ex-ch03-12
(Challenge) Build a complete Bayesian sparse radar imaging pipeline for a simulated 2D scene with 3 point targets:
(a) Forward model: Simulate measurements $y = Ax + n$ with the Bernoulli-Gaussian prior (3 targets of random complex reflectivity among the $N$ pixels) and a random Gaussian matrix $A$ ($M \times N$, $M \ll N$, normalized columns).
(b) SBL reconstruction: Run Algorithm (EM for SBL) for 50 iterations. Plot the convergence of $\gamma$ and the pruning events.
(c) UQ: Report the posterior mean and credible intervals for the 3 detected pixels. Compare the posterior standard deviations with the Laplace approximation.
(d) LASSO comparison: Run LASSO with $\lambda$ selected by the discrepancy principle (§Parameter Choice Rules). Compare NMSE and detection rate with SBL over 20 Monte Carlo trials.
Hint: For SBL with $N \gg M$: the dominant cost is inverting the posterior covariance. Use the Woodbury identity to work with the $M \times M$ matrix: $\Sigma = \Gamma - \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}A\Gamma$.
Hint: Monte Carlo: for each trial, re-generate the target locations, reflectivities, and noise. Run both algorithms and record whether the targets are detected (posterior mean exceeds the noise floor).
Forward model setup
$A \in \mathbb{R}^{M \times N}$ with i.i.d. entries and columns normalized to unit norm. Place 3 targets at random positions with random reflectivity magnitude and phase uniform on $[0, 2\pi)$.
SBL with Woodbury
Use Woodbury: $\Sigma = \Gamma - \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}A\Gamma$ and $\mu = \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}y$. Cost per EM iteration: $O(M^2 N + M^3)$, much cheaper than the naive $O(N^3)$ for $M \ll N$.
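A sketch of the Woodbury-based EM iteration on a synthetic sparse scene (real-valued for simplicity; the sizes, noise level, and detection threshold are assumptions).

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, sigma, n_iter = 64, 512, 0.05, 50       # illustrative sizes (assumptions)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)                # normalized columns
x_true = np.zeros(N)
x_true[rng.choice(N, 3, replace=False)] = rng.normal(1.5, 0.3, 3)  # 3 targets
y = A @ x_true + sigma * rng.standard_normal(M)

gamma = np.ones(N)
for _ in range(n_iter):                       # EM for SBL, Woodbury form
    AG = A * gamma[None, :]                   # A @ diag(gamma)
    B = sigma**2 * np.eye(M) + AG @ A.T       # M x M Woodbury core
    mu = gamma * (A.T @ np.linalg.solve(B, y))                 # posterior mean
    Binv_A = np.linalg.solve(B, A)
    sig_diag = gamma - gamma**2 * np.einsum("mn,mn->n", A, Binv_A)  # diag(Sigma)
    gamma = mu**2 + sig_diag                  # M-step update
print("detected pixels:", np.flatnonzero(gamma > 1e-2))  # threshold (assumption)
print("true pixels:    ", np.flatnonzero(x_true))
```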
Comparison
Typical results (SNR 10 dB): SBL attains a lower NMSE and a higher detection rate than LASSO at the same false-alarm rate. SBL outperforms due to automatic regularization and sparsity-promoting hyperparameter updates.
ex-ch03-13
(Medium) The posterior variance map $\mathrm{diag}(\Sigma_{\mathrm{post}})$ can be interpreted as a diagnostic for the quality of the sensing geometry.
(a) For a circular aperture imaging system (transmitters and receivers on a circle of radius $R$, scene in the center), compute $A^*A$ analytically and show it is approximately circulant.
(b) For a Gaussian prior, show that the posterior variance is approximately constant across the scene (translation-invariant UQ). When does this approximation break down?
(c) For a linear aperture system (transmitters and receivers on a line), show that the posterior variance is much higher at the edges of the scene than at the center, reflecting the non-uniform k-space coverage.
Hint: For a circular aperture with dense Tx/Rx spacing, $(A^*A)_{ij}$ depends only on the displacement $r_i - r_j$ (stationarity).
Hint: Non-uniform coverage: a linear aperture samples the k-space along a limited angular range, leaving large k-space regions unmeasured at the scene edges.
Circular aperture
For a uniformly sampled circular aperture with Tx-Rx pairs evenly spaced in angle, the PSF (point spread function) is a function only of the displacement $r_i - r_j$ (in the 2D sense), hence approximately shift-invariant. The posterior variance is then approximately constant: $(\Sigma_{\mathrm{post}})_{ii} \approx \mathrm{const}$ for all pixels near the center. The approximation breaks down away from the center, where the aperture geometry is no longer symmetric relative to the pixel.
Linear aperture
Linear aperture: k-space coverage is limited to a sector, leading to non-uniform resolution. Pixels at the scene edges require large-angle k-space components that are undersupported, so $(\Sigma_{\mathrm{post}})_{ii}$ grows toward the edges, revealing reduced resolution and increased uncertainty. This motivates MIMO configurations (§33.1) that provide more uniform k-space coverage.
ex-ch03-14
(Easy) Verify calibration of the Gaussian posterior credible intervals.
(a) Generate 500 trials: for each, draw $x \sim \mathcal{N}(0, \tau^2 I)$ ($N = 20$ pixels), simulate $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and a fixed $A$, and compute the $1 - \alpha$ credible interval for each pixel.
(b) Count the fraction of trials where the true $x_i$ lies within its credible interval. Verify this is close to $1 - \alpha$.
(c) Repeat with a misspecified prior (the data are generated with variance $\tau_{\mathrm{true}}^2$ but the prior used in inference has $\tau^2 < \tau_{\mathrm{true}}^2$). Show that the credible intervals are overconfident (coverage $< 1 - \alpha$).
Hint: The credible interval for pixel $i$ is $\mu_i \pm z_{1-\alpha/2}\sqrt{(\Sigma_{\mathrm{post}})_{ii}}$ where $\mu = \sigma^{-2}\Sigma_{\mathrm{post}}A^\top y$.
Correctly specified model
For the correctly specified model, by Bayes optimality the credible intervals are calibrated: empirical coverage should be $1 - \alpha$ (with small Monte Carlo error, since 500 trials over 20 pixels give $10^4$ coverage events).
Misspecified model
With data generated using $\tau_{\mathrm{true}}^2$ but inference prior $\tau^2 < \tau_{\mathrm{true}}^2$: the true prior variance is larger than assumed. The posterior covariance is too small (overconfident), so the credible intervals are too narrow. Empirical coverage falls below $1 - \alpha$.
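A compact sketch of the calibration experiment; the matrix sizes and the misspecification factor (inference prior standard deviation $\tau_{\mathrm{true}}/3$) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
M, N, sigma, tau_true, alpha = 40, 20, 0.1, 1.0, 0.05
A = rng.standard_normal((M, N)) / np.sqrt(M)        # fixed sensing matrix
z = norm.ppf(1 - alpha / 2)

def coverage(tau_model, n_trials=500):
    """Fraction of pixels whose true value lands in the credible interval."""
    Sigma = np.linalg.inv(A.T @ A / sigma**2 + np.eye(N) / tau_model**2)
    half = z * np.sqrt(np.diag(Sigma))              # interval half-widths
    hits = 0
    for _ in range(n_trials):
        x = tau_true * rng.standard_normal(N)       # data always use tau_true
        y = A @ x + sigma * rng.standard_normal(M)
        mu = Sigma @ A.T @ y / sigma**2
        hits += np.sum(np.abs(x - mu) <= half)
    return hits / (n_trials * N)

print(f"well-specified coverage: {coverage(tau_true):.3f} (target {1 - alpha})")
print(f"misspecified (tau/3):    {coverage(tau_true / 3):.3f} (overconfident)")
```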
ex-ch03-15
(Medium) Prove the following property of the Bayesian posterior: under the model $p(x, y) = p(x)\,p(y \mid x)$ with squared-error loss, the MMSE estimate $\hat{x}_{\mathrm{MMSE}}(y) = \mathbb{E}[x \mid y]$ minimizes the Bayes risk over all estimators of $x$ (not just unbiased ones).
(a) Write the Bayes risk $R(\hat{x}) = \mathbb{E}_{x,y}\|x - \hat{x}(y)\|^2$ as an expectation over both $x$ and $y$.
(b) Use the law of iterated expectation to reduce to minimizing the posterior expected loss.
(c) Show that for each fixed $y$, the minimizer of $\mathbb{E}\big[\|x - c\|^2 \mid y\big]$ over $c$ is the posterior mean.
Hint: $R(\hat{x}) = \mathbb{E}_y\big[\mathbb{E}_{x \mid y}\|x - \hat{x}(y)\|^2\big]$.
Hint: Minimize over a constant $c$: differentiate $\mathbb{E}\big[\|x - c\|^2 \mid y\big]$ with respect to $c$.
Bayes risk decomposition
$R(\hat{x}) = \mathbb{E}_{x,y}\|x - \hat{x}(y)\|^2 = \mathbb{E}_y\big[\mathbb{E}_{x \mid y}\|x - \hat{x}(y)\|^2\big]$. Minimizing over all estimators is equivalent to minimizing the inner expectation pointwise for each $y$.
Pointwise minimization
Fix $y$ and minimize $g(c) = \mathbb{E}\big[\|x - c\|^2 \mid y\big]$ over $c$. Setting $\nabla_c\, g(c) = 2\big(c - \mathbb{E}[x \mid y]\big) = 0$ gives $c = \mathbb{E}[x \mid y]$. The Hessian $\nabla_c^2\, g = 2I \succ 0$, confirming this is a global minimum.