Exercises

ex-ch03-01

Easy

Consider the scalar model $y = a\gamma + w$ with known $a > 0$, $w \sim \mathcal{N}(0, \sigma^2)$, and prior $\gamma \sim \mathcal{N}(0, \gamma_0^2)$.

(a) Write the likelihood $p(y \mid \gamma)$ and the prior $\pi(\gamma)$.

(b) Derive the posterior $p(\gamma \mid y)$ by completing the square in $\gamma$. Verify that it is Gaussian and identify the posterior mean $\hat{\gamma}_{\text{post}}$ and variance $\sigma_{\text{post}}^2$.

(c) Show that $\hat{\gamma}_{\text{post}} = \frac{a\gamma_0^2}{a^2\gamma_0^2 + \sigma^2}\, y$ and interpret the shrinkage factor as a function of the SNR $a^2\gamma_0^2/\sigma^2$.
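A quick numerical check of (b) and (c), as a Python/NumPy sketch; the values of $a$, $\sigma$, $\gamma_0$ are illustrative (the exercise leaves them free):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, gamma0 = 2.0, 0.5, 1.5            # illustrative values
gamma_true = rng.normal(0.0, gamma0)
y = a * gamma_true + rng.normal(0.0, sigma)

# Posterior from completing the square (part b)
post_var = 1.0 / (a**2 / sigma**2 + 1.0 / gamma0**2)
post_mean = post_var * a * y / sigma**2

# Shrinkage form from part (c): both expressions must agree
shrink = a * gamma0**2 / (a**2 * gamma0**2 + sigma**2)
assert np.isclose(post_mean, shrink * y)
print(f"posterior mean {post_mean:.4f}, posterior variance {post_var:.4f}")
```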

ex-ch03-02

Easy

For the forward model $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ and prior $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma_0^2\mathbf{I})$:

(a) Show that the MAP estimate satisfies $\hat{\boldsymbol{\gamma}}_{\text{MAP}} = (\mathbf{A}^H\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^H\mathbf{y}$ with $\lambda = \sigma^2/\gamma_0^2$.

(b) Express $\hat{\boldsymbol{\gamma}}_{\text{MAP}}$ in the SVD basis of $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ and show that it equals the Tikhonov spectral filter from §Spectral Regularization Methods.

(c) What happens to $\hat{\boldsymbol{\gamma}}_{\text{MAP}}$ as $\gamma_0^2 \to \infty$ (uninformative prior)? As $\gamma_0^2 \to 0$ (very informative prior)?
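The two forms in (a) and (b) can be cross-checked numerically. A NumPy sketch with a real random $\mathbf{A}$ (so $\mathbf{A}^H = \mathbf{A}^T$) and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
sigma2, gamma02 = 0.1, 2.0
lam = sigma2 / gamma02

# Normal-equations form of the MAP estimate (part a)
g_map = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Tikhonov spectral-filter form in the SVD basis (part b)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
g_svd = Vt.T @ (s / (s**2 + lam) * (U.T @ y))

assert np.allclose(g_map, g_svd)
```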

ex-ch03-03

Easy

Let $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $m < n$ and prior $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma_0^2\mathbf{I})$.

(a) Compute $\mathbf{\Gamma}_{\text{post}}$ in terms of the SVD of $\mathbf{A}$.

(b) Show that for null-space directions ($\sigma_k = 0$), $[\mathbf{\Gamma}_{\text{post}}]_{kk} = \gamma_0^2$.

(c) For $\sigma_k \gg \sigma/\gamma_0$, show that $[\mathbf{\Gamma}_{\text{post}}]_{kk} \approx \sigma^2/\sigma_k^2$. Interpret: the data reduce uncertainty only in the row space of $\mathbf{A}$ (the range of $\mathbf{A}^H$).
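A NumPy sketch verifying the spectral form of $\mathbf{\Gamma}_{\text{post}}$ behind (a)-(c), with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 8                        # underdetermined: null space has dim n - m
A = rng.standard_normal((m, n))
sigma2, gamma02 = 0.01, 1.0

Gamma_post = np.linalg.inv(A.T @ A / sigma2 + np.eye(n) / gamma02)

# Diagonalize in the right-singular-vector basis of A
U, s, Vt = np.linalg.svd(A)        # Vt is n x n; s has only m entries
s_full = np.concatenate([s, np.zeros(n - m)])
eig = np.diag(Vt @ Gamma_post @ Vt.T)

# Matches 1/(s_k^2/sigma^2 + 1/gamma0^2): gamma0^2 in the null space,
# approximately sigma^2/s_k^2 for well-observed directions
assert np.allclose(eig, 1.0 / (s_full**2 / sigma2 + 1.0 / gamma02))
```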

ex-ch03-04

Medium

For a scalar observation $y = \gamma + w$, $w \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $\pi(\gamma) = \frac{\lambda}{2}\exp(-\lambda|\gamma|)$:

(a) Show that the MAP estimate is the soft-thresholding operator $\hat{\gamma}_{\text{MAP}} = \mathcal{S}_{\lambda\sigma^2}(y)$.

(b) Show that the MMSE estimate $\hat{\gamma}_{\text{MMSE}} = \mathbb{E}[\gamma \mid y]$ does not produce exact zeros. Compute it numerically for $\sigma = 1$, $\lambda = 1$, and $y \in \{0.1, 0.5, 1.0, 3.0\}$.

(c) Explain when the MAP and MMSE estimates differ significantly, and which is preferable for sparse signal recovery.
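For part (b), a NumPy sketch that evaluates the posterior mean by brute-force quadrature on a grid (the grid limits and resolution are illustrative choices):

```python
import numpy as np

def laplace_mmse(y, sigma=1.0, lam=1.0):
    """E[gamma | y] under a Laplace prior, by grid quadrature."""
    g = np.linspace(-10.0, 10.0, 20001)
    log_post = -(y - g)**2 / (2.0 * sigma**2) - lam * np.abs(g)
    w = np.exp(log_post - log_post.max())      # unnormalized posterior
    return np.sum(g * w) / np.sum(w)           # uniform grid: weights cancel

def soft_threshold(y, t):
    """MAP estimate from part (a)."""
    return np.sign(y) * max(abs(y) - t, 0.0)

for y in [0.1, 0.5, 1.0, 3.0]:
    print(f"y={y:3.1f}  MAP={soft_threshold(y, 1.0):+.4f}  "
          f"MMSE={laplace_mmse(y):+.4f}")
```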

ex-ch03-05

Medium

For $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ and $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \alpha^{-1}\mathbf{I})$:

(a) Show that $\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I} + \alpha^{-1}\mathbf{A}\mathbf{A}^H)$.

(b) Derive the log-evidence $\log\mathcal{Z}(\mathbf{y} \mid \alpha)$ in terms of the singular values $\{\sigma_k\}$ of $\mathbf{A}$.

(c) Differentiate the log-evidence with respect to $\alpha$ and show that $\hat{\alpha}$ satisfies the implicit equation $\hat{\alpha}^{-1} = \|\boldsymbol{\mu}\|^2 / (n - \hat{\alpha}\operatorname{tr}(\mathbf{\Sigma}))$, where $\boldsymbol{\mu}$ and $\mathbf{\Sigma}$ are the posterior mean and covariance.
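A NumPy sketch of the resulting evidence-maximization loop, iterating the part (c) fixed point from an arbitrary starting value (problem sizes and the true prior variance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, sigma2 = 30, 10, 0.05
A = rng.standard_normal((m, n))
gamma = rng.normal(0.0, 1.0, n)            # true alpha = 1
y = A @ gamma + rng.normal(0.0, np.sqrt(sigma2), m)

alpha = 0.1                                 # arbitrary initial guess
for _ in range(100):
    Sigma = np.linalg.inv(A.T @ A / sigma2 + alpha * np.eye(n))
    mu = Sigma @ (A.T @ y) / sigma2
    alpha = (n - alpha * np.trace(Sigma)) / (mu @ mu)   # part (c) fixed point
print(f"evidence-optimal alpha: {alpha:.3f}")
```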

ex-ch03-06

Medium

Consider the hierarchical model (SBL): $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$, $\gamma_i \mid \alpha_i \sim \mathcal{N}(0, \alpha_i^{-1})$, $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.

(a) Derive the E-step of the EM algorithm: the posterior mean $\boldsymbol{\mu}$ and covariance $\mathbf{\Sigma}$ given the current $\boldsymbol{\alpha}$.

(b) Derive the M-step update formula $\alpha_i^{\text{new}} = (1 - \alpha_i[\mathbf{\Sigma}]_{ii})/\mu_i^2$.

(c) Show that if $\mu_i^2 \leq \alpha_i[\mathbf{\Sigma}]_{ii}$, then $\alpha_i^{\text{new}} \leq 0$, meaning the component should be pruned ($\alpha_i \to \infty$).
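Putting (a)-(c) together as code: a NumPy sketch of the EM loop, where the finite `alpha_max` stands in for the pruning limit $\alpha_i \to \infty$:

```python
import numpy as np

def sbl_em(A, y, sigma2, n_iter=50, alpha_max=1e12):
    """Sparse Bayesian learning via EM. E-step: posterior (mu, Sigma) for
    fixed alpha; M-step: the part (b) update, with part (c) pruning."""
    m, n = A.shape
    alpha = np.ones(n)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(alpha))   # E-step
        mu = Sigma @ (A.T @ y) / sigma2
        new = (1.0 - alpha * np.diag(Sigma)) / np.maximum(mu**2, 1e-30)
        alpha = np.where(new > 0, new, alpha_max)   # prune: alpha -> "infinity"
    return mu, Sigma, alpha
```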

ex-ch03-07

Medium

For the scalar horseshoe model: $y \mid \gamma \sim \mathcal{N}(\gamma, 1)$, $\gamma \mid \lambda \sim \mathcal{N}(0, \lambda^2\tau^2)$, $\lambda \sim \text{Half-Cauchy}(0, 1)$.

(a) Show that the conditional posterior mean (given $\lambda$) is $\mathbb{E}[\gamma \mid y, \lambda] = (1 - \kappa)y$, where $\kappa = 1/(1 + \lambda^2\tau^2)$.

(b) Show that the shrinkage coefficient $\kappa$ has a prior density with poles at $\kappa = 0$ and $\kappa = 1$ (the "horseshoe" shape).

(c) Numerically compute the marginal posterior mean $\mathbb{E}[\gamma \mid y]$ by integrating over $\lambda$ (use quadrature with $\tau = 1$), and compare with the Laplace soft-threshold for $y \in \{0.1, 0.5, 1, 2, 4\}$.
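For part (c), a NumPy sketch using simple grid quadrature over $\lambda$; the truncation at $\lambda = 200$ and the resolution are illustrative (the half-Cauchy tail makes the integrand decay like $\lambda^{-3}$):

```python
import numpy as np

def horseshoe_mean(y, tau=1.0):
    """Marginal posterior mean E[gamma | y], integrating out lambda on a grid."""
    lam = np.linspace(1e-6, 200.0, 200001)     # truncation is illustrative
    v = 1.0 + lam**2 * tau**2                  # var(y | lambda); kappa = 1/v
    # p(lambda | y) is proportional to N(y; 0, v) * HalfCauchy(lambda; 0, 1)
    w = np.exp(-0.5 * y**2 / v) / np.sqrt(v) / (1.0 + lam**2)
    return y * np.sum((1.0 - 1.0 / v) * w) / np.sum(w)

def soft_threshold(y, t=1.0):
    return np.sign(y) * max(abs(y) - t, 0.0)

for y in [0.1, 0.5, 1.0, 2.0, 4.0]:
    print(f"y={y:3.1f}  horseshoe={horseshoe_mean(y):+.4f}  "
          f"Laplace soft-threshold={soft_threshold(y):+.4f}")
```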

ex-ch03-08

Hard

Consider the Whittle-Matérn covariance operator on $[0, 1]$: $\mathcal{C}_0 = (\kappa^2 I - \partial_{xx})^{-s}$ with periodic boundary conditions.

(a) Compute the eigenvalues $\lambda_k$ of $\mathcal{C}_0$ in terms of $k$, $\kappa$, and $s$.

(b) Show that $\mathcal{C}_0$ is trace class if and only if $2s > 1$ (in 1D).

(c) Identify the Cameron-Martin space $\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2})$ as a Sobolev space and compute the Cameron-Martin norm $\|h\|_{\mathcal{H}}$.

(d) For $s = 1$ (exponential covariance) and $s = 2$, generate 3 sample draws from $\mu_0$ (via the KL expansion truncated to the first 100 terms) and describe the difference in smoothness.
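For part (d), a NumPy sketch of the truncated-KL synthesis, assuming the periodic eigenpairs from part (a), $\lambda_k = (\kappa^2 + (2\pi k)^2)^{-s}$, with Fourier eigenfunctions (50 frequencies give the requested 100 sine/cosine terms):

```python
import numpy as np

def matern_prior_draws(s, kappa=5.0, n_modes=50, n_grid=512, n_draws=3, seed=0):
    """Truncated-KL draws from N(0, (kappa^2 I - d_xx)^{-s}), periodic on [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    k = np.arange(1, n_modes + 1)
    lam = (kappa**2 + (2.0 * np.pi * k)**2) ** (-s)     # eigenvalues, part (a)
    cos = np.cos(2.0 * np.pi * np.outer(k, x))
    sin = np.sin(2.0 * np.pi * np.outer(k, x))
    draws = []
    for _ in range(n_draws):
        xc, xs = rng.standard_normal((2, n_modes))
        # orthonormal basis {1, sqrt(2) cos, sqrt(2) sin}: fold sqrt(2) into amplitudes
        f = rng.standard_normal() * kappa**(-s)          # constant (k = 0) mode
        f = f + (np.sqrt(2.0 * lam) * xc) @ cos + (np.sqrt(2.0 * lam) * xs) @ sin
        draws.append(f)
    return x, np.array(draws)

x, rough = matern_prior_draws(s=1.0)     # visibly rougher paths
x, smooth = matern_prior_draws(s=2.0)    # visibly smoother paths
```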

ex-ch03-09

Hard

Implement the pCN sampler for a 1D deblurring problem: $\mathbf{y} = \mathbf{A} * \boldsymbol{\gamma} + \mathbf{w}$, where $\mathbf{A}$ is a Gaussian convolution kernel of width $w = 5$ pixels, $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ with $\sigma = 0.05$, and the prior is $\mu_0 = \mathcal{N}(\mathbf{0}, \mathcal{C}_0)$ with Matérn-3/2 covariance ($s = 3/2$, $\kappa = 5$, $n = 64$ pixels).

(a) Implement pCN with prior samples drawn via the truncated KL expansion (first 50 modes).

(b) Run for $T = 5000$ iterations with $\beta \in \{0.05, 0.2, 0.5, 0.8\}$. Plot the acceptance rate vs. $\beta$ and identify the near-optimal step size.

(c) Compute the posterior mean and $95\%$ credible bands from the samples after discarding the first 1000 as burn-in. Verify that the true signal lies within the bands.
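A compact NumPy sketch of the sampler for one $\beta$ (loop over the values in (b) for the acceptance-rate plot). The prior is synthesized in KL coefficient space, where the coefficient covariance is the identity and the pCN proposal is especially simple; reading the kernel "width" as the Gaussian standard deviation is one of several reasonable choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, beta, T, burn = 64, 0.05, 0.2, 5000, 1000

# Prior synthesis gamma = B @ xi with xi ~ N(0, I): Matern-3/2, kappa = 5,
# 25 cosine + 25 sine modes = first 50 KL modes (constant mode omitted)
x = np.linspace(0.0, 1.0, n, endpoint=False)
k = np.arange(1, 26)
amp = np.sqrt(2.0 * (5.0**2 + (2.0 * np.pi * k)**2) ** (-1.5))
B = np.concatenate([amp[:, None] * np.cos(2 * np.pi * np.outer(k, x)),
                    amp[:, None] * np.sin(2 * np.pi * np.outer(k, x))]).T

# Forward operator: circular Gaussian blur ("width" read as std = 5 pixels)
d = np.minimum(np.arange(n), n - np.arange(n))
kernel = np.exp(-0.5 * (d / 5.0) ** 2)
kernel /= kernel.sum()
A = np.array([np.roll(kernel, i) for i in range(n)])

gamma_true = B @ rng.standard_normal(B.shape[1])
y = A @ gamma_true + sigma * rng.standard_normal(n)

def phi(xi):                                  # negative log-likelihood
    r = y - A @ (B @ xi)
    return 0.5 * (r @ r) / sigma**2

xi, acc, samples = np.zeros(B.shape[1]), 0, []
phi_cur = phi(xi)
for t in range(T):
    prop = np.sqrt(1.0 - beta**2) * xi + beta * rng.standard_normal(xi.size)
    phi_prop = phi(prop)
    if np.log(rng.uniform()) < phi_cur - phi_prop:    # pCN acceptance rule
        xi, phi_cur = prop, phi_prop
        acc += 1
    samples.append(B @ xi)

post = np.array(samples[burn:])
mean = post.mean(axis=0)
band_lo, band_hi = np.percentile(post, [2.5, 97.5], axis=0)
print(f"acceptance rate at beta={beta}: {acc / T:.2f}")
```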

ex-ch03-10

Hard

Compare HMC and random-walk MH on Gaussian posteriors of increasing dimension $n \in \{10, 50, 200, 500\}$.

(a) For each $n$, construct a Gaussian posterior $\mathcal{N}(\mathbf{0}, \mathbf{\Gamma}_{\text{post}})$ with $\mathbf{\Gamma}_{\text{post}} = \mathbf{I}$ (isotropic for simplicity). Run both samplers for 5000 samples and compute the effective sample size (ESS) per gradient evaluation.

(b) Plot ESS per evaluation vs. $n$ on a log-log scale. Verify the theoretical scaling: random-walk MH $\sim n^{-1}$, HMC $\sim n^{-1/4}$.

(c) Tune the random-walk MH step size to $\delta = 2.38/\sqrt{n}$ (optimal for an isotropic Gaussian) and the HMC leapfrog step to achieve $\approx 0.65$ acceptance. Comment on the practical difficulty of tuning each algorithm.
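A NumPy sketch of the comparison, with a crude initial-sequence ESS estimator and fixed (untuned) leapfrog settings; part (c)'s acceptance-rate tuning is left as stated:

```python
import numpy as np

def ess(chain):
    """Effective sample size via the initial-positive-sequence autocorrelation."""
    x = chain - chain.mean()
    acf = np.correlate(x, x, mode='full')[x.size - 1:] / (x @ x)
    neg = np.nonzero(acf < 0)[0]
    cutoff = neg[0] if neg.size else acf.size
    return x.size / (1.0 + 2.0 * np.sum(acf[1:cutoff]))

def rwmh(n, T=5000, seed=0):
    rng = np.random.default_rng(seed)
    delta = 2.38 / np.sqrt(n)                 # optimal scaling, part (c)
    x, logp, out = np.zeros(n), 0.0, np.empty(T)
    for t in range(T):
        prop = x + delta * rng.standard_normal(n)
        lp = -0.5 * prop @ prop
        if np.log(rng.uniform()) < lp - logp:
            x, logp = prop, lp
        out[t] = x[0]
    return ess(out) / T                       # one target evaluation per step

def hmc(n, T=5000, L=10, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x, out = np.zeros(n), np.empty(T)
    for t in range(T):
        p = rng.standard_normal(n)
        x_new, p_new = x.copy(), p.copy()
        for _ in range(L):                    # leapfrog; grad U(x) = x for N(0, I)
            p_new -= 0.5 * eps * x_new
            x_new += eps * p_new
            p_new -= 0.5 * eps * x_new
        dH = 0.5 * (x @ x + p @ p - x_new @ x_new - p_new @ p_new)
        if np.log(rng.uniform()) < dH:
            x = x_new
        out[t] = x[0]
    return ess(out) / (T * L)                 # L gradient evaluations per step

for n in [10, 50, 200, 500]:
    print(f"n={n:4d}  RWMH {rwmh(n):.2e}  HMC {hmc(n):.2e}  (ESS per evaluation)")
```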

ex-optimal-design

Challenge

Consider a linear imaging system $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$, where $\mathbf{A}$ consists of $m$ rows selected from a $2N \times N$ DFT matrix (1D Fourier sampling), $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, and the prior is $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma_0^2\mathbf{I})$.

(a) For a given selection of $m$ Fourier frequencies, derive the posterior variance $[\mathbf{\Gamma}_{\text{post}}]_{ii}$ as a function of the selected rows.

(b) Formulate the A-optimal design problem: select $m$ frequencies to minimize $\operatorname{tr}(\mathbf{\Gamma}_{\text{post}})$ (the average posterior variance).

(c) Implement a greedy algorithm: at each step, add the frequency that maximally reduces $\operatorname{tr}(\mathbf{\Gamma}_{\text{post}})$. Compare with random frequency selection for a Shepp-Logan phantom ($N = 64$).

(d) Plot the posterior standard deviation maps for greedy-optimal vs. random frequency selection. Quantify the reduction in average uncertainty.
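For part (c), a NumPy sketch of the greedy step in 1D, using rank-one Sherman-Morrison updates of the posterior covariance so each candidate evaluation costs $O(N^2)$; the sizes are illustrative, and the Shepp-Logan comparison extends this to 2D:

```python
import numpy as np

def greedy_a_optimal(N=64, m=16, sigma2=0.01, gamma02=1.0):
    """Greedily pick m rows of a 2N x N DFT to minimize tr(Gamma_post).
    Sherman-Morrison keeps each candidate evaluation at O(N^2)."""
    j, k = np.arange(2 * N)[:, None], np.arange(N)[None, :]
    F = np.exp(-2j * np.pi * j * k / (2 * N)) / np.sqrt(N)
    P = gamma02 * np.eye(N, dtype=complex)      # prior covariance (no data yet)
    chosen = []
    for _ in range(m):
        best, best_gain = -1, -np.inf
        for i in range(2 * N):
            if i in chosen:
                continue
            Pv = P @ F[i].conj()
            # trace reduction if row i is added (rank-one information update)
            gain = (Pv.conj() @ Pv).real / (sigma2 + (F[i] @ Pv).real)
            if gain > best_gain:
                best, best_gain = i, gain
        Pv = P @ F[best].conj()
        P -= np.outer(Pv, Pv.conj()) / (sigma2 + (F[best] @ Pv).real)
        chosen.append(best)
    return chosen, np.trace(P).real

chosen, tr_greedy = greedy_a_optimal()
print("greedy tr(Gamma_post):", tr_greedy)   # compare against random rows, part (c)
```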

ex-ch03-12

Challenge

Build a complete Bayesian sparse radar imaging pipeline for a simulated 2D scene with 3 point targets:

(a) Forward model: Simulate measurements $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with the Bernoulli-Gaussian prior (3 targets of random reflectivity, activity probability $w = 3/64^2$) and $\mathbf{A}$ a random Gaussian matrix ($m = 200$, $n = 64^2$, normalized columns).

(b) SBL reconstruction: Run Algorithm (EM for SBL) for 50 iterations. Plot the convergence of $\log\boldsymbol{\alpha}$ and the pruning events.

(c) UQ: Report the posterior mean and $95\%$ credible intervals for the 3 detected pixels. Compare the posterior standard deviations with the Laplace approximation.

(d) LASSO comparison: Run LASSO with $\lambda$ selected by the discrepancy principle (§Parameter Choice Rules). Compare NMSE and detection rate with SBL over 20 Monte Carlo trials.
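A NumPy sketch of the part (a) scene and measurement simulation; the SBL-EM routine sketched in ex-ch03-06 can then be applied (for $n = 64^2 \gg m$, rewriting its E-step with the Woodbury identity is advisable):

```python
import numpy as np

rng = np.random.default_rng(0)
n_side, m, sigma = 64, 200, 0.01
n = n_side**2

# Bernoulli-Gaussian scene: 3 active pixels with Gaussian reflectivities
gamma = np.zeros(n)
support = rng.choice(n, size=3, replace=False)
gamma[support] = rng.standard_normal(3)

# Random Gaussian sensing matrix with unit-norm columns
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)

y = A @ gamma + sigma * rng.standard_normal(m)
# (A, y, sigma**2) now feed the SBL-EM routine; for n >> m, invert an m x m
# system via the Woodbury identity rather than the n x n posterior precision.
```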

ex-ch03-13

Medium

The posterior variance map $[\mathbf{\Gamma}_{\text{post}}]_{ii}$ can be interpreted as a diagnostic for the quality of the sensing geometry.

(a) For a circular aperture imaging system (transmitters and receivers on a circle of radius $R$, scene at the center), compute $\mathbf{A}^H\mathbf{A}$ analytically and show that it is approximately circulant.

(b) For a Gaussian prior, show that the posterior variance is approximately constant across the scene (translation-invariant UQ). When does this approximation break down?

(c) For a linear aperture system (transmitters and receivers on a line), show that the posterior variance is much higher at the edges of the scene than at the center, reflecting the non-uniform k-space coverage.
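A small numerical illustration of part (c) under one plausible reading: a monostatic near-field model with sensors on a line, where edge pixels see less array energy and angular diversity than center pixels. Geometry, wavelength, and noise level are all illustrative assumptions:

```python
import numpy as np

n_pix, n_sens, wl, sigma2, gamma02 = 64, 32, 0.1, 1e-4, 1.0
px = np.linspace(-1.0, 1.0, n_pix)       # scene pixels on a line
sx = np.linspace(-1.0, 1.0, n_sens)      # sensors on a parallel line
standoff = 2.0

r = np.sqrt((sx[:, None] - px[None, :])**2 + standoff**2)
A = np.exp(2j * np.pi * r / wl) / r      # phase delay + spherical spreading

Gamma_post = np.linalg.inv(A.conj().T @ A / sigma2 + np.eye(n_pix) / gamma02)
var = np.diag(Gamma_post).real
print(f"edge variance {var[0]:.3e} vs center variance {var[n_pix // 2]:.3e}")
```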

ex-ch03-14

Easy

Verify the calibration of Gaussian posterior credible intervals.

(a) Generate 500 trials: for each, draw $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, simulate $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with $\mathbf{A} \in \mathbb{R}^{10 \times 20}$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, 0.1\,\mathbf{I})$, and compute the $95\%$ credible interval for each pixel.

(b) Count the fraction of trials in which the true $\gamma_i^\dagger$ lies within its credible interval. Verify that this fraction is close to $95\%$.

(c) Repeat with a misspecified prior: the data are generated with $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, 4\mathbf{I})$, but the prior used in inference is $\mathcal{N}(\mathbf{0}, \mathbf{I})$. Show that the credible intervals are overconfident (coverage $< 95\%$).
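A NumPy sketch of the whole experiment, with a fixed $\mathbf{A}$ across trials and the 1.96-sigma Gaussian interval:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, noise_var, trials, z = 10, 20, 0.1, 500, 1.96
A = rng.standard_normal((m, n))            # fixed sensing matrix across trials

def coverage(prior_var_true, prior_var_model=1.0):
    # Posterior covariance and per-pixel interval half-widths (data-independent)
    P = np.linalg.inv(A.T @ A / noise_var + np.eye(n) / prior_var_model)
    sd = np.sqrt(np.diag(P))
    hits = 0
    for _ in range(trials):
        g = rng.normal(0.0, np.sqrt(prior_var_true), n)
        y = A @ g + rng.normal(0.0, np.sqrt(noise_var), m)
        mu = P @ (A.T @ y) / noise_var
        hits += np.sum(np.abs(g - mu) <= z * sd)
    return hits / (trials * n)

print("well-specified coverage:", coverage(1.0))   # should be close to 0.95
print("misspecified coverage: ", coverage(4.0))    # overconfident: below 0.95
```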

ex-ch03-15

Medium

Prove the following property of the Bayesian posterior: under the model $(p(\mathbf{y} \mid \boldsymbol{\gamma}), \pi(\boldsymbol{\gamma}))$, the MMSE estimate $\hat{\boldsymbol{\gamma}}_{\text{MMSE}} = \mathbb{E}[\boldsymbol{\gamma} \mid \mathbf{y}]$ minimizes the Bayes risk $\mathcal{R}(\hat{\boldsymbol{\gamma}}) = \mathbb{E}[\|\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}\|^2]$ over all estimators $\hat{\boldsymbol{\gamma}}(\mathbf{y})$, not just unbiased ones.

(a) Write the Bayes risk as an expectation over both $\mathbf{y}$ and $\boldsymbol{\gamma}$.

(b) Use the law of iterated expectation to reduce the problem to minimizing the posterior expected loss.

(c) Show that, for each fixed $\mathbf{y}$, the minimizer of $\mathbb{E}[\|\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}\|^2 \mid \mathbf{y}]$ over $\hat{\boldsymbol{\gamma}}$ is the posterior mean.