Exercises
ex-ch03-01
(Easy) Consider the scalar model $y = x + n$ with known noise variance $\sigma^2$, $n \sim \mathcal{N}(0, \sigma^2)$, and prior $x \sim \mathcal{N}(\mu_0, \tau^2)$.
(a) Write the likelihood $p(y \mid x)$ and prior $p(x)$.
(b) Derive the posterior $p(x \mid y)$ by completing the square in $x$. Verify it is Gaussian and identify the posterior mean $\mu_{\mathrm{post}}$ and variance $v_{\mathrm{post}}$.
(c) Show that $\mu_{\mathrm{post}} = \kappa y + (1-\kappa)\mu_0$ and interpret the shrinkage factor $\kappa$ as a function of the SNR $\tau^2/\sigma^2$.
Hint: The exponent of the unnormalized posterior is $-\frac{(y-x)^2}{2\sigma^2} - \frac{(x-\mu_0)^2}{2\tau^2}$. Expand and collect terms quadratic and linear in $x$.
Hint: The posterior variance satisfies $v_{\mathrm{post}}^{-1} = \sigma^{-2} + \tau^{-2}$.
Likelihood and prior
$p(y \mid x) = \mathcal{N}(y; x, \sigma^2) \propto \exp\big(-\tfrac{(y-x)^2}{2\sigma^2}\big)$, $p(x) = \mathcal{N}(x; \mu_0, \tau^2) \propto \exp\big(-\tfrac{(x-\mu_0)^2}{2\tau^2}\big)$.
Completing the square
$\log p(x \mid y) = -\frac{(y-x)^2}{2\sigma^2} - \frac{(x-\mu_0)^2}{2\tau^2} + \mathrm{const} = -\frac{1}{2}\Big(\frac{1}{\sigma^2} + \frac{1}{\tau^2}\Big)x^2 + \Big(\frac{y}{\sigma^2} + \frac{\mu_0}{\tau^2}\Big)x + \mathrm{const}$.
This is $\mathcal{N}(x; \mu_{\mathrm{post}}, v_{\mathrm{post}})$ with $v_{\mathrm{post}} = \big(\sigma^{-2} + \tau^{-2}\big)^{-1} = \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}$ and $\mu_{\mathrm{post}} = v_{\mathrm{post}}\big(\frac{y}{\sigma^2} + \frac{\mu_0}{\tau^2}\big)$.
Simplification and SNR interpretation
Substituting: $\mu_{\mathrm{post}} = \kappa y + (1-\kappa)\mu_0$ where $\kappa = \frac{\tau^2}{\tau^2 + \sigma^2} = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}$ with $\mathrm{SNR} = \tau^2/\sigma^2$.
As SNR $\to \infty$: $\kappa \to 1$, $\mu_{\mathrm{post}} \to y$ (ML estimate). As SNR $\to 0$: $\kappa \to 0$, $\mu_{\mathrm{post}} \to \mu_0$ (prior mean).
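A tiny numeric illustration of the shrinkage behavior in (c); the values of $y$, $\mu_0$, and the SNR grid below are arbitrary choices, not taken from the exercise.

```python
y, mu0 = 2.0, 0.0                       # observation and prior mean (arbitrary)
for snr in (0.1, 1.0, 10.0):            # snr = tau^2 / sigma^2
    kappa = snr / (1.0 + snr)           # shrinkage factor from (c)
    post_mean = kappa * y + (1 - kappa) * mu0
    print(f"SNR={snr:5.1f}: kappa={kappa:.3f}, posterior mean={post_mean:.3f}")
```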
ex-ch03-02
(Easy) For the forward model $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and prior $x \sim \mathcal{N}(0, \tau^2 I)$:
(a) Show that the MAP estimate satisfies $(A^\top A + \lambda I)\hat{x}_{\mathrm{MAP}} = A^\top y$ with $\lambda = \sigma^2/\tau^2$.
(b) Express $\hat{x}_{\mathrm{MAP}}$ in the SVD basis of $A$ and show it equals the Tikhonov spectral filter from §Spectral Regularization Methods.
(c) What happens to $\hat{x}_{\mathrm{MAP}}$ as $\tau^2 \to \infty$ (uninformative prior)? As $\tau^2 \to 0$ (very informative prior)?
Hint: The Tikhonov filter factors are $\varphi_i = \frac{s_i^2}{s_i^2 + \lambda}$.
Hint: In the SVD basis, $A^\top A$ has eigenvalues $s_i^2$.
MAP optimization
$\hat{x}_{\mathrm{MAP}} = \arg\min_x \frac{1}{2\sigma^2}\|y - Ax\|^2 + \frac{1}{2\tau^2}\|x\|^2$. Setting the gradient to zero: $\frac{1}{\sigma^2}A^\top(A\hat{x} - y) + \frac{1}{\tau^2}\hat{x} = 0$, giving $(A^\top A + \lambda I)\hat{x} = A^\top y$ with $\lambda = \sigma^2/\tau^2$.
SVD basis
Let $A = USV^\top$. Then in the SVD basis: $\hat{x}_{\mathrm{MAP}} = \sum_i \frac{s_i^2}{s_i^2 + \lambda}\,\frac{u_i^\top y}{s_i}\,v_i$, with filter factors $\varphi_i = \frac{s_i^2}{s_i^2 + \lambda}$: exactly Tikhonov spectral regularization.
Limiting cases
As $\tau^2 \to \infty$: $\lambda \to 0$, $\varphi_i \to 1$; the MAP approaches the pseudoinverse solution $A^\dagger y$ (no regularization). As $\tau^2 \to 0$: $\lambda \to \infty$, $\varphi_i \to 0$; the MAP approaches $0$, the prior mean (prior completely dominates).
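As a quick check of (a)-(b), the sketch below (problem sizes and noise levels are illustrative assumptions) verifies numerically that the normal-equations MAP solution coincides with the SVD filter-factor formula.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma, tau = 30, 20, 0.1, 1.0             # illustrative sizes (assumptions)
A, y = rng.standard_normal((M, N)), rng.standard_normal(M)
lam = sigma**2 / tau**2

# MAP via the normal equations (A^T A + lam I) x = A^T y
x_map = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)

# Same estimate via the SVD filter factors phi_i = s_i^2 / (s_i^2 + lam)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + lam)
x_svd = Vt.T @ (phi * (U.T @ y) / s)

print(np.allclose(x_map, x_svd))                # True
```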
ex-ch03-03
(Easy) Let $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and prior $x \sim \mathcal{N}(0, \tau^2 I)$.
(a) Compute the posterior covariance $\Sigma_{\mathrm{post}}$ in terms of the SVD of $A$.
(b) Show that for null-space directions ($s_i = 0$), the posterior variance equals the prior variance $\tau^2$.
(c) For $s_i \gg \sigma/\tau$, show that the posterior variance is approximately $\sigma^2/s_i^2$. Interpret: data reduces uncertainty only in the range space of $A^\top$.
Hint: In the SVD basis, $\Sigma_{\mathrm{post}}^{-1}$ is diagonal with entries $\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}$.
Posterior covariance in SVD basis
$\Sigma_{\mathrm{post}} = \big(\sigma^{-2}A^\top A + \tau^{-2} I\big)^{-1}$. In the SVD basis of $A = USV^\top$, $\Sigma_{\mathrm{post}}^{-1}$ is diagonal with entries $\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}$. Therefore $v_i^\top \Sigma_{\mathrm{post}} v_i = \big(\frac{s_i^2}{\sigma^2} + \frac{1}{\tau^2}\big)^{-1} = \frac{\sigma^2\tau^2}{\tau^2 s_i^2 + \sigma^2}$.
Null space
For null-space directions: $s_i = 0$, so $v_i^\top \Sigma_{\mathrm{post}} v_i = \tau^2$. The posterior variance equals the prior variance; data provides no information about null-space components.
Data-dominated regime
For $s_i \gg \sigma/\tau$: $v_i^\top \Sigma_{\mathrm{post}} v_i \approx \sigma^2/s_i^2$. Uncertainty in mode $i$ is determined by the noise level relative to the singular value $s_i$.
ex-ch03-04
(Medium) For a scalar observation $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2)$, with Laplace prior $p(x) = \frac{\lambda}{2}e^{-\lambda|x|}$:
(a) Show that the MAP estimate is the soft-thresholding operator $\hat{x}_{\mathrm{MAP}} = \mathrm{sign}(y)\max(|y| - \lambda\sigma^2,\, 0)$.
(b) Show that the MMSE estimate $\mathbb{E}[x \mid y]$ does not produce exact zeros. Compute it numerically for representative values of $y$, $\sigma^2$, and $\lambda$.
(c) Explain when MAP and MMSE estimates differ significantly and which is preferable for sparse signal recovery.
Hint: The MAP estimate minimizes $\frac{(y-x)^2}{2\sigma^2} + \lambda|x|$. Differentiate separately for $x > 0$ and $x < 0$.
Hint: The posterior under a Laplace prior is a mixture of two truncated Gaussians on $[0, \infty)$ and $(-\infty, 0)$.
MAP via subgradient
$\hat{x} = \arg\min_x \frac{(y-x)^2}{2\sigma^2} + \lambda|x|$. Subgradient optimality: $0 \in \frac{x - y}{\sigma^2} + \lambda\,\partial|x|$. For $x > 0$: $x = y - \lambda\sigma^2$, valid iff $y > \lambda\sigma^2$. For $x < 0$: $x = y + \lambda\sigma^2$, valid iff $y < -\lambda\sigma^2$. For $|y| \le \lambda\sigma^2$: $x = 0$ satisfies the inclusion since $\partial|0| = [-1, 1]$. Therefore $\hat{x}_{\mathrm{MAP}} = \mathrm{sign}(y)\max(|y| - \lambda\sigma^2,\, 0)$.
MMSE (posterior mean)
The unnormalized posterior is $\exp\big(-\frac{(y-x)^2}{2\sigma^2} - \lambda|x|\big)$. Splitting on $x \ge 0$ and $x < 0$ gives a mixture of two truncated Gaussians with branch means $\mu_\pm = y \mp \lambda\sigma^2$ and weights $w_+ = e^{-\lambda y}\Phi(\mu_+/\sigma)$, $w_- = e^{\lambda y}\Phi(-\mu_-/\sigma)$:
$\mathbb{E}[x \mid y] = \frac{w_+\big(\mu_+ + \sigma\,\frac{\phi(\mu_+/\sigma)}{\Phi(\mu_+/\sigma)}\big) + w_-\big(\mu_- - \sigma\,\frac{\phi(\mu_-/\sigma)}{\Phi(-\mu_-/\sigma)}\big)}{w_+ + w_-}.$
Both weights are strictly positive for every finite $y$, so the posterior mean is never exactly zero.
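A minimal sketch of the numerical part of (b), assuming $\sigma = 1$, $\lambda = 1$, and a few test values of $y$ (the exercise's original numeric values were not preserved); it evaluates the truncated-Gaussian mixture formula above with scipy.

```python
import numpy as np
from scipy.stats import norm

def mmse_laplace(y, sigma=1.0, lam=1.0):
    """Posterior mean for y = x + n, n ~ N(0, sigma^2), x ~ Laplace(lam).

    The posterior is a mixture of two truncated Gaussians (see above)."""
    mu_p, mu_m = y - lam * sigma**2, y + lam * sigma**2   # branch means
    # Mixture weights (a common factor exp(lam^2 sigma^2 / 2) cancels).
    w_p = np.exp(-lam * y) * norm.cdf(mu_p / sigma)       # x >= 0 branch
    w_m = np.exp(lam * y) * norm.cdf(-mu_m / sigma)       # x < 0 branch
    # Means of the two truncated branches (Mills-ratio corrections).
    m_p = mu_p + sigma * norm.pdf(mu_p / sigma) / norm.cdf(mu_p / sigma)
    m_m = mu_m - sigma * norm.pdf(mu_m / sigma) / norm.cdf(-mu_m / sigma)
    return (w_p * m_p + w_m * m_m) / (w_p + w_m)

def map_laplace(y, sigma=1.0, lam=1.0):
    return np.sign(y) * np.maximum(np.abs(y) - lam * sigma**2, 0.0)

for y in [0.5, 1.0, 2.0, 3.0]:
    print(f"y={y}: MAP={map_laplace(y):.4f}, MMSE={mmse_laplace(y):.4f}")
```

For $y = 0.5$ the MAP is exactly $0$ while the MMSE is a small positive value, illustrating (b).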
Comparison
MAP and MMSE agree for large $|y|$ (both are approximately $y - \lambda\sigma^2\,\mathrm{sign}(y)$ for $|y| \gg \lambda\sigma^2$). They differ near the threshold: MAP gives exact zeros for $|y| \le \lambda\sigma^2$, while MMSE gives small nonzero values. For support identification, MAP is preferred (exact sparsity). For prediction/reconstruction accuracy (MSE), MMSE is optimal by construction.
ex-ch03-05
(Medium) For $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and $x \sim \mathcal{N}(0, \alpha^{-1} I)$, $x \in \mathbb{R}^N$:
(a) Show that the marginal (evidence) is $p(y \mid \alpha) = \mathcal{N}(y; 0,\, \sigma^2 I + \alpha^{-1}AA^\top)$.
(b) Derive the log-evidence $\log p(y \mid \alpha)$ in terms of the singular values $s_i$ of $A$.
(c) Differentiate the log-evidence with respect to $\alpha$ and show that the maximizer satisfies the implicit equation $\alpha = \gamma/\|\mu\|^2$ with $\gamma = N - \alpha\,\mathrm{tr}(\Sigma)$, where $\mu$, $\Sigma$ are the posterior mean and covariance.
Hint: For linear Gaussian models: if $x \sim \mathcal{N}(0, C)$ and $n \sim \mathcal{N}(0, \Gamma)$ independently, then $y = Ax + n \sim \mathcal{N}(0,\, ACA^\top + \Gamma)$.
Hint: Use the matrix determinant lemma: $\det(\Gamma + ACA^\top) = \det(\Gamma)\det(I + CA^\top\Gamma^{-1}A)$.
Marginal distribution
$Ax$ and $n$ are independent Gaussians, so $y$ has covariance $\mathrm{Cov}(Ax) + \mathrm{Cov}(n) = \alpha^{-1}AA^\top + \sigma^2 I$. The marginal is $p(y \mid \alpha) = \mathcal{N}(y; 0, C_y)$ with $C_y = \sigma^2 I + \alpha^{-1}AA^\top$.
Log-evidence
$\log p(y \mid \alpha) = -\frac{1}{2}\log\det(2\pi C_y) - \frac{1}{2}y^\top C_y^{-1} y$. Using the determinant lemma and the SVD $A = USV^\top$: $\log\det C_y = M\log\sigma^2 + \sum_i \log\big(1 + \frac{s_i^2}{\alpha\sigma^2}\big)$ and $y^\top C_y^{-1} y = \sum_i \frac{(u_i^\top y)^2}{\sigma^2 + s_i^2/\alpha} + \frac{\|P^\perp y\|^2}{\sigma^2}$, where $P^\perp$ projects onto the orthogonal complement of the range of $A$.
Differentiating and implicit equation
Differentiating w.r.t. $\alpha$ and setting to zero yields the MacKay update: $\alpha = \frac{\gamma}{\|\mu\|^2}$ where $\gamma = \sum_i \frac{s_i^2/\sigma^2}{\alpha + s_i^2/\sigma^2} = N - \alpha\,\mathrm{tr}(\Sigma)$ is the effective number of "well-determined" parameters (the "gamma number" in the original evidence approximation).
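A short sketch of the resulting fixed-point iteration; the problem sizes and the true $\alpha$ below are illustrative assumptions. It alternates the posterior update with the MacKay update until $\alpha$ stabilizes.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma, alpha_true = 50, 30, 0.1, 2.0        # illustrative (assumptions)
A = rng.standard_normal((M, N)) / np.sqrt(M)
x = rng.standard_normal(N) / np.sqrt(alpha_true)
y = A @ x + sigma * rng.standard_normal(M)

alpha = 1.0                                       # initial guess
for it in range(100):
    Sigma = np.linalg.inv(A.T @ A / sigma**2 + alpha * np.eye(N))  # posterior cov
    mu = Sigma @ A.T @ y / sigma**2                                # posterior mean
    gamma = N - alpha * np.trace(Sigma)           # effective number of parameters
    alpha_new = gamma / (mu @ mu)                 # MacKay update
    if abs(alpha_new - alpha) < 1e-8 * alpha:
        break
    alpha = alpha_new
# The estimate fluctuates around the truth for finite data.
print(f"estimated alpha = {alpha:.3f} (true {alpha_true})")
```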
ex-ch03-06
(Medium) Consider the hierarchical model (SBL): $y = Ax + n$, $n \sim \mathcal{N}(0, \sigma^2 I)$, $x_i \sim \mathcal{N}(0, \gamma_i)$ independently, with hyperparameters $\gamma = (\gamma_1, \dots, \gamma_N)$.
(a) Derive the E-step of the EM algorithm: the posterior mean $\mu$ and covariance $\Sigma$ given the current $\gamma^{(t)}$.
(b) Derive the M-step update formula $\gamma_i^{(t+1)} = \mu_i^2 + \Sigma_{ii}$.
(c) Show that if $\mu_i^2 + \Sigma_{ii} < \gamma_i^{(t)}$, then $\gamma_i^{(t+1)} < \gamma_i^{(t)}$; if this persists across iterations, the component should be pruned ($\gamma_i \to 0$).
Hint: The E-step follows from Theorem (Gaussian Prior-Posterior) with prior covariance $\Gamma = \mathrm{diag}(\gamma^{(t)})$.
Hint: The M-step maximizes $Q(\gamma) = \mathbb{E}_{x \mid y, \gamma^{(t)}}[\log p(y, x \mid \gamma)]$ over $\gamma$.
E-step
From Theorem (Gaussian Prior-Posterior): $p(x \mid y, \gamma^{(t)}) = \mathcal{N}(\mu, \Sigma)$ where $\Sigma = \big(\sigma^{-2}A^\top A + \Gamma^{-1}\big)^{-1}$, $\mu = \sigma^{-2}\Sigma A^\top y$, and $\Gamma = \mathrm{diag}(\gamma^{(t)})$.
M-step derivation
$Q(\gamma) = \mathbb{E}[\log p(x \mid \gamma)] + \mathrm{const} = -\frac{1}{2}\sum_i\big(\log\gamma_i + \frac{\mathbb{E}[x_i^2]}{\gamma_i}\big) + \mathrm{const}$. Differentiating w.r.t. $\gamma_i$ and setting to zero: $-\frac{1}{2\gamma_i} + \frac{\mathbb{E}[x_i^2]}{2\gamma_i^2} = 0$, giving $\gamma_i = \mathbb{E}[x_i^2]$. Rearranging using $\mathbb{E}[x_i^2] = \mu_i^2 + \Sigma_{ii}$ yields the stated formula.
Pruning condition
$\gamma_i^{(t+1)} = \mu_i^2 + \Sigma_{ii}$. Note $\Sigma_{ii} \le \gamma_i$, and the ratio $\Sigma_{ii}/\gamma_i \in (0, 1]$ is the fraction of posterior variance contributed by the prior. If $\mu_i^2 > \gamma_i - \Sigma_{ii}$ then $\gamma_i^{(t+1)} > \gamma_i^{(t)}$ (grows). If instead $\mu_i^2 < \gamma_i - \Sigma_{ii}$ at every iteration, the sequence $\gamma_i^{(t)}$ decreases monotonically and its only fixed point is $\gamma_i = 0$: the EM update drives $\gamma_i \to 0$ (prune).
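A compact EM-SBL loop illustrating (a)-(c) on a synthetic sparse problem; the sizes, noise level, iteration count, and pruning threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma = 40, 80, 0.05                   # illustrative sizes (assumptions)
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[[5, 23, 60]] = [1.0, -0.8, 0.6]
y = A @ x_true + sigma * rng.standard_normal(M)

gamma = np.ones(N)
active = np.ones(N, dtype=bool)              # components not yet pruned
for it in range(100):
    Aa, ga = A[:, active], gamma[active]
    Sigma = np.linalg.inv(Aa.T @ Aa / sigma**2 + np.diag(1.0 / ga))  # E-step
    mu = Sigma @ Aa.T @ y / sigma**2
    gamma[active] = mu**2 + np.diag(Sigma)   # M-step
    active &= ~(gamma < 1e-8)                # pruning threshold (assumption)
    gamma[~active] = 0.0
print("large-gamma components:", np.flatnonzero(gamma > 1e-2))  # should match support
```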
ex-ch03-07
(Medium) For the scalar horseshoe model: $y = x + n$, $n \sim \mathcal{N}(0, 1)$, $x \mid \lambda \sim \mathcal{N}(0, \lambda^2)$, $\lambda \sim C^+(0, 1)$.
(a) Show that the conditional posterior mean (given $\lambda$) is $\mathbb{E}[x \mid y, \lambda] = (1 - \kappa)\,y$ where $\kappa = \frac{1}{1 + \lambda^2}$.
(b) Show that the shrinkage coefficient $\kappa$ has a prior density $p(\kappa) = \frac{1}{\pi}\kappa^{-1/2}(1 - \kappa)^{-1/2}$ with poles at $\kappa = 0$ and $\kappa = 1$: the "horseshoe" shape.
(c) Numerically compute the marginal posterior mean $\mathbb{E}[x \mid y] = \int \mathbb{E}[x \mid y, \lambda]\,p(\lambda \mid y)\,d\lambda$ by integrating over $\lambda$ (use quadrature on a fine grid), and compare with the Laplace soft-threshold over a range of $y$.
Hint: Given $\lambda$, $x \sim \mathcal{N}(0, \lambda^2)$ and $y \mid x \sim \mathcal{N}(x, 1)$, so $\mathbb{E}[x \mid y, \lambda] = \frac{\lambda^2}{1 + \lambda^2}\,y$ from the Gaussian conjugate formula.
Hint: $\lambda \sim C^+(0, 1)$ implies $p(\lambda) = \frac{2}{\pi(1 + \lambda^2)}$ for $\lambda > 0$. Change variables from $\lambda$ to $\kappa = \frac{1}{1 + \lambda^2}$.
Conditional posterior mean
With unit noise variance and prior variance $\lambda^2$, the conjugate formula gives: posterior variance $= \frac{\lambda^2}{1 + \lambda^2}$, posterior mean $= \frac{\lambda^2}{1 + \lambda^2}\,y = (1 - \kappa)\,y$.
Prior on $\kappa$
$\lambda \sim C^+(0, 1)$ has density $p(\lambda) = \frac{2}{\pi(1 + \lambda^2)}$ for $\lambda > 0$. Change variables: $\kappa = \frac{1}{1 + \lambda^2}$, $\lambda = \sqrt{(1 - \kappa)/\kappa}$, $\big|\frac{d\lambda}{d\kappa}\big| = \frac{1}{2}\kappa^{-3/2}(1 - \kappa)^{-1/2}$. Then $p(\kappa) = \frac{2\kappa}{\pi}\cdot\frac{1}{2}\kappa^{-3/2}(1 - \kappa)^{-1/2} = \frac{1}{\pi}\kappa^{-1/2}(1 - \kappa)^{-1/2}$, a Beta$(\tfrac12, \tfrac12)$ density with poles at $\kappa = 0$ and $\kappa = 1$.
Numerical comparison
$\mathbb{E}[x \mid y] = y\,\mathbb{E}[1 - \kappa \mid y]$ with $p(\kappa \mid y) \propto e^{-\kappa y^2/2}(1 - \kappa)^{-1/2}$. For small $|y|$: the posterior on $\kappa$ concentrates near $1$, so the horseshoe shrinks aggressively toward $0$, while the Laplace MAP gives exactly $0$ only below its threshold. For large $|y|$: the posterior on $\kappa$ concentrates near $0$, so the horseshoe applies minimal shrinkage ($\mathbb{E}[x \mid y] \approx y$), while the Laplace MAP still subtracts the constant $\lambda\sigma^2$ (bias on large signals).
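A sketch of the quadrature in (c), using the $\kappa$ parametrization derived above; the grid size and the soft-threshold level $\lambda\sigma^2 = 1$ for the Laplace comparison are arbitrary choices.

```python
import numpy as np

def horseshoe_mean(y, n_grid=4000):
    """E[x|y] for y = x + n, n ~ N(0,1), x|lam ~ N(0, lam^2), lam ~ C+(0,1).

    Quadrature in kappa = 1/(1 + lam^2), where
    p(kappa | y) is proportional to exp(-kappa y^2 / 2) (1 - kappa)^(-1/2)."""
    k = (np.arange(n_grid) + 0.5) / n_grid      # midpoint rule avoids the endpoints
    w = np.exp(-k * y**2 / 2) / np.sqrt(1 - k)  # integrable singularity at kappa = 1
    return y * np.sum((1 - k) * w) / np.sum(w)

def soft_threshold(y, t=1.0):                   # Laplace MAP, threshold t (assumed)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

for y in [0.5, 1.0, 2.0, 4.0]:
    print(f"y={y}: horseshoe={horseshoe_mean(y):.4f}, "
          f"laplace_map={soft_threshold(y):.4f}")
```

For large $y$ the horseshoe estimate approaches $y$ while the soft threshold remains biased by the constant $1$; for small $y$ the horseshoe shrinks heavily but never returns exactly zero.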
ex-ch03-08
(Hard) Consider the Whittle-Matérn covariance operator on $[0, 1]$: $C = \tau^2(\kappa^2 I - \Delta)^{-s}$ with periodic boundary conditions.
(a) Compute the eigenvalues of $C$ in terms of $\tau$, $\kappa$, $s$.
(b) Show that $C$ is trace class if and only if $s > 1/2$ (in 1D).
(c) Identify the Cameron-Martin space $E = C^{1/2}(L^2)$ as a Sobolev space and compute the Cameron-Martin norm $\|h\|_E$.
(d) For $s = 1$ (exponential covariance) and $s = 2$, generate 3 sample draws from $\mathcal{N}(0, C)$ (via the KL expansion truncated to the first 100 terms) and describe the smoothness difference.
Hint: The eigenfunctions of $-\Delta$ with periodic BC are $\varphi_k(x) = e^{2\pi i k x}$ with eigenvalue $(2\pi k)^2$.
Hint: Trace class requires $\sum_k \lambda_k < \infty$. Count the decay rate.
Eigenvalues
The operator $\kappa^2 I - \Delta$ on $[0, 1]$ with periodic BC has eigenfunctions $\varphi_k$ with eigenvalues $\kappa^2 + 4\pi^2 k^2$, $k \in \mathbb{Z}$. Therefore $C\varphi_k = \lambda_k\varphi_k$, so $\lambda_k = \tau^2\big(\kappa^2 + 4\pi^2 k^2\big)^{-s}$.
Trace class condition
$\mathrm{tr}(C) = \sum_{k \in \mathbb{Z}} \lambda_k$. For large $|k|$, $\lambda_k \sim \tau^2(2\pi|k|)^{-2s}$. The series converges iff $2s > 1$, which requires $s > 1/2$. Hence $C$ is trace class iff $s > 1/2$.
Cameron-Martin space
$E = C^{1/2}(L^2)$ consists of functions $h = \sum_k h_k\varphi_k$ with $\sum_k \lambda_k^{-1}|h_k|^2 < \infty$. This is the Sobolev space $H^s([0, 1])$. The Cameron-Martin norm is $\|h\|_E^2 = \tau^{-2}\sum_k \big(\kappa^2 + 4\pi^2 k^2\big)^{s}|h_k|^2$, an $H^s$ norm weighted by $\kappa$ and $\tau$.
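A sketch of part (d): KL draws truncated at 100 modes, with $\tau$ and $\kappa$ chosen arbitrarily. Draws with $s = 1$ should look visibly rougher (nowhere differentiable, exponential covariance) than the once-differentiable $s = 2$ draws.

```python
import numpy as np
import matplotlib.pyplot as plt

def kl_draw(s, tau=1.0, kappa=10.0, K=100, n_x=1024, rng=None):
    """One draw from N(0, C), C = tau^2 (kappa^2 - Laplacian)^(-s) on [0,1]
    with periodic BC, via the KL expansion truncated at |k| <= K."""
    rng = rng or np.random.default_rng()
    x = np.linspace(0.0, 1.0, n_x, endpoint=False)
    k = np.arange(1, K + 1)
    lam = tau**2 * (kappa**2 + 4 * np.pi**2 * k**2) ** (-s)  # eigenvalues, k >= 1
    lam0 = tau**2 * kappa ** (-2 * s)                        # k = 0 mode
    a, b = rng.standard_normal(K), rng.standard_normal(K)
    phase = 2 * np.pi * np.outer(k, x)
    # sqrt(2) cos / sin are the real orthonormal eigenfunctions for k >= 1
    u = (np.sqrt(lam0) * rng.standard_normal()
         + np.sqrt(2 * lam) @ (a[:, None] * np.cos(phase))
         + np.sqrt(2 * lam) @ (b[:, None] * np.sin(phase)))
    return x, u

rng = np.random.default_rng(4)
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
for ax, s in zip(axes, (1, 2)):       # s=1: exponential (rough); s=2: smoother
    for _ in range(3):
        x, u = kl_draw(s, rng=rng)
        ax.plot(x, u, lw=0.8)
    ax.set_title(f"s = {s}")
plt.show()
```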
ex-ch03-09
(Hard) Implement the pCN sampler for a 1D deblurring problem: $y = k * x + n$ where $k$ is a Gaussian convolution kernel of width $w$ pixels, $n \sim \mathcal{N}(0, \sigma^2 I)$, and the prior is $\mu_0 = \mathcal{N}(0, C)$ with Matérn-3/2 covariance (choose the amplitude, the correlation length, and the grid size in pixels).
(a) Implement pCN with prior samples drawn via the truncated KL expansion (first 50 modes).
(b) Run the chain at several step sizes $\beta$ for a fixed iteration budget. Plot acceptance rate vs $\beta$ and identify the near-optimal step size.
(c) Compute the posterior mean and pointwise credible bands from the samples after discarding the first 1000 as burn-in. Verify that the true signal is within the bands.
Hint: Generate prior samples in the frequency domain: draw $\xi_k \sim \mathcal{N}(0, 1)$, scale by $\sqrt{\lambda_k}$, then IFFT.
Hint: The potential $\Phi(x) = \frac{1}{2\sigma^2}\|y - k * x\|^2$ can be computed via FFT convolution in $O(n\log n)$.
Setup
Matérn-3/2 eigenvalues: $\lambda_k \propto \big(\kappa^2 + 4\pi^2 k^2\big)^{-2}$ (discrete Fourier modes). Prior sample: draw $\xi_k \sim \mathcal{N}(0, 1)$ for $|k| \le 50$, compute $\sqrt{\lambda_k}\,\xi_k$, pad with zeros for $|k| > 50$, IFFT to get $x$.
pCN loop
At each step: propose $x' = \sqrt{1 - \beta^2}\,x + \beta\,\xi$ with $\xi \sim \mathcal{N}(0, C)$. Compute $a = \min\{1,\, \exp(\Phi(x) - \Phi(x'))\}$. Accept if $u < a$, $u \sim \mathcal{U}(0, 1)$. Near-optimal $\beta$ typically gives a moderate acceptance rate (on the order of 20-30%).
Posterior statistics
From samples $\{x^{(j)}\}_{j=1}^J$: $\bar{x} = \frac{1}{J}\sum_j x^{(j)}$ and $\widehat{\mathrm{sd}}_i$, the sample standard deviation of $\{x_i^{(j)}\}$. Credible band: $\bar{x}_i \pm 1.96\,\widehat{\mathrm{sd}}_i$ (pointwise 95%). With a correct model and sufficient samples, about 95% of the true pixel values should be covered.
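A self-contained pCN sketch under assumed parameters (grid size, noise level, prior scale, blur width, and a single fixed $\beta$; the exercise's original values were not preserved). The prior is sampled through its circulant Fourier spectrum, which is equivalent to the truncated KL construction above.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed parameters: grid size, noise std, inverse correlation length,
# blur width in pixels, pCN step size.
n, sigma, kappa, w, beta = 256, 0.05, 20.0, 4.0, 0.2
k = np.fft.fftfreq(n, d=1.0 / n)                     # integer frequencies
lam = (kappa**2 + (2 * np.pi * k) ** 2) ** (-2.0)    # Matern-3/2 spectrum (s = 2)
lam[np.abs(k) > 50] = 0.0                            # KL truncation: first 50 modes
lam *= n / lam.sum()                                 # unit pointwise prior variance
blur_f = np.exp(-2 * (np.pi * k * w / n) ** 2)       # Gaussian kernel (Fourier side)

def prior_draw():
    """Draw from N(0, C) via the circulant spectrum lam."""
    return np.real(np.fft.ifft(np.sqrt(lam) * np.fft.fft(rng.standard_normal(n))))

def forward(x):                                      # periodic convolution k * x
    return np.real(np.fft.ifft(blur_f * np.fft.fft(x)))

x_true = prior_draw()
y = forward(x_true) + sigma * rng.standard_normal(n)

def Phi(x):                                          # data-misfit potential
    return 0.5 * np.sum((y - forward(x)) ** 2) / sigma**2

x = prior_draw()
phi_x, accepted, samples = Phi(x), 0, []
for it in range(10_000):
    xp = np.sqrt(1 - beta**2) * x + beta * prior_draw()   # pCN proposal
    phi_p = Phi(xp)
    if np.log(rng.uniform()) < phi_x - phi_p:             # pCN acceptance rule
        x, phi_x, accepted = xp, phi_p, accepted + 1
    if it >= 1000:                                        # discard burn-in
        samples.append(x)
S = np.asarray(samples)
mean, sd = S.mean(axis=0), S.std(axis=0)
coverage = np.mean(np.abs(x_true - mean) < 1.96 * sd)
print(f"acceptance = {accepted / 10_000:.2f}, 95% band coverage = {coverage:.2f}")
```

Sweeping `beta` over a grid and recording the acceptance rate reproduces part (b).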
ex-ch03-10
(Hard) Compare HMC and random-walk MH on Gaussian posteriors of increasing dimension $d$.
(a) For each $d$ in an increasing sequence, construct a Gaussian posterior $\mathcal{N}(0, I_d)$ (isotropic for simplicity). Run both samplers for 5000 samples and compute the effective sample size (ESS) per gradient evaluation.
(b) Plot ESS/evaluation vs $d$ on a log-log scale. Verify the theoretical scaling: random-walk MH $O(d^{-1})$, HMC $O(d^{-1/4})$.
(c) Tune the random-walk MH step size to an acceptance rate of $\approx 0.234$ (optimal for isotropic Gaussian) and the HMC leapfrog step to achieve $\approx 65\%$ acceptance. Comment on the practical difficulty of tuning each algorithm.
Hint: ESS $= \frac{J}{1 + 2\sum_{t \ge 1}\hat\rho_t}$ where $\hat\rho_t$ is the estimated lag-$t$ autocorrelation. Use arviz.ess() or implement directly.
Hint: For HMC on $\mathcal{N}(0, I_d)$: the negative log-posterior is $U(x) = \frac{1}{2}\|x\|^2$, so $\nabla U(x) = x$.
Random-walk MH
Proposal: $x' = x + \delta\,\xi$, $\xi \sim \mathcal{N}(0, I_d)$. Acceptance: $\min\{1, e^{U(x) - U(x')}\}$. With $\delta = 2.38/\sqrt{d}$: acceptance $\approx 0.234$ and ESS per step $= O(1/d)$, scaling as predicted.
HMC
Introduce momentum $p \sim \mathcal{N}(0, I_d)$. Leapfrog with $L$ steps, step size $\epsilon \propto d^{-1/4}$ tuned for 65% acceptance. ESS per gradient evaluation $= O(d^{-1/4})$, confirmed numerically. Each HMC proposal costs $L$ gradient evaluations.
Discussion
As $d$ grows, random-walk MH's ESS per evaluation decays like $d^{-1}$ while HMC's decays only like $d^{-1/4}$, so HMC is far more efficient in high dimension. The tuning difficulty: MH requires only $\delta$; HMC requires both $\epsilon$ and $L$ (mitigated by NUTS, which auto-tunes $L$).
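A minimal sketch of the comparison. The step-size constants follow the canonical scalings ($\delta = 2.38/\sqrt{d}$, $\epsilon \propto d^{-1/4}$) rather than being tuned to the exact target acceptance rates, and ESS is estimated with a simple initial-positive-sequence rule instead of arviz.

```python
import numpy as np

rng = np.random.default_rng(7)

def ess(chain):
    """ESS of a 1D chain: truncate the autocorrelation sum at the first negative lag."""
    x = chain - chain.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:] / (x @ x)
    tau = 1.0
    for t in range(1, x.size):
        if acf[t] < 0:
            break
        tau += 2 * acf[t]
    return x.size / tau

def rwmh(d, n_iter=5000):
    x, delta, out = np.zeros(d), 2.38 / np.sqrt(d), []
    u = 0.5 * x @ x
    for _ in range(n_iter):
        xp = x + delta * rng.standard_normal(d)
        up = 0.5 * xp @ xp
        if np.log(rng.uniform()) < u - up:
            x, u = xp, up
        out.append(x[0])
    return np.array(out), n_iter                 # one evaluation per step

def hmc(d, n_iter=5000, L=10):
    eps, x, out = d ** -0.25, np.zeros(d), []
    for _ in range(n_iter):
        p = rng.standard_normal(d)
        xp, pp = x.copy(), p.copy()
        for _ in range(L):                        # leapfrog for U(x) = ||x||^2 / 2
            pp -= 0.5 * eps * xp                  # grad U(x) = x
            xp += eps * pp
            pp -= 0.5 * eps * xp
        dh = 0.5 * (x @ x + p @ p) - 0.5 * (xp @ xp + pp @ pp)
        if np.log(rng.uniform()) < dh:            # Metropolis correction
            x = xp
        out.append(x[0])
    return np.array(out), n_iter * L              # L gradient evals per step

for d in [2, 16, 128]:
    c_r, n_r = rwmh(d); c_h, n_h = hmc(d)
    print(f"d={d:4d}: RWMH ESS/eval={ess(c_r)/n_r:.2e}, "
          f"HMC ESS/eval={ess(c_h)/n_h:.2e}")
```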
ex-optimal-design
(Challenge) Consider a linear imaging system $y = Ax + n$ where $A$ consists of rows selected from a DFT matrix (1D Fourier sampling), $n \sim \mathcal{N}(0, \sigma^2 I)$, and prior $x \sim \mathcal{N}(0, C)$.
(a) For a given selection $S$ of Fourier frequencies, derive the posterior covariance $\Sigma_{\mathrm{post}}(S)$ as a function of the selected rows.
(b) Formulate the A-optimal design problem: select $m$ frequencies to minimize $\mathrm{tr}(\Sigma_{\mathrm{post}})$ (average posterior variance).
(c) Implement a greedy algorithm: at each step, add the frequency that maximally reduces $\mathrm{tr}(\Sigma_{\mathrm{post}})$. Compare with random frequency selection for a Shepp-Logan phantom (1D slice).
(d) Plot the posterior standard deviation maps for greedy-optimal vs random frequency selection. Quantify the reduction in average uncertainty.
Hint: For DFT rows, $A^*A = \sum_{k \in S} a_k a_k^*$ where $a_k$ is the conjugated $k$-th DFT row as a column vector.
Hint: The rank-1 update formula $(M + aa^*)^{-1} = M^{-1} - \frac{M^{-1}aa^*M^{-1}}{1 + a^*M^{-1}a}$ enables efficient greedy updates.
Posterior variance
$\Sigma_{\mathrm{post}}(S) = \big(C^{-1} + \sigma^{-2}\sum_{k \in S} a_k a_k^*\big)^{-1}$. For DFT rows and a stationary prior, all terms have a circulant structure exploitable via FFT.
A-optimal formulation
$\min_{|S| = m} \mathrm{tr}\big(\Sigma_{\mathrm{post}}(S)\big)$. This is NP-hard in general; the greedy algorithm has a $(1 - 1/e)$ approximation guarantee for the related D-optimal (log-determinant, submodular) criterion.
Greedy algorithm and comparison
Greedy: initialize $\Sigma = C$; at step $t$, add the frequency $k^\star = \arg\max_{k \notin S} \frac{a_k^*\Sigma^2 a_k}{\sigma^2 + a_k^*\Sigma a_k}$ (the exact trace reduction) and update $\Sigma$ via the rank-1 formula. In phantom experiments, greedy selection achieves a substantially lower average posterior variance than random selection at the same measurement budget, confirming the practical value of optimal design.
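A sketch of the greedy A-optimal loop, assuming a generic stationary (circulant) prior spectrum in place of the phantom experiment; the sizes, noise level, and prior spectrum are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, sigma = 64, 16, 0.1                     # illustrative sizes (assumptions)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # orthonormal DFT matrix (rows a_k^*)
k_int = np.fft.fftfreq(n, d=1.0 / n)
spec = (1.0 + (2 * np.pi * k_int / 8.0) ** 2) ** (-2.0)  # assumed prior spectrum
C = np.real(F.conj().T @ (spec[:, None] * F))            # circulant prior covariance

Sigma, sel = C.astype(complex), []
for _ in range(m):                            # greedy A-optimal selection
    best_k, best_gain = -1, -np.inf
    for kk in range(n):
        if kk in sel:
            continue
        a = F[kk].conj()                      # measurement vector, column form
        sa = Sigma @ a
        den = sigma**2 + np.real(a.conj() @ sa)
        gain = np.real(sa.conj() @ sa) / den  # exact trace reduction a* S^2 a / den
        if gain > best_gain:
            best_k, best_gain, best_sa, best_den = kk, gain, sa, den
    Sigma = Sigma - np.outer(best_sa, best_sa.conj()) / best_den  # rank-1 update
    sel.append(best_k)

rand_sel = rng.choice(n, m, replace=False)    # random baseline
A = F[rand_sel]
Sig_rand = np.linalg.inv(np.linalg.inv(C) + (A.conj().T @ A) / sigma**2)
print(f"tr greedy = {np.real(np.trace(Sigma)):.4f}, "
      f"tr random = {np.real(np.trace(Sig_rand)):.4f}")
```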
ex-ch03-12
(Challenge) Build a complete Bayesian sparse radar imaging pipeline for a simulated 2D scene with 3 point targets:
(a) Forward model: Simulate measurements $y = Ax + n$ with the Bernoulli-Gaussian prior (3 targets of random complex reflectivity among the $N$ pixels) and a random Gaussian matrix $A$ ($M \times N$, $M \ll N$, normalized columns).
(b) SBL reconstruction: Run Algorithm (EM for SBL) for 50 iterations. Plot the convergence of $\gamma$ and the pruning events.
(c) UQ: Report the posterior mean and credible intervals for the 3 detected pixels. Compare the posterior standard deviations with the Laplace approximation.
(d) LASSO comparison: Run LASSO with $\lambda$ selected by the discrepancy principle (§Parameter Choice Rules). Compare NMSE and detection rate with SBL over 20 Monte Carlo trials.
Hint: For SBL with $N \gg M$: the dominant cost is inverting the posterior covariance. Use the Woodbury identity to work with the $M \times M$ matrix: $\Sigma = \Gamma - \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}A\Gamma$.
Hint: Monte Carlo: for each trial, re-generate the target locations, reflectivities, and noise. Run both algorithms and record whether the targets are detected (posterior mean exceeds the noise floor).
Forward model setup
$A \in \mathbb{R}^{M \times N}$ with i.i.d. entries and columns normalized to unit norm. Place 3 targets at random positions with random reflectivity magnitude and phase uniform on $[0, 2\pi)$.
SBL with Woodbury
Use Woodbury: $\Sigma = \Gamma - \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}A\Gamma$ and $\mu = \Gamma A^\top\big(\sigma^2 I + A\Gamma A^\top\big)^{-1}y$. Cost per EM iteration: $O(M^2 N + M^3)$, much cheaper than the naive $O(N^3)$ for $M \ll N$.
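A sketch of the Woodbury-based EM iteration on a synthetic sparse scene (real-valued for simplicity; the sizes, noise level, and detection threshold are assumptions).

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, sigma, n_iter = 64, 512, 0.05, 50       # illustrative sizes (assumptions)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)                # normalized columns
x_true = np.zeros(N)
x_true[rng.choice(N, 3, replace=False)] = rng.normal(1.5, 0.3, 3)  # 3 targets
y = A @ x_true + sigma * rng.standard_normal(M)

gamma = np.ones(N)
for _ in range(n_iter):                       # EM for SBL, Woodbury form
    AG = A * gamma[None, :]                   # A @ diag(gamma)
    B = sigma**2 * np.eye(M) + AG @ A.T       # M x M Woodbury core
    mu = gamma * (A.T @ np.linalg.solve(B, y))                 # posterior mean
    Binv_A = np.linalg.solve(B, A)
    sig_diag = gamma - gamma**2 * np.einsum("mn,mn->n", A, Binv_A)  # diag(Sigma)
    gamma = mu**2 + sig_diag                  # M-step update
print("detected pixels:", np.flatnonzero(gamma > 1e-2))  # threshold (assumption)
print("true pixels:    ", np.flatnonzero(x_true))
```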
Comparison
Typical results (SNR 10 dB): SBL attains a lower NMSE and a higher detection rate than LASSO at the same false-alarm rate. SBL outperforms due to automatic regularization and sparsity-promoting hyperparameter updates.
ex-ch03-13
(Medium) The posterior variance map $\mathrm{diag}(\Sigma_{\mathrm{post}})$ can be interpreted as a diagnostic for the quality of the sensing geometry.
(a) For a circular aperture imaging system (transmitters and receivers on a circle of radius $R$, scene in the center), compute $A^*A$ analytically and show it is approximately circulant.
(b) For a Gaussian prior, show that the posterior variance is approximately constant across the scene (translation-invariant UQ). When does this approximation break down?
(c) For a linear aperture system (transmitters and receivers on a line), show that the posterior variance is much higher at the edges of the scene than at the center, reflecting the non-uniform k-space coverage.
Hint: For a circular aperture with dense Tx/Rx spacing, $(A^*A)_{ij}$ depends only on the displacement $r_i - r_j$ (stationarity).
Hint: Non-uniform coverage: a linear aperture samples the k-space along a limited angular range, leaving large k-space regions unmeasured at the scene edges.
Circular aperture
For a uniformly sampled circular aperture with Tx-Rx pairs evenly spaced in angle, the PSF (point spread function) is a function only of the displacement $r_i - r_j$ (in the 2D sense), hence approximately shift-invariant. The posterior variance is then approximately constant: $(\Sigma_{\mathrm{post}})_{ii} \approx \mathrm{const}$ for all pixels near the center. The approximation breaks down away from the center, where the aperture geometry is no longer symmetric relative to the pixel.
Linear aperture
Linear aperture: k-space coverage is limited to a sector, leading to non-uniform resolution. Pixels at the scene edges require large-angle k-space components that are undersupported, so $(\Sigma_{\mathrm{post}})_{ii}$ grows toward the edges, revealing reduced resolution and increased uncertainty. This motivates MIMO configurations (§33.1) that provide more uniform k-space coverage.
ex-ch03-14
(Easy) Verify calibration of the Gaussian posterior credible intervals.
(a) Generate 500 trials: for each, draw $x \sim \mathcal{N}(0, \tau^2 I)$ ($N = 20$ pixels), simulate $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and a fixed $A$, and compute the $1 - \alpha$ credible interval for each pixel.
(b) Count the fraction of trials where the true $x_i$ lies within its credible interval. Verify this is close to $1 - \alpha$.
(c) Repeat with a misspecified prior (the data are generated with variance $\tau_{\mathrm{true}}^2$ but the prior used in inference has $\tau^2 < \tau_{\mathrm{true}}^2$). Show that the credible intervals are overconfident (coverage $< 1 - \alpha$).
Hint: The credible interval for pixel $i$ is $\mu_i \pm z_{1-\alpha/2}\sqrt{(\Sigma_{\mathrm{post}})_{ii}}$ where $\mu = \sigma^{-2}\Sigma_{\mathrm{post}}A^\top y$.
Correctly specified model
For the correctly specified model, by Bayes optimality the credible intervals are calibrated: empirical coverage should be $1 - \alpha$ (with small Monte Carlo error, since 500 trials over 20 pixels give $10^4$ coverage events).
Misspecified model
With data generated using $\tau_{\mathrm{true}}^2$ but inference prior $\tau^2 < \tau_{\mathrm{true}}^2$: the true prior variance is larger than assumed. The posterior covariance is too small (overconfident), so the credible intervals are too narrow. Empirical coverage falls below $1 - \alpha$.
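A compact sketch of the calibration experiment; the matrix sizes and the misspecification factor (inference prior standard deviation $\tau_{\mathrm{true}}/3$) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
M, N, sigma, tau_true, alpha = 40, 20, 0.1, 1.0, 0.05
A = rng.standard_normal((M, N)) / np.sqrt(M)        # fixed sensing matrix
z = norm.ppf(1 - alpha / 2)

def coverage(tau_model, n_trials=500):
    """Fraction of pixels whose true value lands in the credible interval."""
    Sigma = np.linalg.inv(A.T @ A / sigma**2 + np.eye(N) / tau_model**2)
    half = z * np.sqrt(np.diag(Sigma))              # interval half-widths
    hits = 0
    for _ in range(n_trials):
        x = tau_true * rng.standard_normal(N)       # data always use tau_true
        y = A @ x + sigma * rng.standard_normal(M)
        mu = Sigma @ A.T @ y / sigma**2
        hits += np.sum(np.abs(x - mu) <= half)
    return hits / (n_trials * N)

print(f"well-specified coverage: {coverage(tau_true):.3f} (target {1 - alpha})")
print(f"misspecified (tau/3):    {coverage(tau_true / 3):.3f} (overconfident)")
```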
ex-ch03-15
(Medium) Prove the following property of the Bayesian posterior: under the model $p(x, y) = p(x)\,p(y \mid x)$ with squared-error loss, the MMSE estimate $\hat{x}_{\mathrm{MMSE}}(y) = \mathbb{E}[x \mid y]$ minimizes the Bayes risk over all estimators of $x$ (not just unbiased ones).
(a) Write the Bayes risk $R(\hat{x}) = \mathbb{E}_{x,y}\|x - \hat{x}(y)\|^2$ as an expectation over both $x$ and $y$.
(b) Use the law of iterated expectation to reduce to minimizing the posterior expected loss.
(c) Show that for each fixed $y$, the minimizer of $\mathbb{E}\big[\|x - c\|^2 \mid y\big]$ over $c$ is the posterior mean.
Hint: $R(\hat{x}) = \mathbb{E}_y\big[\mathbb{E}_{x \mid y}\|x - \hat{x}(y)\|^2\big]$.
Hint: Minimize over a constant $c$: differentiate $\mathbb{E}\big[\|x - c\|^2 \mid y\big]$ with respect to $c$.
Bayes risk decomposition
$R(\hat{x}) = \mathbb{E}_{x,y}\|x - \hat{x}(y)\|^2 = \mathbb{E}_y\big[\mathbb{E}_{x \mid y}\|x - \hat{x}(y)\|^2\big]$. Minimizing over all estimators is equivalent to minimizing the inner expectation pointwise for each $y$.
Pointwise minimization
Fix $y$ and minimize $g(c) = \mathbb{E}\big[\|x - c\|^2 \mid y\big]$ over $c$. Setting $\nabla_c\, g(c) = 2\big(c - \mathbb{E}[x \mid y]\big) = 0$ gives $c = \mathbb{E}[x \mid y]$. The Hessian $\nabla_c^2\, g = 2I \succ 0$, confirming this is a global minimum.