Gaussian Measures on Function Spaces

Why Infinite-Dimensional Priors?

The discretized Bayesian formulation of the previous sections works on a fixed grid of $n$ pixels. But consider what happens as we refine the grid: $n \to \infty$. A prior $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Gamma})$ with $\mathbf{\Gamma} = \gamma^2 \mathbf{I}$ assigns variance $\gamma^2$ per pixel, so the total prior energy $\mathbb{E}[\|\boldsymbol{\gamma}\|^2] = n\gamma^2 \to \infty$. The prior is not discretization-invariant: its properties depend on the grid.
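This grid dependence is easy to check numerically. A minimal sketch (white-noise prior with $\gamma = 1$; the Monte Carlo sample count is arbitrary):

```python
import numpy as np

# White-noise prior N(0, gamma^2 I): the expected squared norm grows linearly
# with the number of pixels n, so the effective prior changes under refinement.
rng = np.random.default_rng(0)
gamma = 1.0
for n in (64**2, 128**2):
    draws = gamma * rng.standard_normal((200, n))  # 200 draws from N(0, gamma^2 I_n)
    mc_energy = np.mean(np.sum(draws**2, axis=1))  # Monte Carlo E||gamma_vec||^2
    print(n, mc_energy)  # tracks n * gamma^2: not discretization-invariant
```

The estimated energy tracks $n\gamma^2$, quadrupling with each grid refinement.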

Gaussian measures on function spaces provide the rigorous framework for priors that are consistent across grid refinements. This section provides the theoretical foundation; it can be skimmed on first reading and returned to when working with continuum imaging problems.

Definition: Gaussian Measure on a Hilbert Space

A Gaussian measure $\mu_0 = \mathcal{N}(m_0, \mathcal{C}_0)$ on a separable Hilbert space $\mathcal{X}$ is a probability measure such that every continuous linear functional $\ell \in \mathcal{X}^*$ has a Gaussian distribution:

$$\ell(\gamma) \sim \mathcal{N}\bigl(\ell(m_0),\; \ell(\mathcal{C}_0 \ell)\bigr),$$

where $m_0 \in \mathcal{X}$ is the mean and $\mathcal{C}_0 \colon \mathcal{X} \to \mathcal{X}$ is a symmetric, positive, trace-class operator (the covariance).

The trace-class requirement $\operatorname{tr}(\mathcal{C}_0) < \infty$ ensures that draws from $\mu_0$ have finite expected norm:

$$\mathbb{E}_{\mu_0}\|\gamma - m_0\|^2 = \operatorname{tr}(\mathcal{C}_0) < \infty.$$

This is the infinite-dimensional analogue of requiring a finite covariance matrix.

Trace Class Means Finite Expected Norm

An operator $\mathcal{C}_0$ is trace class if and only if $\sum_{k=1}^\infty \lambda_k < \infty$, where $\lambda_k$ are its eigenvalues (in decreasing order). For the Whittle-Matérn covariance operator on $[0,1]^d$,

$$\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s},$$

the eigenvalues decay as $\lambda_k \sim k^{-2s/d}$. The trace-class condition requires $2s/d > 1$, i.e., $s > d/2$. In 2D ($d = 2$), we need $s > 1$; since the Matérn smoothness is $\nu = s - d/2$, the exponential (Matérn-1/2) covariance with $s = 3/2$ is the minimal standard choice satisfying this.

Naive covariances like $\mathcal{C}_0 = c \cdot I$ (a constant times the identity) are not trace class in infinite dimensions; this is why the naive $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma^2 \mathbf{I})$ fails as $n \to \infty$.
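The eigenvalue sums can be compared numerically. A small sketch using the decay rate $\lambda_k \sim k^{-2s/d}$ in $d = 2$ (truncation levels are arbitrary):

```python
import numpy as np

# Whittle-Matern eigenvalues decay as lambda_k ~ k^(-2s/d). Compare partial
# traces at two truncation levels: for s > d/2 the series has essentially
# converged; at the borderline s = d/2 (here s = 1 in d = 2) it grows like log K.
d = 2
k = np.arange(1, 1_000_001, dtype=float)
for s in (1.0, 1.5):
    lam = k ** (-2 * s / d)
    t4, t6 = lam[:10_000].sum(), lam.sum()
    print(f"s={s}: partial trace up to 1e4 = {t4:.3f}, up to 1e6 = {t6:.3f}")
```

For $s = 1$ the partial sums keep growing (no valid Gaussian measure in 2D); for $s = 3/2$ they stabilize, consistent with the trace-class condition $s > d/2$.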

Definition: Karhunen-Loève Expansion

Let $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ with eigenpairs $\{\lambda_k, \phi_k\}_{k=1}^\infty$ satisfying $\mathcal{C}_0 \phi_k = \lambda_k \phi_k$. The Karhunen-Loève (KL) expansion represents draws from $\mu_0$ as

$$\gamma = \sum_{k=1}^\infty \xi_k \sqrt{\lambda_k}\,\phi_k, \qquad \xi_k \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1).$$

This series converges in $\mathcal{X}$ (in the mean-square sense) precisely when $\mathcal{C}_0$ is trace class: $\sum_k \lambda_k < \infty$.

In practice, the KL expansion is truncated to the leading $r$ terms,

$$\gamma^{(r)} = \sum_{k=1}^r \xi_k \sqrt{\lambda_k}\,\phi_k,$$

giving a low-dimensional representation that captures most of the prior energy (since $\lambda_k \to 0$ rapidly for smooth priors).
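A minimal 1-D sketch of truncated KL sampling, assuming Dirichlet boundary conditions on $[0,1]$ so the Laplacian eigenfunctions are $\phi_k(x) = \sqrt{2}\sin(k\pi x)$ and the Whittle-Matérn eigenvalues are $(\kappa^2 + \pi^2 k^2)^{-s}$ (parameter values are illustrative):

```python
import numpy as np

def kl_sample(rng, x, r=50, kappa=10.0, s=1.5):
    """Truncated KL draw from N(0, (kappa^2 I - Laplacian)^(-s)) on [0, 1]
    with Dirichlet boundary: phi_k(x) = sqrt(2) sin(k pi x)."""
    ks = np.arange(1, r + 1)
    lam = (kappa**2 + (np.pi * ks) ** 2) ** (-s)        # eigenvalues of C_0
    xi = rng.standard_normal(r)                          # xi_k ~ N(0, 1) iid
    phi = np.sqrt(2) * np.sin(np.pi * np.outer(x, ks))   # eigenfunctions phi_k
    return phi @ (xi * np.sqrt(lam))                     # sum_k xi_k sqrt(lam_k) phi_k

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
draw = kl_sample(rng, x)  # one prior draw on a 256-point grid
```

Because $\lambda_k$ decays like $k^{-2s}$ here, the leading $r = 50$ modes already capture well over 99% of the prior energy $\sum_k \lambda_k$.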

Theorem: Cameron-Martin Theorem

Let $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ be a Gaussian measure on $\mathcal{X}$ and let $h \in \mathcal{X}$. The translated measure $\mu_h(\cdot) = \mu_0(\cdot - h)$ is absolutely continuous with respect to $\mu_0$ (written $\mu_h \ll \mu_0$) if and only if $h$ belongs to the Cameron-Martin space

$$\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2}) = \left\{h \in \mathcal{X} : \|h\|_{\mathcal{H}}^2 = \|\mathcal{C}_0^{-1/2} h\|^2 < \infty\right\}.$$

In that case, the Radon-Nikodym derivative is

$$\frac{\mathrm{d}\mu_h}{\mathrm{d}\mu_0}(\gamma) = \exp\!\left(\langle \mathcal{C}_0^{-1} h,\, \gamma\rangle_{\mathcal{X}} - \frac{1}{2}\|h\|_{\mathcal{H}}^2\right).$$
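In finite dimensions every shift lies in the Cameron-Martin space, and the Radon-Nikodym formula can be checked by Monte Carlo: reweighting draws from $\mu_0$ by $\mathrm{d}\mu_h/\mathrm{d}\mu_0$ turns their mean into $h$. A 1-D sketch (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
c0, h = 1.0, 0.5                                  # variance of mu_0 and the shift
g = np.sqrt(c0) * rng.standard_normal(500_000)    # draws from mu_0 = N(0, c0)
# Radon-Nikodym weights exp(<C0^{-1} h, g> - 0.5 ||h||_H^2), with ||h||_H^2 = h^2 / c0
w = np.exp(h / c0 * g - 0.5 * h**2 / c0)
print(np.mean(w), np.mean(g * w))  # ~1 and ~h: the weights are a valid density
                                   # and shift the mean from 0 to h
```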

Theorem: Stuart's Well-Posedness Theorem

Under the following conditions:

  • The prior is $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ with $\mathcal{C}_0$ trace class.
  • The forward operator $\mathcal{A} \colon \mathcal{X} \to \mathbb{R}^m$ is bounded.
  • The noise is $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.

the posterior measure $\mu^{\mathbf{y}} = p(\gamma \mid \mathbf{y})$ (as a measure on $\mathcal{X}$) satisfies:

  1. Existence: $\mu^{\mathbf{y}}$ is well-defined and absolutely continuous with respect to $\mu_0$.
  2. Uniqueness: the posterior is the unique measure with Radon-Nikodym derivative proportional to the likelihood: $\frac{\mathrm{d}\mu^{\mathbf{y}}}{\mathrm{d}\mu_0}(\gamma) \propto \exp\!\left(-\frac{1}{2\sigma^2}\|\mathcal{A}\gamma - \mathbf{y}\|^2\right)$.
  3. Stability: $\mu^{\mathbf{y}}$ depends continuously on $\mathbf{y}$ in the Hellinger metric: for any $\varepsilon > 0$ there exists $\delta > 0$ such that $\|\mathbf{y}_1 - \mathbf{y}_2\| < \delta \Rightarrow d_{\text{Hell}}(\mu^{\mathbf{y}_1}, \mu^{\mathbf{y}_2}) < \varepsilon$.

Discretization-Invariant Algorithms

The Gaussian measure framework motivates discretization-invariant algorithms: methods whose performance does not degrade as the mesh is refined.

The key insight: naive random-walk Metropolis proposals $\gamma' = \gamma + \varepsilon$, $\varepsilon \sim \mathcal{N}(\mathbf{0}, \delta^2 \mathbf{I})$, achieve the optimal acceptance rate $\approx 0.234$ only when $\delta \sim n^{-1/2}$, so the effective step size vanishes as $n \to \infty$.

The preconditioned Crank-Nicolson (pCN) proposal respects the prior covariance:

$$\gamma' = \sqrt{1 - \beta^2}\,\gamma + \beta\,\xi, \qquad \xi \sim \mu_0 = \mathcal{N}(0, \mathcal{C}_0).$$

This proposal preserves $\mu_0$: if $\gamma \sim \mu_0$ then $\gamma' \sim \mu_0$. The acceptance probability in pCN involves only the likelihood ratio (the prior terms cancel), giving dimension-independent acceptance rates of $O(1)$.
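A minimal sketch of a pCN sampler for a toy linear-Gaussian problem (the factorization of $\mathcal{C}_0$ into `C0_sqrt`, and all parameter values, are assumptions for illustration):

```python
import numpy as np

def pcn(y, A, C0_sqrt, sigma=0.5, beta=0.2, n_iter=5000, seed=0):
    """Preconditioned Crank-Nicolson sampler targeting posterior ~ likelihood x N(0, C0).
    C0_sqrt maps white noise to a prior draw; acceptance uses the likelihood only."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    def neg_loglik(g):                              # Phi(g) = ||A g - y||^2 / (2 sigma^2)
        r = A @ g - y
        return 0.5 * np.dot(r, r) / sigma**2
    g = C0_sqrt @ rng.standard_normal(n)            # start from a prior draw
    phi = neg_loglik(g)
    accepted, samples = 0, []
    for _ in range(n_iter):
        xi = C0_sqrt @ rng.standard_normal(n)       # xi ~ N(0, C0)
        g_prop = np.sqrt(1 - beta**2) * g + beta * xi
        phi_prop = neg_loglik(g_prop)
        if np.log(rng.random()) < phi - phi_prop:   # prior terms cancel in pCN
            g, phi = g_prop, phi_prop
            accepted += 1
        samples.append(g)
    return np.array(samples), accepted / n_iter

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 20))                          # toy forward operator
C0_sqrt = np.diag(np.arange(1, 21, dtype=float) ** -1.0)  # prior with lambda_k ~ k^-2
y = A @ (C0_sqrt @ rng.standard_normal(20)) + 0.5 * rng.standard_normal(5)
samples, rate = pcn(y, A, C0_sqrt)
```

Because the acceptance ratio contains no prior term, the same $\beta$ keeps a healthy acceptance rate as the discretization dimension grows, unlike random-walk Metropolis.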

🚨 Critical Engineering Note

Choosing the Prior Covariance for RF Imaging

In practice, the Gaussian prior covariance $\mathcal{C}_0$ must be chosen based on domain knowledge about the scene. For RF imaging:

  • Whittle-Matérn: $\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s}$ with length scale $1/\kappa$ and smoothness $s$. For point-target scenes in 2D, a rough prior such as $s = 3/2$ (exponential covariance, continuous but non-differentiable draws) is appropriate; for extended objects, $s = 5/2$ (Matérn-3/2, $C^1$ draws) or higher.
  • Length scale $1/\kappa$: should match the expected spatial extent of reflectors. For radar at 77 GHz, $\kappa$ is calibrated to the range/cross-range resolution.
  • Tensor-product structure: for a 2D scene, the KL expansion of $\mathcal{C}_0 = \mathcal{C}_x \otimes \mathcal{C}_z$ can be computed independently in each dimension, reducing $O(n^3)$ to $O(n_x^3 + n_z^3)$.
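The tensor-product claim in the last bullet can be verified at small sizes: the eigenvalues of $\mathcal{C}_x \otimes \mathcal{C}_z$ are the pairwise products of the 1-D eigenvalues, so only two small eigendecompositions are needed. A sketch with a toy exponential kernel standing in for the Matérn family:

```python
import numpy as np

def matern_like_1d(n, ell=0.2):
    """Toy 1-D covariance matrix (exponential kernel) on n grid points of [0, 1]."""
    x = np.linspace(0, 1, n)
    return np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

Cx, Cz = matern_like_1d(8), matern_like_1d(6)
lx, _ = np.linalg.eigh(Cx)                           # O(nx^3)
lz, _ = np.linalg.eigh(Cz)                           # O(nz^3)
lam_tensor = np.sort(np.outer(lx, lz).ravel())       # spectrum of Cx (x) Cz
lam_dense = np.sort(np.linalg.eigvalsh(np.kron(Cx, Cz)))  # O((nx nz)^3) reference
print(np.allclose(lam_tensor, lam_dense))            # same spectrum, far cheaper
```

The KL eigenfunctions are likewise outer products of the 1-D eigenvectors, so prior draws never require forming the dense $n \times n$ covariance.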

Key constraint: For realistic $128 \times 128$ scenes ($n = 16{,}384$), computing the full posterior covariance $\mathbf{\Gamma}_{\text{post}}$ requires $O(n^3) \approx 4 \times 10^{12}$ operations, which is infeasible. Low-rank approximations (§Uncertainty Quantification) or MCMC with the pCN sampler are required.

Practical Constraints
  • $128 \times 128$ scene: the dense posterior covariance has $n^2 \approx 2.7 \times 10^8$ entries ($\approx 2$ GB in double precision) and costs $O(n^3) \approx 4 \times 10^{12}$ FLOPs to form

  • Low-rank approximation with $r = 100$ eigenmodes reduces this to $\sim 100$ MB and $O(r^2 n)$ FLOPs

  • pCN sampler scales to $n \sim 10^6$ at $O(n)$ cost per sample when $\mathcal{C}_0$ has fast matrix-vector products

Common Mistake: Grid-Dependent Prior Hyperparameters

Mistake:

When refining the pixel grid from $64 \times 64$ to $128 \times 128$, keeping the prior variance $\gamma^2$ fixed gives a different effective prior: the coarser grid has fewer pixels and hence less total prior energy.

Correction:

Use a continuum-consistent prior (e.g., Whittle-Matérn) where the length scale $1/\kappa$ and smoothness $s$ are physical parameters independent of grid resolution. When discretizing, scale the covariance matrix by the grid spacing $h$ to maintain the continuous-limit behavior: $[\mathbf{C}_0]_{ij} = C_{\text{Matérn}}(h\|i-j\|) \cdot h^d$, where $d$ is the spatial dimension.
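The $h^d$ scaling can be sanity-checked in 1-D: with it in place, grid-level quantities such as the trace (total prior energy, here $C(0) \cdot |\text{domain}| = 1$) are identical across resolutions. A sketch with an exponential kernel standing in for $C_{\text{Matérn}}$ (length scale illustrative):

```python
import numpy as np

def scaled_cov(n, ell=0.1, d=1):
    """Discretized covariance [C0]_ij = C(h |i - j|) * h^d on n grid points of [0, 1]."""
    h = 1.0 / n
    idx = np.arange(n)
    r = h * np.abs(idx[:, None] - idx[None, :])   # physical distances between pixels
    return np.exp(-r / ell) * h**d                # exponential kernel as a stand-in

for n in (64, 128, 256):
    C = scaled_cov(n)
    print(n, np.trace(C))  # total prior energy stays fixed as the grid refines
```

Without the $h^d$ factor the trace would instead grow linearly in $n$, reproducing exactly the grid-dependent behavior described in the mistake above.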

Historical Note: From Wiener Measure to Modern Bayesian Imaging

1923-2010

The theory of Gaussian measures on infinite-dimensional spaces traces back to Norbert Wiener's 1923 construction of Brownian motion as a measure on the space of continuous functions — the first rigorous infinite-dimensional Gaussian measure. Irving Segal, Leonard Gross, and others developed the abstract framework through the 1950s-70s.

The systematic application of Gaussian measures to Bayesian inverse problems was synthesized by Andrew Stuart's landmark 2010 paper "Inverse Problems: A Bayesian Perspective" in Acta Numerica. Stuart unified the finite-dimensional and infinite-dimensional theories, proving well-posedness and stability results that provided the theoretical foundation for the now-thriving field of Bayesian imaging. The Cameron-Martin theorem — originally proved by Robert H. Cameron and William T. Martin in 1944 for Wiener measure — plays a central role in characterizing which MAP estimates are meaningful as elements of function space.

Key Takeaway

  1. Gaussian measures on Hilbert spaces provide discretization-invariant priors for infinite-dimensional Bayesian inverse problems.

  2. The covariance operator must be trace class ($\operatorname{tr}(\mathcal{C}_0) < \infty$) for the prior to assign finite expected norm to draws; the Whittle-Matérn family satisfies this for $s > d/2$.

  3. The Karhunen-Loève expansion provides a countable representation of draws from a Gaussian measure in terms of i.i.d. standard Gaussians.

  4. The Cameron-Martin theorem characterizes when translated Gaussian measures remain absolutely continuous: the shift must lie in $\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2})$.

  5. Stuart's theorem guarantees existence, uniqueness, and stability of the posterior measure under mild conditions on the forward operator.

  6. Discretization-invariant algorithms (pCN sampler) exploit the Gaussian measure structure to achieve mesh-independent performance.

Trace-class operator

A compact operator $\mathcal{C} \colon \mathcal{X} \to \mathcal{X}$ is trace class if $\operatorname{tr}(\mathcal{C}) = \sum_k \lambda_k < \infty$, where $\lambda_k$ are its eigenvalues in decreasing order. Trace-class covariance operators define valid Gaussian measures on infinite-dimensional Hilbert spaces: they ensure draws from the measure have finite expected norm $\mathbb{E}\|\gamma\|^2 = \operatorname{tr}(\mathcal{C}) < \infty$.

Related: Gaussian Measure on a Hilbert Space, Cameron-Martin space

Cameron-Martin space

Given a Gaussian measure $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ on a Hilbert space $\mathcal{X}$, the Cameron-Martin space is $\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2})$ equipped with norm $\|h\|_{\mathcal{H}} = \|\mathcal{C}_0^{-1/2}h\|_{\mathcal{X}}$. It characterizes which translations of $\mu_0$ remain absolutely continuous with respect to $\mu_0$: a shift $h$ preserves absolute continuity if and only if $h \in \mathcal{H}$. For the Whittle-Matérn prior $\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s}$, the Cameron-Martin space is the Sobolev space $H^s$.

Related: Trace-class operator, Gaussian Measure on a Hilbert Space, Sobolev Spaces $H^s(\Omega)$