Gaussian Measures on Function Spaces

Why Infinite-Dimensional Priors?

The discretized Bayesian formulation of the previous sections works on a fixed grid of $n$ pixels. But consider what happens as we refine the grid: $n \to \infty$. A prior $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Gamma})$ with $\mathbf{\Gamma} = \gamma^2 \mathbf{I}$ assigns variance $\gamma^2$ per pixel, so the total prior energy $\mathbb{E}[\|\boldsymbol{\gamma}\|^2] = n\gamma^2 \to \infty$. The prior is not discretization-invariant: its properties depend on the grid.
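This grid dependence is easy to check numerically. A minimal sketch (white-noise prior with $\gamma = 1$; the Monte Carlo sample count is arbitrary):

```python
import numpy as np

# White-noise prior N(0, gamma^2 I): the expected squared norm grows linearly
# with the number of pixels n, so the effective prior changes under refinement.
rng = np.random.default_rng(0)
gamma = 1.0
for n in (64**2, 128**2):
    draws = gamma * rng.standard_normal((200, n))  # 200 draws from N(0, gamma^2 I_n)
    mc_energy = np.mean(np.sum(draws**2, axis=1))  # Monte Carlo E||gamma_vec||^2
    print(n, mc_energy)  # tracks n * gamma^2: not discretization-invariant
```

The estimated energy tracks $n\gamma^2$, quadrupling with each grid refinement.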

Gaussian measures on function spaces provide the rigorous framework for priors that are consistent across grid refinements. This section provides the theoretical foundation; it can be skimmed on first reading and returned to when working with continuum imaging problems.

Definition: Gaussian Measure on a Hilbert Space

A Gaussian measure $\mu_0 = \mathcal{N}(m_0, \mathcal{C}_0)$ on a separable Hilbert space $\mathcal{X}$ is a probability measure such that every continuous linear functional $\ell \in \mathcal{X}^*$ has a Gaussian distribution:

$$\ell(\gamma) \sim \mathcal{N}\bigl(\ell(m_0),\; \ell(\mathcal{C}_0 \ell)\bigr),$$

where $m_0 \in \mathcal{X}$ is the mean and $\mathcal{C}_0 \colon \mathcal{X} \to \mathcal{X}$ is a symmetric, positive, trace-class operator (the covariance).

The trace-class requirement $\operatorname{tr}(\mathcal{C}_0) < \infty$ ensures that draws from $\mu_0$ have finite expected norm:

$$\mathbb{E}_{\mu_0}\|\gamma - m_0\|^2 = \operatorname{tr}(\mathcal{C}_0) < \infty.$$

This is the infinite-dimensional analogue of requiring a finite covariance matrix.

Trace Class Means Finite Expected Norm

An operator $\mathcal{C}_0$ is trace class if and only if $\sum_{k=1}^\infty \lambda_k < \infty$, where $\lambda_k$ are its eigenvalues (in decreasing order). For the Whittle-Matérn covariance operator on $[0,1]^d$,

$$\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s},$$

the eigenvalues decay as $\lambda_k \sim k^{-2s/d}$. The trace-class condition requires $2s/d > 1$, i.e., $s > d/2$. In 2D ($d = 2$), we need $s > 1$; since the Matérn smoothness is $\nu = s - d/2$, the exponential (Matérn-1/2) covariance with $s = 3/2$ is the minimal standard choice satisfying this.

Naive covariances like $\mathcal{C}_0 = c \cdot I$ (a constant times the identity) are not trace class in infinite dimensions; this is why the naive $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma^2 \mathbf{I})$ fails as $n \to \infty$.
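The eigenvalue sums can be compared numerically. A small sketch using the decay rate $\lambda_k \sim k^{-2s/d}$ in $d = 2$ (truncation levels are arbitrary):

```python
import numpy as np

# Whittle-Matern eigenvalues decay as lambda_k ~ k^(-2s/d). Compare partial
# traces at two truncation levels: for s > d/2 the series has essentially
# converged; at the borderline s = d/2 (here s = 1 in d = 2) it grows like log K.
d = 2
k = np.arange(1, 1_000_001, dtype=float)
for s in (1.0, 1.5):
    lam = k ** (-2 * s / d)
    t4, t6 = lam[:10_000].sum(), lam.sum()
    print(f"s={s}: partial trace up to 1e4 = {t4:.3f}, up to 1e6 = {t6:.3f}")
```

For $s = 1$ the partial sums keep growing (no valid Gaussian measure in 2D); for $s = 3/2$ they stabilize, consistent with the trace-class condition $s > d/2$.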

Definition: Karhunen-Loève Expansion

Let $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ with eigenpairs $\{\lambda_k, \phi_k\}_{k=1}^\infty$ satisfying $\mathcal{C}_0 \phi_k = \lambda_k \phi_k$. The Karhunen-Loève (KL) expansion represents draws from $\mu_0$ as

$$\gamma = \sum_{k=1}^\infty \xi_k \sqrt{\lambda_k}\,\phi_k, \qquad \xi_k \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1).$$

This series converges in $\mathcal{X}$ (in the mean-square sense) precisely when $\mathcal{C}_0$ is trace class: $\sum_k \lambda_k < \infty$.

In practice, the KL expansion is truncated to the leading $r$ terms,

$$\gamma^{(r)} = \sum_{k=1}^r \xi_k \sqrt{\lambda_k}\,\phi_k,$$

giving a low-dimensional representation that captures most of the prior energy (since $\lambda_k \to 0$ rapidly for smooth priors).
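A minimal 1-D sketch of truncated KL sampling, assuming Dirichlet boundary conditions on $[0,1]$ so the Laplacian eigenfunctions are $\phi_k(x) = \sqrt{2}\sin(k\pi x)$ and the Whittle-Matérn eigenvalues are $(\kappa^2 + \pi^2 k^2)^{-s}$ (parameter values are illustrative):

```python
import numpy as np

def kl_sample(rng, x, r=50, kappa=10.0, s=1.5):
    """Truncated KL draw from N(0, (kappa^2 I - Laplacian)^(-s)) on [0, 1]
    with Dirichlet boundary: phi_k(x) = sqrt(2) sin(k pi x)."""
    ks = np.arange(1, r + 1)
    lam = (kappa**2 + (np.pi * ks) ** 2) ** (-s)        # eigenvalues of C_0
    xi = rng.standard_normal(r)                          # xi_k ~ N(0, 1) iid
    phi = np.sqrt(2) * np.sin(np.pi * np.outer(x, ks))   # eigenfunctions phi_k
    return phi @ (xi * np.sqrt(lam))                     # sum_k xi_k sqrt(lam_k) phi_k

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
draw = kl_sample(rng, x)  # one prior draw on a 256-point grid
```

Because $\lambda_k$ decays like $k^{-2s}$ here, the leading $r = 50$ modes already capture well over 99% of the prior energy $\sum_k \lambda_k$.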

Theorem: Cameron-Martin Theorem

Let $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ be a Gaussian measure on $\mathcal{X}$ and let $h \in \mathcal{X}$. The translated measure $\mu_h(\cdot) = \mu_0(\cdot - h)$ is absolutely continuous with respect to $\mu_0$ (written $\mu_h \ll \mu_0$) if and only if $h$ belongs to the Cameron-Martin space

$$\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2}) = \left\{h \in \mathcal{X} : \|h\|_{\mathcal{H}}^2 = \|\mathcal{C}_0^{-1/2} h\|^2 < \infty\right\}.$$

In that case, the Radon-Nikodym derivative is

$$\frac{\mathrm{d}\mu_h}{\mathrm{d}\mu_0}(\gamma) = \exp\!\left(\langle \mathcal{C}_0^{-1} h,\, \gamma\rangle_{\mathcal{X}} - \frac{1}{2}\|h\|_{\mathcal{H}}^2\right).$$
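In finite dimensions every shift lies in the Cameron-Martin space, and the Radon-Nikodym formula can be checked by Monte Carlo: reweighting draws from $\mu_0$ by $\mathrm{d}\mu_h/\mathrm{d}\mu_0$ turns their mean into $h$. A 1-D sketch (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
c0, h = 1.0, 0.5                                  # variance of mu_0 and the shift
g = np.sqrt(c0) * rng.standard_normal(500_000)    # draws from mu_0 = N(0, c0)
# Radon-Nikodym weights exp(<C0^{-1} h, g> - 0.5 ||h||_H^2), with ||h||_H^2 = h^2 / c0
w = np.exp(h / c0 * g - 0.5 * h**2 / c0)
print(np.mean(w), np.mean(g * w))  # ~1 and ~h: the weights are a valid density
                                   # and shift the mean from 0 to h
```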

Theorem: Stuart's Well-Posedness Theorem

Under the following conditions:

  • The prior is $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ with $\mathcal{C}_0$ trace class.
  • The forward operator $\mathcal{A} \colon \mathcal{X} \to \mathbb{R}^m$ is bounded.
  • The noise is $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.

the posterior measure $\mu^{\mathbf{y}} = p(\gamma \mid \mathbf{y})$ (as a measure on $\mathcal{X}$) satisfies:

  1. Existence: $\mu^{\mathbf{y}}$ is well-defined and absolutely continuous with respect to $\mu_0$.
  2. Uniqueness: the posterior is the unique measure with Radon-Nikodym derivative proportional to the likelihood: $\frac{\mathrm{d}\mu^{\mathbf{y}}}{\mathrm{d}\mu_0}(\gamma) \propto \exp\!\left(-\frac{1}{2\sigma^2}\|\mathcal{A}\gamma - \mathbf{y}\|^2\right)$.
  3. Stability: $\mu^{\mathbf{y}}$ depends continuously on $\mathbf{y}$ in the Hellinger metric: for any $\varepsilon > 0$ there exists $\delta > 0$ such that $\|\mathbf{y}_1 - \mathbf{y}_2\| < \delta \Rightarrow d_{\text{Hell}}(\mu^{\mathbf{y}_1}, \mu^{\mathbf{y}_2}) < \varepsilon$.

Discretization-Invariant Algorithms

The Gaussian measure framework motivates discretization-invariant algorithms: methods whose performance does not degrade as the mesh is refined.

The key insight: naive random-walk Metropolis proposals $\gamma' = \gamma + \varepsilon$, $\varepsilon \sim \mathcal{N}(\mathbf{0}, \delta^2 \mathbf{I})$, achieve the optimal acceptance rate $\approx 0.234$ only when $\delta \sim n^{-1/2}$, so the effective step size vanishes as $n \to \infty$.

The preconditioned Crank-Nicolson (pCN) proposal respects the prior covariance:

$$\gamma' = \sqrt{1 - \beta^2}\,\gamma + \beta\,\xi, \qquad \xi \sim \mu_0 = \mathcal{N}(0, \mathcal{C}_0).$$

This proposal preserves $\mu_0$: if $\gamma \sim \mu_0$ then $\gamma' \sim \mu_0$. The acceptance probability in pCN involves only the likelihood ratio (the prior terms cancel), giving dimension-independent acceptance rates of $O(1)$.
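A minimal sketch of a pCN sampler for a toy linear-Gaussian problem (the factorization of $\mathcal{C}_0$ into `C0_sqrt`, and all parameter values, are assumptions for illustration):

```python
import numpy as np

def pcn(y, A, C0_sqrt, sigma=0.5, beta=0.2, n_iter=5000, seed=0):
    """Preconditioned Crank-Nicolson sampler targeting posterior ~ likelihood x N(0, C0).
    C0_sqrt maps white noise to a prior draw; acceptance uses the likelihood only."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    def neg_loglik(g):                              # Phi(g) = ||A g - y||^2 / (2 sigma^2)
        r = A @ g - y
        return 0.5 * np.dot(r, r) / sigma**2
    g = C0_sqrt @ rng.standard_normal(n)            # start from a prior draw
    phi = neg_loglik(g)
    accepted, samples = 0, []
    for _ in range(n_iter):
        xi = C0_sqrt @ rng.standard_normal(n)       # xi ~ N(0, C0)
        g_prop = np.sqrt(1 - beta**2) * g + beta * xi
        phi_prop = neg_loglik(g_prop)
        if np.log(rng.random()) < phi - phi_prop:   # prior terms cancel in pCN
            g, phi = g_prop, phi_prop
            accepted += 1
        samples.append(g)
    return np.array(samples), accepted / n_iter

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 20))                          # toy forward operator
C0_sqrt = np.diag(np.arange(1, 21, dtype=float) ** -1.0)  # prior with lambda_k ~ k^-2
y = A @ (C0_sqrt @ rng.standard_normal(20)) + 0.5 * rng.standard_normal(5)
samples, rate = pcn(y, A, C0_sqrt)
```

Because the acceptance ratio contains no prior term, the same $\beta$ keeps a healthy acceptance rate as the discretization dimension grows, unlike random-walk Metropolis.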

🚨 Critical Engineering Note

Choosing the Prior Covariance for RF Imaging

In practice, the Gaussian prior covariance $\mathcal{C}_0$ must be chosen based on domain knowledge about the scene. For RF imaging:

  • Whittle-Matérn: $\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s}$ with length scale $1/\kappa$ and smoothness $s$. For point-target scenes in 2D, a rough prior such as $s = 3/2$ (exponential covariance, continuous but non-differentiable draws) is appropriate; for extended objects, $s = 5/2$ (Matérn-3/2, $C^1$ draws) or higher.
  • Length scale $1/\kappa$: should match the expected spatial extent of reflectors. For radar at 77 GHz, $\kappa$ is calibrated to the range/cross-range resolution.
  • Tensor-product structure: for a 2D scene, the KL expansion of $\mathcal{C}_0 = \mathcal{C}_x \otimes \mathcal{C}_z$ can be computed independently in each dimension, reducing $O(n^3)$ to $O(n_x^3 + n_z^3)$.
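The tensor-product claim in the last bullet can be verified at small sizes: the eigenvalues of $\mathcal{C}_x \otimes \mathcal{C}_z$ are the pairwise products of the 1-D eigenvalues, so only two small eigendecompositions are needed. A sketch with a toy exponential kernel standing in for the Matérn family:

```python
import numpy as np

def matern_like_1d(n, ell=0.2):
    """Toy 1-D covariance matrix (exponential kernel) on n grid points of [0, 1]."""
    x = np.linspace(0, 1, n)
    return np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

Cx, Cz = matern_like_1d(8), matern_like_1d(6)
lx, _ = np.linalg.eigh(Cx)                           # O(nx^3)
lz, _ = np.linalg.eigh(Cz)                           # O(nz^3)
lam_tensor = np.sort(np.outer(lx, lz).ravel())       # spectrum of Cx (x) Cz
lam_dense = np.sort(np.linalg.eigvalsh(np.kron(Cx, Cz)))  # O((nx nz)^3) reference
print(np.allclose(lam_tensor, lam_dense))            # same spectrum, far cheaper
```

The KL eigenfunctions are likewise outer products of the 1-D eigenvectors, so prior draws never require forming the dense $n \times n$ covariance.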

Key constraint: For realistic $128 \times 128$ scenes ($n = 16{,}384$), computing the full posterior covariance $\mathbf{\Gamma}_{\text{post}}$ requires $O(n^3) \approx 4 \times 10^{12}$ operations, which is infeasible. Low-rank approximations (§Uncertainty Quantification) or MCMC with the pCN sampler are required.

Practical Constraints
  • $128 \times 128$ scene: the dense posterior covariance has $n^2 \approx 2.7 \times 10^8$ entries ($\approx 2$ GB in double precision) and costs $O(n^3) \approx 4 \times 10^{12}$ FLOPs to form

  • Low-rank approximation with $r = 100$ eigenmodes reduces this to $\sim 100$ MB and $O(r^2 n)$ FLOPs

  • pCN sampler scales to $n \sim 10^6$ at $O(n)$ cost per sample when $\mathcal{C}_0$ has fast matrix-vector products

Common Mistake: Grid-Dependent Prior Hyperparameters

Mistake:

When refining the pixel grid from $64 \times 64$ to $128 \times 128$, keeping the prior variance $\gamma^2$ fixed gives a different effective prior: the coarser grid has fewer pixels and hence less total prior energy.

Correction:

Use a continuum-consistent prior (e.g., Whittle-Matérn) where the length scale $1/\kappa$ and smoothness $s$ are physical parameters independent of grid resolution. When discretizing, scale the covariance matrix by the grid spacing $h$ to maintain the continuous-limit behavior: $[\mathbf{C}_0]_{ij} = C_{\text{Matérn}}(h\|i-j\|) \cdot h^d$, where $d$ is the spatial dimension.
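The $h^d$ scaling can be sanity-checked in 1-D: with it in place, grid-level quantities such as the trace (total prior energy, here $C(0) \cdot |\text{domain}| = 1$) are identical across resolutions. A sketch with an exponential kernel standing in for $C_{\text{Matérn}}$ (length scale illustrative):

```python
import numpy as np

def scaled_cov(n, ell=0.1, d=1):
    """Discretized covariance [C0]_ij = C(h |i - j|) * h^d on n grid points of [0, 1]."""
    h = 1.0 / n
    idx = np.arange(n)
    r = h * np.abs(idx[:, None] - idx[None, :])   # physical distances between pixels
    return np.exp(-r / ell) * h**d                # exponential kernel as a stand-in

for n in (64, 128, 256):
    C = scaled_cov(n)
    print(n, np.trace(C))  # total prior energy stays fixed as the grid refines
```

Without the $h^d$ factor the trace would instead grow linearly in $n$, reproducing exactly the grid-dependent behavior described in the mistake above.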

Historical Note: From Wiener Measure to Modern Bayesian Imaging

1923-2010

The theory of Gaussian measures on infinite-dimensional spaces traces back to Norbert Wiener's 1923 construction of Brownian motion as a measure on the space of continuous functions — the first rigorous infinite-dimensional Gaussian measure. Irving Segal, Leonard Gross, and others developed the abstract framework through the 1950s-70s.

The systematic application of Gaussian measures to Bayesian inverse problems was synthesized by Andrew Stuart's landmark 2010 paper "Inverse Problems: A Bayesian Perspective" in Acta Numerica. Stuart unified the finite-dimensional and infinite-dimensional theories, proving well-posedness and stability results that provided the theoretical foundation for the now-thriving field of Bayesian imaging. The Cameron-Martin theorem — originally proved by Robert H. Cameron and William T. Martin in 1944 for Wiener measure — plays a central role in characterizing which MAP estimates are meaningful as elements of function space.

Key Takeaway

  1. Gaussian measures on Hilbert spaces provide discretization-invariant priors for infinite-dimensional Bayesian inverse problems.

  2. The covariance operator must be trace class ($\operatorname{tr}(\mathcal{C}_0) < \infty$) for the prior to assign finite expected norm to draws; the Whittle-Matérn family satisfies this for $s > d/2$.

  3. The Karhunen-Loève expansion provides a countable representation of draws from a Gaussian measure in terms of i.i.d. standard Gaussians.

  4. The Cameron-Martin theorem characterizes when translated Gaussian measures remain absolutely continuous: the shift must lie in $\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2})$.

  5. Stuart's theorem guarantees existence, uniqueness, and stability of the posterior measure under mild conditions on the forward operator.

  6. Discretization-invariant algorithms (pCN sampler) exploit the Gaussian measure structure to achieve mesh-independent performance.

Trace-class operator

A compact operator $\mathcal{C} \colon \mathcal{X} \to \mathcal{X}$ is trace class if $\operatorname{tr}(\mathcal{C}) = \sum_k \lambda_k < \infty$, where $\lambda_k$ are its eigenvalues in decreasing order. Trace-class covariance operators define valid Gaussian measures on infinite-dimensional Hilbert spaces: they ensure draws from the measure have finite expected norm $\mathbb{E}\|\gamma\|^2 = \operatorname{tr}(\mathcal{C}) < \infty$.

Related: Gaussian Measure on a Hilbert Space, Cameron-Martin space

Cameron-Martin space

Given a Gaussian measure $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ on a Hilbert space $\mathcal{X}$, the Cameron-Martin space is $\mathcal{H} = \operatorname{Range}(\mathcal{C}_0^{1/2})$ equipped with norm $\|h\|_{\mathcal{H}} = \|\mathcal{C}_0^{-1/2}h\|_{\mathcal{X}}$. It characterizes which translations of $\mu_0$ remain absolutely continuous with respect to $\mu_0$: a shift $h$ preserves absolute continuity if and only if $h \in \mathcal{H}$. For the Whittle-Matérn prior $\mathcal{C}_0 = (\kappa^2 I - \Delta)^{-s}$, the Cameron-Martin space is the Sobolev space $H^s$.

Related: Trace-class operator, Gaussian Measure on a Hilbert Space, Sobolev Spaces $H^s(\Omega)$