Uncertainty Quantification

A Reconstruction Without Uncertainty Is Incomplete

A point estimate \hat{\boldsymbol{\gamma}} (MAP or MMSE) tells us what we think the scene looks like, but not how confident we are. In safety-critical applications -- autonomous driving radar, non-destructive testing, medical imaging -- decision-makers need to know which features of the reconstruction are reliable and which are uncertain. Declaring a scatterer present when the posterior assigns only 60% probability to that pixel is fundamentally different from declaring it at 99%.

Uncertainty quantification (UQ) extracts this confidence information from the posterior distribution, turning a Bayesian model into actionable guidance for the engineer or clinician. A reconstruction without uncertainty bars is, in this sense, scientifically incomplete.

Definition: Credible Intervals and Credible Regions

A (1-\alpha) credible interval for a scalar quantity \phi(\boldsymbol{\gamma}) (e.g., a single pixel value) is an interval [a, b] such that

P\bigl(\phi(\boldsymbol{\gamma}) \in [a, b] \mid \mathbf{y}\bigr) = 1 - \alpha.

Common choices:

  • Highest posterior density (HPD): The shortest interval containing probability 1-\alpha. For symmetric unimodal posteriors it is centered on the mode.
  • Equal-tailed: P(\phi < a \mid \mathbf{y}) = P(\phi > b \mid \mathbf{y}) = \alpha/2.

For the Gaussian posterior \boldsymbol{\gamma} \mid \mathbf{y} \sim \mathcal{N}(\hat{\boldsymbol{\gamma}}_{\text{post}}, \mathbf{\Gamma}_{\text{post}}), the 95\% credible interval for pixel i is

\hat{\gamma}_i^{\text{post}} \pm 1.96\sqrt{[\mathbf{\Gamma}_{\text{post}}]_{ii}}.

In higher dimensions, the (1-\alpha) credible region (ellipsoidal) is

\mathcal{C}_\alpha = \left\{\boldsymbol{\gamma} : (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}_{\text{post}})^H \mathbf{\Gamma}_{\text{post}}^{-1} (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}_{\text{post}}) \leq \chi^2_{n,\,1-\alpha}\right\}.
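As a concrete sketch, the pixel-wise intervals and the ellipsoidal region test can be computed directly from a Gaussian posterior. All sizes, the identity prior, and the synthetic operator below are assumed for illustration only; the \chi^2 quantile is hardcoded to keep the example dependency-free.

```python
import numpy as np

# Toy linear-Gaussian model y = A @ gamma + w (sizes chosen for illustration)
rng = np.random.default_rng(0)
n, m, sigma = 4, 8, 0.1
A = rng.standard_normal((m, n))
gamma_true = rng.standard_normal(n)
y = A @ gamma_true + sigma * rng.standard_normal(m)

# Gaussian posterior with identity prior covariance
Gamma_post = np.linalg.inv(A.T @ A / sigma**2 + np.eye(n))
gamma_post = Gamma_post @ (A.T @ y / sigma**2)

# 95% credible interval per pixel: mean +/- 1.96 * posterior std
std = np.sqrt(np.diag(Gamma_post))
lo, hi = gamma_post - 1.96 * std, gamma_post + 1.96 * std

# Ellipsoidal credible region: Mahalanobis distance vs chi-square quantile
# chi2_{4, 0.95} ~= 9.488 (hardcoded to avoid a SciPy dependency)
r = gamma_true - gamma_post
mahal_sq = r @ np.linalg.solve(Gamma_post, r)
in_region = bool(mahal_sq <= 9.488)
```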

Definition: Posterior Variance Map

The posterior variance map displays the diagonal of the posterior covariance as an image:

\text{Var}(\gamma_i \mid \mathbf{y}) = [\mathbf{\Gamma}_{\text{post}}]_{ii}, \qquad i = 1, \ldots, n.

This pixel-wise uncertainty map reveals:

  • Low-variance regions: Well-constrained by data (dense measurements, high SNR, good forward-operator coverage).
  • High-variance regions: Poorly constrained (few measurements, null-space directions of \mathbf{A}, low SNR).

For the Gaussian model, \mathbf{\Gamma}_{\text{post}} = (\sigma^{-2}\mathbf{A}^H\mathbf{A} + \mathbf{\Gamma}^{-1})^{-1}, so the variance map depends on the measurement geometry through \mathbf{A}^H\mathbf{A} -- a direct tool for optimal sensor placement (A-optimal experimental design).
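The dependence on the measurement geometry can be checked numerically. The sketch below (real-valued toy sizes and an identity prior, all assumed) shows that adding measurement rows to \mathbf{A} shrinks the variance map, i.e., lowers the A-optimal design criterion tr(\mathbf{\Gamma}_{\text{post}}):

```python
import numpy as np

# Posterior variance map for two measurement geometries (identity prior, real A)
rng = np.random.default_rng(1)
n, sigma = 16, 0.5

def variance_map(A):
    # diag of (sigma^-2 A^T A + I)^-1, the Gaussian posterior covariance
    return np.diag(np.linalg.inv(A.T @ A / sigma**2 + np.eye(n)))

A_sparse = rng.standard_normal((8, n))                         # under-determined: 8 < 16
A_dense = np.vstack([A_sparse, rng.standard_normal((16, n))])  # added sensors

v_sparse = variance_map(A_sparse)
v_dense = variance_map(A_dense)
# Extra rows can only add information, so tr(Gamma_post) decreases (A-optimality)
```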

Theorem: Posterior Contraction Rate

Under regularity conditions on the forward operator \mathbf{A} and a Gaussian prior \mu_0 = \mathcal{N}(0, \mathcal{C}_0) with the true scene \boldsymbol{\gamma}^\dagger \in \mathcal{H} (the Cameron-Martin space), the posterior contracts around \boldsymbol{\gamma}^\dagger as the noise level \sigma \to 0:

\mathbb{E}\bigl[\|\boldsymbol{\gamma} - \boldsymbol{\gamma}^\dagger\|^2 \mid \mathbf{y}\bigr] = O\!\left(\sigma^{2\beta/(2\beta + 1)}\right),

where \beta > 0 is the Sobolev regularity of \boldsymbol{\gamma}^\dagger relative to the prior covariance. This rate matches the minimax-optimal rate for the corresponding deterministic inverse problem.

Laplace Approximation for Non-Gaussian Posteriors

For non-Gaussian priors (Laplace, horseshoe) the posterior has no closed-form covariance. The Laplace approximation fits a Gaussian to the posterior at its mode:

p(\boldsymbol{\gamma} \mid \mathbf{y}) \approx \mathcal{N}\!\left(\hat{\boldsymbol{\gamma}}_{\text{MAP}},\; \mathbf{H}^{-1}\right),

where \mathbf{H} = -\nabla^2_{\boldsymbol{\gamma}} \log p(\boldsymbol{\gamma} \mid \mathbf{y}) \big|_{\hat{\boldsymbol{\gamma}}_{\text{MAP}}} is the Hessian of the negative log-posterior at the MAP estimate.

For Gaussian noise: \mathbf{H} = \sigma^{-2}\mathbf{A}^H\mathbf{A} + \nabla^2(-\log\pi)(\hat{\boldsymbol{\gamma}}_{\text{MAP}}).

Warning: The Laplace approximation systematically underestimates uncertainty in multimodal or heavy-tailed posteriors -- it only sees the local curvature at the mode, missing the tails.
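A sketch of this recipe: the snippet below uses a pseudo-Huber (smoothed \ell_1) penalty as an assumed differentiable stand-in for a Laplace prior, finds the MAP by plain gradient descent, and inverts the Hessian at the mode. Sizes, hyperparameters, and the solver choice are all illustrative; a real implementation would use L-BFGS or Newton iterations.

```python
import numpy as np

# Laplace approximation with a smoothed-l1 (pseudo-Huber) prior, a differentiable
# stand-in for a Laplace prior; sizes and hyperparameters are illustrative.
rng = np.random.default_rng(2)
n, m, sigma, lam, eps = 5, 10, 0.2, 1.0, 1e-3
A = rng.standard_normal((m, n))
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

def neg_log_post_grad(g):
    # gradient of the negative log-posterior
    return A.T @ (A @ g - y) / sigma**2 + lam * g / np.sqrt(g**2 + eps)

def neg_log_post_hess(g):
    # H = sigma^-2 A^T A + diagonal curvature of the smoothed prior
    return A.T @ A / sigma**2 + lam * np.diag(eps / (g**2 + eps) ** 1.5)

# MAP by plain gradient descent (a simple stand-in for a real optimizer)
g_map = np.zeros(n)
for _ in range(3000):
    g_map = g_map - 1e-3 * neg_log_post_grad(g_map)

H = neg_log_post_hess(g_map)
laplace_cov = np.linalg.inv(H)   # Gaussian UQ: N(g_map, H^-1)
```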

Definition: Metropolis-Hastings MCMC

The Metropolis-Hastings (MH) algorithm generates a Markov chain \{\boldsymbol{\gamma}^{(0)}, \boldsymbol{\gamma}^{(1)}, \ldots\} with stationary distribution p(\boldsymbol{\gamma} \mid \mathbf{y}):

  1. Propose \boldsymbol{\gamma}' \sim q(\boldsymbol{\gamma}' \mid \boldsymbol{\gamma}^{(t)}) from a proposal distribution q.
  2. Accept with probability \alpha(\boldsymbol{\gamma}', \boldsymbol{\gamma}^{(t)}) = \min\!\left(1,\; \frac{p(\boldsymbol{\gamma}' \mid \mathbf{y})\, q(\boldsymbol{\gamma}^{(t)} \mid \boldsymbol{\gamma}')}{p(\boldsymbol{\gamma}^{(t)} \mid \mathbf{y})\, q(\boldsymbol{\gamma}' \mid \boldsymbol{\gamma}^{(t)})}\right).
  3. Set \boldsymbol{\gamma}^{(t+1)} = \boldsymbol{\gamma}' if accepted, otherwise \boldsymbol{\gamma}^{(t+1)} = \boldsymbol{\gamma}^{(t)}.

The acceptance ratio involves only the unnormalized posterior p(\mathbf{y} \mid \boldsymbol{\gamma})\,\pi(\boldsymbol{\gamma}), since \mathcal{Z}(\mathbf{y}) cancels -- no partition function needed.
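The three steps above can be sketched in a few lines. The 2D target below is an assumed stand-in for a real imaging posterior, and with a symmetric random-walk proposal the q-ratio cancels, leaving only the unnormalized posterior ratio:

```python
import numpy as np

# Minimal random-walk Metropolis-Hastings on an assumed unnormalized 2D target
rng = np.random.default_rng(3)

def log_post(g):
    # Unnormalized log-posterior: an anisotropic Gaussian bowl (illustrative)
    return -0.5 * (g[0] ** 2 + 4.0 * g[1] ** 2)

g = np.zeros(2)
T, delta, accepted = 5000, 0.8, 0
chain = np.empty((T, 2))
for t in range(T):
    prop = g + delta * rng.standard_normal(2)   # symmetric proposal: q cancels
    # Accept with prob min(1, p(prop)/p(g)); only the unnormalized ratio appears
    if np.log(rng.uniform()) < log_post(prop) - log_post(g):
        g = prop
        accepted += 1
    chain[t] = g

acc_rate = accepted / T   # tune delta toward roughly 0.2-0.4 acceptance
```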

pCN: Preconditioned Crank-Nicolson Sampler

Complexity: O(C_{\mathcal{A}} + n) per iteration, where C_{\mathcal{A}} is the cost of evaluating \Phi(\boldsymbol{\gamma}') = \frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma}' - \mathbf{y}\|^2. Total: O(T \cdot mn) for a dense \mathbf{A}.
Input: Prior covariance \mathcal{C}_0, potential \Phi(\boldsymbol{\gamma}) = -\log p(\mathbf{y} \mid \boldsymbol{\gamma}), step size \beta \in (0,1]
Output: Posterior samples \{\boldsymbol{\gamma}^{(t)}\}_{t=1}^T
1. Draw \boldsymbol{\gamma}^{(0)} \sim \mu_0 = \mathcal{N}(\mathbf{0}, \mathcal{C}_0)
2. for t = 0, 1, 2, \ldots, T-1 do
3.   Propose: \boldsymbol{\gamma}' = \sqrt{1 - \beta^2}\,\boldsymbol{\gamma}^{(t)} + \beta\,\boldsymbol{\xi}, where \boldsymbol{\xi} \sim \mu_0
4.   Compute acceptance: a = \min\!\bigl(1,\; \exp(\Phi(\boldsymbol{\gamma}^{(t)}) - \Phi(\boldsymbol{\gamma}'))\bigr)
5.   Draw u \sim \text{Uniform}(0,1)
6.   if u < a then \boldsymbol{\gamma}^{(t+1)} \leftarrow \boldsymbol{\gamma}' else \boldsymbol{\gamma}^{(t+1)} \leftarrow \boldsymbol{\gamma}^{(t)}
7. end for

The pCN proposal preserves \mu_0: if \boldsymbol{\gamma}^{(t)} \sim \mu_0 then \boldsymbol{\gamma}' \sim \mu_0. As a consequence, the acceptance rate depends only on the likelihood ratio -- it is independent of the discretization dimension n. In contrast, random-walk MH must shrink its step size as \delta \propto n^{-1/2} to keep the acceptance rate bounded away from zero, so its mixing degrades as n \to \infty.
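A compact sketch of the algorithm card above, assuming an identity prior covariance \mathcal{C}_0 = \mathbf{I} and a toy linear Gaussian likelihood (all sizes and the step size \beta are illustrative):

```python
import numpy as np

# pCN sampler for C0 = I and a linear Gaussian likelihood (sizes illustrative)
rng = np.random.default_rng(4)
n, m, sigma, beta, T = 50, 30, 0.3, 0.1, 4000
A = rng.standard_normal((m, n)) / np.sqrt(n)
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

def Phi(g):
    # Negative log-likelihood only; the prior is built into the proposal
    r = A @ g - y
    return 0.5 * (r @ r) / sigma**2

g = rng.standard_normal(n)   # gamma^(0) ~ mu_0 = N(0, I)
samples = np.empty((T, n))
for t in range(T):
    # pCN proposal: sqrt(1 - beta^2) * gamma + beta * xi, xi ~ mu_0
    prop = np.sqrt(1 - beta**2) * g + beta * rng.standard_normal(n)
    if np.log(rng.uniform()) < Phi(g) - Phi(prop):   # likelihood ratio only
        g = prop
    samples[t] = g

var_map = samples[T // 2 :].var(axis=0)   # MCMC posterior variance map
```

Note that the acceptance test never touches the prior density: this is exactly the prior-preservation property that makes pCN dimension-robust.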

MCMC Samplers for Imaging-Scale Posterior Inference

| Method | Gradient needed? | Dimension scaling | Best for |
| --- | --- | --- | --- |
| Random-walk MH | No | O(n^{-1/2}) step size | Low-dim, simple posteriors |
| pCN | No | Dimension-independent | Gaussian priors, function space |
| Gibbs | No | Depends on conditionals | Conjugate hierarchical models (SBL) |
| HMC | Yes (\nabla \log p) | O(n^{1/4}) leapfrog steps | High-dim, smooth posteriors |
| NUTS (auto-HMC) | Yes | O(n^{1/4}), auto-tuned | General-purpose; Stan/PyMC |
| Proximal MCMC (MYULA) | Yes (proximal) | O(n) per step | Non-smooth priors (TV, \ell_1) |

Scalable Uncertainty Quantification for Imaging

Computing the full posterior covariance \mathbf{\Gamma}_{\text{post}} \in \mathbb{R}^{n \times n} is infeasible for imaging-scale problems (n \sim 10^4--10^6). Scalable alternatives:

  • Diagonal approximation: Compute only [\mathbf{\Gamma}_{\text{post}}]_{ii} via Hutchinson's randomized trace estimator: \operatorname{tr}(\mathbf{B}) \approx \frac{1}{K}\sum_{k=1}^K \mathbf{z}_k^T \mathbf{B} \mathbf{z}_k with random \mathbf{z}_k \sim \mathcal{N}(\mathbf{0},\mathbf{I}).
  • Low-rank approximation: \mathbf{\Gamma}_{\text{post}} \approx \mathbf{V}_r \mathbf{\Lambda}_r \mathbf{V}_r^H + \mathbf{\Gamma} using the r leading eigenpairs of \sigma^{-2}\mathbf{A}^H\mathbf{A} (computed via randomized SVD).
  • MCMC-based: Estimate the posterior variance from the sample variance: \widehat{\operatorname{Var}}(\gamma_i) \approx \frac{1}{T}\sum_t (\gamma_i^{(t)} - \bar{\gamma}_i)^2.
  • Bootstrap: Resample the data, re-solve, and use the ensemble spread as an uncertainty proxy.
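The Hutchinson idea from the first bullet can be sketched as below. The example uses Rademacher probe vectors (a common lower-variance variant of the Gaussian probes stated above) and estimates tr(\mathbf{\Gamma}_{\text{post}}) from linear solves against the posterior precision, never forming the n × n inverse; all sizes are illustrative.

```python
import numpy as np

# Hutchinson trace estimator for tr(Gamma_post) = tr(H^-1), H = posterior precision
rng = np.random.default_rng(5)
n, m, sigma, K = 100, 60, 0.2, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)
H = A.T @ A / sigma**2 + np.eye(n)   # posterior precision, identity prior

est = 0.0
for _ in range(K):
    z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
    est += z @ np.linalg.solve(H, z)      # z^T H^-1 z via a linear solve
est /= K

exact = np.trace(np.linalg.inv(H))   # affordable only at this toy size
```

At imaging scale the solve would be replaced by a matrix-free conjugate-gradient iteration using only products with \mathbf{A} and \mathbf{A}^H.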

Posterior Credible Intervals β€” Bayesian vs Bootstrap

This plot compares uncertainty quantification methods for a 1D imaging inverse problem \mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}.

Top panel: True signal (black), posterior mean reconstruction (blue), and 95\% credible bands (shaded). Wide bands indicate poor observability of those pixels; narrow bands indicate high confidence.

Bottom panel: Posterior standard deviation map \sqrt{[\mathbf{\Gamma}_{\text{post}}]_{ii}}, showing how observability depends on position through the measurement operator. Compare Bayesian credible bands with bootstrap confidence bands -- Bayesian UQ correctly reflects the spatial structure of \mathbf{A}^H\mathbf{A}.


MCMC Posterior Sampling for a 2D Inverse Problem

Visualize MCMC sampling on a 2D posterior arising from a simple imaging problem with two unknown pixels and three measurements.

Left panel: Posterior contours with MCMC sample trajectory overlaid. Random-walk MH shows diffusive, slow exploration; pCN shows more efficient traversal of the posterior.

Center panel: Trace plots of each coordinate showing mixing. Well-mixed chains explore the full support rapidly; slow chains exhibit long autocorrelations.

Right panel: Running posterior mean estimate with \pm 2 standard-error bands, illustrating convergence speed for each algorithm.


Calibration β€” Are Credible Intervals Trustworthy?

A posterior is well-calibrated if its credible intervals have the correct frequentist coverage:

P_{\mathbf{y}}\bigl(\gamma_i^\dagger \in \mathcal{C}_\alpha^{(i)}\bigr) \approx 1 - \alpha,

where the probability is over repeated data realizations. Calibration can be assessed by:

  1. Simulation studies: Generate many (\boldsymbol{\gamma}^\dagger, \mathbf{y}) pairs, compute credible intervals, and check empirical coverage vs nominal level.
  2. Calibration plots: Plot observed coverage vs nominal level. A well-calibrated posterior lies on the diagonal.
  3. CRPS (Continuous Ranked Probability Score): A proper scoring rule that jointly evaluates sharpness and calibration.

Miscalibration arises from: misspecified noise models, incorrect priors, approximate inference (Laplace approximation underestimates uncertainty in multimodal posteriors), or model mismatch (e.g., using a Gaussian prior for a clearly sparse scene).
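A simulation-study coverage check can be sketched as follows. The toy linear-Gaussian model below is well-specified (the truth really is drawn from the assumed prior), so the empirical pixel-wise coverage should land near the nominal 95%; all sizes are illustrative.

```python
import numpy as np

# Coverage simulation: draw (gamma, y) repeatedly, count credible-interval hits
rng = np.random.default_rng(6)
n, m, sigma, trials = 8, 20, 0.3, 500
A = rng.standard_normal((m, n))
Gamma_post = np.linalg.inv(A.T @ A / sigma**2 + np.eye(n))   # fixed design
std = np.sqrt(np.diag(Gamma_post))

hits = 0
for _ in range(trials):
    g_true = rng.standard_normal(n)   # drawn from the assumed N(0, I) prior
    y = A @ g_true + sigma * rng.standard_normal(m)
    g_post = Gamma_post @ (A.T @ y / sigma**2)
    hits += int(np.sum(np.abs(g_true - g_post) <= 1.96 * std))

coverage = hits / (trials * n)   # well-specified model: close to 0.95
```

Rerunning the same loop with a deliberately wrong noise level or prior is an easy way to see miscalibration appear as coverage drifting away from 0.95.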

Common Mistake: Credible Intervals Are Not Confidence Intervals

Mistake:

A Bayesian 95\% credible interval [a, b] is interpreted as having a 95\% frequentist coverage probability -- i.e., "in repeated experiments, the true value lies in this interval 95\% of the time."

Correction:

A credible interval [a, b] means: given the observed data \mathbf{y} and the model, the posterior assigns 95\% probability to [a, b]. This is a conditional probability, conditioned on \mathbf{y}. It coincides with frequentist coverage only when the prior is correct. A frequentist confidence interval [a(\mathbf{y}), b(\mathbf{y})], by contrast, is a random interval with the property that P_{\boldsymbol{\gamma}^\dagger}(\boldsymbol{\gamma}^\dagger \in [a,b]) = 0.95 for all \boldsymbol{\gamma}^\dagger -- a different statement. Both are valid uncertainty quantifiers, but they answer different questions.

⚠️ Engineering Note

Practical UQ in Deployed RF Imaging Systems

In commercially deployed radar and SAR systems, full posterior UQ is rarely implemented due to computational cost. The standard practice is:

  1. Matched filter + empirical noise floor: Report reconstructed reflectivity with a detection threshold based on empirical clutter statistics. No formal UQ -- binary detect/non-detect.
  2. Sparse recovery (LASSO/OMP) + posterior linearization: Run sparse recovery, then compute the Laplace approximation covariance on the estimated support. Fast but underestimates uncertainty.
  3. Full Bayesian (SBL or MCMC): Deployed in high-value applications (medical imaging, subsurface sensing, ISAR tracking) where decision quality justifies the 10--100\times computational overhead vs matched filter.

The trend toward GPU-accelerated MCMC and differentiable probabilistic programming (PyMC, NumPyro) is lowering this barrier. For real-time radar (>1 kHz update rate), variational Bayes and approximate MCMC remain the only feasible options.

Practical Constraints

  • Real-time radar: <1 ms per frame -- precludes MCMC, requires LASSO or matched filter
  • SAR post-processing: seconds to minutes per image -- SBL feasible for n \leq 10^4
  • ISAR target classification: minutes per target -- full Bayesian with pCN viable

Key Takeaway

  1. Credible intervals extract pixel-wise uncertainty from the posterior: for Gaussian posteriors, \hat{\gamma}_i^{\text{post}} \pm 1.96\sqrt{[\mathbf{\Gamma}_{\text{post}}]_{ii}}.

  2. Posterior variance maps reveal which regions are well-constrained by data (near-zero variance) and which are dominated by the prior (high variance, null-space directions of \mathbf{A}).

  3. The posterior contracts at the minimax-optimal rate O(\sigma^{2\beta/(2\beta+1)}) when the prior matches the regularity of the truth.

  4. The Laplace approximation provides fast Gaussian UQ at the MAP solution but underestimates uncertainty for non-Gaussian posteriors.

  5. pCN is the sampler of choice for Gaussian priors in high dimensions -- dimension-independent acceptance rates via Cameron-Martin space proposals.

  6. Calibration is essential: always validate that reported credible intervals achieve their nominal coverage before trusting the UQ in production.

Quick Check

What property of the pCN proposal \boldsymbol{\gamma}' = \sqrt{1-\beta^2}\,\boldsymbol{\gamma} + \beta\,\boldsymbol{\xi}, \boldsymbol{\xi} \sim \mu_0, makes it dimension-independent?

  • It uses gradient information to make directed proposals
  • It preserves the prior measure \mu_0, so the acceptance probability depends only on the likelihood ratio
  • It adapts the step size \beta automatically to the local posterior curvature
  • It uses a Kronecker product structure to reduce per-step cost from O(n^2) to O(n)

Why This Matters: Uncertainty Maps for ISAC System Design

In integrated sensing and communications (ISAC) systems, the posterior variance map [\mathbf{\Gamma}_{\text{post}}]_{ii} directly informs adaptive resource allocation: pixels with high uncertainty should receive more measurements (additional transmit beams, wider bandwidth), while confident pixels need no further sensing.

This posterior-variance-driven adaptive sensing is the Bayesian analogue of A-optimal experimental design and connects to the capacity-distortion tradeoff in ISAC ([?ch34:s01]): reducing the posterior variance of the sensing channel corresponds to increasing the sensing mutual information term in the capacity-distortion region derived in Caire et al.

See full treatment in Chapter 34, Section 1