The Bayesian Framework for Inverse Problems

Why the Bayesian Framework?

Variational regularization (§Regularization: Concept and General Theory) selects a single point estimate by minimizing a penalized objective. While effective, this approach does not quantify uncertainty in the reconstruction. When an algorithm declares that the reflectivity at pixel $i$ is $0.7$, how confident should we be? Is the true value almost certainly in $[0.6, 0.8]$, or could it be anywhere in $[0, 1]$?

The Bayesian framework treats the unknown scene $\boldsymbol{\gamma}$ as a random variable, encodes prior knowledge through $\pi(\boldsymbol{\gamma})$, and produces the full posterior distribution $p(\boldsymbol{\gamma} \mid \mathbf{y})$ — from which any point estimate, interval, or decision can be derived. Uncertainty quantification is not optional for safety-critical applications (medical imaging, autonomous driving, surveillance) where a confidently wrong answer is far more dangerous than an honest admission of uncertainty.

Definition:

Bayesian Inverse Problem

Given the linear forward model $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with additive noise $\mathbf{w}$, the Bayesian inverse problem consists of three ingredients:

  1. Prior: A probability distribution $\pi(\boldsymbol{\gamma})$ encoding knowledge about the scene before observing data — e.g., that it is sparse.

  2. Likelihood: The probability of observing $\mathbf{y}$ given $\boldsymbol{\gamma}$, determined by the noise model: $p(\mathbf{y} \mid \boldsymbol{\gamma}) = p_{\mathbf{w}}(\mathbf{y} - \mathbf{A}\boldsymbol{\gamma})$.

  3. Posterior: The conditional distribution of $\boldsymbol{\gamma}$ given $\mathbf{y}$, obtained via Bayes' theorem.

The solution to the Bayesian inverse problem is the full posterior $p(\boldsymbol{\gamma} \mid \mathbf{y})$, not a single point estimate.

Theorem: Bayes' Theorem for Inverse Problems

Under the forward model $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with prior $\pi(\boldsymbol{\gamma})$ and likelihood $p(\mathbf{y} \mid \boldsymbol{\gamma})$, the posterior distribution is

$$p(\boldsymbol{\gamma} \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \boldsymbol{\gamma})\,\pi(\boldsymbol{\gamma})}{\mathcal{Z}(\mathbf{y})},$$

where the evidence (marginal likelihood) is

$$\mathcal{Z}(\mathbf{y}) = \int p(\mathbf{y} \mid \boldsymbol{\gamma})\,\pi(\boldsymbol{\gamma})\,\mathrm{d}\boldsymbol{\gamma}.$$

The posterior is well-defined whenever $0 < \mathcal{Z}(\mathbf{y}) < \infty$.

Definition:

Gaussian Likelihood for Additive Noise

When the noise is Gaussian, $\mathbf{w} \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$, the likelihood takes the form

$$p(\mathbf{y} \mid \boldsymbol{\gamma}) = \frac{1}{(2\pi\sigma^2)^{m/2}} \exp\!\left(-\frac{1}{2\sigma^2} \|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2\right),$$

where $m$ is the number of measurements. The negative log-likelihood is proportional to the least-squares data fidelity:

$$-\log p(\mathbf{y} \mid \boldsymbol{\gamma}) = \frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2 + \text{const}.$$

This connects the Bayesian likelihood directly to the data-fidelity term in variational regularization (§Regularization: Concept and General Theory).
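To make the likelihood and Bayes' theorem concrete, here is a minimal Python sketch for a scalar toy problem. The forward coefficient, noise level, observation, and the zero-mean Gaussian prior are all hypothetical choices for illustration, not part of the text; the grid sum stands in for the evidence $\mathcal{Z}(\mathbf{y})$.

```python
import numpy as np

# Hypothetical scalar problem y = a*gamma + w, w ~ N(0, sigma^2); values are illustrative.
a, sigma, y_obs = 1.0, 0.5, 0.9

grid = np.linspace(-2.0, 3.0, 2001)          # grid of candidate gamma values
dx = grid[1] - grid[0]

# Gaussian negative log-likelihood (up to a constant): (a*gamma - y)^2 / (2 sigma^2)
neg_log_lik = (a * grid - y_obs) ** 2 / (2.0 * sigma ** 2)

# Assumed prior for illustration: zero-mean Gaussian with variance tau^2
tau = 1.0
neg_log_prior = grid ** 2 / (2.0 * tau ** 2)

# Bayes' theorem: posterior proportional to likelihood * prior;
# the grid sum approximates the evidence Z(y).
log_post = -(neg_log_lik + neg_log_prior)
post = np.exp(log_post - log_post.max())     # subtract max for numerical stability
post /= post.sum() * dx                      # normalize by the numerical evidence

print("grid point with highest posterior density:", grid[np.argmax(post)])
```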

Definition:

Maximum A Posteriori (MAP) Estimate

The MAP estimate is the mode of the posterior distribution:

$$\hat{\boldsymbol{\gamma}}_{\text{MAP}} = \arg\max_{\boldsymbol{\gamma}}\; p(\boldsymbol{\gamma} \mid \mathbf{y}) = \arg\min_{\boldsymbol{\gamma}}\; \bigl[-\log p(\mathbf{y} \mid \boldsymbol{\gamma}) - \log \pi(\boldsymbol{\gamma})\bigr].$$

For Gaussian noise and a log-concave prior $\pi(\boldsymbol{\gamma}) \propto \exp(-\lambda R(\boldsymbol{\gamma}))$, the MAP estimate solves

$$\hat{\boldsymbol{\gamma}}_{\text{MAP}} = \arg\min_{\boldsymbol{\gamma}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2 + \lambda\,R(\boldsymbol{\gamma})\right\},$$

which is precisely the variational regularization problem from §Regularization: Concept and General Theory. Variational regularization is MAP estimation under the Bayesian interpretation.
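A hedged sketch of this equivalence in Python: the MAP estimate is obtained by numerically minimizing the penalized least-squares objective. The problem sizes, noise level, and the smoothed-$\ell_1$ penalty standing in for a sparsity-promoting log-prior are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical small problem: m = 20 measurements, n = 10 unknowns (illustration only)
m, n, sigma, lam = 20, 10, 0.1, 0.5
A = rng.standard_normal((m, n))
gamma_true = np.zeros(n)
gamma_true[[2, 7]] = [1.0, -0.8]             # a sparse "scene"
y = A @ gamma_true + sigma * rng.standard_normal(m)

def neg_log_posterior(g):
    # -log p(y | g): Gaussian least-squares data fidelity
    data_fit = np.sum((A @ g - y) ** 2) / (2.0 * sigma ** 2)
    # -log prior: smoothed l1 penalty as a stand-in for a sparsity-promoting log-prior
    penalty = lam * np.sum(np.sqrt(g ** 2 + 1e-8))
    return data_fit + penalty

# MAP estimate = minimizer of the variational (penalized least-squares) objective
gamma_map = minimize(neg_log_posterior, x0=np.zeros(n), method="L-BFGS-B").x
print(np.round(gamma_map, 3))
```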

Definition:

Minimum Mean Squared Error (MMSE) Estimate

The MMSE estimate (posterior mean) minimises the expected squared error under the posterior:

$$\hat{\boldsymbol{\gamma}}_{\text{MMSE}} = \mathbb{E}[\boldsymbol{\gamma} \mid \mathbf{y}] = \int \boldsymbol{\gamma}\,p(\boldsymbol{\gamma} \mid \mathbf{y})\,\mathrm{d}\boldsymbol{\gamma}.$$

It satisfies:

$$\hat{\boldsymbol{\gamma}}_{\text{MMSE}} = \arg\min_{\hat{\boldsymbol{\gamma}}} \; \mathbb{E}\!\left[\|\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}\|^2 \mid \mathbf{y}\right].$$

For symmetric unimodal posteriors (e.g., Gaussian), MAP and MMSE coincide. For sparse, non-Gaussian posteriors they can differ substantially.

MAP vs MMSE — When Do They Differ?

The MAP and MMSE estimates coincide when the posterior is symmetric and unimodal (e.g., Gaussian). They diverge when:

  • The posterior is skewed — the mean is pulled away from the mode.
  • The posterior is multimodal — MAP selects one mode while MMSE averages over all modes, potentially landing between them.
  • The problem is high-dimensional — in $\mathbb{R}^n$ with large $n$, the MAP estimate can lie in a region of low posterior probability mass (the "typical set" phenomenon from concentration of measure).

For RF imaging: MAP tends to produce sparser, edge-preserving images; MMSE produces smoother reconstructions but is more computationally demanding.
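The following scalar Python sketch illustrates both the MMSE definition above and the MAP/MMSE divergence: with a Laplace prior (a hypothetical choice here, as are all the numbers), the posterior is non-Gaussian, the MAP sits at the mode and can be exactly zero, while the MMSE is the posterior mean computed on a grid.

```python
import numpy as np

# Scalar model y = gamma + w, w ~ N(0, sigma^2), with a Laplace (sparsity) prior on gamma.
sigma, b, y_obs = 0.5, 0.3, 0.6            # noise std, Laplace scale, observation (all assumed)

grid = np.linspace(-3, 3, 4001)
dx = grid[1] - grid[0]

# Unnormalized log-posterior: Gaussian likelihood + Laplace log-prior
log_post = -(grid - y_obs) ** 2 / (2 * sigma ** 2) - np.abs(grid) / b
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx

gamma_map = grid[np.argmax(post)]          # posterior mode
gamma_mmse = np.sum(grid * post) * dx      # posterior mean

print(f"MAP  = {gamma_map:.3f}")           # pulled toward 0 (can be exactly 0 for small y)
print(f"MMSE = {gamma_mmse:.3f}")          # smooth shrinkage, generally nonzero
```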

Posterior distribution

The conditional probability distribution $p(\boldsymbol{\gamma} \mid \mathbf{y})$ of the unknown scene $\boldsymbol{\gamma}$ given the measurements $\mathbf{y}$. It combines the likelihood (data fit) and prior (regularization) through Bayes' theorem and encodes all information about the unknown after observing data.

Related: Prior Distribution, Gaussian Likelihood for Additive Noise, Maximum A Posteriori (MAP) Estimate

Evidence (marginal likelihood)

The normalizing constant $\mathcal{Z}(\mathbf{y}) = \int p(\mathbf{y} \mid \boldsymbol{\gamma})\,\pi(\boldsymbol{\gamma})\,\mathrm{d}\boldsymbol{\gamma}$ of the posterior. It is the marginal probability of the data under the model and plays a central role in Bayesian model comparison and hyperparameter selection.

Related: Posterior distribution, Bayesian Model Comparison

Theorem: Gaussian Prior Yields Gaussian Posterior

Let $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ and prior $\boldsymbol{\gamma} \sim \mathcal{N}(\boldsymbol{\gamma}_0, \mathbf{\Gamma})$. Then the posterior is Gaussian:

$$p(\boldsymbol{\gamma} \mid \mathbf{y}) = \mathcal{N}(\hat{\boldsymbol{\gamma}}_{\text{post}},\,\mathbf{\Gamma}_{\text{post}}),$$

with posterior mean and covariance

$$\hat{\boldsymbol{\gamma}}_{\text{post}} = \mathbf{\Gamma}_{\text{post}}\!\left(\frac{1}{\sigma^2}\mathbf{A}^H \mathbf{y} + \mathbf{\Gamma}^{-1}\boldsymbol{\gamma}_0\right),$$

$$\mathbf{\Gamma}_{\text{post}} = \left(\frac{1}{\sigma^2}\mathbf{A}^H\mathbf{A} + \mathbf{\Gamma}^{-1}\right)^{-1}.$$

Moreover, MAP $=$ MMSE $= \hat{\boldsymbol{\gamma}}_{\text{post}}$ since the Gaussian posterior is symmetric and unimodal.
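A short numpy sketch of these two formulas follows. The matrix sizes, prior covariance, and data are invented for illustration, and a real-valued $\mathbf{A}$ is used so that the conjugate transpose reduces to an ordinary transpose.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: m measurements, n unknowns (illustration only)
m, n, sigma = 30, 12, 0.2
A = rng.standard_normal((m, n))                    # real A here, so A^H = A.T
gamma0 = np.zeros(n)                               # prior mean
Gamma = 0.5 * np.eye(n)                            # prior covariance (assumed)
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

# Posterior covariance: (A^H A / sigma^2 + Gamma^{-1})^{-1}
Gamma_inv = np.linalg.inv(Gamma)
Gamma_post = np.linalg.inv(A.conj().T @ A / sigma**2 + Gamma_inv)

# Posterior mean: Gamma_post (A^H y / sigma^2 + Gamma^{-1} gamma0)
gamma_post = Gamma_post @ (A.conj().T @ y / sigma**2 + Gamma_inv @ gamma0)

# Pixel-wise uncertainty comes from the posterior covariance diagonal
print("posterior std per pixel:", np.sqrt(np.diag(Gamma_post)).round(3))
```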

Tikhonov Regularization Is Gaussian MAP

Setting $\boldsymbol{\gamma}_0 = \mathbf{0}$ and $\mathbf{\Gamma} = \frac{\sigma^2}{\lambda} \mathbf{I}$, the MAP/MMSE estimate becomes

$$\hat{\boldsymbol{\gamma}}_{\text{MAP}} = (\mathbf{A}^H\mathbf{A} + \lambda \mathbf{I})^{-1}\mathbf{A}^H \mathbf{y},$$

which is the Tikhonov regularized solution (§Regularization: Concept and General Theory). The regularization parameter $\lambda = \sigma^2/\gamma^2$ (where $\mathbf{\Gamma} = \gamma^2 \mathbf{I}$) has a clear probabilistic interpretation as the noise-to-signal variance ratio. This provides a principled way to set $\lambda$: if you know $\sigma^2$ and have a prior estimate of $\gamma^2$, no cross-validation is needed.
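A quick numerical check of this identity, again with invented data: the Tikhonov solution and the Gaussian MAP/MMSE mean with $\boldsymbol{\gamma}_0 = \mathbf{0}$ and $\mathbf{\Gamma} = (\sigma^2/\lambda)\mathbf{I}$ agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical problem (illustration only)
m, n, sigma, lam = 25, 8, 0.1, 0.3
A = rng.standard_normal((m, n))
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

# Tikhonov solution: (A^H A + lambda I)^{-1} A^H y
gamma_tik = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Gaussian MAP/MMSE with gamma0 = 0 and Gamma = (sigma^2 / lambda) I
Gamma_inv = (lam / sigma**2) * np.eye(n)
gamma_map = np.linalg.solve(A.T @ A / sigma**2 + Gamma_inv, A.T @ y / sigma**2)

print(np.allclose(gamma_tik, gamma_map))   # True: same estimator, two parameterizations
```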

Common Mistake: The Prior Is Not "Objective"

Mistake:

A common misconception is that placing a Gaussian prior and computing the MAP estimate is somehow more objective than choosing a regularization parameter $\lambda$ by hand — the math is more elegant, so the result must be less arbitrary.

Correction:

The prior $\pi(\boldsymbol{\gamma})$ encodes subjective beliefs about the scene before observing data. A Gaussian prior with $\mathbf{\Gamma} = \gamma^2 \mathbf{I}$ still requires choosing $\gamma^2$, and the Tikhonov MAP just shifts that choice from "$\lambda$" to "$\gamma^2 = \sigma^2/\lambda$." The Bayesian framework makes the choice explicit and allows it to be informed by domain knowledge (e.g., known typical reflectivity ranges, spatial correlation length from physics), but it does not eliminate the need for a choice.

1D Posterior for Gaussian Likelihood and Various Priors

This plot illustrates Bayesian inference for the scalar model $y = a\gamma + w$ with $w \sim \mathcal{N}(0, \sigma^2)$. Adjust the prior type and parameters to see how the posterior changes.

Left panel: Prior $\pi(\gamma)$ (blue), likelihood $p(y \mid \gamma)$ (green), and unnormalized posterior $\propto p(y \mid \gamma)\pi(\gamma)$ (red). Watch how the posterior balances data evidence against prior belief.

Right panel: MAP estimate (red dot) and MMSE estimate (orange dot) with $95\%$ credible interval shaded. For Gaussian priors the two coincide; for Laplace priors they diverge — MAP produces exact zeros, MMSE does not.


Key Takeaway

  1. The Bayesian inverse problem treats the unknown as a random variable and seeks the full posterior $p(\boldsymbol{\gamma} \mid \mathbf{y})$, not just a point estimate.

  2. Bayes' theorem combines the likelihood $p(\mathbf{y} \mid \boldsymbol{\gamma})$ and prior $\pi(\boldsymbol{\gamma})$ to yield the posterior, up to the normalizing constant $\mathcal{Z}(\mathbf{y})$.

  3. For Gaussian noise, the negative log-likelihood equals the least-squares fidelity $\frac{1}{2\sigma^2}\|\mathbf{A}\boldsymbol{\gamma} - \mathbf{y}\|^2$ up to an additive constant.

  4. The MAP estimate equals variational regularization: $-\log\pi(\boldsymbol{\gamma})$ plays the role of the penalty $\lambda R(\boldsymbol{\gamma})$.

  5. The MMSE estimate (posterior mean) minimises expected squared error and accounts for the full posterior shape — not just its mode.

Quick Check

For a Gaussian prior $\boldsymbol{\gamma} \sim \mathcal{N}(\mathbf{0}, \gamma^2 \mathbf{I})$ and Gaussian noise $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$, the MAP and MMSE estimates are:

Equal to each other and equal to the Tikhonov solution with $\lambda = \sigma^2/\gamma^2$

Equal to each other but NOT equal to the Tikhonov solution

Different from each other because the Gaussian posterior is multimodal

Only defined when $\sigma^2 < \gamma^2$

Historical Note: Thomas Bayes and Inverse Reasoning

1763-2010

Thomas Bayes (1702-1761) never published his famous theorem himself. His essay "An Essay towards Solving a Problem in the Doctrine of Chances" was communicated to the Royal Society posthumously in 1763 by his friend Richard Price. The theorem was largely dormant until Pierre-Simon Laplace independently developed it in the 1780s and applied it broadly to problems of parameter estimation — making him the true founding figure of what we now call Bayesian inference.

The application of Bayes' theorem to inverse problems in physics and imaging developed slowly over the 20th century. Harold Jeffreys's 1939 monograph "Theory of Probability" applied Bayesian methods to geophysical inverse problems. The modern statistical framework for Bayesian inverse problems was synthesized by Kaipio and Somersalo in their 2005 monograph, and the rigorous infinite-dimensional theory was established by Stuart in 2010 — providing the foundation for this chapter.

Why This Matters: From Posterior to Target Detection

In RF imaging, the posterior $p(\boldsymbol{\gamma} \mid \mathbf{y})$ directly enables Bayesian detection: declare target present at pixel $i$ if $P(\gamma_i \neq 0 \mid \mathbf{y}) > \tau$ for some threshold $\tau$. This is precisely the Neyman-Pearson detector applied to the posterior probability rather than the raw likelihood ratio — the Bayesian analogue of CFAR detection (§Computational Complexity and Kronecker Exploitation). For a Bernoulli-Gaussian prior (§Sparsity-Promoting Priors), this posterior probability has a closed form and provides the optimal detector under the assumed model.

See full treatment in Computational Complexity and Kronecker Exploitation
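One generic way to implement such a detector, shown here as a hedged sketch rather than the closed-form Bernoulli-Gaussian detector referenced above, is to approximate $P(\gamma_i \neq 0 \mid \mathbf{y})$ from posterior samples (however they are obtained, e.g., by MCMC) and threshold the result. The function name, the near-zero tolerance `eps`, and the synthetic samples are all assumptions for illustration.

```python
import numpy as np

def detect_targets(posterior_samples, eps=0.05, tau=0.9):
    """Monte Carlo approximation of the Bayesian detector (illustrative sketch).

    posterior_samples : (S, n) array of draws gamma^(s) ~ p(gamma | y),
                        e.g. from an MCMC sampler (not shown here).
    eps               : magnitude below which a sampled pixel counts as "zero".
    tau               : detection threshold on the estimated P(gamma_i != 0 | y).
    """
    # Fraction of samples in which pixel i is noticeably nonzero
    prob_nonzero = np.mean(np.abs(posterior_samples) > eps, axis=0)
    return prob_nonzero > tau, prob_nonzero

# Hypothetical usage with synthetic "posterior samples" (illustration only)
rng = np.random.default_rng(3)
samples = rng.normal(0.0, 0.01, size=(500, 6))              # background pixels hover near zero
samples[:, [1, 4]] = rng.normal(1.0, 0.1, size=(500, 2))    # pixels 1 and 4 behave like targets
detections, probs = detect_targets(samples)
print(probs.round(2))
print(detections)
```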

⚠️ Engineering Note

Misspecified Noise Models in Practice

The Gaussian noise assumption $\mathbf{w} \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ underlies the least-squares likelihood. In real RF systems, noise is approximately Gaussian by the central limit theorem (many independent thermal noise sources), but:

  • Clutter (reflections from non-target objects) is non-Gaussian and can dominate. It requires a heavy-tailed likelihood (e.g., Student-t) or an explicit clutter model.
  • Phase errors from imperfect hardware synchronization add structured noise that is not captured by the i.i.d. Gaussian model.
  • Quantization noise from finite ADC resolution (8-12 bits in typical radar) introduces a bounded, nearly uniform component at low SNR.

A misspecified noise model leads to overconfident or miscalibrated uncertainty estimates. Robustness checks (e.g., computing the posterior under different noise models and comparing) are standard practice before deploying Bayesian UQ in safety-critical systems.
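As a minimal sketch of such a robustness check (a scalar toy model with an invented observation and a standard-normal prior, both assumptions), one can compute the posterior mean under a Gaussian and under a heavier-tailed Student-t likelihood on a grid and compare; a large gap between the two flags sensitivity to the noise model.

```python
import numpy as np
from scipy import stats

# Scalar toy check: same data, two likelihoods (values are illustrative assumptions)
sigma, nu, y_obs = 0.5, 3.0, 2.5          # noise scale, Student-t dof, an outlier-ish observation
grid = np.linspace(-4, 6, 4001)
dx = grid[1] - grid[0]

def posterior_mean(log_lik):
    # Combine the given log-likelihood with an assumed N(0, 1) prior, normalize on the grid
    log_post = log_lik + stats.norm.logpdf(grid, loc=0.0, scale=1.0)
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * dx
    return np.sum(grid * post) * dx

mean_gauss = posterior_mean(stats.norm.logpdf(y_obs, loc=grid, scale=sigma))
mean_t = posterior_mean(stats.t.logpdf(y_obs, df=nu, loc=grid, scale=sigma))

print(f"Gaussian likelihood posterior mean : {mean_gauss:.3f}")
print(f"Student-t likelihood posterior mean: {mean_t:.3f}")
```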