The Bayesian Framework for Inverse Problems
Why the Bayesian Framework?
Variational regularization (§Regularization: Concept and General Theory) selects a single point estimate by minimizing a penalized objective. While effective, this approach does not quantify uncertainty in the reconstruction. When an algorithm reports a reflectivity estimate $\hat{x}_i$ at pixel $i$, how confident should we be? Is the true value almost certainly within a narrow interval around $\hat{x}_i$, or could it plausibly lie almost anywhere in the feasible range?
The Bayesian framework treats the unknown scene $x$ as a random variable, encodes prior knowledge through the prior $p(x)$, and produces the full posterior distribution $p(x \mid y)$, from which any point estimate, interval, or decision can be derived. Uncertainty quantification is not optional for safety-critical applications (medical imaging, autonomous driving, surveillance), where a confidently wrong answer is far more dangerous than an honest admission of uncertainty.
Definition: Bayesian Inverse Problem
Bayesian Inverse Problem
Given the linear forward model $y = Ax + n$ with additive noise $n$, the Bayesian inverse problem consists of three ingredients:
- Prior: A probability distribution $p(x)$ encoding knowledge about the scene before observing data, e.g., that it is sparse.
- Likelihood: The probability $p(y \mid x)$ of observing $y$ given $x$, determined by the noise model: for additive noise, $p(y \mid x) = p_n(y - Ax)$, where $p_n$ is the noise density.
- Posterior: The conditional distribution $p(x \mid y)$ of $x$ given $y$, obtained via Bayes' theorem.

The solution to the Bayesian inverse problem is the full posterior $p(x \mid y)$, not a single point estimate.
Theorem: Bayes' Theorem for Inverse Problems
Under the forward model $y = Ax + n$ with prior $p(x)$ and likelihood $p(y \mid x)$, the posterior distribution is
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)},$$
where the evidence (marginal likelihood) is
$$p(y) = \int p(y \mid x)\, p(x)\, dx.$$
The posterior is well-defined whenever $p(y) > 0$.
Direct application of conditional probability
By the definition of conditional probability,
$$p(x \mid y) = \frac{p(x, y)}{p(y)} = \frac{p(y \mid x)\, p(x)}{p(y)}.$$
The denominator $p(y)$ is a normalizing constant ensuring $\int p(x \mid y)\, dx = 1$.
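Bayes' theorem can be checked numerically on a scalar toy problem by evaluating prior, likelihood, and posterior on a grid. This is a hedged sketch: the model $y = ax + n$ and the values of $a$, $\sigma$, and the observed $y$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Bayes' theorem on a grid for the scalar toy model y = a*x + n,
# n ~ N(0, sigma^2), with a standard normal prior on x.
a, sigma, y_obs = 1.0, 0.5, 0.7

x = np.linspace(-3, 3, 2001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)       # N(0, 1) prior density
lik = np.exp(-0.5 * (y_obs - a * x) ** 2 / sigma**2)   # p(y | x), unnormalized

evidence = np.sum(lik * prior) * dx    # p(y) by numerical integration
posterior = lik * prior / evidence     # Bayes' theorem

# For this conjugate Gaussian pair the exact posterior mean is
# (y/sigma^2) / (1/sigma^2 + 1) = 0.56; the grid result matches it.
post_mean = np.sum(x * posterior) * dx
```

The same grid recipe works for any one-dimensional prior, which is what makes it a useful sanity check before moving to high-dimensional solvers.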
Definition: Gaussian Likelihood for Additive Noise
Gaussian Likelihood for Additive Noise
When the noise is Gaussian, $n \sim \mathcal{N}(0, \sigma^2 I)$, the likelihood takes the form
$$p(y \mid x) = (2\pi\sigma^2)^{-m/2} \exp\!\left(-\frac{\|y - Ax\|_2^2}{2\sigma^2}\right),$$
where $m$ is the number of measurements. The negative log-likelihood is proportional to the least-squares data fidelity:
$$-\log p(y \mid x) = \frac{1}{2\sigma^2}\|y - Ax\|_2^2 + \text{const}.$$
This connects the Bayesian likelihood directly to the data-fidelity term in variational regularization (§Regularization: Concept and General Theory).
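This identity is easy to verify numerically: differences of the negative log-likelihood between two candidate scenes equal differences of the scaled least-squares fidelity, because the constant term cancels. The problem sizes and random seed below are arbitrary assumptions.

```python
import numpy as np

# Verify that -log p(y|x) equals ||y - Ax||^2 / (2 sigma^2) up to an
# x-independent constant, for a small synthetic Gaussian-noise problem.
rng = np.random.default_rng(0)
m, n, sigma = 8, 4, 0.3
A = rng.standard_normal((m, n))
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

def neg_log_lik(x):
    r = y - A @ x
    return 0.5 * m * np.log(2 * np.pi * sigma**2) + r @ r / (2 * sigma**2)

def fidelity(x):
    return np.sum((y - A @ x) ** 2) / (2 * sigma**2)

x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
# Differences agree exactly: the normalization constant cancels.
assert np.isclose(neg_log_lik(x1) - neg_log_lik(x2), fidelity(x1) - fidelity(x2))
```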
Definition: Maximum A Posteriori (MAP) Estimate
Maximum A Posteriori (MAP) Estimate
The MAP estimate is the mode of the posterior distribution:
$$\hat{x}_{\mathrm{MAP}} = \arg\max_x p(x \mid y) = \arg\max_x \left[\log p(y \mid x) + \log p(x)\right].$$
For Gaussian noise and a log-concave prior $p(x) \propto \exp(-R(x))$, the MAP estimate solves
$$\hat{x}_{\mathrm{MAP}} = \arg\min_x \left\{\frac{1}{2\sigma^2}\|y - Ax\|_2^2 + R(x)\right\},$$
which is precisely the variational regularization problem from §Regularization: Concept and General Theory. Variational regularization is MAP estimation under the Bayesian interpretation.
Definition: Minimum Mean Squared Error (MMSE) Estimate
Minimum Mean Squared Error (MMSE) Estimate
The MMSE estimate (posterior mean) minimises the expected squared error under the posterior:
$$\hat{x}_{\mathrm{MMSE}} = \arg\min_{\hat{x}} \mathbb{E}\left[\|x - \hat{x}\|_2^2 \mid y\right].$$
It satisfies:
$$\hat{x}_{\mathrm{MMSE}} = \mathbb{E}[x \mid y] = \int x\, p(x \mid y)\, dx.$$
For symmetric unimodal posteriors (e.g., Gaussian), MAP and MMSE coincide. For sparse, non-Gaussian posteriors they can differ substantially.
MAP vs MMSE — When Do They Differ?
The MAP and MMSE estimates coincide when the posterior is symmetric and unimodal (e.g., Gaussian). They diverge when:
- The posterior is skewed — the mean is pulled away from the mode.
- The posterior is multimodal — MAP selects one mode while MMSE averages over all modes, potentially landing between them.
- The problem is high-dimensional — in $\mathbb{R}^n$ with large $n$, the MAP estimate can lie in a region of low posterior probability mass (the "typical set" phenomenon from concentration of measure).
For RF imaging: MAP tends to produce sparser, edge-preserving images; MMSE produces smoother reconstructions but is more computationally demanding.
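The divergence between MAP and MMSE is visible even in one dimension with a Laplace (sparsity-promoting) prior. The sketch below uses illustrative values: for a weak measurement with $|y|$ below the soft threshold $\sigma^2/b$, the posterior mode is exactly zero while the posterior mean is not.

```python
import numpy as np

# Scalar model y = x + n with n ~ N(0, sigma^2) and Laplace prior
# p(x) ∝ exp(-|x|/b). All numbers are illustrative assumptions.
sigma, b, y_obs = 1.0, 1.0, 0.5   # |y| = 0.5 < sigma^2/b = 1: MAP thresholds to 0

x = np.linspace(-5, 5, 4001)
dx = x[1] - x[0]
log_post = -0.5 * (y_obs - x) ** 2 / sigma**2 - np.abs(x) / b
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx

x_map = x[np.argmax(post)]        # mode: lands on the grid point x = 0
x_mmse = np.sum(x * post) * dx    # mean: pulled toward y_obs, strictly nonzero
```

This is the one-dimensional mechanism behind the remark above: the MAP estimate inherits the exact zeros of soft thresholding, while the MMSE average over the skewed posterior never produces them.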
Posterior distribution
The conditional probability distribution $p(x \mid y)$ of the unknown scene $x$ given the measurements $y$. It combines the likelihood (data fit) and prior (regularization) through Bayes' theorem and encodes all information about the unknown after observing data.
Related: Prior Distribution, Gaussian Likelihood for Additive Noise, Maximum A Posteriori (MAP) Estimate
Evidence (marginal likelihood)
The normalizing constant $p(y) = \int p(y \mid x)\, p(x)\, dx$ of the posterior. It is the marginal probability of the data under the model and plays a central role in Bayesian model comparison and hyperparameter selection.
Theorem: Gaussian Prior Yields Gaussian Posterior
Let $y = Ax + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ and prior $x \sim \mathcal{N}(\mu_0, \Sigma_0)$. Then the posterior is Gaussian:
$$p(x \mid y) = \mathcal{N}\!\left(x;\, \mu_{\mathrm{post}},\, \Sigma_{\mathrm{post}}\right),$$
with posterior mean and covariance
$$\Sigma_{\mathrm{post}} = \left(\frac{1}{\sigma^2} A^\top A + \Sigma_0^{-1}\right)^{-1}, \qquad \mu_{\mathrm{post}} = \Sigma_{\mathrm{post}}\left(\frac{1}{\sigma^2} A^\top y + \Sigma_0^{-1}\mu_0\right).$$
Moreover, MAP $=$ MMSE since the Gaussian posterior is symmetric and unimodal.
Completing the square in the exponent
The log-posterior (up to a constant) is
$$\log p(x \mid y) = -\frac{1}{2\sigma^2}\|y - Ax\|_2^2 - \frac{1}{2}(x - \mu_0)^\top \Sigma_0^{-1}(x - \mu_0),$$
where we write $\Lambda = \frac{1}{\sigma^2} A^\top A + \Sigma_0^{-1}$. Expanding and collecting terms quadratic and linear in $x$:
$$\log p(x \mid y) = -\frac{1}{2} x^\top \Lambda x + x^\top\!\left(\frac{1}{\sigma^2} A^\top y + \Sigma_0^{-1}\mu_0\right) + \text{const}.$$
This is the exponent of $\mathcal{N}(\mu_{\mathrm{post}}, \Sigma_{\mathrm{post}})$ with the stated parameters.
Tikhonov Regularization Is Gaussian MAP
Setting $\mu_0 = 0$ and $\Sigma_0 = \tau^2 I$, the MAP/MMSE estimate becomes
$$\hat{x} = \left(A^\top A + \lambda I\right)^{-1} A^\top y, \qquad \lambda = \frac{\sigma^2}{\tau^2},$$
which is the Tikhonov regularized solution (§Regularization: Concept and General Theory). The regularization parameter $\lambda = \sigma^2/\tau^2$ has a clear probabilistic interpretation as the noise-to-signal variance ratio. This provides a principled way to set $\lambda$: if you know $\sigma^2$ and have a prior estimate of $\tau^2$, no cross-validation is needed.
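This correspondence can be confirmed numerically in a few lines: the posterior mean from the theorem above coincides with the Tikhonov solution at $\lambda = \sigma^2/\tau^2$. The problem sizes, seed, and variances below are arbitrary synthetic assumptions.

```python
import numpy as np

# Posterior mean under prior N(0, tau^2 I) and noise N(0, sigma^2 I)
# vs the Tikhonov solution with lambda = sigma^2 / tau^2.
rng = np.random.default_rng(1)
m, n = 20, 10
sigma, tau = 0.2, 1.5
A = rng.standard_normal((m, n))
y = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

# Posterior covariance and mean from the theorem (mu_0 = 0)
Sigma_post = np.linalg.inv(A.T @ A / sigma**2 + np.eye(n) / tau**2)
mu_post = Sigma_post @ (A.T @ y / sigma**2)

# Tikhonov regularized solution
lam = sigma**2 / tau**2
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

assert np.allclose(mu_post, x_tik)   # identical up to floating-point error
```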
Common Mistake: The Prior Is Not "Objective"
Mistake:
A common misconception is that placing a Gaussian prior and computing the MAP estimate is somehow more objective than choosing a regularization parameter by hand — the math is more elegant, so the result must be less arbitrary.
Correction:
The prior encodes subjective beliefs about the scene before observing data. A Gaussian prior $\mathcal{N}(0, \tau^2 I)$ still requires choosing $\tau^2$, and the Tikhonov MAP just shifts that choice from "$\lambda$" to "$\tau^2$." The Bayesian framework makes the choice explicit and allows it to be informed by domain knowledge (e.g., known typical reflectivity ranges, spatial correlation length from physics), but it does not eliminate the need for a choice.
1D Posterior for Gaussian Likelihood and Various Priors
This plot illustrates Bayesian inference for the scalar model $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2)$. Adjust the prior type and parameters to see how the posterior changes.
Left panel: prior (blue), likelihood (green), and unnormalized posterior (red). Watch how the posterior balances data evidence against prior belief.
Right panel: MAP estimate (red dot) and MMSE estimate (orange dot), with a credible interval shaded. For Gaussian priors the two coincide; for Laplace priors they diverge: MAP produces exact zeros, MMSE does not.
Key Takeaway
- The Bayesian inverse problem treats the unknown $x$ as a random variable and seeks the full posterior $p(x \mid y)$, not just a point estimate.
- Bayes' theorem combines the likelihood $p(y \mid x)$ and prior $p(x)$ to yield the posterior, up to the normalizing constant $p(y)$.
- For Gaussian noise, the negative log-likelihood equals the least-squares fidelity $\frac{1}{2\sigma^2}\|y - Ax\|_2^2$ up to a constant.
- The MAP estimate equals variational regularization: $-\log p(x)$ plays the role of the penalty $R(x)$.
- The MMSE estimate (posterior mean) minimises expected squared error and accounts for the full posterior shape — not just its mode.
Quick Check
For a Gaussian prior $x \sim \mathcal{N}(0, \tau^2 I)$ and Gaussian noise $n \sim \mathcal{N}(0, \sigma^2 I)$, the MAP and MMSE estimates are:
Equal to each other and equal to the Tikhonov solution with $\lambda = \sigma^2/\tau^2$
Equal to each other but NOT equal to the Tikhonov solution
Different from each other because the Gaussian posterior is multimodal
Only defined when $A^\top A$ is invertible
Correct. The Gaussian posterior is symmetric and unimodal, so MAP = MMSE. Setting $\mu_0 = 0$ and $\Sigma_0 = \tau^2 I$ gives exactly the Tikhonov solution with $\lambda = \sigma^2/\tau^2$.
Historical Note: Thomas Bayes and Inverse Reasoning
Thomas Bayes (1702-1761) never published his famous theorem himself. His essay "An Essay towards Solving a Problem in the Doctrine of Chances" was communicated to the Royal Society posthumously in 1763 by his friend Richard Price. The theorem was largely dormant until Pierre-Simon Laplace independently developed it in the 1780s and applied it broadly to problems of parameter estimation — making him the true founding figure of what we now call Bayesian inference.
The application of Bayes' theorem to inverse problems in physics and imaging developed slowly over the 20th century. Harold Jeffreys's 1939 monograph "Theory of Probability" applied Bayesian methods to geophysical inverse problems. The modern statistical framework for Bayesian inverse problems was synthesized by Kaipio and Somersalo in their 2005 monograph, and the rigorous infinite-dimensional theory was established by Stuart in 2010 — providing the foundation for this chapter.
Why This Matters: From Posterior to Target Detection
In RF imaging, the posterior directly enables Bayesian detection: declare a target present at pixel $i$ if $\mathbb{P}(x_i \neq 0 \mid y) > \gamma$ for some threshold $\gamma$. This is precisely the Neyman-Pearson detector applied to the posterior probability rather than the raw likelihood ratio — the Bayesian analogue of CFAR detection (§Computational Complexity and Kronecker Exploitation). For a Bernoulli-Gaussian prior (§Sparsity-Promoting Priors), this posterior probability has a closed form and provides the optimal detector under the assumed model.
See full treatment in Computational Complexity and Kronecker Exploitation
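A hedged sketch of this detector for a single pixel under a scalar Bernoulli-Gaussian model: the prior occupancy `p`, the variances `tau2` and `sigma2`, and the threshold below are illustrative assumptions, not values from the text.

```python
import numpy as np

def normal_pdf(y, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2 * np.pi * var)

def p_target(y, p=0.1, tau2=4.0, sigma2=0.25):
    """Posterior probability that x != 0 given y = x + n, for the
    Bernoulli-Gaussian prior: x = 0 w.p. 1-p, else x ~ N(0, tau2)."""
    on = p * normal_pdf(y, tau2 + sigma2)    # marginal of y if target present
    off = (1 - p) * normal_pdf(y, sigma2)    # marginal of y if target absent
    return on / (on + off)

# Declare a detection wherever the posterior probability exceeds 1/2.
y_pixels = np.array([0.1, 0.4, 1.2, 3.0])
detections = p_target(y_pixels) > 0.5       # only the strong return detects
```

The closed form follows because both hypotheses yield Gaussian marginals for $y$; the detector then reduces to comparing two Gaussian densities weighted by the prior occupancy.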
Misspecified Noise Models in Practice
The Gaussian noise assumption underlies the least-squares likelihood. In real RF systems, noise is approximately Gaussian by the central limit theorem (many independent thermal noise sources), but:
- Clutter (reflections from non-target objects) is non-Gaussian and can dominate. It requires a heavy-tailed likelihood (e.g., Student-t) or an explicit clutter model.
- Phase errors from imperfect hardware synchronization add structured noise that is not captured by the i.i.d. Gaussian model.
- Quantization noise from finite ADC resolution (8-12 bits in typical radar) introduces a bounded, nearly uniform component at low SNR.
A misspecified noise model leads to overconfident or miscalibrated uncertainty estimates. Robustness checks (e.g., computing the posterior under different noise models and comparing) are standard practice before deploying Bayesian UQ in safety-critical systems.
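One such robustness check can be sketched in a few lines: compare posterior means for a scalar parameter under a Gaussian versus a heavy-tailed Student-t likelihood when one measurement is an outlier. All numbers below are illustrative assumptions.

```python
import numpy as np

# Grid posteriors for scalar x under two likelihoods, same N(0, 2^2) prior.
x = np.linspace(-5, 10, 3001)
dx = x[1] - x[0]
ys = [0.9, 1.1, 1.0, 6.0]        # last measurement is an outlier
sigma, nu = 0.5, 3.0             # noise scale; t degrees of freedom

log_prior = -0.5 * x**2 / 4.0

log_lik_gauss = sum(-0.5 * (y - x) ** 2 / sigma**2 for y in ys)
log_lik_t = sum(-0.5 * (nu + 1) * np.log1p((y - x) ** 2 / (nu * sigma**2))
                for y in ys)

def post_mean(log_post):
    w = np.exp(log_post - log_post.max())
    w /= w.sum() * dx
    return np.sum(x * w) * dx

m_gauss = post_mean(log_prior + log_lik_gauss)   # dragged toward the outlier
m_t = post_mean(log_prior + log_lik_t)           # largely discounts it
# A large gap between the two means flags sensitivity to the noise model.
```

When the two posteriors disagree substantially, the uncertainty estimates should not be trusted until the noise model is revisited; when they agree, the Gaussian assumption is at least not driving the conclusion.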