The Bayesian Framework
Why Go Bayesian?
In Chapter 6 we treated the parameter as a deterministic but unknown constant and produced the maximum likelihood estimator. That worldview is powerful when we have no prior knowledge, but it is also wasteful when we do. In a communication system we know, for instance, that a Rayleigh fading coefficient is circularly symmetric complex Gaussian with known variance — ignoring that information and running the MLE throws away statistical structure that the receiver could exploit.
The Bayesian framework treats $\theta$ itself as a random variable with a prior distribution reflecting what we know before seeing the data. The observation then updates the prior into the posterior distribution, which encodes everything the data tell us about $\theta$. Every Bayesian estimator — MAP, MMSE, or any other — is nothing more than a summary statistic of the posterior chosen to minimize a specific cost.
Definition: Bayesian Estimation Model
A Bayesian estimation problem is specified by a joint distribution on the parameter–observation pair $(\theta, x)$, where $\theta$ is the quantity of interest and $x$ is the observation. The joint distribution is determined by:
- a prior density $p(\theta)$ on the parameter space,
- a likelihood $p(x \mid \theta)$, the conditional density of the observation given the parameter.
An estimator is a measurable function $\hat{\theta}(x)$ of the observation. A cost function $C(\theta, \hat{\theta})$ associates a cost to each outcome, and the Bayes-optimal estimator minimizes the expected cost (the Bayes risk)
$$ R(\hat{\theta}) = \mathbb{E}\big[ C(\theta, \hat{\theta}(x)) \big]. $$
Both $\theta$ and $x$ are random here. Expectations are taken with respect to the joint distribution $p(\theta, x)$.
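The definition can be made concrete with a short numerical sketch. Assuming a Gaussian prior, an additive-Gaussian likelihood, and squared-error cost (these choices, and all variable names and values, are illustrative rather than taken from the text), the joint distribution can be sampled and the Bayes risk of a candidate estimator approximated by a Monte Carlo average:

```python
# A minimal sketch of a Bayesian estimation model as a generative process,
# assuming a Gaussian prior, an additive-Gaussian likelihood, and
# squared-error cost. All names and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma0 = 0.0, 2.0       # prior mean and standard deviation
sigma = 1.0                  # observation-noise standard deviation
n_trials = 100_000

# Draw (theta, x) pairs from the joint distribution p(theta, x) = p(theta) p(x | theta).
theta = rng.normal(mu0, sigma0, size=n_trials)        # theta ~ prior
x = theta + rng.normal(0.0, sigma, size=n_trials)     # x given theta ~ likelihood

# Two crude candidate estimators: ignore the data entirely, or trust it completely.
estimators = {
    "prior mean (ignore x)": np.full(n_trials, mu0),
    "raw observation x":     x,
}

# Expected cost (Bayes risk) under squared error, approximated by a
# Monte Carlo average over the joint distribution of (theta, x).
for name, theta_hat in estimators.items():
    risk = np.mean((theta - theta_hat) ** 2)
    print(f"{name}: empirical Bayes risk ~ {risk:.3f}")
```

Neither candidate is Bayes-optimal; the point is only that the risk averages the cost over both the prior and the noise.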
Theorem: Bayes' Rule for the Posterior
For any Bayesian model with strictly positive marginal $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$, the posterior density is
$$ p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}. $$
The denominator is independent of $\theta$, so up to this normalizing constant,
$$ p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta). $$
The posterior is the prior reweighted by how well each value of $\theta$ explains the observed data. High-likelihood values of $\theta$ that were already likely a priori dominate; values that disagree with either the prior or the data are suppressed.
Definition of conditional density
By definition, $p(\theta \mid x) = p(\theta, x) / p(x)$ whenever $p(x) > 0$.
Factor the joint density
The joint density factors as $p(\theta, x) = p(x \mid \theta)\, p(\theta)$. Substituting gives the stated formula.
Normalization
Integrating the numerator over $\theta$ reproduces the denominator, $\int p(x \mid \theta)\, p(\theta)\, d\theta = p(x)$, so $\int p(\theta \mid x)\, d\theta = 1$ and the posterior is a proper density.
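As a numerical sanity check of the theorem, the posterior can be approximated on a grid: evaluate the prior, reweight by the likelihood, and renormalize. The Gaussian prior and likelihood below are illustrative choices, not part of the theorem:

```python
# Bayes' rule on a grid: the posterior is the prior reweighted by the
# likelihood, then renormalized so it integrates to one. The Gaussian
# choices and numbers here are illustrative.
import numpy as np
from scipy.stats import norm

mu0, sigma0 = 0.0, 2.0    # prior N(mu0, sigma0^2)
sigma = 1.0               # observation-noise standard deviation
x_obs = 3.0               # the observed value

theta_grid = np.linspace(-10.0, 10.0, 2001)
d_theta = theta_grid[1] - theta_grid[0]

prior = norm.pdf(theta_grid, loc=mu0, scale=sigma0)        # p(theta)
likelihood = norm.pdf(x_obs, loc=theta_grid, scale=sigma)  # p(x_obs | theta) as a function of theta

unnormalized = prior * likelihood           # proportional to p(theta | x_obs)
evidence = np.sum(unnormalized) * d_theta   # numerical approximation of p(x_obs)
posterior = unnormalized / evidence         # proper density over the grid

print("posterior integrates to", np.sum(posterior) * d_theta)  # ~ 1.0
```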
Example: Gaussian Prior, Gaussian Likelihood
Let $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$ (prior) and let the observation be $x = \theta + w$ with noise $w \sim \mathcal{N}(0, \sigma^2)$ independent of $\theta$. Compute the posterior density of $\theta$ given $x$.
Write out the likelihood and prior
The likelihood is $p(x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \theta)^2}{2\sigma^2}\right)$. The prior is $p(\theta) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\!\left(-\frac{(\theta - \mu_0)^2}{2\sigma_0^2}\right)$.
Multiply and collect quadratic terms
The posterior is proportional to $\exp\!\left(-\frac{(x - \theta)^2}{2\sigma^2} - \frac{(\theta - \mu_0)^2}{2\sigma_0^2}\right)$. Expanding the exponent and completing the square in $\theta$ produces a Gaussian in $\theta$ with mean and variance
$$ \mathbb{E}[\theta \mid x] = \frac{\sigma^2 \mu_0 + \sigma_0^2 x}{\sigma_0^2 + \sigma^2}, \qquad \operatorname{var}(\theta \mid x) = \frac{\sigma_0^2 \sigma^2}{\sigma_0^2 + \sigma^2}. $$
Interpret
The posterior mean is a convex combination of the prior mean $\mu_0$ and the observation $x$, weighted by the precisions $1/\sigma_0^2$ and $1/\sigma^2$. The posterior variance is the harmonic sum of the prior and noise variances, i.e. $\operatorname{var}(\theta \mid x)^{-1} = \sigma_0^{-2} + \sigma^{-2}$: precisions add.
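A quick way to verify the completed square is to compare the closed-form mean and variance against a grid approximation of the posterior. The numbers below (prior mean 0, prior variance 4, noise variance 1, observation $x = 3$) are illustrative:

```python
# Checking the completed square numerically: the closed-form posterior mean
# and variance should match a grid approximation. Values are illustrative.
import numpy as np
from scipy.stats import norm

mu0, sigma0, sigma, x_obs = 0.0, 2.0, 1.0, 3.0

# Closed form: precisions add, and the mean is a precision-weighted average.
post_var = 1.0 / (1.0 / sigma0**2 + 1.0 / sigma**2)
post_mean = post_var * (mu0 / sigma0**2 + x_obs / sigma**2)

# Grid-based posterior mean and variance for comparison.
theta = np.linspace(-10.0, 10.0, 2001)
d = theta[1] - theta[0]
post = norm.pdf(theta, mu0, sigma0) * norm.pdf(x_obs, theta, sigma)
post /= np.sum(post) * d
grid_mean = np.sum(theta * post) * d
grid_var = np.sum((theta - grid_mean) ** 2 * post) * d

print(f"closed form: mean = {post_mean:.4f}, var = {post_var:.4f}")
print(f"grid:        mean = {grid_mean:.4f}, var = {grid_var:.4f}")
```

The two rows should agree to within the grid resolution (here mean $\approx 2.4$ and variance $\approx 0.8$).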
Prior, Likelihood, and Posterior
Change the prior mean $\mu_0$, prior variance $\sigma_0^2$, noise variance $\sigma^2$, and observation $x$, and watch how the posterior interpolates between the prior and the likelihood.
Key Takeaway
Every Bayesian estimator depends on the data only through the posterior density $p(\theta \mid x)$. Once the posterior is known, the choice of estimator is reduced to choosing which summary statistic of the posterior best serves the application.
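As a small illustration, the sketch below (reusing the illustrative Gaussian setup from the example) reads two different summary statistics off the same grid posterior, the posterior mean (MMSE estimate) and the posterior mode (MAP estimate):

```python
# Two summary statistics of one posterior: the posterior mean (MMSE estimate)
# and the posterior mode (MAP estimate), both read off the same grid posterior.
# Illustrative Gaussian setup; for a Gaussian posterior the two coincide.
import numpy as np
from scipy.stats import norm

mu0, sigma0, sigma, x_obs = 0.0, 2.0, 1.0, 3.0
theta = np.linspace(-10.0, 10.0, 2001)
d = theta[1] - theta[0]
post = norm.pdf(theta, mu0, sigma0) * norm.pdf(x_obs, theta, sigma)
post /= np.sum(post) * d                 # the grid posterior p(theta | x_obs)

mmse = np.sum(theta * post) * d          # posterior mean: minimizes expected squared error
map_ = theta[np.argmax(post)]            # posterior mode: maximizes the posterior density
print(f"MMSE ~ {mmse:.3f}, MAP ~ {map_:.3f}")
```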
Prior distribution
The distribution assigned to the parameter before any observation is made. The prior encodes side information (physical constraints, statistical models of the environment, previous measurements).
Related: Posterior distribution, Likelihood and Log-Likelihood
Posterior distribution
The conditional distribution of given the observation . It summarizes everything the data, combined with the prior, tell us about the parameter.
Common Mistake: Flat Priors Are Not Innocent
Mistake:
A newcomer argues that choosing a "flat" prior on an unbounded parameter space is the neutral, assumption-free choice — after all, it seems to treat every value of $\theta$ equally.
Correction:
A uniform density on an unbounded set is improper (it cannot be normalized to integrate to one), so it is not a valid prior in the strict probabilistic sense. Improper priors sometimes yield valid posteriors — but not always, and whether they do must be checked. Moreover, a flat prior in one parameterization becomes non-flat after a nonlinear change of variables: "uniform in $\theta$" and "uniform in $\log \theta$" are not the same prior. When the likelihood is strong, the prior matters little and flat priors are harmless; when the likelihood is weak, the implicit parameterization choice can dominate the inference.
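A short numerical sketch makes the parameterization point concrete. Assuming, purely for illustration, a prior that is uniform in $\theta$ on the interval $[0.1, 10]$, the induced density of $\log \theta$ is far from flat:

```python
# A prior that is flat in theta is not flat in log(theta). Sampling theta
# uniformly on [0.1, 10] (an illustrative interval) and transforming shows
# that the induced density of log(theta) is strongly tilted.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0.1, 10.0, size=1_000_000)   # "flat" prior on theta
phi = np.log(theta)                               # the same prior, reparameterized

# Histogram the induced density of phi; a prior flat in phi would give
# roughly equal bar heights.
counts, edges = np.histogram(phi, bins=10, density=True)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"log(theta) in [{lo:5.2f}, {hi:5.2f}): density ~ {c:.3f}")
```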
Historical Note: Bayes, Price, and Laplace
1763–1812
The posterior formula is named for Thomas Bayes, whose 1763 essay An Essay towards solving a Problem in the Doctrine of Chances was published posthumously by his friend Richard Price. Bayes worked out the special case of a uniform prior on the parameter of a Bernoulli distribution. The general form of the rule, and the full machinery of inverse probability, is due to Pierre-Simon Laplace, who rediscovered it independently in 1774 and developed it into a working inferential calculus over the following four decades. The modern name "Bayesian" was coined only in the 1950s.