MAP and MMSE Estimators
From Posterior to Estimate
A posterior density is not yet a usable estimate: a receiver cannot decode with a density; it needs a number. Converting a posterior into a single estimate requires choosing a cost function. Two cost functions dominate practice: the zero–one cost (credit only for hitting the right value exactly) and the squared-error cost (the penalty grows quadratically with the size of the miss). These two choices lead to the MAP and MMSE estimators, respectively.
Definition: Maximum a Posteriori (MAP) Estimator
Given a Bayesian model with posterior $p(\mathbf{x} \mid \mathbf{y})$, the maximum a posteriori estimator is
$$\hat{\mathbf{x}}_{\mathrm{MAP}}(\mathbf{y}) = \arg\max_{\mathbf{x}} \, p(\mathbf{x} \mid \mathbf{y}) = \arg\max_{\mathbf{x}} \, p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}),$$
where the second equality follows from Bayes' rule because the marginal $p(\mathbf{y})$ does not depend on $\mathbf{x}$.
With a flat prior (or in the limit of an uninformative prior, e.g. prior variance $\to \infty$), the MAP estimator reduces to the MLE: $\hat{\mathbf{x}}_{\mathrm{MAP}}(\mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x}) = \hat{\mathbf{x}}_{\mathrm{ML}}(\mathbf{y})$. MAP is the Bayesian estimator for the zero–one cost $C(\hat{\mathbf{x}}, \mathbf{x}) = \mathbf{1}\{\|\hat{\mathbf{x}} - \mathbf{x}\| > \epsilon\}$ in the degenerate limit $\epsilon \to 0$, so it is the "most likely parameter" answer but not the "best on average" answer.
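As a quick numerical illustration, the sketch below (a minimal Python example of my own; the Gaussian prior and likelihood and the value of $y$ are assumptions chosen for concreteness) evaluates an unnormalized posterior on a grid and takes its argmax, then repeats with a flat prior to recover the MLE.

```python
import numpy as np

# Assumed toy model (not from the text): X ~ N(0, 4) prior, Y = X + N with N ~ N(0, 1).
sigma_x2, sigma_n2 = 4.0, 1.0
y = 1.5                                   # a single observed value

x_grid = np.linspace(-10, 10, 20001)      # dense grid of candidate parameter values
likelihood = np.exp(-(y - x_grid) ** 2 / (2 * sigma_n2))
prior = np.exp(-x_grid ** 2 / (2 * sigma_x2))

# MAP: argmax of likelihood * prior (the marginal p(y) is a constant and can be ignored).
x_map = x_grid[np.argmax(likelihood * prior)]

# With a flat prior the same argmax reduces to the MLE.
x_mle = x_grid[np.argmax(likelihood)]

print(f"MAP ~ {x_map:.3f}  (closed form for this Gaussian model: {sigma_x2/(sigma_x2+sigma_n2)*y:.3f})")
print(f"MLE ~ {x_mle:.3f}  (equals y = {y})")
```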
Definition: Minimum Mean-Square Error (MMSE) Estimator
The minimum mean-square error estimator is the function $g(\cdot)$ minimizing the Bayes risk under squared-error cost:
$$\hat{\mathbf{x}}_{\mathrm{MMSE}} = \arg\min_{g(\cdot)} \, \mathbb{E}\big[\|\mathbf{X} - g(\mathbf{Y})\|^2\big].$$
The minimization ranges over all (measurable) functions of the observation. We write the optimal estimator as $\hat{\mathbf{x}}_{\mathrm{MMSE}}(\mathbf{y})$.
The squared-error cost penalizes large deviations disproportionately and enjoys clean algebraic properties (quadratic, convex, differentiable) that make closed-form analysis possible.
Theorem: MMSE Estimator Equals the Conditional Mean
For any jointly distributed $(\mathbf{X}, \mathbf{Y})$ with $\mathbb{E}[\|\mathbf{X}\|^2] < \infty$, the MMSE estimator is
$$\hat{\mathbf{x}}_{\mathrm{MMSE}}(\mathbf{y}) = \mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}].$$
In particular, the MMSE estimator depends on the posterior only through its mean.
For each fixed value $\mathbf{y}$, the MMSE task reduces to choosing a constant $\mathbf{a}$ to minimize $\mathbb{E}\big[\|\mathbf{X} - \mathbf{a}\|^2 \mid \mathbf{Y} = \mathbf{y}\big]$. A standard calculation shows that the minimizer of $\mathbb{E}[(X - a)^2 \mid Y = y]$ over scalars $a$ is $a = \mathbb{E}[X \mid Y = y]$; the vector case is identical component by component.
Use the tower property: $\mathbb{E}\big[\|\mathbf{X} - g(\mathbf{Y})\|^2\big] = \mathbb{E}\big[\,\mathbb{E}\big[\|\mathbf{X} - g(\mathbf{Y})\|^2 \mid \mathbf{Y}\big]\,\big]$.
For a fixed value of $\mathbf{y}$, the inner expectation is minimized pointwise.
Add and subtract $\mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]$ inside the norm and expand.
Condition on the observation
By the tower property,
$$\mathbb{E}\big[\|\mathbf{X} - g(\mathbf{Y})\|^2\big] = \mathbb{E}_{\mathbf{Y}}\Big[\,\mathbb{E}\big[\|\mathbf{X} - g(\mathbf{Y})\|^2 \,\big|\, \mathbf{Y}\big]\Big].$$
The outer expectation is a positive-weighted average over $\mathbf{y}$, so minimizing the full expectation is equivalent to minimizing the inner conditional expectation for almost every $\mathbf{y}$.
Reduce to a constant-minimization problem
Fix $\mathbf{Y} = \mathbf{y}$. The inner problem is
$$\min_{\mathbf{a}} \; \mathbb{E}\big[\|\mathbf{X} - \mathbf{a}\|^2 \,\big|\, \mathbf{Y} = \mathbf{y}\big],$$
where $\mathbf{a}$ stands in for $g(\mathbf{y})$.
Complete the square in $\mathbf{a}$
Let $\mathbf{m} = \mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]$. Adding and subtracting $\mathbf{m}$,
$$\mathbb{E}\big[\|\mathbf{X} - \mathbf{a}\|^2 \,\big|\, \mathbf{Y} = \mathbf{y}\big] = \mathbb{E}\big[\|\mathbf{X} - \mathbf{m}\|^2 \,\big|\, \mathbf{Y} = \mathbf{y}\big] + 2(\mathbf{m} - \mathbf{a})^{\mathsf T}\,\mathbb{E}\big[\mathbf{X} - \mathbf{m} \,\big|\, \mathbf{Y} = \mathbf{y}\big] + \|\mathbf{m} - \mathbf{a}\|^2.$$
The cross term vanishes because $\mathbb{E}[\mathbf{X} - \mathbf{m} \mid \mathbf{Y} = \mathbf{y}] = \mathbf{0}$.
Identify the minimizer
The first term is independent of $\mathbf{a}$. The last term is nonnegative and equals zero iff $\mathbf{a} = \mathbf{m}$. Hence $g^{\star}(\mathbf{y}) = \mathbf{m} = \mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]$.
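To see the complete-the-square step numerically, the following sketch (my own illustration, with an assumed scalar Gaussian joint model and a Monte Carlo approximation of the conditioning) sweeps the constant $a$ for one fixed observation and confirms that the conditional MSE is minimized at the conditional mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy joint model: X ~ N(0, 1), Y = X + N with N ~ N(0, 0.5^2).
x = rng.normal(0.0, 1.0, size=500_000)
y = x + rng.normal(0.0, 0.5, size=x.size)

# Condition on a thin slice around a fixed observation y0 (Monte Carlo conditioning).
y0 = 0.8
x_slice = x[np.abs(y - y0) < 0.02]

# Sweep candidate constants a and evaluate the conditional MSE E[(X - a)^2 | Y ~ y0].
a_grid = np.linspace(-1.0, 2.0, 601)
cond_mse = [np.mean((x_slice - a) ** 2) for a in a_grid]

print(f"argmin over a    : {a_grid[np.argmin(cond_mse)]:.3f}")
print(f"conditional mean : {x_slice.mean():.3f}")   # the two should agree closely
```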
Theorem: Value of the MMSE
With $\hat{\mathbf{x}}_{\mathrm{MMSE}}(\mathbf{y}) = \mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]$, the minimum achievable mean-square error equals the expected posterior variance:
$$\mathrm{MMSE} = \mathbb{E}\big[\|\mathbf{X} - \hat{\mathbf{x}}_{\mathrm{MMSE}}(\mathbf{Y})\|^2\big] = \mathbb{E}\Big[\sum_i \operatorname{Var}(X_i \mid \mathbf{Y})\Big].$$
Observing $\mathbf{Y}$ removes all variability of $\mathbf{X}$ that is predictable from $\mathbf{Y}$; what remains is the residual variance inside each "slice" $\{\mathbf{Y} = \mathbf{y}\}$. Averaging these residual variances over the marginal distribution of $\mathbf{Y}$ gives the MMSE.
Apply the law of total variance
For each component $X_i$, $\operatorname{Var}(X_i) = \mathbb{E}\big[\operatorname{Var}(X_i \mid \mathbf{Y})\big] + \operatorname{Var}\big(\mathbb{E}[X_i \mid \mathbf{Y}]\big)$.
Sum over components
Summing over $i$ gives $\sum_i \operatorname{Var}(X_i) = \mathbb{E}\big[\sum_i \operatorname{Var}(X_i \mid \mathbf{Y})\big] + \sum_i \operatorname{Var}\big(\mathbb{E}[X_i \mid \mathbf{Y}]\big)$.
Identify the residual term
The MMSE is exactly the first summand: since $\mathbb{E}\big[(X_i - \mathbb{E}[X_i \mid \mathbf{Y}])^2 \,\big|\, \mathbf{Y}\big] = \operatorname{Var}(X_i \mid \mathbf{Y})$, the tower property gives $\mathrm{MMSE} = \mathbb{E}\big[\sum_i \operatorname{Var}(X_i \mid \mathbf{Y})\big]$, the average over $\mathbf{Y}$ of the residual posterior variance.
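A short Monte Carlo check of the theorem, under an assumed two-point signal model of my own choosing (not one of the text's examples): the posterior mean and posterior variance follow in closed form from Bayes' rule, and the empirical squared error of the conditional mean is compared with the average posterior variance.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.7                       # assumed noise standard deviation (illustrative)

# Assumed model: X in {0, 1} equiprobable, Y = X + N, N ~ N(0, sigma^2).
n = 1_000_000
x = rng.integers(0, 2, size=n).astype(float)
y = x + rng.normal(0.0, sigma, size=n)

# Posterior P(X = 1 | Y = y) from Bayes' rule (equal priors, Gaussian likelihoods).
p1 = 1.0 / (1.0 + np.exp(-(2.0 * y - 1.0) / (2.0 * sigma**2)))

post_mean = p1                    # E[X | Y] for a {0, 1} variable
post_var = p1 * (1.0 - p1)        # Var(X | Y) for a {0, 1} variable

print(f"empirical squared error : {np.mean((x - post_mean) ** 2):.5f}")
print(f"expected posterior var  : {np.mean(post_var):.5f}")   # should match closely
```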
Example: MMSE for the Scalar Gaussian Model
Let $X \sim \mathcal{N}(0, \sigma_X^2)$ and $Y = X + N$ with $N \sim \mathcal{N}(0, \sigma_N^2)$ independent of $X$. Compute $\hat{x}_{\mathrm{MMSE}}(y)$, $\hat{x}_{\mathrm{MAP}}(y)$, and the MMSE.
Posterior from Example 1
From the example Gaussian Prior, Gaussian Likelihood applied to this model, the posterior is Gaussian with mean $\dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, y$ and variance $\dfrac{\sigma_X^2 \sigma_N^2}{\sigma_X^2 + \sigma_N^2}$.
MMSE estimator
The MMSE estimator is the posterior mean, $\hat{x}_{\mathrm{MMSE}}(y) = \alpha\, y$, with $\alpha = \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2} \in (0, 1)$. This is a shrinkage estimator: the observation is pulled toward the prior mean (zero) by the factor $\alpha$.
MAP estimator
The Gaussian posterior is symmetric and unimodal, so the MAP equals the posterior mean: $\hat{x}_{\mathrm{MAP}}(y) = \alpha\, y$ as well. For Gaussian priors and Gaussian likelihoods, MAP = MMSE.
Achieved MMSE
The MMSE is the (constant) posterior variance, $\mathrm{MMSE} = \dfrac{\sigma_X^2 \sigma_N^2}{\sigma_X^2 + \sigma_N^2}$. As $\sigma_X^2 \to \infty$, $\alpha \to 1$ and the MMSE tends to $\sigma_N^2$ (noise-limited). As $\sigma_N^2 \to \infty$, $\alpha \to 0$ and the MMSE tends to $\sigma_X^2$ (prior-limited).
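The closed-form MMSE above is easy to confirm by simulation; the sketch below (a minimal check with illustrative variances, not part of the worked example) compares the empirical error of the shrinkage estimator against $\sigma_X^2 \sigma_N^2 / (\sigma_X^2 + \sigma_N^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_x2, sigma_n2 = 4.0, 1.0     # illustrative prior and noise variances

n = 1_000_000
x = rng.normal(0.0, np.sqrt(sigma_x2), size=n)
y = x + rng.normal(0.0, np.sqrt(sigma_n2), size=n)

alpha = sigma_x2 / (sigma_x2 + sigma_n2)   # shrinkage factor
x_hat = alpha * y                          # posterior mean = MMSE estimate

print(f"empirical MSE : {np.mean((x - x_hat) ** 2):.4f}")
print(f"formula       : {sigma_x2 * sigma_n2 / (sigma_x2 + sigma_n2):.4f}")
```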
Example: Binary Signal in Gaussian Noise: MMSE is a Sigmoid
Let $X \in \{-1, +1\}$ equiprobable and $Y = X + N$ with $N \sim \mathcal{N}(0, \sigma^2)$ independent of $X$. Compute the MMSE estimator $\hat{x}_{\mathrm{MMSE}}(y) = \mathbb{E}[X \mid Y = y]$.
Posterior probabilities
By Bayes' rule,
$$\Pr(X = +1 \mid Y = y) = \frac{e^{-(y-1)^2/2\sigma^2}}{e^{-(y-1)^2/2\sigma^2} + e^{-(y+1)^2/2\sigma^2}}.$$
Simplify via a hyperbolic identity
Dividing numerator and denominator by the numerator and expanding the difference of squares $(y+1)^2 - (y-1)^2 = 4y$ gives
$$\Pr(X = +1 \mid Y = y) = \frac{1}{1 + e^{-2y/\sigma^2}}, \qquad \Pr(X = +1 \mid Y = y) - \Pr(X = -1 \mid Y = y) = \frac{1 - e^{-2y/\sigma^2}}{1 + e^{-2y/\sigma^2}} = \tanh\!\left(\frac{y}{\sigma^2}\right).$$
Conclude
Since $X$ takes only the values $\pm 1$, the conditional mean equals that difference:
$$\hat{x}_{\mathrm{MMSE}}(y) = \mathbb{E}[X \mid Y = y] = \Pr(X = +1 \mid Y = y) - \Pr(X = -1 \mid Y = y) = \tanh\!\left(\frac{y}{\sigma^2}\right).$$
The MMSE estimator is nonlinear in $y$ and automatically saturates in $[-1, +1]$, unlike any linear estimator.
MMSE vs. Linear Estimator for Binary $X$
Figure: the true MMSE estimator $\tanh(y/\sigma^2)$ (a sigmoid) compared with the best linear estimator for $X \in \{-1, +1\}$ in Gaussian noise. The linear curve extrapolates outside $[-1, +1]$; the true MMSE saturates.
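In place of the interactive figure, the following sketch (my own comparison; the linear benchmark $y/(1+\sigma^2)$ is the standard LMMSE estimator for this model, which the text does not derive here, and the noise level is illustrative) contrasts the mean-square errors of the sigmoid MMSE estimator and the best linear estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.8                        # illustrative noise standard deviation

n = 1_000_000
x = rng.choice([-1.0, 1.0], size=n)          # equiprobable binary symbol
y = x + rng.normal(0.0, sigma, size=n)

x_mmse = np.tanh(y / sigma**2)               # true MMSE estimator: sigmoid, saturates in [-1, 1]
x_lin = y / (1.0 + sigma**2)                 # best linear (LMMSE) estimator for this model

print(f"MSE of tanh estimator   : {np.mean((x - x_mmse) ** 2):.4f}")
print(f"MSE of linear estimator : {np.mean((x - x_lin) ** 2):.4f}")   # strictly larger
```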
MAP vs. MMSE Estimators
| Aspect | MAP | MMSE |
|---|---|---|
| Cost function | Zero–one (peak of posterior) | Squared error |
| Formula | $\hat{\mathbf{x}}_{\mathrm{MAP}}(\mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y})$ | $\hat{\mathbf{x}}_{\mathrm{MMSE}}(\mathbf{y}) = \mathbb{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]$ |
| Shape of output | A mode of the posterior | The mean of the posterior |
| Flat-prior limit | Reduces to MLE | Conditional mean under flat prior |
| Computation | Optimization problem | Integral |
| Unimodal symmetric posterior | Coincides with mean | Coincides with mode |
| Multi-modal posterior | Can jump discontinuously in $\mathbf{y}$ | Smooth in $\mathbf{y}$ (averaging) |
| Discrete $X$ | Natural (selects a single alphabet element) | Unnatural (returns a mean generally outside the alphabet) |
MAP as Regularized MLE
Taking the logarithm of the MAP objective,
$$\hat{\mathbf{x}}_{\mathrm{MAP}}(\mathbf{y}) = \arg\max_{\mathbf{x}} \big[\log p(\mathbf{y} \mid \mathbf{x}) + \log p(\mathbf{x})\big].$$
The first term is the log-likelihood from Chapter 6; the second is a regularizer that penalizes parameter values the prior considers unlikely. This is exactly the structure of ridge regression (Gaussian prior, quadratic penalty) and of the LASSO (Laplace prior, $\ell_1$ penalty), and it is why MAP is sometimes called "regularized ML".
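As a concrete instance of this regularized-ML reading, the sketch below (an assumed linear-Gaussian model of my own, not taken from the text) shows that the MAP estimate under a zero-mean Gaussian prior is exactly the ridge-regression solution $(\mathbf{A}^{\mathsf T}\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^{\mathsf T}\mathbf{y}$ with $\lambda = \sigma_N^2/\sigma_X^2$, while the flat-prior limit gives ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed linear-Gaussian model: y = A x + n, x ~ N(0, sigma_x2 I), n ~ N(0, sigma_n2 I).
m, d = 50, 5
sigma_x2, sigma_n2 = 1.0, 0.25
A = rng.normal(size=(m, d))
x_true = rng.normal(0.0, np.sqrt(sigma_x2), size=d)
y = A @ x_true + rng.normal(0.0, np.sqrt(sigma_n2), size=m)

# MAP = argmin ||y - A x||^2 / sigma_n2 + ||x||^2 / sigma_x2,
# i.e. ridge regression with lambda = sigma_n2 / sigma_x2.
lam = sigma_n2 / sigma_x2
x_map = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

x_mle = np.linalg.lstsq(A, y, rcond=None)[0]   # flat-prior limit: ordinary least squares

print("MAP (ridge):", np.round(x_map, 3))
print("MLE (OLS)  :", np.round(x_mle, 3))
```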
Quick Check
The posterior $p(x \mid y)$ is the uniform density on the interval $[a, b]$ (for some fixed $a < b$). Which estimator returns the value $(a+b)/2$?
MAP only
MMSE only
Both MAP and MMSE
Neither
The MMSE estimator is the posterior mean, which for a uniform density on $[a, b]$ is exactly the midpoint $(a+b)/2$. The MAP is any point in $[a, b]$ since the posterior is flat, so it is not uniquely defined.
Common Mistake: Don't Use MMSE on a Discrete Parameter
Mistake:
A student applies the MMSE formula to estimate a discrete symbol and reports an answer that lies between the valid symbols as the output of a hard-decision receiver.
Correction:
The MMSE estimator minimizes squared error, not symbol error. For a discrete parameter taking finitely many values, the MMSE estimate generally lies outside the parameter set, which is meaningless for a decoder that must output a symbol. For discrete $X$, use the MAP estimator (maximum posterior probability) or, equivalently under uniform priors, the MLE. The MMSE value of a discrete parameter is still useful as a soft information statistic (e.g., for belief propagation), but it is not itself a decoded symbol.
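To make the correction concrete, a brief sketch (my own, reusing the binary example above with an illustrative noise level and received samples): the MMSE value $\tanh(y/\sigma^2)$ is a useful soft statistic, but the decoded symbol should come from the MAP rule, which for equiprobable $\pm 1$ reduces to a sign decision.

```python
import numpy as np

sigma = 0.8
y = np.array([-1.7, -0.2, 0.05, 0.9, 2.3])    # illustrative received samples

soft = np.tanh(y / sigma**2)    # MMSE values: informative soft statistics, not valid symbols
hard = np.where(y >= 0, 1, -1)  # MAP hard decision for equiprobable +/-1: the sign of y

print("soft MMSE statistics :", np.round(soft, 3))
print("hard MAP decisions   :", hard)
```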