The Estimation Problem
Why Estimation Theory?
Before we can equalize a channel, synchronize a receiver, localize a user, or form a sensing beam, we have to estimate something from noisy data. Every receiver in every wireless system is an estimator in disguise. The question this chapter answers is: given a set of observations $X$ drawn from a family of distributions $\{p_\theta : \theta \in \Theta\}$, what is the best we can hope to learn about $\theta$, and which estimator realizes that limit?
Two separate worlds live inside this question. In the Bayesian world the parameter is a random variable with a prior $\pi(\theta)$, and the natural performance measure is the Bayes risk. In the frequentist world the parameter is a fixed but unknown deterministic quantity, and we quantify performance through the sampling distribution of the estimator. This chapter develops the frequentist framework: bias, variance, MSE, sufficiency, the Cramér--Rao bound, and the Rao--Blackwell procedure. The Bayesian picture is the subject of Chapter 7.
Definition: The Estimation Problem
Let $X$ be an observation taking values in a sample space $\mathcal{X}$, distributed according to one member of a parametric family $\{p_\theta : \theta \in \Theta\}$, with parameter domain $\Theta$. An estimator is any measurable function $\hat{\theta} \colon \mathcal{X} \to \Theta$. In the frequentist (non-Bayesian) setting $\theta$ is a fixed unknown; we write $\mathbb{E}_\theta$ for expectation when $X \sim p_\theta$ and design $\hat{\theta}$ so that the random variable $\hat{\theta}(X)$ is concentrated near $\theta$ for every $\theta \in \Theta$.
The parameter $\theta$ does not have a distribution. Probabilities and expectations are taken over the observation noise only, with $\theta$ held fixed. This is what distinguishes frequentist from Bayesian inference.
Definition: Bias, Variance, Mean-Squared Error
For a scalar parameter $\theta$ and estimator $\hat{\theta}$, define
$$\mathrm{Bias}_\theta(\hat{\theta}) = \mathbb{E}_\theta[\hat{\theta}] - \theta, \qquad \mathrm{Var}_\theta(\hat{\theta}) = \mathbb{E}_\theta\big[(\hat{\theta} - \mathbb{E}_\theta[\hat{\theta}])^2\big], \qquad \mathrm{MSE}_\theta(\hat{\theta}) = \mathbb{E}_\theta\big[(\hat{\theta} - \theta)^2\big].$$
The estimator is unbiased if $\mathrm{Bias}_\theta(\hat{\theta}) = 0$ for every $\theta \in \Theta$. For vector parameters, replace $\mathrm{Var}_\theta$ with the covariance matrix $\mathrm{Cov}_\theta(\hat{\boldsymbol{\theta}})$ and the scalar MSE by the MSE matrix $\mathbb{E}_\theta\big[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^\top\big]$.
Theorem: MSE Decomposition (Bias--Variance Identity)
For any estimator $\hat{\theta}$ of a scalar parameter $\theta$,
$$\mathrm{MSE}_\theta(\hat{\theta}) = \mathrm{Var}_\theta(\hat{\theta}) + \mathrm{Bias}_\theta(\hat{\theta})^2.$$
For a vector estimator $\hat{\boldsymbol{\theta}}$,
$$\mathbb{E}_\theta\big[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^\top\big] = \mathrm{Cov}_\theta(\hat{\boldsymbol{\theta}}) + \mathbf{b}(\theta)\mathbf{b}(\theta)^\top, \qquad \mathbf{b}(\theta) = \mathbb{E}_\theta[\hat{\boldsymbol{\theta}}] - \boldsymbol{\theta}.$$
The estimation error at a fixed $\theta$ has two sources. One is the systematic offset of the estimator's mean from the true value (bias). The other is the stochastic fluctuation of the estimator around its own mean (variance). The two add in quadrature. This is why an unbiased estimator need not be the best: a little bias may be worth a lot less variance. We return to this tension throughout the book.
Add and subtract the mean
Let $m(\theta) = \mathbb{E}_\theta[\hat{\theta}]$ and $b(\theta) = m(\theta) - \theta$. Write
$$\hat{\theta} - \theta = \big(\hat{\theta} - m(\theta)\big) + b(\theta).$$
Square and take expectation
Squaring and taking $\mathbb{E}_\theta$, the cross term $2\,b(\theta)\,\mathbb{E}_\theta\big[\hat{\theta} - m(\theta)\big]$ vanishes because $\mathbb{E}_\theta\big[\hat{\theta} - m(\theta)\big] = 0$. What remains is $\mathrm{Var}_\theta(\hat{\theta}) + b(\theta)^2 = \mathrm{Var}_\theta(\hat{\theta}) + \mathrm{Bias}_\theta(\hat{\theta})^2$.
Key Takeaway
Two knobs, one budget. $\mathrm{MSE}_\theta(\hat{\theta}) = \mathrm{Var}_\theta(\hat{\theta}) + \mathrm{Bias}_\theta(\hat{\theta})^2$. Tuning an estimator is a trade between these two terms, not a pursuit of either in isolation. Shrinkage, regularization, and ridge-type estimators all exploit this identity.
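To see the identity numerically, here is a minimal Monte Carlo sketch. The Gaussian model, the shrinkage factor $a = 0.9$, and the sample sizes are illustrative choices of ours, not fixed by the text; the point is only that the empirically estimated bias, variance, and MSE of the shrunk sample mean $a\,\bar{X}_n$ satisfy $\mathrm{MSE} \approx \mathrm{Var} + \mathrm{Bias}^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, a = 2.0, 1.0, 10, 0.9   # true mean/std, sample size, shrinkage factor (illustrative)
trials = 200_000

# Each row is one repetition of the experiment; the estimator is a * (sample mean).
x = rng.normal(mu, sigma, size=(trials, n))
est = a * x.mean(axis=1)

bias = est.mean() - mu                 # estimates E[est] - mu
var = est.var()                        # estimates Var(est)
mse = np.mean((est - mu) ** 2)         # estimates E[(est - mu)^2]

print(f"bias^2 + var = {bias**2 + var:.5f}")
print(f"MSE          = {mse:.5f}")
print(f"theory: bias = {(a - 1) * mu:.3f}, var = {a**2 * sigma**2 / n:.4f}")
```

The two printed quantities agree up to Monte Carlo error, and both match the closed forms $\mathrm{Bias} = (a-1)\mu$ and $\mathrm{Var} = a^2\sigma^2/n$.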
Example: Sample Mean and Sample Variance of a Gaussian
Let $X_1, \dots, X_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with both parameters unknown. Consider the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and the (biased) sample variance $S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. Compute the bias and variance of each.
Sample mean: unbiased, variance $\sigma^2/n$
By linearity $\mathbb{E}[\bar{X}_n] = \mu$, so $\bar{X}_n$ is unbiased for $\mu$. Independence of the summands gives $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$. Hence $\mathrm{MSE}(\bar{X}_n) = \sigma^2/n$.
Sample variance: biased by $-\sigma^2/n$
A standard computation (expand $(X_i - \bar{X}_n)^2$, sum over $i$, and take expectation) yields $\mathbb{E}[S_n^2] = \frac{n-1}{n}\sigma^2$, so the bias is $-\sigma^2/n$. The unbiased version is $S_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. Its variance is $\mathrm{Var}(S_{n-1}^2) = \frac{2\sigma^4}{n-1}$.
Why $S_n^2$ sometimes beats $S_{n-1}^2$ on MSE
Rescaling by $\frac{n-1}{n}$ introduces a small bias but shrinks variance by a factor $\big(\frac{n-1}{n}\big)^2$. The MSE of $S_n^2$ equals $\frac{2n-1}{n^2}\sigma^4$, which is strictly smaller than $\mathrm{MSE}(S_{n-1}^2) = \frac{2}{n-1}\sigma^4$ for every finite $n$. This is a first concrete instance of the shrinkage principle.
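The inequality behind this claim, $\frac{2n-1}{n^2} < \frac{2}{n-1}$, is easy to tabulate directly from the closed forms above. A minimal check (Gaussian case, with $\sigma = 1$ so both MSEs are in units of $\sigma^4$):

```python
# Closed-form MSEs for i.i.d. Gaussian data with sigma = 1:
#   MSE(S_n^2)     = (2n - 1) / n^2   (biased, divide-by-n estimator)
#   MSE(S_{n-1}^2) = 2 / (n - 1)      (unbiased, divide-by-(n-1) estimator)
for n in [2, 5, 10, 50, 100]:
    mse_biased = (2 * n - 1) / n**2
    mse_unbiased = 2 / (n - 1)
    print(f"n={n:4d}  MSE(S_n^2)={mse_biased:.4f}  MSE(S_(n-1)^2)={mse_unbiased:.4f}")
    assert mse_biased < mse_unbiased  # the biased estimator wins at every finite n
```

The gap is largest for small $n$ (at $n = 2$: $0.75$ versus $2$) and vanishes as $n \to \infty$, consistent with the two estimators merging for large samples.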
Historical Note: Fisher and the Birth of Estimation Theory (1922--1925)
The modern framework of parametric estimation --- likelihood, score, information, sufficiency, efficiency --- was laid down in two remarkable papers by R. A. Fisher: On the Mathematical Foundations of Theoretical Statistics (1922) and Theory of Statistical Estimation (1925). Fisher invented the words likelihood, sufficiency, consistency, and efficiency, and proposed maximum likelihood as the canonical method. His notion of "information in a sample" is exactly what we now call Fisher information. The word parameter, in its statistical sense, is also Fisher's. An unusually high density of permanent ideas per paper.
Bias--Variance Trade-off: James--Stein-type Shrinkage
We estimate the mean $\boldsymbol{\mu} \in \mathbb{R}^d$ of a Gaussian observation $X \sim \mathcal{N}(\boldsymbol{\mu}, \sigma^2 I_d)$ and compare the unbiased estimator $\hat{\boldsymbol{\mu}} = X$ with the shrinkage estimator $\hat{\boldsymbol{\mu}}_\alpha = (1 - \alpha)X$ for $\alpha \in [0, 1]$. Move the shrinkage slider to see MSE, bias, and variance trade against each other.
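For readers without the interactive figure, the sketch below reproduces the same curves in closed form: for $\hat{\boldsymbol{\mu}}_\alpha = (1-\alpha)X$ the total squared bias is $\alpha^2\|\boldsymbol{\mu}\|^2$ and the total variance is $(1-\alpha)^2 d\sigma^2$. The dimension, noise variance, and $\|\boldsymbol{\mu}\|^2$ are illustrative values we have assumed.

```python
import numpy as np

d, sigma2, mu_norm2 = 10, 1.0, 5.0   # dimension, noise variance, ||mu||^2 (illustrative)

# MSE of (1 - alpha) X for X ~ N(mu, sigma2 * I_d):
#   total bias^2 = alpha^2 * ||mu||^2,  total variance = (1 - alpha)^2 * d * sigma2
for alpha in np.linspace(0.0, 1.0, 6):
    bias2 = alpha**2 * mu_norm2
    var = (1 - alpha) ** 2 * d * sigma2
    print(f"alpha={alpha:.1f}  bias^2={bias2:6.2f}  var={var:6.2f}  MSE={bias2 + var:6.2f}")

# The quadratic MSE is minimized at alpha* = d*sigma2 / (||mu||^2 + d*sigma2), not at alpha = 0:
alpha_star = d * sigma2 / (mu_norm2 + d * sigma2)
print(f"optimal shrinkage alpha* = {alpha_star:.3f}")
```

With these values the optimum is $\alpha^* = 2/3$: the unbiased choice $\alpha = 0$ is strictly dominated, exactly the slider's message.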
Unbiased Estimator
An estimator $\hat{\theta}$ is unbiased for $\theta$ if $\mathbb{E}_\theta[\hat{\theta}] = \theta$ for every $\theta \in \Theta$.
Related: Bias, Variance, Mean-Squared Error, A Procedure for Building the MVUE
Minimum-Variance Unbiased Estimator (MVUE)
An unbiased estimator whose variance is no larger than that of any other unbiased estimator, simultaneously for every $\theta \in \Theta$. The MVUE is unique (almost surely) when it exists, and Lehmann--Scheffé gives a constructive route to it through complete sufficient statistics.
Related: Unbiased Estimator, Sufficient Statistic, Rao--Blackwell Theorem
Common Mistake: Unbiased Does Not Mean Best
Mistake:
It is tempting to restrict attention to unbiased estimators and then pick the one with smallest variance, treating the resulting MVUE as "the" optimal estimator.
Correction:
Unbiasedness is a constraint we impose for analytical tractability and interpretability --- not a criterion that the physical problem demands. Biased estimators routinely beat the MVUE in MSE (ridge regression, shrinkage, maximum-a-posteriori estimators, empirical Bayes). Use the MVUE as a benchmark, not a mandate.
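A concrete instance of this correction: under the Gaussian linear model, ordinary least squares is the MVUE for the coefficients, yet ridge regression (biased) can have smaller MSE. The simulation below is a hedged sketch, not a general claim; the design matrix, coefficient scale, and penalty $\lambda$ are all our own assumptions, chosen to illustrate a small-coefficient regime where shrinkage tends to win.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam, sigma = 30, 10, 2.0, 1.0        # illustrative sizes and ridge penalty
beta = rng.normal(0, 0.3, size=p)          # small true coefficients favor shrinkage
X = rng.normal(size=(n, p))                # design held fixed across replications

err_ols, err_ridge = [], []
for _ in range(5_000):
    y = X @ beta + sigma * rng.normal(size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)                    # unbiased (MVUE here)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # biased, lower variance
    err_ols.append(np.sum((b_ols - beta) ** 2))
    err_ridge.append(np.sum((b_ridge - beta) ** 2))

print(f"MSE(OLS)   = {np.mean(err_ols):.4f}   (unbiased)")
print(f"MSE(ridge) = {np.mean(err_ridge):.4f}   (biased, typically smaller in this regime)")
```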
Why This Matters: Every Receiver is an Estimator
Channel estimation, timing acquisition, carrier-frequency offset correction, angle-of-arrival estimation --- each problem presents a receiver with noisy observations and asks for an unknown parameter (a complex gain, a delay, a frequency, an angle). Matched filters, correlators, and FFT bins are not ad-hoc circuits: they are the maximum-likelihood or MVU estimators for specific parametric models. The bias, variance, and Cramér--Rao bounds we compute in this chapter are the hardware specifications of these blocks.
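To make the correspondence concrete, consider the standard pilot model $y = h\,s + w$ with a known pilot sequence $s$ and white Gaussian noise: the correlator $\hat{h} = s^{\mathsf{H}} y / \|s\|^2$ is the least-squares (and, under Gaussian noise, maximum-likelihood) estimator of the complex gain $h$, unbiased with variance $\sigma^2/\|s\|^2$. A minimal sketch, where the pilot length, gain, and noise level are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
L, sigma = 64, 0.5                          # pilot length, per-sample noise std (illustrative)
s = np.exp(2j * np.pi * rng.random(L))      # known unit-modulus pilot, so ||s||^2 = L
h = 0.8 * np.exp(1j * 0.3)                  # true complex channel gain

trials = 100_000
w = (rng.normal(size=(trials, L)) + 1j * rng.normal(size=(trials, L))) * sigma / np.sqrt(2)
y = h * s + w                               # one noisy pilot block per trial (broadcast)

h_hat = (y @ s.conj()) / np.vdot(s, s).real  # correlator / matched filter: s^H y / ||s||^2

print(f"empirical |bias|       = {abs(h_hat.mean() - h):.5f}")
print(f"empirical variance     = {h_hat.var():.6f}")
print(f"theory sigma^2/||s||^2 = {sigma**2 / L:.6f}")
```

The empirical bias is at Monte Carlo noise level and the variance matches $\sigma^2/\|s\|^2$: the "correlate against the pilot" circuit is exactly this estimator.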
Quick Check
An estimator $\hat{\theta}_1$ is unbiased with variance $\sigma^2$. Another estimator $\hat{\theta}_2$ has bias $\sigma/2$ and variance $\sigma^2/2$. Which has smaller MSE, and why?
$\hat{\theta}_2$ (lower variance)
$\hat{\theta}_2$, because $\mathrm{MSE}(\hat{\theta}_2) = \sigma^2/2 + (\sigma/2)^2 = \tfrac{3}{4}\sigma^2 < \sigma^2 = \mathrm{MSE}(\hat{\theta}_1)$
They have the same MSE
Cannot decide without knowing $\theta$
$\mathrm{MSE}(\hat{\theta}_1) = \sigma^2$, while $\mathrm{MSE}(\hat{\theta}_2) = \sigma^2/2 + \sigma^2/4 = \tfrac{3}{4}\sigma^2$ by the bias--variance identity; lower variance alone is not the reason. A small bias can buy a large variance reduction.