Estimation Theory Fundamentals
From Detection to Estimation
Detection chooses among a discrete set of hypotheses; estimation determines a continuous parameter from noisy observations. In communications, the parameter to estimate is often the channel itself: its gain, phase, delay, or frequency response. This section develops the mathematical framework for optimal estimation, starting with the fundamental limit (Cramer-Rao bound) and the two main paradigms: frequentist (ML) and Bayesian (MMSE/LMMSE).
Definition: Estimator, Bias, and Efficiency
An estimator $\hat{\theta}(\mathbf{y})$ is a function of the observed data $\mathbf{y}$ that produces an estimate of an unknown parameter $\theta$.
- Bias: $b(\theta) = E[\hat{\theta}] - \theta$. An estimator is unbiased if $E[\hat{\theta}] = \theta$ for all $\theta$.
- Mean-square error: $\text{MSE}(\theta) = E[(\hat{\theta} - \theta)^2] = \text{var}(\hat{\theta}) + b(\theta)^2$.
- Consistency: $\hat{\theta} \to \theta$ (in probability) as the number of observations $N \to \infty$.
- Efficiency: An unbiased estimator is efficient if it achieves the Cramer-Rao lower bound (CRLB) with equality for all $\theta$.
The MSE decomposition is fundamental: sometimes a biased estimator with lower variance can achieve lower MSE than the best unbiased estimator.
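To see this numerically, here is a minimal Python sketch (an added illustration, not from the original text): an oracle shrinkage estimator of a scalar in strong noise is biased, yet beats the unbiased estimate in MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0        # true parameter
sigma = 2.0        # noise std: noisy regime, where shrinkage pays off
trials = 100_000

y = theta + sigma * rng.standard_normal(trials)  # unbiased estimate: y itself
alpha = theta**2 / (theta**2 + sigma**2)         # oracle shrinkage factor (assumes theta known)
shrunk = alpha * y                               # biased, but much lower variance

for name, est in [("unbiased", y), ("shrunk", shrunk)]:
    bias, var = est.mean() - theta, est.var()
    print(f"{name:9s} bias={bias:+.3f}  var={var:.3f}  MSE={var + bias**2:.3f}")
# unbiased: MSE ~ sigma^2 = 4.0; shrunk: MSE ~ theta^2 sigma^2/(theta^2+sigma^2) = 0.8
```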
Theorem: Cramer-Rao Lower Bound (CRLB)
For any unbiased estimator $\hat{\theta}$ of a scalar parameter $\theta$, the variance is lower bounded by
$$\text{var}(\hat{\theta}) \geq \frac{1}{I(\theta)},$$
where $I(\theta)$ is the Fisher information:
$$I(\theta) = E\!\left[\left(\frac{\partial \ln p(\mathbf{y};\theta)}{\partial \theta}\right)^{\!2}\right].$$
For a vector parameter $\boldsymbol{\theta}$, the CRLB generalises to the matrix inequality:
$$\text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{J}^{-1}(\boldsymbol{\theta}),$$
where $[\mathbf{J}(\boldsymbol{\theta})]_{ij} = E\!\left[\frac{\partial \ln p(\mathbf{y};\boldsymbol{\theta})}{\partial \theta_i}\,\frac{\partial \ln p(\mathbf{y};\boldsymbol{\theta})}{\partial \theta_j}\right]$ is the Fisher information matrix (FIM).
The Fisher information measures how "peaky" the likelihood function is around $\theta$: high curvature means the data are informative about $\theta$, so the estimation variance can be small. Low curvature means the data weakly constrain $\theta$, and the variance must be large.
Score function
Define the score $s(\mathbf{y};\theta) = \frac{\partial \ln p(\mathbf{y};\theta)}{\partial \theta}$. Under regularity conditions, $E[s(\mathbf{y};\theta)] = 0$ and $\text{var}(s(\mathbf{y};\theta)) = I(\theta)$.
Covariance inequality
For an unbiased estimator, $E[\hat{\theta}] = \theta$, so differentiating under the integral sign gives $E[\hat{\theta}\, s] = 1$, giving $\text{cov}(\hat{\theta}, s) = 1$ (since $E[s] = 0$).
By the Cauchy-Schwarz inequality: $1 = \text{cov}(\hat{\theta}, s)^2 \leq \text{var}(\hat{\theta})\,\text{var}(s)$.
Since $\text{var}(s) = I(\theta)$: $\text{var}(\hat{\theta}) \geq 1/I(\theta)$. $\blacksquare$
Definition: Fisher Information
The Fisher information about a parameter $\theta$ contained in an observation $\mathbf{y}$ is
$$I(\theta) = E\!\left[\left(\frac{\partial \ln p(\mathbf{y};\theta)}{\partial \theta}\right)^{\!2}\right] = -E\!\left[\frac{\partial^2 \ln p(\mathbf{y};\theta)}{\partial \theta^2}\right].$$
Key properties:
- Additivity: for $N$ i.i.d. observations, $I_N(\theta) = N\, I_1(\theta)$
- For $y = \theta + w$ with $w \sim \mathcal{N}(0, \sigma^2)$: $I(\theta) = 1/\sigma^2$
- The CRLB for $N$ observations becomes $\text{var}(\hat{\theta}) \geq \frac{1}{N\, I_1(\theta)}$
The Fisher information determines the fundamental precision achievable for a given measurement model and noise level.
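These properties are straightforward to verify numerically. A small added sketch (illustrative, assuming the Gaussian model above): estimate the Fisher information of $N$ i.i.d. observations as the empirical variance of the score and compare it with $N/\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, N, trials = 0.5, 1.5, 8, 200_000

# y_n = theta + w_n  =>  score s = d/dtheta ln p(y; theta) = sum_n (y_n - theta) / sigma^2
y = theta + sigma * rng.standard_normal((trials, N))
score = (y - theta).sum(axis=1) / sigma**2

print("E[score]    =", score.mean())   # ~ 0 (regularity condition)
print("var(score)  =", score.var())    # ~ N / sigma^2 (additivity)
print("N / sigma^2 =", N / sigma**2)
```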
Definition: Maximum Likelihood (ML) Estimator
The ML estimator maximises the likelihood of the observed data:
$$\hat{\theta}_{\text{ML}} = \arg\max_{\theta}\, p(\mathbf{y};\theta) = \arg\max_{\theta}\, \ln p(\mathbf{y};\theta).$$
Properties of the ML estimator:
- Consistent: $\hat{\theta}_{\text{ML}} \to \theta$ as $N \to \infty$
- Asymptotically efficient: achieves the CRLB as $N \to \infty$
- Asymptotically Gaussian: $\hat{\theta}_{\text{ML}} \approx \mathcal{N}(\theta, I^{-1}(\theta))$ for large $N$
- Invariant: if $\hat{\theta}$ is the ML estimate of $\theta$, then $g(\hat{\theta})$ is the ML estimate of $g(\theta)$
The ML estimator does not require prior knowledge of $\theta$ (frequentist viewpoint) and is often computationally tractable via gradient methods.
For finite $N$, the ML estimator may be biased (e.g., the ML estimate of variance uses $1/N$ instead of $1/(N-1)$), but the bias vanishes as $N \to \infty$.
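The variance example is easy to reproduce; this added sketch (Gaussian data assumed) shows the $1/N$ estimate is biased low by the factor $(N-1)/N$, which tends to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, trials = 4.0, 100_000

for N in (5, 50, 500):
    y = np.sqrt(sigma2) * rng.standard_normal((trials, N))
    # ML variance estimate with 1/N normalisation (not 1/(N-1))
    ml = ((y - y.mean(axis=1, keepdims=True))**2).mean(axis=1)
    # E[ml] = (N-1)/N * sigma2: biased low, but the bias vanishes as N grows
    print(f"N={N:4d}  E[sigma2_ML]={ml.mean():.3f}  (true value {sigma2})")
```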
Definition: MMSE Estimator (Bayesian)
The minimum mean-square error (MMSE) estimator minimises the Bayesian MSE $E[(\hat{\theta} - \theta)^2]$, where the expectation is over both $\theta$ and $\mathbf{y}$:
$$\hat{\theta}_{\text{MMSE}} = E[\theta \mid \mathbf{y}].$$
The MMSE estimator is the conditional mean of $\theta$ given the observations. It requires a prior distribution $p(\theta)$.
The MMSE is $E[\text{var}(\theta \mid \mathbf{y})]$, the expected posterior variance.
When $\theta$ and $\mathbf{y}$ are jointly Gaussian, the conditional mean is a linear function of $\mathbf{y}$, and the MMSE estimator coincides with the LMMSE estimator.
The MMSE estimator is optimal in the MSE sense among all estimators (linear and nonlinear). The cost is that it requires knowledge of the prior $p(\theta)$ and computation of the posterior $p(\theta \mid \mathbf{y})$, which may be intractable for complex models.
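To make the conditional-mean recipe concrete, here is a small grid-based sketch (an added illustration; the two-component prior is made up): for $y = \theta + w$ it computes $E[\theta \mid y]$ numerically and compares it with the best linear estimate, showing how the MMSE estimator bends where the prior is non-Gaussian.

```python
import numpy as np

sigma_w = 0.5
grid = np.linspace(-4, 4, 4001)                       # theta grid
# Zero-mean Gaussian-mixture prior (non-Gaussian): components at +/-1
prior = np.exp(-(grid - 1)**2 / 0.02) + np.exp(-(grid + 1)**2 / 0.02)
prior /= prior.sum()
var_theta = (grid**2 * prior).sum()                   # prior variance (mean is 0)

for y in (0.3, 1.0, 2.0):
    like = np.exp(-(y - grid)**2 / (2 * sigma_w**2))  # p(y | theta)
    post = prior * like
    post /= post.sum()
    mmse = (grid * post).sum()                        # E[theta | y]: posterior mean
    lmmse = var_theta / (var_theta + sigma_w**2) * y  # best *linear* estimate
    print(f"y={y:.1f}  MMSE={mmse:+.3f}  LMMSE={lmmse:+.3f}")
```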
Theorem: LMMSE Estimator for Jointly Gaussian Case
For the linear observation model
$$\mathbf{y} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w},$$
where $\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_\theta)$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_w)$ are independent, the LMMSE estimator is
$$\hat{\boldsymbol{\theta}}_{\text{LMMSE}} = \mathbf{C}_\theta \mathbf{H}^H \left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^H + \mathbf{C}_w\right)^{-1}\mathbf{y}.$$
The MSE matrix is
$$\mathbf{E} = \mathbf{C}_\theta - \mathbf{C}_\theta\mathbf{H}^H\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^H + \mathbf{C}_w\right)^{-1}\mathbf{H}\mathbf{C}_\theta.$$
For the jointly Gaussian case, this equals the MMSE estimator.
Scalar case: $y = \theta + w$ with $\theta \sim \mathcal{N}(0, \sigma_\theta^2)$ and $w \sim \mathcal{N}(0, \sigma_w^2)$:
$$\hat{\theta}_{\text{LMMSE}} = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_w^2}\, y.$$
The LMMSE estimator is a regularised version of the LS estimator. At high SNR ($\sigma_\theta^2 \gg \sigma_w^2$), it reduces to LS. At low SNR ($\sigma_\theta^2 \ll \sigma_w^2$), it shrinks toward the prior mean (zero here), relying more on prior knowledge than on the noisy data.
Orthogonality principle
The LMMSE estimator satisfies the orthogonality condition:
$$E\big[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})\,\mathbf{y}^H\big] = \mathbf{0}.$$
This means the estimation error is orthogonal to the observation.
Derivation
Writing $\hat{\boldsymbol{\theta}} = \mathbf{A}\mathbf{y}$ and applying the orthogonality principle $E[(\boldsymbol{\theta} - \mathbf{A}\mathbf{y})\mathbf{y}^H] = \mathbf{0}$:
$$\mathbf{A} = \mathbf{C}_{\theta y}\,\mathbf{C}_y^{-1} = \mathbf{C}_\theta\mathbf{H}^H\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^H + \mathbf{C}_w\right)^{-1},$$
using $\mathbf{C}_{\theta y} = \mathbf{C}_\theta\mathbf{H}^H$ and $\mathbf{C}_y = \mathbf{H}\mathbf{C}_\theta\mathbf{H}^H + \mathbf{C}_w$.
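A direct numerical check of these formulas (an added sketch; the dimensions and covariances are arbitrary): build a random linear model, form the LMMSE matrix, and verify both the orthogonality condition and the closed-form MSE matrix by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, trials = 3, 5, 200_000

H = rng.standard_normal((M, K))
C_theta = np.eye(K)                        # prior covariance
C_w = 0.1 * np.eye(M)                      # noise covariance

C_y = H @ C_theta @ H.T + C_w
A = C_theta @ H.T @ np.linalg.inv(C_y)     # LMMSE matrix A = C_theta H^T C_y^{-1}

theta = rng.standard_normal((trials, K))   # theta ~ N(0, I)
w = np.sqrt(0.1) * rng.standard_normal((trials, M))
y = theta @ H.T + w
err = theta - y @ A.T                      # estimation error theta - A y

# Orthogonality: E[(theta - A y) y^T] ~ 0
print("max |E[err y^T]| =", np.abs(err.T @ y / trials).max())
# Empirical MSE matrix vs closed form C_theta - A H C_theta
E_emp = err.T @ err / trials
E_theory = C_theta - A @ H @ C_theta
print("MSE matrices match:", np.allclose(E_emp, E_theory, atol=2e-2))
```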
Example: ML Estimation of Signal Amplitude in AWGN
A constant signal $A$ is observed $N$ times in AWGN:
$$y_n = A + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma^2), \qquad n = 1, \dots, N.$$
(a) Find the ML estimate of $A$.
(b) Is it unbiased? Compute its variance.
(c) Does it achieve the CRLB?
ML estimate
The log-likelihood is
$$\ln p(\mathbf{y}; A) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(y_n - A)^2.$$
Setting $\partial \ln p / \partial A = 0$:
$$\hat{A}_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} y_n.$$
The ML estimate is the sample mean.
Bias and variance
$E[\hat{A}_{\text{ML}}] = \frac{1}{N}\sum_{n=1}^{N} E[y_n] = A$: unbiased.
$\text{var}(\hat{A}_{\text{ML}}) = \frac{1}{N^2}\sum_{n=1}^{N} \sigma^2 = \frac{\sigma^2}{N}$.
CRLB comparison
Fisher information: $I(A) = N/\sigma^2$.
CRLB: $\text{var}(\hat{A}) \geq \sigma^2/N$.
Since $\text{var}(\hat{A}_{\text{ML}}) = \sigma^2/N$, the ML estimator achieves the CRLB exactly. It is efficient. $\blacksquare$
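A quick Monte Carlo confirmation (an added sketch with arbitrary example numbers): the empirical variance of the sample mean matches the CRLB $\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(4)
A, sigma, N, trials = 1.0, 2.0, 16, 200_000

y = A + sigma * rng.standard_normal((trials, N))
A_ml = y.mean(axis=1)                        # ML estimate = sample mean

print("bias           =", A_ml.mean() - A)   # ~ 0: unbiased
print("empirical var  =", A_ml.var())        # ~ CRLB
print("CRLB sigma^2/N =", sigma**2 / N)      # 0.25
```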
Example: LMMSE Channel Estimation
A single-tap channel $h \sim \mathcal{CN}(0, \sigma_h^2)$ is estimated from $N$ pilot observations:
$$y_n = h\, x_n + w_n, \qquad n = 1, \dots, N,$$
where $x_n$ are known pilots with $|x_n|^2 = E_p$ and $w_n \sim \mathcal{CN}(0, \sigma_w^2)$.
(a) Find the LS and LMMSE estimates of $h$.
(b) Compare their MSE.
LS estimate
$$\hat{h}_{\text{LS}} = \frac{\sum_{n=1}^{N} x_n^* y_n}{\sum_{n=1}^{N} |x_n|^2} = \frac{1}{N E_p}\sum_{n=1}^{N} x_n^* y_n,$$
which is unbiased with $E[|\hat{h}_{\text{LS}} - h|^2] = \sigma_w^2 / (N E_p)$.
LMMSE estimate
$$\hat{h}_{\text{LMMSE}} = \alpha\, \hat{h}_{\text{LS}}, \qquad \alpha = \frac{\sigma_h^2}{\sigma_h^2 + \sigma_w^2/(N E_p)}.$$
The shrinkage factor $\alpha$ approaches 1 at high SNR and 0 at low SNR.
MSE comparison
With $\rho \triangleq N E_p \sigma_h^2 / \sigma_w^2$ denoting the pilot SNR:
$$\text{MSE}_{\text{LS}} = \frac{\sigma_w^2}{N E_p} = \frac{\sigma_h^2}{\rho}, \qquad \text{MSE}_{\text{LMMSE}} = \frac{\sigma_h^2}{1 + \rho}.$$
At $\rho = 10$, for example, $\text{MSE}_{\text{LS}} = 0.1\,\sigma_h^2$ while $\text{MSE}_{\text{LMMSE}} = \sigma_h^2/11 \approx 0.091\,\sigma_h^2$. Since $\text{MSE}_{\text{LMMSE}} = \frac{\rho}{1+\rho}\,\text{MSE}_{\text{LS}}$, we have $\text{MSE}_{\text{LMMSE}} \leq \text{MSE}_{\text{LS}}$ at every SNR. $\blacksquare$
MMSE vs LS Estimation
Compare the LS and MMSE estimators for a frequency-selective channel. The LS estimator is unbiased but noisy; the MMSE estimator uses channel correlation to smooth the estimate. Observe how the MSE gap between LS and MMSE increases at low SNR, where prior knowledge has the greatest value.
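In place of the interactive demo, here is a static single-tap sketch (added; the full frequency-selective version additionally exploits correlation across taps, as in the OFDM discussion below). It sweeps the SNR and prints the LS and LMMSE MSEs from the previous example; the gap widens sharply at low SNR.

```python
import numpy as np

rng = np.random.default_rng(5)
N, Ep, sigma_h2, trials = 4, 1.0, 1.0, 50_000
x = np.sqrt(Ep) * np.ones(N)                           # known pilots

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

for snr_db in (-10, 0, 10, 20):
    sigma_w2 = N * Ep * sigma_h2 / 10**(snr_db / 10)   # set noise from pilot SNR rho
    h = np.sqrt(sigma_h2) * crandn(trials)
    y = h[:, None] * x + np.sqrt(sigma_w2) * crandn(trials, N)

    h_ls = (y @ x.conj()) / (N * Ep)                   # LS estimate
    alpha = sigma_h2 / (sigma_h2 + sigma_w2 / (N * Ep))
    h_lmmse = alpha * h_ls                             # LMMSE shrinkage

    print(f"SNR={snr_db:+3d} dB  MSE_LS={np.mean(np.abs(h_ls - h)**2):.4f}"
          f"  MSE_LMMSE={np.mean(np.abs(h_lmmse - h)**2):.4f}")
```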
Quick Check
The Fisher information for estimating a channel gain from $N$ pilot observations at SNR $\rho$ is $I = N\rho$. What happens to the CRLB as the number of pilots $N$ doubles?
The CRLB halves (estimation variance floor decreases by 3 dB)
The CRLB doubles
The CRLB remains unchanged
The CRLB decreases to zero
CRLB $= 1/(N\rho)$. Doubling $N$ halves the CRLB. In dB: the MSE floor decreases by $10\log_{10} 2 \approx 3$ dB. This is the familiar "$\sqrt{N}$ improvement" in the standard deviation.
Common Mistake: Biased Estimators Can Have Lower MSE Than Unbiased Ones
Mistake:
Always preferring unbiased estimators because "bias is bad."
Correction:
The MSE decomposes as $\text{MSE} = \text{variance} + \text{bias}^2$. A biased estimator with significantly lower variance can achieve lower MSE than the minimum-variance unbiased estimator (MVUE).
Example: the LMMSE estimator of a zero-mean channel gain is biased (it shrinks toward zero), yet it has lower MSE than the unbiased LS estimator at every SNR.
This is the essence of the bias-variance trade-off: accepting some bias can dramatically reduce variance, especially when data are limited or noisy. The MMSE criterion explicitly optimises the total MSE, not just the variance.
Bayesian vs Frequentist Estimation
Frequentist (classical) estimation treats $\theta$ as a fixed but unknown constant. The CRLB and ML estimator belong to this paradigm. Performance is measured by worst-case or average behaviour over the data distribution $p(\mathbf{y};\theta)$.
Bayesian estimation treats $\theta$ as a random variable with a known prior $p(\theta)$. The MMSE and MAP estimators belong to this paradigm. Performance is measured by averaging over both $\theta$ and $\mathbf{y}$.
In wireless communications, the Bayesian viewpoint is natural: the channel is indeed random (due to fading), and its statistics (delay spread, Doppler, correlation) are often known from measurements or standards. The LMMSE channel estimator is the most prominent example of Bayesian estimation in practice.
Why This Matters: LMMSE Channel Estimation in OFDM
In OFDM systems (4G LTE, 5G NR, Wi-Fi), the channel is estimated at pilot subcarrier locations and then interpolated to data subcarriers. The LS estimator at pilot position $k$ is
$$\hat{h}_{\text{LS}}[k] = \frac{y[k]}{x[k]},$$
where $x[k]$ is the known pilot symbol. The LMMSE estimator exploits the frequency correlation of the channel:
$$\hat{\mathbf{h}}_{\text{LMMSE}} = \mathbf{R}_{hh}\left(\mathbf{R}_{hh} + \frac{\sigma_w^2}{E_p}\,\mathbf{I}\right)^{-1}\hat{\mathbf{h}}_{\text{LS}},$$
where $\mathbf{R}_{hh}$ is the channel frequency correlation matrix, determined by the power delay profile. The MSE gain of LMMSE over LS is typically 3-5 dB in practical scenarios, translating directly to improved detection performance.
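As an added sketch of this recipe (the exponential power delay profile and all parameter values are assumptions for illustration, not taken from a standard): build $\mathbf{R}_{hh}$ from a delay-domain power profile and apply the LMMSE smoother to noisy LS estimates.

```python
import numpy as np

rng = np.random.default_rng(6)
K = 64                       # number of pilot subcarriers
L, tau_rms = 16, 3.0         # channel taps and RMS delay in samples (assumed PDP)
snr_db = 10

# Exponential power delay profile -> frequency correlation R_hh = F diag(p) F^H
p = np.exp(-np.arange(L) / tau_rms)
p /= p.sum()                 # unit channel power per subcarrier
F = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(L)) / K)
R_hh = F @ np.diag(p) @ F.conj().T

sigma_w2 = 10**(-snr_db / 10)                            # unit-energy pilots assumed
W = R_hh @ np.linalg.inv(R_hh + sigma_w2 * np.eye(K))    # LMMSE smoothing matrix

# One channel realisation and its noisy LS estimate
g = np.sqrt(p / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
h = F @ g                                                # frequency response at pilots
h_ls = h + np.sqrt(sigma_w2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
h_lmmse = W @ h_ls

print("MSE LS    =", np.mean(np.abs(h_ls - h)**2))       # ~ sigma_w2 = 0.1
print("MSE LMMSE =", np.mean(np.abs(h_lmmse - h)**2))    # noticeably smaller
```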
Why This Matters: Full Estimation Theory in the FSI Book
This section provides a condensed treatment of estimation theory sufficient for channel estimation and detection. For the complete theory — including sufficient statistics, Rao-Blackwell theorem, exponential families, MMSE with non-Gaussian priors, and the connection to Wiener and Kalman filtering — see the FSI book (Fundamentals of Statistical Inference), which is based on Caire's FSI course at TU Berlin.
Key extensions in the FSI book:
- Ch 2: MMSE estimation with general priors, Wiener filter
- Ch 3: MLE for complex models, EM algorithm, sufficient statistics
- Chs 8-10: Compressed sensing and sparse estimation
- Chs 11-13: Factor graphs, belief propagation, AMP/OAMP
Cramer-Rao Lower Bound (CRLB)
A lower bound on the variance of any unbiased estimator: $\text{var}(\hat{\theta}) \geq 1/I(\theta)$. It is the fundamental limit on estimation precision for a given statistical model.
Related: Fisher Information, Maximum Likelihood (ML) Estimator, Estimator, Bias, and Efficiency
Fisher Information
A measure of the information that an observation carries about an unknown parameter. Defined as the expected curvature of the log-likelihood: $I(\theta) = -E\left[\partial^2 \ln p(\mathbf{y};\theta)/\partial \theta^2\right]$. Higher Fisher information means more precise estimation is possible.
Related: Cramer-Rao Lower Bound (CRLB), Maximum Likelihood (ML) Estimator, Score Function
LMMSE Estimator
The linear minimum mean-square error estimator: the best estimator of the form $\hat{\boldsymbol{\theta}} = \mathbf{A}\mathbf{y} + \mathbf{b}$ that minimises $E[\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2]$. For jointly Gaussian variables, the LMMSE coincides with the (nonlinear) MMSE estimator.
Related: MMSE Estimator (Bayesian), Bayesian Estimation, Wiener Filter