The Cramer--Rao Lower Bound
How Good Can an Estimator Be?
Given a bias specification (say, unbiased), we ask the engineering question: how small can the variance of any such estimator be, for a given sample size and noise level? The answer is the Cramer--Rao lower bound. Unlike a performance bound for a particular algorithm, the CRB is a hard limit imposed by the statistical model itself --- nothing you can build will do better. This is why, when you plot the MSE of a real estimator alongside the CRB, the distance between the two is a receipt you can take to the designer: it says exactly how much slack is left and whether closing it is worth the complexity.
Definition: Score Function and Regularity
Suppose $\theta \in \Theta$ is a scalar, $\Theta \subseteq \mathbb{R}$ is an open interval, and the support of $f_\theta$ does not depend on $\theta$. The score function is
$$s_\theta(y) = \frac{\partial}{\partial \theta} \log f_\theta(y).$$
The family is regular if differentiation under the integral sign is permitted, so that
$$\mathbb{E}_\theta[s_\theta(Y)] = \int \frac{\partial f_\theta(y)}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \int f_\theta(y)\, dy = 0.$$
Regularity fails in sharp ways for distributions whose support moves with $\theta$ (e.g., $\mathrm{Uniform}[0, \theta]$). The CRB is not applicable in such cases --- faster-than-$1/\sqrt{n}$ convergence rates become possible.
Definition: Fisher Information
Under the regularity conditions of Definition (Score Function and Regularity), the Fisher information (scalar case) is
$$I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta} \log f_\theta(Y)\right)^{\!2}\right] = \operatorname{Var}_\theta\big(s_\theta(Y)\big).$$
If in addition the density is twice differentiable and the second derivative can be interchanged with the integral, then
$$I(\theta) = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial \theta^2} \log f_\theta(Y)\right].$$
For independent observations the log-likelihood is a sum, and $I_n(\theta) = \sum_{i=1}^n I_i(\theta)$. In the i.i.d. case, $I_n(\theta) = n\,I(\theta)$.
The two expressions --- variance of the score and negative expected curvature --- are equal only on average. Pointwise they differ. The equivalence is a consequence of regularity and plays the same role here that the second-moment manipulation plays in the proof of the variance decomposition.
Theorem: Equivalence of the Two Fisher-Information Expressions
Under the regularity conditions of Definitions (Score Function and Regularity) and (Fisher Information),
$$\mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta} \log f_\theta(Y)\right)^{\!2}\right] = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial \theta^2} \log f_\theta(Y)\right].$$
Differentiate $\log f_\theta$ twice
Let $\ell(\theta) = \log f_\theta(y)$. Then $\ell'(\theta) = \dfrac{f_\theta'(y)}{f_\theta(y)}$ and
$$\ell''(\theta) = \frac{f_\theta''(y)}{f_\theta(y)} - \left(\frac{f_\theta'(y)}{f_\theta(y)}\right)^{\!2}.$$
Take expectation and use regularity
Taking $\mathbb{E}_\theta$ of both sides,
$$\mathbb{E}_\theta[\ell''(\theta)] = \mathbb{E}_\theta\!\left[\frac{f_\theta''(Y)}{f_\theta(Y)}\right] - \mathbb{E}_\theta\big[(\ell'(\theta))^2\big].$$
Exchanging derivative and integral twice (regularity),
$$\mathbb{E}_\theta\!\left[\frac{f_\theta''(Y)}{f_\theta(Y)}\right] = \int f_\theta''(y)\, dy = \frac{\partial^2}{\partial \theta^2} \int f_\theta(y)\, dy = 0.$$
Conclude
Hence $\mathbb{E}_\theta\big[(\ell'(\theta))^2\big] = -\,\mathbb{E}_\theta[\ell''(\theta)]$, i.e., the variance of the score equals the negative expected curvature.
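As a sanity check on this equivalence, here is a minimal Monte Carlo sketch (the Poisson family and the parameter value are illustrative choices, not from the text). For $\mathrm{Poisson}(\lambda)$ the curvature $-Y/\lambda^2$ varies from sample to sample, yet its negative mean agrees with the variance of the score, both converging to $I(\lambda) = 1/\lambda$.

```python
import numpy as np

# Monte Carlo check that Var(score) = -E[curvature] for a Poisson(lam) family.
# log f_lam(y) = y*log(lam) - lam - log(y!), so the curvature -y/lam^2
# varies pointwise with y but matches the score variance on average.
rng = np.random.default_rng(0)
lam = 3.0
y = rng.poisson(lam, size=1_000_000)

score = y / lam - 1.0        # d/dlam log f_lam(y)
curvature = -y / lam**2      # d^2/dlam^2 log f_lam(y), depends on y

print("variance of score      :", score.var())        # ~ 1/lam
print("negative mean curvature:", -curvature.mean())  # ~ 1/lam
print("closed form I(lam)     :", 1 / lam)            # 0.333...
```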
Theorem: Cramer--Rao Lower Bound (Scalar)
Let $\{f_\theta\}$ be a regular family with scalar $\theta$. Then for any unbiased estimator $\hat{\theta}(Y)$,
$$\operatorname{Var}_\theta\big(\hat{\theta}(Y)\big) \ge \frac{1}{I(\theta)}.$$
Equality holds for every $\theta$ if and only if
$$\frac{\partial}{\partial \theta} \log f_\theta(y) = I(\theta)\,\big(\hat{\theta}(y) - \theta\big),$$
i.e., the score is an affine function of the estimator. In that case $\hat{\theta}$ is called efficient, and it is the MVUE.
The inequality comes from Cauchy--Schwarz applied to two random variables: the centered estimator $\hat{\theta}(Y) - \theta$ and the score $s_\theta(Y)$. Their covariance is forced to equal $1$ by unbiasedness (after differentiating in $\theta$), and the variance of the score is $I(\theta)$. Cauchy--Schwarz then lower-bounds the variance of the estimator by the reciprocal of the Fisher information. This is the CRB proof pattern: every CRB you will see in this book is an instance of it --- vector, curved, Bayesian, functional --- with the same two-random-variable Cauchy--Schwarz at its core.
Start by differentiating the unbiasedness identity $\mathbb{E}_\theta[\hat{\theta}(Y)] = \theta$ in $\theta$, under the integral.
Show that $\mathbb{E}_\theta\big[(\hat{\theta}(Y) - \theta)\, s_\theta(Y)\big] = 1$.
Apply Cauchy--Schwarz to the pair $\big(\hat{\theta}(Y) - \theta,\; s_\theta(Y)\big)$ and read off the CRB.
Differentiate unbiasedness
Unbiasedness gives $\int \hat{\theta}(y)\, f_\theta(y)\, dy = \theta$ for all $\theta$. Differentiating both sides with respect to $\theta$ under the integral (regularity),
$$\int \hat{\theta}(y)\, \frac{\partial f_\theta(y)}{\partial \theta}\, dy = 1.$$
Identify the covariance
Multiply and divide by $f_\theta(y)$:
$$\int \hat{\theta}(y)\, s_\theta(y)\, f_\theta(y)\, dy = \mathbb{E}_\theta\big[\hat{\theta}(Y)\, s_\theta(Y)\big] = 1.$$
Because $\mathbb{E}_\theta[s_\theta(Y)] = 0$, we can subtract $\theta\,\mathbb{E}_\theta[s_\theta(Y)] = 0$ from the left side to get
$$\mathbb{E}_\theta\big[(\hat{\theta}(Y) - \theta)\, s_\theta(Y)\big] = 1.$$
Cauchy--Schwarz
For any two centered random variables $U, V$, $(\mathbb{E}[UV])^2 \le \mathbb{E}[U^2]\,\mathbb{E}[V^2]$. Take $U = \hat{\theta}(Y) - \theta$ (centered by unbiasedness) and $V = s_\theta(Y)$ (centered by regularity). Then
$$1 = \big(\mathbb{E}[UV]\big)^2 \le \operatorname{Var}_\theta\big(\hat{\theta}(Y)\big)\, I(\theta), \qquad \text{so} \qquad \operatorname{Var}_\theta\big(\hat{\theta}(Y)\big) \ge \frac{1}{I(\theta)}.$$
Equality condition
Cauchy--Schwarz is tight iff $U$ and $V$ are co-linear almost surely: $s_\theta(y) = c(\theta)\,\big(\hat{\theta}(y) - \theta\big)$ for some $c(\theta)$. Substituting into the identity $\mathbb{E}_\theta[(\hat{\theta} - \theta)\, s_\theta] = 1$ gives $c(\theta)\operatorname{Var}_\theta(\hat{\theta}) = 1$, so $I(\theta) = \operatorname{Var}_\theta(s_\theta) = c(\theta)^2 \operatorname{Var}_\theta(\hat{\theta}) = c(\theta)$. Equivalently, $s_\theta(y) = I(\theta)\,\big(\hat{\theta}(y) - \theta\big)$.
Historical Note: Cramer (1946), Rao (1945), and a Near-Simultaneous Discovery
1943--1946. The inequality was derived independently by C. R. Rao (Calcutta, 1945) and H. Cramer (Stockholm, 1946), and also by M. Frechet (Paris, 1943) in a less general form. Rao's derivation used the method that became the standard textbook proof; Cramer's 1946 monograph brought it to a wide audience. The bound is also associated with A. Bhattacharyya, who published an extension to higher-order derivatives in 1946--1948. In modern practice the two-name "Cramer--Rao" label prevails. The near-simultaneity is not an accident: the ingredients (likelihood, Fisher information, Cauchy--Schwarz) were all in the air in the 1940s statistics community, waiting for someone to assemble them.
Example: Efficient Estimator: Mean of a Gaussian
Let $Y_1, \dots, Y_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ known. Compute the Fisher information for $\mu$ and show that the sample mean $\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$ is an efficient estimator of $\mu$.
Compute the Fisher information
The log-likelihood is $\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2$. Hence $\ell'(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i - \mu)$ and $\ell''(\mu) = -n/\sigma^2$ (constant in $y$). Therefore $I_n(\mu) = n/\sigma^2$.
Verify efficiency of $\bar{Y}$
From the score, $\ell'(\mu) = \frac{n}{\sigma^2}(\bar{y} - \mu) = I_n(\mu)\,(\bar{y} - \mu)$, which is exactly the CRB equality condition. Hence $\bar{Y}$ is efficient and its variance equals $\sigma^2/n$. This is the first and cleanest example of an efficient estimator in the book.
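A minimal Monte Carlo sketch of this conclusion (sample size, parameters, and trial count are arbitrary illustrative choices): the empirical variance of $\bar{Y}$ lands on the CRB $\sigma^2/n$.

```python
import numpy as np

# Empirical variance of the sample mean vs. the CRB sigma^2/n.
rng = np.random.default_rng(1)
mu, sigma, n, trials = 0.7, 1.3, 25, 200_000

Y = rng.normal(mu, sigma, size=(trials, n))  # each row: one experiment
ybar = Y.mean(axis=1)

print("empirical Var(Ybar):", ybar.var())    # ~ 0.0676
print("CRB = sigma^2/n    :", sigma**2 / n)  # 0.0676
```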
Example: Amplitude Estimation in AWGN
Observe $y_k = a\, s_k + w_k$ for $k = 1, \dots, N$, with the waveform $s_k$ known and $w_k$ i.i.d. $\mathcal{N}(0, \sigma^2)$. The unknown parameter is the amplitude $a$. Compute the CRB and the efficient estimator.
Log-likelihood and score
$\ell(a) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^N (y_k - a s_k)^2$. The score is $\ell'(a) = \frac{1}{\sigma^2}\sum_{k=1}^N s_k (y_k - a s_k)$.
Fisher information and CRB
$\ell''(a) = -\frac{1}{\sigma^2}\sum_{k=1}^N s_k^2$, so with the waveform energy $\mathcal{E}_s = \sum_{k=1}^N s_k^2$ the Fisher information is $I(a) = \mathcal{E}_s/\sigma^2$. Therefore $\operatorname{Var}(\hat{a}) \ge \sigma^2/\mathcal{E}_s$ for any unbiased $\hat{a}$.
Matched-filter estimator attains the bound
Rewriting the score as $\ell'(a) = \frac{\mathcal{E}_s}{\sigma^2}\big(\hat{a}(y) - a\big)$ with $\hat{a}(y) = \frac{1}{\mathcal{E}_s}\sum_{k=1}^N s_k y_k$ shows the CRB equality condition. The matched-filter estimator $\hat{a}$ is efficient with variance $\sigma^2/\mathcal{E}_s$. This is the same correlator that shows up in BPSK detection --- same operator, different question. Its Fisher-information content is $\mathcal{E}_s/\sigma^2$, which is the familiar SNR-like quantity.
Definition: Fisher Information Matrix
Let $\boldsymbol{\theta} \in \mathbb{R}^p$. Under vector regularity (support independent of $\boldsymbol{\theta}$ and derivatives exchangeable with integration), the Fisher information matrix is the $p \times p$ matrix
$$\mathbf{F}(\boldsymbol{\theta}) = \mathbb{E}_{\boldsymbol{\theta}}\!\left[\nabla_{\boldsymbol{\theta}} \log f_{\boldsymbol{\theta}}(Y)\; \nabla_{\boldsymbol{\theta}} \log f_{\boldsymbol{\theta}}(Y)^{\mathsf{T}}\right].$$
As the covariance of the score vector, $\mathbf{F}(\boldsymbol{\theta}) \succeq 0$; it is strictly positive definite iff no component of $\boldsymbol{\theta}$ is unidentifiable.
Theorem: Cramer--Rao Lower Bound (Vector Parameter)
For any unbiased estimator $\hat{\boldsymbol{\theta}}(Y)$ of $\boldsymbol{\theta}$ with positive-definite FIM $\mathbf{F}(\boldsymbol{\theta})$,
$$\operatorname{Cov}_{\boldsymbol{\theta}}\big(\hat{\boldsymbol{\theta}}(Y)\big) \succeq \mathbf{F}(\boldsymbol{\theta})^{-1}.$$
Componentwise, $\operatorname{Var}_{\boldsymbol{\theta}}(\hat{\theta}_i) \ge [\mathbf{F}(\boldsymbol{\theta})^{-1}]_{ii}$. For any (Frechet-differentiable) reparameterization $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ with Jacobian $\mathbf{J} = \partial \mathbf{g}/\partial \boldsymbol{\theta}$,
$$\operatorname{Cov}\big(\hat{\boldsymbol{\alpha}}\big) \succeq \mathbf{J}\, \mathbf{F}(\boldsymbol{\theta})^{-1} \mathbf{J}^{\mathsf{T}}.$$
Read $[\mathbf{F}^{-1}]_{ii}$ as the CRB on the $i$-th component, not as $1/[\mathbf{F}]_{ii}$. The difference is caused by cross-terms: when another parameter is also being estimated, it steals information from $\theta_i$. This is why estimating amplitude and phase jointly is harder than estimating either alone.
Centered score and estimator
Let $\mathbf{u} = \hat{\boldsymbol{\theta}}(Y) - \boldsymbol{\theta}$ (centered by unbiasedness) and $\mathbf{v} = \nabla_{\boldsymbol{\theta}} \log f_{\boldsymbol{\theta}}(Y)$ (centered by regularity). Their covariances are $\operatorname{Cov}(\mathbf{u}) = \operatorname{Cov}_{\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}})$ and $\operatorname{Cov}(\mathbf{v}) = \mathbf{F}(\boldsymbol{\theta})$.
Cross-covariance is the identity
Differentiating the unbiasedness identity $\mathbb{E}_{\boldsymbol{\theta}}[\hat{\boldsymbol{\theta}}(Y)] = \boldsymbol{\theta}$ component-wise under the integral gives $\mathbb{E}[\mathbf{u}\, \mathbf{v}^{\mathsf{T}}] = \mathbf{I}_p$.
Block Schur complement
The joint covariance
$$\operatorname{Cov}\!\begin{pmatrix} \mathbf{u} \\ \mathbf{v} \end{pmatrix} = \begin{pmatrix} \operatorname{Cov}_{\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}}) & \mathbf{I}_p \\ \mathbf{I}_p & \mathbf{F}(\boldsymbol{\theta}) \end{pmatrix} \succeq 0.$$
Its Schur complement with respect to the $\mathbf{F}(\boldsymbol{\theta})$ block yields $\operatorname{Cov}_{\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}}) - \mathbf{F}(\boldsymbol{\theta})^{-1} \succeq 0$, which is the matrix CRB. The reparameterization bound follows by the chain rule applied to the score.
Example: Joint CRB: Mean and Variance of a Gaussian
Let $Y_1, \dots, Y_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown, $\boldsymbol{\theta} = (\mu, \sigma^2)^{\mathsf{T}}$. Compute $\mathbf{F}(\boldsymbol{\theta})$ and the resulting componentwise CRBs.
Log-likelihood partials
With $\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2$,
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i - \mu), \qquad \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - \mu)^2.$$
Expected negative Hessian
$\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}$, $\frac{\partial^2 \ell}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^n (y_i - \mu)$, $\frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^n (y_i - \mu)^2$. Taking expectations and using $\mathbb{E}[Y_i - \mu] = 0$, $\mathbb{E}[(Y_i - \mu)^2] = \sigma^2$,
$$\mathbf{F}(\mu, \sigma^2) = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.$$
Read off the CRBs
The FIM is diagonal, so the off-diagonal inverse elements vanish: $\operatorname{Var}(\hat{\mu}) \ge \sigma^2/n$ and $\operatorname{Var}(\widehat{\sigma^2}) \ge 2\sigma^4/n$. The sample mean attains the first bound exactly; the unbiased sample variance $S^2$ has variance $2\sigma^4/(n-1)$, which exceeds the CRB by a factor $n/(n-1) \to 1$ as $n \to \infty$ --- so $S^2$ is asymptotically efficient but not efficient.
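The gap between $S^2$ and the CRB is visible numerically. A sketch under assumed parameters ($n = 10$, $\sigma^2 = 2$, both arbitrary):

```python
import numpy as np

# Var(S^2) = 2*sigma^4/(n-1) for Gaussian data, vs. the CRB 2*sigma^4/n.
rng = np.random.default_rng(2)
mu, sigma2, n, trials = 0.0, 2.0, 10, 400_000

Y = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
s2 = Y.var(axis=1, ddof=1)                   # unbiased sample variance S^2

print("empirical Var(S^2)   :", s2.var())
print("exact 2 sigma^4/(n-1):", 2 * sigma2**2 / (n - 1))  # 0.888...
print("CRB   2 sigma^4/n    :", 2 * sigma2**2 / n)        # 0.8
```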
Scalar CRB vs. Vector CRB
| Aspect | Scalar | Vector |
|---|---|---|
| Bound object | Variance $\operatorname{Var}_\theta(\hat{\theta})$ | Covariance matrix (PSD ordering $\succeq$) |
| Information | $I(\theta)$ (scalar) | $\mathbf{F}(\boldsymbol{\theta})$ ($p \times p$ PSD) |
| Componentwise bound | $1/I(\theta)$ | $[\mathbf{F}^{-1}]_{ii}$, NOT $1/[\mathbf{F}]_{ii}$ |
| Attainment condition | Score is affine in $\hat{\theta}$ | Score is affine in $\hat{\boldsymbol{\theta}}$ (all components simultaneously) |
| Reparameterization | $(g'(\theta))^2 / I(\theta)$ | $\mathbf{J}\, \mathbf{F}^{-1} \mathbf{J}^{\mathsf{T}}$ |
Common Mistake: $1/[\mathbf{F}]_{ii}$ Is Not the Componentwise CRB
Mistake:
When computing the CRB on a single component of a vector parameter, it is easy to write $\operatorname{Var}(\hat{\theta}_i) \ge 1/[\mathbf{F}]_{ii}$ --- the "scalar formula applied to the $i$-th row".
Correction:
The correct bound is $\operatorname{Var}(\hat{\theta}_i) \ge [\mathbf{F}^{-1}]_{ii}$, which is always $\ge 1/[\mathbf{F}]_{ii}$, with equality only when the FIM is diagonal. The inflation factor $[\mathbf{F}]_{ii}\,[\mathbf{F}^{-1}]_{ii} \ge 1$ quantifies the price of estimating $\theta_i$ jointly with the nuisance parameters.
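A two-parameter numerical illustration (the FIM entries are made up for the example):

```python
import numpy as np

# Cross-information inflates the componentwise CRB: [F^-1]_11 >= 1/F_11.
F = np.array([[10.0, 6.0],
              [ 6.0, 5.0]])     # hypothetical 2x2 FIM with cross-terms
Finv = np.linalg.inv(F)

print("correct CRB [F^-1]_11:", Finv[0, 0])            # 5/14 ~ 0.357
print("wrong bound 1/F_11   :", 1 / F[0, 0])           # 0.100 (too optimistic)
print("inflation factor     :", F[0, 0] * Finv[0, 0])  # ~ 3.57
```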
Common Mistake: CRB Applies to Unbiased Estimators
Mistake:
"My estimator has MSE below --- I beat the CRB!"
Correction:
The CRB bounds the variance of unbiased estimators. A biased estimator can have smaller variance --- and smaller MSE --- than the CRB. For biased $\hat{\theta}$ with bias $b(\theta) = \mathbb{E}_\theta[\hat{\theta}] - \theta$, the correct Cramer--Rao-type inequality is
$$\operatorname{Var}_\theta(\hat{\theta}) \ge \frac{\big(1 + b'(\theta)\big)^2}{I(\theta)}.$$
The CRB should be compared against the variance of unbiased competitors; MSE comparisons need a different bound (e.g., Bayesian Cramer--Rao, van Trees).
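A concrete instance of the trap, as a sketch with assumed parameters: shrinking the sample mean toward zero biases it, and for true means near zero its MSE drops below $\sigma^2/n$ without contradicting the CRB.

```python
import numpy as np

# A biased shrinkage estimator c*Ybar can have MSE below the unbiased CRB.
rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.1, 1.0, 20, 300_000
c = 0.8                                   # shrinkage factor -> bias (c-1)*mu

Y = rng.normal(mu, sigma, size=(trials, n))
shrunk = c * Y.mean(axis=1)

mse = np.mean((shrunk - mu) ** 2)         # ~ c^2 sigma^2/n + (1-c)^2 mu^2
print("MSE of biased c*Ybar:", mse)           # ~ 0.0324
print("unbiased CRB        :", sigma**2 / n)  # 0.0500 -- no contradiction
```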
CRB vs. Monte Carlo for Amplitude Estimation in AWGN
Compare the empirical variance of the matched-filter estimator against the CRB as you sweep the SNR and the number of samples. The estimator sits exactly on the CRB, consistent with its efficiency.
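A non-interactive sketch of this experiment (waveform, noise levels, and trial count are illustrative choices):

```python
import numpy as np

# Matched-filter amplitude estimator vs. the CRB sigma^2/Es across noise levels.
rng = np.random.default_rng(4)
N, a, trials = 16, 1.0, 100_000
s = np.sin(2 * np.pi * 0.1 * np.arange(N))   # known waveform s_k
Es = np.sum(s**2)                            # waveform energy

for sigma in [0.25, 0.5, 1.0, 2.0]:
    Y = a * s + rng.normal(0.0, sigma, size=(trials, N))
    a_hat = Y @ s / Es                       # matched-filter estimate
    print(f"sigma={sigma:<4}: empirical var={a_hat.var():.5f}, "
          f"CRB={sigma**2 / Es:.5f}")
```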
Fisher Information as Curvature of the Log-Likelihood
For a single Gaussian sample $Y \sim \mathcal{N}(\mu, \sigma^2)$, view the log-likelihood as a function of $\mu$ and watch how its curvature at the peak --- that is, $-\partial^2 \ell / \partial \mu^2$ --- rises as $\sigma$ shrinks. Averaging this curvature over $Y$ gives the Fisher information.
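A numerical stand-in for the interactive view (the observed sample and the grid are arbitrary): finite differences around the peak recover the curvature $1/\sigma^2$.

```python
import numpy as np

# Curvature of the single-sample Gaussian log-likelihood at its peak,
# estimated by a second difference; it grows as sigma shrinks.
y = 0.3                                   # one observed sample
mu = np.linspace(y - 1.0, y + 1.0, 2001)  # grid of candidate means
h = mu[1] - mu[0]

for sigma in [2.0, 1.0, 0.5, 0.25]:
    loglik = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)
    k = np.argmax(loglik)                 # peak at mu = y
    curv = -(loglik[k + 1] - 2 * loglik[k] + loglik[k - 1]) / h**2
    print(f"sigma={sigma:<4}: -d2l/dmu2 = {curv:.3f}, 1/sigma^2 = {1/sigma**2:.3f}")
```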
Channel Estimation: Pilot SNR and the CRB
In a pilot-based channel estimator with $N_p$ orthogonal pilot symbols, the FIM for the complex channel coefficient scales as $N_p/\sigma^2$, and hence the CRB on the real and imaginary parts scales as $\sigma^2/N_p$. This is why 3GPP NR allocates a fraction of OFDM REs as DMRS: the pilot overhead directly buys CRB reduction. Increasing $N_p$ squeezes the bound linearly; the catch is that it also linearly decreases the throughput available for data. Every channel-estimator design is negotiating this trade; a numerical sketch follows the list below.
- 5G NR DMRS density: 1 per 6 subcarriers (frequency), 1--4 OFDM symbols per slot (time), per TS 38.211
- The CRB on phase-noise estimation likewise shrinks with pilot density --- motivates PTRS insertion at high FR2 frequencies
- For narrowband IoT, long pilot sequences trade latency for CRB improvement
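A sketch of the $\sigma^2/N_p$ scaling (channel value, pilot construction, and noise level are assumptions for illustration, not a 3GPP-compliant model):

```python
import numpy as np

# Least-squares channel estimate from N_p unit-power pilots; the error
# variance tracks the sigma^2 / N_p CRB scaling described above.
rng = np.random.default_rng(5)
h = 0.8 - 0.6j                     # true complex channel coefficient
sigma2, trials = 0.5, 200_000

for Np in [4, 8, 16, 32]:
    x = np.exp(2j * np.pi * rng.random(Np))   # unit-modulus pilot symbols
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, Np))
                               + 1j * rng.standard_normal((trials, Np)))
    y = h * x + w
    h_hat = (y @ x.conj()) / Np               # LS / correlator estimate
    err = np.mean(np.abs(h_hat - h) ** 2)
    print(f"N_p={Np:3d}: error var={err:.5f}, sigma^2/N_p={sigma2/Np:.5f}")
```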
CRB as One Pillar of the Sensing--Communication Tradeoff
Integrated sensing and communication (ISAC) systems re-use a single waveform to both convey data and estimate target parameters. The resulting performance region has two axes: communication rate and sensing accuracy, where the latter is quantified via a CRB-type matrix on the target parameters (range, angle, Doppler). The work of Liu and Caire shows that the frontier of this region is a Pareto-optimal tradeoff between the capacity expression from ITA Chapter 18 and the CRB derived exactly as in this chapter. The CRB is the operational "distortion" in the sensing rate--distortion formulation of ISAC.
Quick Check
An unbiased estimator of a scalar $\theta$ attains the CRB. What can you say about the score function?
It is zero everywhere
It equals $I(\theta)\,\big(\hat{\theta}(y) - \theta\big)$
It depends only on $y$, not on $\theta$
Its mean is $1/I(\theta)$
Equality in Cauchy--Schwarz forces score and centered estimator to be co-linear, with proportionality constant $I(\theta)$.