The Cramer--Rao Lower Bound

How Good Can an Estimator Be?

Given a bias specification (say, unbiased), we ask the engineering question: how small can the variance of any such estimator be, for a given sample size and noise level? The answer is the Cramer--Rao lower bound (CRB). Unlike a performance bound for a particular algorithm, the CRB is a hard limit imposed by the statistical model itself --- nothing you can build will do better. This is why, when you plot the MSE of a real estimator alongside the CRB, the gap between the two is a receipt you can take to the designer: it says exactly how much slack is left and whether closing it is worth the complexity.

Definition: Score Function and Regularity

Suppose $\theta \in \Lambda \subseteq \mathbb{R}$ is a scalar, $\Lambda$ is an open interval, and the support of $f(\mathbf{y};\theta) = f_\theta(\mathbf{y})$ does not depend on $\theta$. The score function is
$$s(\mathbf{y};\theta) \triangleq \frac{\partial}{\partial\theta}\log f_\theta(\mathbf{y}) = \frac{1}{f_\theta(\mathbf{y})}\,\frac{\partial f_\theta(\mathbf{y})}{\partial\theta}.$$
The family $\{f_\theta\}$ is regular if differentiation under the integral sign is permitted, so that
$$\mathbb{E}_\theta\!\left[s(\mathbf{Y};\theta)\right] = \int \frac{\partial f_\theta(\mathbf{y})}{\partial\theta}\,d\mathbf{y} = \frac{\partial}{\partial\theta}\int f_\theta(\mathbf{y})\,d\mathbf{y} = 0.$$
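As a concrete instance (a standard computation, using the same Gaussian model as the examples later in this section), take a single sample $Y \sim \mathcal{N}(\theta, \sigma^2)$ with $\sigma^2$ known:
$$\log f_\theta(y) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(y-\theta)^2}{2\sigma^2}, \qquad s(y;\theta) = \frac{y-\theta}{\sigma^2},$$
and indeed $\mathbb{E}_\theta[s(Y;\theta)] = (\mathbb{E}_\theta[Y]-\theta)/\sigma^2 = 0$.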

Regularity fails in sharp ways for distributions whose support moves with $\theta$ (e.g., $\mathrm{Unif}[0,\theta]$). The CRB is not applicable in such cases --- faster-than-$1/n$ variance decay becomes possible.
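For a concrete sense of the speed-up (a standard calculation, not worked in the original text): for $\mathrm{Unif}[0,\theta]$ the unbiased estimator $\hat{\theta} = \frac{n+1}{n}\max_i Y_i$ has $\text{Var}_\theta(\hat{\theta}) = \frac{\theta^2}{n(n+2)}$, which decays like $1/n^2$ rather than the $1/n$ rate that a CRB would enforce in a regular family.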

Definition: Fisher Information

Under the regularity conditions of the Score Function and Regularity definition, the Fisher information (scalar case) is
$$J(\theta) \triangleq \text{Var}_\theta\bigl(s(\mathbf{Y};\theta)\bigr) = \mathbb{E}_\theta\!\left[\left(\frac{\partial \log f_\theta(\mathbf{Y})}{\partial\theta}\right)^2\right].$$
If in addition the density is twice differentiable and the second derivative can be interchanged with the integral, then
$$J(\theta) = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2 \log f_\theta(\mathbf{Y})}{\partial\theta^2}\right].$$
For independent observations $Y_1,\ldots,Y_n$ the log-likelihood is a sum, and $J(\theta) = \sum_i J_i(\theta)$. In the i.i.d. case, $J(\theta) = n\,J_1(\theta)$.

The two expressions --- variance of the score and negative expected curvature --- are equal only on average. Pointwise they differ. The equivalence is a consequence of regularity and plays the same role here that the second-moment manipulation plays in the proof of the variance decomposition.

Theorem: Equivalence of the Two Fisher-Information Expressions

Under the regularity conditions of the Score Function and Regularity and Fisher Information definitions,
$$\mathbb{E}_\theta\!\left[\left(\frac{\partial \log f_\theta(\mathbf{Y})}{\partial\theta}\right)^2\right] = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2 \log f_\theta(\mathbf{Y})}{\partial\theta^2}\right].$$
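A one-line sketch of why regularity gives the equivalence (the standard argument, filling in the step the theorem statement omits): differentiate the identity $\int \frac{\partial f_\theta}{\partial\theta}\,d\mathbf{y} = 0$ once more in $\theta$, exchange derivative and integral, and use $\frac{\partial^2 f_\theta}{\partial\theta^2} = f_\theta\left[\frac{\partial^2 \log f_\theta}{\partial\theta^2} + \left(\frac{\partial \log f_\theta}{\partial\theta}\right)^2\right]$:
$$0 = \frac{\partial^2}{\partial\theta^2}\int f_\theta(\mathbf{y})\,d\mathbf{y} = \int \frac{\partial^2 f_\theta(\mathbf{y})}{\partial\theta^2}\,d\mathbf{y} = \mathbb{E}_\theta\!\left[\frac{\partial^2 \log f_\theta(\mathbf{Y})}{\partial\theta^2}\right] + \mathbb{E}_\theta\!\left[\left(\frac{\partial \log f_\theta(\mathbf{Y})}{\partial\theta}\right)^2\right].$$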

Theorem: Cramer--Rao Lower Bound (Scalar)

Let $\{f_\theta : \theta \in \Lambda\}$ be a regular family with scalar $\theta$. Then for any unbiased estimator $\hat{\theta}(\mathbf{Y})$,
$$\boxed{\;\text{Var}_\theta\bigl(\hat{\theta}(\mathbf{Y})\bigr) \;\geq\; \frac{1}{J(\theta)}\;} \qquad \forall\,\theta \in \Lambda.$$
Equality holds for every $\theta \in \Lambda$ if and only if
$$\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{y}) = J(\theta)\bigl(\hat{\theta}(\mathbf{y}) - \theta\bigr),$$
i.e., the score is an affine function of the estimator. In that case $\hat{\theta}$ is called efficient, and it is the MVUE.

The inequality comes from Cauchy--Schwarz applied to two random variables: the centered estimator $\hat{\theta}(\mathbf{Y}) - \theta$ and the score $s(\mathbf{Y};\theta)$. Differentiating the unbiasedness identity $\mathbb{E}_\theta[\hat{\theta}(\mathbf{Y})] = \theta$ in $\theta$ forces their cross-moment $\mathbb{E}_\theta\bigl[(\hat{\theta}(\mathbf{Y})-\theta)\,s(\mathbf{Y};\theta)\bigr]$ to equal $1$, while the variance of the score is $J(\theta)$. Cauchy--Schwarz then lower-bounds the variance of the estimator by the reciprocal of the Fisher information. This is the CRB proof pattern: every CRB you will see in this book is an instance of it --- vector, curved, Bayesian, functional --- with the same two-random-variable Cauchy--Schwarz at its core.
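Spelled out (the standard chain of steps behind the prose above):
$$1 = \frac{\partial}{\partial\theta}\,\mathbb{E}_\theta[\hat{\theta}(\mathbf{Y})] = \int \hat{\theta}(\mathbf{y})\,\frac{\partial f_\theta(\mathbf{y})}{\partial\theta}\,d\mathbf{y} = \mathbb{E}_\theta\bigl[\hat{\theta}(\mathbf{Y})\,s(\mathbf{Y};\theta)\bigr] = \mathbb{E}_\theta\bigl[(\hat{\theta}(\mathbf{Y})-\theta)\,s(\mathbf{Y};\theta)\bigr],$$
where the last equality uses $\mathbb{E}_\theta[s] = 0$. Cauchy--Schwarz then gives
$$1 = \Bigl(\mathbb{E}_\theta\bigl[(\hat{\theta}-\theta)\,s\bigr]\Bigr)^2 \leq \text{Var}_\theta(\hat{\theta})\,\mathbb{E}_\theta[s^2] = \text{Var}_\theta(\hat{\theta})\,J(\theta),$$
which rearranges to the bound, with equality iff $\hat{\theta}-\theta$ and $s$ are proportional.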


Historical Note: Cramer (1946), Rao (1945), and a Near-Simultaneous Discovery

1943--1946

The inequality was derived independently by C. R. Rao (Calcutta, 1945) and H. Cramer (Stockholm, 1946), and also by M. Frechet (Paris, 1943) in a less general form. Rao's derivation used the method that became the standard textbook proof; Cramer's 1946 monograph brought it to a wide audience. The name of A. Bhattacharyya is also attached to this circle of results, through the extension to higher-order derivatives of the likelihood that he published in 1946--1948. In modern practice the two-name "Cramer--Rao" label prevails. The near-simultaneous discovery is not an accident: the ingredients (likelihood, Fisher information, Cauchy--Schwarz) were all in the air in the 1940s statistics community, waiting for someone to assemble them.

Example: Efficient Estimator: Mean of a Gaussian

Let $Y_1,\ldots,Y_n$ be i.i.d. $\mathcal{N}(\theta,\sigma^2)$ with $\sigma^2$ known. Compute the Fisher information and show that the sample mean $\bar{Y}$ is an efficient estimator of $\theta$.
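A sketch of the computation (the standard argument; the example itself leaves the details as an exercise): the log-likelihood is $\sum_i \log f_\theta(Y_i)$, so
$$s(\mathbf{Y};\theta) = \sum_{i=1}^n \frac{Y_i-\theta}{\sigma^2} = \frac{n}{\sigma^2}\bigl(\bar{Y}-\theta\bigr), \qquad J(\theta) = n\,J_1(\theta) = \frac{n}{\sigma^2}.$$
The score is exactly $J(\theta)(\bar{Y}-\theta)$, so the attainment condition of the scalar CRB theorem holds: $\bar{Y}$ is unbiased with $\text{Var}_\theta(\bar{Y}) = \sigma^2/n = 1/J(\theta)$, hence efficient.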

Example: Amplitude Estimation in AWGN

Observe $Y_i = A s_i + Z_i$ for $i = 1,\ldots,n$, with $s_i$ known and $Z_i$ i.i.d. $\mathcal{N}(0,\sigma^2)$. The unknown parameter is the amplitude $A \in \mathbb{R}$. Compute the CRB and the efficient estimator.
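A sketch of the answer (standard, and consistent with the matched-filter simulation later in this section): the score is
$$s(\mathbf{Y};A) = \sum_{i=1}^n \frac{(Y_i - A s_i)\,s_i}{\sigma^2} = \frac{\|\mathbf{s}\|^2}{\sigma^2}\left(\frac{\mathbf{s}^T\mathbf{Y}}{\|\mathbf{s}\|^2} - A\right), \qquad J(A) = \frac{\|\mathbf{s}\|^2}{\sigma^2},$$
so the CRB is $\sigma^2/\|\mathbf{s}\|^2$, and the matched-filter estimator $\hat{A} = \mathbf{s}^T\mathbf{Y}/\|\mathbf{s}\|^2$ is unbiased and attains it.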

Definition: Fisher Information Matrix

Let $\boldsymbol{\theta} = (\theta_1,\ldots,\theta_m)^T \in \Lambda \subseteq \mathbb{R}^m$. Under vector regularity (support independent of $\boldsymbol{\theta}$ and derivatives exchangeable with integration), the Fisher information matrix is the $m \times m$ matrix
$$[\mathbf{J}(\boldsymbol{\theta})]_{ij} \triangleq \mathbb{E}_{\boldsymbol{\theta}}\!\left[\frac{\partial \log f_{\boldsymbol{\theta}}(\mathbf{Y})}{\partial\theta_i}\cdot\frac{\partial \log f_{\boldsymbol{\theta}}(\mathbf{Y})}{\partial\theta_j}\right] = -\,\mathbb{E}_{\boldsymbol{\theta}}\!\left[\frac{\partial^2 \log f_{\boldsymbol{\theta}}(\mathbf{Y})}{\partial\theta_i\,\partial\theta_j}\right].$$
As the covariance of the score vector, $\mathbf{J}(\boldsymbol{\theta}) \succeq 0$; it is strictly positive definite iff no direction in the parameter space is (locally) unidentifiable, i.e., no nontrivial linear combination of the score components is degenerate.

Theorem: Cramer--Rao Lower Bound (Vector Parameter)

For any unbiased estimator $\hat{\boldsymbol{\theta}}(\mathbf{Y})$ of $\boldsymbol{\theta} \in \mathbb{R}^m$ with positive-definite FIM $\mathbf{J}(\boldsymbol{\theta})$,
$$\text{Cov}_{\boldsymbol{\theta}}\bigl(\hat{\boldsymbol{\theta}}(\mathbf{Y})\bigr) \;\succeq\; \mathbf{J}(\boldsymbol{\theta})^{-1}.$$
Componentwise, $\text{Var}_{\boldsymbol{\theta}}(\hat{\theta}_i) \geq [\mathbf{J}(\boldsymbol{\theta})^{-1}]_{ii}$. For any (Frechet-differentiable) reparameterization $\boldsymbol{\alpha}(\boldsymbol{\theta}) : \mathbb{R}^m \to \mathbb{R}^r$ with Jacobian $\mathbf{D}(\boldsymbol{\theta}) = \partial\boldsymbol{\alpha}/\partial\boldsymbol{\theta}^T$, any unbiased estimator $\hat{\boldsymbol{\alpha}}(\mathbf{Y})$ of $\boldsymbol{\alpha}(\boldsymbol{\theta})$ satisfies
$$\text{Cov}_{\boldsymbol{\theta}}\bigl(\hat{\boldsymbol{\alpha}}(\mathbf{Y})\bigr) \;\succeq\; \mathbf{D}(\boldsymbol{\theta})\,\mathbf{J}(\boldsymbol{\theta})^{-1}\,\mathbf{D}(\boldsymbol{\theta})^T.$$

Read $[\mathbf{J}^{-1}]_{ii}$ as the CRB on the $i$-th component, not as $1/[\mathbf{J}]_{ii}$. The difference is caused by cross-terms: when another parameter is also being estimated, it steals information from $\theta_i$. This is why estimating amplitude and phase jointly is harder than estimating either alone.
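The cross-term effect can be made quantitative (a standard block-matrix identity, added here for reference): separating $\theta_i$ from the remaining parameters $\boldsymbol{\theta}_{-i}$ in the FIM,
$$[\mathbf{J}^{-1}]_{ii} = \frac{1}{J_{ii} - \mathbf{J}_{i,-i}\,\mathbf{J}_{-i,-i}^{-1}\,\mathbf{J}_{-i,i}} \;\geq\; \frac{1}{J_{ii}},$$
so the bound is inflated exactly by the information the coupling terms $\mathbf{J}_{i,-i}$ drain away; it collapses to $1/J_{ii}$ only when $\theta_i$ is Fisher-decoupled from the rest.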


Example: Joint CRB: Mean and Variance of a Gaussian

Let $Y_1,\ldots,Y_n$ be i.i.d. $\mathcal{N}(\mu,\sigma^2)$ with both $\mu$ and $\sigma^2$ unknown, $\boldsymbol{\theta} = (\mu,\sigma^2)^T$. Compute $\mathbf{J}(\boldsymbol{\theta})$ and the resulting componentwise CRBs.
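A sketch of the result (the standard computation; the example leaves the details as an exercise):
$$\mathbf{J}(\boldsymbol{\theta}) = \begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\[4pt] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}, \qquad \text{Var}(\hat{\mu}) \geq \frac{\sigma^2}{n}, \quad \text{Var}(\widehat{\sigma^2}) \geq \frac{2\sigma^4}{n}.$$
Because this FIM happens to be diagonal, the joint CRB on $\mu$ equals the known-$\sigma^2$ CRB: the Gaussian mean and variance are Fisher-decoupled.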

Scalar CRB vs. Vector CRB

| Aspect | Scalar ($\theta \in \mathbb{R}$) | Vector ($\boldsymbol{\theta} \in \mathbb{R}^m$) |
| --- | --- | --- |
| Bound object | Variance | Covariance matrix (PSD ordering $\succeq$) |
| Information | $J(\theta)$ (scalar) | $\mathbf{J}(\boldsymbol{\theta})$ ($m \times m$ PSD) |
| Componentwise bound | $\text{Var}(\hat{\theta}) \geq 1/J(\theta)$ | $\text{Var}(\hat{\theta}_i) \geq [\mathbf{J}^{-1}]_{ii}$, NOT $1/[\mathbf{J}]_{ii}$ |
| Attainment condition | Score is affine in $\hat{\theta}$ | Score is affine in $\hat{\boldsymbol{\theta}}$ (simultaneously) |
| Reparameterization $\alpha = u(\theta)$ | $\text{Var}(\hat{\alpha}) \geq u'(\theta)^2/J(\theta)$ | $\text{Cov}(\hat{\boldsymbol{\alpha}}) \succeq \mathbf{D}\,\mathbf{J}^{-1}\mathbf{D}^T$ |

Common Mistake: $[\mathbf{J}^{-1}]_{ii} \neq 1/[\mathbf{J}]_{ii}$

Mistake:

When computing the CRB on a single component of a vector parameter, it is easy to write $1/[\mathbf{J}(\boldsymbol{\theta})]_{ii}$ --- the "scalar formula applied to the $i$-th diagonal entry".

Correction:

The correct bound is $[\mathbf{J}(\boldsymbol{\theta})^{-1}]_{ii}$, which is always $\geq 1/[\mathbf{J}(\boldsymbol{\theta})]_{ii}$, with equality exactly when $\theta_i$ is Fisher-decoupled from the other parameters (in particular, whenever the FIM is diagonal). The inflation factor quantifies the price of estimating $\theta_i$ jointly with the nuisance parameters.
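A two-parameter numerical sanity check of the inflation (a minimal NumPy sketch; the matrix values are illustrative, not taken from the text):

```python
import numpy as np

# A hypothetical 2x2 Fisher information matrix with cross-coupling.
J = np.array([[2.0, 1.0],
              [1.0, 1.0]])

J_inv = np.linalg.inv(J)

crb_correct = J_inv[0, 0]      # correct CRB on theta_1: [J^{-1}]_{11}
crb_naive = 1.0 / J[0, 0]      # tempting but wrong: 1/[J]_{11}

print(f"[J^-1]_11 = {crb_correct:.3f}")   # 1.000
print(f"1/[J]_11  = {crb_naive:.3f}")     # 0.500
# The correct bound is twice the naive one: the off-diagonal coupling
# with theta_2 makes theta_1 harder to estimate jointly.
```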

Common Mistake: CRB Applies to Unbiased Estimators

Mistake:

"My estimator has MSE below 1/J(θ)1/J(\theta) --- I beat the CRB!"

Correction:

The CRB bounds the variance of unbiased estimators. A biased estimator can have smaller variance --- and smaller MSE --- than the CRB. For biased $\hat{\theta}$ with $\mathbb{E}_\theta[\hat{\theta}] = \theta + b(\theta)$, the correct Cramer--Rao-type inequality is $\text{Var}_\theta(\hat{\theta}) \geq (1 + b'(\theta))^2/J(\theta)$. The CRB should be compared against the variance of unbiased competitors; MSE comparisons need a different bound (e.g., Bayesian Cramer--Rao, van Trees).
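As a concrete instance (a standard illustration, reusing the Gaussian-mean example above): the shrinkage estimator $\hat{\theta} = a\bar{Y}$ with $0 < a < 1$ has bias $b(\theta) = (a-1)\theta$ and variance $a^2\sigma^2/n$, below the unbiased CRB $\sigma^2/n$; it exactly attains the biased bound $(1+b'(\theta))^2/J(\theta) = a^2\sigma^2/n$, and for small enough $|\theta|$ its MSE also beats $\sigma^2/n$. Nothing has been violated --- the comparison set changed.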

CRB vs. Monte Carlo for Amplitude Estimation in AWGN

Compare the empirical variance of the matched-filter estimator $\hat{A}_{\text{MF}} = \mathbf{s}^T\mathbf{y}/\|\mathbf{s}\|^2$ against the CRB $\sigma^2/\|\mathbf{s}\|^2$ as you sweep the SNR and the number of samples. The estimator sits exactly on the CRB, consistent with its efficiency.
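A minimal offline version of this comparison (a sketch, assuming a constant known pilot and Gaussian noise; parameter values and the SNR definition are illustrative, not the demo's defaults):

```python
import numpy as np

rng = np.random.default_rng(0)

n, A, snr_db = 32, 1.0, 10.0                 # samples, true amplitude, SNR in dB (illustrative)
s = np.ones(n)                               # assumed known signal shape
sigma2 = A**2 * np.sum(s**2) / (n * 10**(snr_db / 10))  # noise variance from the assumed SNR definition

trials = 20000
Y = A * s + rng.normal(scale=np.sqrt(sigma2), size=(trials, n))
A_hat = Y @ s / np.dot(s, s)                 # matched-filter estimate for each trial

emp_var = A_hat.var()                        # empirical variance of the estimator
crb = sigma2 / np.dot(s, s)                  # CRB = sigma^2 / ||s||^2

print(f"empirical variance: {emp_var:.3e}")
print(f"CRB:                {crb:.3e}")
# The two numbers agree up to Monte Carlo error, as expected for an efficient estimator.
```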


Fisher Information as Curvature of the Log-Likelihood

For a single Gaussian sample $Y \sim \mathcal{N}(\theta,\sigma^2)$, view the log-likelihood $\ell_\theta(y) = \log f_\theta(y)$ as a function of $\theta$ and watch how its curvature at the peak --- that is, $-\ell''_\theta(y) = 1/\sigma^2$ --- rises as $\sigma$ shrinks. Averaging this curvature over $Y$ gives the Fisher information.

⚠️ Engineering Note

Channel Estimation: Pilot SNR and the CRB

In a pilot-based channel estimator with $T_p$ orthogonal pilot symbols, the FIM for the complex channel coefficient scales as $T_p \cdot E_p/N_0$, and hence the CRB on the real and imaginary parts scales as $\sigma^2/T_p$. This is why 3GPP NR allocates a fraction of OFDM REs as DMRS: the pilot overhead directly buys CRB reduction. Increasing $T_p$ squeezes the bound linearly; the catch is that it also linearly decreases the throughput available for data. Every channel-estimator design is negotiating this trade.
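A minimal version of the scaling claim, under an assumed flat-fading pilot model (the notation $p_t$, $E_p$, $N_0$ is illustrative, not drawn verbatim from the standard): observe $Y_t = h\,p_t + Z_t$ for $t = 1,\ldots,T_p$, with known pilots satisfying $\sum_t |p_t|^2 = T_p E_p$ and $Z_t$ i.i.d. $\mathcal{CN}(0, N_0)$. The least-squares/ML estimate and its variance are
$$\hat{h} = \frac{\sum_t p_t^*\,Y_t}{\sum_t |p_t|^2}, \qquad \text{Var}(\hat{h}) = \frac{N_0}{T_p E_p},$$
which shrinks linearly in $T_p$, matching the CRB scaling quoted above.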

Practical Constraints
  • 5G NR DMRS density: 1 per 6 subcarriers (frequency), 1--4 OFDM symbols per slot (time), per TS 38.211

  • CRB on phase noise estimation scales as $1/(T \cdot \text{SNR})$ — motivates PTRS insertion at high FR2 frequencies

  • For narrowband IoT, long pilot sequences trade latency for CRB improvement

📋 Ref: 3GPP TS 38.211, Section 6.4
🎓 CommIT Contribution (2023)

CRB as One Pillar of the Sensing--Communication Tradeoff

F. Liu, G. Caire, IEEE Trans. Information Theory, vol. 69, no. 9

Integrated sensing and communication (ISAC) systems reuse a single waveform to both convey data and estimate target parameters. The resulting performance region has two axes: communication rate and sensing accuracy, where the latter is quantified via a CRB-type matrix on the target parameters (range, angle, Doppler). The work of Liu and Caire shows that the frontier of this region traces a Pareto-optimal tradeoff between the capacity expression from ITA Chapter 18 and the CRB derived exactly as in this chapter. The CRB is the operational "distortion" in the sensing rate--distortion formulation of ISAC.


Quick Check

An unbiased estimator of a scalar $\theta$ attains the CRB. What can you say about the score function?

It is zero everywhere

It equals $J(\theta)(\hat{\theta}(\mathbf{y}) - \theta)$

It depends only on θ\theta

Its mean is θ\theta

The CRB as Cauchy--Schwarz: Score and Estimator Co-linear at Efficiency

A visual proof of the scalar CRB: we treat the centered estimator $\hat{\theta}(\mathbf{Y}) - \theta$ and the score $s(\mathbf{Y};\theta)$ as vectors in $L^2(f_\theta)$, show that their inner product equals one, and read off the Cauchy--Schwarz lower bound on $\|\hat{\theta} - \theta\|^2$.
Efficiency is the statement that the estimator lies on the line spanned by the score.