Rao--Blackwell, Lehmann--Scheffe, and the MVUE

A Procedure for Building the MVUE

The previous section gave us sufficient statistics --- lossless compressors for $\theta$. This section converts them into estimators. The Rao--Blackwell theorem is the mechanism: it takes any unbiased estimator and projects it onto the sufficient $\sigma$-algebra to produce a better (smaller-variance) unbiased estimator. When the sufficient statistic is additionally complete, Lehmann--Scheffe certifies the projected estimator as the MVUE --- unique over all unbiased competitors. This is the textbook recipe every receiver designer walks through (often without naming the theorems).

Theorem: Rao--Blackwell Theorem

Let $g(\mathbf{Y})$ be an unbiased estimator of $\theta$ with finite second moment, and let $T(\mathbf{Y})$ be a sufficient statistic for $\theta$. Define

$$\tilde{g}(t) \;\triangleq\; \mathbb{E}_\theta[\,g(\mathbf{Y}) \mid T(\mathbf{Y}) = t\,] \qquad \text{(well-defined because $T$ is sufficient)}.$$

Then $\tilde{g}(T(\mathbf{Y}))$ is: (i) a statistic (does not depend on $\theta$); (ii) unbiased: $\mathbb{E}_\theta[\tilde{g}(T(\mathbf{Y}))] = \theta$; (iii) variance-improving: $\text{Var}_\theta(\tilde{g}(T(\mathbf{Y}))) \leq \text{Var}_\theta(g(\mathbf{Y}))$, with equality iff $g(\mathbf{Y}) = \tilde{g}(T(\mathbf{Y}))$ almost surely.

The conditional expectation is an $L^2$ orthogonal projection: it is the closest function of $T$ to the original estimator. Projection never increases the norm, which is exactly the variance reduction. The sufficiency assumption is what guarantees the conditional expectation does not depend on $\theta$ --- so the projected estimator is an honest statistic, computable from the data alone.
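
For completeness, the variance claim is the law of total variance in one line (a standard step, spelled out here rather than taken from the text):

$$\text{Var}_\theta(g(\mathbf{Y})) = \mathbb{E}_\theta[\text{Var}(g \mid T)] + \text{Var}_\theta(\mathbb{E}[g \mid T]) \;\geq\; \text{Var}_\theta(\tilde{g}(T(\mathbf{Y}))),$$

with equality iff $\text{Var}(g \mid T) = 0$ almost surely, i.e. iff $g$ was already a function of $T$.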


Historical Note: Rao (1945), Blackwell (1947), Scheffe (1950)

1945--1950

C. R. Rao's 1945 paper introduced the inequality and the conditioning trick for a single statistic. D. Blackwell's 1947 note showed the same procedure works for any convex loss function and clarified its orthogonal-projection character. E. L. Lehmann and H. Scheffe closed the circle in 1950 by adding the completeness requirement, producing the uniqueness statement: there is at most one unbiased estimator that is a function of the complete sufficient statistic, and it is the MVUE. These three short papers (Rao's is four pages) gave point estimation its modern skeleton.

Theorem: Lehmann--Scheffe Theorem

Let $T(\mathbf{Y})$ be a complete sufficient statistic for the family $\{f_\theta : \theta \in \Lambda\}$. If $\hat{\theta}(\mathbf{Y}) = \psi(T(\mathbf{Y}))$ is an unbiased estimator of $\theta$ that is a function of $T$, then $\hat{\theta}$ is the unique (almost-surely) minimum-variance unbiased estimator (MVUE).

Rao--Blackwell says: for any unbiased competitor $g$, the projected estimator $\mathbb{E}[g \mid T]$ is a function of $T$ and has weakly smaller variance. Completeness of $T$ says: there is at most one unbiased function of $T$ --- any two would differ by a function of $T$ with zero mean, which completeness kills. Combining: the unbiased function of $T$ is the MVUE.
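
In symbols, the uniqueness step reads: if $\psi_1(T)$ and $\psi_2(T)$ are both unbiased for $\theta$, then

$$\mathbb{E}_\theta[\psi_1(T) - \psi_2(T)] = 0 \quad \text{for all } \theta \in \Lambda \;\;\Longrightarrow\;\; \psi_1(T) = \psi_2(T) \ \text{almost surely},$$

which is the definition of completeness applied to $h = \psi_1 - \psi_2$.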


Recipe for Constructing the MVUE

Complexity: Depends on step 6 --- closed form in the exponential family, Monte Carlo elsewhere.
Input: Parametric family $\{f_\theta : \theta \in \Lambda\}$ and observation $\mathbf{Y}$.
Output: MVUE $\hat{\theta}(\mathbf{Y})$ (when it exists).
1. Write the density $f_\theta(\mathbf{y})$ and identify the $\theta$-dependence.
2. Apply Fisher--Neyman to find a sufficient statistic $T(\mathbf{Y})$.
3. Verify $T$ is minimal sufficient (Lehmann--Scheffe minimality criterion).
4. Verify $T$ is complete (usually by recognizing an exponential family with full-dimensional natural parameter).
5. Find any unbiased estimator $g(\mathbf{Y})$ of $\theta$ (can be naive: a single sample, or a small subvector).
6. Compute $\hat{\theta}(\mathbf{Y}) = \tilde{g}(T(\mathbf{Y})) = \mathbb{E}_\theta[g(\mathbf{Y}) \mid T(\mathbf{Y})]$.
7. Return $\hat{\theta}$: by Lehmann--Scheffe, this is the unique MVUE.

In practice steps 4--6 collapse when the family is a textbook exponential family: the natural sufficient statistic is automatically complete, and an unbiased function of $T$ is often visible by inspection (e.g., rescaling the sample mean).
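
A concrete end-to-end run of the recipe, on a hypothetical example of my choosing (not one from this section): for $Y_1, \ldots, Y_n$ i.i.d. Poisson$(\lambda)$, estimate $e^{-\lambda}$. The indicator $g(\mathbf{Y}) = \mathbb{1}\{Y_1 = 0\}$ is unbiased (step 5); $T = \sum_i Y_i$ is complete sufficient; and since $Y_1 \mid T = t \sim \text{Binomial}(t, 1/n)$, step 6 has the closed form $\tilde{g}(T) = ((n-1)/n)^T$. The sketch below checks both unbiasedness and the variance drop by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, trials = 10, 2.0, 200_000

# trials independent samples of size n from Poisson(lam)
Y = rng.poisson(lam, size=(trials, n))

# Step 5: naive unbiased estimator of exp(-lam); uses only the first coordinate.
g = (Y[:, 0] == 0).astype(float)

# Step 6: Rao-Blackwellize. Given T = sum(Y), Y_1 | T ~ Binomial(T, 1/n),
# so E[g | T] = ((n-1)/n)**T in closed form.
T = Y.sum(axis=1)
g_rb = ((n - 1) / n) ** T

print(f"target exp(-lam)      = {np.exp(-lam):.4f}")
print(f"naive:         mean = {g.mean():.4f}, var = {g.var():.5f}")
print(f"Rao-Blackwell: mean = {g_rb.mean():.4f}, var = {g_rb.var():.5f}")
# Both means agree with the target; the projected estimator's variance is far smaller.
```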

Example: MVUE of Amplitude in AWGN via Rao--Blackwell

Observe Y=As+Z\mathbf{Y} = A \mathbf{s} + \mathbf{Z}, Z\mathbf{Z} \sim i.i.d. N(0,σ2)\mathcal{N}(0, \sigma^2), ARA \in \mathbb{R} unknown. Starting from the naive unbiased estimator g(Y)=Y1/s1g(\mathbf{Y}) = Y_1/s_1 (for s10s_1 \neq 0), construct the MVUE via Rao--Blackwell.
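
One way the construction goes (standard Gaussian manipulations, filling in what the example asks for): $T(\mathbf{Y}) = \mathbf{s}^T\mathbf{Y}$ is complete sufficient (one-parameter Gaussian exponential family), and since $g$ and $T$ are jointly Gaussian,

$$\mathbb{E}[\,Y_1/s_1 \mid T\,] = A + \frac{\text{Cov}(Y_1/s_1,\,T)}{\text{Var}(T)}\bigl(T - \mathbb{E}[T]\bigr) = A + \frac{\sigma^2}{\sigma^2\lVert\mathbf{s}\rVert^2}\bigl(T - A\lVert\mathbf{s}\rVert^2\bigr) = \frac{\mathbf{s}^T\mathbf{Y}}{\lVert\mathbf{s}\rVert^2},$$

so the matched-filter estimator $\hat{A} = \mathbf{s}^T\mathbf{Y}/\lVert\mathbf{s}\rVert^2$ is the MVUE, with variance $\sigma^2/\lVert\mathbf{s}\rVert^2$.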

Example: MVUE of $\sigma^2$ from a Gaussian Sample

Given $Y_1, \ldots, Y_n$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with both unknown, find the MVUE of $\sigma^2$.
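
Sketch of the standard resolution: $T = (\sum_i Y_i, \sum_i Y_i^2)$ is complete sufficient (two-parameter Gaussian exponential family), and the sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2, \qquad \mathbb{E}[S^2] = \sigma^2,$$

is an unbiased function of $T$, so Lehmann--Scheffe certifies $S^2$ as the MVUE of $\sigma^2$. Note the $1/(n-1)$: the MLE's $1/n$ normalization is biased, so the MLE is not the MVUE here.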

Key Takeaway

The MVUE workflow: (1) find a complete sufficient statistic; (2) construct any unbiased function of it. That function is the MVUE. No calculus of variations needed --- sufficiency and completeness do the work. The subtlety that efficient $\Rightarrow$ MVUE but not conversely is the reason we need this machinery beyond the CRB.

Efficient vs. MVUE vs. Unbiased

| Property | Unbiased | MVUE | Efficient |
| --- | --- | --- | --- |
| Bias | 0 | 0 | 0 |
| Variance | Anything $< \infty$ | Minimum among unbiased | Equals CRB $1/J(\theta)$ for all $\theta$ |
| Always exists? | Usually | Not always; needs complete sufficient statistic | Rare --- requires affine score |
| Certificate | Compute mean | Lehmann--Scheffe | Score linear in $\hat{\theta}$ |
| Implication chain | Needed for MVUE | $\Leftarrow$ efficient | $\Rightarrow$ MVUE (if unbiased) |

Common Mistake: Sufficient Does Not Imply Complete

Mistake:

Treating any sufficient statistic as a green light to apply Lehmann--Scheffe.

Correction:

Lehmann--Scheffe needs completeness, not just sufficiency. Example: in the location family $\mathcal{N}(\theta, 1)$ with $\theta \in \{-1, +1\}$ (a discrete two-point parameter set), the sample mean is sufficient but not complete --- the parameter image is too small. Without completeness, multiple unbiased functions of $T$ can exist, and none is "the" MVUE. Completeness is usually secured by the full-dimensionality condition in the exponential family theorem.
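
One explicit witness of incompleteness (my computation, using the Gaussian third moment $\mathbb{E}_\theta[T^3] = \theta^3 + 3\theta/n$ for $T = \bar{Y} \sim \mathcal{N}(\theta, 1/n)$):

$$h(T) = \Bigl(1 + \tfrac{3}{n}\Bigr)T - T^3 \quad\Longrightarrow\quad \mathbb{E}_\theta[h(T)] = \Bigl(1 + \tfrac{3}{n}\Bigr)\theta - \Bigl(\theta^3 + \tfrac{3\theta}{n}\Bigr) = 0 \quad \text{for } \theta = \pm 1,$$

yet $h \not\equiv 0$. Adding $h(T)$ to any unbiased estimator yields another unbiased function of $T$, so uniqueness fails.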

Common Mistake: Sometimes No MVUE Exists

Mistake:

Assuming the MVUE always exists and that the workflow will terminate.

Correction:

There are families with no uniformly minimum-variance unbiased estimator: the variance at different $\theta$ is minimized by different estimators, and no single function dominates the others. This typically happens outside the exponential family, or with nuisance parameters that destroy completeness. In such cases one resorts to locally minimum variance, asymptotic criteria, or Bayesian/risk-based approaches.

Rao--Blackwellization in Action

Start with a noisy naive unbiased estimator $g(\mathbf{Y}) = Y_1 / s_1$ for the amplitude $A$, and apply Rao--Blackwell with respect to $T(\mathbf{Y}) = \mathbf{s}^T\mathbf{Y}$. Compare the sampling distributions and variances as you sweep $n$ and SNR.

[Interactive demo --- parameters: $n$ (default 20) and SNR in dB (default 0).]
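
For readers without the interactive widget, a minimal offline version of the same experiment (using the closed-form projection $\mathbb{E}[g \mid T] = T/\lVert\mathbf{s}\rVert^2$ derived in the amplitude example above; the pilot $\mathbf{s}$ here is an arbitrary random draw):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, sigma, trials = 20, 1.0, 1.0, 100_000

s = rng.standard_normal(n)   # fixed pilot waveform (s_1 != 0 with probability 1)
Y = A * s + sigma * rng.standard_normal((trials, n))

g_naive = Y[:, 0] / s[0]     # naive unbiased estimator Y_1 / s_1
T = Y @ s                    # sufficient statistic s^T Y
g_rb = T / (s @ s)           # Rao-Blackwellized: E[g | T] = T / ||s||^2

print(f"naive: mean {g_naive.mean():+.4f}, var {g_naive.var():.4f}"
      f" (theory {sigma**2 / s[0]**2:.4f})")
print(f"RB   : mean {g_rb.mean():+.4f}, var {g_rb.var():.4f}"
      f" (theory {sigma**2 / (s @ s):.4f})")
```
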
🔧Engineering Note

Least Squares as the BLUE: An MVUE Within a Class

Under a linear observation model $\mathbf{Y} = \mathbf{A}\boldsymbol{\theta} + \mathbf{W}$ with zero-mean noise of covariance $\boldsymbol{\Sigma}$, the weighted least-squares estimator $\hat{\boldsymbol{\theta}}_{\text{WLS}} = (\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A})^{-1} \mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{Y}$ is the best linear unbiased estimator (BLUE) --- the MVUE within the class of linear estimators, regardless of the noise distribution (Gauss--Markov theorem). When the noise is Gaussian, this is also the global MVUE. In channel estimation, this is why LS is the reference point even when the channel covariance is unknown: MMSE improves on it by exploiting priors, but LS is optimal when the prior is absent.
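
A minimal numpy sketch of the Gauss--Markov claim, on a toy setup of my own (hypothetical $\mathbf{A}$ and diagonal $\boldsymbol{\Sigma}$, not from the text), checking unbiasedness and that the sample covariance of $\hat{\boldsymbol{\theta}}_{\text{WLS}}$ sits on the $(\mathbf{A}^T\boldsymbol{\Sigma}^{-1}\mathbf{A})^{-1}$ floor:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, trials = 40, 3, 50_000

A = rng.standard_normal((m, p))             # known observation matrix
theta = np.array([1.0, -0.5, 2.0])          # true parameter
Sigma_diag = rng.uniform(0.5, 2.0, m)       # known (diagonal) noise covariance

W = rng.standard_normal((trials, m)) * np.sqrt(Sigma_diag)  # zero-mean noise
Y = theta @ A.T + W                         # trials x m stacked observations

Si_A = A / Sigma_diag[:, None]              # Sigma^{-1} A (diagonal case)
G = np.linalg.inv(A.T @ Si_A)               # (A^T Sigma^{-1} A)^{-1}: variance floor
theta_wls = Y @ Si_A @ G                    # one WLS estimate per trial (G symmetric)

print("mean estimate  :", theta_wls.mean(axis=0))          # ~ theta (unbiased)
print("sample cov diag:", np.cov(theta_wls.T).diagonal())
print("floor diagonal :", G.diagonal())                    # Gauss-Markov floor
```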

Practical Constraints
  • Requires $\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A}$ invertible (identifiability)

  • Variance floor: $(\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A})^{-1}$ --- the vector CRB in the Gaussian case

  • Non-Gaussian noise: optimality holds only within the linear class; nonlinear estimators may beat it

📋 Ref: IEEE 802.11 pilot-aided channel estimation, Section 17.3.11

Quick Check

Under what condition does Rao--Blackwell give zero variance reduction?

The original estimator $g(\mathbf{Y})$ is already a function of the sufficient statistic $T(\mathbf{Y})$

The sufficient statistic has the same dimension as the data

The parameter is a scalar

The estimator is efficient

Quick Check

Which hypothesis in Lehmann--Scheffe guarantees uniqueness of the MVUE?

Sufficiency of $T$

Completeness of $T$

Unbiasedness of $g$

The exponential-family form

Why This Matters: Channel Estimation --- LS is the BLUE, MMSE Goes Beyond Unbiasedness

Classical pilot-based channel estimation returns the least-squares estimator $\hat{\mathbf{h}}_{\text{LS}} = (\mathbf{X}^H\mathbf{X})^{-1}\mathbf{X}^H\mathbf{y}$ --- the BLUE by Gauss--Markov, and MVUE in the Gaussian case. MMSE channel estimators trade off unbiasedness for smaller MSE by exploiting the channel covariance $\boldsymbol{\Sigma}_\mathbf{h}$; they live in a different optimality framework (Bayesian, Chapter 7). Both sides of this story are already visible in the machinery of this chapter.