Rao--Blackwell, Lehmann--Scheffe, and the MVUE

A Procedure for Building the MVUE

The previous section gave us sufficient statistics --- lossless compressors for $\theta$. This section converts them into estimators. The Rao--Blackwell theorem is the mechanism: it takes any unbiased estimator and projects it onto the sufficient $\sigma$-algebra to produce a better (smaller-variance) unbiased estimator. When the sufficient statistic is additionally complete, Lehmann--Scheffe certifies the projected estimator as the MVUE --- unique over all unbiased competitors. This is the textbook recipe every receiver designer walks through (often without naming the theorems).

Theorem: Rao--Blackwell Theorem

Let $g(\mathbf{Y})$ be an unbiased estimator of $\theta$ with finite second moment, and let $T(\mathbf{Y})$ be a sufficient statistic for $\theta$. Define

$$\tilde{g}(t) \;\triangleq\; \mathbb{E}_\theta[\,g(\mathbf{Y}) \mid T(\mathbf{Y}) = t\,] \qquad \text{(well-defined because $T$ is sufficient)}.$$

Then $\tilde{g}(T(\mathbf{Y}))$ is: (i) a statistic (does not depend on $\theta$); (ii) unbiased: $\mathbb{E}_\theta[\tilde{g}(T(\mathbf{Y}))] = \theta$; (iii) variance-improving: $\text{Var}_\theta(\tilde{g}(T(\mathbf{Y}))) \leq \text{Var}_\theta(g(\mathbf{Y}))$, with equality iff $g(\mathbf{Y}) = \tilde{g}(T(\mathbf{Y}))$ almost surely.

The conditional expectation is an $L^2$ orthogonal projection: it is the closest function of $T$ to the original estimator. Projection never increases the norm, which is exactly the variance reduction. The sufficiency assumption is what guarantees the conditional expectation does not depend on $\theta$ --- so the projected estimator is an honest statistic, computable from the data alone.
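
For completeness, the variance claim is the law of total variance in one line (a standard step, spelled out here rather than taken from the text):

$$\text{Var}_\theta(g(\mathbf{Y})) = \mathbb{E}_\theta[\text{Var}(g \mid T)] + \text{Var}_\theta(\mathbb{E}[g \mid T]) \;\geq\; \text{Var}_\theta(\tilde{g}(T(\mathbf{Y}))),$$

with equality iff $\text{Var}(g \mid T) = 0$ almost surely, i.e. iff $g$ was already a function of $T$.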


Historical Note: Rao (1945), Blackwell (1947), Scheffe (1950)

1945--1950

C. R. Rao's 1945 paper introduced the inequality and the conditioning trick for a single statistic. D. Blackwell's 1947 note showed the same procedure works for any convex loss function and clarified its orthogonal-projection character. E. L. Lehmann and H. Scheffe closed the circle in 1950 by adding the completeness requirement, producing the uniqueness statement: there is at most one unbiased estimator that is a function of the complete sufficient statistic, and it is the MVUE. These three short papers (Rao's is four pages) gave point estimation its modern skeleton.

Theorem: Lehmann--Scheffe Theorem

Let $T(\mathbf{Y})$ be a complete sufficient statistic for the family $\{f_\theta : \theta \in \Lambda\}$. If $\hat{\theta}(\mathbf{Y}) = \psi(T(\mathbf{Y}))$ is an unbiased estimator of $\theta$ that is a function of $T$, then $\hat{\theta}$ is the unique (almost-surely) minimum-variance unbiased estimator (MVUE).

Rao--Blackwell says: for any unbiased competitor $g$, the projected estimator $\mathbb{E}[g \mid T]$ is a function of $T$ and has weakly smaller variance. Completeness of $T$ says: there is at most one unbiased function of $T$ --- any two would differ by a function of $T$ with zero mean, which completeness kills. Combining: the unbiased function of $T$ is the MVUE.
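
In symbols, the uniqueness step reads: if $\psi_1(T)$ and $\psi_2(T)$ are both unbiased for $\theta$, then

$$\mathbb{E}_\theta[\psi_1(T) - \psi_2(T)] = 0 \quad \text{for all } \theta \in \Lambda \;\;\Longrightarrow\;\; \psi_1(T) = \psi_2(T) \ \text{almost surely},$$

which is the definition of completeness applied to $h = \psi_1 - \psi_2$.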


Recipe for Constructing the MVUE

Complexity: Depends on step 6 --- closed form in the exponential family, Monte Carlo elsewhere.
Input: Parametric family $\{f_\theta : \theta \in \Lambda\}$ and observation $\mathbf{Y}$.
Output: MVUE $\hat{\theta}(\mathbf{Y})$ (when it exists).
1. Write the density $f_\theta(\mathbf{y})$ and identify the $\theta$-dependence.
2. Apply Fisher--Neyman to find a sufficient statistic $T(\mathbf{Y})$.
3. Verify $T$ is minimal sufficient (Lehmann--Scheffe minimality criterion).
4. Verify $T$ is complete (usually by recognizing an exponential family with full-dimensional natural parameter).
5. Find any unbiased estimator $g(\mathbf{Y})$ of $\theta$ (can be naive: a single sample, or a small subvector).
6. Compute $\hat{\theta}(\mathbf{Y}) = \tilde{g}(T(\mathbf{Y})) = \mathbb{E}_\theta[g(\mathbf{Y}) \mid T(\mathbf{Y})]$.
7. Return $\hat{\theta}$: by Lehmann--Scheffe, this is the unique MVUE.

In practice steps 4--6 collapse when the family is a textbook exponential family: the natural sufficient statistic is automatically complete, and an unbiased function of $T$ is often visible by inspection (e.g., rescaling the sample mean).
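
A concrete end-to-end run of the recipe, on a hypothetical example of my choosing (not one from this section): for $Y_1, \ldots, Y_n$ i.i.d. Poisson$(\lambda)$, estimate $e^{-\lambda}$. The indicator $g(\mathbf{Y}) = \mathbb{1}\{Y_1 = 0\}$ is unbiased (step 5); $T = \sum_i Y_i$ is complete sufficient; and since $Y_1 \mid T = t \sim \text{Binomial}(t, 1/n)$, step 6 has the closed form $\tilde{g}(T) = ((n-1)/n)^T$. The sketch below checks both unbiasedness and the variance drop by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, trials = 10, 2.0, 200_000

# trials independent samples of size n from Poisson(lam)
Y = rng.poisson(lam, size=(trials, n))

# Step 5: naive unbiased estimator of exp(-lam); uses only the first coordinate.
g = (Y[:, 0] == 0).astype(float)

# Step 6: Rao-Blackwellize. Given T = sum(Y), Y_1 | T ~ Binomial(T, 1/n),
# so E[g | T] = ((n-1)/n)**T in closed form.
T = Y.sum(axis=1)
g_rb = ((n - 1) / n) ** T

print(f"target exp(-lam)      = {np.exp(-lam):.4f}")
print(f"naive:         mean = {g.mean():.4f}, var = {g.var():.5f}")
print(f"Rao-Blackwell: mean = {g_rb.mean():.4f}, var = {g_rb.var():.5f}")
# Both means agree with the target; the projected estimator's variance is far smaller.
```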

Example: MVUE of Amplitude in AWGN via Rao--Blackwell

Observe Y=As+Z\mathbf{Y} = A \mathbf{s} + \mathbf{Z}, Z\mathbf{Z} \sim i.i.d. N(0,σ2)\mathcal{N}(0, \sigma^2), ARA \in \mathbb{R} unknown. Starting from the naive unbiased estimator g(Y)=Y1/s1g(\mathbf{Y}) = Y_1/s_1 (for s10s_1 \neq 0), construct the MVUE via Rao--Blackwell.
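
One way the construction goes (standard Gaussian manipulations, filling in what the example asks for): $T(\mathbf{Y}) = \mathbf{s}^T\mathbf{Y}$ is complete sufficient (one-parameter Gaussian exponential family), and since $g$ and $T$ are jointly Gaussian,

$$\mathbb{E}[\,Y_1/s_1 \mid T\,] = A + \frac{\text{Cov}(Y_1/s_1,\,T)}{\text{Var}(T)}\bigl(T - \mathbb{E}[T]\bigr) = A + \frac{\sigma^2}{\sigma^2\lVert\mathbf{s}\rVert^2}\bigl(T - A\lVert\mathbf{s}\rVert^2\bigr) = \frac{\mathbf{s}^T\mathbf{Y}}{\lVert\mathbf{s}\rVert^2},$$

so the matched-filter estimator $\hat{A} = \mathbf{s}^T\mathbf{Y}/\lVert\mathbf{s}\rVert^2$ is the MVUE, with variance $\sigma^2/\lVert\mathbf{s}\rVert^2$.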

Example: MVUE of $\sigma^2$ from a Gaussian Sample

Given $Y_1, \ldots, Y_n$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with both unknown, find the MVUE of $\sigma^2$.
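
Sketch of the standard resolution: $T = (\sum_i Y_i, \sum_i Y_i^2)$ is complete sufficient (two-parameter Gaussian exponential family), and the sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2, \qquad \mathbb{E}[S^2] = \sigma^2,$$

is an unbiased function of $T$, so Lehmann--Scheffe certifies $S^2$ as the MVUE of $\sigma^2$. Note the $1/(n-1)$: the MLE's $1/n$ normalization is biased, so the MLE is not the MVUE here.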

Key Takeaway

The MVUE workflow: (1) find a complete sufficient statistic; (2) construct any unbiased function of it. That function is the MVUE. No calculus of variations needed --- sufficiency and completeness do the work. The subtlety that efficient $\Rightarrow$ MVUE but not conversely is the reason we need this machinery beyond the CRB.

Efficient vs. MVUE vs. Unbiased

| Property | Unbiased | MVUE | Efficient |
| --- | --- | --- | --- |
| Bias | 0 | 0 | 0 |
| Variance | Anything $< \infty$ | Minimum among unbiased | Equals CRB $1/J(\theta)$ for all $\theta$ |
| Always exists? | Usually | Not always; needs complete sufficient statistic | Rare --- requires affine score |
| Certificate | Compute mean | Lehmann--Scheffe | Score linear in $\hat{\theta}$ |
| Implication chain | Needed for MVUE | $\Leftarrow$ efficient | $\Rightarrow$ MVUE (if unbiased) |

Common Mistake: Sufficient Does Not Imply Complete

Mistake:

Treating any sufficient statistic as a green light to apply Lehmann--Scheffe.

Correction:

Lehmann--Scheffe needs completeness, not just sufficiency. Example: in the location family $\mathcal{N}(\theta, 1)$ with $\theta \in \{-1, +1\}$ (a discrete two-point parameter set), the sample mean is sufficient but not complete --- the parameter image is too small. Without completeness, multiple unbiased functions of $T$ can exist, and none is "the" MVUE. Completeness is usually secured by the full-dimensionality condition in the exponential family theorem.
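
One explicit witness of incompleteness (my computation, using the Gaussian third moment $\mathbb{E}_\theta[T^3] = \theta^3 + 3\theta/n$ for $T = \bar{Y} \sim \mathcal{N}(\theta, 1/n)$):

$$h(T) = \Bigl(1 + \tfrac{3}{n}\Bigr)T - T^3 \quad\Longrightarrow\quad \mathbb{E}_\theta[h(T)] = \Bigl(1 + \tfrac{3}{n}\Bigr)\theta - \Bigl(\theta^3 + \tfrac{3\theta}{n}\Bigr) = 0 \quad \text{for } \theta = \pm 1,$$

yet $h \not\equiv 0$. Adding $h(T)$ to any unbiased estimator yields another unbiased function of $T$, so uniqueness fails.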

Common Mistake: Sometimes No MVUE Exists

Mistake:

Assuming the MVUE always exists and that the workflow will terminate.

Correction:

There are families with no uniformly minimum-variance unbiased estimator: the variance at different $\theta$ is minimized by different estimators, and no single function dominates the others. This typically happens outside the exponential family, or with nuisance parameters that destroy completeness. In such cases one resorts to locally minimum variance, asymptotic criteria, or Bayesian/risk-based approaches.

Rao--Blackwellization in Action

Start with a noisy naive unbiased estimator $g(\mathbf{Y}) = Y_1 / s_1$ for the amplitude $A$, and apply Rao--Blackwell with respect to $T(\mathbf{Y}) = \mathbf{s}^T\mathbf{Y}$. Compare the sampling distributions and variances as you sweep $n$ and SNR.

[Interactive demo --- parameters: $n$ (default 20) and SNR in dB (default 0).]
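
For readers without the interactive widget, a minimal offline version of the same experiment (using the closed-form projection $\mathbb{E}[g \mid T] = T/\lVert\mathbf{s}\rVert^2$ derived in the amplitude example above; the pilot $\mathbf{s}$ here is an arbitrary random draw):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, sigma, trials = 20, 1.0, 1.0, 100_000

s = rng.standard_normal(n)   # fixed pilot waveform (s_1 != 0 with probability 1)
Y = A * s + sigma * rng.standard_normal((trials, n))

g_naive = Y[:, 0] / s[0]     # naive unbiased estimator Y_1 / s_1
T = Y @ s                    # sufficient statistic s^T Y
g_rb = T / (s @ s)           # Rao-Blackwellized: E[g | T] = T / ||s||^2

print(f"naive: mean {g_naive.mean():+.4f}, var {g_naive.var():.4f}"
      f" (theory {sigma**2 / s[0]**2:.4f})")
print(f"RB   : mean {g_rb.mean():+.4f}, var {g_rb.var():.4f}"
      f" (theory {sigma**2 / (s @ s):.4f})")
```
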
🔧Engineering Note

Least Squares as the BLUE: An MVUE Within a Class

Under a linear observation model $\mathbf{Y} = \mathbf{A}\boldsymbol{\theta} + \mathbf{W}$ with zero-mean noise of covariance $\boldsymbol{\Sigma}$, the weighted least-squares estimator $\hat{\boldsymbol{\theta}}_{\text{WLS}} = (\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A})^{-1} \mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{Y}$ is the best linear unbiased estimator (BLUE) --- the MVUE within the class of linear estimators, regardless of the noise distribution (Gauss--Markov theorem). When the noise is Gaussian, this is also the global MVUE. In channel estimation, this is why LS is the reference point even when the channel covariance is unknown: MMSE improves on it by exploiting priors, but LS is optimal when the prior is absent.
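
A minimal numpy sketch of the Gauss--Markov claim, on a toy setup of my own (hypothetical $\mathbf{A}$ and diagonal $\boldsymbol{\Sigma}$, not from the text), checking unbiasedness and that the sample covariance of $\hat{\boldsymbol{\theta}}_{\text{WLS}}$ sits on the $(\mathbf{A}^T\boldsymbol{\Sigma}^{-1}\mathbf{A})^{-1}$ floor:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, trials = 40, 3, 50_000

A = rng.standard_normal((m, p))             # known observation matrix
theta = np.array([1.0, -0.5, 2.0])          # true parameter
Sigma_diag = rng.uniform(0.5, 2.0, m)       # known (diagonal) noise covariance

W = rng.standard_normal((trials, m)) * np.sqrt(Sigma_diag)  # zero-mean noise
Y = theta @ A.T + W                         # trials x m stacked observations

Si_A = A / Sigma_diag[:, None]              # Sigma^{-1} A (diagonal case)
G = np.linalg.inv(A.T @ Si_A)               # (A^T Sigma^{-1} A)^{-1}: variance floor
theta_wls = Y @ Si_A @ G                    # one WLS estimate per trial (G symmetric)

print("mean estimate  :", theta_wls.mean(axis=0))          # ~ theta (unbiased)
print("sample cov diag:", np.cov(theta_wls.T).diagonal())
print("floor diagonal :", G.diagonal())                    # Gauss-Markov floor
```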

Practical Constraints
  • Requires $\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A}$ invertible (identifiability)

  • Variance floor: $(\mathbf{A}^T \boldsymbol{\Sigma}^{-1} \mathbf{A})^{-1}$ --- the vector CRB in the Gaussian case

  • Non-Gaussian noise: optimality holds only within the linear class; nonlinear estimators may beat it

📋 Ref: IEEE 802.11 pilot-aided channel estimation, Section 17.3.11

Quick Check

Under what condition does Rao--Blackwell give zero variance reduction?

The original estimator $g(\mathbf{Y})$ is already a function of the sufficient statistic $T(\mathbf{Y})$

The sufficient statistic has the same dimension as the data

The parameter is a scalar

The estimator is efficient

Quick Check

Which hypothesis in Lehmann--Scheffe guarantees uniqueness of the MVUE?

Sufficiency of $T$

Completeness of $T$

Unbiasedness of $g$

The exponential-family form

Why This Matters: Channel Estimation --- LS is the BLUE, MMSE Goes Beyond Unbiasedness

Classical pilot-based channel estimation returns the least-squares estimator $\hat{\mathbf{h}}_{\text{LS}} = (\mathbf{X}^H\mathbf{X})^{-1}\mathbf{X}^H\mathbf{y}$ --- the BLUE by Gauss--Markov, and MVUE in the Gaussian case. MMSE channel estimators trade off unbiasedness for smaller MSE by exploiting the channel covariance $\boldsymbol{\Sigma}_\mathbf{h}$; they live in a different optimality framework (Bayesian, Chapter 7). Both sides of this story are already visible in the machinery of this chapter.