Sufficient Statistics and the Exponential Family
Compressing Data Without Losing Information
A raw observation vector may have thousands of components, but the inference task may only need a handful of summaries: a sum, a sum of squares, an inner product with a known pilot. Sufficiency makes this idea precise: a statistic $T(X)$ is sufficient for $\theta$ if, once $T(X)$ is known, the remaining variability in $X$ says nothing about $\theta$. Conditioning on $T$ is a lossless compression of the data for the purpose of inference about $\theta$ --- this is the rigorous statement behind every matched filter, every correlator, every frequency-bin summary in a receiver.
Definition: Sufficient Statistic
Sufficient Statistic
A statistic $T = T(X)$ is sufficient for $\theta$ in the family $\{p_\theta : \theta \in \Theta\}$ if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$: $$p_\theta(x \mid T(x) = t) = p(x \mid T(x) = t) \quad \text{for all } \theta.$$ Equivalently, the parameter $\theta$ is conditionally independent of $X$ given $T(X)$.
Sufficiency is a property of the statistic, not of an estimator. Every one-to-one transform of a sufficient statistic is sufficient. The trivial statistic $T(X) = X$ is always sufficient --- the interesting question is how far we can compress while preserving sufficiency.
Theorem: Fisher--Neyman Factorization Theorem
A statistic $T(X)$ is sufficient for $\theta$ in the family $\{p_\theta\}$ if and only if the density admits the factorization $$p_\theta(x) = g_\theta(T(x))\, h(x)$$ for some measurable $g_\theta$ and $h$, with $h$ independent of $\theta$.
Read the theorem as a certificate of sufficiency: if the likelihood's $\theta$-dependence enters only through $T(x)$, then $T$ is sufficient. In practice we never compute the conditional distribution --- we stare at the density, group every term where $\theta$ and $x$ interact, and read off $T(x)$ directly.
For the "if" direction, compute $P_\theta(X = x \mid T = t)$ by dividing $p_\theta(x)$ by $\sum_{x' : T(x') = t} p_\theta(x')$ and use the factorization.
For the "only if" direction, set $g_\theta(t) = P_\theta(T = t)$ and $h(x) = P(X = x \mid T = T(x))$ and verify independence from $\theta$.
Sufficient direction: factorization implies sufficiency
Suppose $p_\theta(x) = g_\theta(T(x))\, h(x)$. For discrete $X$ (the continuous case is analogous with densities), $$P_\theta(X = x \mid T = t) = \frac{p_\theta(x)}{\sum_{x' : T(x') = t} p_\theta(x')} = \frac{g_\theta(t)\, h(x)}{g_\theta(t) \sum_{x' : T(x') = t} h(x')}.$$ Hence $P_\theta(X = x \mid T = t) = h(x) / \sum_{x' : T(x') = t} h(x')$, which does not depend on $\theta$.
Necessary direction: sufficiency implies factorization
Suppose $T$ is sufficient: the conditional $P_\theta(X = x \mid T = t)$ does not depend on $\theta$; call it $h(x)$. Then $p_\theta(x) = P_\theta(T = T(x)) \cdot P(X = x \mid T = T(x))$. Set $g_\theta(t) = P_\theta(T = t)$ and $h(x) = P(X = x \mid T = T(x))$. The continuous case requires a version of the Radon--Nikodym theorem, but the argument is structurally identical.
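The discrete argument can be checked numerically. The sketch below (my own illustration; the sample size $n = 4$ and the grid of $p$ values are arbitrary choices) takes a Bernoulli$(p)$ sample with $T(x) = \sum_i x_i$ and verifies that the conditional law of the bit pattern given $T$ is the same for every $p$:

```python
# Numerical check of the "if" direction for a Bernoulli(p) sample:
# conditioned on T = sum(x), the distribution over bit patterns is
# uniform over arrangements, independent of p.
from itertools import product

n = 4  # arbitrary sample size for the demonstration

def conditional_given_T(p, t):
    """P(X = x | T(X) = t) for each binary pattern x with sum(x) = t."""
    patterns = [x for x in product([0, 1], repeat=n) if sum(x) == t]
    probs = [p ** t * (1 - p) ** (n - t) for _ in patterns]
    z = sum(probs)
    return [q / z for q in probs]

# The conditional is identical for every p: T = sum(x) is sufficient.
for t in range(n + 1):
    base = conditional_given_T(0.1, t)
    for p in (0.3, 0.5, 0.9):
        assert all(abs(a - b) < 1e-12
                   for a, b in zip(conditional_given_T(p, t), base))
```

Each conditional comes out uniform over the $\binom{n}{t}$ arrangements, exactly as the factorization predicts.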
Example: Factorization for the Gaussian Location--Scale Family
Let $X_1, \dots, X_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with both parameters unknown, $\theta = (\mu, \sigma^2)$. Identify a sufficient statistic.
Expand the log-likelihood
$\log p_\theta(x) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2$. Expanding the square, $\sum_i (x_i - \mu)^2 = \sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2$, so the exponent depends on $x$ only through $\left(\sum_i x_i, \sum_i x_i^2\right)$.
Read off the sufficient statistic
Hence $T(x) = \left(\sum_i x_i, \sum_i x_i^2\right)$ is sufficient for $(\mu, \sigma^2)$: the entire $n$-dimensional observation is compressed into two numbers. Equivalently, $(\bar{x}, s^2)$ is sufficient --- a one-to-one transform of $T$.
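A quick numerical sketch of this compression (the two datasets below are arbitrary, constructed by hand to share the same sum and sum of squares): two different samples with equal $T(x)$ produce identical Gaussian log-likelihoods at every $(\mu, \sigma^2)$.

```python
import math

def loglik(data, mu, sigma2):
    """Gaussian i.i.d. log-likelihood."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

x1 = [1.0, 2.0, 3.0]                      # sum = 6, sum of squares = 14
a = 1 / math.sqrt(3)
x2 = [2 + a, 2 + a, 2 - 2 * a]            # same sum and sum of squares

assert abs(sum(x1) - sum(x2)) < 1e-12
assert abs(sum(v * v for v in x1) - sum(v * v for v in x2)) < 1e-12

# Equal T(x) implies equal likelihood at every parameter value.
for mu, s2 in [(0.0, 1.0), (2.5, 0.5), (-1.0, 4.0)]:
    assert abs(loglik(x1, mu, s2) - loglik(x2, mu, s2)) < 1e-10
```

The likelihood literally cannot tell the two datasets apart --- that is sufficiency as dimensionality reduction.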
Example: Factorization: Signal Amplitude in AWGN
Observe $X = \theta s + W$ with known $s \in \mathbb{R}^n$ and $W \sim \mathcal{N}(0, \sigma^2 I)$. Find a sufficient statistic for the scalar amplitude $\theta$.
Density
$p_\theta(x) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\|x - \theta s\|^2}{2\sigma^2}\right)$. Expand: $\|x - \theta s\|^2 = \|x\|^2 - 2\theta\, s^\top x + \theta^2 \|s\|^2$.
Identify the $\theta$-dependence
The $\theta$-dependence enters only through the inner product $s^\top x$. Therefore $T(x) = s^\top x$ is a scalar sufficient statistic for $\theta$. The matched filter is a sufficient statistic! This is the statistical reason matched-filter receivers never lose information about the amplitude, even after collapsing $n$ real observations into one scalar.
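This can be seen numerically. In the sketch below (pilot $s$ and observations chosen arbitrarily so that $s^\top x_1 = s^\top x_2$), two observation vectors with the same matched-filter output give log-likelihood curves over $\theta$ that differ only by a $\theta$-independent constant (coming from $\|x\|^2$):

```python
import math

def loglik(x, s, theta, sigma2=1.0):
    """Log-density of x under X = theta*s + N(0, sigma2*I)."""
    r2 = sum((xi - theta * si) ** 2 for xi, si in zip(x, s))
    return -r2 / (2 * sigma2) - 0.5 * len(x) * math.log(2 * math.pi * sigma2)

s = [1.0, 2.0, -1.0]
x1 = [0.5, 1.0, 0.0]   # s . x1 = 2.5
x2 = [2.5, 0.0, 0.0]   # s . x2 = 2.5: same matched-filter output

thetas = [-1.0, 0.0, 0.7, 3.0]
diffs = [loglik(x1, s, t) - loglik(x2, s, t) for t in thetas]

# Constant difference in theta: all theta-information sits in s^T x.
assert max(diffs) - min(diffs) < 1e-9
```

Any two observations that agree on $s^\top x$ are interchangeable for inference about $\theta$.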
Definition: Minimal Sufficient Statistic
Minimal Sufficient Statistic
A sufficient statistic $T$ is minimal if, for every other sufficient statistic $U$, there exists a function $f$ such that $T = f(U)$ almost surely. Equivalently, $T$ induces the coarsest partition of the sample space among sufficient statistics.
The Lehmann--Scheffé criterion for minimality: $T$ is minimal sufficient iff $T(x) = T(y)$ holds if and only if the likelihood ratio $p_\theta(x)/p_\theta(y)$ does not depend on $\theta$. For the Gaussian example above, $\left(\sum_i x_i, \sum_i x_i^2\right)$ is minimal sufficient when both $\mu$ and $\sigma^2$ are unknown.
Definition: Exponential Family
Exponential Family
A parametric family $\{p_\theta\}$ is an exponential family in canonical form if there exist measurable functions $\eta(\theta) \in \mathbb{R}^k$, $T(x) \in \mathbb{R}^k$, $A(\theta)$, and $h(x)$ such that $$p_\theta(x) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\theta)\right).$$ The vector $\eta$ is the natural parameter, $T(x)$ the natural sufficient statistic, and $A$ the log-partition (or cumulant) function.
By the Fisher--Neyman factorization, $T(X)$ is automatically sufficient for $\theta$. Members: Bernoulli, binomial, Poisson, Gaussian (with known or unknown variance), exponential, gamma, beta, Dirichlet, multinomial --- most families you will meet in practice.
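As a concrete instance of the canonical form, the Poisson$(\lambda)$ pmf can be rewritten with $h(x) = 1/x!$, $\eta = \log\lambda$, $T(x) = x$, and $A(\eta) = e^\eta = \lambda$; the sketch below checks the two expressions agree numerically:

```python
import math

def poisson_pmf(x, lam):
    """Standard Poisson pmf."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def expfam_pmf(x, lam):
    """Same pmf in canonical exponential-family form:
    h(x) * exp(eta * T(x) - A(eta)) with eta = log(lam), T(x) = x, A = e^eta."""
    eta = math.log(lam)
    return (1 / math.factorial(x)) * math.exp(eta * x - math.exp(eta))

for lam in (0.5, 1.0, 4.0):
    for x in range(10):
        assert abs(poisson_pmf(x, lam) - expfam_pmf(x, lam)) < 1e-12
```

The same mechanical rewrite works for every member on the list above; only $h$, $\eta$, $T$, and $A$ change.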
Theorem: Complete Sufficient Statistic in the Exponential Family
Consider an exponential family with $p_\eta(x) = h(x) \exp\left(\eta^\top T(x) - A(\eta)\right)$. If the natural-parameter image contains a $k$-dimensional open rectangle, then $T(X)$ is a complete sufficient statistic: any function $g$ with $\mathbb{E}_\eta[g(T)] = 0$ for all $\eta$ satisfies $g(T) = 0$ almost surely.
Integrating a function of $T$ against an exponential-family density amounts to evaluating a moment generating function of $T$ at the natural parameter $\eta$. If a function of $T$ integrates to zero against every density in the family, it integrates to zero at every point of an open set of exponentials --- which forces it to be zero by analytic continuation. Completeness is what upgrades a sufficient statistic into a tool for producing unique unbiased estimators (Lehmann--Scheffé).
From unbiasedness to moment generating functions
Suppose $\mathbb{E}_\eta[g(T)] = 0$ for all $\eta$. Writing $q_\eta(t) = \tilde{h}(t)\, e^{\eta^\top t - A(\eta)}$ for the density of $T$ under $\eta$ (for some $\tilde{h}$), we have $\int g(t)\, \tilde{h}(t)\, e^{\eta^\top t}\, dt = 0$ over the open rectangle of $\eta$'s.
Analytic continuation
Define $F(\eta) = \int g(t)\, \tilde{h}(t)\, e^{\eta^\top t}\, dt$, an entire analytic function of $\eta \in \mathbb{C}^k$ (by dominated convergence). It vanishes on a $k$-dimensional open rectangle, hence on all of $\mathbb{C}^k$. The inverse Laplace transform then forces $g(t)\, \tilde{h}(t) = 0$ a.e., i.e., $g(T) = 0$ a.e. on the support of $T$.
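A finite-dimensional shadow of this argument can be computed directly. For Binomial$(n, p)$, $\mathbb{E}_p[g(X)]$ is a polynomial of degree $n$ in $p$, so vanishing at $n+1$ distinct $p$ values already forces $g \equiv 0$: the moment matrix below has full rank. (A sketch; $n = 4$, the $p$ grid, and the pivot tolerance are arbitrary choices.)

```python
from math import comb

n = 4
ps = [0.1, 0.3, 0.5, 0.7, 0.9]   # n+1 distinct parameter values
# M[i][k] = P_{ps[i]}(X = k), so that E_p[g(X)] = M @ g.
M = [[comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
     for p in ps]

def matrix_rank(A, tol=1e-12):
    """Rank via Gaussian elimination with partial pivoting (pure Python)."""
    A = [row[:] for row in A]
    rank = 0
    for col in range(len(A[0])):
        if rank == len(A):
            break
        pivot = max(range(rank, len(A)), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(len(A)):
            if r != rank:
                f = A[r][col] / A[rank][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[rank])]
        rank += 1
    return rank

# Full rank: the only g with E_p[g(X)] = 0 at all these p is g = 0.
assert matrix_rank(M) == n + 1
```

The invertibility of this matrix is the discrete analogue of the uniqueness of the Laplace transform in the proof.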
Example: Gaussian as an Exponential Family
Write the i.i.d. Gaussian model $\mathcal{N}(\mu, \sigma^2)$ for $n$ samples in canonical exponential form and identify the natural sufficient statistic.
Expand the density
$p_\theta(x) = (2\pi\sigma^2)^{-n/2} \exp\left(\frac{\mu}{\sigma^2} \sum_i x_i - \frac{1}{2\sigma^2} \sum_i x_i^2 - \frac{n\mu^2}{2\sigma^2}\right)$.
Read off $\eta$, $T$, $A$
Set $\eta(\theta) = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$ and $T(x) = \left(\sum_i x_i, \sum_i x_i^2\right)$. The log-partition is $A(\theta) = \frac{n\mu^2}{2\sigma^2} + \frac{n}{2} \log(2\pi\sigma^2)$. Since the natural-parameter image $\mathbb{R} \times (-\infty, 0)$ contains an open rectangle in $\mathbb{R}^2$, $T$ is a complete sufficient statistic.
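The identification can be verified numerically for a single sample ($n = 1$): the canonical form with the $\eta$, $T$, and $A$ just read off reproduces the Gaussian density exactly. (A sketch; the test points are arbitrary.)

```python
import math

def gauss_pdf(x, mu, sigma2):
    """Standard N(mu, sigma2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def expfam_pdf(x, mu, sigma2):
    """Canonical form: h(x) = 1, eta = (mu/sigma2, -1/(2 sigma2)),
    T(x) = (x, x^2), A = mu^2/(2 sigma2) + 0.5 log(2 pi sigma2)."""
    eta1, eta2 = mu / sigma2, -1 / (2 * sigma2)
    A = mu ** 2 / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp(eta1 * x + eta2 * x ** 2 - A)

for x in (-2.0, 0.0, 1.5):
    for mu, s2 in [(0.0, 1.0), (1.0, 2.0), (-3.0, 0.25)]:
        assert abs(gauss_pdf(x, mu, s2) - expfam_pdf(x, mu, s2)) < 1e-12
```

For $n$ samples the exponent simply accumulates $\eta^\top T(x)$ over the sample, which is how the sums $\sum_i x_i$ and $\sum_i x_i^2$ arise.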
Likelihood as a Function of $\theta$: Sufficient Statistic as Dimensionality Reduction
Draw $n$ i.i.d. samples from $\mathcal{N}(\theta, 1)$. Plot the log-likelihood as a function of $\theta$ for several random realizations of the data. Observe that each curve is the same downward parabola, positioned by $\bar{x}$ and shifted vertically by $\sum_i x_i^2$: as a function of $\theta$, the likelihood sees the data only through the sufficient statistic.
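A plot-free sketch of this exercise (seed, sample size, and $\theta$ grid are arbitrary choices): since $\log p_\theta(x) = \theta \sum_i x_i - \frac{n\theta^2}{2} - \frac{1}{2}\sum_i x_i^2 - \frac{n}{2}\log(2\pi)$, two datasets sharing $\sum_i x_i$ trace curves over $\theta$ that differ by a constant vertical shift.

```python
import math
import random

random.seed(0)
n, theta_true = 50, 1.5
x1 = [random.gauss(theta_true, 1.0) for _ in range(n)]
xbar = sum(x1) / n
# Mean-preserving spread: same sum(x), larger sum(x^2).
x2 = [xbar + 2 * (v - xbar) for v in x1]

def loglik(data, theta):
    """Log-likelihood for N(theta, 1) data."""
    return (-0.5 * len(data) * math.log(2 * math.pi)
            - 0.5 * sum((v - theta) ** 2 for v in data))

thetas = [xbar + 0.1 * k for k in range(-20, 21)]
shift = [loglik(x1, t) - loglik(x2, t) for t in thetas]

# Constant vertical offset across the whole theta grid:
# theta interacts with the data only through sum(x).
assert max(shift) - min(shift) < 1e-8
```

Adding a plotting library on top of this (one curve per realization) reproduces the figure the exercise describes.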
Common Mistake: Sufficient vs. Minimal Sufficient
Mistake:
Treating the full observation $X$ as "a sufficient statistic" and concluding nothing interesting has been gained.
Correction:
Sufficiency holds vacuously for the identity statistic. The useful question is minimal sufficiency: how far can we compress before losing information? In the exponential family, the natural sufficient statistic typically has fixed dimension $k$, independent of the sample size $n$, and is minimal. That dimension gap is the compression ratio.
Fisher--Neyman Factorization for the Gaussian Family
Sufficient Statistic
A function $T(X)$ such that the conditional distribution of $X$ given $T(X)$ is free of $\theta$. Equivalently, it is the only feature of the data the likelihood ever sees.
Related: Minimal Sufficient Statistic, Exponential Family, Complete Statistic
Exponential Family
A family of distributions of the form $p_\theta(x) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\theta)\right)$. Contains most workhorse distributions of practice and enjoys automatic sufficiency and (under mild conditions) completeness.
Related: Sufficient Statistic, Complete Statistic, Conjugate Prior
Complete Statistic
A statistic $T$ whose only unbiased estimator of zero is zero itself: $\mathbb{E}_\theta[g(T)] = 0$ for all $\theta$ implies $g(T) = 0$ a.s. Completeness is the key ingredient in uniqueness of the MVUE.
Related: Sufficient Statistic, Exponential Family, A Procedure for Building the MVUE
Quick Check
For $X = \theta s + W$ with $s$ and $\sigma^2$ known and $\theta$ unknown, which of the following is a sufficient statistic for $\theta$?
$s^\top X$ --- the matched-filter output ($X$ itself is also sufficient, but trivially so)
After expanding the Gaussian density, the $\theta$-dependence enters only through the matched-filter output $s^\top X$; the rest of the data is ancillary.