Ferkans — Interactive Telecom Tutor

Why sufficient statistics matter

We have now seen the matched filter three times: as the LRT, as the SNR-maximising linear filter, and as a continuous-time $L^2$ projection. Every one of these derivations collapsed the full observation $\mathbf{y}\in\mathbb{R}^n$ (or $y(t) \in L^2$) into a single scalar statistic $T = \langle \mathbf{y}, \mathbf{s}\rangle$. That compression is not accidental: $T$ is a sufficient statistic for the detection problem. Once we have it, the raw observation carries no additional information about which hypothesis is true. This section formalises sufficiency, states the Fisher--Neyman factorisation theorem, and uses it to explain why signal-space receivers for digital modulation reduce a waveform to its finite-dimensional projection.

Definition: Sufficient Statistic

Let $\mathbf{Y}$ be an observation with density $f(\mathbf{y};\theta)$ depending on a parameter $\theta \in \Theta$ (here $\theta$ indexes the hypothesis). A statistic $T(\mathbf{Y})$ is sufficient for $\theta$ if the conditional distribution of $\mathbf{Y}$ given $T(\mathbf{Y}) = t$ does not depend on $\theta$ : $f(\mathbf{y} \mid T(\mathbf{y}) = t;\theta) = f(\mathbf{y} \mid T(\mathbf{y}) = t).$

Theorem: Fisher--Neyman Factorisation

A statistic $T(\mathbf{Y})$ is sufficient for $\theta$ if and only if the density admits the factorisation $f(\mathbf{y};\theta) = g(T(\mathbf{y}),\theta)\, h(\mathbf{y}),$ where $g$ depends on $\mathbf{y}$ only through $T(\mathbf{y})$ and $h$ does not depend on $\theta$ .

Any dependence on $\theta$ enters only through $T$ --- so $T$ captures all the parameter-relevant information.

Proof

Sufficient direction ($\Leftarrow$): factorisation implies sufficiency

Suppose $f(\mathbf{y};\theta) = g(T(\mathbf{y}),\theta)\,h(\mathbf{y})$ . The marginal density of $T$ is obtained by integrating over the level set $\{\mathbf{y}: T(\mathbf{y}) = t\}$ : $f_{T}(t;\theta) = g(t,\theta)\,\int_{\{T=t\}} h(\mathbf{y})\,d\mathbf{y}.$ The conditional density is then $f(\mathbf{y}\mid T = t;\theta) = \frac{g(t,\theta) h(\mathbf{y})}{g(t,\theta)\int_{\{T=t\}} h\,d\mathbf{y}} = \frac{h(\mathbf{y})}{\int_{\{T=t\}} h\,d\mathbf{y}},$ which is independent of $\theta$ .

Necessary direction ($\Rightarrow$): sufficiency implies factorisation

If $T$ is sufficient, write $f(\mathbf{y};\theta) = f(\mathbf{y} \mid T=T(\mathbf{y}))\cdot f_{T}(T(\mathbf{y});\theta)$ . Set $g(t,\theta) = f_{T}(t;\theta)$ and $h(\mathbf{y}) = f(\mathbf{y} \mid T = T(\mathbf{y}))$ . By the sufficiency assumption $h$ is free of $\theta$ , which gives the factorisation.
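The definition can be checked by brute force in a small discrete case. The sketch below is an illustration that is not part of the original text: it assumes $n = 3$ i.i.d. Bernoulli($\theta$) trials with $T$ equal to their sum, and the function name `conditional_given_sum` is hypothetical. It enumerates every outcome and compares the conditional law of $\mathbf{Y}$ given $T$ for two different values of $\theta$.

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(theta, n=3):
    """Return {y: P(Y = y | T = sum(y))} for n i.i.d. Bernoulli(theta) trials."""
    # Joint pmf of every binary sequence y.
    joint = {y: theta ** sum(y) * (1 - theta) ** (n - sum(y))
             for y in product((0, 1), repeat=n)}
    # Marginal pmf of T = sum(y): add up the joint pmf over each level set.
    marginal = {t: 0.0 for t in range(n + 1)}
    for y, p in joint.items():
        marginal[sum(y)] += p
    # Conditional pmf = joint / marginal of the observed level set.
    return {y: p / marginal[sum(y)] for y, p in joint.items()}

cond_a = conditional_given_sum(0.2)
cond_b = conditional_given_sum(0.9)

# Largest gap between the two conditional laws over all outcomes.
max_gap = max(abs(cond_a[y] - cond_b[y]) for y in cond_a)
print("max |P(y|T; 0.2) - P(y|T; 0.9)| =", max_gap)
```

The two conditional laws should coincide up to rounding: given the number of ones, every arrangement is equally likely whatever $\theta$ is, which is exactly the $\theta$-free factor $h$ in the factorisation.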

Consequence for hypothesis testing

In binary testing the parameter is the hypothesis index $\theta \in \{0,1\}$ . The likelihood ratio is then $L(\mathbf{y}) = \frac{f(\mathbf{y};\theta=1)}{f(\mathbf{y};\theta=0)} = \frac{g(T(\mathbf{y}),1)}{g(T(\mathbf{y}),0)},$ so $L$ depends on $\mathbf{y}$ only through $T$ . Any Bayes, Neyman--Pearson, or ML decision rule based on $L$ can equivalently be computed from $T$ alone.

Example: The Matched-Filter Output is Sufficient for Detection in AWGN

Show that for the binary problem $\mathcal{H}_0: \mathbf{Y} = \mathbf{W}$ versus $\mathcal{H}_1: \mathbf{Y} = \mathbf{s} + \mathbf{W}$ with $\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ , the matched-filter statistic $T(\mathbf{y}) = \mathbf{s}^{\mathsf T}\mathbf{y}$ is sufficient for the hypothesis index.

Solution

Write the density under each hypothesis

$f(\mathbf{y};\theta) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\tfrac{1}{2\sigma^2} \|\mathbf{y} - \theta \mathbf{s}\|^2\Big)$ for $\theta \in \{0,1\}$ .

Expand the squared norm

$\|\mathbf{y} - \theta \mathbf{s}\|^2 = \|\mathbf{y}\|^2 - 2\theta\, \mathbf{s}^{\mathsf T}\mathbf{y} + \theta^2 \|\mathbf{s}\|^2$ . Substitute into the density: $f(\mathbf{y};\theta) = (2\pi\sigma^2)^{-n/2} \exp\!\Big(-\tfrac{\|\mathbf{y}\|^2}{2\sigma^2}\Big) \cdot \exp\!\Big(\tfrac{\theta\, \mathbf{s}^{\mathsf T}\mathbf{y}}{\sigma^2} - \tfrac{\theta^2 \|\mathbf{s}\|^2}{2\sigma^2}\Big).$

Identify $g$ and $h$

Let $T(\mathbf{y}) = \mathbf{s}^{\mathsf T}\mathbf{y}$ . The second exponential depends on $\mathbf{y}$ only through $T$ , and the prefactor does not depend on $\theta$ . The factorisation $f(\mathbf{y};\theta) = g(T(\mathbf{y}),\theta) h(\mathbf{y})$ holds with $g(t,\theta) = \exp((\theta t - \theta^2\|\mathbf{s}\|^2/2)/\sigma^2)$ and $h(\mathbf{y}) = (2\pi\sigma^2)^{-n/2}\exp(-\|\mathbf{y}\|^2/(2\sigma^2))$ .

Conclude via Fisher--Neyman

By the Fisher--Neyman theorem, $T(\mathbf{Y}) = \mathbf{s}^{\mathsf T}\mathbf{Y}$ is sufficient for the hypothesis index. No further processing of $\mathbf{y}$ can help: the matched-filter output is all that is needed for any Bayes, Neyman--Pearson, or ML decision.
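The factorisation can also be verified numerically. The following is a minimal sketch, assuming arbitrary illustrative values for $n$, $\sigma$ and a randomly drawn $\mathbf{s}$ (none of which come from the text). It checks that $f = g\,h$ under both hypotheses and that moving $\mathbf{y}$ orthogonally to $\mathbf{s}$, which leaves $T$ unchanged, also leaves the likelihood ratio unchanged.

```python
from math import isclose
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 8, 1.5                       # illustrative dimension and noise std
s = rng.standard_normal(n)              # known signal (arbitrary choice)

def density(y, theta):
    """Gaussian density f(y; theta): mean theta*s, covariance sigma^2 I."""
    r = y - theta * s
    return (2 * np.pi * sigma**2) ** (-n / 2) * np.exp(-(r @ r) / (2 * sigma**2))

def g(t, theta):
    """Parameter-dependent factor; it sees y only through t = s^T y."""
    return np.exp((theta * t - theta**2 * (s @ s) / 2) / sigma**2)

def h(y):
    """Parameter-free factor."""
    return (2 * np.pi * sigma**2) ** (-n / 2) * np.exp(-(y @ y) / (2 * sigma**2))

y = s + sigma * rng.standard_normal(n)  # one draw under H1
T = s @ y

# Check f(y; theta) = g(T, theta) * h(y) for both hypotheses.
factorises = all(isclose(density(y, th), g(T, th) * h(y)) for th in (0, 1))

# Perturb y orthogonally to s: T stays the same, so the likelihood ratio must too.
d = rng.standard_normal(n)
d -= (d @ s) / (s @ s) * s              # remove the component along s
y2 = y + 3.0 * d
lr = density(y, 1) / density(y, 0)
lr2 = density(y2, 1) / density(y2, 0)
print(factorises, isclose(lr, lr2), isclose(lr, g(T, 1) / g(T, 0)))
```

The observations `y` and `y2` are far apart in $\mathbb{R}^n$, yet any rule based on the likelihood ratio treats them identically because they share the same value of $T$.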

Dimensionality reduction for free

In the preceding example, the observation lives in $\mathbb{R}^n$ but the sufficient statistic is a scalar. That collapse --- from $n$ dimensions to $1$ --- is not a numerical trick; it is a structural fact about the problem. Sufficient statistics pinpoint the minimum dimensionality needed for optimal inference. When we move to $M$ -ary hypothesis testing in Chapter 3, the sufficient statistic becomes a vector of $M-1$ projections. When we move to parameter estimation in Part II, sufficient statistics tell us how many numbers we need to keep from a dataset of size $n$ .

Theorem: Sufficiency of Signal-Space Projections for Waveform Detection

Consider the $M$ -ary detection problem in continuous-time AWGN: $\mathcal{H}_m: y(t) = s_m(t) + w(t)$ , $m=0,\ldots,M-1$ , $t\in[0,T]$ , with $w(t)$ white Gaussian noise of PSD $N_0/2$ . Let $\{\phi_1,\ldots,\phi_N\}$ with $N \leq M$ be an orthonormal basis of $\operatorname{span}\{s_0,\ldots,s_{M-1}\}$ in $L^2[0,T]$ . The vector of projections $\mathbf{Y}_{\mathrm{proj}} \in \mathbb{R}^N$ with components $Y_k = \int_0^T y(t)\phi_k(t)\,dt$ is a sufficient statistic for the hypothesis index.

Proof

Decompose $y(t)$ into in-subspace and out-of-subspace parts

Write $y(t) = \sum_{k=1}^{N} Y_k \phi_k(t) + y_\perp(t)$ , where $y_\perp$ lies in the orthogonal complement of $\operatorname{span}\{\phi_k\}$ . Since each $s_m \in \operatorname{span}\{\phi_k\}$ , the signal contributes zero to $y_\perp(t)$ : $y_\perp(t)$ depends only on $w(t)$ and is independent of $m$ .

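Independence of $\mathbf{Y}_{\mathrm{proj}}$ and $y_\perp$

For white Gaussian noise, projections onto orthonormal functions are independent Gaussian random variables (see FSP Ch. 14). The noise part of $\mathbf{Y}_{\mathrm{proj}}$ consists of projections onto the $\phi_k$, while $y_\perp$ is determined by projections onto functions orthogonal to every $\phi_k$. Thus $\mathbf{Y}_{\mathrm{proj}}$ and $y_\perp$ are independent under each hypothesis, and the distribution of $y_\perp$ does not depend on $m$.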

Conditional density is free of the hypothesis

The density of $y(t)$ given $\mathbf{Y}_{\mathrm{proj}}$ is the density of $y_\perp$ --- independent of $m$ . By definition, $\mathbf{Y}_{\mathrm{proj}}$ is sufficient for the hypothesis.

Key Takeaway

The Gram--Schmidt-constructed projections form a sufficient statistic for $M$ -ary signal detection in AWGN. This is why every digital receiver in Chapters 8--10 of the telecom book is drawn as a correlator bank followed by a minimum-distance decoder: correlation extracts the sufficient statistic, and the remaining noise component outside the signal subspace is discarded without loss.
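The correlator-bank picture can be sketched on sampled waveforms. The example below is a hedged illustration: the dimensions, the QPSK-like coordinates in `A`, and the helper names `decide_full` and `decide_projected` are assumptions made here. The orthonormal basis is taken as given, obtained from a QR factorisation of a random matrix instead of an explicit Gram--Schmidt pass. The sketch compares minimum-distance decisions made on all $n$ samples with decisions made on the $N$ projections only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, M = 64, 2, 4                      # samples per waveform, basis size, signals

# Orthonormal basis of the signal subspace (columns of Phi).
Phi, _ = np.linalg.qr(rng.standard_normal((n, N)))
# Signal coordinates in that basis: one column per hypothesis.
A = np.array([[1.0, -1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0, -1.0]])
S = Phi @ A                             # sampled waveforms s_0, ..., s_{M-1}

def decide_full(y):
    """Minimum-distance decision using all n samples."""
    return int(np.argmin(np.sum((y[:, None] - S) ** 2, axis=0)))

def decide_projected(y):
    """Minimum-distance decision using only the N correlator outputs."""
    y_proj = Phi.T @ y                  # correlator bank
    return int(np.argmin(np.sum((y_proj[:, None] - A) ** 2, axis=0)))

trials, agree = 2000, 0
for _ in range(trials):
    m = rng.integers(M)                 # transmitted hypothesis
    y = S[:, m] + rng.standard_normal(n)
    agree += decide_full(y) == decide_projected(y)

print(f"decisions agree in {agree} of {trials} trials")
```

The two rules agree because $\|\mathbf{y} - \mathbf{s}_m\|^2$ equals the squared distance in the projected coordinates plus $\|\mathbf{y}_\perp\|^2$, a term that is the same for every $m$.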

Visualising the Sufficient-Statistic Collapse

A high-dimensional observation vector projected onto the signal subspace. The perpendicular component is noise-only and carries no information about the hypothesis.

Adjustable parameter: noise correlation $\rho$ (shown at 0.7).

Why This Matters: From Sufficiency to the MIMO Receiver

The sufficiency argument generalises directly to MIMO receivers: the matched-filter bank $\mathbf{H}^H \mathbf{y}$ collects sufficient statistics for detecting the symbol vector, and everything downstream (ZF, MMSE, sphere decoding) operates on these projected observations. Chapter 15 of the telecom book builds on this fact.
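A small numerical sketch of this claim follows. It assumes a known channel $\mathbf{H}$, white Gaussian noise, and an illustrative 6-by-2 system with QPSK symbols; all of these values are chosen here and do not come from the text. It compares the ML metric $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$ with a metric that uses only $\mathbf{H}^H\mathbf{y}$ and the Gram matrix $\mathbf{H}^H\mathbf{H}$.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(2)
nr, nt = 6, 2                           # receive / transmit antennas (illustrative)
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
candidates = [np.array(c) for c in product(qpsk, repeat=nt)]

x_true = candidates[5]
w = 0.3 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
y = H @ x_true + w

z = H.conj().T @ y                      # matched-filter bank output H^H y
G = H.conj().T @ H                      # Gram matrix, known at the receiver

# ML metric computed from the raw observation y.
full = np.array([np.linalg.norm(y - H @ x) ** 2 for x in candidates])
# Metric computed from (z, G) only: x^H G x - 2 Re(x^H z).
reduced = np.array([np.real(x.conj() @ G @ x) - 2 * np.real(x.conj() @ z)
                    for x in candidates])

# The two metrics differ by ||y||^2, which is the same for every candidate.
print(np.allclose(full - reduced, np.linalg.norm(y) ** 2))
```

Since the offset $\|\mathbf{y}\|^2$ does not depend on the candidate $\mathbf{x}$, a search over the reduced metric ranks candidates exactly as the full metric does.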

Common Mistake: Sufficiency can fail when parameters are unknown

Mistake:

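Assuming that the matched-filter output $T = \mathbf{s}^{\mathsf T}\mathbf{y}$ remains sufficient no matter which parameters of the model are unknown.

Correction:

Sufficiency is always relative to the set of unknown parameters. When $\mathbf{s}$ is replaced by $A\mathbf{s}$ with $A$ unknown and $\sigma^2$ known, the Gaussian factorisation still goes through with $\theta$ replaced by $A$, so $\mathbf{s}^{\mathsf T}\mathbf{y}$ remains sufficient for $A$. If the noise variance $\sigma^2$ is unknown as well, the factor $\exp(-\|\mathbf{y}\|^2/(2\sigma^2))$ can no longer be absorbed into $h$, and the sufficient statistic grows to $(\mathbf{s}^{\mathsf T}\mathbf{y},\|\mathbf{y}\|^2)$. The GLRT from §2 is the construction that uses this larger sufficient statistic correctly.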

Quick Check

Why is the inner product $\mathbf{s}^{\mathsf T}\mathbf{y}$ sufficient for detecting a known signal in white Gaussian noise?

Because it has maximum variance among all linear statistics.

Because the likelihood ratio depends on $\mathbf{y}$ only through this inner product (Fisher--Neyman).

Because the component of $\mathbf{y}$ perpendicular to $\mathbf{s}$ is always zero.

Because the noise is Gaussian.

Correction:

Because the likelihood ratio depends on $\mathbf{y}$ only through this inner product (Fisher--Neyman).

The density factorises with $g(T(\mathbf{y}),\theta)$ depending only on the inner product.

Sufficient statistic

A function $T(\mathbf{Y})$ of the observations such that the conditional distribution of $\mathbf{Y}$ given $T$ is free of the parameter being inferred. Sufficient statistics preserve all information about the parameter while reducing dimensionality.

🎓 CommIT Contribution (2023)

Subspace Matched Filters for Joint Sensing and Communication

G. Caire, S. Saur, A. Bazzi — IEEE Journal on Selected Areas in Information Theory

The sufficient-statistic view developed here extends naturally to integrated sensing and communication (ISAC) systems, where the same waveform must carry information and probe the environment. The CommIT group has shown that the optimal ISAC receiver decomposes the observation into orthogonal subspaces carrying (i) the communication payload and (ii) the sensing parameters, with each subspace admitting its own matched filter. The Fisher--Neyman machinery from this section is the formal underpinning of that decomposition.

Tags: isac, sufficient-statistics, matched-filter

⚠️ Engineering Note

Sufficiency determines ADC and sampling requirements

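In practice sufficiency guides system design: only the components of the received waveform that lie in the signal subspace need to be digitised. For a PAM/QAM receiver with $M$ symbols on pulse shape $p(t)$, one matched-filter output per symbol (one complex sample for QAM) is sufficient, provided the symbol timing is known and the pulse at the matched-filter output satisfies the Nyquist (no-ISI) criterion. Under those assumptions, oversampling followed by post-processing adds no information about the symbols. This is why the idealised receiver samples at the symbol rate after the matched filter, which reduces the ADC data rate to one sample per symbol. Practical receivers often still oversample by a small factor to support timing recovery and equalisation.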
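The one-sample-per-symbol idea can be sketched as a simulation. It assumes perfect symbol timing and a pulse confined to a single symbol interval, so there is no ISI by construction. The dense grid of `sps` points per symbol stands in for continuous time, and the pulse shape, noise level and 4-PAM alphabet are illustrative choices made here, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
sps, n_sym = 16, 50                     # grid points per symbol, number of symbols
p = np.hanning(sps + 2)[1:-1]           # illustrative pulse confined to one symbol
p /= np.linalg.norm(p)                  # normalise to unit energy
a = rng.choice([-3.0, -1.0, 1.0, 3.0], size=n_sym)   # 4-PAM symbols

x = np.kron(a, p)                       # sum_k a_k p(t - kT) on the dense grid
y = x + 0.5 * rng.standard_normal(x.size)

# Matched filter: convolve with the time-reversed pulse, then keep one
# sample per symbol, taken at the end of each symbol interval.
mf = np.convolve(y, p[::-1])
t_k = mf[sps - 1::sps][:n_sym]

# Reference: direct per-symbol correlations <y, p(. - kT)>.
corr = y.reshape(n_sym, sps) @ p

print(np.allclose(t_k, corr))
```

The symbol-rate samples `t_k` coincide with the per-symbol correlations, so a slicer applied to `t_k` works with the same numbers a correlator receiver would produce.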
