The Likelihood Ratio Test

One Statistic to Rule Them All

The striking fact that emerged from Section 1.2 is this: for every reasonable cost structure and every pair of priors, the optimal rule compares the same function of the data --- the likelihood ratio $L(y) = f_1(y)/f_0(y)$ --- to a threshold that absorbs all the problem-specific constants. Change the priors, change the costs, or move to the Neyman-Pearson criterion of Section 1.4: only the threshold $\eta$ changes. The likelihood ratio is the universal sufficient statistic for binary testing, and understanding its distribution is tantamount to understanding every binary detector.

Definition:

Likelihood Ratio Test (LRT)

A likelihood ratio test with threshold $\eta > 0$ is the decision rule
$$g_\eta(y) = \begin{cases} 1, & L(y) > \eta, \\ 0, & L(y) < \eta, \end{cases}$$
where
$$L(y) = \frac{f_1(y)}{f_0(y)}$$
is the likelihood ratio. When $L(y) = \eta$ the decision may be 0, 1, or (for randomised tests) random. The log-likelihood ratio is
$$\ell(y) = \log L(y) = \log f_1(y) - \log f_0(y),$$
and the LRT is equivalently $\ell(y) \gtrless \log \eta$.

Working with $\ell(y)$ is almost always more convenient: for product densities $f_i(y_1,\dots,y_n) = \prod_k f_i(y_k)$ the LLR adds, giving $\ell(y_1,\ldots,y_n) = \sum_k \ell(y_k)$. Taking logarithms also reduces exponential-family likelihood ratios to affine functions of the natural sufficient statistic.
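To see the last point concretely (a short sketch, taking the standard one-parameter exponential-family form $f_\theta(y) = h(y)\exp\{\theta\,T(y) - A(\theta)\}$ as an assumption):
$$\ell(y) = \log\frac{f_{\theta_1}(y)}{f_{\theta_0}(y)} = (\theta_1 - \theta_0)\,T(y) - \bigl[A(\theta_1) - A(\theta_0)\bigr],$$
which is affine in $T(y)$, so the LRT reduces to a threshold test on $T(y)$ alone.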

Theorem: Sufficiency of the Likelihood Ratio

For binary hypothesis testing, the likelihood ratio $L(Y)$ is a sufficient statistic. That is, the conditional distribution of $Y$ given $L(Y) = \ell$ does not depend on which hypothesis is true, and any Bayes-optimal decision rule is a function of $L(Y)$ alone.

Sufficiency means the LR captures every bit of information about $\mathcal{H}$ that $Y$ contains. Knowing $Y$ beyond its likelihood ratio tells you nothing more about which hypothesis is true --- the "extra" randomness is independent of $\mathcal{H}$. This is why every optimal detector (Bayes, MAP, ML, Neyman-Pearson) compresses $Y$ down to $L(Y)$ before making its call.
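One way to see this (a one-line sketch via the Fisher-Neyman factorization quoted in the glossary entry below): on the support of $f_0$ we can write
$$f_0(y) = 1 \cdot f_0(y), \qquad f_1(y) = L(y)\, f_0(y),$$
so both densities factor as $g_{\mathcal{H}}(L(y))\, h(y)$ with $h = f_0$, $g_0(\ell) = 1$, and $g_1(\ell) = \ell$. The factorization criterion then certifies $L(Y)$ as sufficient.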

Theorem: Invariance under Monotone Transformations

Let $\phi\colon \mathbb{R}_+ \to \mathbb{R}$ be strictly increasing. Then the decision rule $\mathbb{1}\{\phi(L(y)) > \phi(\eta)\}$ is identical to the LRT $\mathbb{1}\{L(y) > \eta\}$ for every observation $y$. In particular, the LRT is equivalent to any monotone transformation of the LR ($\log$, $\log + c$, linear, $\tanh$, ...).

Example: LRT for nn i.i.d. Gaussian Samples

Observations $Y_1, \ldots, Y_n$ are i.i.d. with
$$\mathcal{H}_0: Y_k \sim \mathcal{N}(0, \sigma^2), \qquad \mathcal{H}_1: Y_k \sim \mathcal{N}(\mu, \sigma^2), \quad \mu > 0.$$
Derive the LRT as a rule on the sample mean $\bar Y = \frac{1}{n}\sum_k Y_k$.
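A sketch of the derivation: since the samples are independent, the block LLR is the sum of per-sample LLRs,
$$\ell(y_1,\ldots,y_n) = \sum_{k=1}^n \frac{-(y_k-\mu)^2 + y_k^2}{2\sigma^2} = \frac{\mu}{\sigma^2}\sum_{k=1}^n y_k - \frac{n\mu^2}{2\sigma^2} = \frac{n\mu}{\sigma^2}\Bigl(\bar y - \frac{\mu}{2}\Bigr).$$
Because $\mu > 0$, this is strictly increasing in $\bar y$, so $\ell(\mathbf y) \gtrless \log\eta$ is equivalent to the sample-mean rule
$$\bar y \;\gtrless\; \tau, \qquad \tau = \frac{\mu}{2} + \frac{\sigma^2 \log\eta}{n\mu}.$$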

⚠️ Engineering Note

Compute the LLR, Not the LR

In practice, always compute the log-likelihood ratio, never the likelihood ratio directly. Three reasons:

  1. Underflow/overflow. For a block of $n$ i.i.d. samples with typical SNR, $L$ can easily exceed $10^{300}$ or fall below $10^{-300}$, overflowing double precision. The LLR remains an $O(n)$-magnitude real.
  2. Addition beats multiplication. i.i.d. samples give $\ell(\mathbf y) = \sum_k \ell(y_k)$, a single vectorised summation. Each summand is typically $O(1)$, keeping precision throughout.
  3. LDPC / turbo decoders. Every modern iterative decoder passes LLRs as messages. The LLR representation is the universal interchange format for soft information.
Practical Constraints

  • Represent posteriors/priors as log-domain quantities

  • Use log-sum-exp when marginalising (converts $\log\sum e^x$ to a numerically stable form); a minimal numerical sketch follows below
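A minimal numerical sketch of these constraints (Python; the Gaussian model and the values of $\mu$, $\sigma$, $n$ are illustrative assumptions, not part of the text):

```python
import numpy as np

# Minimal sketch (assumed values): Gaussian shift model from the example above.
mu, sigma, n = 1.0, 1.0, 2000
rng = np.random.default_rng(0)
y = rng.normal(loc=mu, scale=sigma, size=n)          # a block of samples drawn under H1

# Per-sample LLR for N(mu, sigma^2) vs N(0, sigma^2); the block LLR is one vectorised sum.
llr = np.sum(mu * y / sigma**2 - mu**2 / (2 * sigma**2))

# The raw likelihood ratio exp(llr) would overflow double precision (max exponent ~709),
# while the LLR itself stays an O(n)-magnitude float.
print(f"block LLR = {llr:.1f}; exp(LLR) representable: {llr < np.log(np.finfo(float).max)}")

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))): subtract the max before exponentiating."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Marginalising two log-domain terms that would each overflow if exponentiated naively.
print(log_sum_exp([1000.0, 1001.0]))                 # ~1001.31, computed without overflow
```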

LRT Decision Regions in $\mathbb{R}^2$

Two-dimensional observation $\mathbf{y} \in \mathbb{R}^2$, with $\mathcal{N}(\boldsymbol{\mu}_0, \mathbf{I})$ vs. $\mathcal{N}(\boldsymbol{\mu}_1, \mathbf{I})$. The LRT decision boundary is a hyperplane; changing the threshold shifts it parallel to itself.

[Interactive figure: LRT decision regions in the plane; adjustable parameter: the log-threshold of the LRT.]
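Why the boundary is a hyperplane (a short check of the claim above): with equal covariance $\mathbf{I}$, the LLR is
$$\ell(\mathbf y) = -\tfrac12\lVert \mathbf y - \boldsymbol\mu_1\rVert^2 + \tfrac12\lVert \mathbf y - \boldsymbol\mu_0\rVert^2 = (\boldsymbol\mu_1 - \boldsymbol\mu_0)^{\mathsf T}\mathbf y + \tfrac12\bigl(\lVert\boldsymbol\mu_0\rVert^2 - \lVert\boldsymbol\mu_1\rVert^2\bigr),$$
which is affine in $\mathbf y$. The boundary $\{\ell(\mathbf y) = \log\eta\}$ is therefore a line normal to $\boldsymbol\mu_1 - \boldsymbol\mu_0$, and varying $\eta$ changes only the offset, shifting it parallel to itself.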

The Monotone Likelihood Ratio Property

A family of densities $\{f_\theta\}$ has the monotone likelihood ratio (MLR) property in $T(y)$ if $f_{\theta_1}/f_{\theta_0}$ is a non-decreasing function of $T(y)$ whenever $\theta_1 > \theta_0$. The Gaussian shift family, binomial in $p$, Poisson in $\lambda$, and all one-parameter exponential families in their natural parameter possess MLR. When MLR holds, LRTs reduce to simple threshold tests on $T$, and the Neyman-Pearson lemma extends to one-sided composite alternatives (Chapter 2).
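As a quick check for one of these families (the Gaussian shift family $\mathcal{N}(\theta, \sigma^2)$, with $T(y) = y$):
$$\frac{f_{\theta_1}(y)}{f_{\theta_0}(y)} = \exp\!\Bigl(\frac{(\theta_1 - \theta_0)\,y}{\sigma^2} - \frac{\theta_1^2 - \theta_0^2}{2\sigma^2}\Bigr),$$
which is increasing in $y$ whenever $\theta_1 > \theta_0$, so the family has MLR in $T(y) = y$.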

Common Mistake: Two-Sided Alternatives Are Not LRTs on a Single Statistic

Mistake:

Assuming every "reasonable" rule can be written as a threshold test on a scalar statistic.

Correction:

If the alternative is two-sided (e.g., $\mathcal{H}_0: \mu=0$ vs. $\mathcal{H}_1: \mu \neq 0$), the LR $f_1(y)/f_0(y)$ is not monotone in $y$, and the LRT decision region is typically $\{|y| > \tau\}$ --- the absolute value of $y$ is the sufficient statistic, not $y$ itself. Always derive the decision region from the LR directly; do not assume it is a half-line.
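One concrete instance (under the illustrative assumption that the two-sided alternative is modelled as an equal-weight mixture of $\pm\mu$, with Gaussian noise of variance $\sigma^2$):
$$L(y) = \frac{\tfrac12 f_{+\mu}(y) + \tfrac12 f_{-\mu}(y)}{f_0(y)} = e^{-\mu^2/2\sigma^2}\cosh\!\Bigl(\frac{\mu y}{\sigma^2}\Bigr),$$
which is increasing in $|y|$ but not in $y$, so the region $\{L(y) > \eta\}$ is indeed of the form $\{|y| > \tau\}$.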

Quick Check

Which of the following transformations of the likelihood ratio preserves the LRT decision rule?

$L(y) \mapsto \log L(y)$

$L(y) \mapsto 1 - L(y)$

$L(y) \mapsto -\log L(y)$

$L(y) \mapsto 1/L(y)$

Sufficient statistic

A statistic $T(Y)$ is sufficient for a hypothesis $\mathcal{H}$ if the conditional distribution of $Y$ given $T(Y)$ does not depend on $\mathcal{H}$. Equivalently, by the Fisher-Neyman factorization, $f_\mathcal{H}(y) = g_\mathcal{H}(T(y))\, h(y)$. For binary testing, the likelihood ratio $L(Y)$ is always sufficient.

Related: likelihood ratio, Fisher-Neyman theorem, Rao--Blackwell Theorem

Historical Note: Fisher and the Likelihood Principle

1920s

Ronald A. Fisher (1890-1962) introduced the concept of likelihood as distinct from probability in his 1922 paper "On the mathematical foundations of theoretical statistics". Fisher argued that once data are observed, the likelihood function $\theta \mapsto f(y \mid \theta)$ contains all the information the data carry about $\theta$. Neyman and Pearson then showed that binary decisions based on ratios of likelihoods are optimal --- closing the loop between Fisher's likelihood principle and the operational decision-theoretic framework.