The Likelihood Ratio Test

One Statistic to Rule Them All

The striking fact that emerged from Section 1.2 is this: for every reasonable cost structure and every pair of priors, the optimal rule compares the same function of the data --- the likelihood ratio $L(y) = f_1(y)/f_0(y)$ --- to a threshold that absorbs all the problem-specific constants. Change the priors, change the costs, or move to the Neyman-Pearson criterion of Section 1.4: only the threshold $\eta$ changes. The likelihood ratio is the universal sufficient statistic for binary testing, and understanding its distribution is tantamount to understanding every binary detector.

Definition:

Likelihood Ratio Test (LRT)

A likelihood ratio test with threshold $\eta > 0$ is the decision rule
$$g_\eta(y) = \begin{cases} 1, & L(y) > \eta, \\ 0, & L(y) < \eta, \end{cases}$$
where
$$L(y) = \frac{f_1(y)}{f_0(y)}$$
is the likelihood ratio. When $L(y) = \eta$ the decision may be 0, 1, or (for randomised tests) random. The log-likelihood ratio is
$$\ell(y) = \log L(y) = \log f_1(y) - \log f_0(y),$$
and the LRT is equivalently $\ell(y) \gtrless \log \eta$.

Working with $\ell(y)$ is almost always more convenient: for product densities $f_i(y_1,\dots,y_n) = \prod_k f_i(y_k)$ the LLR adds, giving $\ell(y_1,\ldots,y_n) = \sum_k \ell(y_k)$. Taking logarithms also reduces exponential-family likelihood ratios to affine functions of the natural sufficient statistic.
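To see the last point concretely (a short sketch, taking the standard one-parameter exponential-family form $f_\theta(y) = h(y)\exp\{\theta\,T(y) - A(\theta)\}$ as an assumption):
$$\ell(y) = \log\frac{f_{\theta_1}(y)}{f_{\theta_0}(y)} = (\theta_1 - \theta_0)\,T(y) - \bigl[A(\theta_1) - A(\theta_0)\bigr],$$
which is affine in $T(y)$, so the LRT reduces to a threshold test on $T(y)$ alone.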

Theorem: Sufficiency of the Likelihood Ratio

For binary hypothesis testing, the likelihood ratio $L(Y)$ is a sufficient statistic. That is, the conditional distribution of $Y$ given $L(Y) = \ell$ does not depend on which hypothesis is true, and any Bayes-optimal decision rule is a function of $L(Y)$ alone.

Sufficiency means the LR captures every bit of information about $\mathcal{H}$ that $Y$ contains. Knowing $Y$ beyond its likelihood ratio tells you nothing more about which hypothesis is true --- the "extra" randomness is independent of $\mathcal{H}$. This is why every optimal detector (Bayes, MAP, ML, Neyman-Pearson) compresses $Y$ down to $L(Y)$ before making its call.
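One way to see this (a one-line sketch via the Fisher-Neyman factorization quoted in the glossary entry below): on the support of $f_0$ we can write
$$f_0(y) = 1 \cdot f_0(y), \qquad f_1(y) = L(y)\, f_0(y),$$
so both densities factor as $g_{\mathcal{H}}(L(y))\, h(y)$ with $h = f_0$, $g_0(\ell) = 1$, and $g_1(\ell) = \ell$. The factorization criterion then certifies $L(Y)$ as sufficient.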

Theorem: Invariance under Monotone Transformations

Let $\phi\colon \mathbb{R}_+ \to \mathbb{R}$ be strictly increasing. Then the decision rule $\mathbb{1}\{\phi(L(y)) > \phi(\eta)\}$ is identical to the LRT $\mathbb{1}\{L(y) > \eta\}$ for every observation $y$. In particular, the LRT is equivalent to any monotone transformation of the LR ($\log$, $\log + c$, linear, $\tanh$, ...).

Example: LRT for nn i.i.d. Gaussian Samples

Observations $Y_1, \ldots, Y_n$ are i.i.d. with
$$\mathcal{H}_0: Y_k \sim \mathcal{N}(0, \sigma^2), \qquad \mathcal{H}_1: Y_k \sim \mathcal{N}(\mu, \sigma^2), \quad \mu > 0.$$
Derive the LRT as a rule on the sample mean $\bar Y = \frac{1}{n}\sum_k Y_k$.
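A sketch of the derivation: since the samples are independent, the block LLR is the sum of per-sample LLRs,
$$\ell(y_1,\ldots,y_n) = \sum_{k=1}^n \frac{-(y_k-\mu)^2 + y_k^2}{2\sigma^2} = \frac{\mu}{\sigma^2}\sum_{k=1}^n y_k - \frac{n\mu^2}{2\sigma^2} = \frac{n\mu}{\sigma^2}\Bigl(\bar y - \frac{\mu}{2}\Bigr).$$
Because $\mu > 0$, this is strictly increasing in $\bar y$, so $\ell(\mathbf y) \gtrless \log\eta$ is equivalent to the sample-mean rule
$$\bar y \;\gtrless\; \tau, \qquad \tau = \frac{\mu}{2} + \frac{\sigma^2 \log\eta}{n\mu}.$$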

⚠️ Engineering Note

Compute the LLR, Not the LR

In practice, always compute the log-likelihood ratio, never the likelihood ratio directly. Three reasons:

  1. Underflow/overflow. For a block of $n$ i.i.d. samples with typical SNR, $L$ can easily exceed $10^{300}$ or fall below $10^{-300}$, overflowing double precision. The LLR remains an $O(n)$-magnitude real.
  2. Addition beats multiplication. i.i.d. samples give $\ell(\mathbf y) = \sum_k \ell(y_k)$, a single vectorised summation. Each summand is typically $O(1)$, keeping precision throughout.
  3. LDPC / turbo decoders. Every modern iterative decoder passes LLRs as messages. The LLR representation is the universal interchange format for soft information.
Practical Constraints

  • Represent posteriors/priors as log-domain quantities

  • Use log-sum-exp when marginalising (converts $\log\sum e^x$ to a numerically stable form); a minimal numerical sketch follows below
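A minimal numerical sketch of these constraints (Python; the Gaussian model and the values of $\mu$, $\sigma$, $n$ are illustrative assumptions, not part of the text):

```python
import numpy as np

# Minimal sketch (assumed values): Gaussian shift model from the example above.
mu, sigma, n = 1.0, 1.0, 2000
rng = np.random.default_rng(0)
y = rng.normal(loc=mu, scale=sigma, size=n)          # a block of samples drawn under H1

# Per-sample LLR for N(mu, sigma^2) vs N(0, sigma^2); the block LLR is one vectorised sum.
llr = np.sum(mu * y / sigma**2 - mu**2 / (2 * sigma**2))

# The raw likelihood ratio exp(llr) would overflow double precision (max exponent ~709),
# while the LLR itself stays an O(n)-magnitude float.
print(f"block LLR = {llr:.1f}; exp(LLR) representable: {llr < np.log(np.finfo(float).max)}")

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))): subtract the max before exponentiating."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Marginalising two log-domain terms that would each overflow if exponentiated naively.
print(log_sum_exp([1000.0, 1001.0]))                 # ~1001.31, computed without overflow
```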

LRT Decision Regions in $\mathbb{R}^2$

Two-dimensional observation $\mathbf{y} \in \mathbb{R}^2$, with $\mathcal{N}(\boldsymbol{\mu}_0, \mathbf{I})$ vs. $\mathcal{N}(\boldsymbol{\mu}_1, \mathbf{I})$. The LRT decision boundary is a hyperplane; changing the threshold shifts it parallel to itself.

[Interactive figure: LRT decision regions in the plane; adjustable parameter: the log-threshold of the LRT.]
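Why the boundary is a hyperplane (a short check of the claim above): with equal covariance $\mathbf{I}$, the LLR is
$$\ell(\mathbf y) = -\tfrac12\lVert \mathbf y - \boldsymbol\mu_1\rVert^2 + \tfrac12\lVert \mathbf y - \boldsymbol\mu_0\rVert^2 = (\boldsymbol\mu_1 - \boldsymbol\mu_0)^{\mathsf T}\mathbf y + \tfrac12\bigl(\lVert\boldsymbol\mu_0\rVert^2 - \lVert\boldsymbol\mu_1\rVert^2\bigr),$$
which is affine in $\mathbf y$. The boundary $\{\ell(\mathbf y) = \log\eta\}$ is therefore a line normal to $\boldsymbol\mu_1 - \boldsymbol\mu_0$, and varying $\eta$ changes only the offset, shifting it parallel to itself.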

The Monotone Likelihood Ratio Property

A family of densities $\{f_\theta\}$ has the monotone likelihood ratio (MLR) property in $T(y)$ if $f_{\theta_1}/f_{\theta_0}$ is a non-decreasing function of $T(y)$ whenever $\theta_1 > \theta_0$. The Gaussian shift family, binomial in $p$, Poisson in $\lambda$, and all one-parameter exponential families in their natural parameter possess MLR. When MLR holds, LRTs reduce to simple threshold tests on $T$, and the Neyman-Pearson lemma extends to one-sided composite alternatives (Chapter 2).
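As a quick check for one of these families (the Gaussian shift family $\mathcal{N}(\theta, \sigma^2)$, with $T(y) = y$):
$$\frac{f_{\theta_1}(y)}{f_{\theta_0}(y)} = \exp\!\Bigl(\frac{(\theta_1 - \theta_0)\,y}{\sigma^2} - \frac{\theta_1^2 - \theta_0^2}{2\sigma^2}\Bigr),$$
which is increasing in $y$ whenever $\theta_1 > \theta_0$, so the family has MLR in $T(y) = y$.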

Common Mistake: Two-Sided Alternatives Are Not LRTs on a Single Statistic

Mistake:

Assuming every "reasonable" rule can be written as a threshold test on a scalar statistic.

Correction:

If the alternative is two-sided (e.g., $\mathcal{H}_0: \mu=0$ vs. $\mathcal{H}_1: \mu \neq 0$), the LR $f_1(y)/f_0(y)$ is not monotone in $y$, and the LRT decision region is typically $\{|y| > \tau\}$ --- the absolute value of $y$ is the sufficient statistic, not $y$ itself. Always derive the decision region from the LR directly; do not assume it is a half-line.
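One concrete instance (under the illustrative assumption that the two-sided alternative is modelled as an equal-weight mixture of $\pm\mu$, with Gaussian noise of variance $\sigma^2$):
$$L(y) = \frac{\tfrac12 f_{+\mu}(y) + \tfrac12 f_{-\mu}(y)}{f_0(y)} = e^{-\mu^2/2\sigma^2}\cosh\!\Bigl(\frac{\mu y}{\sigma^2}\Bigr),$$
which is increasing in $|y|$ but not in $y$, so the region $\{L(y) > \eta\}$ is indeed of the form $\{|y| > \tau\}$.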

Quick Check

Which of the following transformations of the likelihood ratio preserves the LRT decision rule?

$L(y) \mapsto \log L(y)$

$L(y) \mapsto 1 - L(y)$

$L(y) \mapsto -\log L(y)$

$L(y) \mapsto 1/L(y)$

Sufficient statistic

A statistic $T(Y)$ is sufficient for a hypothesis $\mathcal{H}$ if the conditional distribution of $Y$ given $T(Y)$ does not depend on $\mathcal{H}$. Equivalently, by the Fisher-Neyman factorization, $f_\mathcal{H}(y) = g_\mathcal{H}(T(y))\, h(y)$. For binary testing, the likelihood ratio $L(Y)$ is always sufficient.

Related: likelihood ratio, Fisher-Neyman theorem, Rao--Blackwell Theorem

Historical Note: Fisher and the Likelihood Principle

1920s

Ronald A. Fisher (1890-1962) introduced the concept of likelihood as distinct from probability in his 1922 paper "On the mathematical foundations of theoretical statistics". Fisher argued that once data are observed, the likelihood function $\theta \mapsto f(y \mid \theta)$ contains all the information the data carry about $\theta$. Neyman and Pearson then showed that binary decisions based on ratios of likelihoods are optimal --- closing the loop between Fisher's likelihood principle and the operational decision-theoretic framework.