The Decision Problem
Why Binary Hypothesis Testing?
Every receiver --- whether it demodulates a BPSK symbol, detects a radar target, classifies a radiology image, or screens for spam --- ultimately answers a yes/no question from noisy evidence. Binary hypothesis testing is the atomic unit of statistical inference: the smallest non-trivial decision problem, and one for which we can derive, analyse, and bound the optimal detector in closed form.
Once we understand the binary case completely, $M$-ary detection, composite testing, and sequential decisions all follow as natural extensions. The likelihood ratio we introduce here will reappear --- renamed, reshaped, vectorised --- in every chapter of this book and in every receiver architecture you will ever design.
Definition: Binary Hypothesis Testing Problem
A binary hypothesis testing problem consists of:
- Two hypotheses $\mathcal{H}_0$ (the null hypothesis) and $\mathcal{H}_1$ (the alternative hypothesis) about the state of nature.
- An observation space $\mathcal{Y}$ (typically $\mathbb{R}^n$ or a countable set) together with an observation random variable $Y$.
- Two conditional densities (or probability mass functions) $p_0(y) = p(y \mid \mathcal{H}_0)$ and $p_1(y) = p(y \mid \mathcal{H}_1)$, specifying how $Y$ is distributed under each hypothesis.
A decision rule (or detector) is a measurable function $\delta : \mathcal{Y} \to \{0, 1\}$ that maps each observation $y$ to a decision in favour of $\mathcal{H}_0$ or $\mathcal{H}_1$.
We say the hypotheses are simple when $p_0$ and $p_1$ are completely specified. When either density depends on unknown parameters (e.g., a signal of unknown amplitude), the hypothesis is composite --- treated in Chapter 2.
Definition: Decision Regions
Every decision rule $\delta$ partitions the observation space into two disjoint decision regions $\Gamma_0 = \{y \in \mathcal{Y} : \delta(y) = 0\}$ and $\Gamma_1 = \{y \in \mathcal{Y} : \delta(y) = 1\}$, with $\Gamma_0 \cup \Gamma_1 = \mathcal{Y}$ and $\Gamma_0 \cap \Gamma_1 = \emptyset$. Conversely, any such partition defines a decision rule. Designing a detector is therefore equivalent to choosing a partition of $\mathcal{Y}$.
Definition: Type I and Type II Errors
For a decision rule $\delta$ with decision regions $\Gamma_0, \Gamma_1$, define:
- False-alarm probability (Type I error): $P_F = P(\text{decide } \mathcal{H}_1 \mid \mathcal{H}_0) = \int_{\Gamma_1} p_0(y)\, dy$
- Miss probability (Type II error): $P_M = P(\text{decide } \mathcal{H}_0 \mid \mathcal{H}_1) = \int_{\Gamma_0} p_1(y)\, dy$
- Detection probability (power): $P_D = P(\text{decide } \mathcal{H}_1 \mid \mathcal{H}_1) = \int_{\Gamma_1} p_1(y)\, dy = 1 - P_M$
The names false alarm, miss, and detection come from radar, where $\mathcal{H}_1$ is the presence of a target. In statistics one speaks of size ($\alpha = P_F$) and power ($1 - \beta = P_D$, where $\beta = P_M$).
Discrete observations use sums over $\mathcal{Y}$ in place of integrals. All our results apply to both cases with the obvious substitution.
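As a concrete sanity check, the sketch below evaluates the three probabilities by numerical integration. The scalar Gaussian densities and the threshold-type decision region are illustrative assumptions, not part of the definition.

```python
# Minimal numerical check of the definitions above, assuming scalar
# Gaussian densities and a threshold-type decision region; the specific
# densities p0, p1 and threshold tau are illustrative choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p0 = norm(loc=0.0, scale=1.0).pdf  # density of Y under H0 (assumed)
p1 = norm(loc=2.0, scale=1.0).pdf  # density of Y under H1 (assumed)
tau = 1.0                          # decision region Gamma_1 = {y : y > tau}

P_F, _ = quad(p0, tau, np.inf)     # integrate p0 over Gamma_1
P_M, _ = quad(p1, -np.inf, tau)    # integrate p1 over Gamma_0
P_D = 1.0 - P_M                    # detection probability (power)

print(f"P_F = {P_F:.4f}, P_M = {P_M:.4f}, P_D = {P_D:.4f}")
```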
Key Takeaway
Shrinking $\Gamma_1$ enlarges $\Gamma_0$ and therefore increases $P_M$; shrinking $\Gamma_0$ enlarges $\Gamma_1$ and increases $P_F$. The fundamental tradeoff of detection is that the two error types are coupled through the same partition of $\mathcal{Y}$ --- you cannot reduce both by tuning the detector alone. Only by acquiring more informative data can both be simultaneously shrunk.
Example: Binary Hypothesis Testing with a Gaussian Mean Shift
Let $Y \sim \mathcal{N}(0, \sigma^2)$ under $\mathcal{H}_0$ and $Y \sim \mathcal{N}(\mu, \sigma^2)$ under $\mathcal{H}_1$, with $\mu > 0$. For a threshold detector that decides $\mathcal{H}_1$ when $y > \tau$, compute $P_F$ and $P_D$ as functions of $\tau$ and the separation $\mu/\sigma$.
False alarm
Under $\mathcal{H}_0$, $Y \sim \mathcal{N}(0, \sigma^2)$, so $P_F = P(Y > \tau \mid \mathcal{H}_0) = Q(\tau/\sigma)$, where $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$ is the Gaussian tail function.
Detection
Under $\mathcal{H}_1$, $Y \sim \mathcal{N}(\mu, \sigma^2)$, so $P_D = P(Y > \tau \mid \mathcal{H}_1) = Q((\tau - \mu)/\sigma)$ and $P_M = 1 - P_D$.
Tradeoff
As $\tau \to \infty$, $P_F \to 0$ and $P_D \to 0$; as $\tau \to -\infty$, both increase toward $1$. For any $\tau$, $P_D = Q(Q^{-1}(P_F) - \mu/\sigma)$, which is the ROC curve for this problem. Larger $\mu/\sigma$ (stronger signal) pushes the curve toward the upper-left corner.
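These closed-form expressions are easy to check numerically. The sketch below (with assumed values $\mu = 2$, $\sigma = 1$) uses SciPy's norm.sf for the tail function $Q$ and norm.isf for its inverse $Q^{-1}$, sweeps the threshold to exhibit the tradeoff, and verifies the ROC relation along the way.

```python
# Sweep the threshold tau for the Gaussian mean-shift example and
# verify the ROC relation P_D = Q(Q^{-1}(P_F) - mu/sigma).
# mu and sigma are assumed illustrative values.
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.0          # mean shift and common std (assumed)
taus = np.linspace(-3, 5, 9)  # sweep of decision thresholds

P_F = norm.sf(taus / sigma)          # P_F = Q(tau / sigma)
P_D = norm.sf((taus - mu) / sigma)   # P_D = Q((tau - mu) / sigma)

# ROC relation: eliminate tau between the two expressions.
P_D_roc = norm.sf(norm.isf(P_F) - mu / sigma)
assert np.allclose(P_D, P_D_roc)

for t, pf, pd in zip(taus, P_F, P_D):
    print(f"tau = {t:+.1f}:  P_F = {pf:.4f},  P_D = {pd:.4f}")
```

Running the sweep makes the coupling of the Key Takeaway visible: raising $\tau$ drives $P_F$ and $P_D$ down together, never one without the other.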
Interactive figure: overlap of the two Gaussian densities under $\mathcal{H}_0$ and $\mathcal{H}_1$. Vary the mean separation $\mu$, the common standard deviation $\sigma$, and the decision threshold $\tau$. The shaded regions are the false-alarm area (blue, right of $\tau$ under $p_0$) and the miss area (red, left of $\tau$ under $p_1$).
Historical Note: Neyman, Pearson, and the Birth of Hypothesis Testing
1920s-1930s. Modern hypothesis testing emerged from a famous collaboration between the Polish statistician Jerzy Neyman (1894-1981) and the English statistician Egon Pearson (1895-1980), son of Karl Pearson. Between 1928 and 1933, working by mail across London and Warsaw, they developed the framework of two hypotheses ($\mathcal{H}_0$ and $\mathcal{H}_1$), the distinction between Type I and Type II errors, and the optimality notion that would become the Neyman-Pearson lemma (Section 1.4).
Their approach broke from Ronald Fisher's significance-testing tradition (which considered only the null hypothesis) and supplied the operational vocabulary --- size, power, critical region --- that every radar engineer, quality inspector, and clinical trialist uses today. Neyman emigrated to Berkeley in 1938 and founded the influential Berkeley statistics department; Pearson succeeded his father at University College London.
Common Mistake: The Direction of Error Probabilities
Mistake:
Treating $P_F$ and $P_M$ symmetrically and writing "$P_M = 1 - P_F$".
Correction:
The correct relation is $P_M = 1 - P_D$, because miss and detection are complementary events under the fixed hypothesis $\mathcal{H}_1$. $P_F$ and $P_M$, by contrast, are computed under different hypotheses and are coupled only through the decision region: depending on the detector, $P_F + P_M$ can equal anything in $(0, 2)$. A detector that ignores the data and flips a fair coin, for instance, has $P_F = P_M = 1/2$.
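A short numerical check, reusing the assumed Gaussian setup from the example above, makes the asymmetry concrete: $1 - P_D$ always reproduces $P_M$, while $1 - P_F$ generally does not.

```python
# Illustrate that P_M = 1 - P_D (both under H1) while P_M != 1 - P_F
# in general; mu, sigma, tau are assumed illustrative values.
import numpy as np
from scipy.stats import norm

mu, sigma, tau = 2.0, 1.0, 0.5
P_F = norm.sf(tau / sigma)          # computed under H0
P_D = norm.sf((tau - mu) / sigma)   # computed under H1
P_M = norm.cdf((tau - mu) / sigma)  # computed under H1

print(f"P_M     = {P_M:.4f}")
print(f"1 - P_D = {1 - P_D:.4f}  (always equal to P_M)")
print(f"1 - P_F = {1 - P_F:.4f}  (different in general)")
```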
Quick Check
A detector has false-alarm probability $P_F$ and detection probability $P_D$. What is its miss probability $P_M$ --- or can it not be determined without further information?
$P_M = 1 - P_D$. The miss and detection probabilities always sum to one under $\mathcal{H}_1$; no further information is needed.
Type I error (false alarm)
The event of deciding $\mathcal{H}_1$ when $\mathcal{H}_0$ is true. Its probability $P_F$ is also called the size of the test in classical statistics and the false-alarm rate in radar.
Related: Type II error (miss), significance level, ROC curve
Type II error (miss)
The event of deciding $\mathcal{H}_0$ when $\mathcal{H}_1$ is true. Its probability is $P_M = 1 - P_D$. The complementary quantity $P_D$ is called the power of the test.
Related: Type I error (false alarm), detection probability, power
Why This Matters: From Binary Hypothesis Testing to BPSK Demodulation
In a BPSK receiver, the transmitter sends either $s = +A$ (bit 1) or $s = -A$ (bit 0) over an AWGN channel. The receiver observes $Y = s + N$ with $N \sim \mathcal{N}(0, \sigma^2)$, and must decide which bit was sent. This is exactly the Gaussian mean-shift problem of the example Binary Hypothesis Testing with a Gaussian Mean Shift with $\mu = 2A$ (in the centred formulation). The LRT we develop in Section 1.3 collapses to threshold detection at zero, and the error probability is $P_e = Q(A/\sigma)$ --- with $A = \sqrt{E_b}$ and $\sigma^2 = N_0/2$, the celebrated $Q(\sqrt{2E_b/N_0})$ BPSK formula. Chapter 2 returns to this link with the general vector-AWGN theory.
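As a rough illustration of this link, the Monte Carlo sketch below (the amplitude $A$, noise level $\sigma$, and sample count are arbitrary assumed values) simulates BPSK over AWGN with threshold detection at zero and compares the simulated bit error rate against $Q(A/\sigma)$.

```python
# Monte Carlo sketch of BPSK over AWGN versus the closed-form error
# probability Q(A/sigma); A, sigma, and n are assumed values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
A, sigma, n = 1.0, 0.7, 1_000_000

bits = rng.integers(0, 2, n)           # bit 0 -> -A, bit 1 -> +A
s = np.where(bits == 1, A, -A)         # transmitted symbols
y = s + rng.normal(0.0, sigma, n)      # AWGN channel
bits_hat = (y > 0).astype(int)         # threshold detection at zero

ber_sim = np.mean(bits_hat != bits)
ber_theory = norm.sf(A / sigma)        # P_e = Q(A / sigma)
print(f"simulated BER = {ber_sim:.5f}, theory Q(A/sigma) = {ber_theory:.5f}")
```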