The Decision Problem
Why Binary Hypothesis Testing?
Every receiver --- whether it demodulates a BPSK symbol, detects a radar target, classifies a radiology image, or screens for spam --- ultimately answers a yes/no question from noisy evidence. Binary hypothesis testing is the atomic unit of statistical inference: the smallest non-trivial decision problem, and one for which we can derive, analyse, and bound the optimal detector in closed form.
Once we understand the binary case completely, $M$-ary detection, composite testing, and sequential decisions all follow as natural extensions. The likelihood ratio we introduce here will reappear --- renamed, reshaped, vectorised --- in every chapter of this book and in every receiver architecture you will ever design.
Definition: Binary Hypothesis Testing Problem
A binary hypothesis testing problem consists of:
- Two hypotheses $\mathcal{H}_0$ (the null hypothesis) and $\mathcal{H}_1$ (the alternative hypothesis) about the state of nature.
- An observation space $\mathcal{Y}$ (typically $\mathbb{R}^n$ or a countable set) together with an observation random variable $Y$.
- Two conditional densities (or probability mass functions) $p_0(y) = p(y \mid \mathcal{H}_0)$ and $p_1(y) = p(y \mid \mathcal{H}_1)$, specifying how $Y$ is distributed under each hypothesis.
A decision rule (or detector) is a measurable function $\delta : \mathcal{Y} \to \{0, 1\}$ that maps each observation $y$ to a decision in favour of $\mathcal{H}_0$ or $\mathcal{H}_1$.
We say the hypotheses are simple when $p_0$ and $p_1$ are completely specified. When either density depends on unknown parameters (e.g., a signal of unknown amplitude), the hypothesis is composite --- treated in Chapter 2.
Definition: Decision Regions
Every decision rule $\delta$ partitions the observation space into two disjoint decision regions $\Gamma_0 = \{y \in \mathcal{Y} : \delta(y) = 0\}$ and $\Gamma_1 = \{y \in \mathcal{Y} : \delta(y) = 1\}$, with $\Gamma_0 \cup \Gamma_1 = \mathcal{Y}$ and $\Gamma_0 \cap \Gamma_1 = \emptyset$. Conversely, any such partition defines a decision rule. Designing a detector is therefore equivalent to choosing a partition of $\mathcal{Y}$.
Definition: Type I and Type II Errors
For a decision rule $\delta$ with decision regions $\Gamma_0, \Gamma_1$, define:
- False-alarm probability (Type I error): $P_F = P(\text{decide } \mathcal{H}_1 \mid \mathcal{H}_0) = \int_{\Gamma_1} p_0(y)\, dy$
- Miss probability (Type II error): $P_M = P(\text{decide } \mathcal{H}_0 \mid \mathcal{H}_1) = \int_{\Gamma_0} p_1(y)\, dy$
- Detection probability (power): $P_D = P(\text{decide } \mathcal{H}_1 \mid \mathcal{H}_1) = \int_{\Gamma_1} p_1(y)\, dy = 1 - P_M$
The names false alarm, miss, and detection come from radar, where $\mathcal{H}_1$ is the presence of a target. In statistics one speaks of size ($\alpha = P_F$) and power ($1 - \beta = P_D$, where $\beta = P_M$).
Discrete observations use sums over $\mathcal{Y}$ in place of integrals. All our results apply to both cases with the obvious substitution.
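As a concrete sanity check, the sketch below evaluates the three probabilities by numerical integration. The scalar Gaussian densities and the threshold-type decision region are illustrative assumptions, not part of the definition.

```python
# Minimal numerical check of the definitions above, assuming scalar
# Gaussian densities and a threshold-type decision region; the specific
# densities p0, p1 and threshold tau are illustrative choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p0 = norm(loc=0.0, scale=1.0).pdf  # density of Y under H0 (assumed)
p1 = norm(loc=2.0, scale=1.0).pdf  # density of Y under H1 (assumed)
tau = 1.0                          # decision region Gamma_1 = {y : y > tau}

P_F, _ = quad(p0, tau, np.inf)     # integrate p0 over Gamma_1
P_M, _ = quad(p1, -np.inf, tau)    # integrate p1 over Gamma_0
P_D = 1.0 - P_M                    # detection probability (power)

print(f"P_F = {P_F:.4f}, P_M = {P_M:.4f}, P_D = {P_D:.4f}")
```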
Key Takeaway
Shrinking $\Gamma_1$ enlarges $\Gamma_0$ and therefore increases $P_M$; shrinking $\Gamma_0$ enlarges $\Gamma_1$ and increases $P_F$. The fundamental tradeoff of detection is that the two error types are coupled through the same partition of $\mathcal{Y}$ --- you cannot reduce both by tuning the detector alone. Only by acquiring more informative data can both be simultaneously shrunk.
Example: Binary Hypothesis Testing with a Gaussian Mean Shift
Let $Y \sim \mathcal{N}(0, \sigma^2)$ under $\mathcal{H}_0$ and $Y \sim \mathcal{N}(\mu, \sigma^2)$ under $\mathcal{H}_1$, with $\mu > 0$. For a threshold detector that decides $\mathcal{H}_1$ when $y > \tau$, compute $P_F$ and $P_D$ as functions of $\tau$ and the separation $\mu/\sigma$.
False alarm
Under $\mathcal{H}_0$, $Y \sim \mathcal{N}(0, \sigma^2)$, so $P_F = P(Y > \tau \mid \mathcal{H}_0) = Q(\tau/\sigma)$, where $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$ is the Gaussian tail function.
Detection
Under $\mathcal{H}_1$, $Y \sim \mathcal{N}(\mu, \sigma^2)$, so $P_D = P(Y > \tau \mid \mathcal{H}_1) = Q((\tau - \mu)/\sigma)$ and $P_M = 1 - P_D$.
Tradeoff
As $\tau \to \infty$, $P_F \to 0$ and $P_D \to 0$; as $\tau \to -\infty$, both increase toward $1$. For any $\tau$, $P_D = Q(Q^{-1}(P_F) - \mu/\sigma)$, which is the ROC curve for this problem. Larger $\mu/\sigma$ (stronger signal) pushes the curve toward the upper-left corner.
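These closed-form expressions are easy to check numerically. The sketch below (with assumed values $\mu = 2$, $\sigma = 1$) uses SciPy's norm.sf for the tail function $Q$ and norm.isf for its inverse $Q^{-1}$, sweeps the threshold to exhibit the tradeoff, and verifies the ROC relation along the way.

```python
# Sweep the threshold tau for the Gaussian mean-shift example and
# verify the ROC relation P_D = Q(Q^{-1}(P_F) - mu/sigma).
# mu and sigma are assumed illustrative values.
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.0          # mean shift and common std (assumed)
taus = np.linspace(-3, 5, 9)  # sweep of decision thresholds

P_F = norm.sf(taus / sigma)          # P_F = Q(tau / sigma)
P_D = norm.sf((taus - mu) / sigma)   # P_D = Q((tau - mu) / sigma)

# ROC relation: eliminate tau between the two expressions.
P_D_roc = norm.sf(norm.isf(P_F) - mu / sigma)
assert np.allclose(P_D, P_D_roc)

for t, pf, pd in zip(taus, P_F, P_D):
    print(f"tau = {t:+.1f}:  P_F = {pf:.4f},  P_D = {pd:.4f}")
```

Running the sweep makes the coupling of the Key Takeaway visible: raising $\tau$ drives $P_F$ and $P_D$ down together, never one without the other.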
Interactive figure: overlap of the two Gaussian densities under $\mathcal{H}_0$ and $\mathcal{H}_1$. Vary the mean separation $\mu$, the common standard deviation $\sigma$, and the decision threshold $\tau$. The shaded regions are the false-alarm area (blue, right of $\tau$ under $p_0$) and the miss area (red, left of $\tau$ under $p_1$).
Historical Note: Neyman, Pearson, and the Birth of Hypothesis Testing
1920s-1930s. Modern hypothesis testing emerged from a famous collaboration between the Polish statistician Jerzy Neyman (1894-1981) and the English statistician Egon Pearson (1895-1980), son of Karl Pearson. Between 1928 and 1933, working by mail across London and Warsaw, they developed the framework of two hypotheses ($\mathcal{H}_0$ and $\mathcal{H}_1$), the distinction between Type I and Type II errors, and the optimality notion that would become the Neyman-Pearson lemma (Section 1.4).
Their approach broke from Ronald Fisher's significance-testing tradition (which considered only the null hypothesis) and supplied the operational vocabulary --- size, power, critical region --- that every radar engineer, quality inspector, and clinical trialist uses today. Neyman emigrated to Berkeley in 1938 and founded the influential Berkeley statistics department; Pearson succeeded his father at University College London.
Common Mistake: The Direction of Error Probabilities
Mistake:
Treating $P_F$ and $P_M$ symmetrically and writing "$P_M = 1 - P_F$".
Correction:
The correct relation is $P_M = 1 - P_D$, because miss and detection are complementary events under the fixed hypothesis $\mathcal{H}_1$. $P_F$ and $P_M$, by contrast, are computed under different hypotheses and are coupled only through the decision region: depending on the detector, $P_F + P_M$ can equal anything in $(0, 2)$. A detector that ignores the data and flips a fair coin, for instance, has $P_F = P_M = 1/2$.
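A short numerical check, reusing the assumed Gaussian setup from the example above, makes the asymmetry concrete: $1 - P_D$ always reproduces $P_M$, while $1 - P_F$ generally does not.

```python
# Illustrate that P_M = 1 - P_D (both under H1) while P_M != 1 - P_F
# in general; mu, sigma, tau are assumed illustrative values.
import numpy as np
from scipy.stats import norm

mu, sigma, tau = 2.0, 1.0, 0.5
P_F = norm.sf(tau / sigma)          # computed under H0
P_D = norm.sf((tau - mu) / sigma)   # computed under H1
P_M = norm.cdf((tau - mu) / sigma)  # computed under H1

print(f"P_M     = {P_M:.4f}")
print(f"1 - P_D = {1 - P_D:.4f}  (always equal to P_M)")
print(f"1 - P_F = {1 - P_F:.4f}  (different in general)")
```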
Quick Check
A detector has false-alarm probability $P_F$ and detection probability $P_D$. What is its miss probability $P_M$ --- or can it not be determined without further information?
$P_M = 1 - P_D$. The miss and detection probabilities always sum to one under $\mathcal{H}_1$; no further information is needed.
Type I error (false alarm)
The event of deciding $\mathcal{H}_1$ when $\mathcal{H}_0$ is true. Its probability $P_F$ is also called the size of the test in classical statistics and the false-alarm rate in radar.
Related: Type II error (miss), significance level, ROC curve
Type II error (miss)
The event of deciding $\mathcal{H}_0$ when $\mathcal{H}_1$ is true. Its probability is $P_M = 1 - P_D$. The complementary quantity $P_D$ is called the power of the test.
Related: Type I error (false alarm), detection probability, power
Why This Matters: From Binary Hypothesis Testing to BPSK Demodulation
In a BPSK receiver, the transmitter sends either $s = +A$ (bit 1) or $s = -A$ (bit 0) over an AWGN channel. The receiver observes $Y = s + N$ with $N \sim \mathcal{N}(0, \sigma^2)$, and must decide which bit was sent. This is exactly the Gaussian mean-shift problem of the example Binary Hypothesis Testing with a Gaussian Mean Shift with $\mu = 2A$ (in the centred formulation). The LRT we develop in Section 1.3 collapses to threshold detection at zero, and the error probability is $P_e = Q(A/\sigma)$ --- with $A = \sqrt{E_b}$ and $\sigma^2 = N_0/2$, the celebrated $Q(\sqrt{2E_b/N_0})$ BPSK formula. Chapter 2 returns to this link with the general vector-AWGN theory.
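As a rough illustration of this link, the Monte Carlo sketch below (the amplitude $A$, noise level $\sigma$, and sample count are arbitrary assumed values) simulates BPSK over AWGN with threshold detection at zero and compares the simulated bit error rate against $Q(A/\sigma)$.

```python
# Monte Carlo sketch of BPSK over AWGN versus the closed-form error
# probability Q(A/sigma); A, sigma, and n are assumed values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
A, sigma, n = 1.0, 0.7, 1_000_000

bits = rng.integers(0, 2, n)           # bit 0 -> -A, bit 1 -> +A
s = np.where(bits == 1, A, -A)         # transmitted symbols
y = s + rng.normal(0.0, sigma, n)      # AWGN channel
bits_hat = (y > 0).astype(int)         # threshold detection at zero

ber_sim = np.mean(bits_hat != bits)
ber_theory = norm.sf(A / sigma)        # P_e = Q(A / sigma)
print(f"simulated BER = {ber_sim:.5f}, theory Q(A/sigma) = {ber_theory:.5f}")
```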