Wald's Sequential Probability Ratio Test (SPRT)
Why Stop Early?
In the fixed-sample-size detection framework of Chapter 1, the statistician collects $n$ samples, forms the log-likelihood ratio $\Lambda_n$, and compares it to a threshold. The sample size $n$ is chosen in advance to meet target false-alarm and miss probabilities $(\alpha, \beta)$.
Two observations motivate a sequential approach. First, when the true hypothesis is "easy" (for example, the LLR increments are all strongly positive), a few samples suffice, and the remaining observations are wasted. Second, when the true hypothesis is "hard" (the LLR walks near zero), no fixed $n$ is large enough to guarantee reliable detection.
Wald's sequential probability ratio test (SPRT) resolves both issues. The sample size becomes a random variable, determined by the data: observe one sample at a time, update the cumulative LLR, and decide as soon as it crosses an upper or lower threshold. For a given target pair $(\alpha, \beta)$, the SPRT uses on average the smallest possible number of samples among all tests achieving those error levels; this is the Wald-Wolfowitz optimality theorem.
This section develops the SPRT from its threshold design through Wald's identity for the ASN, and closes with its connection to early stopping in iterative decoders, where the decision statistic is monitored between iterations.
Definition: Sequential Probability Ratio Test (SPRT)
Let $X_1, X_2, \dots$ be i.i.d. observations under either $H_0$ (density $p_0$) or $H_1$ (density $p_1$). Define the per-sample log-likelihood ratio $\ell_i = \ln\frac{p_1(X_i)}{p_0(X_i)}$ and the cumulative LLR
$$\Lambda_n = \sum_{i=1}^{n} \ell_i.$$
Fix two thresholds $B < 0 < A$. The SPRT is the pair $(N, \delta)$ defined by the stopping rule
$$N = \inf\{n \ge 1 : \Lambda_n \ge A \ \text{or}\ \Lambda_n \le B\}$$
and the terminal decision
$$\delta = \begin{cases} 1 \ (\text{accept } H_1) & \text{if } \Lambda_N \ge A, \\ 0 \ (\text{accept } H_0) & \text{if } \Lambda_N \le B. \end{cases}$$
The test continues (takes another sample) while $B < \Lambda_n < A$.
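In code, the stopping rule is a short loop. A minimal Python sketch, with an illustrative Gaussian shift-in-mean LLR and illustrative thresholds (none of these parameter values come from the definition itself):

```python
import random

def sprt(stream, llr, A, B):
    """Run the SPRT on an iterable of observations.

    llr(x) is the per-sample log-likelihood ratio ln p1(x)/p0(x);
    the test stops the first time the cumulative LLR leaves (B, A).
    Returns (decision, n): 1 = accept H1 (upper exit), 0 = accept H0.
    """
    cum = 0.0
    for n, x in enumerate(stream, start=1):
        cum += llr(x)
        if cum >= A:
            return 1, n
        if cum <= B:
            return 0, n
    raise RuntimeError("stream exhausted inside the continuation region")

# Illustrative example: Gaussian shift in mean (mu = sigma = 1),
# data generated under H1, thresholds about ln(999) = 6.9 nats.
mu, sigma = 1.0, 1.0
llr = lambda x: (mu / sigma**2) * x - mu**2 / (2 * sigma**2)
rng = random.Random(0)
decision, n = sprt((rng.gauss(mu, sigma) for _ in range(100_000)),
                   llr, A=6.9, B=-6.9)
```

Note that `n` is random: rerunning with a different seed gives a different sample count, clustered around the ASN computed later in this section.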
The thresholds $A$ and $B$ are the only design parameters. Choosing them properly yields a test with prescribed error probabilities $(\alpha, \beta)$; that is the content of Wald's inequalities below.
SPRT (Sequential Probability Ratio Test)
A sequential hypothesis test that accumulates log-likelihood ratios one sample at a time and stops as soon as the cumulative LLR exits the interval $(B, A)$. Optimal in ASN among tests with the same error probabilities.
Related: Average Sample Number (ASN), Wald's Identity
Average Sample Number (ASN)
The expected stopping time $E_i[N]$ of a sequential test under hypothesis $H_i$. Also called the expected sample size. For the SPRT it is computed via Wald's identity.
Wald's Identity
For a random walk $S_n = \sum_{i=1}^n X_i$ with i.i.d. increments and a stopping time $N$ with $E[N] < \infty$, $E[S_N] = E[N]\,E[X_1]$.
Related: Average Sample Number (ASN)
Theorem: Wald's Threshold Approximation
Let $\alpha$ and $1-\beta$ denote the false-alarm and detection probabilities of the SPRT with thresholds $(A, B)$. Ignoring the excess over the boundary at the stopping time,
$$e^{A} \approx \frac{1-\beta}{\alpha}, \qquad e^{B} \approx \frac{\beta}{1-\alpha}.$$
Consequently, to achieve targets $\alpha$ and $\beta$, Wald's choice is
$$A = \ln\frac{1-\beta}{\alpha}, \qquad B = \ln\frac{\beta}{1-\alpha}.$$
The thresholds depend only on the target error probabilities, not on the densities or the SNR. Every SPRT targeting the same false-alarm $\alpha$ and miss $\beta$ uses the same thresholds $A$ and $B$ (in nats), regardless of the signal model.
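Because the formulas involve only $\alpha$ and $\beta$, the threshold computation is a two-liner; a sketch:

```python
import math

def wald_thresholds(alpha, beta):
    """Wald's choice: A = ln((1-beta)/alpha), B = ln(beta/(1-alpha)).
    No density, SNR, or other model parameter appears."""
    return math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))

A, B = wald_thresholds(1e-3, 1e-3)
# Symmetric targets give symmetric thresholds: A = -B = ln(999), about 6.9 nats.
```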
Partition the sample space by the terminal decision and apply a change of measure from $P_0$ to $P_1$.
On the event $\{\Lambda_N \ge A\}$ we have $\ln\frac{p_1(X_1,\dots,X_N)}{p_0(X_1,\dots,X_N)} \ge A$, i.e., $p_0 \le e^{-A} p_1$ on that event.
Integrate this inequality over the event to bound $\alpha$ by $e^{-A}(1-\beta)$.
Setup via change of measure
Let $\mathcal{R}_1 = \{\Lambda_N \ge A\}$ denote the acceptance region for $H_1$. By definition, $\alpha = \int_{\mathcal{R}_1} p_0$ and $1-\beta = \int_{\mathcal{R}_1} p_1$.
Exploit the terminal inequality
On $\mathcal{R}_1$ we have $\Lambda_N \ge A$, which means $\frac{p_1(x_1,\dots,x_N)}{p_0(x_1,\dots,x_N)} \ge e^{A}$. Therefore $p_0 \le e^{-A} p_1$ on $\mathcal{R}_1$.
Derive the bound on $\alpha$
Integrating both sides over $\mathcal{R}_1$ against Lebesgue measure, $\alpha = \int_{\mathcal{R}_1} p_0 \le e^{-A} \int_{\mathcal{R}_1} p_1 = e^{-A}(1-\beta)$, so $\frac{\alpha}{1-\beta} \le e^{-A}$.
Symmetric bound for the miss
Repeat on $\mathcal{R}_0 = \{\Lambda_N \le B\}$: on this event $\Lambda_N \le B$, i.e., $p_1 \le e^{B} p_0$. Integrating yields $\beta = \int_{\mathcal{R}_0} p_1 \le e^{B}(1-\alpha)$, or $\frac{\beta}{1-\alpha} \le e^{B}$.
Invert to solve for $(A, B)$
Ignoring the excess over the boundary replaces the inequalities by equalities, giving $e^{A} = \frac{1-\beta}{\alpha}$ and $e^{B} = \frac{\beta}{1-\alpha}$. Substituting the targets $(\alpha, \beta)$ yields the stated formulas.
Key Takeaway
The SPRT thresholds $A = \ln\frac{1-\beta}{\alpha}$, $B = \ln\frac{\beta}{1-\alpha}$ are model-free: they depend only on the target error probabilities, not on the densities $p_0, p_1$. This is the clean separation between detection requirements and signal structure that makes the SPRT a universal sequential framework.
Theorem: Wald's Identity
Let $X_1, X_2, \dots$ be i.i.d. with $E|X_1|$ finite, and let $S_n = \sum_{i=1}^n X_i$. If $N$ is a stopping time with respect to $(X_n)$ and $E[N] < \infty$, then
$$E[S_N] = E[N]\,E[X_1].$$
The expected "total distance" a random walk covers by a stopping time is the product of the expected number of steps and the expected step. It is the stopping-time version of linearity of expectation.
Express $S_N = \sum_{i=1}^{\infty} X_i \mathbf{1}\{N \ge i\}$.
The event $\{N \ge i\}$ depends only on $X_1, \dots, X_{i-1}$, so it is independent of $X_i$.
Expand $S_N$ as a series
Write $S_N = \sum_{i=1}^{\infty} X_i \mathbf{1}\{N \ge i\}$. Taking expectations (Fubini is justified by $E[N] < \infty$ and $E|X_1| < \infty$),
$$E[S_N] = \sum_{i=1}^{\infty} E\big[X_i \mathbf{1}\{N \ge i\}\big].$$
Use the stopping-time property
Since $\mathbf{1}\{N \ge i\} = 1 - \mathbf{1}\{N \le i-1\}$ is measurable with respect to $\sigma(X_1, \dots, X_{i-1})$, it is independent of $X_i$. Hence $E[X_i \mathbf{1}\{N \ge i\}] = E[X_1]\,P(N \ge i)$.
Sum the tail series
Using $\sum_{i=1}^{\infty} P(N \ge i) = E[N]$,
$$E[S_N] = E[X_1] \sum_{i=1}^{\infty} P(N \ge i) = E[X_1]\,E[N].$$
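The identity is easy to check by simulation. A sketch with illustrative values (Gaussian increments of mean 0.5, boundaries at $\pm 7$):

```python
import random

def stopped_walk(rng, drift, A, B):
    """One path of a Gaussian random walk, stopped on exiting (B, A).
    Returns the stopped sum S_N and the stopping time N."""
    S, n = 0.0, 0
    while B < S < A:
        S += rng.gauss(drift, 1.0)
        n += 1
    return S, n

rng = random.Random(1)
paths = [stopped_walk(rng, 0.5, 7.0, -7.0) for _ in range(5000)]
mean_S = sum(s for s, _ in paths) / len(paths)
mean_N = sum(n for _, n in paths) / len(paths)
# Wald's identity: E[S_N] = E[N] * E[X_1], so mean_S should be close
# to 0.5 * mean_N up to Monte Carlo error.
```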
Theorem: Average Sample Number of the SPRT
Under hypothesis $H_j$, let $D_j = E_j[\ell_1]$ denote the expected per-sample LLR, so that $D_1 = D(p_1 \| p_0) > 0$ and $D_0 = -D(p_0 \| p_1) < 0$ (assuming $p_0 \neq p_1$). Neglecting the excess over the boundary, the SPRT with Wald thresholds satisfies
$$E_0[N] \approx \frac{(1-\alpha)B + \alpha A}{D_0}, \qquad E_1[N] \approx \frac{(1-\beta)A + \beta B}{D_1}.$$
Under $H_1$ the LLR random walk drifts upward at rate $D_1$ per step, so reaching $A$ takes about $A/D_1$ steps. The exact formula adjusts for the rare case in which the walk exits through the wrong boundary.
Apply Wald's identity to the random walk $\Lambda_n$ with stopping time $N$.
Condition on the terminal event to evaluate $E_j[\Lambda_N]$ in terms of $A$, $B$, and the error probabilities.
Apply Wald's identity
The increments $\ell_i$ are i.i.d.\ with mean $D_j$ under $H_j$. The stopping time $N$ has finite mean (this requires a separate argument: the LLR random walk has nonzero drift, so $N$ has exponential tails). By Wald's identity,
$$E_j[\Lambda_N] = E_j[N]\, D_j.$$
Evaluate the terminal expectation
Ignoring the excess over the boundary, $\Lambda_N = A$ when the SPRT accepts $H_1$ and $\Lambda_N = B$ when it accepts $H_0$. Under $H_0$, acceptance of $H_1$ occurs with probability $\alpha$, so $E_0[\Lambda_N] = \alpha A + (1-\alpha)B$. Under $H_1$, acceptance of $H_1$ occurs with probability $1-\beta$, so $E_1[\Lambda_N] = (1-\beta)A + \beta B$.
Solve for the ASN
Combining with Wald's identity, $E_j[N] = E_j[\Lambda_N]/D_j$, which gives the stated formulas. Note that $D_0 < 0$, while the numerator $\alpha A + (1-\alpha)B$ (dominated by $(1-\alpha)B$) is also negative, so $E_0[N] > 0$.
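The two ASN formulas translate directly into code; a sketch (the symmetric example values are illustrative):

```python
import math

def wald_asn(alpha, beta, D0, D1):
    """ASN approximations from the theorem.
    D0 = E0[l] < 0 and D1 = E1[l] > 0 are the mean per-sample LLRs."""
    A = math.log((1 - beta) / alpha)
    B = math.log(beta / (1 - alpha))
    EN0 = ((1 - alpha) * B + alpha * A) / D0
    EN1 = ((1 - beta) * A + beta * B) / D1
    return EN0, EN1

# Symmetric illustration: alpha = beta = 1e-3, drift of +/-0.5 nat per sample.
EN0, EN1 = wald_asn(1e-3, 1e-3, -0.5, 0.5)   # both come out near 13.8
```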
Example: SPRT for a Gaussian Shift-in-Mean
Under $H_0$, $X_i \sim \mathcal{N}(0, \sigma^2)$; under $H_1$, $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ known. Derive the SPRT explicitly and compute the ASN for $\alpha = \beta = 10^{-3}$ at $\mu^2/\sigma^2 = 1$ (0 dB).
Per-sample LLR
The log-likelihood ratio simplifies to
$$\ell_i = \frac{\mu}{\sigma^2} X_i - \frac{\mu^2}{2\sigma^2}.$$
The cumulative LLR is $\Lambda_n = \frac{\mu}{\sigma^2} \sum_{i=1}^n X_i - \frac{n\mu^2}{2\sigma^2}$.
Thresholds
For $\alpha = \beta = 10^{-3}$, $A = \ln\frac{1-\beta}{\alpha} = \ln 999 \approx 6.91$ nats and $B = -A \approx -6.91$ nats.
Expected increments
Under $H_1$, $D_1 = \frac{\mu^2}{2\sigma^2} = 0.5$ nat. Under $H_0$, $D_0 = -\frac{\mu^2}{2\sigma^2} = -0.5$ nat.
ASN computation
Using the Average Sample Number theorem with $A = 6.91$, $B = -6.91$, $D_1 = 0.5$,
$$E_1[N] \approx \frac{(1-\beta)A + \beta B}{D_1} = \frac{0.999 \times 6.91 - 0.001 \times 6.91}{0.5} \approx 13.8.$$
The fixed-sample-size NP test would need $n \approx 39$ samples. The SPRT uses roughly one-third as many samples on average.
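A Monte Carlo check of this example takes only a few lines; the empirical ASN comes out a little above 13.8 because the approximation neglects the overshoot past the boundary:

```python
import math
import random

mu, sigma = 1.0, 1.0                 # 0 dB: mu^2 / sigma^2 = 1
A = math.log(999.0)                  # alpha = beta = 1e-3  ->  A = -B = 6.91

def one_run(rng, true_mean):
    """One SPRT run; returns the sample count and whether H1 was accepted."""
    cum, n = 0.0, 0
    while -A < cum < A:
        x = rng.gauss(true_mean, sigma)
        cum += (mu / sigma**2) * x - mu**2 / (2 * sigma**2)
        n += 1
    return n, cum >= A

rng = random.Random(42)
runs = [one_run(rng, mu) for _ in range(2000)]   # data generated under H1
asn_hat = sum(n for n, _ in runs) / len(runs)    # near 13.8 (slightly above)
correct = sum(ok for _, ok in runs)              # misses should be very rare
```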
SPRT Sample Path and Decision Boundaries
Simulate sample paths of the cumulative LLR under $H_0$ and $H_1$ for a Gaussian shift-in-mean problem. The SPRT stops as soon as the path exits the strip $(B, A)$. Vary the SNR and the target error levels to see how the thresholds move and how the stopping time changes.
Parameters
Signal-to-noise ratio $\mu^2/\sigma^2$
Target false-alarm probability
Target miss probability
Average Sample Number vs. SNR
Compare the SPRT's ASN to the fixed-sample-size Neyman-Pearson test's required sample count as SNR varies. The ASN curve reveals the Wald-Wolfowitz optimality in operational terms: roughly a factor of two to three reduction at the same error levels.
Parameters
Random Walk of the LLR and SPRT Boundaries
Theorem: Wald-Wolfowitz Optimality of the SPRT
Among all sequential tests (stopping rule + terminal decision) of $H_0$ vs.\ $H_1$ based on i.i.d.\ observations with $P_0(\text{accept } H_1) \le \alpha$ and $P_1(\text{accept } H_0) \le \beta$, the SPRT with Wald thresholds minimizes both $E_0[N]$ and $E_1[N]$ simultaneously.
No other test, deterministic or randomized, fixed or variable sample size, with the same error guarantees can terminate sooner on average under either hypothesis. The SPRT is a rare example of a doubly-optimal procedure.
Reduction to Bayesian problem
Consider the Bayesian problem with prior $\pi = P(H_1)$, cost $c$ per sample, and 0-1 loss for wrong decisions. The optimal stopping rule compares the posterior $P(H_1 \mid X_1, \dots, X_n)$ to two thresholds that depend on the sampling cost.
SPRT is the Bayes-optimal rule
A direct calculation shows that the posterior-based thresholds are equivalent to LLR-based thresholds, so the SPRT is the optimal Bayes rule for some prior and sampling cost $(\pi, c)$.
Minimax argument
By a duality between the Bayes and minimax formulations (cf. Siegmund 1985), the SPRT that achieves error levels $(\alpha, \beta)$ with the Wald thresholds also minimizes $E_0[N]$ and $E_1[N]$ over all tests with those error levels. The full proof requires the theory of optimal stopping; see Siegmund for a modern treatment.
SPRT Implementation
Complexity: $O(1)$ per sample; expected total $O(E_j[N])$. Truncation (capping $N$ at a maximum $N_{\max}$) is common in practice to bound worst-case latency. The resulting test is no longer strictly optimal but retains the ASN advantage for most sample paths.
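A sketch of the truncated variant described above (forced decision at a midpoint threshold of 0 when the deadline is reached):

```python
def truncated_sprt(llrs, A, B, n_max):
    """SPRT with a hard deadline: if the walk is still inside (B, A)
    after n_max samples, force a decision by comparing the cumulative
    LLR to the midpoint threshold 0.  llrs is an iterable of per-sample
    LLR values."""
    cum = 0.0
    n = 0
    for l in llrs:
        n += 1
        cum += l
        if cum >= A:
            return 1, n            # upper exit: accept H1
        if cum <= B:
            return 0, n            # lower exit: accept H0
        if n == n_max:
            break                  # deadline reached inside the strip
    return (1 if cum >= 0.0 else 0), n
```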
Common Mistake: Excess Over the Boundary
Mistake:
Students apply the Wald threshold formulas and assume the error probabilities are exactly $\alpha$ and $\beta$.
Correction:
Wald's formulas ignore the excess $\Lambda_N - A$ (or $B - \Lambda_N$) at the stopping time. The actual errors are typically smaller than the targets: the SPRT over-delivers on the error guarantees. For continuous increments with bounded variance, the overshoot contributes a constant number of nats; correcting for it shifts both thresholds inward relative to the Wald choice. Siegmund's corrected asymptotic expansions account for this.
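The over-delivery is visible in simulation. The sketch below uses deliberately loose targets $\alpha = \beta = 0.1$ (illustrative values) so the gap shows up with a modest trial count:

```python
import math
import random

alpha = beta = 0.1
A = math.log((1 - beta) / alpha)      # about 2.20 nats
B = -A

def decides_h1(rng):
    """One SPRT run on data generated under H0.  The increment
    N(-0.5, 1) is the LLR distribution for the Gaussian example
    (mu = sigma = 1) when H0 is true."""
    cum = 0.0
    while B < cum < A:
        cum += rng.gauss(-0.5, 1.0)
    return cum >= A

rng = random.Random(7)
trials = 20_000
fa_hat = sum(decides_h1(rng) for _ in range(trials)) / trials
# The empirical false alarm falls below the 0.1 target: the overshoot
# past the boundary makes realized errors smaller than Wald's formulas assume.
```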
Common Mistake: Non-i.i.d. Observations Break the SPRT
Mistake:
Applying the SPRT to dependent observations (e.g., correlated noise, adaptive measurements) using the same thresholds.
Correction:
The Wald inequalities rely on the martingale property of the likelihood ratio under $H_0$, which in turn requires the per-sample LLRs to form a martingale difference sequence. For correlated data, the LLR must be computed from the joint density, and the thresholds must be reinterpreted. A mechanical i.i.d.\ implementation can produce actual errors orders of magnitude above target.
Historical Note: Wald and Wartime Statistics
1943-1947. Abraham Wald developed the SPRT during World War II as part of the Statistical Research Group at Columbia University (1943). The problem that motivated the work was munitions quality control: how to decide whether a batch of shells met specifications while testing as few shells as possible. Sequential testing had been informally proposed by others (notably Dodge and Romig in 1929), but Wald established the rigorous optimality theory.
The SPRT was classified until 1947 because of its potential military applications. When Wald published "Sequential Analysis" that year, it became the founding text of sequential statistical inference. Wald died in a plane crash in 1950 at age 48, cutting short one of the most prolific careers in 20th-century statistics.
Historical Note: The Wald-Wolfowitz Optimality Theorem
1948. In 1948, Wald and his collaborator Jacob Wolfowitz proved that the SPRT minimizes both $E_0[N]$ and $E_1[N]$ among all sequential tests with given error probabilities, a doubly-optimal property that has no analogue in the fixed-sample-size theory, where only the miss probability is minimized for a fixed false-alarm level.
The proof uses a Bayesian reduction that was considered unusual at the time, since many statisticians viewed Bayes and frequentist methods as philosophically incompatible. Wald's willingness to use Bayesian tools instrumentally anticipated the modern decision-theoretic viewpoint.
Why This Matters: Early Stopping in Iterative Decoders
Turbo and LDPC decoders execute a fixed maximum number of iterations (typically 8-50) regardless of channel quality. In good channels most codewords decode correctly in 3-5 iterations; the remainder of the iteration budget is wasted power and latency.
Early-stopping criteria are sequential tests: monitor a decision statistic between iterations and stop as soon as confidence exceeds a threshold. Common statistics include the minimum absolute extrinsic LLR, the change in the hard-decision vector, and the syndrome check. Each is a proxy for the cumulative LLR of an SPRT on the hypothesis "the current hard decision is the true codeword." Well-designed early stopping achieves substantial iteration savings at no BER cost in the waterfall regime.
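Whatever the statistic, the control flow is the same wrapper. A sketch with a hypothetical `iterate`/`passes_check` interface (a real decoder would supply message-passing updates and, e.g., a syndrome computation):

```python
def decode_with_early_stop(iterate, passes_check, max_iters):
    """Generic early-termination wrapper (hypothetical interface).

    iterate() runs one decoder iteration and returns the current hard
    decisions; passes_check(bits) is the stopping statistic (e.g. the
    syndrome check).  Stops at the first iteration whose output passes,
    otherwise at the iteration budget.
    """
    bits = None
    for it in range(1, max_iters + 1):
        bits = iterate()
        if passes_check(bits):
            return bits, it        # early termination: confidence reached
    return bits, max_iters         # budget exhausted

# Toy illustration: a stub "decoder" that converges on its third iteration.
state = {"n": 0}
def fake_iterate():
    state["n"] += 1
    return [1, 0, 1] if state["n"] >= 3 else [0, 0, 0]

bits, used = decode_with_early_stop(fake_iterate, lambda b: b == [1, 0, 1], 20)
```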
Truncated SPRT in Wireless Systems
Pure SPRT has an unbounded stopping time. Any real receiver must impose a maximum sample count $N_{\max}$, a hard deadline driven by latency budgets (e.g., the LDPC decoder must produce a bit within one OFDM symbol). The truncated SPRT forces a decision at $n = N_{\max}$ by comparing $\Lambda_{N_{\max}}$ to a midpoint threshold (typically 0) if the walk is still in the continuation region.
The truncation introduces a small additional error term that must be budgeted alongside $(\alpha, \beta)$. For $N_{\max}$ well above the ASN, the effect is usually negligible. For tight deadlines, the thresholds must be redesigned via direct numerical optimization over the truncated problem.
- 3GPP NR turbo/LDPC decoders: $N_{\max}$ = 8-12 iterations per TTI
- LDPC hardware decoders allocate an iteration budget in clock cycles
- Early termination contributes directly to UE battery life
Quick Check
For a sequential test with target errors $\alpha$ and $\beta$, what is the upper threshold $A$ (in nats)?
(same as option 1, but coincidental)
$A = \ln\frac{1-\beta}{\alpha}$ nats.
Quick Check
In the Gaussian shift-in-mean SPRT with fixed targets $(\alpha, \beta)$, how does the ASN under $H_1$ scale with SNR in the low-SNR regime?
$E_1[N]$ does not depend on SNR
$D_1 = \mu^2/(2\sigma^2)$ is proportional to SNR, so $E_1[N] \approx A/D_1 \propto 1/\mathrm{SNR}$.
Quick Check
Which statement about the Wald-Wolfowitz theorem is correct?
The SPRT minimizes $E_0[N]$ only.
The SPRT minimizes $E_1[N]$ only.
The SPRT minimizes both $E_0[N]$ and $E_1[N]$ among tests with the same error levels.
The SPRT minimizes the sum $E_0[N] + E_1[N]$ only.
This doubly-optimal property is the remarkable content of the Wald-Wolfowitz theorem.