Neyman-Pearson and the ROC
When You Do Not Have a Prior
The Bayesian formulation requires priors $\pi_0, \pi_1$, which in many engineering problems are unavailable or contested. In radar, the prior probability of a target is rarely known. In a clinical screening test, the prior depends on the population. The Neyman-Pearson framework sidesteps this entirely: fix the worst tolerable false-alarm rate $\alpha$ and find the detector that maximises $P_{\mathrm{D}}$ subject to $P_{\mathrm{FA}} \le \alpha$. No priors, no costs: just a single design knob $\alpha$.
Definition: Neyman-Pearson Criterion
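Fix a significance level $\alpha \in (0,1)$. A decision rule $\delta^\star$ is most powerful (MP) at level $\alpha$ if:
- $P_{\mathrm{FA}}(\delta^\star) \le \alpha$ (size constraint), and
- for every other rule $\delta$ with $P_{\mathrm{FA}}(\delta) \le \alpha$, $P_{\mathrm{D}}(\delta) \le P_{\mathrm{D}}(\delta^\star)$.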
The NP criterion is asymmetric: $\mathcal{H}_0$ is protected (its rejection rate is bounded), while $\mathcal{H}_1$ is detected as aggressively as possible under that protection. This matches applications where false alarms carry a regulatory or safety-critical cost.
Theorem: Neyman-Pearson Lemma
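Let $p_0, p_1$ be densities on $\mathcal{Y}$ and let $L(y) = p_1(y)/p_0(y)$ denote the likelihood ratio. For every $\alpha \in (0,1)$ there exist a threshold $\eta \ge 0$ and a randomisation parameter $\gamma \in [0,1]$ such that the randomised likelihood ratio test
$$\delta^\star(y) = \begin{cases} 1, & L(y) > \eta, \\ \gamma, & L(y) = \eta, \\ 0, & L(y) < \eta, \end{cases}$$
where $\delta^\star(y)$ is the probability of deciding $\mathcal{H}_1$ at $y$, has $P_{\mathrm{FA}}(\delta^\star) = \alpha$. This rule is most powerful at level $\alpha$: for any other test $\delta$ with $P_{\mathrm{FA}}(\delta) \le \alpha$,
$$P_{\mathrm{D}}(\delta) \le P_{\mathrm{D}}(\delta^\star).$$
Moreover $\delta^\star$ is essentially unique: any other MP rule must agree with $\delta^\star$ (almost everywhere) except on the tie set $\{y : L(y) = \eta\}$.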
Think of the observations $y$ as items you want to buy, each with a price $p_0(y)\,dy$ (its contribution to $P_{\mathrm{FA}}$) and a value $p_1(y)\,dy$ (its contribution to $P_{\mathrm{D}}$). You have a budget $\alpha$. The greedy optimum is to buy items in decreasing order of value-per-price, which is exactly $L(y) = p_1(y)/p_0(y)$, until the budget is spent. This is the LRT.
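The budget analogy can be made concrete on a finite observation alphabet. The sketch below is a minimal illustration, assuming the observation takes finitely many values with known probabilities under each hypothesis; the function name `greedy_np_rule` and the three-symbol example are hypothetical, not from the text. It fills the false-alarm budget in decreasing order of $p_1/p_0$ and spends whatever is left fractionally on the first item that does not fit, which plays the role of the randomisation $\gamma$.

```python
def greedy_np_rule(p0, p1, alpha):
    """Fractional-knapsack view of the NP lemma on a finite alphabet (hypothetical helper).

    p0[i], p1[i]: probability of symbol i under H0 and H1.
    alpha: false-alarm budget.
    Returns delta[i] = probability of deciding H1 when symbol i is observed.
    """
    def ratio(i):
        # Value-per-price p1/p0; symbols impossible under H0 cost nothing, so buy them first.
        return float("inf") if p0[i] == 0 else p1[i] / p0[i]

    delta = [0.0] * len(p0)
    budget = alpha
    for i in sorted(range(len(p0)), key=ratio, reverse=True):
        if p0[i] <= budget:
            delta[i] = 1.0             # the whole item fits: always decide H1 on symbol i
            budget -= p0[i]
        else:
            delta[i] = budget / p0[i]  # partial item: randomise with gamma = leftover / price
            break
    return delta


p0, p1 = [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]
delta = greedy_np_rule(p0, p1, alpha=0.3)
pfa = sum(d * q for d, q in zip(delta, p0))
pd = sum(d * q for d, q in zip(delta, p1))
```

For this example the rule is $\delta \approx (0, 1/3, 1)$, giving $P_{\mathrm{FA}} = 0.3$ and $P_{\mathrm{D}} = 0.7$. When several symbols share the same ratio, the sketch fills them one at a time instead of splitting $\gamma$ evenly; both choices are MP because the lemma leaves the tie set free.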
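Consider any competing rule $\delta$ with $P_{\mathrm{FA}}(\delta) \le \alpha$ and look at the difference of powers $P_{\mathrm{D}}(\delta^\star) - P_{\mathrm{D}}(\delta)$.
Write both powers as integrals and combine them into a single integral over $\mathcal{Y}$.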
Show the integrand is pointwise non-negative.
Existence of $\eta, \gamma$ achieving $P_{\mathrm{FA}} = \alpha$
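Define $F(t) = \Pr_0\big(L(Y) > t\big)$. Then $F$ is non-increasing, right-continuous, and $F(t) \to 0$ as $t \to \infty$. Let $\eta = \inf\{t \ge 0 : F(t) \le \alpha\}$. Then
$$F(\eta) \le \alpha \le F(\eta^-) = \Pr_0\big(L(Y) \ge \eta\big).$$
If $F(\eta) = \alpha$, take $\gamma = 0$. Otherwise $\Pr_0\big(L(Y) = \eta\big) = F(\eta^-) - F(\eta) > 0$, and we set
$$\gamma = \frac{\alpha - F(\eta)}{\Pr_0\big(L(Y) = \eta\big)} \in (0, 1]$$
so that $P_{\mathrm{FA}}(\delta^\star) = F(\eta) + \gamma \Pr_0\big(L(Y) = \eta\big) = \alpha$.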
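The construction of $(\eta, \gamma)$ translates directly into code when $L(Y)$ takes finitely many values. A minimal sketch, assuming the caller has already aggregated $\Pr_0(L(Y) = \ell)$ for each distinct value $\ell$; the helper name `np_threshold` and the three-atom example are hypothetical. Because $F$ only jumps at the atoms, the infimum defining $\eta$ is attained at one of them.

```python
def np_threshold(lr_atoms, alpha):
    """(eta, gamma) from the existence proof, for a discrete likelihood ratio (hypothetical helper).

    lr_atoms: dict {ell: P0(L(Y) = ell)} over the distinct LR values.
    alpha: target false-alarm probability in (0, 1).
    """
    def F(t):
        # F(t) = P0(L(Y) > t): a non-increasing, right-continuous step function.
        return sum(m for ell, m in lr_atoms.items() if ell > t)

    # eta = inf{t : F(t) <= alpha}; F is constant between atoms, so scan them in order.
    eta = next(ell for ell in sorted(lr_atoms) if F(ell) <= alpha)
    tie_mass = lr_atoms[eta]               # P0(L(Y) = eta) = F(eta-) - F(eta), so 0 <= gamma < 1
    gamma = (alpha - F(eta)) / tie_mass    # equals 0 exactly when F(eta) == alpha
    return eta, gamma


atoms = {0.2: 0.5, 1.0: 0.3, 3.0: 0.2}     # hypothetical three-atom example
eta, gamma = np_threshold(atoms, 0.3)
size = sum(m for ell, m in atoms.items() if ell > eta) + gamma * atoms[eta]
```

Here $\eta = 1$ and $\gamma = 1/3$, and the resulting size is $0.3$; with $\alpha = 0.2$ the same atoms give $\gamma = 0$, the deterministic case.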
Variational comparison
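Let $\delta$ be any rule with $P_{\mathrm{FA}}(\delta) \le \alpha$. Treat both rules as functions $\mathcal{Y} \to [0,1]$ giving the probability of deciding $\mathcal{H}_1$ at $y$ ($\delta^\star$ is possibly randomised: $\delta^\star(y) = \gamma$ on the tie set), so that $P_{\mathrm{FA}}(\delta) = \int \delta\, p_0\, dy$ and $P_{\mathrm{D}}(\delta) = \int \delta\, p_1\, dy$. Consider
$$\Delta = \int_{\mathcal{Y}} \big(\delta^\star(y) - \delta(y)\big)\big(p_1(y) - \eta\, p_0(y)\big)\, dy.$$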
Integrand is non-negative pointwise
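Examine the sign of the integrand at each $y$:
- If $\delta^\star(y) > \delta(y)$: then $\delta^\star(y) > 0$, so $L(y) \ge \eta$, so $p_1(y) - \eta\, p_0(y) \ge 0$. The product is $\ge 0$.
- If $\delta^\star(y) < \delta(y)$: then $\delta^\star(y) < 1$, so $L(y) \le \eta$, so $p_1(y) - \eta\, p_0(y) \le 0$. Again the product is $\ge 0$.
- If $\delta^\star(y) = \delta(y)$: the integrand is zero.
Hence the integrand is non-negative pointwise, and therefore $\Delta \ge 0$.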
Conclude
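Expand $\Delta$:
$$\Delta = \big[P_{\mathrm{D}}(\delta^\star) - P_{\mathrm{D}}(\delta)\big] - \eta\,\big[P_{\mathrm{FA}}(\delta^\star) - P_{\mathrm{FA}}(\delta)\big].$$
Since $\eta \ge 0$ and $P_{\mathrm{FA}}(\delta^\star) = \alpha \ge P_{\mathrm{FA}}(\delta)$,
$$P_{\mathrm{D}}(\delta^\star) - P_{\mathrm{D}}(\delta) = \Delta + \eta\,\big[\alpha - P_{\mathrm{FA}}(\delta)\big] \ge 0.$$
Therefore $P_{\mathrm{D}}(\delta) \le P_{\mathrm{D}}(\delta^\star)$, proving that $\delta^\star$ is most powerful. Uniqueness (a.e.) follows because an MP rule $\delta$ has $P_{\mathrm{D}}(\delta) = P_{\mathrm{D}}(\delta^\star)$, which by the display above forces $\Delta = 0$; a non-negative integrand with zero integral vanishes a.e., so $\delta = \delta^\star$ a.e. on $\{y : L(y) \ne \eta\}$.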
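The lemma can be sanity-checked numerically on a small discrete example. The sketch below uses a hypothetical three-symbol alphabet with likelihood ratios $0.2, 1, 3$, takes $(\eta, \gamma) = (1, 1/3)$ so that $P_{\mathrm{FA}}(\delta^\star) = \alpha = 0.3$, and compares the randomised LRT against every deterministic rule and against many random randomised rules that satisfy the size constraint. It illustrates the theorem; it does not prove it.

```python
import random
from itertools import product

# Hypothetical three-symbol example with likelihood ratios 0.2, 1 and 3.
P0 = [0.5, 0.3, 0.2]
P1 = [0.1, 0.3, 0.6]
ALPHA, ETA, GAMMA = 0.3, 1.0, 1 / 3   # chosen so that P_FA(delta_star) = ALPHA


def size_and_power(delta):
    """(P_FA, P_D) of a rule given as P(decide H1 | y) for each symbol y."""
    pfa = sum(d * q for d, q in zip(delta, P0))
    pd = sum(d * q for d, q in zip(delta, P1))
    return pfa, pd


# Randomised LRT: 1 above the threshold, GAMMA on the tie set, 0 below.
lr = [q1 / q0 for q0, q1 in zip(P0, P1)]
delta_star = [1.0 if ell > ETA else GAMMA if ell == ETA else 0.0 for ell in lr]
pfa_star, pd_star = size_and_power(delta_star)

# Best deterministic rule that respects the size constraint.
det_best = max(pd for pfa, pd in map(size_and_power, product((0, 1), repeat=len(P0)))
               if pfa <= ALPHA + 1e-12)

# Random randomised rules, shrunk towards "never decide H1" until P_FA <= ALPHA.
rng = random.Random(0)
worst_gap = float("inf")
for _ in range(10_000):
    delta = [rng.random() for _ in P0]
    pfa, _ = size_and_power(delta)
    delta = [min(1.0, ALPHA / pfa) * d for d in delta]
    worst_gap = min(worst_gap, pd_star - size_and_power(delta)[1])
```

The best deterministic rule of size at most $0.3$ reaches only $P_{\mathrm{D}} = 0.6$, while the randomised LRT reaches $0.7$; no sampled competitor does better.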
Why Randomisation?
For continuous distributions (more precisely, when $L(Y)$ has no atoms under $\mathcal{H}_0$) the tie set $\{y : L(y) = \eta\}$ has probability zero and randomisation is cosmetic. For discrete observations (e.g., counting photons, bits, packets), the tie set can have positive mass and a deterministic LRT cannot achieve every $\alpha$: the function $\eta \mapsto P_{\mathrm{FA}}(\eta)$ is a staircase. Randomisation at the tie value fills the gaps, making every $\alpha \in (0,1)$ achievable. In engineering practice one usually settles for the largest achievable $P_{\mathrm{FA}}$ not exceeding $\alpha$ rather than implementing a physical coin flip.
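The staircase is easy to see for photon counting. A minimal sketch, assuming (hypothetically) $Y \sim \mathrm{Poisson}(\lambda_0)$ under $\mathcal{H}_0$ and $Y \sim \mathrm{Poisson}(\lambda_1)$ with $\lambda_1 > \lambda_0$ under $\mathcal{H}_1$. Then $L(y) = (\lambda_1/\lambda_0)^y e^{\lambda_0 - \lambda_1}$ is increasing in $y$, so the LRT is a count threshold and $(\eta, \gamma)$ becomes a pair $(k, \gamma)$. The function names and the values $\lambda_0 = 2$, $\alpha = 0.05$ are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_tail(k, lam):
    """P(Y >= k) for Y ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(k))


def randomised_count_test(lam0, alpha):
    """Size-alpha NP test of Poisson(lam0) against any Poisson(lam1), lam1 > lam0.

    Decide H1 if y > k; if y == k, decide H1 with probability gamma.
    The test does not depend on lam1 because L(y) is increasing in y.
    """
    k = 0
    while poisson_tail(k + 1, lam0) > alpha:   # smallest k with P0(Y > k) <= alpha
        k += 1
    gamma = (alpha - poisson_tail(k + 1, lam0)) / poisson_pmf(k, lam0)
    return k, gamma


# Deterministic tests "decide H1 iff Y >= k" only reach the staircase levels P0(Y >= k).
staircase = [poisson_tail(k, 2.0) for k in range(9)]
k, gamma = randomised_count_test(2.0, 0.05)
```

For $\lambda_0 = 2$ and $\alpha = 0.05$, the deterministic levels on either side of $0.05$ are about $0.053$ ($Y \ge 5$) and $0.017$ ($Y \ge 6$); the sketch returns $k = 5$ with $\gamma \approx 0.93$, which hits $0.05$ exactly.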
Definition: Receiver Operating Characteristic (ROC)
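For a parametric family of detectors indexed by a threshold $\eta$ (the LRT family being canonical), the receiver operating characteristic is the curve
$$\eta \mapsto \big(P_{\mathrm{FA}}(\eta),\, P_{\mathrm{D}}(\eta)\big) \in [0,1]^2.$$
The ROC sweeps from $(0,0)$ at $\eta = \infty$ (always decide $\mathcal{H}_0$) to $(1,1)$ at $\eta = 0$ (always decide $\mathcal{H}_1$). The area under the ROC curve (AUC) is a scalar summary of the detector's separability, with $\mathrm{AUC} = 1$ corresponding to perfect detection and $\mathrm{AUC} = 1/2$ to random guessing.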
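On a finite alphabet the LRT ROC can be traced exactly by sweeping $\eta$ from $\infty$ down to $0$. A minimal sketch, assuming known symbol probabilities under both hypotheses (the helper name `lrt_roc_vertices` and the example are hypothetical): each distinct value of $L(y)$ contributes one vertex, and the randomised LRT fills in the straight segments between consecutive vertices.

```python
def lrt_roc_vertices(p0, p1):
    """Vertices of the LRT ROC on a finite alphabet (hypothetical helper).

    Lowering eta from +inf to 0 switches symbols to "decide H1" in decreasing
    order of L(y) = p1/p0; symbols sharing an LR value switch together, so each
    distinct LR value adds one vertex.
    """
    groups = {}
    for q0, q1 in zip(p0, p1):
        ell = float("inf") if q0 == 0 else q1 / q0
        m0, m1 = groups.get(ell, (0.0, 0.0))
        groups[ell] = (m0 + q0, m1 + q1)   # accumulate H0 and H1 mass per LR value
    pfa = pd = 0.0
    vertices = [(0.0, 0.0)]                # eta = +inf: never decide H1
    for ell in sorted(groups, reverse=True):
        m0, m1 = groups[ell]
        pfa, pd = pfa + m0, pd + m1
        vertices.append((pfa, pd))
    return vertices                        # last vertex is (1, 1): always decide H1


v = lrt_roc_vertices([0.5, 0.3, 0.2], [0.1, 0.3, 0.6])
slopes = [(b2 - b1) / (a2 - a1) for (a1, b1), (a2, b2) in zip(v, v[1:])]
```

For this example the vertices are $(0,0)$, $(0.2, 0.6)$, $(0.5, 0.9)$, $(1,1)$, and the segment slopes are $3$, $1$ and $0.2$: the likelihood ratios of the symbols, in the order they are switched on.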
Theorem: Properties of the LRT ROC Curve
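The ROC curve $\alpha \mapsto P_{\mathrm{D}}^\star(\alpha)$ of the (possibly randomised) LRT family, where $P_{\mathrm{D}}^\star(\alpha)$ is the power of the level-$\alpha$ LRT, satisfies:
- Monotonicity. $P_{\mathrm{D}}^\star$ is a non-decreasing function of $\alpha = P_{\mathrm{FA}}$.
- Lies above the diagonal. $P_{\mathrm{D}}^\star(\alpha) \ge \alpha$ for all $\alpha \in [0,1]$, with equality at some $\alpha \in (0,1)$ iff $p_0 = p_1$ almost everywhere.
- Concavity. The function $\alpha \mapsto P_{\mathrm{D}}^\star(\alpha)$ is concave on $[0,1]$.
- Slope is the threshold. Where differentiable, $dP_{\mathrm{D}}^\star/d\alpha = \eta(\alpha)$, the LRT threshold at that operating point.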
Concavity is the geometric manifestation of optimality: if the ROC ever bent convexly, one could time-share between two operating points on the curve to achieve a strictly higher $P_{\mathrm{D}}$ at an intermediate $P_{\mathrm{FA}}$, contradicting the optimality of the LRT at that intermediate level. The slope-is-threshold identity is Lagrangian duality at work: $\eta$ is the shadow price of the false-alarm constraint, i.e., the marginal gain in $P_{\mathrm{D}}$ per extra unit of false-alarm budget.
Monotonicity
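As $\eta$ decreases, the region $\{y : L(y) > \eta\}$ where $\mathcal{H}_1$ is declared (the rejection region of $\mathcal{H}_0$) grows, so both $P_{\mathrm{FA}}$ and $P_{\mathrm{D}}$ grow monotonically. Parameterising by $P_{\mathrm{FA}}$ makes $P_{\mathrm{D}}$ non-decreasing.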
Above the diagonal
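Randomised guessing (decide $\mathcal{H}_1$ with probability $\alpha$ regardless of $y$) attains $P_{\mathrm{FA}} = P_{\mathrm{D}} = \alpha$ for every $\alpha$. By the NP lemma the LRT at level $\alpha$ dominates this guess in power: its $P_{\mathrm{D}}$ is at least $\alpha$. If equality holds at some $\alpha \in (0,1)$, the guess is itself MP, so by the uniqueness conclusion of NP it must agree a.e. with the $\{0,1\}$-valued LRT off the tie set. Since the guess takes the value $\alpha \notin \{0,1\}$ everywhere, the tie set $\{L = \eta\}$ must carry all the mass, which forces $\eta = 1$ and $p_0 = p_1$ a.e.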
Concavity via time-sharing
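Fix $\alpha_1, \alpha_2 \in [0,1]$ and $\lambda \in [0,1]$, let $\bar\alpha = \lambda\alpha_1 + (1-\lambda)\alpha_2$, and write $P_{\mathrm{D}}^\star(\alpha)$ for the power of the level-$\alpha$ LRT. Construct the randomised rule $\tilde\delta$ which, independent of $y$, runs the LRT at level $\alpha_1$ with probability $\lambda$ and the LRT at level $\alpha_2$ with probability $1-\lambda$. Then $P_{\mathrm{FA}}(\tilde\delta) = \bar\alpha$ and
$$P_{\mathrm{D}}(\tilde\delta) = \lambda\, P_{\mathrm{D}}^\star(\alpha_1) + (1-\lambda)\, P_{\mathrm{D}}^\star(\alpha_2).$$
By the NP lemma, the LRT at level $\bar\alpha$ cannot be outperformed: $P_{\mathrm{D}}^\star(\bar\alpha) \ge \lambda\, P_{\mathrm{D}}^\star(\alpha_1) + (1-\lambda)\, P_{\mathrm{D}}^\star(\alpha_2)$. That is concavity.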
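The time-sharing argument can be checked numerically. A minimal sketch, assuming the Gaussian mean-shift ROC $P_{\mathrm{D}}^\star(\alpha) = Q\big(Q^{-1}(\alpha) - d\big)$ derived in the example later in this section (unit variance, shift $d$); the function names and parameter grid are hypothetical. Because the coin flip is independent of $y$, both error probabilities mix linearly, and the LRT at the mixed level should never lose.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: Q(x) = STD.cdf(-x), Q^{-1}(p) = -STD.inv_cdf(p)


def lrt_power(alpha, d):
    """P_D of the level-alpha LRT for N(0,1) vs N(d,1): Q(Q^{-1}(alpha) - d)."""
    return STD.cdf(d + STD.inv_cdf(alpha))


def time_share(alpha1, alpha2, lam, d):
    """(P_FA, P_D) of the rule that flips a lam-coin independent of y, then runs
    the LRT at level alpha1 (heads) or alpha2 (tails)."""
    pfa = lam * alpha1 + (1 - lam) * alpha2
    pd = lam * lrt_power(alpha1, d) + (1 - lam) * lrt_power(alpha2, d)
    return pfa, pd


d = 1.5
gaps = []
for alpha1, alpha2 in [(0.01, 0.5), (0.001, 0.2), (0.1, 0.9)]:
    for lam in (0.25, 0.5, 0.75):
        pfa, pd = time_share(alpha1, alpha2, lam, d)
        gaps.append(lrt_power(pfa, d) - pd)   # >= 0 is exactly concavity at this point
```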
Slope identity
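Differentiating $P_{\mathrm{FA}}(\eta) = \int_\eta^\infty f_0(\ell)\, d\ell$ and $P_{\mathrm{D}}(\eta) = \int_\eta^\infty f_1(\ell)\, d\ell$ (where $f_i$ is the density of $L(Y)$ under $\mathcal{H}_i$) yields
$$\frac{dP_{\mathrm{D}}}{dP_{\mathrm{FA}}} = \frac{dP_{\mathrm{D}}/d\eta}{dP_{\mathrm{FA}}/d\eta} = \frac{f_1(\eta)}{f_0(\eta)}.$$
A change of variables gives $f_1(\ell) = \ell\, f_0(\ell)$ (the reader should verify this identity from $\mathbb{E}_1[g(L)] = \mathbb{E}_0[L\, g(L)]$ for bounded $g$, which is a restatement of $p_1 = L\, p_0$). Hence the slope equals $\eta$.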
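The identity $f_1(\ell) = \ell\, f_0(\ell)$ can be spot-checked numerically. A minimal sketch, assuming the unit-variance Gaussian shift, where $\ln L(Y) = dY - d^2/2$ is Gaussian with variance $d^2$ and mean $-d^2/2$ under $\mathcal{H}_0$, $+d^2/2$ under $\mathcal{H}_1$; the helper name `lr_density` and the test points are hypothetical.

```python
import math
from statistics import NormalDist


def lr_density(ell, d, hyp):
    """Density of L = exp(d*Y - d**2/2) at ell > 0, with Y ~ N(0,1) (hyp=0) or N(d,1) (hyp=1).

    log L is N(-d**2/2, d**2) under H0 and N(+d**2/2, d**2) under H1;
    dividing by ell is the Jacobian of the change of variables ell = exp(x).
    """
    mean = d * d / 2 if hyp == 1 else -d * d / 2
    return NormalDist(mean, d).pdf(math.log(ell)) / ell


d = 1.2
ratios = {ell: lr_density(ell, d, 1) / lr_density(ell, d, 0) for ell in (0.3, 1.0, 2.5, 7.0)}
```

Each ratio $f_1(\ell)/f_0(\ell)$ comes out equal to $\ell$ itself, as the identity predicts.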
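ROC Curve for the Gaussian Shift Problem
ROC of the LRT for $\mathcal{H}_0: Y_i \sim \mathcal{N}(0, \sigma^2)$ vs. $\mathcal{H}_1: Y_i \sim \mathcal{N}(\mu, \sigma^2)$ with $n$ i.i.d. samples, with the mean shift $\mu$ and the number of samples $n$ as parameters. Increasing $\mu$ or $n$ pushes the curve toward the top-left corner (perfect detection), since the curve depends on them only through $d = \sqrt{n}\,\mu/\sigma$.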
Example: ROC for the Gaussian Mean Shift
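For $\mathcal{H}_0: Y \sim \mathcal{N}(0, \sigma^2)$ vs. $\mathcal{H}_1: Y \sim \mathcal{N}(\mu, \sigma^2)$ with $\mu > 0$, compute the ROC curve analytically and verify concavity.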
LRT reduces to threshold on $y$
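From the example Bayes Rule for Two Gaussians (with $\mu_0 = 0$, $\mu_1 = \mu$, appropriate recentring), the LRT decides $\mathcal{H}_1$ iff $y > \tau$, where $\tau = \frac{\sigma^2 \ln \eta}{\mu} + \frac{\mu}{2}$.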
Parametric ROC
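$P_{\mathrm{FA}} = Q(\tau/\sigma)$ and $P_{\mathrm{D}} = Q\big((\tau - \mu)/\sigma\big)$, where $Q(x) = \Pr(Z > x)$ for $Z \sim \mathcal{N}(0,1)$. Inverting: $\tau = \sigma\, Q^{-1}(P_{\mathrm{FA}})$, so
$$P_{\mathrm{D}} = Q\big(Q^{-1}(P_{\mathrm{FA}}) - d\big), \qquad d = \mu/\sigma.$$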
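The inversion step can be checked by sweeping $\tau$ directly. A minimal sketch using Python's `statistics.NormalDist`, with $Q(x) = \Phi(-x)$ and $Q^{-1}(p) = -\Phi^{-1}(p)$; the values of $\mu$, $\sigma$ and the grid of thresholds are arbitrary illustrations.

```python
from statistics import NormalDist

STD = NormalDist()


def Q(x):
    """Gaussian tail Q(x) = P(Z > x), computed as Phi(-x) to avoid cancellation."""
    return STD.cdf(-x)


def Q_inv(p):
    """Inverse of Q on (0, 1)."""
    return -STD.inv_cdf(p)


def roc_point(tau, mu, sigma):
    """(P_FA, P_D) of "decide H1 iff y > tau" for N(0, sigma^2) vs N(mu, sigma^2)."""
    return Q(tau / sigma), Q((tau - mu) / sigma)


def roc_closed_form(pfa, mu, sigma):
    """P_D = Q(Q^{-1}(P_FA) - d) with d = mu / sigma."""
    return Q(Q_inv(pfa) - mu / sigma)


mu, sigma = 2.0, 1.5
errors = [abs(roc_closed_form(pfa, mu, sigma) - pd)
          for pfa, pd in (roc_point(tau, mu, sigma) for tau in (-1.0, 0.0, 0.7, 2.0, 3.5))]
```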
Concavity
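Let $u = Q^{-1}(P_{\mathrm{FA}})$, so $P_{\mathrm{FA}} = Q(u)$ and $P_{\mathrm{D}} = Q(u - d)$, with $dP_{\mathrm{FA}}/du = -\varphi(u)$ and $dP_{\mathrm{D}}/du = -\varphi(u - d)$, where $\varphi$ is the standard normal density. Then
$$\frac{dP_{\mathrm{D}}}{dP_{\mathrm{FA}}} = \frac{\varphi(u - d)}{\varphi(u)} = \exp\!\Big(d\,u - \frac{d^2}{2}\Big).$$
This slope is strictly decreasing in $P_{\mathrm{FA}}$ (because $u = Q^{-1}(P_{\mathrm{FA}})$ is decreasing in $P_{\mathrm{FA}}$ and the exponential is increasing in $u$). Decreasing slope $\Rightarrow$ concavity. As a check, the slope equals $\exp(d\,u - d^2/2)$, which is exactly the likelihood ratio $L(y) = \exp\big(\mu y/\sigma^2 - \mu^2/(2\sigma^2)\big)$ evaluated at $y = \tau = \sigma u$, consistent with the theorem Properties of the LRT ROC Curve.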
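A finite-difference check of the slope formula. A minimal sketch, assuming the same Gaussian mean-shift ROC written as $P_{\mathrm{D}} = \Phi\big(d + \Phi^{-1}(P_{\mathrm{FA}})\big)$, which equals $Q\big(Q^{-1}(P_{\mathrm{FA}}) - d\big)$; the step size `h`, the value of $d$ and the grid are arbitrary choices.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def roc(pfa, d):
    """Gaussian mean-shift ROC: Q(Q^{-1}(pfa) - d) = Phi(d + Phi^{-1}(pfa))."""
    return STD.cdf(d + STD.inv_cdf(pfa))


def slope_closed_form(pfa, d):
    """exp(d*u - d**2/2) with u = Q^{-1}(pfa); also the LR evaluated at the threshold."""
    u = -STD.inv_cdf(pfa)
    return math.exp(d * u - d * d / 2)


def slope_numeric(pfa, d, h=1e-6):
    """Central finite difference of the ROC."""
    return (roc(pfa + h, d) - roc(pfa - h, d)) / (2 * h)


d = 1.3
grid = [0.05, 0.2, 0.5, 0.8, 0.95]
closed = [slope_closed_form(p, d) for p in grid]
numeric = [slope_numeric(p, d) for p in grid]
```

The two slope lists agree to within finite-difference error, and `closed` decreases along the grid, which is the concavity claim.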
Limiting cases
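$d = 0$: $P_{\mathrm{D}} = P_{\mathrm{FA}}$ (the diagonal). $d \to \infty$: the ROC approaches the perfect corner $(P_{\mathrm{FA}}, P_{\mathrm{D}}) = (0, 1)$.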
AUC as an Operational Figure of Merit
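Practitioners often compare detectors by AUC because it summarises performance in a single scalar without fixing a specific $P_{\mathrm{FA}}$ vs. $P_{\mathrm{D}}$ trade-off. A Gaussian-shift test with separation $d = \mu/\sigma$ has the closed form $\mathrm{AUC} = \Phi(d/\sqrt{2})$, i.e., AUC rises smoothly with $d$. Caveats: (i) AUC is insensitive to where on the curve performance actually matters: radar and early-warning systems care about the low-$P_{\mathrm{FA}}$ region, not the whole curve; (ii) the empirical AUC from $n_0$ samples under $\mathcal{H}_0$ and $n_1$ under $\mathcal{H}_1$ equals the Mann-Whitney U statistic divided by $n_0 n_1$, so AUC inherits its interpretation as $\Pr\big(L(Y_1) > L(Y_0)\big)$ where $Y_0 \sim p_0$ and $Y_1 \sim p_1$ are independent.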
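The Mann-Whitney connection gives a direct way to estimate AUC from scores. A minimal sketch, assuming labelled samples of a scalar score under each hypothesis; here the score is $y$ itself, which is monotone in the Gaussian-shift LR and therefore has the same ROC. The helper name `empirical_auc`, the seed and the sample sizes are hypothetical choices, and the Monte Carlo estimate only approximates the closed form $\Phi(d/\sqrt{2})$.

```python
import math
import random
from statistics import NormalDist


def empirical_auc(scores0, scores1):
    """Mann-Whitney estimate of P(S1 > S0) + 0.5 * P(S1 == S0) (hypothetical helper).

    Pools both samples, assigns average ranks to tied scores, and uses
    AUC = (R1 - n1 (n1 + 1) / 2) / (n0 n1), with R1 the rank sum of the H1 scores.
    """
    pooled = sorted([(s, 0) for s in scores0] + [(s, 1) for s in scores1])
    n0, n1 = len(scores0), len(scores1)
    rank_sum1 = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                          # pooled[i:j] is one block of tied scores
        avg_rank = (i + 1 + j) / 2          # 1-based ranks i+1, ..., j share their mean
        rank_sum1 += avg_rank * sum(label for _, label in pooled[i:j])
        i = j
    return (rank_sum1 - n1 * (n1 + 1) / 2) / (n0 * n1)


d = 1.0
rng = random.Random(1)
s0 = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # scores under H0
s1 = [rng.gauss(d, 1.0) for _ in range(5000)]     # scores under H1
auc_hat = empirical_auc(s0, s1)
auc_true = NormalDist().cdf(d / math.sqrt(2))     # closed form Phi(d / sqrt(2))
```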
- Prefer partial AUC (the area up to a maximum tolerable $P_{\mathrm{FA}}$) when false alarms are costly
- Report the full ROC, not just AUC, for safety-critical systems
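Partial AUC is the ROC integral truncated at the largest tolerable false-alarm rate. A minimal sketch for the Gaussian mean-shift ROC, assuming a midpoint-rule integration with a hypothetical `steps` parameter; it is not a standard library routine, and since normalisation conventions for partial AUC vary, it returns the raw area.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def partial_auc(d, alpha_max, steps=10_000):
    """Integral of P_D over P_FA in [0, alpha_max] for the Gaussian ROC
    P_D = Q(Q^{-1}(P_FA) - d), by the midpoint rule."""
    h = alpha_max / steps
    total = 0.0
    for k in range(steps):
        pfa = (k + 0.5) * h                       # midpoints avoid inv_cdf(0), which is undefined
        total += STD.cdf(d + STD.inv_cdf(pfa))    # Q(Q^{-1}(pfa) - d) = Phi(d + Phi^{-1}(pfa))
    return total * h


full_area_closed_form = STD.cdf(1.0 / math.sqrt(2))   # full AUC for d = 1, i.e. Phi(d / sqrt(2))
```

With `alpha_max = 1` the result recovers the full AUC; with a small `alpha_max` it scores only the low-$P_{\mathrm{FA}}$ region that matters when false alarms are costly.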
Common Mistake: A Convex Bump in a ROC
Mistake:
Accepting a ROC curve that is non-concave (has a convex kink) as representing a valid detector family.
Correction:
A non-concave empirical ROC indicates sub-optimal operating points: time-sharing between the endpoints of the convex region produces a strictly higher $P_{\mathrm{D}}$ at every interior $P_{\mathrm{FA}}$. Replace the convex arc with its chord (the upper concave envelope) to obtain a family that is at least as good everywhere. This is exactly what separates a deterministic detector over a discrete observation space from its randomised version: the deterministic detector reaches only isolated operating points, and randomisation supplies the chords between them.
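The repair described in the correction is an upper concave hull. A minimal sketch using the monotone-chain hull construction, assuming the empirical ROC is given as a list of $(P_{\mathrm{FA}}, P_{\mathrm{D}})$ points; the function name `concave_envelope` and the sample points are hypothetical. Every vertex of the returned envelope is an achievable operating point, and any point on a segment between two vertices is reached by time-sharing between them.

```python
def concave_envelope(points):
    """Upper concave envelope of ROC points (P_FA, P_D) (hypothetical helper).

    Adds the trivial operating points (0, 0) and (1, 1), then builds the upper
    hull left to right with Andrew's monotone-chain method.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # cross >= 0: hull[-1] lies on or below the chord from hull[-2] to p.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross < 0:
                break
            hull.pop()
        hull.append(p)
    return hull


roc_points = [(0.1, 0.3), (0.2, 0.35), (0.4, 0.8), (0.7, 0.85)]
envelope = concave_envelope(roc_points)
```

Here $(0.2, 0.35)$ and $(0.7, 0.85)$ fall below the envelope, so time-sharing between the neighbouring vertices dominates them.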
Quick Check
For the Neyman-Pearson test at level $\alpha$, which statement is true?
Increasing $\alpha$ strictly increases $P_{\mathrm{D}}$ (unless $P_{\mathrm{D}}$ is already 1).
The NP test always achieves the smallest .
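The NP test requires knowing the priors $\pi_0, \pi_1$.
The NP test is unique even on the tie set $\{y : L(y) = \eta\}$.
A concave, non-decreasing ROC that reaches $P_{\mathrm{D}} = 1$ at $P_{\mathrm{FA}} = 1$ cannot have a flat stretch below height 1, since the later rise to 1 would violate concavity. So increasing the budget always helps weakly, and strictly unless $P_{\mathrm{D}}$ has already saturated at 1.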
Area under the ROC curve (AUC)
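The integral $\mathrm{AUC} = \int_0^1 P_{\mathrm{D}}\, dP_{\mathrm{FA}}$. It equals the probability that the LR of an $\mathcal{H}_1$ observation exceeds the LR of an independent $\mathcal{H}_0$ observation: $\mathrm{AUC} = \Pr\big(L(Y_1) > L(Y_0)\big)$ with $Y_0 \sim p_0$, $Y_1 \sim p_1$ (ties, if any, count one half). For the LRT family AUC lies in $[1/2, 1]$, with 1/2 for random guessing and 1 for perfect separation.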
Related: ROC curve, Mann-Whitney statistic
Why This Matters: CFAR Detection and the Neyman-Pearson Framework
Modern radar receivers implement constant false-alarm rate (CFAR) detectors: in essence Neyman-Pearson tests whose threshold is continuously adapted to maintain $P_{\mathrm{FA}} = \alpha$ despite unknown clutter statistics. The design target $\alpha$, typically very small, is dictated by regulatory requirements: too many false alarms overload the tracker and waste radar resources. In radar, "operating point on the ROC" is a concrete engineering knob, and the ROC concavity property underpins the detection gain from integrating multiple pulses (Chapter 2).
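One common way to adapt the threshold is cell-averaging CFAR (CA-CFAR); whether Chapter 2 uses this particular variant is not stated here. A minimal sketch, assuming square-law detected, independent exponential noise in the cell under test and in $N$ reference cells with a common unknown power. Under those assumptions the threshold $T = \kappa \cdot \frac{1}{N}\sum_i z_i$ with $\kappa = N\big(\alpha^{-1/N} - 1\big)$ gives $P_{\mathrm{FA}} = (1 + \kappa/N)^{-N} = \alpha$ regardless of the noise power. Function names and simulation settings are hypothetical.

```python
import random


def ca_cfar_scale(n_ref, pfa):
    """Multiplier kappa such that P_FA = (1 + kappa / n_ref) ** (-n_ref) equals pfa."""
    return n_ref * (pfa ** (-1.0 / n_ref) - 1.0)


def ca_cfar_detect(cell, reference, kappa):
    """Declare a target if the cell under test exceeds kappa times the local noise estimate."""
    return cell > kappa * sum(reference) / len(reference)


def simulate_pfa(noise_power, rng, trials=50_000, n_ref=16, pfa=1e-2):
    """Monte Carlo false-alarm rate under noise only (no target present)."""
    kappa = ca_cfar_scale(n_ref, pfa)
    hits = 0
    for _ in range(trials):
        reference = [rng.expovariate(1.0 / noise_power) for _ in range(n_ref)]
        cell = rng.expovariate(1.0 / noise_power)
        hits += ca_cfar_detect(cell, reference, kappa)
    return hits / trials


rng = random.Random(7)
pfa_low = simulate_pfa(1.0, rng)     # two very different noise powers ...
pfa_high = simulate_pfa(50.0, rng)   # ... should give (statistically) the same P_FA
```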