Neyman-Pearson and the ROC

When You Do Not Have a Prior

The Bayesian formulation requires priors $\pi_0, \pi_1$, which in many engineering problems are unavailable or contested. In radar, the prior probability of a target is rarely known. In a clinical screening test, the prior depends on the population. The Neyman-Pearson framework sidesteps this entirely: fix the worst tolerable false-alarm rate $\alpha$ and find the detector that maximises $P_d$ subject to $P_f \leq \alpha$. No priors, no costs --- just a single design knob $\alpha$.

Definition: Neyman-Pearson Criterion

Fix a significance level $\alpha \in (0, 1)$. A decision rule $g^\star$ is most powerful (MP) at level $\alpha$ if:

  1. $P_f(g^\star) \leq \alpha$ (size constraint), and
  2. for every other rule $g$ with $P_f(g) \leq \alpha$, $P_d(g^\star) \geq P_d(g)$.

The NP criterion is asymmetric: $\mathcal{H}_0$ is protected (its rejection rate is bounded), while $\mathcal{H}_1$ is detected as aggressively as possible under that protection. This matches applications where false alarms carry a regulatory or safety-critical cost.

Theorem: Neyman-Pearson Lemma

Let $f_0, f_1$ be densities on $\mathcal{Y}$. For every $\alpha \in (0,1)$ there exists a threshold $\eta \geq 0$ and a randomisation parameter $\gamma \in [0,1]$ such that the randomised likelihood ratio test

$$g^\star(y) = \begin{cases} 1 & \text{if } L(y) > \eta, \\ \gamma & \text{if } L(y) = \eta, \\ 0 & \text{if } L(y) < \eta, \end{cases}$$

has $P_f(g^\star) = \alpha$. This rule is most powerful at level $\alpha$: for any other test $g$ with $P_f(g) \leq \alpha$, $P_d(g^\star) \geq P_d(g)$. Moreover $g^\star$ is essentially unique: any other MP rule must agree with $g^\star$ except on $\{L(y) = \eta\}$.

Think of the observations $y$ as items you want to buy, each with a price $f_0(y)$ (its contribution to $P_f$) and a value $f_1(y)$ (its contribution to $P_d$). You have a budget $\alpha$. The greedy optimum is to buy items in decreasing order of value-per-price --- exactly $L(y) = f_1(y)/f_0(y)$ --- until the budget is spent. This is the LRT.
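A minimal sketch of this greedy construction for a finite observation alphabet (the pmfs `p0`, `p1` and the budget below are illustrative placeholders, not taken from the text):

```python
import numpy as np

def greedy_np_region(p0, p1, alpha):
    """Greedy 'knapsack' construction of the NP decision region.

    Buys outcomes in decreasing order of value-per-price
    L(y) = p1[y] / p0[y] until the false-alarm budget alpha is spent.
    Returns the outcomes decided as H1 and the achieved (Pf, Pd).
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    order = np.argsort(-(p1 / p0))       # decreasing likelihood ratio
    region, pf, pd = [], 0.0, 0.0
    for y in order:
        if pf + p0[y] > alpha:           # next item would bust the budget
            break
        region.append(int(y))
        pf += p0[y]
        pd += p1[y]
    return region, pf, pd

# Illustrative four-letter alphabet:
print(greedy_np_region(p0=[0.5, 0.3, 0.15, 0.05],
                       p1=[0.1, 0.2, 0.3, 0.4], alpha=0.25))
```

The loop generally stops with some budget left over, so the deterministic rule is slightly conservative; closing that gap exactly is what randomisation (next) is for.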


Why Randomisation?

For continuous distributions the tie set $\{f_1 = \eta f_0\}$ has measure zero and randomisation is cosmetic. For discrete observations (e.g., counting photons, bits, packets), the tie set has positive mass and a deterministic LRT cannot achieve every $\alpha \in (0,1)$: the map $\eta \mapsto P_f(\eta)$ is a staircase. Randomisation at the tie value fills the gaps, making every $\alpha$ achievable. In engineering practice one usually rounds down to the nearest achievable level $\alpha' \leq \alpha$ rather than implementing a physical coin flip.
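A sketch of the gap-filling computation on an illustrative discrete problem (a binomial pair invented for this example): locate the tie value $\eta$ and solve for $\gamma$ so that $P_f = \alpha$ exactly.

```python
import numpy as np
from scipy.stats import binom

# Illustrative problem: n coin flips, H0: p = 0.3 vs H1: p = 0.6.
n, alpha = 10, 0.10
y = np.arange(n + 1)
p0, p1 = binom.pmf(y, n, 0.3), binom.pmf(y, n, 0.6)
L = p1 / p0

# Accumulate P0 mass in decreasing order of likelihood ratio.
order = np.argsort(-L)
cum_pf = np.cumsum(p0[order])    # the deterministic 'staircase' of Pf values

# First k outcomes are decided H1 outright; the next one is the tie.
k = np.searchsorted(cum_pf, alpha, side="right")
eta = L[order[k]]                # assumes alpha below the top staircase step
gamma = (alpha - (cum_pf[k - 1] if k > 0 else 0.0)) / p0[order[k]]

pd = p1[order[:k]].sum() + gamma * p1[order[k]]
print(f"eta = {eta:.3f}, gamma = {gamma:.3f}, Pd = {pd:.3f} at Pf = {alpha}")
```

When $\alpha$ happens to sit exactly on a staircase value, the formula returns $\gamma = 0$ and the test is deterministic.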

Definition: Receiver Operating Characteristic (ROC)

For a parametric family of detectors indexed by a threshold $\eta$ (the LRT family being canonical), the receiver operating characteristic is the curve

$$\text{ROC} \;=\; \bigl\{ (P_f(\eta), P_d(\eta)) : \eta \in [0, \infty] \bigr\} \subset [0,1]^2.$$

The ROC sweeps from $(1, 1)$ at $\eta = 0$ (always decide $\mathcal{H}_1$) to $(0, 0)$ at $\eta = \infty$ (always decide $\mathcal{H}_0$). The area under the ROC curve (AUC) is a scalar summary of the detector's separability, with $\mathrm{AUC} = 1$ corresponding to perfect detection and $\mathrm{AUC} = 1/2$ to random guessing.

Theorem: Properties of the LRT ROC Curve

The ROC curve of the (possibly randomised) LRT family satisfies:

  1. Monotonicity. $P_d$ is a non-decreasing function of $P_f$.
  2. Lies above the diagonal. $P_d(P_f) \geq P_f$ for all $P_f \in [0,1]$, with equality iff $f_0 = f_1$ almost everywhere.
  3. Concavity. The function $P_f \mapsto P_d(P_f)$ is concave on $[0,1]$.
  4. Slope is the threshold. Where differentiable, $dP_d/dP_f = \eta$, the LRT threshold at that operating point.

Concavity is the geometric manifestation of optimality: if $P_d(P_f)$ had a strictly convex arc, one could time-share between the arc's endpoints to achieve a strictly higher $P_d$ at every intermediate $P_f$ --- contradicting the optimality of each individual operating point. The slope-is-threshold identity is the Lagrangian dual view: $\eta$ is the shadow price the NP optimum pays for an extra unit of false-alarm budget.
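A quick way to see the slope identity (a sketch, assuming the likelihood ratio $L(Y)$ admits a density $p_{L,j}$ under each hypothesis $\mathcal{H}_j$): both coordinates are tail probabilities of $L$,

$$P_f(\eta) = P_0(L > \eta), \qquad P_d(\eta) = P_1(L > \eta),$$

so $dP_f/d\eta = -p_{L,0}(\eta)$ and $dP_d/d\eta = -p_{L,1}(\eta)$. On the level set $\{L = \eta\}$ we have $f_1 = \eta f_0$ pointwise, which transfers to the distribution of $L$ itself: $p_{L,1}(\eta) = \eta\, p_{L,0}(\eta)$. Dividing the two derivatives gives $dP_d/dP_f = \eta$.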

ROC Curve for the Gaussian Shift Problem

Explore the ROC of the LRT for $\mathcal{H}_0: Y \sim \mathcal{N}(0,1)$ vs. $\mathcal{H}_1: Y \sim \mathcal{N}(\mu, 1)$ with $n$ i.i.d. samples. Increasing $\mu$ or $n$ pushes the curve toward the top-left corner (perfect detection).

[Interactive demo: adjustable parameters --- mean shift $\mu$ (default 1) and number of i.i.d. samples $n$ (default 1).]
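If the interactive figure is unavailable, a small sketch traces the same curves from the closed form $P_d = Q(Q^{-1}(P_f) - \sqrt{n}\,\mu)$ derived in the example below (the parameter settings are illustrative):

```python
import numpy as np
from scipy.stats import norm   # norm.sf is Q, norm.isf is Q^{-1}

def gaussian_roc(mu, n, pf):
    """ROC of the LRT for N(0,1) vs N(mu,1) from n i.i.d. samples."""
    d = np.sqrt(n) * mu                  # effective separation
    return norm.sf(norm.isf(pf) - d)     # Pd = Q(Q^{-1}(Pf) - d)

pf = np.linspace(1e-6, 1 - 1e-6, 500)
for mu, n in [(1, 1), (1, 4), (2, 1)]:   # illustrative settings
    pd = gaussian_roc(mu, n, pf)          # feed (pf, pd) to a plotting library
    print(f"mu={mu}, n={n}: Pd at Pf=0.1 is {gaussian_roc(mu, n, 0.1):.3f}")
```

Doubling $\mu$ and quadrupling $n$ give the same curve, since both enter only through $d = \sqrt{n}\,\mu$.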

Example: ROC for the Gaussian Mean Shift

For $\mathcal{H}_0: Y \sim \mathcal{N}(0,1)$ vs. $\mathcal{H}_1: Y \sim \mathcal{N}(\mu, 1)$ with $\mu > 0$, compute the ROC curve analytically and verify concavity.
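One route (a sketch of the standard computation): the LRT reduces to thresholding $Y$ itself, with $d = \mu$ (or the normalised sum with $d = \sqrt{n}\,\mu$ for $n$ i.i.d. samples). Writing $Q$ for the standard Gaussian tail and $\tau$ for the threshold,

$$P_f = Q(\tau), \qquad P_d = Q(\tau - d) \quad\Longrightarrow\quad P_d = Q\bigl(Q^{-1}(P_f) - d\bigr).$$

For concavity, differentiate: $dP_d/dP_f = \phi(\tau - d)/\phi(\tau) = e^{d\tau - d^2/2}$, the likelihood ratio at the boundary (the slope-equals-threshold property again). This is increasing in $\tau$, and $\tau$ decreases as $P_f$ grows, so the slope is non-increasing in $P_f$: the curve is concave.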

🔧 Engineering Note

AUC as an Operational Figure of Merit

Practitioners often compare detectors by AUC because it summarises performance in a single scalar without fixing a specific $\alpha$-$P_d$ trade-off. A Gaussian-shift test with separation $d = \mu$ has closed-form $\mathrm{AUC} = Q(-d/\sqrt{2}) = 1 - Q(d/\sqrt{2})$, i.e., AUC rises smoothly with $d$. Caveats: (i) AUC is insensitive to where on the curve performance actually matters --- radar/early-warning systems should care about the low-$\alpha$ region, not the whole curve; (ii) AUC is equivalent to the Mann-Whitney U statistic and inherits its interpretation as $P(L(Y_1) > L(Y_0))$, where $Y_j \sim f_j$.

Practical Constraints
  • Prefer partial AUC (up to $\alpha_{\max}$) when false alarms are costly --- see the sketch below this list

  • Report the full ROC, not just AUC, for safety-critical systems
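A computational sketch of these points (the sample sizes, seed, and separation $d = 1$ are illustrative): empirical AUC via the Mann-Whitney rank interpretation, the closed form above, and a partial AUC restricted to $P_f \leq \alpha_{\max}$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = 1.0                                   # illustrative separation
y0 = rng.normal(0.0, 1.0, 5000)           # scores under H0
y1 = rng.normal(d, 1.0, 5000)             # scores under H1

# Mann-Whitney view: AUC = P(an H1 score exceeds an independent H0 score).
auc_rank = (y1[:, None] > y0[None, :]).mean()
auc_closed = norm.sf(-d / np.sqrt(2))     # Q(-d / sqrt(2))

# Partial AUC: integrate the closed-form ROC only over Pf <= alpha_max.
alpha_max = 0.1
pf = np.linspace(0.0, alpha_max, 400)
pd = norm.sf(norm.isf(pf) - d)
pauc = float(np.sum(0.5 * (pd[1:] + pd[:-1]) * np.diff(pf)))

print(f"AUC: rank estimate = {auc_rank:.3f}, closed form = {auc_closed:.3f}")
print(f"partial AUC on [0, {alpha_max}] = {pauc:.4f}")
```

Two detectors with the same AUC can have very different partial AUCs, which is exactly why the low-$\alpha$ region deserves its own figure of merit.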

Common Mistake: A Convex Bump in a ROC

Mistake:

Accepting a ROC curve that is non-concave (has a convex bump) as representing an optimal detector family.

Correction:

A non-concave empirical ROC indicates sub-optimal operating points --- time-sharing between the endpoints of the convex region produces a strictly higher $P_d$ at every interior $P_f$. Replace the convex arc with its chord (take the upper concave envelope) to recover an optimal family; a sketch follows below. This is exactly the gap one sees in practice when plotting the ROC of a deterministic versus a randomised detector over a discrete observation space.
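A sketch of the repair (upper concave envelope via a monotone-chain hull pass; the operating points are made-up numbers with a deliberate convex bump):

```python
import numpy as np

def concave_envelope(pf, pd):
    """Indices of the upper concave envelope of an empirical ROC.

    Assumes points sorted by Pf and including (0,0) and (1,1). A point
    is dropped whenever it lies on or below the chord joining its kept
    left neighbour to the incoming point -- time-sharing beats it.
    """
    hull = []
    for i in range(len(pf)):
        while len(hull) >= 2:
            x1, y1 = pf[hull[-2]], pd[hull[-2]]
            x2, y2 = pf[hull[-1]], pd[hull[-1]]
            # cross >= 0: last kept point is on/below the new chord
            if (x2 - x1) * (pd[i] - y1) - (y2 - y1) * (pf[i] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

pf = np.array([0.0, 0.2, 0.4, 0.6, 1.0])
pd = np.array([0.0, 0.5, 0.55, 0.8, 1.0])   # (0.4, 0.55) is under the chord
keep = concave_envelope(pf, pd)
print([(pf[i], pd[i]) for i in keep])        # drops the sub-optimal point
```

On the removed arc, the repaired detector randomises between the chord's two endpoint tests, exactly the time-sharing argument above.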

Quick Check

For the Neyman-Pearson test at level $\alpha$, which statement is true?

Increasing $\alpha$ strictly increases $P_d$ (unless $P_d$ is already 1).

The NP test always achieves the smallest $P_e$.

The NP test requires knowing the priors $\pi_0, \pi_1$.

The NP test is unique even at $L(y) = \eta$.

Area under the ROC curve (AUC)

The integral $\int_0^1 P_d(P_f)\,dP_f$. It equals the probability that the LR of an $\mathcal{H}_1$ observation exceeds the LR of an independent $\mathcal{H}_0$ observation: $\mathrm{AUC} = P(L(Y_1) > L(Y_0))$. AUC lies in $[1/2, 1]$, with 1/2 for random guessing and 1 for perfect separation.

Related: ROC curve, Mann-Whitney statistic

Why This Matters: CFAR Detection and the Neyman-Pearson Framework

Modern radar receivers implement constant false-alarm rate (CFAR) detectors --- precisely Neyman-Pearson tests whose threshold is continuously adapted to maintain $P_f \leq \alpha$ despite unknown clutter statistics. The design target $\alpha$ (typically $10^{-6}$ to $10^{-4}$) is dictated by regulatory requirements: too many false alarms overload the tracker and waste radar resources. In radar, "operating point on the ROC" is a concrete engineering knob, and the ROC concavity property underpins the detection gain from integrating multiple pulses (Chapter 2).
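A minimal cell-averaging CFAR sketch (assuming the textbook CA-CFAR model: square-law detector in exponentially distributed clutter of unknown power; under that model the scale factor $T = N(\alpha^{-1/N} - 1)$ for $N$ training cells gives $P_f = \alpha$):

```python
import numpy as np

def ca_cfar(x, n_train, n_guard, alpha):
    """Cell-averaging CFAR over a 1-D range profile.

    Estimates local clutter power from n_train cells on each side of
    the cell under test (skipping n_guard guard cells) and compares the
    cell to T times that estimate, with T chosen so that Pf = alpha
    under the exponential-clutter model, whatever the clutter power.
    """
    N = 2 * n_train
    T = N * (alpha ** (-1.0 / N) - 1.0)   # scale factor for Pf = alpha
    w = n_train + n_guard
    hits = []
    for i in range(w, len(x) - w):
        left = x[i - w : i - n_guard]
        right = x[i + n_guard + 1 : i + w + 1]
        noise = (left.sum() + right.sum()) / N
        if x[i] > T * noise:
            hits.append(i)
    return hits

rng = np.random.default_rng(1)
x = rng.exponential(4.0, 1000)   # clutter power 4.0, unknown to the detector
x[500] += 80.0                   # one injected target return
print(ca_cfar(x, n_train=16, n_guard=2, alpha=1e-4))   # expect [500]
```

The threshold floats with the local clutter estimate, which is what keeps the operating point pinned to the chosen $\alpha$ as the clutter level drifts.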