Neyman-Pearson and the ROC

When You Do Not Have a Prior

The Bayesian formulation requires priors $\pi_0, \pi_1$, which in many engineering problems are unavailable or contested. In radar, the prior probability of a target is rarely known. In a clinical screening test, the prior depends on the population. The Neyman-Pearson framework sidesteps this entirely: fix the worst tolerable false-alarm rate $\alpha$ and find the detector that maximises $P_d$ subject to $P_f \leq \alpha$. No priors, no costs --- just a single design knob $\alpha$.

Definition: Neyman-Pearson Criterion

Fix a significance level $\alpha \in (0, 1)$. A decision rule $g^\star$ is most powerful (MP) at level $\alpha$ if:

  1. $P_f(g^\star) \leq \alpha$ (size constraint), and
  2. for every other rule $g$ with $P_f(g) \leq \alpha$, $P_d(g^\star) \geq P_d(g)$.

The NP criterion is asymmetric: $\mathcal{H}_0$ is protected (its rejection rate is bounded), while $\mathcal{H}_1$ is detected as aggressively as possible under that protection. This matches applications where false alarms carry a regulatory or safety-critical cost.

Theorem: Neyman-Pearson Lemma

Let $f_0, f_1$ be densities on $\mathcal{Y}$. For every $\alpha \in (0,1)$ there exists a threshold $\eta \geq 0$ and a randomisation parameter $\gamma \in [0,1]$ such that the randomised likelihood ratio test

$$g^\star(y) = \begin{cases} 1 & \text{if } L(y) > \eta, \\ \gamma & \text{if } L(y) = \eta, \\ 0 & \text{if } L(y) < \eta, \end{cases}$$

has $P_f(g^\star) = \alpha$. This rule is most powerful at level $\alpha$: for any other test $g$ with $P_f(g) \leq \alpha$, $P_d(g^\star) \geq P_d(g)$. Moreover $g^\star$ is essentially unique: any other MP rule must agree with $g^\star$ except on $\{L(y) = \eta\}$.

Think of the observations $y$ as items you want to buy, each with a price $f_0(y)$ (its contribution to $P_f$) and a value $f_1(y)$ (its contribution to $P_d$). You have a budget $\alpha$. The greedy optimum is to buy items in decreasing order of value-per-price --- exactly $L(y) = f_1(y)/f_0(y)$ --- until the budget is spent. This is the LRT.
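A minimal sketch of this greedy construction for a finite observation alphabet (the pmfs `p0`, `p1` and the budget below are illustrative placeholders, not taken from the text):

```python
import numpy as np

def greedy_np_region(p0, p1, alpha):
    """Greedy 'knapsack' construction of the NP decision region.

    Buys outcomes in decreasing order of value-per-price
    L(y) = p1[y] / p0[y] until the false-alarm budget alpha is spent.
    Returns the outcomes decided as H1 and the achieved (Pf, Pd).
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    order = np.argsort(-(p1 / p0))       # decreasing likelihood ratio
    region, pf, pd = [], 0.0, 0.0
    for y in order:
        if pf + p0[y] > alpha:           # next item would bust the budget
            break
        region.append(int(y))
        pf += p0[y]
        pd += p1[y]
    return region, pf, pd

# Illustrative four-letter alphabet:
print(greedy_np_region(p0=[0.5, 0.3, 0.15, 0.05],
                       p1=[0.1, 0.2, 0.3, 0.4], alpha=0.25))
```

The loop generally stops with some budget left over, so the deterministic rule is slightly conservative; closing that gap exactly is what randomisation (next) is for.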


Why Randomisation?

For continuous distributions the tie set $\{f_1 = \eta f_0\}$ has measure zero and randomisation is cosmetic. For discrete observations (e.g., counting photons, bits, packets), the tie set has positive mass and a deterministic LRT cannot achieve every $\alpha \in (0,1)$: the map $\eta \mapsto P_f(\eta)$ is a staircase. Randomisation at the tie value fills the gaps, making every $\alpha$ achievable. In engineering practice one usually rounds down to the nearest achievable level $\alpha' \leq \alpha$ rather than implementing a physical coin flip.
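A sketch of the gap-filling computation on an illustrative discrete problem (a binomial pair invented for this example): locate the tie value $\eta$ and solve for $\gamma$ so that $P_f = \alpha$ exactly.

```python
import numpy as np
from scipy.stats import binom

# Illustrative problem: n coin flips, H0: p = 0.3 vs H1: p = 0.6.
n, alpha = 10, 0.10
y = np.arange(n + 1)
p0, p1 = binom.pmf(y, n, 0.3), binom.pmf(y, n, 0.6)
L = p1 / p0

# Accumulate P0 mass in decreasing order of likelihood ratio.
order = np.argsort(-L)
cum_pf = np.cumsum(p0[order])    # the deterministic 'staircase' of Pf values

# First k outcomes are decided H1 outright; the next one is the tie.
k = np.searchsorted(cum_pf, alpha, side="right")
eta = L[order[k]]                # assumes alpha below the top staircase step
gamma = (alpha - (cum_pf[k - 1] if k > 0 else 0.0)) / p0[order[k]]

pd = p1[order[:k]].sum() + gamma * p1[order[k]]
print(f"eta = {eta:.3f}, gamma = {gamma:.3f}, Pd = {pd:.3f} at Pf = {alpha}")
```

When $\alpha$ happens to sit exactly on a staircase value, the formula returns $\gamma = 0$ and the test is deterministic.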

Definition: Receiver Operating Characteristic (ROC)

For a parametric family of detectors indexed by a threshold $\eta$ (the LRT family being canonical), the receiver operating characteristic is the curve

$$\text{ROC} \;=\; \bigl\{ (P_f(\eta), P_d(\eta)) : \eta \in [0, \infty] \bigr\} \subset [0,1]^2.$$

The ROC sweeps from $(1, 1)$ at $\eta = 0$ (always decide $\mathcal{H}_1$) to $(0, 0)$ at $\eta = \infty$ (always decide $\mathcal{H}_0$). The area under the ROC curve (AUC) is a scalar summary of the detector's separability, with $\mathrm{AUC} = 1$ corresponding to perfect detection and $\mathrm{AUC} = 1/2$ to random guessing.

Theorem: Properties of the LRT ROC Curve

The ROC curve of the (possibly randomised) LRT family satisfies:

  1. Monotonicity. $P_d$ is a non-decreasing function of $P_f$.
  2. Lies above the diagonal. $P_d(P_f) \geq P_f$ for all $P_f \in [0,1]$, with equality iff $f_0 = f_1$ almost everywhere.
  3. Concavity. The function $P_f \mapsto P_d(P_f)$ is concave on $[0,1]$.
  4. Slope is the threshold. Where differentiable, $dP_d/dP_f = \eta$, the LRT threshold at that operating point.

Concavity is the geometric manifestation of optimality: if $P_d(P_f)$ had a strictly convex arc, one could time-share between the arc's endpoints to achieve a strictly higher $P_d$ at every intermediate $P_f$ --- contradicting the optimality of each individual operating point. The slope-is-threshold identity is the Lagrangian dual view: $\eta$ is the shadow price the NP optimum pays for an extra unit of false-alarm budget.
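A quick way to see the slope identity (a sketch, assuming the likelihood ratio $L(Y)$ admits a density $p_{L,j}$ under each hypothesis $\mathcal{H}_j$): both coordinates are tail probabilities of $L$,

$$P_f(\eta) = P_0(L > \eta), \qquad P_d(\eta) = P_1(L > \eta),$$

so $dP_f/d\eta = -p_{L,0}(\eta)$ and $dP_d/d\eta = -p_{L,1}(\eta)$. On the level set $\{L = \eta\}$ we have $f_1 = \eta f_0$ pointwise, which transfers to the distribution of $L$ itself: $p_{L,1}(\eta) = \eta\, p_{L,0}(\eta)$. Dividing the two derivatives gives $dP_d/dP_f = \eta$.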

ROC Curve for the Gaussian Shift Problem

Explore the ROC of the LRT for $\mathcal{H}_0: Y \sim \mathcal{N}(0,1)$ vs. $\mathcal{H}_1: Y \sim \mathcal{N}(\mu, 1)$ with $n$ i.i.d. samples. Increasing $\mu$ or $n$ pushes the curve toward the top-left corner (perfect detection).

[Interactive demo: adjustable parameters --- mean shift $\mu$ (default 1) and number of i.i.d. samples $n$ (default 1).]
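If the interactive figure is unavailable, a small sketch traces the same curves from the closed form $P_d = Q(Q^{-1}(P_f) - \sqrt{n}\,\mu)$ derived in the example below (the parameter settings are illustrative):

```python
import numpy as np
from scipy.stats import norm   # norm.sf is Q, norm.isf is Q^{-1}

def gaussian_roc(mu, n, pf):
    """ROC of the LRT for N(0,1) vs N(mu,1) from n i.i.d. samples."""
    d = np.sqrt(n) * mu                  # effective separation
    return norm.sf(norm.isf(pf) - d)     # Pd = Q(Q^{-1}(Pf) - d)

pf = np.linspace(1e-6, 1 - 1e-6, 500)
for mu, n in [(1, 1), (1, 4), (2, 1)]:   # illustrative settings
    pd = gaussian_roc(mu, n, pf)          # feed (pf, pd) to a plotting library
    print(f"mu={mu}, n={n}: Pd at Pf=0.1 is {gaussian_roc(mu, n, 0.1):.3f}")
```

Doubling $\mu$ and quadrupling $n$ give the same curve, since both enter only through $d = \sqrt{n}\,\mu$.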

Example: ROC for the Gaussian Mean Shift

For $\mathcal{H}_0: Y \sim \mathcal{N}(0,1)$ vs. $\mathcal{H}_1: Y \sim \mathcal{N}(\mu, 1)$ with $\mu > 0$, compute the ROC curve analytically and verify concavity.
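One route (a sketch of the standard computation): the LRT reduces to thresholding $Y$ itself, with $d = \mu$ (or the normalised sum with $d = \sqrt{n}\,\mu$ for $n$ i.i.d. samples). Writing $Q$ for the standard Gaussian tail and $\tau$ for the threshold,

$$P_f = Q(\tau), \qquad P_d = Q(\tau - d) \quad\Longrightarrow\quad P_d = Q\bigl(Q^{-1}(P_f) - d\bigr).$$

For concavity, differentiate: $dP_d/dP_f = \phi(\tau - d)/\phi(\tau) = e^{d\tau - d^2/2}$, the likelihood ratio at the boundary (the slope-equals-threshold property again). This is increasing in $\tau$, and $\tau$ decreases as $P_f$ grows, so the slope is non-increasing in $P_f$: the curve is concave.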

🔧 Engineering Note

AUC as an Operational Figure of Merit

Practitioners often compare detectors by AUC because it summarises performance in a single scalar without fixing a specific $\alpha$-$P_d$ trade-off. A Gaussian-shift test with separation $d = \mu$ has closed-form $\mathrm{AUC} = Q(-d/\sqrt{2}) = 1 - Q(d/\sqrt{2})$, i.e., AUC rises smoothly with $d$. Caveats: (i) AUC is insensitive to where on the curve performance actually matters --- radar/early-warning systems should care about the low-$\alpha$ region, not the whole curve; (ii) AUC is equivalent to the Mann-Whitney U statistic and inherits its interpretation as $P(L(Y_1) > L(Y_0))$, where $Y_j \sim f_j$.

Practical Constraints
  • Prefer partial AUC (up to $\alpha_{\max}$) when false alarms are costly --- see the sketch below this list

  • Report the full ROC, not just AUC, for safety-critical systems
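A computational sketch of these points (the sample sizes, seed, and separation $d = 1$ are illustrative): empirical AUC via the Mann-Whitney rank interpretation, the closed form above, and a partial AUC restricted to $P_f \leq \alpha_{\max}$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = 1.0                                   # illustrative separation
y0 = rng.normal(0.0, 1.0, 5000)           # scores under H0
y1 = rng.normal(d, 1.0, 5000)             # scores under H1

# Mann-Whitney view: AUC = P(an H1 score exceeds an independent H0 score).
auc_rank = (y1[:, None] > y0[None, :]).mean()
auc_closed = norm.sf(-d / np.sqrt(2))     # Q(-d / sqrt(2))

# Partial AUC: integrate the closed-form ROC only over Pf <= alpha_max.
alpha_max = 0.1
pf = np.linspace(0.0, alpha_max, 400)
pd = norm.sf(norm.isf(pf) - d)
pauc = float(np.sum(0.5 * (pd[1:] + pd[:-1]) * np.diff(pf)))

print(f"AUC: rank estimate = {auc_rank:.3f}, closed form = {auc_closed:.3f}")
print(f"partial AUC on [0, {alpha_max}] = {pauc:.4f}")
```

Two detectors with the same AUC can have very different partial AUCs, which is exactly why the low-$\alpha$ region deserves its own figure of merit.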

Common Mistake: A Convex Bump in a ROC

Mistake:

Accepting a ROC curve that is non-concave (has a convex bump) as representing an optimal detector family.

Correction:

A non-concave empirical ROC indicates sub-optimal operating points --- time-sharing between the endpoints of the convex region produces a strictly higher $P_d$ at every interior $P_f$. Replace the convex arc with its chord (take the upper concave envelope) to recover an optimal family; a sketch follows below. This is exactly the gap one sees in practice when plotting the ROC of a deterministic versus a randomised detector over a discrete observation space.
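A sketch of the repair (upper concave envelope via a monotone-chain hull pass; the operating points are made-up numbers with a deliberate convex bump):

```python
import numpy as np

def concave_envelope(pf, pd):
    """Indices of the upper concave envelope of an empirical ROC.

    Assumes points sorted by Pf and including (0,0) and (1,1). A point
    is dropped whenever it lies on or below the chord joining its kept
    left neighbour to the incoming point -- time-sharing beats it.
    """
    hull = []
    for i in range(len(pf)):
        while len(hull) >= 2:
            x1, y1 = pf[hull[-2]], pd[hull[-2]]
            x2, y2 = pf[hull[-1]], pd[hull[-1]]
            # cross >= 0: last kept point is on/below the new chord
            if (x2 - x1) * (pd[i] - y1) - (y2 - y1) * (pf[i] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

pf = np.array([0.0, 0.2, 0.4, 0.6, 1.0])
pd = np.array([0.0, 0.5, 0.55, 0.8, 1.0])   # (0.4, 0.55) is under the chord
keep = concave_envelope(pf, pd)
print([(pf[i], pd[i]) for i in keep])        # drops the sub-optimal point
```

On the removed arc, the repaired detector randomises between the chord's two endpoint tests, exactly the time-sharing argument above.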

Quick Check

For the Neyman-Pearson test at level $\alpha$, which statement is true?

Increasing $\alpha$ strictly increases $P_d$ (unless $P_d$ is already 1).

The NP test always achieves the smallest $P_e$.

The NP test requires knowing the priors $\pi_0, \pi_1$.

The NP test is unique even at $L(y) = \eta$.

Area under the ROC curve (AUC)

The integral $\int_0^1 P_d(P_f)\,dP_f$. It equals the probability that the LR of an $\mathcal{H}_1$ observation exceeds the LR of an independent $\mathcal{H}_0$ observation: $\mathrm{AUC} = P(L(Y_1) > L(Y_0))$. AUC lies in $[1/2, 1]$, with 1/2 for random guessing and 1 for perfect separation.

Related: ROC curve, Mann-Whitney statistic

Why This Matters: CFAR Detection and the Neyman-Pearson Framework

Modern radar receivers implement constant false-alarm rate (CFAR) detectors --- precisely Neyman-Pearson tests whose threshold is continuously adapted to maintain $P_f \leq \alpha$ despite unknown clutter statistics. The design target $\alpha$ (typically $10^{-6}$ to $10^{-4}$) is dictated by regulatory requirements: too many false alarms overload the tracker and waste radar resources. In radar, "operating point on the ROC" is a concrete engineering knob, and the ROC concavity property underpins the detection gain from integrating multiple pulses (Chapter 2).
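A minimal cell-averaging CFAR sketch (assuming the textbook CA-CFAR model: square-law detector in exponentially distributed clutter of unknown power; under that model the scale factor $T = N(\alpha^{-1/N} - 1)$ for $N$ training cells gives $P_f = \alpha$):

```python
import numpy as np

def ca_cfar(x, n_train, n_guard, alpha):
    """Cell-averaging CFAR over a 1-D range profile.

    Estimates local clutter power from n_train cells on each side of
    the cell under test (skipping n_guard guard cells) and compares the
    cell to T times that estimate, with T chosen so that Pf = alpha
    under the exponential-clutter model, whatever the clutter power.
    """
    N = 2 * n_train
    T = N * (alpha ** (-1.0 / N) - 1.0)   # scale factor for Pf = alpha
    w = n_train + n_guard
    hits = []
    for i in range(w, len(x) - w):
        left = x[i - w : i - n_guard]
        right = x[i + n_guard + 1 : i + w + 1]
        noise = (left.sum() + right.sum()) / N
        if x[i] > T * noise:
            hits.append(i)
    return hits

rng = np.random.default_rng(1)
x = rng.exponential(4.0, 1000)   # clutter power 4.0, unknown to the detector
x[500] += 80.0                   # one injected target return
print(ca_cfar(x, n_train=16, n_guard=2, alpha=1e-4))   # expect [500]
```

The threshold floats with the local clutter estimate, which is what keeps the operating point pinned to the chosen $\alpha$ as the clutter level drifts.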