Exercises
ex-fsi-ch01-01
Easy. Under $\mathcal H_0$, $Y \sim \mathcal N(0,\sigma^2)$; under $\mathcal H_1$, $Y \sim \mathcal N(\mu,\sigma^2)$ with $\mu > 0$. For the detector that decides $\mathcal H_1$ iff $y > \tau$, compute $P_{\mathrm{FA}}$, $P_{\mathrm{D}}$, and $P_{\mathrm{M}}$.
Write $P_{\mathrm{FA}} = \Pr(Y > \tau \mid \mathcal H_0)$ with $Y/\sigma \sim \mathcal N(0,1)$ under $\mathcal H_0$.
Use the Q-function: $Q(x) = \Pr(Z > x)$ for $Z \sim \mathcal N(0,1)$.
False alarm
Under $\mathcal H_0$, $Y \sim \mathcal N(0,\sigma^2)$, so $Y/\sigma \sim \mathcal N(0,1)$:
$$P_{\mathrm{FA}} = \Pr(Y > \tau \mid \mathcal H_0) = Q\!\left(\frac{\tau}{\sigma}\right).$$
Detection
Under $\mathcal H_1$, $Y \sim \mathcal N(\mu,\sigma^2)$, so $(Y-\mu)/\sigma \sim \mathcal N(0,1)$:
$$P_{\mathrm{D}} = \Pr(Y > \tau \mid \mathcal H_1) = Q\!\left(\frac{\tau - \mu}{\sigma}\right).$$
Miss
$P_{\mathrm{M}} = 1 - P_{\mathrm{D}} = 1 - Q\!\left(\frac{\tau-\mu}{\sigma}\right) = \Phi\!\left(\frac{\tau-\mu}{\sigma}\right)$.
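A quick numerical sketch of the three formulas, using `scipy.stats.norm.sf` for the Q-function. The parameter values $\mu = 2$, $\sigma = 1$, $\tau = 1$ are illustrative placeholders, not part of the exercise statement.

```python
# Sketch: P_FA, P_D, P_M for the one-sided detector "decide H1 iff y > tau".
from scipy.stats import norm

mu, sigma, tau = 2.0, 1.0, 1.0   # assumed values, for illustration only

Q = norm.sf                       # Q(x) = P(Z > x), Z ~ N(0,1)
p_fa = Q(tau / sigma)             # Q(tau/sigma)
p_d = Q((tau - mu) / sigma)       # Q((tau - mu)/sigma)
p_m = 1.0 - p_d                   # Phi((tau - mu)/sigma)

print(f"P_FA = {p_fa:.4f}, P_D = {p_d:.4f}, P_M = {p_m:.4f}")
```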
ex-fsi-ch01-02
Easy. Compute the likelihood ratio for the exponential problem $\mathcal H_0: Y \sim \mathrm{Exp}(\lambda_0)$ vs. $\mathcal H_1: Y \sim \mathrm{Exp}(\lambda_1)$ with $\lambda_0 > \lambda_1$. Show that the LRT is equivalent to a threshold test on $y$.
Use the exponential density $p_i(y) = \lambda_i e^{-\lambda_i y}$ for $y \ge 0$.
The LRT inequality is $L(y) \gtrless \eta$. Isolate $y$.
LR
$$L(y) = \frac{p_1(y)}{p_0(y)} = \frac{\lambda_1}{\lambda_0}\, e^{(\lambda_0 - \lambda_1)\, y}.$$
Reduce to threshold on $y$
$L(y) \ge \eta$ iff $e^{(\lambda_0-\lambda_1)y} \ge \eta\,\lambda_0/\lambda_1$. Since $\lambda_0 - \lambda_1 > 0$, take logarithms and divide by $\lambda_0 - \lambda_1$:
$$y \;\gtrless\; \tau := \frac{\ln(\eta\,\lambda_0/\lambda_1)}{\lambda_0 - \lambda_1},$$
deciding $\mathcal H_1$ for $y \ge \tau$. The LRT is a one-sided threshold test on $y$ --- larger $y$ favours the smaller-rate hypothesis $\mathcal H_1$.
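A minimal check of this equivalence, assuming illustrative rates $\lambda_0 = 2 > \lambda_1 = 1$ and an arbitrary LRT level $\eta$: the LRT and the derived threshold test on $y$ make identical decisions sample by sample.

```python
# Sketch: verify L(y) >= eta  <=>  y >= tau for the exponential pair.
import numpy as np

lam0, lam1, eta = 2.0, 1.0, 1.5          # illustrative values (lam0 > lam1)
tau = np.log(eta * lam0 / lam1) / (lam0 - lam1)

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0 / lam0, size=100_000)  # any samples would do

L = (lam1 / lam0) * np.exp((lam0 - lam1) * y)        # likelihood ratio
assert np.array_equal(L >= eta, y >= tau)            # identical decisions
print("LRT and threshold test agree on all samples; tau =", tau)
```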
ex-fsi-ch01-03
Easy. For equal priors and 0-1 costs with $p_0 = \mathcal N(0,\sigma^2)$ and $p_1 = \mathcal N(\mu,\sigma^2)$, compute the MAP error probability $P_e$.
By symmetry the MAP rule is a threshold test at $\mu/2$.
$P_e = \frac12 P_{\mathrm{FA}} + \frac12 P_{\mathrm{M}}$, and by symmetry $P_{\mathrm{FA}} = P_{\mathrm{M}}$.
MAP rule
Symmetry: $p_1(y) > p_0(y)$ iff $y > \mu/2$, so the MAP rule is: decide $\mathcal H_1$ iff $y > \mu/2$.
Error probability
$P_{\mathrm{FA}} = \Pr(Y > \mu/2 \mid \mathcal H_0) = Q\!\left(\frac{\mu}{2\sigma}\right)$. By symmetry $P_{\mathrm{M}} = Q\!\left(\frac{\mu}{2\sigma}\right)$, so $P_e = Q\!\left(\frac{\mu}{2\sigma}\right)$.
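A Monte Carlo cross-check of $P_e = Q(\mu/2\sigma)$ under equal priors; the values $\mu = 2$, $\sigma = 1$ are illustrative.

```python
# Sketch: simulate the MAP rule (threshold mu/2) and compare to Q(mu/(2*sigma)).
import numpy as np
from scipy.stats import norm

mu, sigma, n = 2.0, 1.0, 1_000_000        # illustrative values
rng = np.random.default_rng(1)

h = rng.integers(0, 2, size=n)            # true hypothesis, equal priors
y = rng.normal(loc=h * mu, scale=sigma)   # N(0,s^2) under H0, N(mu,s^2) under H1
h_hat = (y > mu / 2).astype(int)          # MAP rule

print("MC estimate :", np.mean(h_hat != h))
print("Q(mu/2sigma):", norm.sf(mu / (2 * sigma)))
```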
ex-fsi-ch01-04
Easy. Show that the Bhattacharyya coefficient $\rho = \int \sqrt{p_0(y)\,p_1(y)}\,dy$ satisfies $0 \le \rho \le 1$, with $\rho = 1$ iff $p_0 = p_1$ a.e.
Apply Cauchy-Schwarz to $\int \sqrt{p_0}\,\sqrt{p_1}\,dy$.
Equality in Cauchy-Schwarz happens when the integrands are proportional.
Upper bound via Cauchy-Schwarz
$$\rho = \int \sqrt{p_0}\,\sqrt{p_1}\,dy \;\le\; \left(\int p_0\,dy\right)^{1/2}\left(\int p_1\,dy\right)^{1/2} = 1.$$
Equality condition
Equality in Cauchy-Schwarz requires $\sqrt{p_1} = c\,\sqrt{p_0}$ a.e., i.e., $p_1 = c^2 p_0$ a.e. Integrating, $1 = c^2$, so $c = 1$ (densities are non-negative) and $p_0 = p_1$ a.e.
Lower bound
$\sqrt{p_0\,p_1} \ge 0$, so $\rho \ge 0$, with equality iff $p_0\,p_1 = 0$ a.e. (disjoint supports).
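A quadrature check of the bounds, with illustrative Gaussian pairs: $\rho < 1$ for distinct densities and $\rho = 1$ when the two densities coincide.

```python
# Sketch: Bhattacharyya coefficient by quadrature; rho <= 1, with equality iff p0 == p1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bhatt(p, q):
    return quad(lambda y: np.sqrt(p.pdf(y) * q.pdf(y)), -30, 30)[0]

print(bhatt(norm(0, 1), norm(2, 1)))   # < 1 for distinct densities
print(bhatt(norm(0, 1), norm(0, 1)))   # = 1 when p0 = p1
```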
ex-fsi-ch01-05
Easy. For BPSK over AWGN, $Y = s + N$ with equiprobable $s \in \{+A, -A\}$ and $N \sim \mathcal N(0,\sigma^2)$. Derive the MAP decision rule and compute $P_e$.
MAP rule
Equal priors, 0-1 costs, so MAP coincides with ML; the conditional densities are $\mathcal N(\pm A, \sigma^2)$, means $\pm A$. Threshold at $0$ by symmetry; decide $s = +A$ iff $y > 0$.
Error probability
$P_e = \Pr(Y < 0 \mid s = +A) = Q\!\left(\frac{A}{\sigma}\right)$. This is the classical BPSK error formula.
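The same kind of Monte Carlo check for BPSK, sweeping a few illustrative values of $A/\sigma$: the empirical bit-error rate should track $Q(A/\sigma)$.

```python
# Sketch: BPSK over AWGN -- empirical BER vs the formula Q(A/sigma).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, n = 1.0, 1_000_000
for A in (0.5, 1.0, 2.0):                     # illustrative amplitudes
    s = rng.choice((-A, A), size=n)           # equiprobable symbols
    y = s + rng.normal(scale=sigma, size=n)
    ber = np.mean(np.sign(y) != np.sign(s))   # decide by the sign of y
    print(f"A={A}: MC BER={ber:.5f}, Q(A/sigma)={norm.sf(A / sigma):.5f}")
```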
ex-fsi-ch01-06
Medium. For $p_0 = \mathrm{Unif}[0,1]$ vs. $p_1 = \mathrm{Unif}[0,\tfrac12]$, derive the Neyman-Pearson test at level $\alpha$ and compute its power $\beta$.
Compute the LR on each region, $[0,\tfrac12]$ and $(\tfrac12, 1]$.
The LR is a step function; the NP test is thus a simple region test.
Likelihood ratio
$p_0(y) = 1$ on $[0,1]$, $0$ elsewhere; $p_1(y) = 2$ on $[0,\tfrac12]$, $0$ elsewhere. Therefore
$$L(y) = \begin{cases} 2, & y \in [0,\tfrac12], \\ 0, & y \in (\tfrac12, 1]. \end{cases}$$
Two-level LRT
For $\eta = 0$: decide $\mathcal H_1$ everywhere in $[0,1]$, so $P_{\mathrm{FA}} = 1$. For $\eta \in (0,2]$: decide $\mathcal H_1$ only on $[0,\tfrac12]$, giving $P_{\mathrm{FA}} = \tfrac12$, $P_{\mathrm{D}} = 1$. To achieve $\alpha < \tfrac12$, randomise at $\eta = 2$: decide $\mathcal H_1$ on $\{L > 2\} = \emptyset$ with probability 1 and on $\{L = 2\} = [0,\tfrac12]$ with probability $\gamma = 2\alpha$, so $P_{\mathrm{FA}} = \gamma \cdot \tfrac12 = \alpha$.
Power
$\beta = \gamma \cdot \Pr\bigl(Y \in [0,\tfrac12] \mid \mathcal H_1\bigr) = 2\alpha$ for $\alpha \le \tfrac12$ (and $\beta = 1$ for $\alpha \ge \tfrac12$).
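A simulation of the randomised NP test for this uniform pair, confirming the size $\alpha$ and the power $\beta = 2\alpha$; the level $\alpha = 0.2$ is illustrative.

```python
# Sketch: randomised NP test for Unif[0,1] vs Unif[0,1/2].
# Decide H1 on {L = 2} = [0, 1/2] with probability gamma = 2*alpha.
import numpy as np

alpha = 0.2                                 # illustrative level (< 1/2)
gamma = 2 * alpha
rng = np.random.default_rng(3)
n = 1_000_000

def decide_h1(y):
    return (y <= 0.5) & (rng.random(y.size) < gamma)

y0 = rng.uniform(0.0, 1.0, size=n)          # samples under H0
y1 = rng.uniform(0.0, 0.5, size=n)          # samples under H1
print("size  P_FA ~", np.mean(decide_h1(y0)), " target", alpha)
print("power P_D  ~", np.mean(decide_h1(y1)), " target", 2 * alpha)
```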
ex-fsi-ch01-07
Medium. Let $Y_1,\dots,Y_n$ be i.i.d. Bernoulli($p$) under $\mathcal H_0$ with $p = p_0$, and under $\mathcal H_1$ with $p = p_1 > p_0$. Derive the LRT and show the sufficient statistic is $k = \sum_{i=1}^n y_i$.
The joint pmf is $p^k (1-p)^{n-k}$ where $k = \sum_i y_i$.
Take logs and keep terms depending on $k$.
LLR
$$\ln L(y) = k \ln\frac{p_1}{p_0} + (n-k)\ln\frac{1-p_1}{1-p_0} = k \ln\frac{p_1(1-p_0)}{p_0(1-p_1)} + n \ln\frac{1-p_1}{1-p_0}.$$
Sufficient statistic
The coefficient of $k$ is $\ln\frac{p_1(1-p_0)}{p_0(1-p_1)} > 0$ (since $p_1 > p_0$). Thus the LLR is strictly increasing in $k$, so the LRT reduces to $k \gtrless \tau$; $k$ is the sufficient statistic: the decision depends on the data only through the count of ones.
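A short sketch that tabulates the LLR as a function of the count $k$ for illustrative $p_0 = 0.3$, $p_1 = 0.6$, $n = 10$, confirming it is affine and increasing in $k$, so the LRT is a count threshold.

```python
# Sketch: Bernoulli LLR is affine and increasing in k = number of ones.
import numpy as np

p0, p1, n = 0.3, 0.6, 10                    # illustrative values, p1 > p0
k = np.arange(n + 1)
llr = k * np.log(p1 / p0) + (n - k) * np.log((1 - p1) / (1 - p0))

slope = np.log(p1 * (1 - p0) / (p0 * (1 - p1)))   # coefficient of k
assert np.allclose(np.diff(llr), slope) and slope > 0
print("LLR(k):", np.round(llr, 3))
```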
ex-fsi-ch01-08
Medium. Prove that the derivatives of the Chernoff exponent function equal Kullback-Leibler divergences at the endpoints: $\mu'(0) = -D(p_0\|p_1)$ and $\mu'(1) = D(p_1\|p_0)$. (Signs depend on the sign convention; establish yours carefully.)
Use $\mu(s) = \ln \int p_0^{1-s} p_1^s \, dy$ and differentiate under the integral sign.
At $s = 0$, $\int p_0^{1-s} p_1^s \, dy = \int p_0 \, dy = 1$. Compute $\mu'(0)$.
Derivative at $s=0$
Differentiating under the integral sign, $\mu'(s) = \dfrac{\int p_0^{1-s} p_1^s \ln(p_1/p_0)\,dy}{\int p_0^{1-s} p_1^s\,dy}$. At $s = 0$, the denominator is $1$ and the numerator is $\int p_0 \ln(p_1/p_0)\,dy = -D(p_0\|p_1)$, so $\mu'(0) = -D(p_0\|p_1)$.
Derivative at $s=1$
Using the same formula at $s = 1$, the denominator is again $1$ and the numerator is $\int p_1 \ln(p_1/p_0)\,dy = D(p_1\|p_0)$, so $\mu'(1) = D(p_1\|p_0)$.
Interpretation
The exponent $E(s) = -\mu(s)$ starts at zero with positive slope $D(p_0\|p_1)$, rises to a peak $C(p_0,p_1)$, and returns to zero at $s = 1$ with negative slope $-D(p_1\|p_0)$. The two KL divergences are the Stein exponents of the one-sided errors in the NP framework.
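A numerical check of the endpoint slopes for an illustrative Gaussian pair, computing $\mu(s)$ by quadrature and comparing one-sided finite differences at $s = 0, 1$ against $\mp D(\cdot\|\cdot)$; the pair $\mathcal N(0,1)$ vs. $\mathcal N(1,2)$ is chosen so the two divergences differ.

```python
# Sketch: mu'(0) = -D(p0||p1) and mu'(1) = +D(p1||p0), checked by quadrature
# for the illustrative pair p0 = N(0,1), p1 = N(1,2) (variance 2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p0 = norm(loc=0.0, scale=1.0)
p1 = norm(loc=1.0, scale=np.sqrt(2.0))

def mu(s):
    f = lambda y: p0.pdf(y) ** (1 - s) * p1.pdf(y) ** s
    return np.log(quad(f, -30, 30)[0])

def kl(p, q):  # D(p||q) by quadrature
    f = lambda y: p.pdf(y) * (p.logpdf(y) - q.logpdf(y))
    return quad(f, -30, 30)[0]

h = 1e-5                                   # one-sided finite-difference step
print("mu'(0) ~", mu(h) / h, "   -D(p0||p1) =", -kl(p0, p1))
print("mu'(1) ~", -mu(1 - h) / h, "   +D(p1||p0) =", kl(p1, p0))
```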
ex-fsi-ch01-09
Medium. Consider the Gaussian pair $p_0 = \mathcal N(0,\sigma_0^2)$, $p_1 = \mathcal N(0,\sigma_1^2)$ with $\sigma_1^2 > \sigma_0^2$ (variance shift). Derive the LRT sufficient statistic and show it is not a threshold test on $y$.
Compute $\ln L(y)$ and look at its dependence on $y$.
You should find a dependence on $y^2$.
LLR
$$\ln L(y) = \ln\frac{\sigma_0}{\sigma_1} + \frac12\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) y^2.$$
Sufficient statistic
Because $\sigma_1^2 > \sigma_0^2$, the coefficient of $y^2$ is positive, so the LLR is monotone increasing in $y^2$. The sufficient statistic is therefore $y^2$, and the LRT is a two-sided test $|y| \gtrless \tau$. A one-sided threshold on $y$ would not be optimal: both very large positive and very large negative $y$ are evidence for $\mathcal H_1$.
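A two-line check that the LLR is symmetric in $y$, so no one-sided threshold on $y$ can reproduce it; $\sigma_0 = 1$, $\sigma_1 = 2$ are illustrative.

```python
# Sketch: variance-shift LLR depends on y only through y^2.
import numpy as np

s0, s1 = 1.0, 2.0                      # sigma0 < sigma1, illustrative
llr = lambda y: np.log(s0 / s1) + 0.5 * (1 / s0**2 - 1 / s1**2) * y**2

y = np.linspace(-4, 4, 9)
assert np.allclose(llr(y), llr(-y))    # symmetric: llr(y) == llr(-y)
print(np.round(llr(y), 3))
```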
ex-fsi-ch01-10
Medium. A radar system must achieve $P_{\mathrm{FA}} \le \alpha$ while maximising $P_{\mathrm{D}}$. The observation is $n$ i.i.d. samples, $Y_i \sim \mathcal N(0,\sigma^2)$ under $\mathcal H_0$ vs. $Y_i \sim \mathcal N(\mu,\sigma^2)$ under $\mathcal H_1$. What is the minimum $\mu$ required to achieve $P_{\mathrm{D}} \ge \beta$?
NP test is a threshold on $\bar y = \frac1n\sum_i y_i$. Under $\mathcal H_0$, $\bar Y \sim \mathcal N(0, \sigma^2/n)$.
Set $P_{\mathrm{FA}} = \alpha$ and $P_{\mathrm{D}} = \beta$.
Threshold from size constraint
$P_{\mathrm{FA}} = Q\!\left(\frac{\tau\sqrt n}{\sigma}\right) = \alpha$, so $\tau = \frac{\sigma}{\sqrt n}\, Q^{-1}(\alpha)$. With this $\tau$, $P_{\mathrm{D}} = Q\!\left(\frac{(\tau - \mu)\sqrt n}{\sigma}\right)$.
Power constraint
$P_{\mathrm{D}} \ge \beta$ requires $\frac{(\tau - \mu)\sqrt n}{\sigma} \le Q^{-1}(\beta)$. Thus $\mu \ge \tau - \frac{\sigma}{\sqrt n}\, Q^{-1}(\beta)$ and
$$\mu_{\min} = \frac{\sigma}{\sqrt n}\left(Q^{-1}(\alpha) - Q^{-1}(\beta)\right).$$
Interpretation
We need a mean shift of $\mu_{\min}$ per sample (i.e., $\bigl(Q^{-1}(\alpha) - Q^{-1}(\beta)\bigr)/\sqrt n$ standard deviations) to achieve the target operating point with $n$ samples. The detection SNR $\mu\sqrt n/\sigma$ grows as $\sqrt n$, so the required per-sample shift shrinks as $1/\sqrt n$.
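Evaluating $\mu_{\min}$ with `scipy.stats.norm.isf` (the inverse Q-function); the operating point $\alpha = 10^{-6}$, $\beta = 0.99$ and the values $n = 25$, $\sigma = 1$ are illustrative.

```python
# Sketch: minimum per-sample mean shift for a target (alpha, beta) with n samples.
import numpy as np
from scipy.stats import norm

alpha, beta = 1e-6, 0.99          # illustrative operating point
sigma, n = 1.0, 25                # illustrative noise level and sample count

Qinv = norm.isf                   # inverse of Q(x) = P(Z > x)
mu_min = sigma / np.sqrt(n) * (Qinv(alpha) - Qinv(beta))
print(f"Q^-1(alpha) = {Qinv(alpha):.3f}, Q^-1(beta) = {Qinv(beta):.3f}")
print(f"mu_min = {mu_min:.4f}  (per-sample shift, in units of sigma)")
```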
ex-fsi-ch01-11
Medium. Compute the Bhattacharyya coefficient $\rho$ between $\mathrm{Exp}(\lambda_0)$ and $\mathrm{Exp}(\lambda_1)$, and show that $\rho \le 1$.
Compute $\sqrt{p_0(y)\,p_1(y)}$ and integrate from $0$ to $\infty$.
The integrand is proportional to an exponential density.
Geometric mean
$\sqrt{p_0(y)\,p_1(y)} = \sqrt{\lambda_0 \lambda_1}\; e^{-(\lambda_0+\lambda_1)y/2}$ for $y \ge 0$.
Integrate
$$\rho = \sqrt{\lambda_0\lambda_1} \int_0^\infty e^{-(\lambda_0+\lambda_1)y/2}\,dy = \frac{2\sqrt{\lambda_0\lambda_1}}{\lambda_0+\lambda_1}.$$
Check
When $\lambda_0 = \lambda_1$, $\rho = 1$ (correct, since $p_0 = p_1$). The ratio $\frac{\sqrt{\lambda_0\lambda_1}}{(\lambda_0+\lambda_1)/2}$ is the geometric-to-arithmetic mean ratio of the rates, always $\le 1$ by AM-GM, with equality iff $\lambda_0 = \lambda_1$.
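A quadrature check of the closed form, with illustrative rates chosen so that $\rho = 0.8$ exactly.

```python
# Sketch: Bhattacharyya coefficient of Exp(lam0) vs Exp(lam1), quadrature vs closed form.
import numpy as np
from scipy.integrate import quad

lam0, lam1 = 2.0, 0.5                            # illustrative rates
f = lambda y: np.sqrt(lam0 * np.exp(-lam0 * y) * lam1 * np.exp(-lam1 * y))
rho_quad = quad(f, 0, np.inf)[0]
rho_closed = 2 * np.sqrt(lam0 * lam1) / (lam0 + lam1)
print(rho_quad, rho_closed)                      # both 0.8 here
```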
ex-fsi-ch01-12
Medium. Show that for $n$ i.i.d. observations the Chernoff bound becomes $P_e \le \frac12 e^{-n\,C(p_0,p_1)}$, where the Chernoff information $C(p_0,p_1)$ is computed for a single sample. Interpret the exponent.
Use independence: $p_j(y_1,\dots,y_n) = \prod_{i=1}^n p_j(y_i)$.
The Chernoff-bound integral factors over samples.
Factor the Chernoff integrand
$p_0(y^n)^{1-s}\, p_1(y^n)^s = \prod_{i=1}^n p_0(y_i)^{1-s}\, p_1(y_i)^s$.
Integrate
By Fubini,
$$\int p_0(y^n)^{1-s}\, p_1(y^n)^s \, dy^n = \left(\int p_0(y)^{1-s}\, p_1(y)^s \, dy\right)^{\!n} = e^{n\,\mu(s)}.$$
Thus $P_e \le \frac12 \min_{s\in[0,1]} e^{n\,\mu(s)} = \frac12\, e^{-n\,C(p_0,p_1)}$.
Interpretation
Every sample contributes $C(p_0,p_1)$ nats to the error exponent. The optimal tilt $s^\star$ is independent of $n$, but the maximised exponent multiplies $n$. This is the key large-deviations phenomenon: exponential error decay with rate equal to the Chernoff information.
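A small numeric illustration of the $e^{-nC}$ scaling for the equal-variance Gaussian pair, where the MAP error is exactly $Q(\sqrt n\,\mu/2\sigma)$ and $C = \mu^2/8\sigma^2$; the values are illustrative.

```python
# Sketch: exact MAP error P_e(n) = Q(sqrt(n)*mu/(2*sigma)) vs the Chernoff bound
# 0.5*exp(-n*C) with C = mu^2/(8*sigma^2), for N(0,s^2) vs N(mu,s^2).
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 1.0                                  # illustrative values
C = mu**2 / (8 * sigma**2)
for n in (1, 10, 50, 100):
    pe = norm.sf(np.sqrt(n) * mu / (2 * sigma))       # exact MAP error
    bound = 0.5 * np.exp(-n * C)
    print(f"n={n:4d}: P_e={pe:.3e} <= bound={bound:.3e}, "
          f"-ln(P_e)/n={-np.log(pe)/n:.4f} -> C={C}")
```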
ex-fsi-ch01-13
Hard. Prove that the ROC curve $\alpha \mapsto \beta(\alpha)$ traced out by the LRT family is concave without using the slope identity. (Use only time-sharing and the Neyman-Pearson lemma.)
Take two operating points $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ on the ROC.
Construct a randomised detector that time-shares between the two LRTs.
Apply NP at the intermediate level $\bar\alpha = \lambda\alpha_1 + (1-\lambda)\alpha_2$.
Time-sharing construction
Let $\phi_i$ be the LRT at level $\alpha_i$, with power $\beta_i = \beta(\alpha_i)$, $i = 1, 2$. Define the randomised detector $\phi_\lambda$ that runs $\phi_1$ with probability $\lambda$ and $\phi_2$ with probability $1-\lambda$, independently of $Y$.
Compute $P_{\mathrm{FA}}, P_{\mathrm{D}}$ of the mixture
By the law of total probability conditioned on the randomisation, $P_{\mathrm{FA}}(\phi_\lambda) = \lambda\alpha_1 + (1-\lambda)\alpha_2$ and $P_{\mathrm{D}}(\phi_\lambda) = \lambda\beta_1 + (1-\lambda)\beta_2$.
Apply the NP lemma
Let $\bar\alpha = \lambda\alpha_1 + (1-\lambda)\alpha_2$. The LRT at level $\bar\alpha$ has power $\beta(\bar\alpha)$, and by NP this is at least the power of any rule with size $\le \bar\alpha$, including $\phi_\lambda$. Therefore
$$\beta\bigl(\lambda\alpha_1 + (1-\lambda)\alpha_2\bigr) \;\ge\; \lambda\,\beta(\alpha_1) + (1-\lambda)\,\beta(\alpha_2).$$
This is the concavity inequality.
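A quick numeric instance of the inequality on the Gaussian mean-shift ROC $\beta(\alpha) = Q\bigl(Q^{-1}(\alpha) - d\bigr)$; the deflection $d = 1$ and the operating points are illustrative.

```python
# Sketch: concavity check on the Gaussian ROC beta(a) = Q(Q^{-1}(a) - d).
import numpy as np
from scipy.stats import norm

d = 1.0                                       # illustrative deflection mu/sigma
beta = lambda a: norm.sf(norm.isf(a) - d)

a1, a2, lam = 0.01, 0.3, 0.4                  # illustrative operating points
mix_a = lam * a1 + (1 - lam) * a2
print(beta(mix_a), ">=", lam * beta(a1) + (1 - lam) * beta(a2))
```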
ex-fsi-ch01-14
Hard. Let $p_0, p_1$ be densities on $\mathcal Y$ and let $T = t(Y)$ be any real-valued statistic. Prove the data-processing inequality for binary testing: the error exponent based on $T$ alone cannot exceed the exponent based on the full observation $Y$:
$$C(q_0, q_1) \;\le\; C(p_0, p_1),$$
where $q_j$ is the law of $T$ under $\mathcal H_j$.
Show that for any $s \in (0,1)$, $\int q_0^{1-s} q_1^s \, dt \ge \int p_0^{1-s} p_1^s \, dy$.
Condition on $T$ and use Jensen's inequality on the jointly concave function $(a,b) \mapsto a^{1-s} b^s$.
Conditional densities
Let $r_j(y \mid t)$ be the conditional densities of $Y$ given $T = t$ under $\mathcal H_j$. Then $p_j(y) = q_j\bigl(t(y)\bigr)\, r_j\bigl(y \mid t(y)\bigr)$.
Apply Hölder
For $s \in (0,1)$:
$$\int p_0^{1-s} p_1^s \, dy = \int q_0(t)^{1-s} q_1(t)^s \left(\int r_0(y \mid t)^{1-s}\, r_1(y \mid t)^s \, dy\right) dt.$$
By Hölder (or Jensen, since $(a,b) \mapsto a^{1-s} b^s$ is concave), $\int r_0^{1-s} r_1^s \, dy \le 1$ (with equality iff $r_0 = r_1$ a.e.). Hence $\int p_0^{1-s} p_1^s \, dy \le \int q_0^{1-s} q_1^s \, dt$.
Take logarithms
Therefore $-\ln \int q_0^{1-s} q_1^s \, dt \le -\ln \int p_0^{1-s} p_1^s \, dy$ for every $s \in (0,1)$. Maximising over $s$: $C(q_0, q_1) \le C(p_0, p_1)$.
When is equality achieved?
Equality holds iff $r_0(\cdot \mid t) = r_1(\cdot \mid t)$ a.e. for a.e. $t$, i.e., the statistic is sufficient. This is another manifestation of sufficiency: a sufficient statistic preserves the full Chernoff information; any non-sufficient compression strictly loses exponent.
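A numerical instance of the inequality: compress $Y$ to the one-bit statistic $T = \mathbf 1\{Y > 1/2\}$ for the pair $\mathcal N(0,1)$ vs. $\mathcal N(1,1)$, where the continuous Chernoff information is $1/8$ in closed form; the binary one must come out smaller. The cutoff $1/2$ and the pair are illustrative.

```python
# Sketch: data-processing for Chernoff information, Y ~ N(0,1) vs N(1,1),
# compressed to the one-bit statistic T = 1{Y > 0.5}.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Laws of T under H0 and H1 (probabilities of T = 1).
q0, q1 = norm.sf(0.5), norm.sf(0.5 - 1.0)

def mu_binary(s):  # mu(s) for the binary pair (minimise over s in (0,1))
    return np.log(q0**(1 - s) * q1**s + (1 - q0)**(1 - s) * (1 - q1)**s)

res = minimize_scalar(mu_binary, bounds=(1e-6, 1 - 1e-6), method="bounded")
C_T = -res.fun
C_Y = 1.0**2 / 8.0     # closed form for N(0,1) vs N(1,1): (mu1-mu0)^2/(8 sigma^2)
print(f"C_T = {C_T:.4f} <= C_Y = {C_Y:.4f}")
assert C_T <= C_Y
```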
ex-fsi-ch01-15
Hard. For two Gaussians with different means and variances, $p_0 = \mathcal N(\mu_0, \sigma_0^2)$, $p_1 = \mathcal N(\mu_1, \sigma_1^2)$, derive $\mu(s)$ and express the Chernoff information as a transcendental equation.
$p_0^{1-s} p_1^s$ is proportional to a Gaussian density; complete the square.
The integrating constant gives $e^{\mu(s)}$.
Exponent algebra
$p_0^{1-s} p_1^s$ contains the exponent $-\frac{(1-s)(y-\mu_0)^2}{2\sigma_0^2} - \frac{s(y-\mu_1)^2}{2\sigma_1^2}$ and the log-prefactor $-\frac{1-s}{2}\ln(2\pi\sigma_0^2) - \frac{s}{2}\ln(2\pi\sigma_1^2)$.
Completing the square
Let $\frac{1}{\sigma_s^2} = \frac{1-s}{\sigma_0^2} + \frac{s}{\sigma_1^2}$ (effective precision). Define $m_s = \sigma_s^2\left(\frac{(1-s)\mu_0}{\sigma_0^2} + \frac{s\,\mu_1}{\sigma_1^2}\right)$. Completing the square, the exponent is $-\frac{(y - m_s)^2}{2\sigma_s^2} + c(s)$ with $c(s)$ independent of $y$.
Compute $\mu(s)$
Integrating the resulting Gaussian yields the closed form. After simplification,
$$\mu(s) = -\,\frac{s(1-s)\,(\mu_1-\mu_0)^2}{2\bigl(s\sigma_0^2 + (1-s)\sigma_1^2\bigr)} \;-\; \frac12 \ln \frac{s\sigma_0^2 + (1-s)\sigma_1^2}{\sigma_0^{2s}\,\sigma_1^{2(1-s)}}.$$
Chernoff information
$C(p_0,p_1) = -\mu(s^\star)$, where $s^\star$ solves $\mu'(s) = 0$, a transcendental equation in $s$. For $\sigma_0 = \sigma_1 = \sigma$ it reduces to $s^\star = \tfrac12$ and $C = \frac{(\mu_1-\mu_0)^2}{8\sigma^2}$, recovering the example Chernoff Information for Two Gaussians. For $\mu_0 = \mu_1$ (pure variance shift), the first term vanishes and $C$ is a logarithmic function of the variance ratio.
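Solving the transcendental equation numerically for an illustrative unequal-variance pair, by maximising $-\mu(s)$ on $(0,1)$ with the closed form above.

```python
# Sketch: Chernoff information for N(mu0, v0) vs N(mu1, v1) via the closed form
# of mu(s), maximising -mu(s) over s in (0,1). Parameter values are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

mu0, v0 = 0.0, 1.0        # H0: mean, variance
mu1, v1 = 1.0, 4.0        # H1: mean, variance

def mu_s(s):
    vbar = s * v0 + (1 - s) * v1
    return (-(s * (1 - s) * (mu1 - mu0)**2) / (2 * vbar)
            - 0.5 * np.log(vbar / (v0**s * v1**(1 - s))))

res = minimize_scalar(mu_s, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"s* = {res.x:.4f}, C = {-res.fun:.4f}")
```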
ex-fsi-ch01-16
Challenge. Stein's lemma. Show that for $n$ i.i.d. observations, under the Neyman-Pearson criterion with $P_{\mathrm{FA}} \le \alpha$ fixed, the optimal miss probability decays as
$$\frac{1}{n}\ln P_M^\star \;\longrightarrow\; -\,D(p_0\|p_1).$$
Sketch the proof via the LLR and the weak law of large numbers.
Under $\mathcal H_0$, $\Lambda_n = \frac1n \sum_{i=1}^n \ln\frac{p_1(Y_i)}{p_0(Y_i)} \to -D(p_0\|p_1)$ a.s.
For the converse, use the information-theoretic inequality $d\bigl(\Pr_0(\text{decide }\mathcal H_0) \,\big\|\, \Pr_1(\text{decide }\mathcal H_0)\bigr) \le n\,D(p_0\|p_1)$.
Achievability (upper bound on $P_M^\star$)
The NP test at level $\alpha$ uses a threshold $\tau_n$ on $\Lambda_n$ chosen so that $P_{\mathrm{FA}} = \alpha$. Under $\mathcal H_0$, $\Lambda_n \to -D(p_0\|p_1)$ a.s., so $\tau_n \to -D(p_0\|p_1)$. Under $\mathcal H_1$ the LLR concentrates at $+D(p_1\|p_0)$, so $P_M^\star = \Pr_1(\Lambda_n \le \tau_n) \to 0$ with exponential rate $D(p_0\|p_1)$ (large deviations of $\Lambda_n$ under $\mathcal H_1$ toward $-D(p_0\|p_1)$).
Converse (lower bound on $P_M^\star$)
For any rule with size $P_{\mathrm{FA}} \le \alpha$ and miss probability $P_M$, the data-processing inequality for KL gives
$$d\bigl(1-\alpha \,\big\|\, P_M\bigr) \;\le\; D\!\left(p_0^{\otimes n} \,\middle\|\, p_1^{\otimes n}\right) = n\,D(p_0\|p_1),$$
where the left side is the binary KL divergence of the two Bernoulli laws induced on $\{\text{decide }\mathcal H_0,\ \text{decide }\mathcal H_1\}$. Solving for $P_M$ (using $d(p\|q) \ge p \ln\frac1q - \ln 2$): $P_M \ge \exp\!\left(-\frac{n\,D(p_0\|p_1) + \ln 2}{1-\alpha}\right)$. Combining with the achievability yields Stein's exponent.
Comment
Stein's lemma says: in the asymmetric (NP) framework, the one-sided error exponents are KL divergences. The symmetric (Bayes) counterpart is Chernoff information. Both theorems tie detection theory to information-theoretic divergences.
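For the Gaussian mean-shift pair the NP miss probability is available in closed form, $P_M(n) = \Phi\bigl(Q^{-1}(\alpha) - \sqrt n\,\mu/\sigma\bigr)$, so Stein's exponent can be checked directly; here $D(p_0\|p_1) = \mu^2/2\sigma^2$ and the values $\mu = 1$, $\sigma = 1$, $\alpha = 0.05$ are illustrative.

```python
# Sketch: Stein's exponent for N(0,1) vs N(mu,1); exact NP miss probability
# P_M(n) = Phi(Q^{-1}(alpha) - sqrt(n)*mu), compared to D(p0||p1) = mu^2/2.
import numpy as np
from scipy.stats import norm

mu, alpha = 1.0, 0.05                    # illustrative shift and level
D = mu**2 / 2
for n in (10, 100, 1000):
    log_pm = norm.logcdf(norm.isf(alpha) - np.sqrt(n) * mu)  # ln P_M(n)
    print(f"n={n:5d}: -ln(P_M)/n = {-log_pm / n:.4f}  (D = {D})")
```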
ex-fsi-ch01-17
Challenge. Minimax detection. Consider a binary test where the priors are unknown. The minimax rule chooses the decision rule $\delta$ to minimise $\max\{P_{\mathrm{FA}}(\delta), P_{\mathrm{M}}(\delta)\}$. Show that the minimax rule is the LRT whose threshold equalises the two error probabilities, and describe how to compute that threshold numerically.
The minimax criterion is equivalent to Bayes risk with the least-favourable prior.
Parametrise the prior $\pi_1 \in (0,1)$ and the corresponding LRT threshold $\eta(\pi_1) = (1-\pi_1)/\pi_1$; find $\pi_1^\star$ such that $P_{\mathrm{FA}} = P_{\mathrm{M}}$.
Minimax-Bayes duality
For any prior $\pi_1 = \Pr(\mathcal H_1)$ and 0-1 costs, the Bayes rule is the LRT with threshold $\eta = (1-\pi_1)/\pi_1$. Its risk is $r(\pi_1) = (1-\pi_1)\,P_{\mathrm{FA}}(\pi_1) + \pi_1\,P_{\mathrm{M}}(\pi_1)$. The Bayes risk $r$ is a concave function of $\pi_1$ (as the minimum of affine functions). Let $\pi_1^\star$ maximise $r$; this prior is least favourable.
Equalisation at $\pi_1^\star$
At the maximiser, $r'(\pi_1^\star) = 0$; by the envelope theorem $r'(\pi_1) = P_{\mathrm{M}}(\pi_1) - P_{\mathrm{FA}}(\pi_1)$, so $P_{\mathrm{FA}}(\pi_1^\star) = P_{\mathrm{M}}(\pi_1^\star)$.
Minimax optimality
By the minimax theorem, the LRT at the least-favourable prior minimises the worst-case risk over all priors. For any rule, $\max\{P_{\mathrm{FA}}, P_{\mathrm{M}}\} \ge (1-\pi_1)P_{\mathrm{FA}} + \pi_1 P_{\mathrm{M}} \ge r(\pi_1)$ at every prior (the maximum dominates any convex combination, and $r$ is the minimal Bayes risk); in particular $\max\{P_{\mathrm{FA}}, P_{\mathrm{M}}\} \ge r(\pi_1^\star)$. The equalising LRT achieves $P_{\mathrm{FA}} = P_{\mathrm{M}} = r(\pi_1^\star)$, hence it is minimax.
Numerical procedure
- Parametrise the LRT by its threshold $\eta \in (0,\infty)$.
- Compute $P_{\mathrm{FA}}(\eta)$ and $P_{\mathrm{M}}(\eta)$ (both monotone in $\eta$ but in opposite directions: $P_{\mathrm{FA}}$ decreases, $P_{\mathrm{M}}$ increases).
- Solve $P_{\mathrm{FA}}(\eta) = P_{\mathrm{M}}(\eta)$ by bisection or Newton's method on the increasing function $g(\eta) = P_{\mathrm{M}}(\eta) - P_{\mathrm{FA}}(\eta)$; a sketch follows the list.
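A minimal implementation of this procedure for the Gaussian mean-shift pair, where the equalising threshold is known to be $y = \mu/2$ (so the answer can be verified); `scipy.optimize.brentq` does the root-finding, and the values $\mu = 2$, $\sigma = 1$ are illustrative.

```python
# Sketch: equalise P_FA and P_M by root-finding on g(t) = P_M(t) - P_FA(t),
# for N(0, sigma^2) vs N(mu, sigma^2); the known answer is t = mu/2.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

mu, sigma = 2.0, 1.0                          # illustrative values

p_fa = lambda t: norm.sf(t / sigma)           # decreasing in t
p_m  = lambda t: norm.cdf((t - mu) / sigma)   # increasing in t
g    = lambda t: p_m(t) - p_fa(t)             # increasing; root = minimax threshold

t_star = brentq(g, -10 * sigma, mu + 10 * sigma)
print(f"t* = {t_star:.6f} (expect mu/2 = {mu/2}), "
      f"P_FA = P_M = {p_fa(t_star):.4f}")
```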