The Bayes Decision Rule

When Priors Are Available

Suppose we know --- from long experience or from the system design --- that target-present events occur with prior probability $\pi_1$. Suppose further that false alarms and misses have quantifiable costs: a missed cancer diagnosis is catastrophic, a false burglar alarm is merely annoying. The Bayes framework bundles all of this side information into a single scalar criterion, the Bayes risk, and produces the decision rule that minimises it. The Bayes rule is the gold standard against which every other detector is measured.

Definition:

Bayes Risk

Let $\pi_0 = P(\mathcal{H}_0)$ and $\pi_1 = P(\mathcal{H}_1) = 1 - \pi_0$ be the prior probabilities, and let $C_{ij} \geq 0$ be the cost of deciding $\mathcal{H}_i$ when $\mathcal{H}_j$ is true. The Bayes risk of a decision rule $g$ is
$$r(g) \;=\; \sum_{i,j \in \{0,1\}} C_{ij}\,\pi_j\, P(g(Y) = i \mid \mathcal{H}_j).$$
Expanding,
$$r(g) = C_{00}\pi_0(1-P_f) + C_{10}\pi_0 P_f + C_{01}\pi_1 P_M + C_{11}\pi_1 P_d.$$
A Bayes rule is any decision rule that minimises $r(g)$.

We always assume the reasonable cost condition $C_{10} > C_{00}$ and $C_{01} > C_{11}$ --- erring is more expensive than deciding correctly under each hypothesis. Without this condition the "optimal" rule may ignore the data entirely.

Theorem: Bayes-Optimal Decision Rule

Under the reasonable cost condition, the Bayes-optimal decision rule decides $\mathcal{H}_1$ on the set
$$\mathcal{Y}_1^\star = \Bigl\{y \in \mathcal{Y} \colon\; \frac{f_1(y)}{f_0(y)} \;>\; \tau^\star \Bigr\}, \qquad \tau^\star \;=\; \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})},$$
and $\mathcal{H}_0$ on the complement. Ties (the set $\{y : f_1(y)/f_0(y) = \tau^\star\}$) may be assigned arbitrarily without affecting the risk.

Pointwise, the Bayes rule compares the posterior-weighted risk of each decision at yy and picks the cheaper one. Because posteriors are proportional to (prior Γ—\times likelihood), the comparison reduces to a likelihood-ratio test against a threshold that encodes both the priors and the cost structure.


Definition:

Maximum a Posteriori (MAP) Rule

The 0-1 cost (or uniform cost) structure is $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$, for which the Bayes risk reduces to the average error probability $P_e = \pi_0 P_f + \pi_1 P_M$. The corresponding Bayes rule is the MAP rule:
$$g_{\text{MAP}}(y) = 1 \;\;\Longleftrightarrow\;\; \pi_1 f_1(y) > \pi_0 f_0(y) \;\;\Longleftrightarrow\;\; \frac{f_1(y)}{f_0(y)} > \frac{\pi_0}{\pi_1}.$$
Equivalently, the MAP rule picks the hypothesis with the larger posterior probability $P(\mathcal{H}_i \mid Y=y)$.

Under equal priors $\pi_0 = \pi_1 = 1/2$, the MAP rule further reduces to the maximum likelihood (ML) rule: decide $\mathcal{H}_1$ iff $f_1(y) > f_0(y)$.

Example: Bayes Rule for Two Gaussians

Under $\mathcal{H}_0$, $Y \sim \mathcal{N}(-m, \sigma^2)$; under $\mathcal{H}_1$, $Y \sim \mathcal{N}(+m, \sigma^2)$. Priors are $\pi_0, \pi_1$ and costs are 0-1 (MAP rule). Derive the explicit decision threshold on $y$.
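A short worked derivation under the stated assumptions: the log-likelihood ratio is linear in $y$, so the MAP test becomes a threshold test on $y$ itself.

```latex
% Log-likelihood ratio for the two-Gaussian problem:
\ln \frac{f_1(y)}{f_0(y)}
  = \frac{(y+m)^2 - (y-m)^2}{2\sigma^2}
  = \frac{2my}{\sigma^2}.
% The MAP test f_1(y)/f_0(y) > \pi_0/\pi_1 is therefore
\frac{2my}{\sigma^2} > \ln\frac{\pi_0}{\pi_1}
\quad\Longleftrightarrow\quad
y > \gamma^\star = \frac{\sigma^2}{2m}\,\ln\frac{\pi_0}{\pi_1}.
```

Under equal priors $\gamma^\star = 0$: decide by the sign of $y$, the midpoint between the two means.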

Bayes Risk as a Function of Threshold and Prior

For the Gaussian problem of the example above (Bayes Rule for Two Gaussians), plot the Bayes risk $r(\tau)$ as a function of the threshold $\tau$ for different prior probabilities $\pi_1$. The minimiser is the Bayes threshold $\tau^\star$.

Parameters: $m = 1$ (half-separation of Gaussian means); $\sigma = 1$ (common standard deviation); $\pi_1 = 0.5$ (prior probability of $\mathcal{H}_1$); $C_{10} = 1$ (cost of a false alarm); $C_{01} = 1$ (cost of a miss).
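The risk curve can be traced numerically. A minimal sketch using the default parameters above ($m = 1$, $\sigma = 1$, $\pi_1 = 0.5$, 0-1 costs), sweeping the $y$-threshold $\gamma$ and locating the minimiser on a grid; the helper names are illustrative:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bayes_risk(gamma, m=1.0, sigma=1.0, pi1=0.5):
    """0-1 Bayes risk (error probability) of the rule 'decide H1 iff y > gamma'."""
    pi0 = 1.0 - pi1
    Pf = 1.0 - Phi((gamma + m) / sigma)   # false alarm: Y ~ N(-m, sigma^2)
    PM = Phi((gamma - m) / sigma)         # miss:        Y ~ N(+m, sigma^2)
    return pi0 * Pf + pi1 * PM

# Sweep gamma over [-3, 3] and take the grid minimiser.
gammas = [i / 1000.0 - 3.0 for i in range(6001)]
risks = [bayes_risk(g) for g in gammas]
g_star = gammas[risks.index(min(risks))]
# For equal priors the Bayes threshold is (sigma^2 / (2m)) ln(pi0/pi1) = 0,
# so g_star should land at (numerically, near) zero.
```

Re-running the sweep with unequal priors shifts the minimiser away from zero, toward the less likely hypothesis, exactly as the closed-form threshold predicts.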

Posterior Reformulation

The MAP rule admits the transparent posterior form
$$g_{\text{MAP}}(y) = \arg\max_{i \in \{0,1\}} P(\mathcal{H}_i \mid Y = y), \quad\text{where}\quad P(\mathcal{H}_i \mid Y = y) = \frac{\pi_i f_i(y)}{\pi_0 f_0(y) + \pi_1 f_1(y)}.$$
Under 0-1 costs, the Bayes-optimal strategy is to choose whichever hypothesis has the higher posterior probability given the observation. This is the most natural Bayesian statement: update beliefs via Bayes' rule, then pick the most probable hypothesis.
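The posterior form is a two-line computation. A sketch for the two-Gaussian example (assumed parameters $m = 1$, $\sigma = 1$; the function names are illustrative):

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return exp(-(y - mu) ** 2 / (2.0 * sigma ** 2)) / (sqrt(2.0 * pi) * sigma)

def posterior_h1(y, m=1.0, sigma=1.0, pi1=0.5):
    """P(H1 | Y = y) via Bayes' rule for the two-Gaussian problem."""
    pi0 = 1.0 - pi1
    num = pi1 * normal_pdf(y, +m, sigma)
    den = pi0 * normal_pdf(y, -m, sigma) + num
    return num / den

def map_decide(y, m=1.0, sigma=1.0, pi1=0.5):
    """Decide H1 iff its posterior exceeds 1/2 (binary arg-max)."""
    return 1 if posterior_h1(y, m, sigma, pi1) > 0.5 else 0
```

With equal priors the posterior crosses $1/2$ exactly at $y = 0$, so `map_decide` reproduces the sign test of the worked Gaussian example.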

Common Mistake: Tie-Breaking and Equality in the LRT

Mistake:

Worrying extensively about the convention at $L(y) = \tau^\star$.

Correction:

For Lebesgue-absolutely-continuous $f_0, f_1$, the set $\{L(y) = \tau^\star\}$ has Lebesgue measure zero, so the choice "$>$" vs. "$\geq$" in the LRT does not affect $P_f$, $P_d$, or the Bayes risk. The tie set matters only for discrete observations (Section 1.4, randomised tests).

Historical Note: Reverend Bayes and the Long Road to Decision Theory

1763-1950s

Thomas Bayes (1701-1761), an English Presbyterian minister and amateur mathematician, wrote the manuscript that would become "An Essay towards Solving a Problem in the Doctrine of Chances" --- posthumously published in 1763 by his friend Richard Price. The essay contained what we now call Bayes' theorem, though only in a special case.

For nearly two centuries, Bayesian reasoning was marginalised by frequentists who viewed priors as unscientific. The rehabilitation began in the 1940s-1950s with the decision-theoretic foundations laid by Abraham Wald (1902-1950), who introduced loss functions and risk minimisation as the unifying language of statistics. It was Wald who showed that under mild regularity, every admissible decision rule is a Bayes rule for some prior --- a striking vindication of the Bayesian viewpoint from a purely frequentist optimality criterion.

Quick Check

Under 0-1 costs with priors $\pi_0 = 0.8$, $\pi_1 = 0.2$, the MAP rule decides $\mathcal{H}_1$ iff $L(y)$ exceeds which threshold?

$\tau^\star = 0.25$

$\tau^\star = 4$

$\tau^\star = 1$

$\tau^\star = 0.8$

Posterior probability

Given observation $y$, the posterior probability of $\mathcal{H}_i$ is $P(\mathcal{H}_i \mid Y=y) = \pi_i f_i(y) / \sum_j \pi_j f_j(y)$. It represents the updated belief about the state of nature after seeing $y$, and is the quantity the MAP rule maximises.

Related: prior probability, Bayes-Optimal Decision Rule, Maximum a Posteriori (MAP) Rule

Bayes vs. Neyman-Pearson Formulations

| Aspect | Bayesian | Neyman-Pearson |
|---|---|---|
| Requires priors | Yes ($\pi_0, \pi_1$) | No |
| Requires cost matrix | Yes ($C_{ij}$) | No |
| Criterion | Minimise Bayes risk $r(g)$ | Maximise $P_d$ s.t. $P_f \leq \alpha$ |
| Optimal rule | LRT with $\tau^\star = \pi_0(C_{10}{-}C_{00})/[\pi_1(C_{01}{-}C_{11})]$ | LRT with $\tau$ s.t. $P_f(\tau) = \alpha$ |
| Interpretation | Expected cost | Worst-case false alarm |
| Typical domain | Communications, ML, medical | Radar, hypothesis screening |