The Bayes Decision Rule

When Priors Are Available

Suppose we know --- from long experience or from the system design --- that target-present events occur with prior probability $\pi_1$. Suppose further that false alarms and misses have quantifiable costs: a missed cancer diagnosis is catastrophic, a false burglar alarm is merely annoying. The Bayes framework bundles all of this side information into a single scalar criterion, the Bayes risk, and produces the decision rule that minimises it. The Bayes rule is the gold standard against which every other detector is measured.

Definition:

Bayes Risk

Let $\pi_0 = P(\mathcal{H}_0)$ and $\pi_1 = P(\mathcal{H}_1) = 1 - \pi_0$ be the prior probabilities, and let $C_{ij} \geq 0$ be the cost of deciding $\mathcal{H}_i$ when $\mathcal{H}_j$ is true. The Bayes risk of a decision rule $g$ is
$$r(g) \;=\; \sum_{i,j \in \{0,1\}} C_{ij}\,\pi_j\, P(g(Y) = i \mid \mathcal{H}_j).$$
Expanding,
$$r(g) = C_{00}\pi_0(1-P_f) + C_{10}\pi_0 P_f + C_{01}\pi_1 P_M + C_{11}\pi_1 P_d.$$
A Bayes rule is any decision rule that minimises $r(g)$.

We always assume the reasonable cost condition $C_{10} > C_{00}$ and $C_{01} > C_{11}$ --- erring is more expensive than deciding correctly under each hypothesis. Without this condition the "optimal" rule may ignore the data entirely.

Theorem: Bayes-Optimal Decision Rule

Under the reasonable cost condition, the Bayes-optimal decision rule decides $\mathcal{H}_1$ on the set
$$\mathcal{Y}_1^\star = \Bigl\{y \in \mathcal{Y} \colon\; \frac{f_1(y)}{f_0(y)} \;>\; \tau^\star \Bigr\}, \qquad \tau^\star \;=\; \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})},$$
and $\mathcal{H}_0$ on the complement. Ties (the set $\{y : f_1(y)/f_0(y) = \tau^\star\}$) may be assigned arbitrarily without affecting the risk.

Pointwise, the Bayes rule compares the posterior-weighted risk of each decision at yy and picks the cheaper one. Because posteriors are proportional to (prior Γ—\times likelihood), the comparison reduces to a likelihood-ratio test against a threshold that encodes both the priors and the cost structure.


Definition:

Maximum a Posteriori (MAP) Rule

The 0-1 cost (or uniform cost) structure is $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$, for which the Bayes risk reduces to the average error probability $P_e = \pi_0 P_f + \pi_1 P_M$. The corresponding Bayes rule is the MAP rule:
$$g_{\text{MAP}}(y) = 1 \;\;\Longleftrightarrow\;\; \pi_1 f_1(y) > \pi_0 f_0(y) \;\;\Longleftrightarrow\;\; \frac{f_1(y)}{f_0(y)} > \frac{\pi_0}{\pi_1}.$$
Equivalently, the MAP rule picks the hypothesis with the larger posterior probability $P(\mathcal{H}_i \mid Y=y)$.

Under equal priors $\pi_0 = \pi_1 = 1/2$, the MAP rule further reduces to the maximum likelihood (ML) rule: decide $\mathcal{H}_1$ iff $f_1(y) > f_0(y)$.

Example: Bayes Rule for Two Gaussians

Under $\mathcal{H}_0$, $Y \sim \mathcal{N}(-m, \sigma^2)$; under $\mathcal{H}_1$, $Y \sim \mathcal{N}(+m, \sigma^2)$. Priors are $\pi_0, \pi_1$ and costs are 0-1 (MAP rule). Derive the explicit decision threshold on $y$.
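A short worked derivation under the stated assumptions: the log-likelihood ratio is linear in $y$, so the MAP test becomes a threshold test on $y$ itself.

```latex
% Log-likelihood ratio for the two-Gaussian problem:
\ln \frac{f_1(y)}{f_0(y)}
  = \frac{(y+m)^2 - (y-m)^2}{2\sigma^2}
  = \frac{2my}{\sigma^2}.
% The MAP test f_1(y)/f_0(y) > \pi_0/\pi_1 is therefore
\frac{2my}{\sigma^2} > \ln\frac{\pi_0}{\pi_1}
\quad\Longleftrightarrow\quad
y > \gamma^\star = \frac{\sigma^2}{2m}\,\ln\frac{\pi_0}{\pi_1}.
```

Under equal priors $\gamma^\star = 0$: decide by the sign of $y$, the midpoint between the two means.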

Bayes Risk as a Function of Threshold and Prior

For the Gaussian problem of the example above (Bayes Rule for Two Gaussians), plot the Bayes risk $r(\tau)$ as a function of the threshold $\tau$ for different prior probabilities $\pi_1$. The minimiser is the Bayes threshold $\tau^\star$.

Parameters: $m = 1$ (half-separation of Gaussian means); $\sigma = 1$ (common standard deviation); $\pi_1 = 0.5$ (prior probability of $\mathcal{H}_1$); $C_{10} = 1$ (cost of a false alarm); $C_{01} = 1$ (cost of a miss).
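The risk curve can be traced numerically. A minimal sketch using the default parameters above ($m = 1$, $\sigma = 1$, $\pi_1 = 0.5$, 0-1 costs), sweeping the $y$-threshold $\gamma$ and locating the minimiser on a grid; the helper names are illustrative:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bayes_risk(gamma, m=1.0, sigma=1.0, pi1=0.5):
    """0-1 Bayes risk (error probability) of the rule 'decide H1 iff y > gamma'."""
    pi0 = 1.0 - pi1
    Pf = 1.0 - Phi((gamma + m) / sigma)   # false alarm: Y ~ N(-m, sigma^2)
    PM = Phi((gamma - m) / sigma)         # miss:        Y ~ N(+m, sigma^2)
    return pi0 * Pf + pi1 * PM

# Sweep gamma over [-3, 3] and take the grid minimiser.
gammas = [i / 1000.0 - 3.0 for i in range(6001)]
risks = [bayes_risk(g) for g in gammas]
g_star = gammas[risks.index(min(risks))]
# For equal priors the Bayes threshold is (sigma^2 / (2m)) ln(pi0/pi1) = 0,
# so g_star should land at (numerically, near) zero.
```

Re-running the sweep with unequal priors shifts the minimiser away from zero, toward the less likely hypothesis, exactly as the closed-form threshold predicts.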

Posterior Reformulation

The MAP rule admits the transparent posterior form
$$g_{\text{MAP}}(y) = \arg\max_{i \in \{0,1\}} P(\mathcal{H}_i \mid Y = y), \quad\text{where}\quad P(\mathcal{H}_i \mid Y = y) = \frac{\pi_i f_i(y)}{\pi_0 f_0(y) + \pi_1 f_1(y)}.$$
Under 0-1 costs, the Bayes-optimal strategy is to choose whichever hypothesis has the higher posterior probability given the observation. This is the most natural Bayesian statement: update beliefs via Bayes' rule, then pick the most probable hypothesis.
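The posterior form is a two-line computation. A sketch for the two-Gaussian example (assumed parameters $m = 1$, $\sigma = 1$; the function names are illustrative):

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return exp(-(y - mu) ** 2 / (2.0 * sigma ** 2)) / (sqrt(2.0 * pi) * sigma)

def posterior_h1(y, m=1.0, sigma=1.0, pi1=0.5):
    """P(H1 | Y = y) via Bayes' rule for the two-Gaussian problem."""
    pi0 = 1.0 - pi1
    num = pi1 * normal_pdf(y, +m, sigma)
    den = pi0 * normal_pdf(y, -m, sigma) + num
    return num / den

def map_decide(y, m=1.0, sigma=1.0, pi1=0.5):
    """Decide H1 iff its posterior exceeds 1/2 (binary arg-max)."""
    return 1 if posterior_h1(y, m, sigma, pi1) > 0.5 else 0
```

With equal priors the posterior crosses $1/2$ exactly at $y = 0$, so `map_decide` reproduces the sign test of the worked Gaussian example.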

Common Mistake: Tie-Breaking and Equality in the LRT

Mistake:

Worrying extensively about the convention at $L(y) = \tau^\star$.

Correction:

For Lebesgue-absolutely-continuous $f_0, f_1$, the set $\{L(y) = \tau^\star\}$ has Lebesgue measure zero, so the choice "$>$" vs. "$\geq$" in the LRT does not affect $P_f$, $P_d$, or the Bayes risk. The tie set matters only for discrete observations (Section 1.4, randomised tests).

Historical Note: Reverend Bayes and the Long Road to Decision Theory

1763-1950s

Thomas Bayes (1701-1761), an English Presbyterian minister and amateur mathematician, wrote the manuscript that would become "An Essay towards Solving a Problem in the Doctrine of Chances" --- posthumously published in 1763 by his friend Richard Price. The essay contained what we now call Bayes' theorem, though only in a special case.

For nearly two centuries, Bayesian reasoning was marginalised by frequentists who viewed priors as unscientific. The rehabilitation began in the 1940s-1950s with the decision-theoretic foundations laid by Abraham Wald (1902-1950), who introduced loss functions and risk minimisation as the unifying language of statistics. It was Wald who showed that under mild regularity, every admissible decision rule is a Bayes rule for some prior --- a striking vindication of the Bayesian viewpoint from a purely frequentist optimality criterion.

Quick Check

Under 0-1 costs with priors $\pi_0 = 0.8$, $\pi_1 = 0.2$, the MAP rule decides $\mathcal{H}_1$ iff $L(y)$ exceeds which threshold?

$\tau^\star = 0.25$

$\tau^\star = 4$

$\tau^\star = 1$

$\tau^\star = 0.8$

Posterior probability

Given observation $y$, the posterior probability of $\mathcal{H}_i$ is $P(\mathcal{H}_i \mid Y=y) = \pi_i f_i(y) / \sum_j \pi_j f_j(y)$. It represents the updated belief about the state of nature after seeing $y$, and is the quantity the MAP rule maximises.

Related: prior probability, Bayes-Optimal Decision Rule, Maximum a Posteriori (MAP) Rule

Bayes vs. Neyman-Pearson Formulations

| Aspect | Bayesian | Neyman-Pearson |
|---|---|---|
| Requires priors | Yes ($\pi_0, \pi_1$) | No |
| Requires cost matrix | Yes ($C_{ij}$) | No |
| Criterion | Minimise Bayes risk $r(g)$ | Maximise $P_d$ s.t. $P_f \leq \alpha$ |
| Optimal rule | LRT with $\tau^\star = \pi_0(C_{10}{-}C_{00})/[\pi_1(C_{01}{-}C_{11})]$ | LRT with $\tau$ s.t. $P_f(\tau) = \alpha$ |
| Interpretation | Expected cost | Worst-case false alarm |
| Typical domain | Communications, ML, medical | Radar, hypothesis screening |