The M-ary Decision Problem

From Binary to M-ary Decisions

A binary decision is a luxury afforded only to the simplest communication systems. Every modern receiver, whether it demodulates a QAM symbol, classifies a radar target among several range bins, or decodes one of $2^k$ codewords, is solving an $M$-ary hypothesis testing problem. The theory in this section generalizes the LRT machinery of Chapter 1 to the setting of $M$ competing hypotheses. The essential object is no longer a single likelihood ratio but a vector of $M$ likelihoods, and the optimal rule carves the observation space into $M$ decision regions.

Definition:

M-ary Hypothesis Testing Problem

Let $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_{M-1}$ be $M$ hypotheses with prior probabilities $\pi_0, \ldots, \pi_{M-1}$ (non-negative, summing to one). Under $\mathcal{H}_m$ the observation $\mathbf{Y} \in \mathcal{Y}$ has conditional density $f(\mathbf{y} \mid \mathcal{H}_m)$ with respect to a common reference measure. A deterministic decision rule is a measurable map
$$g : \mathcal{Y} \longrightarrow \{0, 1, \ldots, M-1\}.$$
It induces a partition $\mathcal{Y} = \bigsqcup_{m=0}^{M-1} \mathcal{R}_m$, where $\mathcal{R}_m = \{\mathbf{y} : g(\mathbf{y}) = m\}$ is the decision region for $\mathcal{H}_m$.

Unlike the binary case, where a single scalar likelihood ratio is sufficient, here the observation must be compared against all $M$ likelihoods simultaneously; detection is intrinsically multi-class.

Definition:

Probability of Error and Confusion Matrix

The conditional error probability given $\mathcal{H}_m$ is
$$P_{e \mid m} = \Pr\bigl(g(\mathbf{Y}) \neq m \mid \mathcal{H}_m\bigr) = \int_{\mathcal{Y} \setminus \mathcal{R}_m} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y}.$$
The (average) symbol error probability is
$$P_e = \sum_{m=0}^{M-1} \pi_m\, P_{e \mid m}.$$
The confusion probabilities $P(m \to j) = \int_{\mathcal{R}_j} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y}$ form the confusion matrix $\mathbf{C}$ with $C_{mj} = P(m \to j)$; its rows sum to one and $P_{e \mid m} = 1 - C_{mm}$.
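As a concrete check, here is a minimal Monte Carlo sketch that estimates the confusion matrix under MAP decisions for a scalar Gaussian setup; the means, priors, and noise level are illustrative assumptions (they match the ternary example later in this section).

```python
import numpy as np

# Minimal sketch: Monte Carlo estimate of the confusion matrix C under the
# MAP rule. Means, priors, and sigma are illustrative assumptions.
rng = np.random.default_rng(0)
means = np.array([-1.0, 0.0, 1.0])      # a_m under each hypothesis
priors = np.array([0.25, 0.5, 0.25])    # pi_m
sigma = 0.5
N = 100_000

C = np.zeros((3, 3))
for m in range(3):
    y = means[m] + sigma * rng.standard_normal(N)            # draw Y | H_m
    # log pi_j + log f(y | H_j), dropping terms common to all j
    lam = np.log(priors) - (y[:, None] - means) ** 2 / (2 * sigma**2)
    C[m] = np.bincount(lam.argmax(axis=1), minlength=3) / N  # row m: P(m -> j)

print("rows sum to one:", C.sum(axis=1))
print("P_e =", priors @ (1 - np.diag(C)))
```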

Theorem: MAP Decision Rule Minimizes Symbol Error Probability

Among all deterministic decision rules, the rule
$$g_{\text{MAP}}(\mathbf{y}) = \arg\max_{m \in \{0, \ldots, M-1\}} \pi_m f(\mathbf{y} \mid \mathcal{H}_m)$$
minimizes the average symbol error probability $P_e$. Ties may be broken arbitrarily without changing $P_e$.

Heuristically: each observation $\mathbf{y}$ receives exactly one decision, so we should commit it to the hypothesis that is most likely a posteriori. This is the multi-class generalization of the Bayes binary classifier.
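The heuristic can be made precise in one line. For any rule $g$ with regions $\{\mathcal{R}_m\}$, the probability of a correct decision satisfies
$$P_c = \sum_{m=0}^{M-1} \pi_m \int_{\mathcal{R}_m} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y} \;\le\; \int_{\mathcal{Y}} \max_{m} \pi_m f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y},$$
with equality exactly when every $\mathbf{y}$ is assigned to a hypothesis achieving the pointwise maximum, which is what $g_{\text{MAP}}$ does.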

Definition:

Maximum-Likelihood Decision Rule

When priors are equal ($\pi_m = 1/M$), or are deliberately ignored, the MAP rule reduces to the maximum-likelihood (ML) rule:
$$g_{\text{ML}}(\mathbf{y}) = \arg\max_{m} f(\mathbf{y} \mid \mathcal{H}_m).$$
Equivalently, in the log domain, $g_{\text{ML}}(\mathbf{y}) = \arg\max_m \log f(\mathbf{y} \mid \mathcal{H}_m)$.

In digital communications the transmitter encodes information bits uniformly onto constellation points, so equiprobable priors are the default assumption and the ML rule is used almost universally.

Geometry of MAP Decision Regions

The MAP decision regions are polyhedra in the observation space whenever the pairwise log-likelihood ratios are affine in $\mathbf{y}$ (the Gaussian case with known means and common covariance). Each pairwise boundary $\{\mathbf{y} : \pi_m f(\mathbf{y} \mid \mathcal{H}_m) = \pi_j f(\mathbf{y} \mid \mathcal{H}_j)\}$ is then a hyperplane. With equiprobable priors and a common white-noise variance, these hyperplanes are the perpendicular bisectors of the segments joining signal pairs $(\mathbf{x}_m, \mathbf{x}_j)$, and the MAP regions become the Voronoi cells of the constellation. We return to this geometric viewpoint in Section 3.2.
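To make the reduction explicit in the white-Gaussian case ($\mathbf{Y} = \mathbf{x}_m + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ and equiprobable priors):
$$\arg\max_m \log f(\mathbf{y} \mid \mathcal{H}_m) = \arg\max_m \left( -\frac{\|\mathbf{y} - \mathbf{x}_m\|^2}{2\sigma^2} \right) = \arg\min_m \|\mathbf{y} - \mathbf{x}_m\|^2,$$
so ML detection is minimum-distance (nearest-neighbor) detection, and $\mathcal{R}_m$ is exactly the set of points closer to $\mathbf{x}_m$ than to any other signal point: its Voronoi cell.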

Example: Ternary MAP Detection in Scalar Gaussian Noise

Let $M = 3$ with $\mathcal{H}_m : Y = a_m + Z$ for $m \in \{0, 1, 2\}$, where $a_0 = -1$, $a_1 = 0$, $a_2 = +1$, $Z \sim \mathcal{N}(0, \sigma^2)$, and priors $\pi_0 = \pi_2 = 1/4$, $\pi_1 = 1/2$. Find the MAP decision regions and the symbol error probability $P_e$.
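Solution sketch: setting $\pi_0 f(y \mid \mathcal{H}_0) = \pi_1 f(y \mid \mathcal{H}_1)$, i.e. $\tfrac{1}{4} e^{-(y+1)^2/2\sigma^2} = \tfrac{1}{2} e^{-y^2/2\sigma^2}$, gives the left boundary $y = -\tfrac{1}{2} - \sigma^2 \ln 2$; by symmetry the right boundary is $y = +\tfrac{1}{2} + \sigma^2 \ln 2$. With $t = \tfrac{1}{2} + \sigma^2 \ln 2$, the regions are $\mathcal{R}_0 = (-\infty, -t)$, $\mathcal{R}_1 = (-t, t)$, $\mathcal{R}_2 = (t, \infty)$: the more probable middle hypothesis claims a wider interval than it would under ML. Writing $Q(\cdot)$ for the standard Gaussian tail function,
$$P_e = 2 \cdot \tfrac{1}{4}\, Q\!\left(\frac{1 - t}{\sigma}\right) + \tfrac{1}{2} \cdot 2\, Q\!\left(\frac{t}{\sigma}\right) = \tfrac{1}{2}\, Q\!\left(\frac{\tfrac{1}{2} - \sigma^2 \ln 2}{\sigma}\right) + Q\!\left(\frac{\tfrac{1}{2} + \sigma^2 \ln 2}{\sigma}\right).$$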

MAP/ML Detector for M Hypotheses

Complexity: $O(M)$ likelihood evaluations per observation
Input: observation $\mathbf{y}$, priors $\{\pi_m\}$, densities $\{f(\cdot \mid \mathcal{H}_m)\}$
Output: decision $\hat{m} \in \{0, \ldots, M-1\}$
1. for $m = 0, 1, \ldots, M-1$ do
2. $\quad \lambda_m \leftarrow \log \pi_m + \log f(\mathbf{y} \mid \mathcal{H}_m)$
3. end for
4. $\hat{m} \leftarrow \arg\max_m \lambda_m$
5. return $\hat{m}$
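The algorithm transcribes directly into NumPy. Here is a minimal sketch for the scalar Gaussian case; the Gaussian likelihood and the parameter values in the usage line are assumptions (any family of log-densities could be substituted).

```python
import numpy as np

def map_detect(y, means, priors, sigma):
    """MAP decisions for scalar Gaussian hypotheses H_m: Y = means[m] + N(0, sigma^2).

    Computes lambda_m = log pi_m + log f(y | H_m) for each hypothesis
    (constants common to all m are dropped) and returns arg max_m lambda_m.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    lam = np.log(priors) - (y[:, None] - means) ** 2 / (2 * sigma**2)
    return lam.argmax(axis=1)

# Usage: the ternary constellation from the worked example above.
decisions = map_detect([-0.9, 0.1, 1.4], np.array([-1.0, 0.0, 1.0]),
                       np.array([0.25, 0.5, 0.25]), sigma=0.5)
print(decisions)  # indices in {0, 1, 2}
```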

For large $M$ (e.g. 1024-QAM, where $M = 1024$), direct enumeration becomes expensive and structural shortcuts are preferred, e.g. separable decoding of the I and Q components for square QAM (sketched below), or the sphere decoder for lattice constellations.
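As an illustration of the separable shortcut, here is a hedged sketch of ML detection for square 16-QAM with equiprobable symbols in white Gaussian noise; the amplitude levels and the test samples are assumptions.

```python
import numpy as np

# Sketch: for square QAM in white noise with equal priors, ML detection
# factors into two independent PAM slicers, costing O(2*sqrt(M)) instead of O(M).
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])   # assumed 4-PAM levels per axis

def slice_pam(u):
    """Index of the nearest PAM level for each real-valued sample."""
    return np.abs(np.asarray(u)[:, None] - LEVELS).argmin(axis=1)

def detect_16qam(y):
    """ML detection of complex samples: slice I and Q independently."""
    y = np.asarray(y)
    return slice_pam(y.real), slice_pam(y.imag)

i_idx, q_idx = detect_16qam(np.array([2.7 - 0.4j, -1.2 + 3.1j]))
```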

MAP Decision Regions for a 2D Constellation

Visualize how the MAP decision regions (Voronoi cells, shown with colored shading) change shape as the constellation geometry and noise level are varied. For equiprobable priors, the perpendicular bisectors of segments joining pairs of constellation points form the region boundaries.
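The shading can be approximated offline with a few lines of NumPy/Matplotlib; the five-point constellation below is an arbitrary illustrative choice, not the widget's.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: shade ML (= MAP for equal priors) decision regions on a grid.
pts = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1], [0, 0]], dtype=float)
xx, yy = np.meshgrid(np.linspace(-3, 3, 400), np.linspace(-3, 3, 400))
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)

# Nearest constellation point == minimum-distance (Voronoi) decision.
regions = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

plt.contourf(xx, yy, regions.reshape(xx.shape),
             levels=np.arange(len(pts) + 1) - 0.5, cmap="Pastel1")
plt.scatter(pts[:, 0], pts[:, 1], c="k")
plt.gca().set_aspect("equal")
plt.title("MAP decision regions (Voronoi cells)")
plt.show()
```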


Common Mistake: ML is Not MAP When Priors Are Unequal

Mistake:

Because digital communication systems typically use equiprobable symbols, students sometimes internalize "ML = optimal" and forget the prior correction when it matters, for example when channel coding induces strongly biased a priori bit probabilities, or when soft information is passed between decoder stages.

Correction:

The MAP rule is always the Bayes-optimal decoder, whereas ML is optimal only when priors are uniform. In modern iterative decoders (Turbo, LDPC) the "priors" fed to each constituent decoder are actually the a posteriori probabilities from the previous iteration, and ignoring them would break the iteration. Whenever the MAP-vs-ML distinction matters, look for the source of the prior imbalance.
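A quick numerical illustration, reusing the ternary example from earlier (the noise level is an assumption): with skewed priors, the measured error rate of the MAP detector should come out strictly below ML's.

```python
import numpy as np

# Sketch: compare empirical symbol error rates of ML and MAP under skewed priors.
rng = np.random.default_rng(1)
means = np.array([-1.0, 0.0, 1.0])
priors = np.array([0.25, 0.5, 0.25])
sigma = 0.8                      # assumed noise level
N = 200_000

m_true = rng.choice(3, size=N, p=priors)
y = means[m_true] + sigma * rng.standard_normal(N)
loglik = -(y[:, None] - means) ** 2 / (2 * sigma**2)   # log f(y|H_m) up to a constant

ml_hat = loglik.argmax(axis=1)                         # ignores priors
map_hat = (loglik + np.log(priors)).argmax(axis=1)     # includes log-priors
print("ML  P_e:", np.mean(ml_hat != m_true))
print("MAP P_e:", np.mean(map_hat != m_true))
```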

Why This Matters: M-ary Detection as the Symbol Demapper

In every modern coded-modulation receiver, the first step after the matched filter is symbol-by-symbol demapping: given the received vector $\mathbf{y}$, compute the log-likelihoods of the $M$ constellation points. These per-symbol log-likelihoods are then folded into per-bit LLRs that feed the outer LDPC or polar decoder. The $M$-ary hypothesis test of this chapter is exactly the demapper, and its accuracy bounds the achievable information rate of the coded system.
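As a sketch of that demapping step, here is a max-log per-bit LLR computation for Gray-mapped QPSK; the constellation, bit labeling, sign convention, and noise variance are illustrative assumptions.

```python
import numpy as np

# Max-log demapper sketch for Gray-mapped QPSK (M = 4).
CONST = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)  # assumed symbols
BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])                  # assumed labeling

def bit_llrs(y, sigma2):
    """Per-bit LLRs via the max-log approximation.

    Uses LLR_k = min_{x: b_k=1} |y-x|^2/sigma2 - min_{x: b_k=0} |y-x|^2/sigma2,
    so positive values favor b_k = 0 (one common sign convention).
    """
    y = np.atleast_1d(y)
    d2 = np.abs(y[:, None] - CONST) ** 2 / sigma2     # per-symbol metrics
    llrs = np.empty((len(y), BITS.shape[1]))
    for k in range(BITS.shape[1]):
        m0 = d2[:, BITS[:, k] == 0].min(axis=1)       # best symbol with b_k = 0
        m1 = d2[:, BITS[:, k] == 1].min(axis=1)       # best symbol with b_k = 1
        llrs[:, k] = m1 - m0
    return llrs

print(bit_llrs(np.array([0.6 + 0.8j]), sigma2=0.5))
```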

See full treatment in Composite Hypotheses and the GLRT

Decision Region

The subset $\mathcal{R}_m \subseteq \mathcal{Y}$ of the observation space that a decision rule maps to hypothesis $\mathcal{H}_m$. For the MAP rule with equiprobable priors and Gaussian noise, $\mathcal{R}_m$ is the Voronoi cell of the signal point $\mathbf{x}_m$.

Related: Voronoi Cell, MAP Decision Rule Minimizes Symbol Error Probability

Quick Check

Suppose $M = 4$ with priors $\pi_0 = 0.5$, $\pi_1 = \pi_2 = \pi_3 = 1/6$ and identical likelihood shapes but means $a_0 < a_1 < a_2 < a_3$. How do the MAP decision regions differ from the ML regions?

They are identical because the likelihood shapes are identical.

The MAP region for $\mathcal{H}_0$ is larger than its ML counterpart.

MAP always uses tighter (smaller) decision regions.

ML always minimizes $P_e$ when likelihoods are Gaussian.

Historical Note: The Bayes Decision Principle

1930s-1950s

The Bayes decision principle (assign each observation to the hypothesis of maximum posterior probability) dates back to Abraham Wald's work on statistical decision theory (1950), which generalized the Neyman-Pearson framework from binary to arbitrary action spaces. The term "MAP" itself became standard in communications after Van Trees' 1968 monograph Detection, Estimation, and Modulation Theory, which gave the engineering community a common vocabulary for what statisticians had been doing for decades.