The M-ary Decision Problem
From Binary to M-ary Decisions
A binary decision is a luxury afforded only to the simplest communication systems. Every modern receiver, whether it demodulates a QAM symbol, classifies a radar target among several range bins, or decodes one of $M$ codewords, is solving an $M$-ary hypothesis testing problem. The theory in this section generalizes the LRT machinery of Chapter 1 to the setting of $M$ competing hypotheses. The essential object is no longer a single likelihood ratio but a vector of $M$ likelihoods, and the optimal rule carves the observation space into $M$ decision regions.
Definition: M-ary Hypothesis Testing Problem
Let $\mathcal{H}_0, \mathcal{H}_1, \dots, \mathcal{H}_{M-1}$ be $M$ hypotheses with prior probabilities $\pi_0, \dots, \pi_{M-1}$ (non-negative, summing to one). Under $\mathcal{H}_i$ the observation $Y \in \mathcal{Y}$ has conditional density $p_i(y)$ with respect to a common reference measure. A deterministic decision rule is a measurable map $\delta : \mathcal{Y} \to \{0, 1, \dots, M-1\}$. It induces a partition $\mathcal{Y} = \Gamma_0 \cup \Gamma_1 \cup \cdots \cup \Gamma_{M-1}$, where $\Gamma_i = \{y : \delta(y) = i\}$ is the decision region for $\mathcal{H}_i$.
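To make the definition concrete, here is a minimal Python sketch of one instance, a ternary scalar-Gaussian problem with a simple threshold rule; the means, variance, thresholds, and priors below are illustrative assumptions, not values fixed by the text.

```python
import numpy as np
from scipy.stats import norm

# Illustrative ternary problem: priors[i] and densities[i] play the roles
# of pi_i and p_i(y) in the definition (assumed values, M = 3).
priors = np.array([0.25, 0.50, 0.25])        # non-negative, sums to one
means, sigma = np.array([-2.0, 0.0, 2.0]), 1.0
densities = [lambda y, m=m: norm.pdf(y, loc=m, scale=sigma) for m in means]

# A deterministic decision rule is a map y -> {0, ..., M-1}; this toy rule
# thresholds at +/-1, inducing the partition Gamma_0, Gamma_1, Gamma_2 of R.
def delta(y):
    return 0 if y < -1.0 else (1 if y <= 1.0 else 2)

assert np.isclose(priors.sum(), 1.0)
print(densities[1](0.0))                     # p_1(0) ~ 0.3989
print([delta(y) for y in (-3.0, 0.5, 4.0)])  # -> [0, 1, 2]
```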
Unlike the binary case, where a single scalar likelihood ratio is sufficient, here the observation must be compared against all $M$ likelihoods simultaneously: detection is intrinsically multi-class.
Definition: Probability of Error and Confusion Matrix
The conditional error probability given $\mathcal{H}_i$ is $P_{e|i} = \Pr(\delta(Y) \neq i \mid \mathcal{H}_i)$. The (average) symbol error probability is $P_e = \sum_{i=0}^{M-1} \pi_i \, P_{e|i}$. The confusion probabilities $C_{ij} = \Pr(\delta(Y) = j \mid \mathcal{H}_i)$ form the confusion matrix $C$ with $C_{ij} \ge 0$; its rows sum to one and $P_e = \sum_{i} \pi_i (1 - C_{ii})$.
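For a scalar-Gaussian problem with a threshold rule, the confusion matrix can be evaluated in closed form from Gaussian CDFs. A sketch, reusing the illustrative means, priors, and thresholds assumed above:

```python
import numpy as np
from scipy.stats import norm

# Closed-form confusion matrix for a scalar Gaussian problem with a
# threshold rule: decide 0 if y < t0, 1 if t0 <= y <= t1, else 2.
# (Means, sigma, priors, thresholds are illustrative assumptions.)
means, sigma = np.array([-2.0, 0.0, 2.0]), 1.0
priors = np.array([0.25, 0.50, 0.25])
t0, t1 = -1.0, 1.0

C = np.zeros((3, 3))
for i, m in enumerate(means):
    # C[i, j] = Pr(decide j | H_i); regions (-inf,t0), [t0,t1], (t1,inf)
    C[i, 0] = norm.cdf(t0, loc=m, scale=sigma)
    C[i, 1] = norm.cdf(t1, loc=m, scale=sigma) - norm.cdf(t0, loc=m, scale=sigma)
    C[i, 2] = 1.0 - norm.cdf(t1, loc=m, scale=sigma)

assert np.allclose(C.sum(axis=1), 1.0)      # rows sum to one
Pe = np.sum(priors * (1.0 - np.diag(C)))    # P_e = sum_i pi_i (1 - C_ii)
print(np.round(C, 4), round(Pe, 4))
```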
Theorem: MAP Decision Rule Minimizes Symbol Error Probability
Among all deterministic decision rules, the rule $\delta_{\text{MAP}}(y) = \arg\max_{0 \le i \le M-1} \pi_i \, p_i(y)$ minimizes the average symbol error probability $P_e$. Ties may be broken arbitrarily without changing $P_e$.
Heuristically: for each observation we can make exactly one mistake at a time, so we should commit to the hypothesis that is most likely a posteriori. This is the multi-class generalization of the Bayes binary classifier.
Express $1-P_e$ as an integral
By the law of total probability, $1 - P_e = \sum_{i=0}^{M-1} \pi_i \Pr(\delta(Y) = i \mid \mathcal{H}_i) = \sum_{i=0}^{M-1} \pi_i \int_{\Gamma_i} p_i(y)\,dy$. Minimizing $P_e$ is equivalent to maximizing the right-hand side over the partition $\{\Gamma_i\}_{i=0}^{M-1}$.
Interchange sum and integral
Since the $\Gamma_i$ partition $\mathcal{Y}$, for every $y$ exactly one indicator $\mathbf{1}\{y \in \Gamma_i\}$ is nonzero. Hence $\sum_{i} \pi_i \int_{\Gamma_i} p_i(y)\,dy = \int_{\mathcal{Y}} \sum_{i=0}^{M-1} \mathbf{1}\{y \in \Gamma_i\}\, \pi_i \, p_i(y)\,dy$.
Pointwise optimization
For each fixed $y$, the inner sum contains exactly one nonzero term, and its value equals $\pi_i \, p_i(y)$ for the chosen index $i = \delta(y)$. To maximize this pointwise we assign $y$ to the hypothesis with the largest product $\pi_i \, p_i(y)$.
Conclude
The rule $\delta_{\text{MAP}}(y) = \arg\max_i \pi_i \, p_i(y)$ achieves this pointwise maximum and therefore minimizes $P_e$ globally. Ties on a measure-zero set are immaterial.
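A minimal vectorized sketch of the MAP rule, computed in the log domain for numerical stability; the ternary Gaussian setup is again an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm

# MAP rule: delta(y) = argmax_i pi_i p_i(y), evaluated as
# argmax_i [log pi_i + log p_i(y)] (assumed ternary Gaussian setup).
priors = np.array([0.25, 0.50, 0.25])
means, sigma = np.array([-2.0, 0.0, 2.0]), 1.0

def map_decide(y):
    """Return the MAP index for each observation in y (vectorized)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    # shape (len(y), M): one posterior-proportional score per hypothesis
    scores = np.log(priors) + norm.logpdf(y[:, None], loc=means, scale=sigma)
    return np.argmax(scores, axis=1)

# With these priors the boundaries sit near +/-1.35, not at the midpoints.
print(map_decide([-3.0, -1.2, 0.0, 1.2, 3.0]))  # -> [0 1 1 1 2]
```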
Definition: Maximum-Likelihood Decision Rule
When priors are equal ($\pi_i = 1/M$ for all $i$), or are deliberately ignored, the MAP rule reduces to the maximum-likelihood (ML) rule: $\delta_{\text{ML}}(y) = \arg\max_{i}\, p_i(y)$. Equivalently, in the log-domain, $\delta_{\text{ML}}(y) = \arg\max_{i}\, \log p_i(y)$.
In digital communications the transmitter encodes information bits uniformly onto constellation points, so equiprobable priors are the default assumption and the ML rule is used almost universally.
Geometry of MAP Decision Regions
The MAP decision regions are polyhedra in the observation space when the log-likelihoods are linear in $y$ (the Gaussian case with known means). Each pairwise boundary $\{y : \pi_i p_i(y) = \pi_j p_j(y)\}$ is a hyperplane for Gaussian observations with common covariance. With equiprobable priors and a common white-noise variance, these hyperplanes are the perpendicular bisectors of the segments joining pairs of signal points, and the MAP regions become the Voronoi cells of the constellation. We return to this geometric viewpoint in Section 3.2.
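The nearest-neighbor reduction is easy to verify numerically: with equal priors and white noise, the MAP score $\ln \pi_i - \|y - s_i\|^2 / (2\sigma^2)$ is maximized by minimizing the Euclidean distance. A sketch with an assumed QPSK-like constellation:

```python
import numpy as np

# For equiprobable priors and Y = s_i + N with N ~ N(0, sigma^2 I),
# log pi_i + log p_i(y) = const - ||y - s_i||^2 / (2 sigma^2),
# so the MAP decision is the nearest constellation point (Voronoi cell).
# The four points and noise level below are illustrative assumptions.
rng = np.random.default_rng(0)
points = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
sigma = 0.5

y = points[2] + sigma * rng.standard_normal(2)   # transmit point 2
dist2 = np.sum((points - y) ** 2, axis=1)        # squared distances
print(np.argmin(dist2))                          # nearest neighbor = MAP -> 2
```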
Example: Ternary MAP Detection in Scalar Gaussian Noise
Let $Y = s_i + N$ under $\mathcal{H}_i$ for $i = 0, 1, 2$, with $N \sim \mathcal{N}(0, \sigma^2)$, where $s_0 = -a$, $s_1 = 0$, $s_2 = +a$ for some $a > 0$, and the priors satisfy $\pi_0 = \pi_2 < \pi_1$. Find the MAP decision regions and the symbol error probability $P_e$.
Set up the log-likelihood comparisons
The log-likelihood under $\mathcal{H}_i$ is $-\frac{(y - s_i)^2}{2\sigma^2}$ (plus a hypothesis-independent constant). Pairwise boundaries solve $\ln \pi_i - \frac{(y - s_i)^2}{2\sigma^2} = \ln \pi_j - \frac{(y - s_j)^2}{2\sigma^2}$.
Boundary between $\mathcal{H}_0$ and $\mathcal{H}_1$
Expand and cancel: the quadratic terms in $y^2$ drop, leaving the threshold $t_{01} = \frac{s_0 + s_1}{2} + \frac{\sigma^2}{s_1 - s_0} \ln\frac{\pi_0}{\pi_1}$. Plug in $s_0 = -a$, $s_1 = 0$: $t_{01} = -\frac{a}{2} + \frac{\sigma^2}{a} \ln\frac{\pi_0}{\pi_1}$, so the boundary sits below the midpoint $-a/2$ because $\pi_0 < \pi_1$.
Boundary between $\mathcal{H}_1$ and $\mathcal{H}_2$
By symmetry $t_{12} = -t_{01} = \frac{a}{2} - \frac{\sigma^2}{a} \ln\frac{\pi_0}{\pi_1}$. Note that as $\pi_0 \to \pi_1$, $t_{01} \to -a/2$ and $t_{12} \to a/2$, the equal-prior midpoints.
Decision regions and error probability
$\Gamma_0 = (-\infty, t_{01})$, $\Gamma_1 = [t_{01}, t_{12}]$, $\Gamma_2 = (t_{12}, \infty)$. Hence $P_{e|0} = \Pr(Y \ge t_{01} \mid \mathcal{H}_0) = Q\!\left(\frac{t_{01} - s_0}{\sigma}\right)$ (the $Q$-function argument is measured from the mean $s_0 = -a$), and similarly $P_{e|2} = Q\!\left(\frac{s_2 - t_{12}}{\sigma}\right) = P_{e|0}$ by symmetry. For $\mathcal{H}_1$, $P_{e|1} = \Pr(Y < t_{01} \mid \mathcal{H}_1) + \Pr(Y > t_{12} \mid \mathcal{H}_1) = 2\,Q\!\left(\frac{t_{12}}{\sigma}\right)$. Plugging in and averaging with the priors gives $P_e = 2\pi_0\, Q\!\left(\frac{t_{01} + a}{\sigma}\right) + 2\pi_1\, Q\!\left(\frac{t_{12}}{\sigma}\right)$.
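A Monte Carlo sanity check of the example, instantiated with the illustrative values $a = 2$, $\sigma = 1$, $\pi_0 = \pi_2 = 1/4$, $\pi_1 = 1/2$; these give thresholds $t_{12} = -t_{01} = 1 + \tfrac{1}{2}\ln 2 \approx 1.347$ and $P_e \approx 0.217$ from the closed form.

```python
import numpy as np
from scipy.stats import norm

# Assumed instance: s = (-2, 0, 2), sigma = 1, priors (1/4, 1/2, 1/4),
# MAP thresholds at -(1 + ln2/2) and +(1 + ln2/2).
rng = np.random.default_rng(1)
means = np.array([-2.0, 0.0, 2.0])
priors = np.array([0.25, 0.50, 0.25])
t = 1.0 + np.log(2) / 2                  # ~1.3466

n = 1_000_000
i_true = rng.choice(3, size=n, p=priors)
y = means[i_true] + rng.standard_normal(n)
i_hat = np.digitize(y, [-t, t])          # decide 0, 1, or 2 by threshold
print((i_hat != i_true).mean())          # Monte Carlo P_e, ~0.217

# Closed form for comparison: P_e = 0.5 Q(1 - ln2/2) + Q(1 + ln2/2)
print(0.5 * norm.sf(1 - np.log(2) / 2) + norm.sf(1 + np.log(2) / 2))
```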
MAP/ML Detector for M Hypotheses
Complexity: $M$ likelihood evaluations per observation. For large $M$ (e.g. 1024-QAM, where $M = 1024$), direct enumeration is infeasible and structural shortcuts are required, e.g. separable decoding of the I and Q components for square QAM, or the sphere decoder for lattice constellations.
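As a sketch of one such shortcut: for square QAM with equal priors and white noise, the minimum-distance search factors into two independent $\sqrt{M}$-level PAM detections, one per quadrature component, reducing the cost from $M$ to $2\sqrt{M}$ comparisons. The 16-QAM levels below are an illustrative assumption.

```python
import numpy as np

# Assumed square 16-QAM: per-axis PAM levels {+-1, +-3}, equal priors,
# white noise, so minimum distance decides and I/Q separate.
levels = np.array([-3.0, -1.0, 1.0, 3.0])

def detect_separable(y):
    """y: complex received sample; returns (I index, Q index)."""
    i_idx = np.argmin(np.abs(levels - y.real))
    q_idx = np.argmin(np.abs(levels - y.imag))
    return i_idx, q_idx

def detect_full(y):
    """Brute-force search over all M = 16 constellation points."""
    grid = levels[:, None] + 1j * levels[None, :]   # grid[i, q]
    return np.unravel_index(np.argmin(np.abs(grid - y)), grid.shape)

y = 0.8 - 2.3j
assert detect_separable(y) == detect_full(y)        # same decision
print(detect_separable(y))                          # -> (2, 0)
```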
MAP Decision Regions for a 2D Constellation
Visualize how the MAP decision regions (Voronoi cells, shown with colored shading) change shape as the constellation geometry and noise level are varied. For equiprobable priors, the region boundaries are the perpendicular bisectors of the segments joining pairs of constellation points.
Common Mistake: ML is Not MAP When Priors Are Unequal
Mistake:
Because digital communication systems typically use equiprobable symbols, students sometimes internalize "ML = optimal" and forget the prior correction when it matters, for example when channel coding induces strongly biased a priori bit probabilities, or when soft information is passed between decoder stages.
Correction:
The MAP rule is always the Bayes-optimal decoder, whereas ML is optimal only when priors are uniform. In modern iterative decoders (Turbo, LDPC) the "priors" fed to each constituent decoder are the extrinsic probabilities produced by the other decoder on the previous iteration, and ignoring them would break the iteration. Whenever the MAP-vs-ML distinction matters, look for a prior imbalance that came from somewhere.
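A quick numerical illustration of the pitfall, under an assumed binary antipodal setup with a 9:1 prior imbalance: the midpoint (ML) threshold is visibly suboptimal, while the MAP threshold, shifted toward the unlikely symbol, cuts the error rate by more than half.

```python
import numpy as np

# Assumed setup: s = -/+1, sigma = 1, Pr(H_0) = 0.9 (illustrative values).
rng = np.random.default_rng(2)
means, sigma = np.array([-1.0, 1.0]), 1.0
priors = np.array([0.9, 0.1])

n = 1_000_000
i_true = rng.choice(2, size=n, p=priors)
y = means[i_true] + sigma * rng.standard_normal(n)

ml_hat = (y > 0.0).astype(int)                            # midpoint threshold
t_map = (sigma**2 / 2) * np.log(priors[0] / priors[1])    # shifted toward s_1
map_hat = (y > t_map).astype(int)

print("ML  P_e:", (ml_hat != i_true).mean())   # ~0.159
print("MAP P_e:", (map_hat != i_true).mean())  # ~0.070
```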
Why This Matters: M-ary Detection as the Symbol Demapper
In every modern coded-modulation receiver, the first step after the matched filter is symbol-by-symbol demapping: given the received vector $y$, compute the log-likelihoods of all $M$ constellation points. These per-symbol log-likelihoods are then folded into per-bit LLRs that feed the outer LDPC or polar decoder. The $M$-ary hypothesis test of this chapter is exactly the demapper, and its accuracy bounds the achievable information rate of the coded system.
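A sketch of this demapping step under assumed parameters, a Gray-mapped 4-PAM alphabet with unit-variance noise, using the common max-log approximation: each bit LLR compares the best symbol carrying bit value 0 against the best carrying bit value 1.

```python
import numpy as np

# Assumed Gray-mapped 4-PAM (2 bits/symbol), white Gaussian noise,
# equal priors; the label table and sigma are illustrative choices.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray label per level
sigma = 1.0

def bit_llrs(y):
    """LLR(b_k) ~ max-log: best metric with b_k=0 minus best with b_k=1."""
    metric = -(y - levels) ** 2 / (2 * sigma**2)   # log-likelihood per symbol
    llrs = []
    for k in range(2):
        m0 = np.max(metric[bits[:, k] == 0])       # best symbol with b_k = 0
        m1 = np.max(metric[bits[:, k] == 1])       # best symbol with b_k = 1
        llrs.append(m0 - m1)
    return np.array(llrs)

# Sign convention: a positive LLR favors bit value 0.
print(np.round(bit_llrs(0.7), 3))   # -> [-1.4 -2.6]
```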
See full treatment in Composite Hypotheses and the GLRT
Decision Region
The subset $\Gamma_i = \{y : \delta(y) = i\}$ of the observation space that a decision rule $\delta$ maps to hypothesis $\mathcal{H}_i$. For the MAP rule with equiprobable priors and Gaussian noise, $\Gamma_i$ is the Voronoi cell of the signal point $s_i$.
Related: Voronoi Cell, MAP Decision Rule Minimizes Symbol Error Probability
Quick Check
Suppose $M = 3$ with unequal priors $\pi_0 > \pi_1 = \pi_2$ and identical likelihood shapes centered at distinct means $s_0, s_1, s_2$. How do the MAP decision regions differ from the ML regions?
They are identical because the likelihood shapes are identical.
The MAP region for $\mathcal{H}_1$ is larger than its ML counterpart.
MAP always uses tighter (smaller) decision regions.
ML always minimizes $P_e$ when likelihoods are Gaussian.
The larger prior $\pi_0$ shifts the boundaries outward, enlarging $\Gamma_0$.
Historical Note: The Bayes Decision Principle
1930s-1950s: The Bayes decision principle (assign each observation to the hypothesis of maximum posterior probability) dates back to Abraham Wald's work on statistical decision theory (1950), which generalized the Neyman-Pearson framework from binary to arbitrary action spaces. The term "MAP" itself became standard in communications after Van Trees' 1968 monograph Detection, Estimation, and Modulation Theory, which gave the engineering community a common vocabulary for what statisticians had been doing for decades.