The M-ary Decision Problem

From Binary to M-ary Decisions

A binary decision is a luxury afforded only to the simplest communication systems. Every modern receiver, whether it demodulates a QAM symbol, classifies a radar target among several range bins, or decodes one of $2^k$ codewords, is solving an $M$-ary hypothesis testing problem. The theory in this section generalizes the LRT machinery of Chapter 1 to the setting of $M$ competing hypotheses. The essential object is no longer a single likelihood ratio but a vector of $M$ likelihoods, and the optimal rule carves the observation space into $M$ decision regions.

Definition:

M-ary Hypothesis Testing Problem

Let $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_{M-1}$ be $M$ hypotheses with prior probabilities $\pi_0, \ldots, \pi_{M-1}$ (non-negative, summing to one). Under $\mathcal{H}_m$ the observation $\mathbf{Y} \in \mathcal{Y}$ has conditional density $f(\mathbf{y} \mid \mathcal{H}_m)$ with respect to a common reference measure. A deterministic decision rule is a measurable map
$$g : \mathcal{Y} \longrightarrow \{0, 1, \ldots, M-1\}.$$
It induces a partition $\mathcal{Y} = \bigsqcup_{m=0}^{M-1} \mathcal{R}_m$, where $\mathcal{R}_m = \{\mathbf{y} : g(\mathbf{y}) = m\}$ is the decision region for $\mathcal{H}_m$.

Unlike the binary case, where a single scalar likelihood ratio is sufficient, here the observation must be compared against all $M$ likelihoods simultaneously; detection is intrinsically multi-class.

Definition:

Probability of Error and Confusion Matrix

The conditional error probability given $\mathcal{H}_m$ is
$$P_{e \mid m} = \Pr\bigl(g(\mathbf{Y}) \neq m \mid \mathcal{H}_m\bigr) = \int_{\mathcal{Y} \setminus \mathcal{R}_m} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y}.$$
The (average) symbol error probability is
$$P_e = \sum_{m=0}^{M-1} \pi_m\, P_{e \mid m}.$$
The confusion probabilities $P(m \to j) = \int_{\mathcal{R}_j} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y}$ form the confusion matrix $\mathbf{C}$ with $C_{mj} = P(m \to j)$; its rows sum to one and $P_{e \mid m} = 1 - C_{mm}$.
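As a concrete check, here is a minimal Monte Carlo sketch that estimates the confusion matrix under MAP decisions for a scalar Gaussian setup; the means, priors, and noise level are illustrative assumptions (they match the ternary example later in this section).

```python
import numpy as np

# Minimal sketch: Monte Carlo estimate of the confusion matrix C under the
# MAP rule. Means, priors, and sigma are illustrative assumptions.
rng = np.random.default_rng(0)
means = np.array([-1.0, 0.0, 1.0])      # a_m under each hypothesis
priors = np.array([0.25, 0.5, 0.25])    # pi_m
sigma = 0.5
N = 100_000

C = np.zeros((3, 3))
for m in range(3):
    y = means[m] + sigma * rng.standard_normal(N)            # draw Y | H_m
    # log pi_j + log f(y | H_j), dropping terms common to all j
    lam = np.log(priors) - (y[:, None] - means) ** 2 / (2 * sigma**2)
    C[m] = np.bincount(lam.argmax(axis=1), minlength=3) / N  # row m: P(m -> j)

print("rows sum to one:", C.sum(axis=1))
print("P_e =", priors @ (1 - np.diag(C)))
```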

Theorem: MAP Decision Rule Minimizes Symbol Error Probability

Among all deterministic decision rules, the rule
$$g_{\text{MAP}}(\mathbf{y}) = \arg\max_{m \in \{0, \ldots, M-1\}} \pi_m f(\mathbf{y} \mid \mathcal{H}_m)$$
minimizes the average symbol error probability $P_e$. Ties may be broken arbitrarily without changing $P_e$.

Heuristically: each observation $\mathbf{y}$ receives exactly one decision, so we should commit it to the hypothesis that is most likely a posteriori. This is the multi-class generalization of the Bayes binary classifier.
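The heuristic can be made precise in one line. For any rule $g$ with regions $\{\mathcal{R}_m\}$, the probability of a correct decision satisfies
$$P_c = \sum_{m=0}^{M-1} \pi_m \int_{\mathcal{R}_m} f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y} \;\le\; \int_{\mathcal{Y}} \max_{m} \pi_m f(\mathbf{y} \mid \mathcal{H}_m)\,d\mathbf{y},$$
with equality exactly when every $\mathbf{y}$ is assigned to a hypothesis achieving the pointwise maximum, which is what $g_{\text{MAP}}$ does.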

Definition:

Maximum-Likelihood Decision Rule

When priors are equal ($\pi_m = 1/M$), or are deliberately ignored, the MAP rule reduces to the maximum-likelihood (ML) rule:
$$g_{\text{ML}}(\mathbf{y}) = \arg\max_{m} f(\mathbf{y} \mid \mathcal{H}_m).$$
Equivalently, in the log domain, $g_{\text{ML}}(\mathbf{y}) = \arg\max_m \log f(\mathbf{y} \mid \mathcal{H}_m)$.

In digital communications the transmitter encodes information bits uniformly onto constellation points, so equiprobable priors are the default assumption and the ML rule is used almost universally.

Geometry of MAP Decision Regions

The MAP decision regions are polyhedra in the observation space whenever the pairwise log-likelihood ratios are affine in $\mathbf{y}$ (the Gaussian case with known means and common covariance). Each pairwise boundary $\{\mathbf{y} : \pi_m f(\mathbf{y} \mid \mathcal{H}_m) = \pi_j f(\mathbf{y} \mid \mathcal{H}_j)\}$ is then a hyperplane. With equiprobable priors and a common white-noise variance, these hyperplanes are the perpendicular bisectors of the segments joining signal pairs $(\mathbf{x}_m, \mathbf{x}_j)$, and the MAP regions become the Voronoi cells of the constellation. We return to this geometric viewpoint in Section 3.2.
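To make the reduction explicit in the white-Gaussian case ($\mathbf{Y} = \mathbf{x}_m + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ and equiprobable priors):
$$\arg\max_m \log f(\mathbf{y} \mid \mathcal{H}_m) = \arg\max_m \left( -\frac{\|\mathbf{y} - \mathbf{x}_m\|^2}{2\sigma^2} \right) = \arg\min_m \|\mathbf{y} - \mathbf{x}_m\|^2,$$
so ML detection is minimum-distance (nearest-neighbor) detection, and $\mathcal{R}_m$ is exactly the set of points closer to $\mathbf{x}_m$ than to any other signal point: its Voronoi cell.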

Example: Ternary MAP Detection in Scalar Gaussian Noise

Let $M = 3$ with $\mathcal{H}_m : Y = a_m + Z$ for $m \in \{0, 1, 2\}$, where $a_0 = -1$, $a_1 = 0$, $a_2 = +1$, $Z \sim \mathcal{N}(0, \sigma^2)$, and priors $\pi_0 = \pi_2 = 1/4$, $\pi_1 = 1/2$. Find the MAP decision regions and the symbol error probability $P_e$.
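Solution sketch: setting $\pi_0 f(y \mid \mathcal{H}_0) = \pi_1 f(y \mid \mathcal{H}_1)$, i.e. $\tfrac{1}{4} e^{-(y+1)^2/2\sigma^2} = \tfrac{1}{2} e^{-y^2/2\sigma^2}$, gives the left boundary $y = -\tfrac{1}{2} - \sigma^2 \ln 2$; by symmetry the right boundary is $y = +\tfrac{1}{2} + \sigma^2 \ln 2$. With $t = \tfrac{1}{2} + \sigma^2 \ln 2$, the regions are $\mathcal{R}_0 = (-\infty, -t)$, $\mathcal{R}_1 = (-t, t)$, $\mathcal{R}_2 = (t, \infty)$: the more probable middle hypothesis claims a wider interval than it would under ML. Writing $Q(\cdot)$ for the standard Gaussian tail function,
$$P_e = 2 \cdot \tfrac{1}{4}\, Q\!\left(\frac{1 - t}{\sigma}\right) + \tfrac{1}{2} \cdot 2\, Q\!\left(\frac{t}{\sigma}\right) = \tfrac{1}{2}\, Q\!\left(\frac{\tfrac{1}{2} - \sigma^2 \ln 2}{\sigma}\right) + Q\!\left(\frac{\tfrac{1}{2} + \sigma^2 \ln 2}{\sigma}\right).$$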

MAP/ML Detector for M Hypotheses

Complexity: $O(M)$ likelihood evaluations per observation
Input: observation $\mathbf{y}$, priors $\{\pi_m\}$, densities $\{f(\cdot \mid \mathcal{H}_m)\}$
Output: decision $\hat{m} \in \{0, \ldots, M-1\}$
1. for $m = 0, 1, \ldots, M-1$ do
2. $\quad \lambda_m \leftarrow \log \pi_m + \log f(\mathbf{y} \mid \mathcal{H}_m)$
3. end for
4. $\hat{m} \leftarrow \arg\max_m \lambda_m$
5. return $\hat{m}$
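The algorithm transcribes directly into NumPy. Here is a minimal sketch for the scalar Gaussian case; the Gaussian likelihood and the parameter values in the usage line are assumptions (any family of log-densities could be substituted).

```python
import numpy as np

def map_detect(y, means, priors, sigma):
    """MAP decisions for scalar Gaussian hypotheses H_m: Y = means[m] + N(0, sigma^2).

    Computes lambda_m = log pi_m + log f(y | H_m) for each hypothesis
    (constants common to all m are dropped) and returns arg max_m lambda_m.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    lam = np.log(priors) - (y[:, None] - means) ** 2 / (2 * sigma**2)
    return lam.argmax(axis=1)

# Usage: the ternary constellation from the worked example above.
decisions = map_detect([-0.9, 0.1, 1.4], np.array([-1.0, 0.0, 1.0]),
                       np.array([0.25, 0.5, 0.25]), sigma=0.5)
print(decisions)  # indices in {0, 1, 2}
```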

For large $M$ (e.g. 1024-QAM, where $M = 1024$), direct enumeration becomes expensive and structural shortcuts are preferred, e.g. separable decoding of the I and Q components for square QAM (sketched below), or the sphere decoder for lattice constellations.
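As an illustration of the separable shortcut, here is a hedged sketch of ML detection for square 16-QAM with equiprobable symbols in white Gaussian noise; the amplitude levels and the test samples are assumptions.

```python
import numpy as np

# Sketch: for square QAM in white noise with equal priors, ML detection
# factors into two independent PAM slicers, costing O(2*sqrt(M)) instead of O(M).
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])   # assumed 4-PAM levels per axis

def slice_pam(u):
    """Index of the nearest PAM level for each real-valued sample."""
    return np.abs(np.asarray(u)[:, None] - LEVELS).argmin(axis=1)

def detect_16qam(y):
    """ML detection of complex samples: slice I and Q independently."""
    y = np.asarray(y)
    return slice_pam(y.real), slice_pam(y.imag)

i_idx, q_idx = detect_16qam(np.array([2.7 - 0.4j, -1.2 + 3.1j]))
```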

MAP Decision Regions for a 2D Constellation

Visualize how the MAP decision regions (Voronoi cells, shown with colored shading) change shape as the constellation geometry and noise level are varied. For equiprobable priors, the perpendicular bisectors of segments joining pairs of constellation points form the region boundaries.
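The shading can be approximated offline with a few lines of NumPy/Matplotlib; the five-point constellation below is an arbitrary illustrative choice, not the widget's.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: shade ML (= MAP for equal priors) decision regions on a grid.
pts = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1], [0, 0]], dtype=float)
xx, yy = np.meshgrid(np.linspace(-3, 3, 400), np.linspace(-3, 3, 400))
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)

# Nearest constellation point == minimum-distance (Voronoi) decision.
regions = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

plt.contourf(xx, yy, regions.reshape(xx.shape),
             levels=np.arange(len(pts) + 1) - 0.5, cmap="Pastel1")
plt.scatter(pts[:, 0], pts[:, 1], c="k")
plt.gca().set_aspect("equal")
plt.title("MAP decision regions (Voronoi cells)")
plt.show()
```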


Common Mistake: ML is Not MAP When Priors Are Unequal

Mistake:

Because digital communication systems typically use equiprobable symbols, students sometimes internalize "ML = optimal" and forget the prior correction when it matters, for example when channel coding induces strongly biased a priori bit probabilities, or when soft information is passed between decoder stages.

Correction:

The MAP rule is always the Bayes-optimal decoder, whereas ML is optimal only when priors are uniform. In modern iterative decoders (Turbo, LDPC) the "priors" fed to each constituent decoder are actually the a posteriori probabilities from the previous iteration, and ignoring them would break the iteration. Whenever the MAP-vs-ML distinction matters, look for the source of the prior imbalance.
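A quick numerical illustration, reusing the ternary example from earlier (the noise level is an assumption): with skewed priors, the measured error rate of the MAP detector should come out strictly below ML's.

```python
import numpy as np

# Sketch: compare empirical symbol error rates of ML and MAP under skewed priors.
rng = np.random.default_rng(1)
means = np.array([-1.0, 0.0, 1.0])
priors = np.array([0.25, 0.5, 0.25])
sigma = 0.8                      # assumed noise level
N = 200_000

m_true = rng.choice(3, size=N, p=priors)
y = means[m_true] + sigma * rng.standard_normal(N)
loglik = -(y[:, None] - means) ** 2 / (2 * sigma**2)   # log f(y|H_m) up to a constant

ml_hat = loglik.argmax(axis=1)                         # ignores priors
map_hat = (loglik + np.log(priors)).argmax(axis=1)     # includes log-priors
print("ML  P_e:", np.mean(ml_hat != m_true))
print("MAP P_e:", np.mean(map_hat != m_true))
```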

Why This Matters: M-ary Detection as the Symbol Demapper

In every modern coded-modulation receiver, the first step after the matched filter is symbol-by-symbol demapping: given the received vector $\mathbf{y}$, compute the log-likelihoods of the $M$ constellation points. These per-symbol log-likelihoods are then folded into per-bit LLRs that feed the outer LDPC or polar decoder. The $M$-ary hypothesis test of this chapter is exactly the demapper, and its accuracy bounds the achievable information rate of the coded system.
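As a sketch of that demapping step, here is a max-log per-bit LLR computation for Gray-mapped QPSK; the constellation, bit labeling, sign convention, and noise variance are illustrative assumptions.

```python
import numpy as np

# Max-log demapper sketch for Gray-mapped QPSK (M = 4).
CONST = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)  # assumed symbols
BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])                  # assumed labeling

def bit_llrs(y, sigma2):
    """Per-bit LLRs via the max-log approximation.

    Uses LLR_k = min_{x: b_k=1} |y-x|^2/sigma2 - min_{x: b_k=0} |y-x|^2/sigma2,
    so positive values favor b_k = 0 (one common sign convention).
    """
    y = np.atleast_1d(y)
    d2 = np.abs(y[:, None] - CONST) ** 2 / sigma2     # per-symbol metrics
    llrs = np.empty((len(y), BITS.shape[1]))
    for k in range(BITS.shape[1]):
        m0 = d2[:, BITS[:, k] == 0].min(axis=1)       # best symbol with b_k = 0
        m1 = d2[:, BITS[:, k] == 1].min(axis=1)       # best symbol with b_k = 1
        llrs[:, k] = m1 - m0
    return llrs

print(bit_llrs(np.array([0.6 + 0.8j]), sigma2=0.5))
```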

See full treatment in Composite Hypotheses and the GLRT

Decision Region

The subset $\mathcal{R}_m \subseteq \mathcal{Y}$ of the observation space that a decision rule maps to hypothesis $\mathcal{H}_m$. For the MAP rule with equiprobable priors and Gaussian noise, $\mathcal{R}_m$ is the Voronoi cell of the signal point $\mathbf{x}_m$.

Related: Voronoi Cell, MAP Decision Rule Minimizes Symbol Error Probability

Quick Check

Suppose $M = 4$ with priors $\pi_0 = 0.5$, $\pi_1 = \pi_2 = \pi_3 = 1/6$ and identical likelihood shapes but means $a_0 < a_1 < a_2 < a_3$. How do the MAP decision regions differ from the ML regions?

They are identical because the likelihood shapes are identical.

The MAP region for $\mathcal{H}_0$ is larger than its ML counterpart.

MAP always uses tighter (smaller) decision regions.

ML always minimizes $P_e$ when likelihoods are Gaussian.

Historical Note: The Bayes Decision Principle

1930s-1950s

The Bayes decision principle (assign each observation to the hypothesis of maximum posterior probability) dates back to Abraham Wald's work on statistical decision theory (1950), which generalized the Neyman-Pearson framework from binary to arbitrary action spaces. The term "MAP" itself became standard in communications after Van Trees' 1968 monograph Detection, Estimation, and Modulation Theory, which gave the engineering community a common vocabulary for what statisticians had been doing for decades.