The Method of Types

Exact Combinatorics for Discrete Sources

The AEP and typicality give us the right exponential behavior but with loose polynomial factors. The method of types provides exact exponential analysis by working directly with the empirical distribution (type) of a sequence. The key insight: the probability of a type class is determined by the KL divergence between the type and the true distribution, and the number of types is polynomial in $n$. This combinatorial precision is essential for error exponents (Chapter 4), hypothesis testing, and large-deviation results.

Definition:

Type (Empirical Distribution)

The type of a sequence $\mathbf{x} \in \mathcal{X}^n$ is its empirical distribution:

$$Q_{\mathbf{x}}(a) = \hat{P}_{\mathbf{x}}(a) = \frac{|\{i : x_i = a\}|}{n}, \quad a \in \mathcal{X}.$$

We denote the set of all types (empirical distributions) that can arise from length-$n$ sequences over $\mathcal{X}$ by $\mathcal{P}_n(\mathcal{X})$.
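Computing the type of a concrete sequence is a one-liner. A minimal Python sketch; the function name `type_of` is illustrative, and `Fraction` is used so the empirical probabilities are exact rather than floating-point:

```python
from collections import Counter
from fractions import Fraction

def type_of(x, alphabet):
    """Empirical distribution Q_x(a) = |{i : x_i = a}| / n of a sequence x."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

q = type_of("aabba", "ab")
# q["a"] = 3/5, q["b"] = 2/5
```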

Definition:

Type Class

The type class of a distribution $Q \in \mathcal{P}_n(\mathcal{X})$ is:

$$T_Q = \{\mathbf{x} \in \mathcal{X}^n : Q_{\mathbf{x}} = Q\},$$

the set of all sequences with empirical distribution exactly equal to $Q$.
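For small $n$ the type class can be enumerated directly: it is exactly the set of distinct rearrangements of any one sequence with the given histogram. A minimal sketch, with the type passed as symbol-to-count pairs (the helper name `type_class` is an assumption, not a standard API):

```python
from itertools import permutations

def type_class(counts):
    """All distinct length-n strings whose histogram matches `counts`
    (a dict symbol -> count); this is the type class T_Q."""
    base = "".join(a * k for a, k in counts.items())
    return {"".join(p) for p in permutations(base)}

tq = type_class({"a": 2, "b": 1})
# {"aab", "aba", "baa"}: |T_Q| = 3!/(2! 1!) = 3
```

This brute-force enumeration is exponential in $n$ and only meant to make the definition concrete; the multinomial coefficient gives the size without enumeration.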


Theorem: Number of Types Is Polynomial

$$|\mathcal{P}_n(\mathcal{X})| \leq (n+1)^{|\mathcal{X}|}.$$

The number of distinct types for length-$n$ sequences is at most polynomial in $n$.

Each type is determined by its count vector $(n_1, \ldots, n_{|\mathcal{X}|})$ with $\sum_a n_a = n$ and $n_a \geq 0$. By stars and bars, the number of such vectors is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} \leq (n+1)^{|\mathcal{X}|}$, since each count independently takes at most $n+1$ values.

The polynomial growth of the number of types is key: it means that polynomial factors are "free" when computing exponential rates. This is why the $\doteq$ notation (equality to first order in the exponent, ignoring polynomial factors) works so well.
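The stars-and-bars count and the polynomial bound are easy to check numerically. A minimal Python sketch; `num_types` is an illustrative helper, not a library function:

```python
from math import comb

def num_types(n, k):
    """Exact number of types for length-n sequences over a k-symbol
    alphabet: nonnegative integer vectors summing to n (stars and bars)."""
    return comb(n + k - 1, k - 1)

n, k = 10, 3
exact = num_types(n, k)          # C(12, 2) = 66
bound = (n + 1) ** k             # 11^3 = 1331
assert exact <= bound            # polynomial, while |X^n| = 3^10 = 59049
```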

Theorem: Size of a Type Class

For any type $Q \in \mathcal{P}_n(\mathcal{X})$:

$$\frac{1}{(n+1)^{|\mathcal{X}|}} \cdot 2^{nH(Q)} \leq |T_Q| \leq 2^{nH(Q)}.$$

In exponential notation: $|T_Q| \doteq 2^{nH(Q)}$.

The type class contains all permutations of a sequence with the given histogram, so its size is the multinomial coefficient $\binom{n}{nQ(a_1), \ldots, nQ(a_M)} \doteq 2^{nH(Q)}$ by Stirling's approximation. The entropy of the type, not the true distribution, determines the size.
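Both bounds can be verified directly for a small binary type. A minimal Python sketch, assuming a type with counts $(5, 15)$ at $n = 20$; the helpers `multinomial` and `H` are illustrative:

```python
from math import comb, log2

def multinomial(n, counts):
    """n! / (c_1! ... c_k!) computed via iterated binomial coefficients."""
    total, result = n, 1
    for c in counts:
        result *= comb(total, c)
        total -= c
    return result

def H(Q):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in Q if q > 0)

n = 20
Q = (0.25, 0.75)                       # type with counts (5, 15)
size = multinomial(n, (5, 15))         # |T_Q| = C(20,5) = 15504
lo = 2 ** (n * H(Q)) / (n + 1) ** len(Q)
hi = 2 ** (n * H(Q))
assert lo <= size <= hi                # the two-sided bound holds
```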

Theorem: Probability of a Type Class

If $\mathbf{X} \sim P^n$ (i.i.d. with true distribution $P$), then for any type $Q \in \mathcal{P}_n(\mathcal{X})$:

$$P^n(T_Q) \doteq 2^{-nD(Q \| P)}.$$

More precisely:

$$\frac{1}{(n+1)^{|\mathcal{X}|}} \cdot 2^{-nD(Q\|P)} \leq P^n(T_Q) \leq 2^{-nD(Q\|P)}.$$

The probability of seeing type $Q$ when the true distribution is $P$ decays exponentially with the KL divergence $D(Q \| P)$. Types close to $P$ (small divergence) have high probability; types far from $P$ (large divergence) are exponentially unlikely. This is the precise version of the "concentration on typical sequences" phenomenon.
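The two-sided bound can be checked exactly for a binary type, where $P^n(T_Q)$ is a binomial term. A minimal Python sketch under a fair-coin source; the helper `D` is illustrative:

```python
from math import comb, log2

def D(Q, P):
    """KL divergence in bits between finite distributions (as tuples)."""
    return sum(q * log2(q / p) for q, p in zip(Q, P) if q > 0)

n, k = 30, 12                          # observe the type Q = (12/30, 18/30)
P = (0.5, 0.5)
Q = (k / n, 1 - k / n)
prob = comb(n, k) * P[0] ** k * P[1] ** (n - k)   # exact P^n(T_Q)
lo = 2 ** (-n * D(Q, P)) / (n + 1) ** 2
hi = 2 ** (-n * D(Q, P))
assert lo <= prob <= hi                # matches the theorem's bounds
```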

Theorem: Sanov's Theorem

Let $\mathcal{E}$ be a set of distributions on $\mathcal{X}$ (a subset of the probability simplex), and let $Q^* = \arg\min_{Q \in \overline{\mathcal{E}}} D(Q \| P)$ be the I-projection of $P$ onto the closure of $\mathcal{E}$. Then, for sufficiently regular $\mathcal{E}$:

$$\Pr(Q_{\mathbf{X}} \in \mathcal{E}) \doteq 2^{-nD(Q^* \| P)}.$$

The probability that the empirical distribution falls in a set $\mathcal{E}$ decays exponentially with the minimum KL divergence from $P$ to $\mathcal{E}$.

Among all types in $\mathcal{E}$, the one closest to $P$ (in KL divergence) dominates the probability; all others are exponentially less likely. Sanov's theorem is the master large-deviation result for discrete distributions. It governs error exponents in hypothesis testing, source coding, and channel coding.

Example: Sanov's Theorem: Detecting a Biased Coin

A coin has $P(\text{heads}) = 0.5$ (fair). We flip it $n$ times and want the probability that the empirical frequency of heads exceeds $0.6$. Estimate this probability for large $n$. By Sanov's theorem the I-projection is $Q^* = 0.6$, so the probability is $\doteq 2^{-nD(0.6 \| 0.5)} \approx 2^{-0.029\,n}$.
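The Sanov estimate for this example can be compared against the exact binomial tail. A minimal Python sketch; the helper `D` is illustrative, and the exponent $D(0.6 \| 0.5) \approx 0.029$ bits per flip:

```python
from math import comb, log2

def D(q, p):
    """Binary KL divergence D(q || p) in bits."""
    return q * log2(q / p) + (1 - q) * log2((1 - q) / (1 - p))

p = 0.5
exponent = D(0.6, p)                   # I-projection Q* = 0.6; ~0.029 bits
for n in (50, 200, 800):
    sanov = 2 ** (-n * exponent)       # exponential estimate
    # exact P(fraction of heads > 0.6) for a fair coin
    tail = sum(comb(n, k) for k in range(int(0.6 * n) + 1, n + 1)) / 2 ** n
    print(n, sanov, tail)
```

The estimate and the exact tail share the same exponent; they differ only by the polynomial factor that $\doteq$ deliberately ignores.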

Historical Note: Csiszár and the Method of Types

1970s-1981

The method of types was developed systematically by Imre Csiszár and János Körner in a series of papers starting in the 1970s, culminating in their landmark book (1981, 2nd ed. 2011). While the basic idea of grouping sequences by their empirical distribution is natural, Csiszár and Körner showed that this approach provides the sharpest possible exponential bounds and leads to the cleanest proofs of coding theorems.

The method of types is particularly natural for discrete memoryless systems. For continuous sources and channels, the analogous tool is the Donsker-Varadhan large deviation principle, which generalizes the type analysis to abstract alphabets.

Type Class Probabilities

[Interactive visualization: for a binary source with true probability $p$, the probability $P^n(T_Q) \doteq 2^{-nD(Q\|P)}$ of each type $Q$. Types near $P$ have exponentially higher probability. Parameters: $P(X=1) = 0.3$, sequence length $n = 50$.]
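The curve in the visualization can be reproduced numerically. A minimal Python sketch assuming the same parameters, $p = 0.3$ and $n = 50$; the helper `D` is illustrative:

```python
from math import log2

def D(q, p):
    """Binary KL divergence D(q || p) in bits, handling q in {0, 1}."""
    return sum(x * log2(x / y) for x, y in ((q, p), (1 - q, 1 - p)) if x > 0)

p, n = 0.3, 50                         # parameters from the figure
# Exponential estimate 2^{-n D(Q||P)} for each binary type Q = k/n
probs = {k: 2 ** (-n * D(k / n, p)) for k in range(n + 1)}
best = max(probs, key=probs.get)
assert best == round(n * p)            # the most probable type sits at Q = P
```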

🎓 CommIT Contribution (2014)

Error Exponents for Mismatched Decoding

A. Somekh-Baruch, G. Caire, IEEE Transactions on Information Theory

The method of types provides the tightest exponential analysis of coding performance. Somekh-Baruch and Caire analyzed error exponents under mismatched decoding — where the decoder uses a metric different from the true channel law — using type-based techniques. This mismatch setting is practically important: real receivers often use simplified (e.g., Gaussian) metrics even when the true channel is non-Gaussian. The results extend the Csiszár-Körner framework to characterize exactly when mismatched decoding incurs an exponent penalty versus the matched case.


Key Takeaway

The method of types gives exact exponential analysis. The probability of observing type $Q$ under true distribution $P$ is $P^n(T_Q) \doteq 2^{-nD(Q\|P)}$, and the number of types is polynomial in $n$. Sanov's theorem extends this to arbitrary sets of distributions: the probability of the set decays as $2^{-nD(Q^*\|P)}$, where $Q^*$ is the closest distribution in KL divergence.