The Method of Types

Exact Combinatorics for Discrete Sources

The AEP and typicality give us the right exponential behavior but with loose polynomial factors. The method of types provides exact exponential analysis by working directly with the empirical distribution (type) of a sequence. The key insight: the probability of a type class is determined by the KL divergence between the type and the true distribution, and the number of types is polynomial in $n$. This combinatorial precision is essential for error exponents (Chapter 4), hypothesis testing, and large-deviation results.

Definition:

Type (Empirical Distribution)

The type of a sequence $\mathbf{x} \in \mathcal{X}^n$ is its empirical distribution:

$$Q_{\mathbf{x}}(a) = \hat{P}_{\mathbf{x}}(a) = \frac{|\{i : x_i = a\}|}{n}, \quad a \in \mathcal{X}.$$

We denote the set of all types (empirical distributions) that can arise from length-$n$ sequences over $\mathcal{X}$ by $\mathcal{P}_n(\mathcal{X})$.
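Computing the type of a concrete sequence is a one-liner. A minimal Python sketch; the function name `type_of` is illustrative, and `Fraction` is used so the empirical probabilities are exact rather than floating-point:

```python
from collections import Counter
from fractions import Fraction

def type_of(x, alphabet):
    """Empirical distribution Q_x(a) = |{i : x_i = a}| / n of a sequence x."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

q = type_of("aabba", "ab")
# q["a"] = 3/5, q["b"] = 2/5
```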

Definition:

Type Class

The type class of a distribution $Q \in \mathcal{P}_n(\mathcal{X})$ is:

$$T_Q = \{\mathbf{x} \in \mathcal{X}^n : Q_{\mathbf{x}} = Q\},$$

the set of all sequences with empirical distribution exactly equal to $Q$.
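For small $n$ the type class can be enumerated directly: it is exactly the set of distinct rearrangements of any one sequence with the given histogram. A minimal sketch, with the type passed as symbol-to-count pairs (the helper name `type_class` is an assumption, not a standard API):

```python
from itertools import permutations

def type_class(counts):
    """All distinct length-n strings whose histogram matches `counts`
    (a dict symbol -> count); this is the type class T_Q."""
    base = "".join(a * k for a, k in counts.items())
    return {"".join(p) for p in permutations(base)}

tq = type_class({"a": 2, "b": 1})
# {"aab", "aba", "baa"}: |T_Q| = 3!/(2! 1!) = 3
```

This brute-force enumeration is exponential in $n$ and only meant to make the definition concrete; the multinomial coefficient gives the size without enumeration.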


Theorem: Number of Types Is Polynomial

$$|\mathcal{P}_n(\mathcal{X})| \leq (n+1)^{|\mathcal{X}|}.$$

The number of distinct types for length-$n$ sequences is at most polynomial in $n$.

Each type is determined by its count vector $(n_1, \ldots, n_{|\mathcal{X}|})$ with $\sum_a n_a = n$ and $n_a \geq 0$. By stars and bars, the number of such vectors is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} \leq (n+1)^{|\mathcal{X}|}$, since each count independently takes at most $n+1$ values.

The polynomial growth of the number of types is key: it means that polynomial factors are "free" when computing exponential rates. This is why the $\doteq$ notation (equality to first order in the exponent, ignoring polynomial factors) works so well.
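The stars-and-bars count and the polynomial bound are easy to check numerically. A minimal Python sketch; `num_types` is an illustrative helper, not a library function:

```python
from math import comb

def num_types(n, k):
    """Exact number of types for length-n sequences over a k-symbol
    alphabet: nonnegative integer vectors summing to n (stars and bars)."""
    return comb(n + k - 1, k - 1)

n, k = 10, 3
exact = num_types(n, k)          # C(12, 2) = 66
bound = (n + 1) ** k             # 11^3 = 1331
assert exact <= bound            # polynomial, while |X^n| = 3^10 = 59049
```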

Theorem: Size of a Type Class

For any type $Q \in \mathcal{P}_n(\mathcal{X})$:

$$\frac{1}{(n+1)^{|\mathcal{X}|}} \cdot 2^{nH(Q)} \leq |T_Q| \leq 2^{nH(Q)}.$$

In exponential notation: $|T_Q| \doteq 2^{nH(Q)}$.

The type class contains all permutations of a sequence with the given histogram, so its size is the multinomial coefficient $\binom{n}{nQ(a_1), \ldots, nQ(a_M)} \doteq 2^{nH(Q)}$ by Stirling's approximation. The entropy of the type, not the true distribution, determines the size.
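Both bounds can be verified directly for a small binary type. A minimal Python sketch, assuming a type with counts $(5, 15)$ at $n = 20$; the helpers `multinomial` and `H` are illustrative:

```python
from math import comb, log2

def multinomial(n, counts):
    """n! / (c_1! ... c_k!) computed via iterated binomial coefficients."""
    total, result = n, 1
    for c in counts:
        result *= comb(total, c)
        total -= c
    return result

def H(Q):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in Q if q > 0)

n = 20
Q = (0.25, 0.75)                       # type with counts (5, 15)
size = multinomial(n, (5, 15))         # |T_Q| = C(20,5) = 15504
lo = 2 ** (n * H(Q)) / (n + 1) ** len(Q)
hi = 2 ** (n * H(Q))
assert lo <= size <= hi                # the two-sided bound holds
```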

Theorem: Probability of a Type Class

If $\mathbf{X} \sim P^n$ (i.i.d. with true distribution $P$), then for any type $Q \in \mathcal{P}_n(\mathcal{X})$:

$$P^n(T_Q) \doteq 2^{-nD(Q \| P)}.$$

More precisely:

$$\frac{1}{(n+1)^{|\mathcal{X}|}} \cdot 2^{-nD(Q\|P)} \leq P^n(T_Q) \leq 2^{-nD(Q\|P)}.$$

The probability of seeing type $Q$ when the true distribution is $P$ decays exponentially with the KL divergence $D(Q \| P)$. Types close to $P$ (small divergence) have high probability; types far from $P$ (large divergence) are exponentially unlikely. This is the precise version of the "concentration on typical sequences" phenomenon.
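The two-sided bound can be checked exactly for a binary type, where $P^n(T_Q)$ is a binomial term. A minimal Python sketch under a fair-coin source; the helper `D` is illustrative:

```python
from math import comb, log2

def D(Q, P):
    """KL divergence in bits between finite distributions (as tuples)."""
    return sum(q * log2(q / p) for q, p in zip(Q, P) if q > 0)

n, k = 30, 12                          # observe the type Q = (12/30, 18/30)
P = (0.5, 0.5)
Q = (k / n, 1 - k / n)
prob = comb(n, k) * P[0] ** k * P[1] ** (n - k)   # exact P^n(T_Q)
lo = 2 ** (-n * D(Q, P)) / (n + 1) ** 2
hi = 2 ** (-n * D(Q, P))
assert lo <= prob <= hi                # matches the theorem's bounds
```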

Theorem: Sanov's Theorem

Let $\mathcal{E}$ be a set of distributions on $\mathcal{X}$ (a subset of the probability simplex), and let $Q^* = \arg\min_{Q \in \overline{\mathcal{E}}} D(Q \| P)$ be the I-projection of $P$ onto the closure of $\mathcal{E}$. Then, for sufficiently regular $\mathcal{E}$:

$$\Pr(Q_{\mathbf{X}} \in \mathcal{E}) \doteq 2^{-nD(Q^* \| P)}.$$

The probability that the empirical distribution falls in a set $\mathcal{E}$ decays exponentially with the minimum KL divergence from $P$ to $\mathcal{E}$.

Among all types in $\mathcal{E}$, the one closest to $P$ (in KL divergence) dominates the probability; all others are exponentially less likely. Sanov's theorem is the master large-deviation result for discrete distributions. It governs error exponents in hypothesis testing, source coding, and channel coding.

Example: Sanov's Theorem: Detecting a Biased Coin

A coin has $P(\text{heads}) = 0.5$ (fair). We flip it $n$ times and want the probability that the empirical frequency of heads exceeds $0.6$. Estimate this probability for large $n$. By Sanov's theorem the I-projection is $Q^* = 0.6$, so the probability is $\doteq 2^{-nD(0.6 \| 0.5)} \approx 2^{-0.029\,n}$.
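The Sanov estimate for this example can be compared against the exact binomial tail. A minimal Python sketch; the helper `D` is illustrative, and the exponent $D(0.6 \| 0.5) \approx 0.029$ bits per flip:

```python
from math import comb, log2

def D(q, p):
    """Binary KL divergence D(q || p) in bits."""
    return q * log2(q / p) + (1 - q) * log2((1 - q) / (1 - p))

p = 0.5
exponent = D(0.6, p)                   # I-projection Q* = 0.6; ~0.029 bits
for n in (50, 200, 800):
    sanov = 2 ** (-n * exponent)       # exponential estimate
    # exact P(fraction of heads > 0.6) for a fair coin
    tail = sum(comb(n, k) for k in range(int(0.6 * n) + 1, n + 1)) / 2 ** n
    print(n, sanov, tail)
```

The estimate and the exact tail share the same exponent; they differ only by the polynomial factor that $\doteq$ deliberately ignores.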

Historical Note: Csiszár and the Method of Types

1970s-1981

The method of types was developed systematically by Imre Csiszár and János Körner in a series of papers starting in the 1970s, culminating in their landmark book (1981, 2nd ed. 2011). While the basic idea of grouping sequences by their empirical distribution is natural, Csiszár and Körner showed that this approach provides the sharpest possible exponential bounds and leads to the cleanest proofs of coding theorems.

The method of types is particularly natural for discrete memoryless systems. For continuous sources and channels, the analogous tool is the Donsker-Varadhan large deviation principle, which generalizes the type analysis to abstract alphabets.

Type Class Probabilities

[Interactive visualization: for a binary source with true probability $p$, the probability $P^n(T_Q) \doteq 2^{-nD(Q\|P)}$ of each type $Q$. Types near $P$ have exponentially higher probability. Parameters: $P(X=1) = 0.3$, sequence length $n = 50$.]
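The curve in the visualization can be reproduced numerically. A minimal Python sketch assuming the same parameters, $p = 0.3$ and $n = 50$; the helper `D` is illustrative:

```python
from math import log2

def D(q, p):
    """Binary KL divergence D(q || p) in bits, handling q in {0, 1}."""
    return sum(x * log2(x / y) for x, y in ((q, p), (1 - q, 1 - p)) if x > 0)

p, n = 0.3, 50                         # parameters from the figure
# Exponential estimate 2^{-n D(Q||P)} for each binary type Q = k/n
probs = {k: 2 ** (-n * D(k / n, p)) for k in range(n + 1)}
best = max(probs, key=probs.get)
assert best == round(n * p)            # the most probable type sits at Q = P
```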

🎓 CommIT Contribution (2014)

Error Exponents for Mismatched Decoding

A. Somekh-Baruch, G. Caire, IEEE Transactions on Information Theory

The method of types provides the tightest exponential analysis of coding performance. Somekh-Baruch and Caire analyzed error exponents under mismatched decoding — where the decoder uses a metric different from the true channel law — using type-based techniques. This mismatch setting is practically important: real receivers often use simplified (e.g., Gaussian) metrics even when the true channel is non-Gaussian. The results extend the Csiszár-Körner framework to characterize exactly when mismatched decoding incurs an exponent penalty versus the matched case.


Key Takeaway

The method of types gives exact exponential analysis. The probability of observing type $Q$ under true distribution $P$ is $P^n(T_Q) \doteq 2^{-nD(Q\|P)}$, and the number of types is polynomial in $n$. Sanov's theorem extends this to arbitrary sets of distributions: the probability of the set decays as $2^{-nD(Q^*\|P)}$, where $Q^*$ is the closest distribution in KL divergence.