The Method of Types
Exact Combinatorics for Discrete Sources
The AEP and typicality give us the right exponential behavior but with loose polynomial factors. The method of types provides exact exponential analysis by working directly with the empirical distribution (type) of a sequence. The key insight: the probability of a type class is determined by the KL divergence between the type and the true distribution, and the number of types is only polynomial in $n$. This combinatorial precision is essential for error exponents (Chapter 4), hypothesis testing, and large-deviation results.
Definition: Type (Empirical Distribution)
Type (Empirical Distribution)
The type $P_{x^n}$ of a sequence $x^n = (x_1, \dots, x_n) \in \mathcal{X}^n$ is its empirical distribution:

$$P_{x^n}(a) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{x_i = a\}, \qquad a \in \mathcal{X}.$$

We denote the set of all types (empirical distributions) that can arise from length-$n$ sequences over $\mathcal{X}$ by $\mathcal{P}_n$.
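As a concrete illustration, a type is just a normalized histogram. The sketch below (the helper name `type_of` is chosen here, not part of any standard library) computes the empirical distribution of a sequence over a fixed alphabet:

```python
from collections import Counter

def type_of(seq, alphabet):
    """Empirical distribution (type) of a sequence over a given alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts.get(a, 0) / n for a in alphabet}

# Example: the type of a binary sequence with two 0s and three 1s.
P = type_of([0, 1, 1, 0, 1], alphabet=[0, 1])
print(P)  # {0: 0.4, 1: 0.6}
```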
Definition: Type Class
Type Class
The type class of a distribution $P \in \mathcal{P}_n$ is

$$T(P) = \{x^n \in \mathcal{X}^n : P_{x^n} = P\},$$

the set of all sequences with empirical distribution exactly equal to $P$.
Type (empirical distribution)
The histogram of a sequence normalized to a probability distribution. For $x^n \in \mathcal{X}^n$: $P_{x^n}(a) = \frac{1}{n}\,|\{i : x_i = a\}|$. The method of types analyzes probabilities by grouping sequences by their type.
Related: Strong typicality
Theorem: Number of Types Is Polynomial
$$|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}.$$

Each type is determined by the counts $n(a) = nP(a)$ with $n(a) \in \{0, 1, \dots, n\}$ and $\sum_{a} n(a) = n$. The number of such integer count vectors is at most $(n+1)^{|\mathcal{X}|}$, polynomial in $n$.
The polynomial growth of the number of types is key: it means that polynomial factors are "free" when computing exponential rates. This is why the $\doteq$ notation (equality to first order in the exponent, ignoring polynomial factors) works so well.
Counting
Each component $n(a)$ takes a value in $\{0, 1, \dots, n\}$ for each $a \in \mathcal{X}$. There are at most $n+1$ choices for each of $|\mathcal{X}|$ components (overcounting, since they must sum to $n$):

$$|\mathcal{P}_n| \le (n+1)^{|\mathcal{X}|}.$$
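A brute-force check of this count for small parameters (illustrative only): the exact number of types is the stars-and-bars coefficient $\binom{n+|\mathcal{X}|-1}{|\mathcal{X}|-1}$, which sits far below the $(n+1)^{|\mathcal{X}|}$ bound:

```python
from itertools import product
from math import comb

def num_types(n, k):
    """Exact |P_n| for an alphabet of size k: count vectors summing to n."""
    return sum(1 for c in product(range(n + 1), repeat=k) if sum(c) == n)

n, k = 10, 3
exact = num_types(n, k)          # stars and bars: C(n+k-1, k-1) = C(12, 2) = 66
assert exact == comb(n + k - 1, k - 1)
assert exact <= (n + 1) ** k     # the polynomial bound, here 11^3 = 1331
print(exact, (n + 1) ** k)       # 66 1331
```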
Theorem: Size of a Type Class
For any type $P \in \mathcal{P}_n$:

$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)}.$$

In exponential notation: $|T(P)| \doteq 2^{nH(P)}$.

The type class contains all permutations of a sequence with the given histogram. The multinomial coefficient satisfies $\binom{n}{nP(a_1), \dots, nP(a_{|\mathcal{X}|})} \approx 2^{nH(P)}$ by Stirling's approximation. The entropy of the type, not the true distribution, determines the size.
Stirling bounds
$$|T(P)| = \binom{n}{nP(a_1), \dots, nP(a_{|\mathcal{X}|})} = \frac{n!}{\prod_{a \in \mathcal{X}} (nP(a))!}.$$

Using Stirling's approximation $\log_2 n! = n \log_2 n - n \log_2 e + O(\log n)$:

$$\log_2 |T(P)| = nH(P) + O(\log n).$$

The polynomial correction factors are absorbed by $\doteq$.
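The two-sided bound can be verified numerically for a small alphabet. This sketch (counts chosen arbitrarily) compares the exact multinomial count with the entropy bounds:

```python
from math import factorial, log2, prod

def type_class_size(counts):
    """Exact |T(P)| = n! / prod_a (n P(a))! from integer counts."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)

def entropy_bits(counts):
    """Entropy H(P) in bits of the type with the given counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

counts = [6, 3, 1]                      # n = 10, P = (0.6, 0.3, 0.1)
n, k = sum(counts), len(counts)
size = type_class_size(counts)          # multinomial(10; 6, 3, 1) = 840
upper = 2 ** (n * entropy_bits(counts))
lower = upper / (n + 1) ** k
assert lower <= size <= upper
print(lower, size, upper)
```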
Theorem: Probability of a Type Class
If $X^n \sim Q^n$ (i.i.d. with true distribution $Q$), then for any type $P \in \mathcal{P}_n$:

$$Q^n(T(P)) \doteq 2^{-nD(P\|Q)}.$$

More precisely:

$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P\|Q)} \le Q^n(T(P)) \le 2^{-nD(P\|Q)}.$$

The probability of seeing type $P$ when the true distribution is $Q$ decays exponentially with the KL divergence $D(P\|Q)$. Types close to $Q$ (small divergence) have high probability; types far from $Q$ (large divergence) are exponentially unlikely. This is the precise version of the "concentration on typical sequences" phenomenon.
Combine size and per-sequence probability
Every $x^n \in T(P)$ has the same probability:

$$Q^n(x^n) = \prod_{a \in \mathcal{X}} Q(a)^{nP(a)} = 2^{-n(H(P) + D(P\|Q))}.$$

Therefore $Q^n(T(P)) = |T(P)| \cdot 2^{-n(H(P)+D(P\|Q))} \doteq 2^{nH(P)} \cdot 2^{-n(H(P)+D(P\|Q))} = 2^{-nD(P\|Q)}$.
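The equal-probability step can be checked by brute force for a tiny example (a sketch; the distribution $Q$ and type $P$ below are arbitrary choices). Summing $Q^n$ over every sequence in $T(P)$ reproduces $|T(P)| \cdot 2^{-n(H(P)+D(P\|Q))}$:

```python
from itertools import product
from math import comb, isclose, log2

def H(P):
    """Entropy in bits."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

def D(P, Q):
    """KL divergence D(P||Q) in bits."""
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

Q = {0: 0.7, 1: 0.3}            # true distribution (arbitrary choice)
n = 8
P = {0: 0.5, 1: 0.5}            # the type with counts (4, 4)

# Brute-force: sum Q^n(x^n) over every sequence whose type is P.
prob = 0.0
for seq in product([0, 1], repeat=n):
    if seq.count(0) == n // 2:
        p_seq = 1.0
        for x in seq:
            p_seq *= Q[x]
        prob += p_seq

# Each sequence in T(P) has probability 2^{-n(H(P)+D(P||Q))};
# there are |T(P)| = C(8, 4) = 70 of them.
predicted = comb(n, n // 2) * 2 ** (-n * (H(P) + D(P, Q)))
assert isclose(prob, predicted)
print(prob)
```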
Theorem: Sanov's Theorem
Let $E$ be a set of distributions on $\mathcal{X}$ (a subset of the probability simplex), and let $P^* = \arg\min_{P \in \operatorname{cl}(E)} D(P\|Q)$ be the I-projection of $Q$ onto the closure of $E$. Then:

$$Q^n(\{x^n : P_{x^n} \in E\}) \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^*\|Q)},$$

and if $E$ equals the closure of its interior, $Q^n(\{x^n : P_{x^n} \in E\}) \doteq 2^{-nD(P^*\|Q)}$.

The probability that the empirical distribution falls in a set $E$ decays exponentially with the minimum KL divergence $D(P^*\|Q)$ from $E$ to $Q$.
Among all types in $E$, the one closest to $Q$ (in KL divergence) dominates the probability; all others are exponentially less likely. Sanov's theorem is the master large-deviation result for discrete distributions. It governs error exponents in hypothesis testing, source coding, and channel coding.
Upper bound
$$Q^n(P_{x^n} \in E) = \sum_{P \in E \cap \mathcal{P}_n} Q^n(T(P)) \le |E \cap \mathcal{P}_n| \max_{P \in E \cap \mathcal{P}_n} 2^{-nD(P\|Q)} \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^*\|Q)}.$$
Lower bound (sketch)
For any sequence of types $P_n \in E \cap \mathcal{P}_n$ approaching $P^*$:

$$Q^n(P_{x^n} \in E) \ge Q^n(T(P_n)) \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P_n\|Q)},$$

so $\frac{1}{n} \log_2 Q^n(P_{x^n} \in E) \ge -D(P^*\|Q) - o(1)$ as $n \to \infty$.
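A numerical sanity check of the upper bound for a binary alphabet (parameters chosen arbitrarily here): the exact binomial tail probability always sits below $(n+1)^{2}\, 2^{-nD(P^*\|Q)}$:

```python
from math import comb, log2

def d_bin(p, q):
    """Binary KL divergence D(p||q) in bits."""
    out = 0.0
    if p > 0:
        out += p * log2(p / q)
    if p < 1:
        out += (1 - p) * log2((1 - p) / (1 - q))
    return out

q, t, n = 0.3, 0.5, 200          # true bias, threshold, length (arbitrary)
# Exact probability that the empirical frequency of 1s is >= t.
tail = sum(comb(n, k) * q**k * (1 - q)**(n - k)
           for k in range(int(t * n), n + 1))
# Sanov upper bound for E = {P : P(1) >= t}; the I-projection is p* = t.
bound = (n + 1) ** 2 * 2 ** (-n * d_bin(t, q))
assert tail <= bound
print(tail, bound)
```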
Example: Sanov's Theorem: Detecting a Biased Coin
A coin has $q = P(\text{heads}) = 1/2$ (fair). We flip it $n$ times and want the probability that the empirical frequency of heads exceeds $3/4$. Estimate this probability for large $n$.
Apply Sanov
$E = \{P : P(\text{heads}) \ge 3/4\}$. The I-projection of $Q = \mathrm{Bernoulli}(1/2)$ onto $E$ is $P^* = \mathrm{Bernoulli}(3/4)$.

$D(P^*\|Q) = \frac{3}{4}\log_2\frac{3/4}{1/2} + \frac{1}{4}\log_2\frac{1/4}{1/2} \approx 0.1887$ bits.
Rate of decay
$Q^n(P_{x^n} \in E) \doteq 2^{-nD(P^*\|Q)} = 2^{-0.1887\,n}$.

For $n = 100$: $\approx 2^{-18.87} \approx 2 \times 10^{-6}$. For $n = 1000$: $\approx 2^{-188.7} \approx 10^{-57}$.
The probability of a significant deviation from the true distribution decays exponentially in $n$, with the KL divergence as the exponent.
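The arithmetic can be reproduced in a few lines (a minimal sketch assuming a fair coin and a threshold of $3/4$):

```python
from math import log2

def d_bin(p, q):
    """Binary KL divergence D(p||q) in bits."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))

rate = d_bin(0.75, 0.5)          # exponent: ~0.1887 bits per flip
print(round(rate, 4))
for n in (100, 1000):
    print(n, 2 ** (-n * rate))   # ~2e-06 for n=100, ~1.5e-57 for n=1000
```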
Historical Note: Csiszár and the Method of Types
1970s-1981: The method of types was developed systematically by Imre Csiszár and János Körner in a series of papers starting in the 1970s, culminating in their landmark book Information Theory: Coding Theorems for Discrete Memoryless Systems (1981, 2nd ed. 2011). While the basic idea of grouping sequences by their empirical distribution is natural, Csiszár and Körner showed that this approach provides the sharpest possible exponential bounds and leads to the cleanest proofs of coding theorems.
The method of types is particularly natural for discrete memoryless systems. For continuous sources and channels, the analogous tool is the Donsker-Varadhan large deviation principle, which generalizes the type analysis to abstract alphabets.
Type Class Probabilities

[Interactive figure: for a binary source with true probability $q = P(X=1)$, the plot shows the probability $Q^n(T(P))$ for each type $P \in \mathcal{P}_n$ as $q$ and the sequence length $n$ vary. Types near $q$ have exponentially higher probability.]
Error Exponents for Mismatched Decoding
The method of types provides the tightest exponential analysis of coding performance. Somekh-Baruch and Caire analyzed error exponents under mismatched decoding — where the decoder uses a metric different from the true channel law — using type-based techniques. This mismatch setting is practically important: real receivers often use simplified (e.g., Gaussian) metrics even when the true channel is non-Gaussian. The results extend the Csiszár-Körner framework to characterize exactly when mismatched decoding incurs an exponent penalty versus the matched case.
Key Takeaway
The method of types gives exact exponential analysis. The probability of observing type $P$ under true distribution $Q$ is $\doteq 2^{-nD(P\|Q)}$, and the number of types is polynomial in $n$. Sanov's theorem extends this to arbitrary sets of distributions: the probability of a set $E$ decays as $2^{-nD(P^*\|Q)}$, where $P^*$ is the distribution in $E$ closest to $Q$ in KL divergence.