Types, Type Classes, and Their Properties

Why the Method of Types?

In Chapter 3 we used typicality to prove Shannon's source and channel coding theorems. The proofs were elegant, but the bounds we obtained were asymptotic — they told us what happens as the blocklength $n \to \infty$, but said nothing about how fast things converge. The method of types is a more refined combinatorial tool that gives us exponential control over probabilities. The key insight is deceptively simple: instead of asking "is this sequence typical?", we ask "what is the exact empirical distribution of this sequence?" This single shift in perspective unlocks error exponents — the rate at which error probabilities decay to zero.

Definition: Type (Empirical Distribution)

Let $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ be a sequence of length $n$ over a finite alphabet $\mathcal{X}$. The type (or empirical distribution) of $\mathbf{x}$ is the probability distribution $\hat{P}_{\mathbf{x}}$ on $\mathcal{X}$ defined by
$$\hat{P}_{\mathbf{x}}(a) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i = a\}, \quad a \in \mathcal{X}.$$
We write $\hat{P}_{\mathbf{x}}(a) = N(a \mid \mathbf{x})/n$, where $N(a \mid \mathbf{x})$ is the number of occurrences of the symbol $a$ in $\mathbf{x}$.
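The definition translates directly into code. Here is a minimal Python sketch (the function name `empirical_type` is ours, not from the text):

```python
from collections import Counter

def empirical_type(x, alphabet):
    """Type (empirical distribution) of x: P_hat(a) = N(a|x)/n."""
    counts = Counter(x)
    n = len(x)
    return {a: counts.get(a, 0) / n for a in alphabet}

# 'aabac' has N(a)=3, N(b)=1, N(c)=1 with n=5, so its type is (0.6, 0.2, 0.2).
print(empirical_type("aabac", "abc"))  # -> {'a': 0.6, 'b': 0.2, 'c': 0.2}
```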

The type captures everything about a sequence that matters for i.i.d. probability calculations: if two sequences have the same type, they have the same probability under any i.i.d. source.

Definition: Type Class

The type class of a distribution $Q \in \mathcal{P}_n(\mathcal{X})$ is the set of all sequences of length $n$ having type $Q$:
$$T_Q = \{\mathbf{x} \in \mathcal{X}^n : \hat{P}_{\mathbf{x}} = Q\}.$$
The type class is also called the composition class or shell of $Q$.

Each sequence in $\mathcal{X}^n$ belongs to exactly one type class, so the type classes partition $\mathcal{X}^n$.
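The partition property can be checked by brute force for small $n$. A sketch for the binary alphabet with $n = 4$ (a type over $\{0,1\}$ is determined by the count of ones):

```python
from collections import Counter
from itertools import product

alphabet, n = "01", 4
classes = {}
for x in product(alphabet, repeat=n):
    k = Counter(x)["1"]          # the type of x, encoded as its number of ones
    classes.setdefault(k, []).append(x)

sizes = {k: len(v) for k, v in sorted(classes.items())}
print(sizes)  # -> {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
# Every sequence lies in exactly one type class: the sizes sum to 2^4 = 16.
assert sum(sizes.values()) == len(alphabet) ** n
```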

Definition: Set of Types

The set of all possible types with denominator $n$ on the alphabet $\mathcal{X}$ is
$$\mathcal{P}_n(\mathcal{X}) = \left\{Q \in \mathcal{P}(\mathcal{X}) : Q(a) = \frac{k_a}{n} \text{ for some } k_a \in \{0, 1, \ldots, n\}, \; \sum_{a} k_a = n \right\}.$$
This is a finite set — the number of types is polynomial in $n$.

Theorem: Polynomial Bound on the Number of Types

The number of types with denominator $n$ on an alphabet of size $|\mathcal{X}|$ satisfies
$$|\mathcal{P}_n(\mathcal{X})| \leq (n+1)^{|\mathcal{X}|}.$$

Each type is determined by choosing $|\mathcal{X}|$ non-negative integers that sum to $n$. The number of such compositions is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}$, which is polynomial in $n$ for fixed $|\mathcal{X}|$. The point is that the number of types grows only polynomially in $n$, while the number of sequences grows exponentially. This polynomial-vs-exponential gap is what makes the method of types so powerful.
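The composition count and the $(n+1)^{|\mathcal{X}|}$ bound are easy to tabulate (a minimal sketch; the helper name `num_types` is ours):

```python
from math import comb

def num_types(n, k):
    """Exact number of types with denominator n on an alphabet of size k:
    compositions of n into k non-negative parts."""
    return comb(n + k - 1, k - 1)

# Polynomial number of types vs. exponential number of sequences (k = 3).
for n in (4, 8, 16, 32):
    print(n, num_types(n, 3), (n + 1) ** 3, 3 ** n)
```

For every row, the exact count sits below the $(n+1)^3$ bound, while the number of sequences $3^n$ races ahead exponentially.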

Theorem: Size of the Type Class

For any type $Q \in \mathcal{P}_n(\mathcal{X})$:
$$(n+1)^{-|\mathcal{X}|} \cdot 2^{n H(Q)} \leq |T_Q| \leq 2^{n H(Q)},$$
where $H(Q) = -\sum_{a \in \mathcal{X}} Q(a) \log Q(a)$ is the entropy of the distribution $Q$.

The type class has approximately $2^{n H(Q)}$ elements — this is a precise version of the AEP. The entropy of the empirical distribution determines the exponential growth rate of the type class, up to a polynomial factor. Intuitively, sequences with "more uniform" types (higher entropy) come from larger type classes.
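The sandwich bound can be verified numerically: $|T_Q|$ is exactly a multinomial coefficient. A sketch with our own helper names, using the type $Q = (1/2, 1/4, 1/4)$ at $n = 12$:

```python
from math import factorial, log2

def type_class_size(counts):
    """|T_Q| = n! / prod_a k_a!  (multinomial coefficient)."""
    size = factorial(sum(counts))
    for k in counts:
        size //= factorial(k)
    return size

def entropy(counts):
    """H(Q) in bits for the type Q(a) = k_a / n."""
    n = sum(counts)
    return -sum(k / n * log2(k / n) for k in counts if k > 0)

counts = (6, 3, 3)                  # Q = (1/2, 1/4, 1/4), n = 12, |X| = 3
n, size, H = sum(counts), type_class_size(counts), entropy(counts)
print(size, 2 ** (n * H))           # 18480 vs. 2^18 = 262144
assert (n + 1) ** (-3) * 2 ** (n * H) <= size <= 2 ** (n * H)
```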

Theorem: Probability of a Type Class under an i.i.d. Source

If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. according to $P$ on $\mathcal{X}$, then for any type $Q \in \mathcal{P}_n(\mathcal{X})$:
$$P^n(T_Q) = \sum_{\mathbf{x} \in T_Q} P^n(\mathbf{x}) \doteq 2^{-n D(Q \| P)}.$$
More precisely:
$$(n+1)^{-|\mathcal{X}|} \cdot 2^{-n D(Q \| P)} \leq P^n(T_Q) \leq 2^{-n D(Q \| P)}.$$

The probability of seeing a type $Q$ when the true distribution is $P$ decays exponentially at rate $D(Q \| P)$. The farther $Q$ is from $P$ in KL divergence, the less likely we are to see that type. When $Q = P$, the KL divergence is zero, so $P^n(T_P)$ decays only polynomially — it is 1 to first order in the exponent, and $P$'s own type is the most likely one. This is the method of types' sharpened version of the law of large numbers.
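Again the bounds can be checked directly, since every sequence in $T_Q$ has the same probability $\prod_a P(a)^{k_a}$. A sketch (helper names are ours) for the type $Q = (1/4, 3/4)$ under a fair coin at $n = 8$:

```python
from math import factorial, log2

def prob_type_class(counts, P):
    """P^n(T_Q) = |T_Q| * prod_a P(a)^{k_a}."""
    size = factorial(sum(counts))
    for k in counts:
        size //= factorial(k)
    prob = size
    for k, p in zip(counts, P):
        prob *= p ** k
    return prob

def kl_divergence(counts, P):
    """D(Q || P) in bits for the type Q(a) = k_a / n."""
    n = sum(counts)
    return sum(k / n * log2((k / n) / p) for k, p in zip(counts, P) if k > 0)

counts, P = (2, 6), (0.5, 0.5)      # Q = (1/4, 3/4), fair coin, n = 8
n = sum(counts)
D = kl_divergence(counts, P)
pr = prob_type_class(counts, P)     # = C(8,2) / 2^8 = 28/256
assert (n + 1) ** (-2) * 2 ** (-n * D) <= pr <= 2 ** (-n * D)
```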

Example: Types of Binary Sequences

Let $\mathcal{X} = \{0, 1\}$ and $n = 4$. Enumerate all types, their type classes, and compute $P^4(T_Q)$ for each type when $P = \text{Bernoulli}(1/3)$.
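A sketch of the enumeration (we assume the parameterization $P(X = 1) = 1/3$; a type here is determined by its number of ones $k$):

```python
from math import comb

p, n = 1/3, 4            # assuming Bernoulli(1/3) means P(X = 1) = 1/3
probs = {}
for k in range(n + 1):   # k ones  =>  type Q = (1 - k/4, k/4)
    size = comb(n, k)    # |T_Q|
    probs[k] = size * p ** k * (1 - p) ** (n - k)
    print(f"Q = ({1 - k/n:.2f}, {k/n:.2f})  |T_Q| = {size}  P^4(T_Q) = {probs[k]:.4f}")

# The five type classes exhaust {0,1}^4, so the probabilities sum to 1.
assert abs(sum(probs.values()) - 1) < 1e-12
```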

Definition: Joint Type and Conditional Type

For a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$, the joint type is the empirical joint distribution:
$$\hat{P}_{\mathbf{x}, \mathbf{y}}(a, b) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i = a, y_i = b\}.$$
Given $\mathbf{x}$ with type $Q$, a conditional type $V : \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ is a stochastic matrix such that $\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) = Q(a) V(b \mid a)$. The conditional type class is
$$T_{V|\mathbf{x}} = \{\mathbf{y} \in \mathcal{Y}^n : \hat{P}_{\mathbf{x},\mathbf{y}}(a,b) = Q(a) V(b \mid a) \; \forall a, b\}.$$
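Computing a joint type is a one-liner over the paired symbols (a minimal sketch; `joint_type` is our name):

```python
from collections import Counter

def joint_type(x, y):
    """Empirical joint distribution of the paired sequence ((x_i, y_i))."""
    n = len(x)
    return {ab: c / n for ab, c in Counter(zip(x, y)).items()}

# Each of the pairs (0,0), (0,1), (1,0), (1,1) occurs exactly once here.
x, y = "0011", "0101"
print(joint_type(x, y))
```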

Theorem: Size of the Conditional Type Class

For any $\mathbf{x} \in T_Q$ and conditional type $V$:
$$(n+1)^{-|\mathcal{X}||\mathcal{Y}|} \cdot 2^{n H(V|Q)} \leq |T_{V|\mathbf{x}}| \leq 2^{n H(V|Q)},$$
where $H(V|Q) = -\sum_{a,b} Q(a) V(b \mid a) \log V(b \mid a)$ is the conditional entropy of $Y$ given $X$ under the joint distribution $Q \times V$.
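A brute-force check at $n = 4$ (our own toy example, not from the text): take $\mathbf{x} = 0011$ and the "fair" conditional type $V(1 \mid 0) = V(1 \mid 1) = 1/2$. Counting row by row, $|T_{V|\mathbf{x}}| = \binom{2}{1}\binom{2}{1} = 4$, safely inside the bounds $5^{-4} \cdot 16 \leq 4 \leq 2^{4 \cdot 1} = 16$:

```python
from itertools import product
from math import comb

x = "0011"   # type Q = (1/2, 1/2), n = 4
# Under this V, the joint type puts weight 1/4 on each pair: one '1'
# among the x=0 positions and one '1' among the x=1 positions.
target = {(a, b): 1 for a in "01" for b in "01"}

count = 0
for y in product("01", repeat=4):
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    count += joint == target

assert count == comb(2, 1) * comb(2, 1) == 4
```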

Quick Check

For a ternary alphabet $\mathcal{X} = \{a, b, c\}$ and $n = 6$, the number of distinct types $|\mathcal{P}_6(\{a,b,c\})|$ is at most:

18

28

343

729

Type (empirical distribution)

The relative frequency of each symbol in a sequence. For $\mathbf{x} \in \mathcal{X}^n$, $\hat{P}_{\mathbf{x}}(a) = N(a \mid \mathbf{x})/n$. Two sequences with the same type have the same probability under any i.i.d. source.

Related: Type Class

Type class

The set $T_Q$ of all sequences in $\mathcal{X}^n$ whose empirical distribution equals $Q$. Its size is approximately $2^{n H(Q)}$.

Related: Type (Empirical Distribution)

Exponential equality ($\doteq$)

Notation for equality to first order in the exponent: $a_n \doteq 2^{nb}$ means $\lim_{n \to \infty} \frac{1}{n}\log a_n = b$. Equivalently, the ratio $\log a_n / \log 2^{nb}$ converges to 1. This ignores polynomial prefactors.
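A quick numerical illustration (our own toy sequence): for $a_n = n^2 \cdot 2^{3n}$, the polynomial prefactor $n^2$ is invisible to $\doteq$, so $a_n \doteq 2^{3n}$:

```python
from math import log2

# a_n = n^2 * 2^{3n}:  (1/n) log2(a_n) = 3 + 2*log2(n)/n  ->  3
for n in (10, 100, 1000):
    a_n = n ** 2 * 2 ** (3 * n)
    print(n, log2(a_n) / n)
```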

Common Mistake: Types vs. Typical Sequences

Mistake:

Confusing the type class $T_Q$ with the typical set $\mathcal{T}_\epsilon^{(n)}$. A student might think "the typical set is just the type class of $P$."

Correction:

The typical set $\mathcal{T}_\epsilon^{(n)}$ is a union of type classes $T_Q$ for which $Q$ is close to $P$. For weakly typical sets, $\mathcal{T}_\epsilon^{(n)} = \bigcup_{Q : |H(Q) + D(Q \| P) - H(P)| \leq \epsilon} T_Q$, since $-\frac{1}{n}\log P^n(\mathbf{x}) = H(Q) + D(Q \| P)$ for every $\mathbf{x} \in T_Q$; for strongly typical sets, $\mathcal{T}_\epsilon^{(n)} = \bigcup_{Q : |Q(a) - P(a)| \leq \epsilon \; \forall a} T_Q$. The typical set is a coarse-grained object; individual type classes are the fine-grained building blocks.
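The union-of-type-classes view can be made concrete by brute force (a sketch for the strongly typical set; `strongly_typical` is our name):

```python
from collections import Counter
from itertools import product

def strongly_typical(n, P, eps, alphabet="01"):
    """Strongly typical set: union of the type classes T_Q with
    |Q(a) - P(a)| <= eps for every symbol a (brute force, small n only)."""
    T = []
    for x in product(alphabet, repeat=n):
        c = Counter(x)
        if all(abs(c[a] / n - P[a]) <= eps for a in alphabet):
            T.append(x)
    return T

P = {"0": 0.5, "1": 0.5}
# With n = 8, eps = 0.125, the admissible types have 3, 4, or 5 ones:
# |T| = C(8,3) + C(8,4) + C(8,5) = 56 + 70 + 56 = 182.
print(len(strongly_typical(8, P, 0.125)))  # -> 182
```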

Common Mistake: Ignoring Polynomial Factors Too Early

Mistake:

Using the $\doteq$ notation to conclude that $P^n(T_Q) = 2^{-nD(Q \| P)}$ exactly, then performing non-exponential operations (like summing over polynomially many types) without checking that the polynomial factors cancel.

Correction:

The $\doteq$ notation absorbs polynomial factors, which is fine for exponential-rate arguments. But when you sum over $(n+1)^{|\mathcal{X}|}$ types (a polynomial number), the polynomial factors contribute at most another polynomial factor, which is still absorbed. The danger arises when you try to extract exact constants (not just exponents) — there, the method of types gives only the exponential rate, and you need refined asymptotics for the prefactor.

Historical Note: Csiszár, Körner, and the Hungarian School

1970s–1980s

The method of types was systematically developed by Imre Csiszár and János Körner in their landmark 1981 monograph Information Theory: Coding Theorems for Discrete Memoryless Systems. While the basic counting arguments were known earlier (types appear implicitly in Shannon's 1948 paper), Csiszár and Körner elevated the method into a complete framework capable of proving essentially all known coding theorems for discrete memoryless sources and channels — often with tighter results than the probabilistic (typicality-based) approach. The Hungarian school's contribution was to recognize that the combinatorial structure of types, rather than the probabilistic structure of typical sets, is the more fundamental object. Their approach yields error exponents naturally, whereas typicality only gives achievability at the right rate.

Type Classes: Partitioning the Sequence Space

Animated walkthrough of how type classes partition $\mathcal{X}^n$ for binary sequences of length $n = 4$. Shows the exponential growth $|T_Q| \doteq 2^{nH(Q)}$ and how the number of types remains polynomial in $n$.

Key Takeaway

The method of types replaces probabilistic typicality arguments with exact combinatorial counting. The three fundamental facts are: (1) the number of types is polynomial in $n$, (2) each type class has $\doteq 2^{nH(Q)}$ elements, and (3) the probability of a type class under an i.i.d. source decays as $2^{-nD(Q \| P)}$. Together, these yield exponentially tight bounds on all quantities of interest.