Types, Type Classes, and Their Properties

Why the Method of Types?

In Chapter 3 we used typicality to prove Shannon's source and channel coding theorems. The proofs were elegant, but the bounds we obtained were asymptotic — they told us what happens as the blocklength $n \to \infty$, but said nothing about how fast things converge. The method of types is a more refined combinatorial tool that gives us exponential control over probabilities. The key insight is deceptively simple: instead of asking "is this sequence typical?", we ask "what is the exact empirical distribution of this sequence?" This single shift in perspective unlocks error exponents — the rate at which error probabilities decay to zero.

Definition: Type (Empirical Distribution)

Let $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ be a sequence of length $n$ over a finite alphabet $\mathcal{X}$. The type (or empirical distribution) of $\mathbf{x}$ is the probability distribution $\hat{P}_{\mathbf{x}}$ on $\mathcal{X}$ defined by
$$\hat{P}_{\mathbf{x}}(a) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i = a\}, \quad a \in \mathcal{X}.$$
We write $\hat{P}_{\mathbf{x}}(a) = N(a \mid \mathbf{x})/n$, where $N(a \mid \mathbf{x})$ is the number of occurrences of the symbol $a$ in $\mathbf{x}$.
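The definition translates directly into code. Here is a minimal Python sketch (the function name `empirical_type` is ours, not from the text):

```python
from collections import Counter

def empirical_type(x, alphabet):
    """Type (empirical distribution) of x: P_hat(a) = N(a|x)/n."""
    counts = Counter(x)
    n = len(x)
    return {a: counts.get(a, 0) / n for a in alphabet}

# 'aabac' has N(a)=3, N(b)=1, N(c)=1 with n=5, so its type is (0.6, 0.2, 0.2).
print(empirical_type("aabac", "abc"))  # -> {'a': 0.6, 'b': 0.2, 'c': 0.2}
```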

The type captures everything about a sequence that matters for i.i.d. probability calculations: if two sequences have the same type, they have the same probability under any i.i.d. source.

Definition: Type Class

The type class of a distribution $Q \in \mathcal{P}_n(\mathcal{X})$ is the set of all sequences of length $n$ having type $Q$:
$$T_Q = \{\mathbf{x} \in \mathcal{X}^n : \hat{P}_{\mathbf{x}} = Q\}.$$
The type class is also called the composition class or shell of $Q$.

Each sequence in $\mathcal{X}^n$ belongs to exactly one type class, so the type classes partition $\mathcal{X}^n$.
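The partition property can be checked by brute force for small $n$. A sketch for the binary alphabet with $n = 4$ (a type over $\{0,1\}$ is determined by the count of ones):

```python
from collections import Counter
from itertools import product

alphabet, n = "01", 4
classes = {}
for x in product(alphabet, repeat=n):
    k = Counter(x)["1"]          # the type of x, encoded as its number of ones
    classes.setdefault(k, []).append(x)

sizes = {k: len(v) for k, v in sorted(classes.items())}
print(sizes)  # -> {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
# Every sequence lies in exactly one type class: the sizes sum to 2^4 = 16.
assert sum(sizes.values()) == len(alphabet) ** n
```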

Definition: Set of Types

The set of all possible types with denominator $n$ on the alphabet $\mathcal{X}$ is
$$\mathcal{P}_n(\mathcal{X}) = \left\{Q \in \mathcal{P}(\mathcal{X}) : Q(a) = \frac{k_a}{n} \text{ for some } k_a \in \{0, 1, \ldots, n\}, \; \sum_{a} k_a = n \right\}.$$
This is a finite set — the number of types is polynomial in $n$.

Theorem: Polynomial Bound on the Number of Types

The number of types with denominator $n$ on an alphabet of size $|\mathcal{X}|$ satisfies
$$|\mathcal{P}_n(\mathcal{X})| \leq (n+1)^{|\mathcal{X}|}.$$

Each type is determined by choosing $|\mathcal{X}|$ non-negative integers that sum to $n$. The number of such compositions is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}$, which is polynomial in $n$ for fixed $|\mathcal{X}|$. The point is that the number of types grows only polynomially in $n$, while the number of sequences grows exponentially. This polynomial-vs-exponential gap is what makes the method of types so powerful.
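The composition count and the $(n+1)^{|\mathcal{X}|}$ bound are easy to tabulate (a minimal sketch; the helper name `num_types` is ours):

```python
from math import comb

def num_types(n, k):
    """Exact number of types with denominator n on an alphabet of size k:
    compositions of n into k non-negative parts."""
    return comb(n + k - 1, k - 1)

# Polynomial number of types vs. exponential number of sequences (k = 3).
for n in (4, 8, 16, 32):
    print(n, num_types(n, 3), (n + 1) ** 3, 3 ** n)
```

For every row, the exact count sits below the $(n+1)^3$ bound, while the number of sequences $3^n$ races ahead exponentially.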

Theorem: Size of the Type Class

For any type $Q \in \mathcal{P}_n(\mathcal{X})$:
$$(n+1)^{-|\mathcal{X}|} \cdot 2^{n H(Q)} \leq |T_Q| \leq 2^{n H(Q)},$$
where $H(Q) = -\sum_{a \in \mathcal{X}} Q(a) \log Q(a)$ is the entropy of the distribution $Q$.

The type class has approximately $2^{n H(Q)}$ elements — this is a precise version of the AEP. The entropy of the empirical distribution determines the exponential growth rate of the type class, up to a polynomial factor. Intuitively, sequences with "more uniform" types (higher entropy) come from larger type classes.
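The sandwich bound can be verified numerically: $|T_Q|$ is exactly a multinomial coefficient. A sketch with our own helper names, using the type $Q = (1/2, 1/4, 1/4)$ at $n = 12$:

```python
from math import factorial, log2

def type_class_size(counts):
    """|T_Q| = n! / prod_a k_a!  (multinomial coefficient)."""
    size = factorial(sum(counts))
    for k in counts:
        size //= factorial(k)
    return size

def entropy(counts):
    """H(Q) in bits for the type Q(a) = k_a / n."""
    n = sum(counts)
    return -sum(k / n * log2(k / n) for k in counts if k > 0)

counts = (6, 3, 3)                  # Q = (1/2, 1/4, 1/4), n = 12, |X| = 3
n, size, H = sum(counts), type_class_size(counts), entropy(counts)
print(size, 2 ** (n * H))           # 18480 vs. 2^18 = 262144
assert (n + 1) ** (-3) * 2 ** (n * H) <= size <= 2 ** (n * H)
```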

Theorem: Probability of a Type Class under an i.i.d. Source

If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. according to $P$ on $\mathcal{X}$, then for any type $Q \in \mathcal{P}_n(\mathcal{X})$:
$$P^n(T_Q) = \sum_{\mathbf{x} \in T_Q} P^n(\mathbf{x}) \doteq 2^{-n D(Q \| P)}.$$
More precisely:
$$(n+1)^{-|\mathcal{X}|} \cdot 2^{-n D(Q \| P)} \leq P^n(T_Q) \leq 2^{-n D(Q \| P)}.$$

The probability of seeing a type $Q$ when the true distribution is $P$ decays exponentially at rate $D(Q \| P)$. The farther $Q$ is from $P$ in KL divergence, the less likely we are to see that type. When $Q = P$, the KL divergence is zero, so $P^n(T_P)$ decays only polynomially — it is 1 to first order in the exponent, and $P$'s own type is the most likely one. This is the method of types' sharpened version of the law of large numbers.
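Again the bounds can be checked directly, since every sequence in $T_Q$ has the same probability $\prod_a P(a)^{k_a}$. A sketch (helper names are ours) for the type $Q = (1/4, 3/4)$ under a fair coin at $n = 8$:

```python
from math import factorial, log2

def prob_type_class(counts, P):
    """P^n(T_Q) = |T_Q| * prod_a P(a)^{k_a}."""
    size = factorial(sum(counts))
    for k in counts:
        size //= factorial(k)
    prob = size
    for k, p in zip(counts, P):
        prob *= p ** k
    return prob

def kl_divergence(counts, P):
    """D(Q || P) in bits for the type Q(a) = k_a / n."""
    n = sum(counts)
    return sum(k / n * log2((k / n) / p) for k, p in zip(counts, P) if k > 0)

counts, P = (2, 6), (0.5, 0.5)      # Q = (1/4, 3/4), fair coin, n = 8
n = sum(counts)
D = kl_divergence(counts, P)
pr = prob_type_class(counts, P)     # = C(8,2) / 2^8 = 28/256
assert (n + 1) ** (-2) * 2 ** (-n * D) <= pr <= 2 ** (-n * D)
```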

Example: Types of Binary Sequences

Let $\mathcal{X} = \{0, 1\}$ and $n = 4$. Enumerate all types, their type classes, and compute $P^4(T_Q)$ for each type when $P = \text{Bernoulli}(1/3)$.
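A sketch of the enumeration (we assume the parameterization $P(X = 1) = 1/3$; a type here is determined by its number of ones $k$):

```python
from math import comb

p, n = 1/3, 4            # assuming Bernoulli(1/3) means P(X = 1) = 1/3
probs = {}
for k in range(n + 1):   # k ones  =>  type Q = (1 - k/4, k/4)
    size = comb(n, k)    # |T_Q|
    probs[k] = size * p ** k * (1 - p) ** (n - k)
    print(f"Q = ({1 - k/n:.2f}, {k/n:.2f})  |T_Q| = {size}  P^4(T_Q) = {probs[k]:.4f}")

# The five type classes exhaust {0,1}^4, so the probabilities sum to 1.
assert abs(sum(probs.values()) - 1) < 1e-12
```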

Definition: Joint Type and Conditional Type

For a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$, the joint type is the empirical joint distribution:
$$\hat{P}_{\mathbf{x}, \mathbf{y}}(a, b) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i = a, y_i = b\}.$$
Given $\mathbf{x}$ with type $Q$, a conditional type $V : \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ is a stochastic matrix such that $\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) = Q(a) V(b \mid a)$. The conditional type class is
$$T_{V|\mathbf{x}} = \{\mathbf{y} \in \mathcal{Y}^n : \hat{P}_{\mathbf{x},\mathbf{y}}(a,b) = Q(a) V(b \mid a) \; \forall a, b\}.$$
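Computing a joint type is a one-liner over the paired symbols (a minimal sketch; `joint_type` is our name):

```python
from collections import Counter

def joint_type(x, y):
    """Empirical joint distribution of the paired sequence ((x_i, y_i))."""
    n = len(x)
    return {ab: c / n for ab, c in Counter(zip(x, y)).items()}

# Each of the pairs (0,0), (0,1), (1,0), (1,1) occurs exactly once here.
x, y = "0011", "0101"
print(joint_type(x, y))
```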

Theorem: Size of the Conditional Type Class

For any $\mathbf{x} \in T_Q$ and conditional type $V$:
$$(n+1)^{-|\mathcal{X}||\mathcal{Y}|} \cdot 2^{n H(V|Q)} \leq |T_{V|\mathbf{x}}| \leq 2^{n H(V|Q)},$$
where $H(V|Q) = -\sum_{a,b} Q(a) V(b \mid a) \log V(b \mid a)$ is the conditional entropy of $Y$ given $X$ under the joint distribution $Q \times V$.
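A brute-force check at $n = 4$ (our own toy example, not from the text): take $\mathbf{x} = 0011$ and the "fair" conditional type $V(1 \mid 0) = V(1 \mid 1) = 1/2$. Counting row by row, $|T_{V|\mathbf{x}}| = \binom{2}{1}\binom{2}{1} = 4$, safely inside the bounds $5^{-4} \cdot 16 \leq 4 \leq 2^{4 \cdot 1} = 16$:

```python
from itertools import product
from math import comb

x = "0011"   # type Q = (1/2, 1/2), n = 4
# Under this V, the joint type puts weight 1/4 on each pair: one '1'
# among the x=0 positions and one '1' among the x=1 positions.
target = {(a, b): 1 for a in "01" for b in "01"}

count = 0
for y in product("01", repeat=4):
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    count += joint == target

assert count == comb(2, 1) * comb(2, 1) == 4
```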

Quick Check

For a ternary alphabet $\mathcal{X} = \{a, b, c\}$ and $n = 6$, the number of distinct types $|\mathcal{P}_6(\{a,b,c\})|$ is at most:

18

28

343

729

Type (empirical distribution)

The relative frequency of each symbol in a sequence. For $\mathbf{x} \in \mathcal{X}^n$, $\hat{P}_{\mathbf{x}}(a) = N(a \mid \mathbf{x})/n$. Two sequences with the same type have the same probability under any i.i.d. source.

Related: Type Class

Type class

The set $T_Q$ of all sequences in $\mathcal{X}^n$ whose empirical distribution equals $Q$. Its size is approximately $2^{n H(Q)}$.

Related: Type (Empirical Distribution)

Exponential equality ($\doteq$)

Notation for equality to first order in the exponent: $a_n \doteq 2^{nb}$ means $\lim_{n \to \infty} \frac{1}{n}\log a_n = b$. Equivalently, the ratio $\log a_n / \log 2^{nb}$ converges to 1. This ignores polynomial prefactors.
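A quick numerical illustration (our own toy sequence): for $a_n = n^2 \cdot 2^{3n}$, the polynomial prefactor $n^2$ is invisible to $\doteq$, so $a_n \doteq 2^{3n}$:

```python
from math import log2

# a_n = n^2 * 2^{3n}:  (1/n) log2(a_n) = 3 + 2*log2(n)/n  ->  3
for n in (10, 100, 1000):
    a_n = n ** 2 * 2 ** (3 * n)
    print(n, log2(a_n) / n)
```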

Common Mistake: Types vs. Typical Sequences

Mistake:

Confusing the type class $T_Q$ with the typical set $\mathcal{T}_\epsilon^{(n)}$. A student might think "the typical set is just the type class of $P$."

Correction:

The typical set $\mathcal{T}_\epsilon^{(n)}$ is a union of type classes $T_Q$ for which $Q$ is close to $P$. For weakly typical sets, $\mathcal{T}_\epsilon^{(n)} = \bigcup_{Q : |H(Q) + D(Q \| P) - H(P)| \leq \epsilon} T_Q$, since $-\frac{1}{n}\log P^n(\mathbf{x}) = H(Q) + D(Q \| P)$ for every $\mathbf{x} \in T_Q$; for strongly typical sets, $\mathcal{T}_\epsilon^{(n)} = \bigcup_{Q : |Q(a) - P(a)| \leq \epsilon \; \forall a} T_Q$. The typical set is a coarse-grained object; individual type classes are the fine-grained building blocks.
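The union-of-type-classes view can be made concrete by brute force (a sketch for the strongly typical set; `strongly_typical` is our name):

```python
from collections import Counter
from itertools import product

def strongly_typical(n, P, eps, alphabet="01"):
    """Strongly typical set: union of the type classes T_Q with
    |Q(a) - P(a)| <= eps for every symbol a (brute force, small n only)."""
    T = []
    for x in product(alphabet, repeat=n):
        c = Counter(x)
        if all(abs(c[a] / n - P[a]) <= eps for a in alphabet):
            T.append(x)
    return T

P = {"0": 0.5, "1": 0.5}
# With n = 8, eps = 0.125, the admissible types have 3, 4, or 5 ones:
# |T| = C(8,3) + C(8,4) + C(8,5) = 56 + 70 + 56 = 182.
print(len(strongly_typical(8, P, 0.125)))  # -> 182
```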

Common Mistake: Ignoring Polynomial Factors Too Early

Mistake:

Using the $\doteq$ notation to conclude that $P^n(T_Q) = 2^{-nD(Q \| P)}$ exactly, then performing non-exponential operations (like summing over polynomially many types) without checking that the polynomial factors cancel.

Correction:

The $\doteq$ notation absorbs polynomial factors, which is fine for exponential-rate arguments. But when you sum over $(n+1)^{|\mathcal{X}|}$ types (a polynomial number), the polynomial factors contribute at most another polynomial factor, which is still absorbed. The danger arises when you try to extract exact constants (not just exponents) — there, the method of types gives only the exponential rate, and you need refined asymptotics for the prefactor.

Historical Note: Csiszár, Körner, and the Hungarian School

1970s–1980s

The method of types was systematically developed by Imre Csiszár and János Körner in their landmark 1981 monograph Information Theory: Coding Theorems for Discrete Memoryless Systems. While the basic counting arguments were known earlier (types appear implicitly in Shannon's 1948 paper), Csiszár and Körner elevated the method into a complete framework capable of proving essentially all known coding theorems for discrete memoryless sources and channels — often with tighter results than the probabilistic (typicality-based) approach. The Hungarian school's contribution was to recognize that the combinatorial structure of types, rather than the probabilistic structure of typical sets, is the more fundamental object. Their approach yields error exponents naturally, whereas typicality only gives achievability at the right rate.

Type Classes: Partitioning the Sequence Space

Animated walkthrough of how type classes partition $\mathcal{X}^n$ for binary sequences of length $n = 4$. Shows the exponential growth $|T_Q| \doteq 2^{nH(Q)}$ and how the number of types remains polynomial in $n$.

Key Takeaway

The method of types replaces probabilistic typicality arguments with exact combinatorial counting. The three fundamental facts are: (1) the number of types is polynomial in $n$, (2) each type class has $\doteq 2^{nH(Q)}$ elements, and (3) the probability of a type class under an i.i.d. source decays as $2^{-nD(Q \| P)}$. Together, these yield exponentially tight bounds on all quantities of interest.