The Asymptotic Equipartition Property (AEP)

The Law of Large Numbers for Information Theory

The law of large numbers tells us that the sample mean of i.i.d. random variables converges to the true mean. The asymptotic equipartition property (AEP) is the information-theoretic analogue: for an i.i.d. source, the per-symbol negative log-probability of a typical sequence converges to the entropy $H(X)$. In other words, among the $|\mathcal{X}|^n$ possible sequences, only about $2^{nH(X)}$ of them carry appreciable probability, and each of these "typical" sequences has probability approximately $2^{-nH(X)}$.

This concentration phenomenon is the engine behind both source coding (we only need $nH(X)$ bits to describe the typical sequences) and channel coding (the code needs to "pack" only $2^{nR}$ codewords into the typical output set of size $2^{nH(Y)}$). Understanding typicality is the single most important step toward understanding the coding theorems.

Theorem: The Weak AEP

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim P_X$ on a finite alphabet $\mathcal{X}$. Then:

$$-\frac{1}{n}\log P_{X^n}(X_1, \ldots, X_n) \xrightarrow{P} H(X).$$

That is, for any $\epsilon > 0$:

$$\Pr\!\left(\left|-\frac{1}{n}\log P_{X^n}(\mathbf{X}) - H(X)\right| > \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$

For i.i.d. sequences, $-\frac{1}{n}\log P(\mathbf{x}) = -\frac{1}{n}\sum_{i=1}^n \log p(x_i)$. This is the sample mean of the i.i.d. random variables $-\log p(X_i)$, each with mean $H(X) = \mathbb{E}[-\log p(X)]$. The weak law of large numbers directly gives convergence in probability.
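This convergence is easy to watch numerically. A minimal Python sketch (the Bernoulli(0.3) source matches the example later in this section; the helper names are ours): it draws many sequences of each length and shows the spread of $-\frac{1}{n}\log_2 P(\mathbf{x})$ shrinking around $H(X)$ as $n$ grows.

```python
import math
import random

def entropy_bits(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def neg_log_prob_rate(p, n, rng):
    """Draw one i.i.d. Bernoulli(p) sequence and return -(1/n) log2 P(x)."""
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total -= math.log2(p if x == 1 else 1 - p)
    return total / n

rng = random.Random(0)
p = 0.3
print(f"H(X) = {entropy_bits(p):.4f} bits")
for n in (10, 100, 1000):
    vals = [neg_log_prob_rate(p, n, rng) for _ in range(200)]
    spread = max(vals) - min(vals)
    print(f"n={n:5d}: spread of -(1/n) log2 P over 200 samples = {spread:.3f}")
```

The printed spread collapses toward zero, which is exactly the weak-law statement above.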

Asymptotic equipartition property (AEP)

The property that for i.i.d. sequences, $-\frac{1}{n}\log P(\mathbf{X}) \to H(X)$ in probability. Implies that typical sequences all have roughly equal probability $\approx 2^{-nH(X)}$.

Related: Typical set

Definition:

Weakly Typical Set

The weakly typical set $A_\epsilon^{(n)}$ with respect to $P_X$ is:

$$A_\epsilon^{(n)} = \left\{\mathbf{x} \in \mathcal{X}^n : \left|-\frac{1}{n}\log P_{X^n}(\mathbf{x}) - H(X)\right| \leq \epsilon\right\}.$$

Equivalently, $\mathbf{x} \in A_\epsilon^{(n)}$ iff $2^{-n(H(X)+\epsilon)} \leq P_{X^n}(\mathbf{x}) \leq 2^{-n(H(X)-\epsilon)}$.
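The two forms of the condition can be checked against each other by brute force for small $n$. A Python sketch (the choices $p = 0.3$, $\epsilon = 0.1$, $n = 10$ are illustrative):

```python
import itertools
import math

p, eps = 0.3, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_log_form(x):
    """|-(1/n) log2 P(x) - H| <= eps"""
    n = len(x)
    log_p = sum(math.log2(p if xi else 1 - p) for xi in x)
    return abs(-log_p / n - H) <= eps

def typical_prob_form(x):
    """2^{-n(H+eps)} <= P(x) <= 2^{-n(H-eps)}"""
    n = len(x)
    prob = math.prod(p if xi else 1 - p for xi in x)
    return 2 ** (-n * (H + eps)) <= prob <= 2 ** (-n * (H - eps))

n = 10
seqs = list(itertools.product((0, 1), repeat=n))
agree = sum(typical_log_form(x) == typical_prob_form(x) for x in seqs)
count = sum(typical_log_form(x) for x in seqs)
print(f"{agree}/{len(seqs)} sequences classified identically; {count} typical")
```

For these parameters only the sequences with exactly three ones are typical, so the typical set has $\binom{10}{3} = 120$ members out of $1024$.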

Theorem: Properties of the Typical Set

For any $\epsilon > 0$ and sufficiently large $n$:

  1. High probability: $\Pr(\mathbf{X} \in A_\epsilon^{(n)}) > 1 - \epsilon$.
  2. Upper bound on size: $|A_\epsilon^{(n)}| \leq 2^{n(H(X) + \epsilon)}$.
  3. Lower bound on size: $|A_\epsilon^{(n)}| \geq (1-\epsilon)\,2^{n(H(X) - \epsilon)}$.

Property 1: almost all probability mass concentrates on the typical set. Properties 2-3: the typical set contains about $2^{nH(X)}$ sequences. The total number of sequences is $|\mathcal{X}|^n = 2^{n\log|\mathcal{X}|}$. Since $H(X) \leq \log|\mathcal{X}|$, with strict inequality unless $P_X$ is uniform, the typical set is an exponentially small fraction of all sequences, yet it carries almost all the probability.
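For a Bernoulli source these three properties can be verified exactly, because every sequence with $k$ ones has the same probability; grouping by $k$ avoids enumerating all $2^n$ sequences. A sketch (the choice $p = 0.3$, $\epsilon = 0.05$ is illustrative; the probability mass is accumulated in the log domain to avoid floating-point underflow at large $n$):

```python
import math

p, eps = 0.3, 0.05
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(n):
    """Exact Pr(X^n typical) and |A_eps^(n)|, grouping sequences by number of ones."""
    prob_mass, size = 0.0, 0
    for k in range(n + 1):
        # every sequence with k ones has probability p^k (1-p)^(n-k)
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - H) <= eps:
            count = math.comb(n, k)
            size += count
            # count * p^k (1-p)^(n-k), computed via logs to avoid underflow
            prob_mass += 2.0 ** (math.log2(count) + k * math.log2(p)
                                 + (n - k) * math.log2(1 - p))
    return prob_mass, size

for n in (100, 500, 2000):
    mass, size = typical_set_stats(n)
    print(f"n={n:4d}: Pr(typical) = {mass:.5f}, log2|A| = {math.log2(size):.1f}, "
          f"log2(fraction of all sequences) = {math.log2(size) - n:.1f}")
```

As $n$ grows, the probability of the typical set climbs toward 1 while its fraction of all $2^n$ sequences shrinks exponentially, matching Properties 1-3.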

Example: AEP for a Binary Source

Let $X_i \sim \text{Bernoulli}(0.3)$ i.i.d. For $n = 100$, estimate the size of the typical set and compare with $2^n$.
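One way to work this example, as a short sketch (the numbers follow directly from the binary entropy formula):

```python
import math

p, n = 0.3, 100
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(f"H(X) = {H:.4f} bits")              # about 0.8813 bits
print(f"|typical set| ~ 2^{n * H:.1f}")    # about 2^88.1
print(f"all sequences:  2^{n}")            # 2^100
print(f"fraction ~ 2^{n * H - n:.1f}")     # about 2^-11.9
```

So the typical set holds roughly $2^{88.1}$ of the $2^{100}$ sequences: only about one in every $2^{11.9} \approx 3800$ sequences is typical, yet those sequences carry nearly all the probability.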

AEP: Probability Concentration on the Typical Set

Visualize how the probability mass concentrates on the typical set as the sequence length $n$ increases. The histogram shows the distribution of $-\frac{1}{n}\log P(\mathbf{X})$ for random i.i.d. sequences, converging to a spike at $H(X)$.

Parameters

- $P(X=1)$ for the Bernoulli source (default 0.3)
- Length $n$ of the i.i.d. sequences (default 100)

Historical Note: Shannon's Original Insight


Shannon recognized the AEP in his 1948 paper, calling it the "equipartition property." He observed that for long sequences from an ergodic source, the sequences divide naturally into two classes: a "typical" set of about $2^{nH}$ sequences that are all roughly equally likely, and an "atypical" set that has negligible total probability despite containing vastly more sequences. This dichotomy is the conceptual foundation of both lossless and lossy source coding.

The name "asymptotic equipartition property" was coined later, by analogy with the equipartition of energy in statistical mechanics. The connection is not merely nominal: both phenomena arise from concentration of measure in high dimensions.

Quick Check

For a source with entropy $H(X) = 2$ bits and alphabet size $|\mathcal{X}| = 8$, approximately what fraction of all length-$n$ sequences are typical?

$2^{-n}$

$1/2$

$2^{2n}/2^{8n}$

Close to $1$ for large $n$

Common Mistake: Most Sequences Are Atypical, But Typical Sequences Carry All the Probability

Mistake:

Confusing "fraction of sequences that are typical" (which goes to zero) with "probability that a random sequence is typical" (which goes to one).

Correction:

The typical set contains about $2^{nH}$ sequences out of $|\mathcal{X}|^n = 2^{n\log|\mathcal{X}|}$ total. Since $H < \log|\mathcal{X}|$ for non-uniform sources, the fraction $2^{nH}/2^{n\log|\mathcal{X}|} = 2^{-n(\log|\mathcal{X}| - H)} \to 0$. Yet $\Pr(\text{typical}) \to 1$. The resolution: typical sequences are individually far more probable than almost all atypical ones, which compensates for their smaller count.
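The exponent in that vanishing fraction is easy to tabulate. A small sketch (the helper name is ours; the first call uses the Quick Check numbers, the second the Bernoulli(0.3) source from the earlier example):

```python
import math

def typical_fraction_log2(H, alphabet_size, n):
    """log2 of (|typical set| / |all sequences|) ~ nH - n log2|X|."""
    return n * (H - math.log2(alphabet_size))

# H = 2 bits over an alphabet of size 8: fraction ~ 2^-n
print(typical_fraction_log2(2.0, 8, 100))     # -> -100.0
# Bernoulli(0.3): H ~ 0.8813 bits over a binary alphabet
print(typical_fraction_log2(0.8813, 2, 100))  # about -11.87
```

The fraction decays like $2^{-n(\log|\mathcal{X}| - H)}$, so the gap $\log|\mathcal{X}| - H$ sets the decay rate; it is zero only for a uniform source, for which every sequence is typical.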

Typical set

The set of sequences whose empirical log-probability is close to the entropy: $|\frac{1}{n}\log P(\mathbf{x}) + H(X)| \leq \epsilon$. Contains about $2^{nH(X)}$ sequences and captures almost all the probability mass for large $n$.

Related: Asymptotic equipartition property (AEP), Strong typicality

Typical Set Concentration as n Grows

Visualizes how the typical set concentrates as the sequence length $n$ increases. The typical set shrinks as a fraction of all sequences but carries nearly all the probability.