The Asymptotic Equipartition Property (AEP)
The Law of Large Numbers for Information Theory
The law of large numbers tells us that the sample mean of i.i.d. random variables converges to the true mean. The asymptotic equipartition property (AEP) is the information-theoretic analogue: for an i.i.d. source, the per-symbol log-probability of a typical sequence converges to the entropy . In other words, among the possible sequences, only about of them carry appreciable probability β and each of these "typical" sequences has probability approximately .
This concentration phenomenon is the engine behind both source coding (we only need bits to describe the typical sequences) and channel coding (the code needs to "pack" only codewords into the typical output set of size ). Understanding typicality is the single most important step toward understanding the coding theorems.
Theorem: The Weak AEP
Let be i.i.d. on a finite alphabet . Then:
That is, for any :
For i.i.d. sequences, . This is the sample mean of the i.i.d. random variables , each with mean . The weak law of large numbers directly gives convergence in probability.
Express as sample mean
Since :
where is the self-information.
Apply WLLN
The are i.i.d. with mean and finite variance (since is finite). By the weak law of large numbers:
Asymptotic equipartition property (AEP)
The property that for i.i.d. sequences, in probability. Implies that typical sequences all have roughly equal probability .
Related: Typical set
Definition: Weakly Typical Set
Weakly Typical Set
The weakly typical set with respect to is:
Equivalently, iff .
Theorem: Properties of the Typical Set
For any and sufficiently large :
- High probability: .
- Upper bound on size: .
- Lower bound on size: .
Property 1: almost all probability mass concentrates on the typical set. Properties 2-3: the typical set contains about sequences. The total number of sequences is . Since , the typical set is an exponentially small fraction of all sequences β yet it carries almost all the probability.
Property 1 (high probability)
Direct from the AEP: , so for large enough , .
Property 2 (upper bound)
.
Therefore .
Property 3 (lower bound)
.
Therefore .
Example: AEP for a Binary Source
Let i.i.d. For , estimate the size of the typical set and compare with .
Entropy
bits.
Typical set size
.
The total number of binary sequences: .
The typical set fraction: .
Only about of all binary sequences are typical β yet they carry almost all the probability. This is why we can compress the source from bit/symbol to bits/symbol.
AEP: Probability Concentration on the Typical Set
Visualize how the probability mass concentrates on the typical set as the sequence length increases. The histogram shows the distribution of for random i.i.d. sequences, converging to a spike at .
Parameters
P(X=1) for Bernoulli source
Length of i.i.d. sequences
Historical Note: Shannon's Original Insight
1948Shannon recognized the AEP in his 1948 paper, calling it the "equipartition property." He observed that for long sequences from an ergodic source, the sequences divide naturally into two classes: a "typical" set of about sequences that are all roughly equally likely, and an "atypical" set that has negligible total probability despite containing vastly more sequences. This dichotomy is the conceptual foundation of both lossless and lossy source coding.
The name "asymptotic equipartition property" was coined later, by analogy with the equipartition of energy in statistical mechanics. The connection is not merely nominal β both phenomena arise from concentration of measure in high dimensions.
Quick Check
For a source with entropy bits and alphabet size , approximately what fraction of all length- sequences are typical?
Close to for large
Typical set size . Total sequences: . Fraction , which shrinks exponentially.
Common Mistake: Most Sequences Are Atypical, But Typical Sequences Carry All the Probability
Mistake:
Confusing "fraction of sequences that are typical" (which goes to zero) with "probability that a random sequence is typical" (which goes to one).
Correction:
The typical set contains about sequences out of total. Since for non-uniform sources, the fraction . Yet . The resolution: typical sequences are individually more probable than atypical ones, which compensates for their smaller count.
Typical set
The set of sequences whose empirical log-probability is close to the entropy: . Contains about sequences and captures almost all the probability mass for large .
Related: Asymptotic equipartition property (AEP), Strong typicality