Prefix Codes and Kraft's Inequality
The Fundamental Question of Source Coding
Suppose we observe a discrete random variable $X$ and want to describe it using a binary string. How short can we make the description, on average? Shannon's answer, $H(X)$ bits, is one of the most elegant results in all of mathematics. But to make this precise, we need to define what we mean by a "code" and what properties it must have to be useful. The key requirement is instantaneous decodability: a decoder must be able to identify each codeword the moment it is received, without waiting for future symbols. This leads us to prefix codes, and Kraft's inequality tells us exactly which codeword length assignments are possible.
Definition: Source Code
Source Code
A source code $C$ for a random variable $X$ with alphabet $\mathcal{X}$ is a mapping $C: \mathcal{X} \to \mathcal{D}^*$ from source symbols to finite-length strings over a code alphabet $\mathcal{D}$ (typically $\mathcal{D} = \{0, 1\}$). The string $C(x)$ is called the codeword for $x$, and $\ell(x) = |C(x)|$ is its length.
Definition: Uniquely Decodable Code
Uniquely Decodable Code
A code $C$ is uniquely decodable if its extension $C^*$, defined by concatenation, $C^*(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n)$, is injective; that is, no two distinct source sequences produce the same binary string.
Unique decodability is the minimum requirement for a useful code. Without it, the decoder cannot recover the source sequence from the binary representation.
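Unique decodability can be tested algorithmically. The following is a minimal sketch of the Sardinas–Patterson test, a standard algorithm (not developed in this section) that decides unique decodability by tracking "dangling suffixes"; the function names are our own.

```python
def dangling(A, B):
    """Nonempty suffixes w such that a + w = b for some a in A, b in B."""
    return {b[len(a):] for a in A for b in B
            if b.startswith(a) and len(b) > len(a)}

def uniquely_decodable(codewords):
    """Sardinas-Patterson test: the code is uniquely decodable iff no set of
    dangling suffixes ever contains a codeword."""
    C = set(codewords)
    S = dangling(C, C)                  # S_1: suffixes from codeword pairs
    seen = set()
    while S and not (S & C):
        if frozenset(S) in seen:        # cycle without hitting a codeword
            return True
        seen.add(frozenset(S))
        S = dangling(C, S) | dangling(S, C)
    return not (S & C)

print(uniquely_decodable({"0", "10", "110", "111"}))   # True
print(uniquely_decodable({"0", "01", "011", "0111"}))  # True (comma code)
print(uniquely_decodable({"0", "1", "01"}))            # False: 01 = 0 then 1
```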
Definition: Prefix Code (Instantaneous Code)
Prefix Code (Instantaneous Code)
A code is a prefix code (also called an instantaneous code) if no codeword is a prefix of any other codeword. Formally: for all $x \neq x'$, $C(x)$ is not a prefix of $C(x')$.
Every prefix code is uniquely decodable, but not every uniquely decodable code is a prefix code. The advantage of prefix codes is that they can be decoded symbol by symbol without looking ahead: the decoder recognizes the end of each codeword immediately. This is essential for real-time applications.
Example: Prefix Code vs. Non-Prefix Code
Let $\mathcal{X} = \{a, b, c, d\}$. Consider two codes:
| Symbol | Code 1 | Code 2 |
|---|---|---|
| $a$ | 0 | 0 |
| $b$ | 10 | 01 |
| $c$ | 110 | 011 |
| $d$ | 111 | 0111 |
Determine which codes are prefix codes and which are uniquely decodable.
Code 1: Prefix code
No codeword is a prefix of another: $0$ is not a prefix of $10$, $110$, or $111$; $10$ is not a prefix of $110$ or $111$; and $110$ is not a prefix of $111$. This code is prefix-free and can be decoded instantaneously. For example, $0110111$ decodes as $0 \mid 110 \mid 111 = acd$; we can identify each codeword boundary while reading left to right.
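This decoding procedure is easy to implement. A sketch (function and variable names are ours): grow a buffer bit by bit and emit a symbol the moment the buffer matches a codeword.

```python
def decode_prefix(bits, codebook):
    """Instantaneous decoding: emit a symbol as soon as the buffer is a
    codeword. Correct only for prefix codes -- no longer codeword can
    extend a completed match."""
    inverse = {cw: sym for sym, cw in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "leftover bits do not form a complete codeword"
    return out

code1 = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode_prefix("0110111", code1))   # ['a', 'c', 'd']
```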
Code 2: Not a prefix code, but uniquely decodable
Here $0$ is a prefix of $01$, which is a prefix of $011$, which is a prefix of $0111$. So Code 2 is not a prefix code. However, it is uniquely decodable: the decoder waits until it sees a 0 after a run of 1s to determine the boundary. This code is called a comma code: the leading 0 in each codeword acts as a delimiter.
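By contrast, a decoder for Code 2 cannot commit at the end of a codeword; it must wait for the next 0, the delimiter that starts every codeword. A sketch hard-coding Code 2 for illustration (names ours):

```python
def decode_code2(bits):
    """Each Code 2 codeword is '0' followed by a run of 1s, so a boundary is
    known only when the NEXT 0 arrives -- one-bit lookahead, not instantaneous."""
    inverse = {"0": "a", "01": "b", "011": "c", "0111": "d"}
    words, buf = [], ""
    for b in bits:
        if b == "0" and buf:          # a new codeword starts; flush the old one
            words.append(inverse[buf])
            buf = ""
        buf += b
    words.append(inverse[buf])        # the last codeword ends with the stream
    return words

print(decode_code2("0010110111"))     # ['a', 'b', 'c', 'd']
```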
Theorem: Kraft's Inequality
For any prefix code over a $D$-ary alphabet with codeword lengths $\ell_1, \ell_2, \ldots, \ell_m$:
$$\sum_{i=1}^{m} D^{-\ell_i} \leq 1.$$
Conversely, if the lengths $\ell_1, \ldots, \ell_m$ satisfy this inequality, there exists a prefix code with these codeword lengths.
Think of the code as a $D$-ary tree. Each codeword is a leaf, and a codeword of length $\ell_i$ "uses up" a fraction $D^{-\ell_i}$ of the tree's capacity. The total capacity is 1 (the root), and the leaves must not overlap (prefix condition), so the fractions must sum to at most 1. The converse says that any length assignment respecting this budget can be realized as an actual prefix code: we can always arrange the codewords on the tree.
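The budget intuition is easy to check numerically; a small sketch (function name ours):

```python
def kraft_sum(lengths, D=2):
    """Capacity used on a D-ary tree by codewords of the given lengths."""
    return sum(D ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0    -> Code 1 fills the tree exactly
print(kraft_sum([1, 2, 3, 4]))   # 0.9375 -> Code 2's lengths leave slack
```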
Forward direction: prefix code implies Kraft
Let $\ell_{\max} = \max_i \ell_i$. Consider the complete $D$-ary tree of depth $\ell_{\max}$. Each codeword of length $\ell_i$ corresponds to a node at depth $\ell_i$. Since the code is prefix-free, the subtree rooted at this node (containing $D^{\ell_{\max} - \ell_i}$ leaves at depth $\ell_{\max}$) is disjoint from all other such subtrees. The total number of leaves in all subtrees is at most $D^{\ell_{\max}}$ (the total number of leaves in the complete tree): $\sum_i D^{\ell_{\max} - \ell_i} \leq D^{\ell_{\max}}$. Dividing by $D^{\ell_{\max}}$: $\sum_i D^{-\ell_i} \leq 1$.
Converse: Kraft implies prefix code existence
Sort the lengths: $\ell_1 \leq \ell_2 \leq \cdots \leq \ell_m$. Assign codewords greedily on the $D$-ary tree: for each $i$, pick the leftmost available node at depth $\ell_i$ and mark its entire subtree as used. The Kraft inequality guarantees that at each step a node is available: the capacity consumed before step $i$ is $\sum_{j < i} D^{-\ell_j} \leq 1 - D^{-\ell_i}$, leaving room for $D^{-\ell_i}$.
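In the binary case, this greedy assignment is the canonical code construction. A minimal sketch, assuming the input lengths satisfy Kraft's inequality (function name ours):

```python
def prefix_code_from_lengths(lengths):
    """Canonical construction: sort by length, take the leftmost free node
    at each depth. Assumes binary alphabet and lengths satisfying Kraft."""
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft's inequality violated"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    code = 0                      # integer value of the next free node
    prev = lengths[order[0]]
    for i in order:
        l = lengths[i]
        code <<= l - prev         # descend to depth l (leftmost descendant)
        codes[i] = format(code, f"0{l}b")
        code += 1                 # step to the next node at this depth
        prev = l
    return codes

print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

Running it on the lengths $(1, 2, 3, 3)$ reproduces Code 1 from the example above.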
McMillan's Extension
A remarkable fact: Kraft's inequality holds not only for prefix codes but for all uniquely decodable codes (McMillan, 1956). This means that restricting attention to prefix codes costs nothing in terms of codeword lengths: any achievable length vector for a uniquely decodable code is also achievable by a prefix code. We therefore lose nothing by working exclusively with prefix codes, which have the additional advantage of instantaneous decodability.
Theorem: Source Coding Converse
For any uniquely decodable binary code for a source with distribution $p$, the expected codeword length satisfies
$$L = \sum_{x} p(x)\,\ell(x) \geq H(X).$$
We cannot compress below entropy: any code that tries to use fewer than $H(X)$ bits per symbol on average will fail to be uniquely decodable. Entropy is not just a measure of randomness; it is the operational minimum description length.
Use KL divergence
Define $q_i = D^{-\ell_i}$ for a $D$-ary code with lengths $\ell_i$. By Kraft (which McMillan extends to all uniquely decodable codes), $c := \sum_i q_i \leq 1$. Then:
$$L - H_D(X) = \sum_i p_i \ell_i + \sum_i p_i \log_D p_i = \sum_i p_i \log_D \frac{p_i}{q_i}.$$
Note $\sum_i q_i \leq 1$, so $q$ is a sub-probability. Writing $r_i = q_i / c$ (a proper distribution):
$$L - H_D(X) = \sum_i p_i \log_D \frac{p_i}{r_i} + \log_D \frac{1}{c} = D(p \,\|\, r) + \log_D \frac{1}{c} \geq 0,$$
since both terms are non-negative ($D(p \,\|\, r) \geq 0$ and $\log_D(1/c) \geq 0$ because $c \leq 1$).
Equality condition
Equality holds iff $p = r$ (so $D(p \,\|\, r) = 0$) and $c = 1$ (Kraft with equality). This requires $p_i = D^{-\ell_i}$ for all $i$, which demands each $p_i$ be a negative integer power of $D$. In general, the ideal lengths $-\log_D p_i$ are not integers, so equality is typically not achievable.
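The decomposition $L - H(X) = D(p \,\|\, r) + \log_D(1/c)$ can be sanity-checked numerically; a sketch in base 2 with an arbitrary example distribution and lengths (names ours):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]          # arbitrary example distribution
lengths = [2, 2, 3, 4]            # satisfy Kraft: sum of 2^-l is 0.6875
q = [2 ** -l for l in lengths]    # sub-probability q_i = 2^{-l_i}
c = sum(q)
r = [qi / c for qi in q]          # normalized distribution r_i = q_i / c
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)
kl = sum(pi * math.log2(pi / ri) for pi, ri in zip(p, r))
print(abs((L - H) - (kl + math.log2(1 / c))) < 1e-12)   # True
```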
Theorem: Shannon's Source Coding Theorem (Symbol-by-Symbol)
For a DMS with distribution $p$ on alphabet $\mathcal{X}$, there exists a prefix code with expected codeword length
$$H(X) \leq L < H(X) + 1.$$
More generally, for block coding of $n$ symbols:
$$H(X) \leq \frac{L_n^*}{n} < H(X) + \frac{1}{n}.$$
Shannon's theorem says entropy is achievable: we can get within 1 bit of $H(X)$ with a symbol-by-symbol code, and within $1/n$ bits per symbol with a block code of length $n$. The gap shrinks to zero as $n \to \infty$, so in the limit, entropy is the exact compression rate.
Shannon code construction
Set $\ell_i = \lceil \log_2(1/p_i) \rceil$. Since $\ell_i \geq \log_2(1/p_i)$:
$$\sum_i 2^{-\ell_i} \leq \sum_i 2^{-\log_2(1/p_i)} = \sum_i p_i = 1,$$
so Kraft is satisfied, and a prefix code with these lengths exists.
Expected length bound
Each length satisfies $\ell_i < \log_2(1/p_i) + 1$, so
$$L = \sum_i p_i \ell_i < \sum_i p_i \log_2 \frac{1}{p_i} + 1 = H(X) + 1,$$
and $L \geq H(X)$ by the source coding converse. For the block result, apply the same construction to the super-symbol $\mathbf{X} = (X_1, \ldots, X_n)$: since $H(\mathbf{X}) = nH(X)$ for an i.i.d. source, we get $nH(X) \leq L_n^* < nH(X) + 1$, and dividing by $n$ gives the per-symbol bound.
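Putting the construction and the bound together; a sketch with an arbitrary example distribution (helper name ours):

```python
import math

def shannon_lengths(p):
    """Shannon code lengths l_i = ceil(log2(1/p_i)) for a binary code."""
    return [math.ceil(-math.log2(pi)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_lengths(p)
H = -sum(pi * math.log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))
print(lengths)                        # [2, 2, 3, 4]
print(sum(2 ** -l for l in lengths))  # 0.6875 <= 1, so Kraft holds
print(H <= L < H + 1)                 # True: H ~ 1.846, L = 2.4
```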
Binary Code Tree and Kraft's Inequality
[Interactive visualization: select codeword lengths to see the binary code tree, the Kraft sum, and whether the assignment is valid; the plot highlights available and occupied branches.]
Common Mistake: Uniquely Decodable Does Not Mean Instantaneous
Mistake:
Assuming that any uniquely decodable code can be decoded symbol-by-symbol in real time.
Correction:
Only prefix codes are instantaneously decodable. A uniquely decodable code that is not prefix-free may require the decoder to wait until the entire message is received before it can determine the first symbol. By McMillan's theorem, any uniquely decodable code has the same length vector as some prefix code, so we never need non-prefix codes in practice.
Prefix code
A code in which no codeword is a prefix of another. Equivalently, the codewords correspond to leaves of a binary tree. Prefix codes are instantaneously decodable: the decoder recognizes each codeword boundary without lookahead.
Related: Prefix Code (Instantaneous Code)
Kraft's inequality
The constraint $\sum_i D^{-\ell_i} \leq 1$ that the codeword lengths of any prefix code must satisfy. Equivalently, the necessary and sufficient condition on codeword lengths for a prefix code to exist.
Related: Kraft's Inequality
Historical Note: Kraft, McMillan, and the Tree Argument
1949–1956: Kraft's inequality was proved in Leon Kraft's 1949 MIT master's thesis, supervised by Robert Fano. Brockway McMillan independently proved in 1956 that the same inequality holds for all uniquely decodable codes, not just prefix codes, a surprising strengthening that eliminated any reason to consider non-prefix codes. The binary tree argument we used in the proof is due to Kraft; McMillan's proof uses generating functions and is more algebraic in flavor. Together, these results established prefix codes as the "right" class of codes for lossless compression.
Key Takeaway
Prefix codes are the right framework for lossless source coding: they are instantaneously decodable, and by McMillan's theorem, they achieve all codeword lengths achievable by any uniquely decodable code. Kraft's inequality ($\sum_i D^{-\ell_i} \leq 1$) is both necessary and sufficient for a prefix code to exist. The entropy $H(X)$ is a sharp lower bound on the expected codeword length, achievable within 1 bit by Shannon codes and to within $1/n$ bits per symbol by block codes of length $n$.