Prefix Codes and Kraft's Inequality

The Fundamental Question of Source Coding

Suppose we observe a discrete random variable $X$ and want to describe it using a binary string. How short can we make the description, on average? Shannon's answer, $H(X)$ bits, is one of the most elegant results in all of mathematics. But to make this precise, we need to define what we mean by a "code" and what properties it must have to be useful. The key requirement is instantaneous decodability: a decoder must be able to identify each codeword the moment it is received, without waiting for future symbols. This leads us to prefix codes, and Kraft's inequality tells us exactly which codeword length assignments are possible.

Definition: Source Code

A source code $c$ for a random variable $X$ with alphabet $\mathcal{X} = \{x_1, \ldots, x_m\}$ is a mapping $c : \mathcal{X} \to \mathcal{D}^*$ from source symbols to finite-length strings over a code alphabet $\mathcal{D}$ (typically $\mathcal{D} = \{0, 1\}$). The string $c(x)$ is called the codeword for $x$, and $\ell(x) = |c(x)|$ is its length.

Definition: Uniquely Decodable Code

A code $c$ is uniquely decodable if the extension $c^* : \mathcal{X}^* \to \mathcal{D}^*$ defined by concatenation, $c^*(x_1 x_2 \cdots x_n) = c(x_1) c(x_2) \cdots c(x_n)$, is injective; that is, no two distinct source sequences produce the same encoded string.

Unique decodability is the minimum requirement for a useful code. Without it, the decoder cannot recover the source sequence from the binary representation.

Definition: Prefix Code (Instantaneous Code)

A code $c$ is a prefix code (also called an instantaneous code) if no codeword is a prefix of any other codeword. Formally: for all $x \neq x'$, $c(x)$ is not a prefix of $c(x')$.

Every prefix code is uniquely decodable, but not every uniquely decodable code is a prefix code. The advantage of prefix codes is that they can be decoded symbol by symbol without looking ahead: the decoder recognizes the end of each codeword immediately. This is essential for real-time applications.
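Instantaneous decoding is easy to demonstrate. The sketch below assumes a hypothetical four-symbol prefix code (the same codewords as Code 1 in the example that follows) and emits each symbol the moment its codeword completes:

```python
# A minimal sketch of instantaneous decoding, assuming this hypothetical
# prefix-free code table (no codeword is a prefix of another).
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decode_table = {cw: sym for sym, cw in code.items()}

def decode(bits: str) -> str:
    """Scan left to right; emit a symbol the moment a codeword completes."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode_table:          # codeword boundary recognized instantly
            out.append(decode_table[buf])
            buf = ""
    if buf:
        raise ValueError("stream ends mid-codeword")
    return "".join(out)

print(decode("0101101110"))  # prints "abcda"
```

No lookahead is ever needed: because no codeword extends another, the first match in the buffer is always the right one.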

Example: Prefix Code vs. Non-Prefix Code

Let $\mathcal{X} = \{a, b, c, d\}$. Consider two codes:

Symbol  Code 1  Code 2
a       0       0
b       10      01
c       110     011
d       111     0111

Determine which codes are prefix codes and which are uniquely decodable.
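Both questions can be answered mechanically: a pairwise scan settles the prefix property, and the standard Sardinas-Patterson test settles unique decodability. A sketch in Python, applied to the two codes from the table:

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

def dangling(A, B):
    # Suffixes left over when a word of A is a proper prefix of a word of B.
    return {b[len(a):] for a in A for b in B if a != b and b.startswith(a)}

def is_uniquely_decodable(code):
    """Sardinas-Patterson test: UD iff no dangling suffix is itself a codeword."""
    C = set(code)
    S, seen = dangling(C, C), set()
    while S:
        if S & C:
            return False
        seen |= S
        S = (dangling(C, S) | dangling(S, C)) - seen
    return True

code1 = ["0", "10", "110", "111"]
code2 = ["0", "01", "011", "0111"]
print(is_prefix_free(code1), is_uniquely_decodable(code1))  # True True
print(is_prefix_free(code2), is_uniquely_decodable(code2))  # False True
```

Code 1 is a prefix code (hence uniquely decodable). Code 2 is not prefix-free, since 0 is a prefix of every other codeword, yet it is uniquely decodable: each 0 marks the start of a new codeword.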

Theorem: Kraft's Inequality

For any prefix code over a $D$-ary alphabet with codeword lengths $\ell_1, \ell_2, \ldots, \ell_m$:

$$\sum_{i=1}^{m} D^{-\ell_i} \leq 1.$$

Conversely, if the lengths $\ell_1, \ldots, \ell_m$ satisfy this inequality, there exists a prefix code with these codeword lengths.

Think of the code as a $D$-ary tree. Each codeword is a leaf, and a codeword of length $\ell$ "uses up" a fraction $D^{-\ell}$ of the tree's capacity. The total capacity is 1 (the root), and the leaves must not overlap (prefix condition), so the fractions must sum to at most 1. The converse says that any length assignment respecting this budget can be realized as an actual prefix code: we can always arrange the codewords on the tree.
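The converse direction can be made constructive: sort the lengths, then assign each codeword the leftmost free node at its depth, which never collides as long as the Kraft sum stays within budget. A sketch (the length profile 1, 2, 3, 3 matches Code 1 from the earlier example):

```python
def kraft_sum(lengths, D=2):
    """Kraft sum for a D-ary code with the given codeword lengths."""
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Greedy (canonical) construction: succeeds exactly when the Kraft sum <= 1."""
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft's inequality violated: no prefix code exists")
    codewords, val, prev = [], 0, 0
    for l in sorted(lengths):
        val <<= l - prev                 # descend to depth l in the binary tree
        codewords.append(format(val, f"0{l}b"))
        val += 1                         # step to the next free node at this depth
        prev = l
    return codewords

print(kraft_sum([1, 2, 3, 3]))                 # prints 1.0 (budget exactly used)
print(prefix_code_from_lengths([1, 2, 3, 3]))  # prints ['0', '10', '110', '111']
```

Because each new codeword is taken from the next unused node at its depth, no earlier codeword can be its prefix; the Kraft budget guarantees the tree never runs out of room.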

McMillan's Extension

A remarkable fact: Kraft's inequality holds not only for prefix codes but for all uniquely decodable codes (McMillan, 1956). This means that restricting attention to prefix codes costs nothing in terms of codeword lengths: any achievable length vector for a uniquely decodable code is also achievable by a prefix code. We therefore lose nothing by working exclusively with prefix codes, which have the additional advantage of instantaneous decodability.

Theorem: Source Coding Converse

For any uniquely decodable binary code for a source $X$ with distribution $P$, the expected codeword length satisfies $L = \mathbb{E}[\ell(X)] \geq H(X)$.

We cannot compress below entropy: any code that tries to use fewer than $H(X)$ bits per symbol on average will fail to be uniquely decodable. Entropy is not just a measure of randomness; it is the operational minimum description length.
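A quick numerical check of the bound, assuming a hypothetical dyadic source $P = (1/2, 1/4, 1/8, 1/8)$ paired with codeword lengths $(1, 2, 3, 3)$; because every probability is a power of two, the bound holds with equality:

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]      # hypothetical dyadic distribution
lengths = [1, 2, 3, 3]             # lengths of a matching prefix code

H = entropy(p)
L = sum(pi * li for pi, li in zip(p, lengths))
print(H, L)                        # both 1.75: L >= H, with equality here
```

Equality occurs precisely because $\ell_i = \log_2(1/p_i)$ exactly for each symbol; for non-dyadic distributions, $L$ strictly exceeds $H(X)$.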

Theorem: Shannon's Source Coding Theorem (Symbol-by-Symbol)

For a DMS with distribution $P$ on alphabet $\mathcal{X}$, there exists a prefix code with expected codeword length

$$H(X) \leq L^* < H(X) + 1.$$

More generally, for block coding of $n$ symbols:

$$H(X) \leq \frac{1}{n} L_n^* < H(X) + \frac{1}{n}.$$

Shannon's theorem says entropy is achievable: we can get within 1 bit of $H(X)$ with a symbol-by-symbol code, and within $1/n$ bits per symbol with a block code of length $n$. The gap shrinks to zero as $n \to \infty$, so in the limit, entropy is the exact compression rate.
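The upper bound comes from Shannon's explicit length choice $\ell_i = \lceil \log_2(1/p_i) \rceil$, which always satisfies Kraft's inequality and overshoots each ideal length $\log_2(1/p_i)$ by less than one bit. A sketch with a hypothetical distribution:

```python
from math import ceil, log2

def shannon_lengths(p):
    """Shannon's length choice: l_i = ceil(log2(1/p_i))."""
    return [ceil(-log2(pi)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]                     # hypothetical source distribution
lengths = shannon_lengths(p)
H = -sum(pi * log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))

assert sum(2.0 ** -l for l in lengths) <= 1  # Kraft holds: a prefix code exists
assert H <= L < H + 1                        # within one bit of entropy
print(lengths, round(L, 2))                  # prints [2, 2, 3, 4] 2.4
```

Averaging $\ell_i < \log_2(1/p_i) + 1$ over $P$ gives $L < H(X) + 1$ directly; applying the same construction to blocks of $n$ symbols spreads the one-bit overhead across $n$ source letters.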

Binary Code Tree and Kraft's Inequality

Visualize a binary code tree. Select codeword lengths and see the tree structure, the Kraft sum, and whether the assignment is valid. The plot highlights available and occupied branches.

(Interactive demo; default codeword lengths: 1, 2, 3, 3.)

Common Mistake: Uniquely Decodable Does Not Mean Instantaneous

Mistake:

Assuming that any uniquely decodable code can be decoded symbol-by-symbol in real time.

Correction:

Only prefix codes are instantaneously decodable. A uniquely decodable code that is not prefix-free may require the decoder to wait until the entire message is received before it can determine the first symbol. By McMillan's theorem, any uniquely decodable code has the same length vector as some prefix code, so we never need non-prefix codes in practice.
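The delay can even be unbounded. Consider the hypothetical uniquely decodable, non-prefix code a → 1, b → 10, c → 00: after reading the initial 1, the decoder must count every trailing zero before it knows whether the first symbol was a or b.

```python
# Hypothetical uniquely decodable, non-prefix code: a->"1", b->"10", c->"00".
# The stream "1" followed by 2k zeros parses as a c^k   ("1"  + "00"*k);
# "1" followed by 2k+1 zeros parses as       b c^k   ("10" + "00"*k).
def parse(bits: str) -> str:
    """Decode a stream of the form '1' + '0'*n under this particular code."""
    n_zeros = len(bits) - 1
    first = "a" if n_zeros % 2 == 0 else "b"
    return first + "c" * (n_zeros // 2)

print(parse("1" + "0" * 6))  # prints "accc": first symbol is 'a'
print(parse("1" + "0" * 7))  # prints "bccc": one more bit flips it to 'b'
```

Decoding is unambiguous, but the first symbol cannot be emitted until the stream ends, so the code is not instantaneous.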

Prefix code

A code in which no codeword is a prefix of another. Equivalently, the codewords correspond to leaves of a binary tree. Prefix codes are instantaneously decodable: the decoder recognizes each codeword boundary without lookahead.

Related: Prefix Code (Instantaneous Code)

Kraft's inequality

The constraint $\sum_i 2^{-\ell_i} \leq 1$ that any binary prefix code must satisfy. Equivalently, the necessary and sufficient condition on codeword lengths for a prefix code to exist.

Related: Kraft's Inequality

Historical Note: Kraft, McMillan, and the Tree Argument

1949–1956

Kraft's inequality was proved in Leon Kraft's 1949 MIT master's thesis, supervised by Robert Fano. Brockway McMillan independently proved in 1956 that the same inequality holds for all uniquely decodable codes, not just prefix codes: a surprising strengthening that eliminated any reason to consider non-prefix codes. The tree argument sketched above is due to Kraft; McMillan's proof uses generating functions and is more algebraic in flavor. Together, these results established prefix codes as the "right" class of codes for lossless compression.

Key Takeaway

Prefix codes are the right framework for lossless source coding: they are instantaneously decodable, and by McMillan's theorem, they achieve all codeword lengths achievable by any uniquely decodable code. Kraft's inequality ($\sum_i 2^{-\ell_i} \leq 1$) is both necessary and sufficient for a prefix code to exist. The entropy $H(X)$ is a sharp lower bound on the expected codeword length, achievable to within 1 bit by Shannon codes and to within $1/n$ by block codes of length $n$.