Network Coding: The Butterfly Network and Beyond

Why Routing Is Not Enough

For unicast (one source, one destination), we proved that routing — simply forwarding packets along paths — achieves the max-flow. But what about multicast, where one source must deliver the same information to multiple destinations? It turns out that routing alone cannot always achieve the max-flow min-cut bound for multicast. The breakthrough insight of Ahlswede, Cai, Li, and Yeung (2000) is that coding at intermediate nodes — combining incoming packets before forwarding — can achieve the max-flow bound for any multicast problem.

This is network coding, and it is one of the most elegant results in information theory's intersection with combinatorics and algebra.

Example: The Butterfly Network

Consider the butterfly network: source $s$ wants to multicast two bits $a, b$ to two sinks $t_1, t_2$. The network has edges (all capacity 1): $s \to u$, $s \to v$, $u \to w$, $v \to w$, $w \to x$, $u \to t_1$, $x \to t_1$, $v \to t_2$, $x \to t_2$.

(a) Show that routing alone can deliver at most rate 1 to both sinks simultaneously.

(b) Show that network coding achieves rate 2 (both bits) to both sinks.
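A minimal simulation makes part (b) concrete. The Python sketch below (variable names mirror the edge labels above) pushes the two bits through the butterfly with a single XOR at node $w$ and checks that both sinks recover both bits:

```python
from itertools import product

def butterfly_with_xor(a: int, b: int) -> bool:
    """Send bits a, b through the butterfly with XOR coding at node w."""
    y_su, y_sv = a, b            # source sends a on s->u, b on s->v
    y_uw, y_ut1 = y_su, y_su     # u copies its input onto u->w and u->t1
    y_vw, y_vt2 = y_sv, y_sv     # v copies its input onto v->w and v->t2
    y_wx = y_uw ^ y_vw           # the single XOR: w->x carries a XOR b
    y_xt1, y_xt2 = y_wx, y_wx    # x copies the coded bit to both sinks

    # Sink t1 receives (a, a^b) and recovers b = a ^ (a^b).
    t1_a, t1_b = y_ut1, y_ut1 ^ y_xt1
    # Sink t2 receives (b, a^b) and recovers a = b ^ (a^b).
    t2_b, t2_a = y_vt2, y_vt2 ^ y_xt2
    return (t1_a, t1_b) == (a, b) and (t2_a, t2_b) == (a, b)

# Both sinks decode both bits for every input pair: rate 2 is achieved.
assert all(butterfly_with_xor(a, b) for a, b in product([0, 1], repeat=2))
print("XOR coding delivers both bits to both sinks.")
```

For part (a), the key observation is the bottleneck edge $w \to x$: with plain forwarding it can carry a copy of only one of the two bits in any single use of the network, so one sink necessarily receives the same bit twice.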

The XOR Trick: Simple but Profound

The butterfly network example is deceptively simple — a single XOR at one node doubles the multicast rate. But the principle is deep: by combining information at intermediate nodes, we create "coded" packets that are simultaneously useful to multiple receivers. Each receiver, having different side information, can decode the original data from the coded packet.

This is the same principle that appears in coded caching (Chapter 27) and index coding: coded multicasting serves multiple users simultaneously because each user's "wanted" data is mixed with data that serves as side information for the other users.

Theorem: Network Coding Theorem (Ahlswede–Cai–Li–Yeung, 2000)

For a directed acyclic graph $G$ with a single source $s$ and $K$ sinks $t_1, \ldots, t_K$, the multicast capacity (the maximum rate at which the source can deliver the same information to all sinks simultaneously) is
$$C_{\text{multicast}} = \min_{k = 1, \ldots, K} \text{mincut}(s, t_k).$$

This rate is achievable using network coding — coding at intermediate nodes — over a sufficiently large finite field $\mathbb{F}_q$.

The multicast capacity is the minimum of the individual max-flows to each sink. This makes intuitive sense: we cannot deliver more than any single sink can receive. The surprising part is that this bound is achievable: we do not lose anything by requiring the same information to go to all sinks, as long as we allow intermediate nodes to code.

The proof shows that linear network codes over $\mathbb{F}_q$ suffice, and the field size $q$ only needs to be larger than the number of sinks $K$.

Historical Note: Ahlswede, Cai, Li, and Yeung (2000): A Paradigm Shift

The network coding theorem was published by Ahlswede, Cai, Li, and Yeung in the year 2000, and it caused a paradigm shift in networking. Before this result, the prevailing assumption in both theory and practice was that intermediate nodes should only route (forward) packets — the "store and forward" paradigm. The idea that nodes should mix (code) packets before forwarding seemed wasteful or even harmful.

The butterfly network example — showing that a single XOR doubles the multicast throughput — was so compelling that it launched a new field of research. Within a decade, network coding had influenced content distribution networks (CDN), peer-to-peer systems, wireless mesh networks, and even DNA storage. The subsequent work by Li, Yeung, and Cai (2003) showing that linear codes suffice, and by Ho et al. (2006) showing that random linear codes work with high probability, made the theory practical.

Definition: Linear Network Code

A linear network code over $\mathbb{F}_q$ for a directed acyclic graph $G = (V, E)$ with source $s$ producing $h$ symbols per time unit consists of:

  1. Local coding coefficients: for each pair of adjacent edges $(e', e)$ where the head of $e'$ equals the tail of $e$, a coefficient $\alpha_{e',e} \in \mathbb{F}_q$

  2. Edge symbol: the symbol on edge $e$ is $y_e = \sum_{e' \in \text{in}(\text{tail}(e))} \alpha_{e',e} \, y_{e'}$

  3. Global coding vector: the vector $\mathbf{g}_e \in \mathbb{F}_q^h$ such that $y_e = \mathbf{g}_e^T \mathbf{a}$, where $\mathbf{a} = (a_1, \ldots, a_h)^T$ is the source symbol vector; by the recursion in item 2, $\mathbf{g}_e = \sum_{e' \in \text{in}(\text{tail}(e))} \alpha_{e',e} \, \mathbf{g}_{e'}$

  4. Transfer matrix at sink $t_k$: the $h \times h$ matrix $\mathbf{M}_k = [\mathbf{g}_{e_1}, \ldots, \mathbf{g}_{e_h}]^T$ formed by the global coding vectors of $h$ incoming edges of $t_k$

The code is valid if $\det(\mathbf{M}_k) \neq 0$ for all $k$; each sink then recovers $\mathbf{a} = \mathbf{M}_k^{-1} \mathbf{y}_k$ from its vector of received symbols $\mathbf{y}_k$.

Linear codes are not the only option — nonlinear network codes exist — but they are sufficient for multicast and have the advantage of simple encoding (linear combinations) and decoding (Gaussian elimination).

Network code

A coding scheme for a multi-hop network where intermediate nodes apply coding operations (e.g., linear combinations over a finite field) to incoming symbols before forwarding, rather than simply routing packets.

Related: Network flow

Global coding vector

The vector $\mathbf{g}_e \in \mathbb{F}_q^h$ describing how the symbol on edge $e$ depends on the $h$ source symbols. If $\mathbf{a}$ is the source vector, then $y_e = \mathbf{g}_e^T \mathbf{a}$.

Theorem: Sufficiency of Linear Network Codes (Li–Yeung–Cai, 2003)

For single-source multicast over a directed acyclic graph with $K$ sinks, linear network codes over $\mathbb{F}_q$ achieve the multicast capacity $h = \min_k \text{mincut}(s, t_k)$ whenever $q > K$.

The decodability condition at each sink is that a certain $h \times h$ matrix over $\mathbb{F}_q$ must be nonsingular. The determinant of this matrix is a multivariate polynomial in the local coding coefficients, and a nonzero polynomial over $\mathbb{F}_q$ has a nonzero evaluation point as long as $q$ is large enough. Since there are $K$ sinks and the product of the $K$ determinants has degree at most $K$ in each variable, a field of size $q > K$ suffices by the Schwartz–Zippel lemma.

The practical implication is striking: for a network with 100 sinks, a field of size $q = 128$ ($\mathbb{F}_{2^7}$) suffices — all arithmetic is on 7-bit symbols, easily implementable in hardware.

Definition: Random Linear Network Coding

In random linear network coding (RLNC), each intermediate node independently and uniformly selects its local coding coefficients from $\mathbb{F}_q$. The source prepends each packet with its global coding vector (a header of $h$ symbols from $\mathbb{F}_q$), so that each node and each sink can track the linear combination it holds.

The key properties of RLNC:

  1. Distributed: no centralized code design is needed — each node acts independently
  2. Robust: tolerant to link failures and topology changes (as long as the min-cut remains $\geq h$)
  3. Success probability: a randomly chosen code is valid with probability at least $(1 - K/q)^{|E|}$, which approaches 1 for large $q$

The overhead of RLNC is the coding vector header: $h \lceil \log_2 q \rceil$ bits per packet. For large packets (e.g., 1500-byte Ethernet frames with $h = 10$ and $q = 256$), the overhead is 10 bytes, or less than 1%.

Random Linear Network Coding: Success Probability

Explore how the probability of a random linear network code being valid (all sinks can decode) depends on the field size $q$, the number of sinks $K$, and the network size. For practical field sizes ($q \geq 256$), the failure probability is negligible.
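The computation behind this exploration is simple enough to sketch directly. The Python snippet below evaluates the $(1 - K/q)^{|E|}$ lower bound and the header overhead from the RLNC definition above; the parameter values ($h = 10$, $K = 4$, $|E| = 30$) are illustrative defaults, not measurements:

```python
def rlnc_success_lower_bound(q: int, K: int, num_edges: int) -> float:
    """Lower bound on the probability that a random linear network code
    is valid: (1 - K/q) ** |E|, meaningful when q > K."""
    return (1.0 - K / q) ** num_edges

def header_overhead_bits(h: int, q: int) -> int:
    """Coding-vector header: h symbols of ceil(log2(q)) bits each."""
    return h * (q - 1).bit_length()

# Illustrative parameters: K = 4 sinks, |E| = 30 edges, h = 10 symbols.
for q in (8, 16, 64, 256):
    p = rlnc_success_lower_bound(q, K=4, num_edges=30)
    print(f"q = {q:3d}: success probability >= {p:.6f}, "
          f"header = {header_overhead_bits(10, q)} bits")
```

Note that the bound is loose: the actual failure probability at $q = 256$ is far smaller than the printed lower bound suggests, since most coefficient choices on most edges are harmless.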

Example: Constructing a Linear Network Code for the Butterfly

Construct a linear network code over $\mathbb{F}_3$ for the butterfly network and verify that both sinks can decode.
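One valid construction is sketched below. The coefficient choices (identity forwarding everywhere, and $y_{w \to x} = y_{u \to w} + y_{v \to w}$ at node $w$) are one option among many; the code propagates global coding vectors per the definition above and checks both transfer-matrix determinants modulo 3:

```python
import numpy as np

q = 3   # the field F_3
h = 2   # two source symbols a1, a2

# Global coding vectors, propagated in topological order via
# g_e = sum over incoming edges e' of alpha_{e',e} * g_{e'}.
# At w we choose local coefficients (1, 1); any nonzero pair works here.
g = {}
g["su"] = np.array([1, 0]); g["sv"] = np.array([0, 1])
g["uw"] = g["su"];          g["ut1"] = g["su"]
g["vw"] = g["sv"];          g["vt2"] = g["sv"]
g["wx"] = (1 * g["uw"] + 1 * g["vw"]) % q   # w->x carries a1 + a2
g["xt1"] = g["wx"];         g["xt2"] = g["wx"]

# Transfer matrices: rows are the global vectors on each sink's in-edges.
M1 = np.array([g["ut1"], g["xt1"]])   # t1 sees a1 and a1 + a2
M2 = np.array([g["vt2"], g["xt2"]])   # t2 sees a2 and a1 + a2

def det_mod(M, q):
    """Determinant of a 2x2 matrix modulo q."""
    return (M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]) % q

for name, M in (("t1", M1), ("t2", M2)):
    d = det_mod(M, q)
    print(f"det(M_{name}) = {d} (mod {q}) ->",
          "decodable" if d != 0 else "NOT decodable")
```

Both determinants are nonzero ($1$ and $2$ modulo 3), so each sink inverts its $2 \times 2$ transfer matrix to recover $(a_1, a_2)$.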

Common Mistake: Assuming Routing Achieves Multicast Capacity

Mistake:

Designing a multicast distribution tree using standard shortest-path or Steiner tree algorithms and assuming this achieves the max-flow bound.

Correction:

Routing (tree-based distribution) can fall short of the multicast capacity. The butterfly network shows a gap: routing achieves rate 1 while network coding achieves rate 2. For general networks, the gap can be arbitrarily large. Network coding — even simple random linear coding — always achieves the max-flow bound.

Common Mistake: Using Too Small a Field

Mistake:

Using $\mathbb{F}_2$ (binary XOR) for network coding on a network with many sinks, assuming it always works because it worked for the butterfly.

Correction:

$\mathbb{F}_2$ suffices for the butterfly (2 sinks) but may fail for larger networks. With $K$ sinks, a field of size $q > K$ is needed to guarantee the existence of a valid linear code. Random codes over $\mathbb{F}_2$ fail with probability up to $K/2$ per edge — unacceptable for large networks. In practice, $\mathbb{F}_{2^8} = \text{GF}(256)$ is the standard choice, providing byte-level operations and negligible failure probability.

🔧Engineering Note

Network Coding in Content Distribution Networks

Network coding has found practical application in several areas:

  1. Microsoft Avalanche (2005): a peer-to-peer content distribution system using random linear network coding. Each peer encodes its received blocks using random linear combinations over $\mathbb{F}_{2^8}$, enabling efficient content delivery without the "rare block" problem of BitTorrent.

  2. COPE (Katti et al., 2006): opportunistic network coding for wireless mesh networks. Routers XOR overheard packets to create coded multicast opportunities, achieving 2–4x throughput gains in testbed experiments.

  3. Coded caching (Chapter 27): the placement phase stores uncoded content, but the delivery phase uses coded multicast — each coded packet is useful to multiple users, dramatically reducing the delivery load.

The main practical challenges are: (a) coding/decoding computational cost ($O(h^2)$ per packet for Gaussian elimination), (b) coding vector header overhead, and (c) delay due to buffering coded packets before decoding.
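To make cost (a) concrete, here is a minimal, self-contained sketch of RLNC encoding and Gaussian-elimination decoding over $\mathbb{F}_{2^8}$. All names and parameters are illustrative; the field arithmetic uses the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B), a common but not mandated choice:

```python
import random

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_inv(a: int) -> int:
    """Inverse via a^(2^8 - 2) = a^254 (Fermat's little theorem)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def decode(coded):
    """Gauss-Jordan elimination over GF(2^8) on h rows of
    (coding_vector, payload); returns the h original payloads."""
    h = len(coded)
    rows = [list(v) + list(p) for v, p in coded]   # augmented matrix
    for col in range(h):
        pivot = next(r for r in range(col, h) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(h):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x ^ gf_mul(f, y) for x, y in zip(rows[r], rows[col])]
    return [bytes(row[h:]) for row in rows]

# Demo: h = 4 source packets of 8 bytes, mixed with random coefficients.
h, size = 4, 8
source = [bytes(random.randrange(256) for _ in range(size)) for _ in range(h)]

def random_coded_packets():
    coded = []
    for _ in range(h):
        vec = [random.randrange(256) for _ in range(h)]  # random coding vector
        payload = [0] * size
        for coeff, pkt in zip(vec, source):
            payload = [acc ^ gf_mul(coeff, b) for acc, b in zip(payload, pkt)]
        coded.append((vec, payload))
    return coded

while True:
    try:   # a random batch is singular with small probability; retry if so
        decoded = decode(random_coded_packets())
        break
    except StopIteration:
        continue
assert decoded == source
print("Decoded all", h, "source packets via Gaussian elimination.")
```

Eliminating one generation costs $O(h^3)$ coefficient operations plus $O(h^2)$ per payload column, which is the per-packet figure cited above; this is why implementations keep $h$ (the generation size) modest.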

Routing vs. Network Coding

Property | Routing (Store-and-Forward) | Network Coding
Unicast capacity | Achieves max-flow | Achieves max-flow
Multicast capacity | May fall short | Achieves max-flow
Intermediate operations | Copy and forward | Linear combination over $\mathbb{F}_q$
Distributed design | Requires global routing tables | Random codes work (RLNC)
Robustness to failures | Requires route recomputation | Inherently robust (random codes adapt)
Complexity per packet | $O(1)$ per hop | $O(h)$ per hop (linear combination)
Decoding complexity | $O(1)$ | $O(h^2)$ (Gaussian elimination)
Overhead | Routing header | Coding vector header ($h \log_2 q$ bits)

Quick Check

For a network with a single source and $K = 20$ sinks, what is the minimum field size needed to guarantee a valid linear network code exists?

$q > 20$ — a field of size at least 23 (or $\mathbb{F}_{32}$ as the nearest power of 2)

$q = 2$ always suffices

$q = K = 20$

Key Takeaway

Network coding — coding at intermediate nodes — achieves the max-flow min-cut bound for multicast, where routing alone falls short. Linear codes over $\mathbb{F}_q$ with $q > K$ (number of sinks) suffice, and random linear network codes provide a distributed, robust, and near-optimal solution. The butterfly network is the canonical example: a single XOR doubles the multicast rate from 1 to 2.

The Butterfly Network: Routing vs Network Coding

Step-by-step animation of the butterfly network. First shows how routing fails to deliver both bits to both sinks, then how a single XOR at the intermediate node solves the problem.

Linear Network Code Construction

Shows how global coding vectors propagate through a network via linear combinations over a finite field, and how the transfer matrix at each sink determines decodability.