Multi-Source Multi-Sink Networks

Beyond Single-Source Multicast

The network coding theorem gives a clean, single-letter characterization of the multicast capacity for a single source. But real networks have multiple sources sending different information to different destinations. What is the capacity region of a general multi-source multi-sink network?

The short answer: we do not know, in general. The multi-source problem is dramatically harder than the single-source case, and the characterization of the rate region remains one of the major open problems in network information theory. In this section, we survey what is known and what remains open.

Definition:

Multi-Source Multi-Sink Network Coding Problem

A multi-source multi-sink network coding problem on a directed acyclic graph G = (V, E) consists of:

  • Multiple source nodes s_1, …, s_L, each generating an independent message W_l at rate R_l
  • Multiple sink nodes t_1, …, t_K, where sink t_k demands a subset D_k ⊆ {W_1, …, W_L} of the messages

The rate region is the set of rate tuples (R_1, …, R_L) for which all sinks can reliably recover their demanded messages using network coding at intermediate nodes.

Special cases:

  • Multiple unicast: each source has a dedicated sink (L = K, D_k = {W_k})
  • Single-source multicast: L = 1, D_k = {W_1} for all k (solved by ACLY)
  • Multiple multicast: each source multicasts to all sinks

Theorem: Cut-Set Bound for Multi-Source Networks

For a multi-source multi-sink network, the rate region is contained in the set of rate tuples (R_1, …, R_L) satisfying: for every cut (S, S^c) separating a subset of sources from the sinks that demand their messages,

∑_{l : s_l ∈ S, ∃k : t_k ∈ S^c, W_l ∈ D_k} R_l ≤ cap(S, S^c).

This is the network generalization of the max-flow min-cut bound.

The cut-set bound says that the total rate of information that needs to cross any cut cannot exceed the cut capacity. For single-source multicast, this bound is tight (the ACLY theorem achieves it). For multi-source problems, the cut-set bound is generally not tight β€” there are examples where the achievable rate region is strictly smaller than the cut-set bound.

Example: Two-Unicast Network and the Cut-Set Bound

Consider a network with two sources s_1, s_2 and two sinks t_1, t_2, where t_k wants W_k only. The network is:

s_1 → a (cap 1), s_2 → a (cap 1), a → t_1 (cap 1), a → t_2 (cap 1).

The cut-set bound allows R_1 + R_2 ≤ 2 (cut at the sources) and R_1 ≤ 1, R_2 ≤ 1 (individual cuts). Can we achieve R_1 = R_2 = 1? In this network, yes: node a receives both messages and simply routes W_1 to t_1 and W_2 to t_2, so the cut-set bound is tight here. Networks where the achievable region falls strictly inside the cut-set bound do exist, but they are considerably more intricate than this toy example.
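The cut analysis above can be checked by brute force on this four-edge network: enumerate all vertex subsets S, and minimize the crossing capacity over the cuts relevant to each bound. A minimal sketch (node and variable names are illustrative, chosen to match the example):

```python
from itertools import combinations

# Toy two-unicast network from the example: unit-capacity edges.
edges = {("s1", "a"): 1, ("s2", "a"): 1, ("a", "t1"): 1, ("a", "t2"): 1}
nodes = ["s1", "s2", "a", "t1", "t2"]

def cut_capacity(S):
    """Total capacity of edges leaving vertex set S."""
    return sum(c for (u, v), c in edges.items() if u in S and v not in S)

all_subsets = [set(S) for r in range(1, len(nodes))
               for S in combinations(nodes, r)]

# Bound on R1: minimize over cuts separating s1 from t1.
r1_bound = min(cut_capacity(S) for S in all_subsets
               if "s1" in S and "t1" not in S)

# Bound on R1 + R2: cuts separating both sources from both sinks.
sum_bound = min(cut_capacity(S) for S in all_subsets
                if {"s1", "s2"} <= S and not {"t1", "t2"} & S)

print(r1_bound, sum_bound)  # 1 2
```

Every cut separating s1 from t1 must cross either s1→a or a→t1, giving the individual bound of 1; the cut at the sources (or at node a's outgoing edges) gives the sum bound of 2.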

Open Problems in Network Coding

Several fundamental questions remain unresolved:

  1. Multiple unicast capacity: The capacity region of the two-unicast problem (two source-sink pairs) over a general network is unknown. This is closely related to the interference channel capacity problem (Chapter 17).

  2. Insufficiency of linear codes: For multiple unicast, linear codes over any finite field may not achieve the capacity. The matroid-based construction of Dougherty, Freiling, and Zeger (2005) gave the first example where nonlinear codes strictly outperform linear codes.

  3. Non-Shannon inequalities: The cut-set bound uses only Shannon-type information inequalities (nonnegativity of entropy and mutual information, submodularity). There exist non-Shannon inequalities (Zhang–Yeung, 1998) that provide tighter outer bounds for some networks. Whether finitely many such inequalities suffice to characterize network capacity in general remains an active research question.

  4. Index coding: The index coding problem (a special case of network coding where there is a single bottleneck link and each receiver has side information) is equivalent to the general network coding problem. Despite its seeming simplicity, the index coding capacity is unknown in general.

Historical Note: The Evolution of Network Coding (2000–2020)


The field of network coding evolved rapidly after the ACLY paper of 2000:

  • 2003: Li, Yeung, and Cai proved that linear codes suffice for single-source multicast, reducing the problem to linear algebra.

  • 2003: Koetter and MΓ©dard developed the algebraic framework that unified linear network coding with classical coding theory.

  • 2005: Dougherty, Freiling, and Zeger showed that linear codes are insufficient for general multi-source problems — a surprising negative result.

  • 2006: Ho, Médard, Koetter, et al. showed that random linear network coding works with high probability, eliminating the need for centralized code design.

  • 2008: Katti et al. (COPE) demonstrated practical throughput gains from network coding in wireless mesh networks, spurring broad interest in wireless network coding, including the physical-layer network coding direction.

  • 2010s: The connection between network coding and coded caching (Maddah-Ali and Niesen, 2014) opened a new chapter, with coded multicast as the bridge between the two fields.

Today, network coding is a mature theory with practical impact in content distribution, wireless relaying, and distributed storage.

Definition:

Index Coding

The index coding problem is a special case of network coding with a single broadcast link: a server has n messages W_1, …, W_n and broadcasts to n receivers over a shared noiseless link. Receiver k wants W_k and has a subset A_k ⊂ {W_1, …, W_n} \ {W_k} as side information.

The optimal broadcast rate β is the minimum number of transmissions (symbols from F_q) needed to satisfy all receivers. Finding β is NP-hard in general.

Index coding is equivalent to network coding: any network coding problem can be reduced to an index coding problem and vice versa (El Rouayheb and Sprintson, 2010). This equivalence means that understanding index coding is both necessary and sufficient for understanding general network coding.

The side information graph — where there is a directed edge from receiver i to receiver j if receiver j has W_i as side information — encodes the structure of the problem. The clique cover number of this graph gives a simple (but generally loose) upper bound on β.
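The clique-cover bound can be computed directly for tiny instances: if every pair of receivers in a group knows each other's message, one XOR of the group's messages serves them all. A brute-force sketch on a hypothetical 3-receiver instance (the `has` map and function names are illustrative):

```python
from itertools import combinations

# Hypothetical instance: receiver k wants W_k; has[k] = indices it already knows.
# Receivers 1 and 2 know each other's message; receiver 3 knows nothing.
has = {1: {2}, 2: {1}, 3: set()}
n = len(has)

def is_clique(group):
    """A group is a clique iff every pair mutually holds the other's message,
    so one transmission (XOR of the group's messages) serves the whole group."""
    return all(j in has[i] and i in has[j] for i, j in combinations(group, 2))

def min_clique_cover(remaining):
    """Fewest cliques covering all receivers (exponential, fine for tiny n)."""
    if not remaining:
        return 0
    first = min(remaining)
    others = [v for v in remaining if v != first]
    best = n
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            group = (first,) + extra
            if is_clique(group):
                best = min(best, 1 + min_clique_cover(remaining - set(group)))
    return best

beta_upper = min_clique_cover(set(has))
print(beta_upper)  # 2: send W_1 XOR W_2, then W_3 — instead of 3 uncoded sends
```

Here the clique {1, 2} is served by broadcasting W_1 ⊕ W_2, and receiver 3 needs its message sent uncoded, so two transmissions suffice where plain routing needs three.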

Index coding

A network coding problem where a single server broadcasts coded messages to multiple receivers, each with different side information. The goal is to minimize the number of transmissions. Equivalent to the general network coding problem.

Related: Network code

Multicast capacity

The maximum rate at which a single source can deliver the same information to all sinks in a network, equal to min_k mincut(s, t_k) and achievable by linear network coding.

Butterfly Network: Routing vs. Network Coding

Visualize the butterfly network and compare routing-only solutions with network coding. Toggle between different routing strategies and the XOR coding strategy to see how network coding achieves the multicast capacity where routing falls short.

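The butterfly XOR code the visualization demonstrates can be written out in a few lines. A minimal sketch (function and variable names are illustrative): bits b1 and b2 travel the two side paths, the bottleneck edge carries their XOR, and each sink XORs what it hears to recover the bit it did not receive directly.

```python
def butterfly_multicast(b1, b2):
    """Classic butterfly network code: multicast two bits to two sinks
    through a single unit-capacity bottleneck edge."""
    left, right = b1, b2            # source's two outgoing edges
    bottleneck = left ^ right       # coding node XORs its inputs
    # Sink 1 hears (left, bottleneck); sink 2 hears (right, bottleneck).
    sink1 = (left, left ^ bottleneck)     # recovers (b1, b2)
    sink2 = (right ^ bottleneck, right)   # recovers (b1, b2)
    return sink1, sink2

for b1 in (0, 1):
    for b2 in (0, 1):
        s1, s2 = butterfly_multicast(b1, b2)
        assert s1 == (b1, b2) and s2 == (b1, b2)
print("both sinks decode both bits in all four cases")
```

Routing alone cannot deliver both bits to both sinks per time slot, since the bottleneck can forward only one of them; the XOR lets a single symbol serve both sinks at once.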

Network Coding Throughput Gain vs. Network Size

Compare the multicast throughput of routing vs. network coding for random networks of varying size. The gain of network coding grows with network density and the number of sinks.


Quick Check

For the two-unicast problem (two sources, two sinks, each wanting a different message), does network coding always achieve the cut-set bound?

No β€” the cut-set bound is not always tight for multiple unicast, and network coding may not close the gap

Yes β€” the ACLY theorem applies to any source-sink configuration

Yes, but only over sufficiently large finite fields

πŸ”§Engineering Note

Computational Complexity of Random Linear Network Coding

The main computational bottleneck of RLNC is decoding at the sinks, which requires Gaussian elimination over F_q on an h × h matrix:

  • Encoding at each node: O(h) operations per outgoing symbol (one linear combination of h incoming symbols over F_q)
  • Decoding at each sink: O(h^2) operations per source symbol (Gaussian elimination on an h × h matrix, amortized over h symbols)
  • Header overhead: h⌈log_2 q⌉ bits per packet

For practical parameters (h = 10, q = 256), encoding adds ~10 XOR-like operations per packet, decoding requires a 10 × 10 GF(256) matrix inversion, and the header is 10 bytes. These costs are acceptable for most applications.

For real-time streaming, the main concern is decoding delay: the sink must collect hh linearly independent coded packets before decoding any source symbol. This adds a delay of hh packet intervals. Sliding-window RLNC reduces this delay by allowing partial decoding as packets arrive.
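The encode-and-eliminate pipeline can be sketched end to end. The following uses GF(2) rather than the GF(256) of the worked numbers above, so that field arithmetic reduces to XOR while the structure — random coefficient headers, rank collection, Gaussian elimination — is unchanged; all names and parameters here are illustrative:

```python
import random

# Random linear network coding over GF(2): h source packets of plen bits,
# each packet represented as an integer bit-vector.
random.seed(7)
h, plen = 4, 16
source = [random.randrange(1 << plen) for _ in range(h)]

def coded_packet():
    """One coded packet: (random nonzero coefficient vector, coded payload)."""
    coeffs = random.randrange(1, 1 << h)
    payload = 0
    for i in range(h):
        if (coeffs >> i) & 1:
            payload ^= source[i]
    return coeffs, payload

# Sink: keep a basis keyed by leading coefficient bit, discard dependent
# packets, stop once h linearly independent packets have arrived.
basis = {}          # leading bit -> (coeffs, payload)
received = 0
while len(basis) < h:
    c, p = coded_packet()
    received += 1
    for bit in reversed(range(h)):      # reduce by leading bit
        if not (c >> bit) & 1:
            continue
        if bit not in basis:
            basis[bit] = (c, p)
            break
        bc, bp = basis[bit]
        c, p = c ^ bc, p ^ bp           # may reduce to 0 -> dependent, dropped

# Back-substitution (Gauss-Jordan): clear bit b from every other row,
# in ascending order, leaving the identity in the coefficient part.
for b in sorted(basis):
    c, p = basis[b]
    for b2 in list(basis):
        if b2 != b and (basis[b2][0] >> b) & 1:
            basis[b2] = (basis[b2][0] ^ c, basis[b2][1] ^ p)

decoded = [basis[i][1] for i in range(h)]
assert decoded == source
print(f"decoded {h} packets from {received} received")
```

Note the decoding-delay point from the text is visible here: nothing is output until h independent packets have arrived, which is exactly what sliding-window variants relax.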

Practical Constraints

  • Decoding delay: h packet intervals for block RLNC

  • GF(256) arithmetic: supported by hardware AES-NI instructions on modern CPUs

  • Memory: sink buffers O(h^2) field elements for Gaussian elimination

Common Mistake: Network Coding Helps Unicast

Mistake:

Claiming that network coding can increase the throughput of unicast (single source, single sink) over a noiseless network.

Correction:

For unicast over noiseless networks, routing achieves the max-flow (Ford–Fulkerson theorem). Network coding does not increase the unicast throughput. Its advantage appears only for multicast and multi-session problems. Over noisy networks (Chapters 22–23), network coding can help even for unicast, because coding at relays can provide coding gain against noise.

Why This Matters: Physical-Layer Network Coding in Wireless Networks

In wireless networks, simultaneous transmissions naturally superimpose at receivers β€” a property traditionally viewed as interference. Physical-layer network coding (PLNC) turns this "bug" into a "feature": instead of avoiding simultaneous transmissions, PLNC lets them interfere constructively.

In the simplest two-way relay scenario, nodes A and B exchange messages through a relay R. With traditional routing, this requires 4 time slots (A→R, R→B, B→R, R→A). With network coding at R (XOR and broadcast), it takes 3 slots. With PLNC (A and B transmit simultaneously, R decodes the XOR from the superimposed signals), it takes only 2 slots — a 2x improvement over routing.
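The slot saving rests entirely on one identity: each node can cancel its own message from the relay's XOR. A toy bit-level sketch (variable names are illustrative; with PLNC the relay would obtain the XOR directly from the superimposed signals, which this model skips):

```python
# Two-way relay exchange via network coding at the relay.
msg_a = 0b10110010   # A's message
msg_b = 0b01101100   # B's message

# Uplink (two slots, or one slot with PLNC): relay ends up with the XOR.
relay_broadcast = msg_a ^ msg_b

# Downlink (one broadcast slot): each side cancels its own message.
decoded_at_a = relay_broadcast ^ msg_a   # A recovers B's message
decoded_at_b = relay_broadcast ^ msg_b   # B recovers A's message

assert decoded_at_a == msg_b
assert decoded_at_b == msg_a
print("exchange complete: 3 slots with XOR relaying, 2 with PLNC, vs 4 routed")
```

The broadcast slot serves both directions at once, which is where the gain over the four-slot routing schedule comes from.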

This principle underlies the compute-and-forward framework (Chapter 22) and is one of the bridges between network coding and physical-layer design.

Key Takeaway

The multi-source multi-sink network coding problem remains largely open. The cut-set bound provides an outer bound but is not always tight. Linear codes are insufficient for some multi-source problems, and the capacity region of even the two-unicast problem is unknown. The connection to index coding provides a bridge to combinatorial optimization, while the link to interference channels connects to the deep open problems of multiuser information theory.