Ferkans — Interactive Telecom Tutor

Why Index Coding?

After Chapters 2–3 we know what the MAN scheme does: combinatorial placement, XOR delivery, rate $R = K(1 - \mu)/(1 + K\mu)$ . But a natural follow-up question is: is this the best XOR delivery given the MAN placement? Or more ambitiously, given any placement, what is the minimum-length broadcast that satisfies all users?

This question has a name — the index coding problem — and it predates coded caching by a decade. In this chapter we formalize it, connect it rigorously to coded caching, and explain why the MAN scheme admits a surprisingly clean index-coding solution despite the general problem being NP-hard.

Definition:
Index Coding Problem

An index coding instance is a tuple $\mathcal{I} = (K, N, (\mathcal{S}_k)_{k=1}^K, (d_k)_{k=1}^K)$ where:

$N$ is the number of source messages $W_1, \ldots, W_N$ , each of size $F$ bits.
$K$ is the number of receivers. Receiver $k$ has a side information set $\mathcal{S}_k \subseteq [N]$ , meaning it already possesses $\{W_n : n \in \mathcal{S}_k\}$ .
Receiver $k$ demands file $W_{d_k}$ with $d_k \notin \mathcal{S}_k$ .

A code for $\mathcal{I}$ consists of an encoder $X: \{W_1, \ldots, W_N\} \to \mathbb{F}_2^{\ell F}$ and decoders $g_k: (X, \{W_n\}_{n \in \mathcal{S}_k}) \to W_{d_k}$ for every $k$ . The rate of the code is $R = \ell$ in file units.

The optimal rate is $\beta(\mathcal{I}) \;=\; \inf\{R : \text{a valid code of rate } R \text{ exists}\}.$

The index coding problem was introduced by Birk and Kol (1998) to model satellite broadcast systems where receivers have heterogeneous prior-held content. The information-theoretic treatment (Bar-Yossef et al. 2011) connected it to graph theory — a connection we exploit in §4.2.

,

Example: The Butterfly Index Coding Instance

Two files $W_1, W_2$ , two receivers. Receiver 1 has $\mathcal{S}_1 = \{2\}$ , wants $W_1$ . Receiver 2 has $\mathcal{S}_2 = \{1\}$ , wants $W_2$ . What is the optimal rate $\beta(\mathcal{I})$ ?

Solution

Uncoded lower bound

Each receiver is missing one file; without coding the server must send at least one file per receiver. Uncoded rate: $R = 2$ .

Coded scheme

Send $X = W_1 \oplus W_2$ (one file's worth of bits). Receiver 1 computes $W_1 = X \oplus W_2$ using its side information $W_2$ . Receiver 2 computes $W_2 = X \oplus W_1$ . Rate: $R = 1$ .

Optimality

$\beta(\mathcal{I}) \geq 1$ because each receiver must obtain $F$ new bits. Hence $\beta(\mathcal{I}) = 1$ . The XOR saves a factor of 2 — this tiny instance foreshadows the $(1 + t)$ -factor coded caching gain.

Key Takeaway

The butterfly instance is the atomic unit of coded multicasting. Every XOR in the MAN delivery is a butterfly: two (or more) receivers, each with one file as side information and wanting the other. The MAN scheme exploits $\binom{K}{t+1}$ such butterflies simultaneously — one per $(t+1)$ -subset.

The Butterfly Index Coding Instance

Two receivers, two files, each holds the other's demand as side information. A single XOR message

X = W_1 \oplus W_2

simultaneously satisfies both. Rate:

R = 1

file (vs. uncoded

R = 2

). This is the atomic unit of coded multicasting — every MAN XOR message is a generalized butterfly.

Index coding

A network-coding problem where a single sender has $N$ messages to distribute to $K$ receivers via a broadcast link. Each receiver has a priori knowledge of a subset of messages (side information) and demands one specific message. The objective is to minimize the broadcast length.

Conflict graph

A graph $G$ whose vertices correspond to the $K$ demand pairs $(d_k, \mathcal{S}_k)$ and whose edges encode when two demand pairs conflict — i.e., when a single coded transmission cannot simultaneously satisfy both receivers. The chromatic number of $G$ upper bounds the optimal index coding rate.

Related: Index coding

Side information

Prior knowledge at a receiver, typically the set of messages already held. In coded caching, side information is the cache $\mathcal{Z}_{k}$ . In general index coding, side information can be any subset of the library.

Related: Index coding

Coded Caching Is an Index Coding Problem

Fix a placement scheme for coded caching. The delivery phase, with a given demand vector $\mathbf{d}$ , is exactly an index coding instance:

The $N$ source messages are the subfiles $\{W_{n,\mathcal{S}}\}$ .
Each user $k$ is a receiver with side information $\mathcal{Z}_{k}$ and demand $W_{d_k}$ (which requires the user to collect all missing subfiles $W_{d_k, \mathcal{T}}$ for $\mathcal{T} \not\ni k$ ).
The broadcast rate $R$ is exactly the index coding rate.

The MAN scheme is thus a particular index coding scheme for a particular family of index coding instances — those induced by the combinatorial MAN placement. A general index coding theorem for arbitrary placements would immediately yield a general coded caching rate formula. This is why the two problems are intertwined.

Historical Note: From Satellite Broadcasts to Index Coding

1998–2014

Birk and Kol introduced the index coding problem in 1998 to model "Informed-Source Coding On Demand" (ISCOD) in satellite broadcast systems. Their motivating example was a satellite distributing encrypted pay-per-view video to subscribers, where each subscriber's receiver had cached previous episodes (side information) and wanted the latest episode (demand).

The information-theoretic formalization came a decade later, when Bar-Yossef, Birk, Jayram, and Kol (2011) characterized the linear index coding rate as the minrank of the side-information graph — a matrix-theoretic invariant that is in general NP-hard to compute. Maleki, Cadambe, and Jafar (2014) connected index coding to interference alignment, showing that the asymptotic rate equals the fractional chromatic number for certain families.

Coded caching's arrival in 2014 brought a fresh perspective: one could design the side information (cache contents) to ensure that the resulting index coding instance is easy. This is the conceptual leap that lets coded caching achieve a clean rate formula where general index coding cannot.

⚠️Engineering Note

General Index Coding Is Not Deployed — Coded Caching Is

A student reading the index coding literature might wonder why industry does not deploy it widely. The answer lies in the complexity of the problem:

NP-hardness. Finding the optimal linear index code is equivalent to computing the minrank of a graph — NP-hard in general.
Instance-specific design. Each demand vector is a different index coding instance. A system that re-solves the index coding problem on every peak-hour delivery round is infeasible.
Coded caching sidesteps both. The MAN placement pre-designs the side information so that the resulting index coding instance is structurally simple — the chromatic number equals the MAN rate exactly. The delivery algorithm is a constant-time lookup: for every $(t+1)$ -subset, send one XOR.

In short: coded caching is a specialization of index coding that is simultaneously tractable and close to optimal. This is why it is a practical framework, not just a theoretical one.

Practical Constraints

•
General index coding minrank is NP-hard (Peeters 1996 complexity)
•
MAN instances have known optimal linear code of rate $(K-t)/(t+1)$
•
Structured (combinatorial) side information enables efficient delivery

The Index Coding Problem