The Coded Caching Problem

Why Coded Caching?

The explosive growth of video streaming β€” Netflix, YouTube, and their successors β€” has made content delivery the dominant source of network traffic. The standard solution is uncoded caching: place popular files at edge caches close to users, so that frequently requested content can be served locally without burdening the backhaul. This is purely a local gain: each cached file benefits only the user who happens to request it.

In 2014, Maddah-Ali and Niesen discovered something remarkable: by designing the cache placement jointly across all users, the server can create coded multicast opportunities during delivery. The result is a global caching gain that scales with the number of users $K$, not just with the cache size $M$. A single coded transmission simultaneously satisfies the demands of multiple users, each of whom can decode its desired content using its cached side information.

The point is that caching is not just about storing popular files β€” it is about creating side information that enables efficient coded delivery. This is an information-theoretic insight with deep connections to index coding, broadcast channels, and network information theory.

Definition:

The Maddah-Ali-Niesen Caching Model

A server stores $N$ files $W_1, W_2, \ldots, W_N$, each of size $F$ bits. There are $K$ users, each with a cache of size $MF$ bits ($M \in [0, N]$). The system operates in two phases:

Phase 1 β€” Placement (off-peak hours): The server fills the users' caches without knowledge of future demands. User $k$'s cache content is $Z_k = \phi_k(W_1, \ldots, W_N)$, where $\phi_k$ is the caching function satisfying $|Z_k| \le MF$.

Phase 2 β€” Delivery (peak hours): Each user $k$ reveals its demand $d_k \in [N]$. The server transmits a coded message $X = \psi(W_1, \ldots, W_N, d_1, \ldots, d_K)$ of size $RF$ bits over a shared error-free link, where $R$ is the delivery load (in file units). User $k$ must decode $W_{d_k}$ from $(X, Z_k)$.

The goal is to minimize the worst-case delivery load over all placement and delivery strategies:
$$R^*(M) = \min_{\phi, \psi} \max_{(d_1, \ldots, d_K)} R.$$

The model assumes: (1) the shared link is error-free with unlimited bandwidth (we remove this assumption in Section 27.3), (2) placement is done without knowledge of future demands (this is critical β€” if demands were known during placement, the problem would be trivial), and (3) all files have equal size.

Coded caching

A caching strategy where the cache placement is designed to create multicast coding opportunities during the delivery phase, achieving a global caching gain that scales with the number of users.

Related: Coded multicasting gain, Cache placement

Cache placement

The first phase of a caching protocol, where the server fills user caches without knowledge of future demands. In coded caching, the placement is designed to maximize multicast opportunities during delivery.

Related: Coded caching

Coded Caching: The $N=K=2$ Example

Step-by-step animation of coded caching with 2 files and 2 users. The placement phase splits files and distributes halves, then the delivery phase uses a single XOR multicast to serve both users simultaneously β€” a 4Γ— reduction in delivery load compared to no caching (and 2Γ— compared to uncoded caching).

Example: Coded Caching with $N=K=2$, $M=1$

A server has $N = 2$ files $A, B$. There are $K = 2$ users, each with cache size $M = 1$ (one file). Compare uncoded and coded caching.
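The coded solution to this example can be sketched in a few lines. The following is a toy illustration (file halves modeled as small integers so XOR is the `^` operator; the contents are arbitrary):

```python
# Split each file into two halves: A = (A1, A2), B = (B1, B2).
A1, A2 = 0b1010, 0b0111   # arbitrary example contents
B1, B2 = 0b1100, 0b0001

# Placement (demand-oblivious): user 1 caches (A1, B1), user 2 caches (A2, B2).
Z1 = {"A1": A1, "B1": B1}
Z2 = {"A2": A2, "B2": B2}

# Delivery for demand (d1, d2) = (A, B):
# user 1 is missing A2, user 2 is missing B1 -> one XOR serves both.
X = A2 ^ B1

# Each user decodes using its cached side information.
assert X ^ Z1["B1"] == A2   # user 1 recovers A2, hence all of A
assert X ^ Z2["A2"] == B1   # user 2 recovers B1, hence all of B
```

The single transmission $X$ has the size of half a file, so $R = 1/2$, versus $R = 1$ for uncoded caching and $R = 2$ with no caching at all.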

Definition:

The MAN Scheme

For integer-valued $t = KM/N \in \{0, 1, \ldots, K\}$, the Maddah-Ali-Niesen (MAN) scheme operates as follows:

Placement: Each file $W_n$ is split into $\binom{K}{t}$ equal-sized sub-files:
$$W_n = \{W_{n,\mathcal{T}} : \mathcal{T} \subset [K],\; |\mathcal{T}| = t\}.$$
User $k$ caches all sub-files where $k \in \mathcal{T}$:
$$Z_k = \{W_{n,\mathcal{T}} : k \in \mathcal{T},\; n \in [N]\}.$$
Cache size verification: user $k$ stores $N \binom{K-1}{t-1}$ sub-files of size $F/\binom{K}{t}$, totaling $N \cdot \frac{\binom{K-1}{t-1}}{\binom{K}{t}} \cdot F = \frac{t}{K} \cdot NF = MF$, as required.

Delivery: For each subset $\mathcal{S} \subset [K]$ with $|\mathcal{S}| = t+1$, the server transmits:
$$\bigoplus_{k \in \mathcal{S}} W_{d_k, \mathcal{S} \setminus \{k\}}.$$

This is an XOR of $t+1$ sub-files. Each user $k \in \mathcal{S}$ has cached every sub-file in the XOR except $W_{d_k, \mathcal{S} \setminus \{k\}}$ (which it needs), so it can decode.

The delivery load is:
$$R_{\text{MAN}}(M) = \binom{K}{t+1} \cdot \frac{1}{\binom{K}{t}} = \frac{K-t}{t+1} = \frac{K(1 - M/N)}{1 + KM/N}.$$
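As a sanity check on the counting argument, the sketch below enumerates the MAN multicast messages symbolically and verifies the load formula (the function names `man_load` and `man_messages` are ours, not standard library code):

```python
from itertools import combinations
from math import comb

def man_load(K: int, t: int) -> float:
    """Delivery load in file units: binom(K, t+1) messages of size 1/binom(K, t) each."""
    return comb(K, t + 1) / comb(K, t)

def man_messages(K: int, t: int, demands: dict) -> list:
    """List the MAN multicast messages as sets of sub-file labels.

    The message for a subset S (|S| = t+1) XORs the sub-files
    W_{d_k, S \ {k}} for each k in S; here each label is a
    (file, subset) pair rather than actual bits.
    """
    users = range(1, K + 1)
    return [
        [(demands[k], tuple(u for u in S if u != k)) for k in S]
        for S in combinations(users, t + 1)
    ]

# K = 5 users, t = 1 (e.g. N = 10, M = 2): 10 messages, each an XOR of 2 sub-files.
msgs = man_messages(5, 1, {k: k for k in range(1, 6)})
assert len(msgs) == comb(5, 2) == 10
assert all(len(m) == 2 for m in msgs)
assert man_load(5, 1) == (5 - 1) / (1 + 1) == 2.0
```

The assertions confirm that the number of messages matches $\binom{K}{t+1}$ and that the resulting load equals the closed form $(K-t)/(t+1)$.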

Theorem: Delivery Load of the MAN Scheme

The MAN coded caching scheme achieves a delivery load of:

$$R_{\text{MAN}}(M) = \frac{K(1 - M/N)}{1 + KM/N}$$

for $M = tN/K$ with $t \in \{0, 1, \ldots, K\}$, and by memory sharing (convex combination) for non-integer $t$. This can be decomposed as:

$$R_{\text{MAN}}(M) = \underbrace{K(1 - M/N)}_{\text{uncoded caching load}} \cdot \underbrace{\frac{1}{1 + KM/N}}_{\text{coded multicasting gain}}$$

The first factor is the load that uncoded caching would achieve (each user's missing fraction times the number of users). The second factor is the coded multicasting gain: $t + 1 = 1 + KM/N$ users are simultaneously served by each multicast message.

The coded multicasting gain $1 + KM/N$ grows linearly with the number of users $K$ (at fixed $M/N$). This is the fundamental surprise of coded caching: adding more users does not increase the load proportionally; instead, each new user contributes to more multicast opportunities. With $K = 100$ users and $M/N = 0.1$ (each user caches 10% of the library), the coded gain is $1 + 10 = 11$, meaning each transmission serves 11 users simultaneously.
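The decomposition can be checked numerically for the $K = 100$, $M/N = 0.1$ example (a quick sketch; `coded_load` is our name for the formula above):

```python
def coded_load(K: int, M: float, N: float) -> float:
    """R_MAN(M) = K(1 - M/N) / (1 + K*M/N), exact at integer t = K*M/N."""
    return K * (1 - M / N) / (1 + K * M / N)

K, M, N = 100, 10, 100           # M/N = 0.1, so t = K*M/N = 10 is an integer
uncoded = K * (1 - M / N)        # uncoded caching load: 90 files
gain = 1 + K * M / N             # multicast gain: 11 users per transmission
assert abs(coded_load(K, M, N) - uncoded / gain) < 1e-12
print(round(uncoded), round(gain), round(coded_load(K, M, N), 2))   # 90 11 8.18
```

Coded caching cuts the load from 90 files to about 8.2 files, an 11-fold reduction that would grow further with $K$.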

Memory-Load Tradeoff: Coded vs Uncoded Caching

Compare the delivery load $R(M)$ of coded caching (MAN scheme) with uncoded caching as a function of cache size $M$. Observe how the coded multicasting gain increases with the number of users $K$.


Historical Note: The Discovery of Coded Caching

2012-2014

Mohammad Ali Maddah-Ali and Urs Niesen, both at Bell Labs, published their coded caching result in 2014, though the first version appeared on arXiv in 2012. The paper was initially met with skepticism: the idea that caching could provide a gain proportional to KK (not just MM) seemed too good to be true. But the proof was elegantly simple β€” a counting argument showing that the XOR structure works. The result opened a new subfield of information theory, generating hundreds of papers on variations: decentralized placement, heterogeneous caches, online caching, multi-server systems, and the wireless extensions we study in Section 27.3. Caire's group at TU Berlin made fundamental contributions to the wireless extensions, connecting coded caching to MIMO broadcasting and D2D communication.

Common Mistake: The Subpacketization Bottleneck

Mistake:

Assuming the MAN scheme is practical for large $K$. With $K = 100$ users and $t = 10$, each file must be split into $\binom{100}{10} \approx 1.7 \times 10^{13}$ sub-files. For a 1 GB file, each sub-file would be far less than 1 bit.

Correction:

The subpacketization $\binom{K}{t}$ grows exponentially in $K$. This is the main practical limitation of coded caching. For a file of $F$ bits, we need $F \ge \binom{K}{t}$ for the scheme to operate. Research on reducing subpacketization includes: placement delivery arrays (Yan et al., 2017), combinatorial designs, and grouping users into clusters of manageable size. With clustering, the gain is $1 + K'M/N$, where $K' \ll K$ is the cluster size.
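A quick back-of-the-envelope calculation makes the bottleneck concrete (a sketch; the cluster parameters below are illustrative, not from a specific deployment):

```python
from math import comb

K, t = 100, 10
subpacketization = comb(K, t)          # sub-files per file in the MAN scheme
file_bits = 8 * 10**9                  # a 1 GB file
print(subpacketization)                # about 1.7e13 sub-files
print(file_bits / subpacketization)    # about 5e-4 bits per sub-file: infeasible

# Clustering workaround: run MAN within groups of K' users, so the
# subpacketization drops to comb(K', t') at the cost of the smaller
# gain 1 + K'*M/N.
K_cluster, t_cluster = 10, 1           # illustrative cluster size
print(comb(K_cluster, t_cluster))      # 10 sub-files per file: trivially feasible
```

The trade-off is stark: clustering reduces subpacketization from $\sim 10^{13}$ to 10, but the multicast gain shrinks from $t+1 = 11$ to $t'+1 = 2$.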

Quick Check

In the MAN scheme with $N = 10$ files, $K = 5$ users, and $M = 2$ files per cache ($t = KM/N = 1$), how many multicast transmissions are needed in the delivery phase?

$\binom{5}{2} = 10$ transmissions, each serving 2 users

5 transmissions, one per user

1 transmission serving all 5 users

Coded multicasting gain

The factor $1 + KM/N$ by which coded caching reduces the delivery load compared to uncoded caching. Represents the number of users simultaneously served by each coded multicast transmission.

Related: Coded caching

Subpacketization

The number of sub-files each file is split into during the placement phase. In the MAN scheme, this is $\binom{K}{t}$, which grows exponentially and is the main practical limitation.

Related: Coded caching

Key Takeaway

Coded caching converts memory into bandwidth. The MAN scheme achieves delivery load $R = K(1-M/N)/(1+KM/N)$, which is the uncoded load divided by the coded multicasting gain $1+KM/N$. This gain is global (scales with $K$) rather than local (dependent only on $M$). The price is subpacketization: each file must be split into $\binom{K}{t}$ sub-files, which grows exponentially and is the main practical barrier.