Content Delivery Networks and the Shared-Link Bottleneck
Why Caching Is the Right Question
Every major internet service (Netflix, YouTube, Spotify, TikTok) streaming 4K video to millions of simultaneous viewers faces the same problem. A central content repository must serve a library of N files to a population of K users, where both N and K are enormous. The bottleneck is almost never storage at the server; it is the delivery capacity to the users, especially during peak hours.
The industry's answer for the past two decades has been caching: push a subset of the content close to the user (into a set-top box, a mobile device, an edge server, or a content-delivery node) so that when the user requests something, a local copy may already be at hand. The question this book attacks is: given a fixed per-user cache budget of M files, how much of the peak-hour traffic can we eliminate?
The naive answer, "cache the most popular files," turns out to be provably suboptimal. Information theory lets us do much better, with a gain that grows with the number of users. This chapter sets up the problem; the gain shows up in Chapter 2.
Placement phase
The off-peak phase in which the server populates user caches without knowing future demands. In the MAN scheme, the placement is combinatorial: with t = KM/N, each file is split into (K choose t) subfiles, and each subfile is cached by exactly the t users of a specific t-subset.
Related: Delivery phase, MAN subfile
Delivery phase
The peak-hour phase in which each user announces a demand and the server sends a single coded message from which every user decodes its requested file using its cache. The length of this message, in file units, is the delivery rate R.
Related: Placement phase, Coded multicasting
Library
The collection of N independent files at the server. Each user requests at most one file per delivery round.
Related: Shared-link network
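The placement entry above can be made concrete with a small sketch of the MAN combinatorial assignment. This is an illustrative sketch under the standard model (K users, integer t = KM/N); the helper name `man_placement` is mine, not the book's notation.

```python
from itertools import combinations
from math import comb

def man_placement(K, t):
    """For each file, create one subfile per t-subset of the K users;
    the subfile indexed by subset S is cached by exactly the users in S."""
    cache = {u: [] for u in range(K)}
    for S in combinations(range(K), t):
        for u in S:
            cache[u].append(S)  # user u stores the subfile indexed by S
    return cache

K, t = 4, 2                      # e.g. M/N = t/K = 1/2
cache = man_placement(K, t)
# Each user stores (K-1 choose t-1) of the (K choose t) subfiles of every
# file, i.e. a fraction t/K = M/N of each file -- exactly its cache budget.
assert all(len(cache[u]) == comb(K - 1, t - 1) for u in cache)
```

The symmetry of the assignment (every t-subset appears exactly once) is what later enables one coded message to serve t + 1 users at a time.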
The Content Delivery Bottleneck
The Peak/Off-Peak Asymmetry Is Everything
The split between placement (off-peak) and delivery (peak) is the conceptual heart of the model. Off-peak, the server has essentially unlimited link capacity: users are not watching movies at 3 AM, so bandwidth is cheap. The server can ship whatever content it wants into every user's cache, subject to the local cache capacity of M files. Peak hours are the opposite: every user wants a movie now, and the aggregate demand overwhelms the link.
A good caching scheme must (i) pre-position content without knowing which movies users will want, and (ii) exploit the coincidences between what different users request to send a short peak-hour message that simultaneously satisfies everyone. The striking result of Chapter 2 is that careful placement, combined with coded multicasting in delivery, can reduce the peak-hour load by a factor that grows linearly with the aggregate cache size KM, not just the per-user cache size M.
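The coincidence-exploiting idea can be seen in the smallest possible instance: two users, each caching half of each file. The placement and file contents below are illustrative, but the mechanism (one XOR serving two different demands) is the coded-multicasting principle the text describes.

```python
# Two users, two files A and B, each split into halves: A = A1|A2, B = B1|B2.
# Placement: user 0 caches (A1, B1); user 1 caches (A2, B2).
A1, A2 = b"aaaa", b"AAAA"
B1, B2 = b"bbbb", b"BBBB"

# Delivery for demands (user 0 wants A, user 1 wants B):
# the server multicasts the single coded message A2 XOR B1.
xor = bytes(x ^ y for x, y in zip(A2, B1))

# User 0 holds B1, so it recovers A2 = xor ^ B1 (it already has A1).
rec_A2 = bytes(x ^ y for x, y in zip(xor, B1))
# User 1 holds A2, so it recovers B1 = xor ^ A2 (it already has B2).
rec_B1 = bytes(x ^ y for x, y in zip(xor, A2))

assert rec_A2 == A2 and rec_B1 == B1
# One half-file transmission serves both users: rate 1/2 instead of 1.
```

Note that the transmitted XOR is useful to both users simultaneously even though they requested different files; that is the gain uncoded popularity caching cannot capture.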
Placement Is Not Free in Practice
The theoretical model assumes the placement phase is "free": it happens off-peak and does not count against the delivery rate. In real systems, the placement traffic does cost bandwidth and energy, and the off-peak-to-peak ratio is not infinite. 5G/6G edge-caching deployments limit cache refreshes to daily or hourly windows, which changes the relevant optimization: the effective cache size must account for how quickly content rotates.
A more refined metric is the total traffic (placement + delivery) over a content refresh cycle, normalized by the number of delivery rounds. For high-churn libraries (news, user-generated video), this metric can make popularity caching almost as good as coded caching. For relatively stable libraries (movie catalogs, cached software updates), coded caching dominates.
- 5G NR MBMS allows multicast placement but rate-limits it to prevent congesting the backhaul.
- Content providers report peak-to-average traffic ratios of 3:1 to 10:1.
- Cache replacement policies (LRU, LFU) implicitly assume zero-cost placement.
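The refined metric can be sketched numerically. The expressions below are assumptions for illustration: placement costs K*M file units per refresh cycle, uncoded popularity caching delivers at rate K(1 - M/N) per round, and coded caching at the MAN rate K(1 - M/N)/(1 + KM/N) (derived in Chapter 2).

```python
def total_traffic_per_round(K, M, N, rounds_per_refresh, coded=True):
    """(Placement + delivery) traffic over one refresh cycle, in file
    units, normalized by the number of delivery rounds in the cycle."""
    placement = K * M                      # each user fills an M-file cache
    miss = K * (1 - M / N)                 # uncached demand per round
    delivery = miss / (1 + K * M / N) if coded else miss
    return (placement + rounds_per_refresh * delivery) / rounds_per_refresh

K, M, N = 30, 10, 100
# High-churn library: caches refreshed every delivery round --
# placement dominates, so coded and uncoded are nearly tied.
high_churn = [total_traffic_per_round(K, M, N, 1, c) for c in (False, True)]
# Stable library: 1000 rounds per refresh -- delivery dominates,
# and coded caching wins by roughly the factor 1 + KM/N.
stable = [total_traffic_per_round(K, M, N, 1000, c) for c in (False, True)]
```

With these (assumed) numbers, high churn gives 327 vs. 306.75 file units per round (a few percent apart), while the stable library gives 27.3 vs. 7.05, matching the text's claim about when each strategy dominates.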
Historical Note: From Akamai to Coded Caching
1998-2014. The idea of caching content closer to users is old. Akamai's distributed content-delivery network launched in 1998 and served 15-30% of all web traffic within a decade, entirely on the strategy of replicating popular content at thousands of edge nodes. The theoretical framework for what Akamai was doing (popularity-based caching over Zipf-distributed demands) was well understood by the early 2000s.
Yet one could sense that the story was incomplete: Akamai's edge servers collectively held much more than the N files of the library, and users with different demands were independent. Why couldn't the edge collectively do more than each node individually? The question was sharpened by Maddah-Ali and Niesen in 2013-2014, who showed, with a combinatorial placement and coded multicasting delivery, that the answer is a multiplicative gain of 1 + KM/N over popularity caching. Twenty years after Akamai, a question the field had grown used to setting aside suddenly had a beautiful information-theoretic answer.
Why This Matters: Caching on the Broadcast Channel
The shared-link model is the simplest (error-free) version of a wireless broadcast channel: one transmitter, K receivers, common rate R. In Chapter 5 we will replace the error-free link with a multi-antenna Gaussian broadcast channel; the number of transmit antennas, L, then provides an additional spatial multiplexing gain on top of the caching gain. The remarkable CommIT result of Lampiris, Caire et al. is that the two gains add: the degrees of freedom satisfy DoF = L + t, where t = KM/N.
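A quick numeric check of the claimed additivity, using DoF = L + KM/N as stated above (parameter values are illustrative; the derivation itself is deferred to Chapter 5):

```python
def dof(L, K, M, N):
    """Degrees of freedom of multi-antenna coded caching under the
    additivity claim: spatial multiplexing gain L plus caching gain KM/N."""
    return L + K * M / N

# K = 20 users, each caching M/N = 1/10 of the library, L = 4 antennas:
# caching alone gives 2, antennas alone give 4, together they give 6.
assert dof(4, 20, 10, 100) == 6.0
```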