Content Delivery Networks and the Shared-Link Bottleneck

Why Caching Is the Right Question

Every major internet service (Netflix, YouTube, Spotify, TikTok, 4K streaming to millions of simultaneous viewers) faces the same problem. A central content repository must serve a library of files $W_1, \ldots, W_N$ to a population of $K$ users, where both $K$ and $N$ are enormous. The bottleneck is almost never storage at the server; it is the delivery capacity to the users, especially during peak hours.

The industry's answer for the past two decades has been caching: push a subset of the content close to the user (into a set-top box, a mobile device, an edge server, or a content-delivery node) so that when the user requests something, a local copy may already be at hand. The question this book attacks is: given a fixed per-user cache budget $M$, how much of the peak-hour traffic can we eliminate?

The naive answer, "cache the most popular files", turns out to be provably suboptimal. Information theory lets us do much better, with a gain that grows with the number of users. This chapter sets up the problem; the gain shows up in Chapter 2.

Placement phase

The off-peak phase in which the server populates user caches without knowing future demands. In the MAN scheme, the placement is combinatorial: each file is split into $\binom{K}{t}$ subfiles, and each subfile is cached by exactly $t$ specific users (sketched in code after this entry).

Related: Delivery phase, MAN subfile
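
As a concrete illustration of this combinatorial placement, here is a minimal Python sketch; the function name and data layout are ours, not the book's notation. Every file is split into $\binom{K}{t}$ subfiles indexed by $t$-subsets of users, and user $k$ caches exactly the subfiles whose subset contains $k$, filling a fraction $t/K = M/N$ of each file.

```python
# Minimal sketch of MAN placement (illustrative names/data layout).
# Assumes t = K*M/N is an integer, so each user stores a fraction
# t/K = M/N of every file.
from itertools import combinations

def man_placement(files: dict[str, bytes], K: int, t: int) -> dict[int, dict]:
    """Return cache contents Z[k] for each user k in 0..K-1."""
    subsets = list(combinations(range(K), t))    # the C(K, t) user subsets
    caches = {k: {} for k in range(K)}
    for name, data in files.items():
        size = -(-len(data) // len(subsets))     # ceil: subfile length
        for i, S in enumerate(subsets):
            # Pad chunks so all subfiles have equal length (XOR-safe later).
            subfile = data[i * size:(i + 1) * size].ljust(size, b"\0")
            for k in S:                          # cached by the t users in S
                caches[k][(name, S)] = subfile
    return caches
```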

Delivery phase

The peak-hour phase in which each user $k$ announces a demand $d_k$ and the server sends a single coded message $X_{\mathbf{d}}$ from which every user decodes its requested file using its cache. The length of this message, in file units, is the delivery rate $R$ (a sketch follows this entry).

Related: Placement phase, Coded multicasting
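
Continuing the placement sketch above, the delivery can be sketched as one XOR per $(t+1)$-subset of users; again the names and layout are our illustration, assuming equal-length (padded) subfiles.

```python
# Coded delivery continuing the placement sketch: one XOR per (t+1)-subset.
# The server knows the placement, so for brevity we read subfiles back from
# the cache table; a real server would recompute them from the library.
from itertools import combinations

def man_delivery(caches, demands: dict[int, str], K: int, t: int):
    """Return the coded message X_d as a list of (subset, XOR-payload)."""
    messages = []
    for S in combinations(range(K), t + 1):
        payload = None
        for k in S:
            others = tuple(u for u in S if u != k)   # the t-subset S \ {k}
            # Subfile of user k's demanded file that the users in `others` cache:
            sub = caches[others[0]][(demands[k], others)]
            payload = sub if payload is None else bytes(a ^ b for a, b in zip(payload, sub))
        messages.append((S, payload))
    # C(K, t+1) messages of 1/C(K, t) file units each => rate (K - t)/(1 + t),
    # i.e. K(1 - M/N) / (1 + K*M/N) file units in total.
    return messages
```

User $k$ can peel its own subfile out of every message whose subset contains $k$, because it caches the other $t$ subfiles in that XOR.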

Library

The collection $\mathcal{W} = \{W_1, \ldots, W_N\}$ of $N$ independent files at the server. Each user requests at most one file per delivery round.

Related: Shared-link network

The Content Delivery Bottleneck

Figure: Schematic of the shared-link network. The server on the left holds the library $W_1, \ldots, W_N$; the shared link is the bottleneck (drawn in red); the $K$ users on the right have caches $\mathcal{Z}_k$. The two phases of the scheme, placement and delivery, must together deliver the files requested by the demand vector $\mathbf{d}$.

The Peak/Off-Peak Asymmetry Is Everything

The split between placement (off-peak) and delivery (peak) is the conceptual heart of the model. Off-peak, the server has essentially unlimited link capacity: users are not watching movies at 3 AM, so bandwidth is cheap. The server can ship whatever content it wants into every user's cache, subject to the local cache capacity $M$. Peak hours are the opposite: every user wants a movie now, and the aggregate demand overwhelms the link.

A good caching scheme must (i) pre-position content without knowing which movies users will want, and (ii) exploit the coincidences between what different users request to send a short peak-hour message that simultaneously satisfies everyone. The striking result of Chapter 2 is that careful placement, combined with coded multicasting in delivery, can reduce the peak-hour load by a factor that grows linearly with the aggregate cache size, not just the per-user cache size.
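
To make that factor concrete, here is an illustrative comparison using the delivery-rate formula developed in Chapter 2, $R = K(1 - M/N)/(1 + KM/N)$, under the assumption that $t = KM/N$ is an integer; the parameter values are made up.

```python
# Illustrative comparison (Chapter 2 formula; assumes t = K*M/N is an integer).
def uncoded_rate(K: int, M: int, N: int) -> float:
    # Popularity caching: every user still misses a (1 - M/N) fraction.
    return K * (1 - M / N)

def coded_rate(K: int, M: int, N: int) -> float:
    t = K * M // N                      # aggregate-cache parameter
    return K * (1 - M / N) / (1 + t)    # MAN delivery rate

K, N, M = 30, 30, 10                    # t = 10
print(uncoded_rate(K, M, N))            # 20.0 file units on the shared link
print(coded_rate(K, M, N))              # ~1.82: better by the factor 1 + KM/N = 11
```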

⚠️ Engineering Note

Placement Is Not Free in Practice

The theoretical model assumes the placement phase is "free": it happens off-peak and does not count against the delivery rate. In real systems, the placement traffic does cost bandwidth and energy, and the off-peak vs. peak ratio is not infinite. 5G/6G edge-caching deployments limit cache refreshes to daily or hourly windows, which changes the relevant optimization: the effective cache size must account for how quickly content rotates.

A more refined metric is the total traffic (placement + delivery) over a content refresh cycle, normalized by the number of delivery rounds. For high-churn libraries (news, user-generated video), this metric can make popularity caching almost as good as coded caching. For relatively stable libraries (movie catalogs, cached software updates), coded caching dominates.
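
A minimal sketch of that amortized metric, with the caveat that weighting off-peak placement bytes the same as peak delivery bytes is our simplifying assumption:

```python
# One possible form of the amortized metric described above (the equal
# weighting of off-peak and peak bytes is our assumption, not a standard).
def traffic_per_round(K: int, M: int, delivery_rate: float, rounds: int) -> float:
    """Average traffic per delivery round over one refresh cycle: the K*M
    file units pushed into caches are amortized over `rounds` rounds."""
    placement_cost = K * M
    return delivery_rate + placement_cost / rounds

# High churn (rounds ~ 1): placement dominates and the coded gain washes out.
# Stable library (rounds >> K*M / R): the metric approaches the delivery rate.
```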

Practical Constraints

• 5G NR MBMS allows multicast placement but rate-limits it to prevent congesting the backhaul
• Content providers report peak-to-average traffic ratios of 3:1 to 10:1
• Cache replacement policies (LRU, LFU) implicitly assume zero-cost placement

Historical Note: From Akamai to Coded Caching

1998–2014

The idea of caching content closer to users is old. Akamai's distributed content-delivery network launched in 1998 and served 15–30% of all web traffic within a decade, entirely on the strategy of replicating popular content at thousands of edge nodes. The theoretical framework for what Akamai was doing (popularity-based caching over Zipf-distributed demands) was well understood by the early 2000s.

Yet one could sense that the story was incomplete: Akamai's edge servers collectively held much more than $N$ files, and different users' demands were independent. Why couldn't the edge collectively do more than each node individually? The question was sharpened by Maddah-Ali and Niesen in 2013–2014, who showed, with a combinatorial placement and a coded multicasting delivery, that the answer is a multiplicative gain of $1 + KM/N$ over popularity caching. Twenty years after Akamai, a question the field had grown used to setting aside suddenly had a beautiful information-theoretic answer.

Why This Matters: Caching on the Broadcast Channel

The shared-link model is the simplest (error-free) version of a wireless broadcast channel: one transmitter, $K$ receivers, common rate $R$. In Chapter 5 we will replace the error-free link with a multi-antenna Gaussian broadcast channel; the number of transmit antennas $L$ then provides an additional spatial multiplexing gain on top of the caching gain. The remarkable CommIT result of Lampiris, Caire et al. is that the two gains add: the degrees of freedom satisfy $\mathrm{DoF} = t + L$, where $t = K \cdot M/N$.
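
As a quick numeric sanity check of this expression (made-up parameter values, assuming $t$ is an integer):

```python
# Numeric sanity check of DoF = t + L (made-up values; assumes integer t).
def dof(K: int, M: int, N: int, L: int) -> int:
    t = K * M // N          # caching gain: with L = 1, each message serves t + 1 users
    return t + L            # spatial multiplexing adds L on top

print(dof(K=20, M=5, N=20, L=4))   # t = 5, so 9 users served per transmission
```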