Content Delivery Networks and the Shared-Link Bottleneck
Why Caching Is the Right Question
Every major internet service (Netflix, YouTube, Spotify, TikTok) streaming 4K video to millions of simultaneous viewers faces the same problem. A central content repository must serve a library of N files to a population of K users, where both N and K are enormous. The bottleneck is almost never storage at the server; it is the delivery capacity to the users, especially during peak hours.
The industry's answer for the past two decades has been caching: push a subset of the content close to the user (into a set-top box, a mobile device, an edge server, or a content-delivery node) so that when the user requests something, a local copy may already be at hand. The question this book attacks is: given a fixed per-user cache budget of M files, how much of the peak-hour traffic can we eliminate?
The naive answer, "cache the most popular files," turns out to be provably suboptimal. Information theory lets us do much better, with a gain that grows with the number of users. This chapter sets up the problem; the gain shows up in Chapter 2.
Placement phase
The off-peak phase in which the server populates user caches without knowing future demands. In the MAN scheme, the placement is combinatorial: with t = KM/N, each file is split into (K choose t) subfiles, and each subfile is cached by exactly the t users of a specific t-subset.
Related: Delivery phase, MAN subfile
Delivery phase
The peak-hour phase in which each user announces a demand and the server sends a single coded message from which every user decodes its requested file using its cache. The length of this message, in file units, is the delivery rate R.
Related: Placement phase, Coded multicasting
Library
The collection of N independent files at the server. Each user requests at most one file per delivery round.
Related: Shared-link network
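The placement entry above can be made concrete with a small sketch of the MAN combinatorial assignment. This is an illustrative sketch under the standard model (K users, integer t = KM/N); the helper name `man_placement` is mine, not the book's notation.

```python
from itertools import combinations
from math import comb

def man_placement(K, t):
    """For each file, create one subfile per t-subset of the K users;
    the subfile indexed by subset S is cached by exactly the users in S."""
    cache = {u: [] for u in range(K)}
    for S in combinations(range(K), t):
        for u in S:
            cache[u].append(S)  # user u stores the subfile indexed by S
    return cache

K, t = 4, 2                      # e.g. M/N = t/K = 1/2
cache = man_placement(K, t)
# Each user stores (K-1 choose t-1) of the (K choose t) subfiles of every
# file, i.e. a fraction t/K = M/N of each file -- exactly its cache budget.
assert all(len(cache[u]) == comb(K - 1, t - 1) for u in cache)
```

The symmetry of the assignment (every t-subset appears exactly once) is what later enables one coded message to serve t + 1 users at a time.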
The Content Delivery Bottleneck
The Peak/Off-Peak Asymmetry Is Everything
The split between placement (off-peak) and delivery (peak) is the conceptual heart of the model. Off-peak, the server has essentially unlimited link capacity: users are not watching movies at 3 AM, so bandwidth is cheap. The server can ship whatever content it wants into every user's cache, subject to the local cache capacity of M files. Peak hours are the opposite: every user wants a movie now, and the aggregate demand overwhelms the link.
A good caching scheme must (i) pre-position content without knowing which movies users will want, and (ii) exploit the coincidences between what different users request to send a short peak-hour message that simultaneously satisfies everyone. The striking result of Chapter 2 is that careful placement, combined with coded multicasting in delivery, can reduce the peak-hour load by a factor that grows linearly with the aggregate cache size KM, not just the per-user cache size M.
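The coincidence-exploiting idea can be seen in the smallest possible instance: two users, each caching half of each file. The placement and file contents below are illustrative, but the mechanism (one XOR serving two different demands) is the coded-multicasting principle the text describes.

```python
# Two users, two files A and B, each split into halves: A = A1|A2, B = B1|B2.
# Placement: user 0 caches (A1, B1); user 1 caches (A2, B2).
A1, A2 = b"aaaa", b"AAAA"
B1, B2 = b"bbbb", b"BBBB"

# Delivery for demands (user 0 wants A, user 1 wants B):
# the server multicasts the single coded message A2 XOR B1.
xor = bytes(x ^ y for x, y in zip(A2, B1))

# User 0 holds B1, so it recovers A2 = xor ^ B1 (it already has A1).
rec_A2 = bytes(x ^ y for x, y in zip(xor, B1))
# User 1 holds A2, so it recovers B1 = xor ^ A2 (it already has B2).
rec_B1 = bytes(x ^ y for x, y in zip(xor, A2))

assert rec_A2 == A2 and rec_B1 == B1
# One half-file transmission serves both users: rate 1/2 instead of 1.
```

Note that the transmitted XOR is useful to both users simultaneously even though they requested different files; that is the gain uncoded popularity caching cannot capture.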
Placement Is Not Free in Practice
The theoretical model assumes the placement phase is "free": it happens off-peak and does not count against the delivery rate. In real systems, the placement traffic does cost bandwidth and energy, and the off-peak-to-peak ratio is not infinite. 5G/6G edge-caching deployments limit cache refreshes to daily or hourly windows, which changes the relevant optimization: the effective cache size must account for how quickly content rotates.
A more refined metric is the total traffic (placement + delivery) over a content refresh cycle, normalized by the number of delivery rounds. For high-churn libraries (news, user-generated video), this metric can make popularity caching almost as good as coded caching. For relatively stable libraries (movie catalogs, cached software updates), coded caching dominates.
- 5G NR MBMS allows multicast placement but rate-limits it to prevent congesting the backhaul.
- Content providers report peak-to-average traffic ratios of 3:1 to 10:1.
- Cache replacement policies (LRU, LFU) implicitly assume zero-cost placement.
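The refined metric can be sketched numerically. The expressions below are assumptions for illustration: placement costs K*M file units per refresh cycle, uncoded popularity caching delivers at rate K(1 - M/N) per round, and coded caching at the MAN rate K(1 - M/N)/(1 + KM/N) (derived in Chapter 2).

```python
def total_traffic_per_round(K, M, N, rounds_per_refresh, coded=True):
    """(Placement + delivery) traffic over one refresh cycle, in file
    units, normalized by the number of delivery rounds in the cycle."""
    placement = K * M                      # each user fills an M-file cache
    miss = K * (1 - M / N)                 # uncached demand per round
    delivery = miss / (1 + K * M / N) if coded else miss
    return (placement + rounds_per_refresh * delivery) / rounds_per_refresh

K, M, N = 30, 10, 100
# High-churn library: caches refreshed every delivery round --
# placement dominates, so coded and uncoded are nearly tied.
high_churn = [total_traffic_per_round(K, M, N, 1, c) for c in (False, True)]
# Stable library: 1000 rounds per refresh -- delivery dominates,
# and coded caching wins by roughly the factor 1 + KM/N.
stable = [total_traffic_per_round(K, M, N, 1000, c) for c in (False, True)]
```

With these (assumed) numbers, high churn gives 327 vs. 306.75 file units per round (a few percent apart), while the stable library gives 27.3 vs. 7.05, matching the text's claim about when each strategy dominates.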
Historical Note: From Akamai to Coded Caching
1998-2014. The idea of caching content closer to users is old. Akamai's distributed content-delivery network launched in 1998 and served 15-30% of all web traffic within a decade, entirely on the strategy of replicating popular content at thousands of edge nodes. The theoretical framework for what Akamai was doing (popularity-based caching over Zipf-distributed demands) was well understood by the early 2000s.
Yet one could sense that the story was incomplete: Akamai's edge servers collectively held much more than the N files of the library, and users with different demands were independent. Why couldn't the edge collectively do more than each node individually? The question was sharpened by Maddah-Ali and Niesen in 2013-2014, who showed, with a combinatorial placement and coded multicasting delivery, that the answer is a multiplicative gain of 1 + KM/N over popularity caching. Twenty years after Akamai, a question the field had grown used to setting aside suddenly had a beautiful information-theoretic answer.
Why This Matters: Caching on the Broadcast Channel
The shared-link model is the simplest (error-free) version of a wireless broadcast channel: one transmitter, K receivers, common rate R. In Chapter 5 we will replace the error-free link with a multi-antenna Gaussian broadcast channel; the number of transmit antennas, L, then provides an additional spatial multiplexing gain on top of the caching gain. The remarkable CommIT result of Lampiris, Caire et al. is that the two gains add: the degrees of freedom satisfy DoF = L + t, where t = KM/N.
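A quick numeric check of the claimed additivity, using DoF = L + KM/N as stated above (parameter values are illustrative; the derivation itself is deferred to Chapter 5):

```python
def dof(L, K, M, N):
    """Degrees of freedom of multi-antenna coded caching under the
    additivity claim: spatial multiplexing gain L plus caching gain KM/N."""
    return L + K * M / N

# K = 20 users, each caching M/N = 1/10 of the library, L = 4 antennas:
# caching alone gives 2, antennas alone give 4, together they give 6.
assert dof(4, 20, 10, 100) == 6.0
```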