Non-Uniform Demand and Zipf Popularity
Real Libraries Have Popularity Skew
All results up to Chapter 12 assumed uniform demand: every file is equally likely to be requested. This simplification is essential for clean analysis but misses a dominant feature of real content libraries: popularity concentration. A few files (newsworthy videos, chart-topping songs) are requested thousands of times more often than the long tail. This Zipf-like structure has been empirically verified in web caching (Breslau 1999), video streaming (Netflix, YouTube), and other CDN workloads.
The key questions:
- Under Zipf demand, when does popularity caching beat MAN? MAN is worst-case optimal; popularity caching optimizes expected rate. Which wins in practice?
- Can we design a coded-caching scheme that adapts to demand popularity? Yes β and the results are order-optimal.
This section sets up the non-uniform demand framework and compares uncoded popularity caching with MAN's worst-case approach.
Definition: Zipf-Distributed Demand
Zipf-Distributed Demand
Under Zipf-distributed demand, the probability that a random user requests file is where files are ranked by popularity ( is most popular) and is the Zipf exponent. Larger means more concentrated demand.
Typical empirical values:
- Web traffic: (Breslau 1999).
- YouTube / video: (Cha et al. 2007).
- News articles: .
At : uniform demand. At : classical Zipf (probability ). At : heavy concentration β a small tail dominates.
The generalized harmonic number is a normalization constant. For : . For : (Riemann zeta) as .
Theorem: Expected Rate Under Popularity Caching
Under i.i.d. Zipf- demand with library and per-user cache containing the top popular files (popularity caching), the expected shared-link rate is where .
Each user misses independently with probability (the fraction of demand not in the top- files). Under uncoded delivery, each miss costs one unicast. Sum over users: expected rate.
Per-user miss probability
User has the top files cached. Its demand is Zipf-distributed; it's a hit iff . .
Expected rate
Users are independent; each miss costs one unicast. Expected total unicasts: .
Uniform limit
As : . Expected rate reduces to MAN uncoded worst-case β consistent.
Example: Popularity Caching for Video Streaming
For users, video files, (typical for YouTube), compute the expected rate at (1% cache) under popularity caching.
Compute harmonic numbers
. . Hit probability: .
Expected rate
files/round. Without cache: 100 files/round β popularity caching halves the load.
MAN worst-case
MAN at : files/round.
Comparison
Popularity caching (47) is better than MAN worst-case (49.5) in expected rate for this . At higher (more concentrated), popularity wins by more. At lower : MAN wins.
Which to use?
Pure popularity caching wins in expectation for concentrated demand; MAN wins in worst case (and uniformly average-case at low ). Hybrid schemes (Β§13.4) get the best of both.
Expected Rate vs Zipf Exponent
Compares three schemes as varies: (red dashed) popularity caching expected rate decreases sharply with due to hit ratio boost; (blue) MAN worst-case rate is -independent; (purple dashdot) demand-adaptive MAN fits in between. At low (uniform), MAN wins; at high (heavy concentration), popularity caching takes the lead.
Parameters
Placement Strategies Comparison
Three placement strategies under Zipf demand: popularity (best expected rate at high ), uniform (simple; asymptotically good for MAN gain), and hybrid (70% popular + 30% random) which balances hit rate with coded-multicast opportunity.
Parameters
Zipf Popularity and Cache Effectiveness
The Popularity vs MAN Tradeoff
For system designers, the key intuitions:
- MAN is worst-case optimal but demand-agnostic. Delivers guaranteed rate regardless of demand pattern. Good when user demands are unpredictable.
- Popularity caching is expectation-optimal but has bad worst-case. If users all request different unpopular files, MAN wins dramatically.
- For concentrated demand: popularity caching alone is within a constant factor of expectation-optimal. MAN's worst- case guarantee is often tighter than needed.
- For uniform demand: popularity has no advantage; MAN always wins.
In production CDNs, popularity-based (LFU/LRU) schemes dominate because (a) demands are concentrated enough to make popularity work, (b) coding has implementation costs. The CommIT research program has shown that coded schemes (even approximations like PDAs) can reclaim most of MAN's gain while handling Zipf demand β see Β§13.4 and Chapter 14.
Common Mistake: Worst-Case vs Expected Confusion
Mistake:
Comparing popularity caching's expected rate with MAN's worst-case rate and concluding "popularity wins."
Correction:
These are different metrics. Worst-case MAN guarantees a rate for any demand. Expected popularity is lower but has no worst-case guarantee. For the same metric:
- Expected MAN rate: slightly below worst-case (most demand vectors are not adversarial).
- Worst-case popularity: β bad when many users request unpopular files.
A full comparison requires specifying the metric (expected or worst-case) and demand model (uniform or Zipf). Meaningful design decisions require clarity here.
Key Takeaway
Zipf concentration helps caching but does not change the scaling order. At fixed memory ratio , popularity caching's expected rate scales as . Higher gives smaller rate for the same . MAN's coded gain survives β Chapter 3's worst-case factor is retained β but the operational rate in deployed systems depends on both mechanisms working together.