Non-Uniform Demand and Zipf Popularity

Real Libraries Have Popularity Skew

All results up to Chapter 12 assumed uniform demand: every file is equally likely to be requested. This simplification is essential for clean analysis but misses a dominant feature of real content libraries: popularity concentration. A few files (newsworthy videos, chart-topping songs) are requested thousands of times more often than the long tail. This Zipf-like structure has been empirically verified in web caching (Breslau 1999), video streaming (Netflix, YouTube), and other CDN workloads.

The key questions:

  1. Under Zipf demand, when does popularity caching beat MAN? MAN is worst-case optimal; popularity caching optimizes expected rate. Which wins in practice?
  2. Can we design a coded-caching scheme that adapts to demand popularity? Yes β€” and the results are order-optimal.

This section sets up the non-uniform demand framework and compares uncoded popularity caching with MAN's worst-case approach.

Definition:

Zipf-Distributed Demand

Under Zipf-distributed demand, the probability that a random user requests file nn is Pnβ€…β€Š=β€…β€Šnβˆ’Ξ±βˆ‘m=1Nmβˆ’Ξ±β€…β€Š=β€…β€Šnβˆ’Ξ±HN,Ξ±,P_n \;=\; \frac{n^{-\alpha}}{\sum_{m=1}^{N} m^{-\alpha}} \;=\; \frac{n^{-\alpha}}{H_{N,\alpha}}, where files are ranked by popularity (n=1n = 1 is most popular) and Ξ±β‰₯0\alpha \geq 0 is the Zipf exponent. Larger Ξ±\alpha means more concentrated demand.

Typical empirical values:

  • Web traffic: α∈[0.6,1.0]\alpha \in [0.6, 1.0] (Breslau 1999).
  • YouTube / video: α∈[0.8,1.2]\alpha \in [0.8, 1.2] (Cha et al. 2007).
  • News articles: α∈[1.0,1.5]\alpha \in [1.0, 1.5].

At Ξ±=0\alpha = 0: uniform demand. At Ξ±=1\alpha = 1: classical Zipf (probability ∝1/n\propto 1/n). At Ξ±>1\alpha > 1: heavy concentration β€” a small tail dominates.

The generalized harmonic number HN,Ξ±=βˆ‘n=1Nnβˆ’Ξ±H_{N,\alpha} = \sum_{n=1}^N n^{-\alpha} is a normalization constant. For Ξ±<1\alpha < 1: HN,α∼N1βˆ’Ξ±H_{N,\alpha} \sim N^{1-\alpha}. For Ξ±>1\alpha > 1: HN,Ξ±β†’ΞΆ(Ξ±)H_{N,\alpha} \to \zeta(\alpha) (Riemann zeta) as Nβ†’βˆžN \to \infty.

Theorem: Expected Rate Under Popularity Caching

Under i.i.d. Zipf-Ξ±\alpha demand with library NN and per-user cache MM containing the top MM popular files (popularity caching), the expected shared-link rate is E[Rpop]β€…β€Š=β€…β€ŠK(1βˆ’HM,Ξ±/HN,Ξ±),\mathbb{E}[R_{\text{pop}}] \;=\; K (1 - H_{M,\alpha}/H_{N,\alpha}), where HK,Ξ±=βˆ‘n=1Knβˆ’Ξ±H_{K,\alpha} = \sum_{n=1}^K n^{-\alpha}.

Each user misses independently with probability 1βˆ’HM,Ξ±/HN,Ξ±1 - H_{M,\alpha}/H_{N,\alpha} (the fraction of demand not in the top-MM files). Under uncoded delivery, each miss costs one unicast. Sum over KK users: expected rate.

Example: Popularity Caching for Video Streaming

For K=100K = 100 users, N=10,000N = 10{,}000 video files, Ξ±=1.0\alpha = 1.0 (typical for YouTube), compute the expected rate at M=100M = 100 (1% cache) under popularity caching.

Expected Rate vs Zipf Exponent

Compares three schemes as Ξ±\alpha varies: (red dashed) popularity caching expected rate decreases sharply with Ξ±\alpha due to hit ratio boost; (blue) MAN worst-case rate is Ξ±\alpha-independent; (purple dashdot) demand-adaptive MAN fits in between. At low Ξ±\alpha (uniform), MAN wins; at high Ξ±\alpha (heavy concentration), popularity caching takes the lead.

Parameters
20
200
0.2

Placement Strategies Comparison

Three placement strategies under Zipf demand: popularity (best expected rate at high Ξ±\alpha), uniform (simple; asymptotically good for MAN gain), and hybrid (70% popular + 30% random) which balances hit rate with coded-multicast opportunity.

Parameters
0.8
1000

Zipf Popularity and Cache Effectiveness

Zipf-distributed demand for three exponents Ξ±=0.3,0.8,1.5\alpha = 0.3, 0.8, 1.5. Larger Ξ±\alpha means steeper popularity concentration. Under Zipf, the effective memory ratio becomes ΞΌeff=ΞΌ1βˆ’Ξ±\mu_{\mathrm{eff}} = \mu^{1-\alpha} β€” amplifying the coded caching gain substantially for moderate-to-high Ξ±\alpha.

The Popularity vs MAN Tradeoff

For system designers, the key intuitions:

  1. MAN is worst-case optimal but demand-agnostic. Delivers guaranteed rate regardless of demand pattern. Good when user demands are unpredictable.
  2. Popularity caching is expectation-optimal but has bad worst-case. If users all request different unpopular files, MAN wins dramatically.
  3. For concentrated demand: popularity caching alone is within a constant factor of expectation-optimal. MAN's worst- case guarantee is often tighter than needed.
  4. For uniform demand: popularity has no advantage; MAN always wins.

In production CDNs, popularity-based (LFU/LRU) schemes dominate because (a) demands are concentrated enough to make popularity work, (b) coding has implementation costs. The CommIT research program has shown that coded schemes (even approximations like PDAs) can reclaim most of MAN's gain while handling Zipf demand β€” see Β§13.4 and Chapter 14.

Common Mistake: Worst-Case vs Expected Confusion

Mistake:

Comparing popularity caching's expected rate with MAN's worst-case rate and concluding "popularity wins."

Correction:

These are different metrics. Worst-case MAN guarantees a rate ≀RMAN\leq R_{\text{MAN}} for any demand. Expected popularity is lower but has no worst-case guarantee. For the same metric:

  • Expected MAN rate: slightly below worst-case (most demand vectors are not adversarial).
  • Worst-case popularity: K(1βˆ’M/N)K(1 - M/N) β€” bad when many users request unpopular files.

A full comparison requires specifying the metric (expected or worst-case) and demand model (uniform or Zipf). Meaningful design decisions require clarity here.

Key Takeaway

Zipf concentration helps caching but does not change the scaling order. At fixed memory ratio M/NM/N, popularity caching's expected rate scales as K(1βˆ’HM,Ξ±/HN,Ξ±)K(1 - H_{M,\alpha}/H_{N,\alpha}). Higher Ξ±\alpha gives smaller rate for the same MM. MAN's coded gain survives β€” Chapter 3's worst-case (1+t)(1 + t) factor is retained β€” but the operational rate in deployed systems depends on both mechanisms working together.