Ferkans — Interactive Telecom Tutor

Real Libraries Have Popularity Skew

All results up to Chapter 12 assumed uniform demand: every file is equally likely to be requested. This simplification is essential for clean analysis but misses a dominant feature of real content libraries: popularity concentration. A few files (newsworthy videos, chart-topping songs) are requested thousands of times more often than the long tail. This Zipf-like structure has been empirically verified in web caching (Breslau 1999), video streaming (Netflix, YouTube), and other CDN workloads.

The key questions:

Under Zipf demand, when does popularity caching beat MAN? MAN is worst-case optimal; popularity caching optimizes expected rate. Which wins in practice?
Can we design a coded-caching scheme that adapts to demand popularity? Yes — and the results are order-optimal.

This section sets up the non-uniform demand framework and compares uncoded popularity caching with MAN's worst-case approach.

Definition:
Zipf-Distributed Demand

Under Zipf-distributed demand, the probability that a random user requests file $n$ is $P_n \;=\; \frac{n^{-\alpha}}{\sum_{m=1}^{N} m^{-\alpha}} \;=\; \frac{n^{-\alpha}}{H_{N,\alpha}},$ where files are ranked by popularity ( $n = 1$ is most popular) and $\alpha \geq 0$ is the Zipf exponent. Larger $\alpha$ means more concentrated demand.
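A minimal Python sketch of the definition, assuming files are indexed by popularity rank $1, \dots, N$. The helper names `zipf_pmf` and `sample_demands` are illustrative and do not come from any library. The sketch builds $P_n$ by normalizing $n^{-\alpha}$ and draws i.i.d. user requests with the standard-library `random.choices`.

```python
import random


def zipf_pmf(N: int, alpha: float) -> list[float]:
    """Return [P_1, ..., P_N] with P_n = n^(-alpha) / H_{N,alpha}; rank 1 is most popular."""
    weights = [n ** (-alpha) for n in range(1, N + 1)]
    H = sum(weights)  # generalized harmonic number H_{N,alpha}
    return [w / H for w in weights]


def sample_demands(K: int, N: int, alpha: float, rng: random.Random) -> list[int]:
    """Draw one Zipf-distributed file index in 1..N for each of K users, i.i.d."""
    return rng.choices(range(1, N + 1), weights=zipf_pmf(N, alpha), k=K)


if __name__ == "__main__":
    pmf = zipf_pmf(1000, 0.8)
    print(f"P_1 = {pmf[0]:.4f}, P_1000 = {pmf[-1]:.6f}, sum = {sum(pmf):.6f}")
    print(sample_demands(10, 1000, 0.8, random.Random(0)))
```

Setting `alpha = 0.0` makes every weight equal to 1, which recovers uniform demand.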

Typical empirical values:

Web traffic: $\alpha \in [0.6, 1.0]$ (Breslau 1999).
YouTube / video: $\alpha \in [0.8, 1.2]$ (Cha et al. 2007).
News articles: $\alpha \in [1.0, 1.5]$.

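At $\alpha = 0$: uniform demand. At $\alpha = 1$: classical Zipf (probability $\propto 1/n$). At $\alpha > 1$: heavy concentration. A small head of popular files keeps a non-vanishing share of demand even as $N \to \infty$ (for example, $P_1 \to 1/\zeta(\alpha)$).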

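The generalized harmonic number $H_{N,\alpha} = \sum_{n=1}^N n^{-\alpha}$ is a normalization constant. For $\alpha < 1$: $H_{N,\alpha} \sim N^{1-\alpha}/(1-\alpha)$. For $\alpha = 1$: $H_{N,1} \approx \ln N + \gamma$, where $\gamma \approx 0.577$ is the Euler-Mascheroni constant. For $\alpha > 1$: $H_{N,\alpha} \to \zeta(\alpha)$ (Riemann zeta) as $N \to \infty$.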
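The asymptotic forms can be checked numerically. In this sketch the helper names are hypothetical. `gen_harmonic` sums $n^{-\alpha}$ directly. `harmonic_leading_term` returns $N^{1-\alpha}/(1-\alpha)$ for $\alpha < 1$ and $\ln N + \gamma$ for $\alpha = 1$. The $\alpha > 1$ case is checked against $\zeta(2) = \pi^2/6$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gen_harmonic(N: int, alpha: float) -> float:
    """Exact H_{N,alpha} = sum_{n=1}^{N} n^(-alpha) by direct summation."""
    return math.fsum(n ** (-alpha) for n in range(1, N + 1))


def harmonic_leading_term(N: int, alpha: float) -> float:
    """Leading-order approximation of H_{N,alpha} for alpha <= 1."""
    if alpha < 1:
        # The next correction is the constant zeta(alpha), negative for 0 < alpha < 1,
        # so the relative error shrinks slowly when alpha is close to 1.
        return N ** (1 - alpha) / (1 - alpha)
    if alpha == 1:
        return math.log(N) + EULER_GAMMA  # error is about 1/(2N)
    raise ValueError("for alpha > 1, H_{N,alpha} converges to zeta(alpha) instead")


if __name__ == "__main__":
    N = 10_000
    for alpha in (0.5, 0.8, 1.0):
        print(f"alpha={alpha}: exact={gen_harmonic(N, alpha):.4f}, "
              f"leading={harmonic_leading_term(N, alpha):.4f}")
    # alpha = 2 > 1: the partial sum approaches zeta(2) = pi^2 / 6
    print(f"H_(N,2)={gen_harmonic(N, 2.0):.6f}, zeta(2)={math.pi ** 2 / 6:.6f}")
```

At $\alpha = 0.8$ the leading term still overshoots noticeably at $N = 10{,}000$, so the "$\sim$" should be read as a statement about the limit $N \to \infty$.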

Theorem: Expected Rate Under Popularity Caching

Under i.i.d. Zipf-$\alpha$ demand with library size $N$ and per-user cache size $M$, suppose every user caches the $M$ most popular files (popularity caching) and every miss is served by its own unicast (uncoded delivery). The expected shared-link rate is $\mathbb{E}[R_{\text{pop}}] \;=\; K (1 - H_{M,\alpha}/H_{N,\alpha}),$ where $H_{L,\alpha} = \sum_{n=1}^{L} n^{-\alpha}$ for any integer $L \geq 1$.

Each user misses independently with probability $1 - H_{M,\alpha}/H_{N,\alpha}$, the share of demand outside the top-$M$ files. Under uncoded delivery each miss costs one unicast, so summing over the $K$ users gives the expected rate.

Proof

Per-user miss probability

User $k$ has the top $M$ files cached. Its demand $d_k$ is Zipf-distributed, and it is a hit iff $d_k \leq M$. Hence $\Pr(d_k \leq M) = \sum_{n=1}^M P_n = H_{M,\alpha}/H_{N,\alpha}$.

Expected rate

Each miss costs one unicast. By linearity of expectation, the expected number of unicasts is $K \Pr(\text{miss}) = K(1 - H_{M,\alpha}/H_{N,\alpha})$. $\blacksquare$

Uniform limit

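At $\alpha = 0$: $H_{M,0}/H_{N,0} = M/N$. The expected rate therefore reduces to $K(1 - M/N)$, the rate of conventional uncoded caching (local caching gain only, with no coded-multicasting gain). This is consistent with uniform demand.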
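A Monte Carlo check of the theorem uses the same setup: every user caches files $1, \dots, M$, and each miss costs one unicast. The function names and the parameters in the usage block ($K = 20$, $N = 200$, $M = 40$, $\alpha = 0.8$) are illustrative choices.

```python
import random


def gen_harmonic(L: int, alpha: float) -> float:
    """H_{L,alpha} = sum_{n=1}^{L} n^(-alpha)."""
    return sum(n ** (-alpha) for n in range(1, L + 1))


def expected_rate_pop(K: int, N: int, M: int, alpha: float) -> float:
    """Closed form from the theorem: K * (1 - H_{M,alpha} / H_{N,alpha})."""
    return K * (1 - gen_harmonic(M, alpha) / gen_harmonic(N, alpha))


def simulated_rate_pop(K: int, N: int, M: int, alpha: float,
                       rounds: int, seed: int = 0) -> float:
    """Average number of unicasts per round when every user caches files 1..M."""
    rng = random.Random(seed)
    files = range(1, N + 1)
    weights = [n ** (-alpha) for n in files]  # unnormalized Zipf weights suffice
    total = 0
    for _ in range(rounds):
        demands = rng.choices(files, weights=weights, k=K)  # i.i.d. Zipf requests
        total += sum(1 for d in demands if d > M)           # one unicast per miss
    return total / rounds


if __name__ == "__main__":
    K, N, M, alpha = 20, 200, 40, 0.8
    print(expected_rate_pop(K, N, M, alpha), simulated_rate_pop(K, N, M, alpha, rounds=5000))
    # Uniform limit: alpha = 0 recovers K * (1 - M / N)
    print(expected_rate_pop(K, N, M, 0.0), K * (1 - M / N))
```

The simulated average should agree with the closed form up to Monte Carlo noise, which shrinks as `rounds` grows.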

Example: Popularity Caching for Video Streaming

For $K = 100$ users, $N = 10{,}000$ video files, $\alpha = 1.0$ (typical for YouTube), compute the expected rate at $M = 100$ (1% cache) under popularity caching.

Solution

Compute harmonic numbers

$H_{10000, 1} \approx \ln 10000 + \gamma \approx 9.79$. $H_{100, 1} \approx \ln 100 + \gamma + \tfrac{1}{200} \approx 5.19$ (at this size the $1/(2n)$ correction shows up in the second decimal). Hit probability: $5.19/9.79 \approx 0.53$.

Expected rate

$\mathbb{E}[R_{\text{pop}}] = 100 \cdot (1 - 0.53) = 47$ files/round. Without cache: 100 files/round — popularity caching halves the load.

MAN worst-case

MAN at $t = KM/N = 100 \cdot 100/10000 = 1$: $R_{\text{MAN}} = K(1 - M/N)/(1 + t) = 100 \cdot 0.99/2 = 49.5$ files/round.

Comparison

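Popularity caching's expected rate (47) is slightly below MAN's worst-case rate (49.5) for this $\alpha$. Strictly, this compares an expected rate with a worst-case rate (see Common Mistake below). At higher $\alpha$ (more concentrated demand), popularity caching's advantage grows. At lower $\alpha$, MAN wins.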
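The example's arithmetic can be reproduced in a few lines of Python. This sketch computes the harmonic numbers exactly rather than through the $\ln n + \gamma$ approximation. It uses the MAN worst-case formula $K(1 - M/N)/(1 + t)$, with $t$ an integer here.

```python
def gen_harmonic(L: int, alpha: float) -> float:
    """Exact H_{L,alpha}; the text approximates H_{L,1} by ln L + gamma."""
    return sum(n ** (-alpha) for n in range(1, L + 1))


K, N, M, alpha = 100, 10_000, 100, 1.0

hit = gen_harmonic(M, alpha) / gen_harmonic(N, alpha)  # top-M hit probability
rate_pop = K * (1 - hit)                               # expected rate, popularity caching
t = K * M // N                                         # MAN parameter t = KM/N (= 1 here)
rate_man = K * (1 - M / N) / (1 + t)                   # MAN worst-case rate

print(f"hit = {hit:.3f}, E[R_pop] = {rate_pop:.1f}, R_MAN = {rate_man:.1f}")
# prints: hit = 0.530, E[R_pop] = 47.0, R_MAN = 49.5
```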

Which to use?

Pure popularity caching wins in expectation for concentrated demand. MAN wins in the worst case, and also on average at low $\alpha$. Hybrid schemes (§13.4) get the best of both.

Expected Rate vs Zipf Exponent

Compares three schemes as $\alpha$ varies:

(red dashed) Popularity caching's expected rate decreases sharply with $\alpha$ as the hit ratio grows.
(blue) MAN's worst-case rate is $\alpha$-independent.
(purple dash-dot) Demand-adaptive MAN lies in between.

At low $\alpha$ (near-uniform demand), MAN wins. At high $\alpha$ (heavy concentration), popularity caching takes the lead.

(Interactive figure. Default parameters: users $K = 20$, library size $N = 200$, memory ratio $M/N = 0.2$.)
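This sketch regenerates the figure's two closed-form curves at the default parameters: popularity caching's expected rate and MAN's worst-case rate. It then locates their crossing by bisection. The bisection relies on the top-$M$ hit ratio growing monotonically in $\alpha$, which holds for Zipf demand. The demand-adaptive MAN curve is not reproduced, because its scheme is not specified in this section. All function names are hypothetical.

```python
def gen_harmonic(L: int, alpha: float) -> float:
    return sum(n ** (-alpha) for n in range(1, L + 1))


def rate_pop(K: int, N: int, M: int, alpha: float) -> float:
    """Expected rate of popularity caching (top-M files, one unicast per miss)."""
    return K * (1 - gen_harmonic(M, alpha) / gen_harmonic(N, alpha))


def rate_man_worst(K: int, N: int, M: int) -> float:
    """MAN worst-case rate K(1 - M/N)/(1 + t), assuming t = KM/N is an integer."""
    t = K * M / N
    return K * (1 - M / N) / (1 + t)


def crossover_alpha(K: int, N: int, M: int, lo: float = 0.0, hi: float = 3.0,
                    iters: int = 50) -> float:
    """Bisection for the alpha where rate_pop equals the MAN worst-case rate.

    Relies on rate_pop decreasing in alpha and on a sign change inside [lo, hi].
    """
    target = rate_man_worst(K, N, M)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_pop(K, N, M, mid) > target:
            lo = mid  # popularity still worse: crossover lies at larger alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    K, N = 20, 200
    M = N // 5  # memory ratio M/N = 0.2, so M = 40 and t = 4
    for alpha in (0.0, 0.4, 0.8, 1.2, 1.6, 2.0):
        print(f"alpha={alpha:.1f}  E[R_pop]={rate_pop(K, N, M, alpha):6.2f}  "
              f"R_MAN={rate_man_worst(K, N, M):5.2f}")
    print(f"crossover near alpha = {crossover_alpha(K, N, M):.3f}")
```

The crossover reported here compares an expected rate with a worst-case rate, so it marks where the two curves in the plot meet. It is not a like-for-like optimality threshold.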

Placement Strategies Comparison

Three placement strategies under Zipf demand:

Popularity: best expected rate at high $\alpha$.
Uniform: simple, and asymptotically good for the MAN coded gain.
Hybrid (70% popular + 30% random): balances hit rate against coded-multicast opportunity.

(Interactive figure. Default parameters: Zipf exponent $\alpha = 0.8$, library size $N = 1000$.)
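The local-hit part of this comparison can be sketched in closed form. The placement model is an assumption for illustration:

The cache size $M = 100$ is chosen here; the figure does not give it.
Popularity placement stores the top-$M$ files whole.
Uniform placement stores an $M/N$ fraction of every file.
Hybrid placement stores the top $0.7M$ files whole and spreads the remaining $0.3M$ evenly over the other files.

Only local hits are counted. The coded-multicast gain that motivates uniform and hybrid placement is deliberately left out.

```python
def gen_harmonic(L: int, alpha: float) -> float:
    return sum(n ** (-alpha) for n in range(1, L + 1))


def local_hit_ratios(N: int, M: int, alpha: float, pop_share: float = 0.7) -> dict:
    """Expected fraction of a requested file found in the local cache.

    Hypothetical placement models (not specified by the text):
      popularity: files 1..M stored whole
      uniform:    an M/N fraction of every file stored (MAN-style)
      hybrid:     files 1..M_pop stored whole, M - M_pop spread evenly over the rest
    Coded-multicast gains are ignored; only local hits are counted.
    """
    H_N = gen_harmonic(N, alpha)
    popularity = gen_harmonic(M, alpha) / H_N
    uniform = M / N
    M_pop = round(pop_share * M)
    head = gen_harmonic(M_pop, alpha) / H_N     # demand mass of whole-cached files
    tail_fraction = (M - M_pop) / (N - M_pop)  # cached fraction of each remaining file
    hybrid = head + (1 - head) * tail_fraction
    return {"popularity": popularity, "uniform": uniform, "hybrid": hybrid}


if __name__ == "__main__":
    # Figure defaults alpha = 0.8, N = 1000; M = 100 is an assumed cache size.
    for name, h in local_hit_ratios(N=1000, M=100, alpha=0.8).items():
        print(f"{name:>10}: local hit ratio {h:.3f}")
```

At $\alpha = 0$ all three placements give the same local hit ratio $M/N$. Under skewed demand, popularity placement maximizes local hits, and uniform placement gives up local hits in exchange for multicast structure.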

Zipf Popularity and Cache Effectiveness

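Zipf-distributed demand for three exponents $\alpha = 0.3, 0.8, 1.5$. Larger $\alpha$ means steeper popularity concentration. For $0 \leq \alpha < 1$ and large $M$, $N$, caching the top $M$ files gives a hit ratio (effective memory ratio) of approximately $\mu_{\mathrm{eff}} = H_{M,\alpha}/H_{N,\alpha} \approx \mu^{1-\alpha}$, with $\mu = M/N$. This amplifies the caching gain substantially as $\alpha$ approaches 1. For $\alpha \geq 1$ the approximation breaks down, since it would give $\mu_{\mathrm{eff}} \geq 1$. In that regime $\mu_{\mathrm{eff}}$ must be computed from $H_{M,\alpha}/H_{N,\alpha}$ directly.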
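The approximation $\mu_{\mathrm{eff}} \approx \mu^{1-\alpha}$ follows from $H_{L,\alpha} \approx L^{1-\alpha}/(1-\alpha)$ for $\alpha < 1$. This quick numerical check compares it with the exact ratio $H_{M,\alpha}/H_{N,\alpha}$ for the figure's three exponents. The values $N = 10{,}000$ and $M = 1{,}000$ ($\mu = 0.1$) are illustrative choices.

```python
def gen_harmonic(L: int, alpha: float) -> float:
    return sum(n ** (-alpha) for n in range(1, L + 1))


N, M = 10_000, 1_000
mu = M / N  # memory ratio, 0.1 here

for alpha in (0.3, 0.8, 1.5):
    exact = gen_harmonic(M, alpha) / gen_harmonic(N, alpha)  # top-M hit ratio
    approx = mu ** (1 - alpha)                               # mu_eff = mu^(1 - alpha)
    print(f"alpha={alpha}: exact={exact:.3f}, mu^(1-alpha)={approx:.3f}")
```

Agreement is close at $\alpha = 0.3$. It degrades at $\alpha = 0.8$ for finite $N$, and the formula exceeds 1 at $\alpha = 1.5$, while the exact ratio stays below 1.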

The Popularity vs MAN Tradeoff

For system designers, the key intuitions:

MAN is worst-case optimal but demand-agnostic. Delivers guaranteed rate regardless of demand pattern. Good when user demands are unpredictable.
Popularity caching targets the expected rate but has a poor worst case. If users all request different unpopular files, MAN wins dramatically.
For concentrated demand: popularity caching alone is within a constant factor of the expectation-optimal rate. MAN's worst-case guarantee is often more conservative than needed.
For uniform demand: popularity caching has no advantage, and MAN wins whenever $0 < M < N$.

In production CDNs, popularity- and recency-based eviction policies (LFU/LRU) dominate for two reasons: (a) demand is concentrated enough for popularity caching to work, and (b) coding has implementation costs. The CommIT research program has shown that coded schemes (even approximations such as PDAs) can reclaim most of MAN's gain while handling Zipf demand; see §13.4 and Chapter 14.

Common Mistake: Worst-Case vs Expected Confusion

Mistake:

Comparing popularity caching's expected rate with MAN's worst-case rate and concluding "popularity wins."

Correction:

These are different metrics. Worst-case MAN guarantees a rate $\leq R_{\text{MAN}}$ for any demand. Popularity caching's expected rate is lower, but it carries no worst-case guarantee. For the same metric:

Expected MAN rate: slightly below worst-case (most demand vectors are not adversarial).
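Worst-case popularity: $K$ unicasts, i.e., no caching gain at all. This occurs when every user requests a distinct uncached file, which only requires $N - M \geq K$, so it is the regime where many users request unpopular files.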
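This sketch evaluates three numbers for the video-streaming example ($K = 100$, $N = 10{,}000$, $M = 100$, $\alpha = 1$):

Popularity caching's expected rate.
Its rate on an adversarial demand vector where user $k$ requests uncached file $M + 1 + k$ (a hypothetical worst case chosen for illustration).
MAN's worst-case rate.

```python
def gen_harmonic(L: int, alpha: float) -> float:
    return sum(n ** (-alpha) for n in range(1, L + 1))


def unicasts_pop(demands: list[int], M: int) -> int:
    """Popularity caching: one unicast for every user whose file is outside the top M."""
    return sum(1 for d in demands if d > M)


K, N, M, alpha = 100, 10_000, 100, 1.0
t = K * M // N  # = 1

expected_pop = K * (1 - gen_harmonic(M, alpha) / gen_harmonic(N, alpha))
adversarial = [M + 1 + k for k in range(K)]  # K distinct uncached files (needs N - M >= K)
worst_pop = unicasts_pop(adversarial, M)
worst_man = K * (1 - M / N) / (1 + t)

print(f"expected popularity: {expected_pop:.1f}")
print(f"worst-case popularity: {worst_pop}")
print(f"worst-case MAN: {worst_man:.1f}")
```

The ordering is expected popularity < worst-case MAN < worst-case popularity. Because that ordering flips depending on the metric, the metric must be fixed before comparing schemes.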

A fair comparison must fix both the metric (expected or worst-case) and the demand model (uniform or Zipf). Meaningful design decisions depend on that choice.

Key Takeaway

Zipf concentration helps caching but does not change the scaling order. At fixed memory ratio $M/N$, popularity caching's expected rate is $K(1 - H_{M,\alpha}/H_{N,\alpha})$, and higher $\alpha$ gives a smaller rate for the same $M$. MAN's coded gain survives (Chapter 3's worst-case $(1 + t)$ factor is retained), but the operational rate in deployed systems depends on both mechanisms working together.
