Ferkans — Interactive Telecom Tutor

Real Libraries Are Correlated

The baseline MAN model assumes files are independent. In practice, file libraries exhibit massive structural correlation: the same movie in multiple resolutions, compressed and uncompressed versions, variants for different language tracks, HDR and SDR encodings of the same scene. If the server knows this correlation, can it exploit it to shrink the delivery rate?

The answer — yes, substantially — was developed by Wan, Tuninetti, Ji, and Caire in a line of work culminating in a 2020 paper on coded caching with correlated files. The key technique is an interference-alignment-style delivery that jointly encodes across files, exploiting the correlation structure during both placement and delivery.

Definition:
Correlated Files Model

In the correlated-files model, each file $W_n$ is decomposed as $W_n \;=\; (W_{\text{common}},\, \Delta_n),$ where $W_{\text{common}}$ is a common component of size $\rho F$ bits shared by all files (the parts identical across variants), and $\Delta_n$ is a variant-specific component of size $(1 - \rho) F$ bits, with $\rho \in [0, 1]$ the correlation parameter and $F$ the file size in bits. Setting $\rho = 0$ recovers the independent-files baseline; $\rho = 1$ means all files are identical.

The model is simplified: in practice correlations are richer (multi-level hierarchies, partial overlaps). But this two-component decomposition captures the first-order effect and admits a clean rate analysis.

Theorem: Rate Formula for Correlated Files

For the correlated-files model with correlation $\rho$, $K$ users, $N$ variants, and per-user cache $M$, the achievable worst-case rate under the Wan-Tuninetti-Ji-Caire (WTJC) scheme is $R_{\text{WTJC}}(M, \rho) \;=\; \rho \cdot R_{\text{common}}(M_{\text{c}}) + (1 - \rho) \cdot R_{\text{MAN}}(M_{\text{v}}),$ where the total cache $M$ is split into $M_{\text{c}}$ for the common part and $M_{\text{v}}$ for the variants, and $R_{\text{common}}$ is the common-part delivery rate.

Split the cache between common and variant content. The common content is identical across all files, so it acts as a shared library of effective size 1 file — delivery of the common part is nearly free (all users want the same thing). The variants behave as an independent-files problem of reduced "effective library" size; MAN applies there.
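The two-term formula can be evaluated directly once a split is chosen. The sketch below is illustrative only: the function names are hypothetical, `common_cached_fraction` stands in for the cached share of the common part (the quantity the proof writes as $M_{\text{c}}/F$), and $R_{\text{common}}$ is assumed to be the uncached share of the common part sent as one broadcast.

```python
def man_rate(K: int, N: float, M: float) -> float:
    """MAN closed-form rate K(1 - M/N) / (1 + K*M/N), as used in the text."""
    if not 0 <= M <= N:
        raise ValueError("cache size M must lie in [0, N]")
    mu = M / N
    return K * (1 - mu) / (1 + K * mu)


def wtjc_two_term_rate(K: int, N: float, rho: float,
                       common_cached_fraction: float, M_v: float) -> float:
    """Evaluate rho * R_common + (1 - rho) * R_MAN(M_v) for a given split.

    Assumption: R_common is the uncached fraction of the common component,
    delivered once to all users.
    """
    if not 0 <= rho <= 1:
        raise ValueError("rho must lie in [0, 1]")
    if not 0 <= common_cached_fraction <= 1:
        raise ValueError("common_cached_fraction must lie in [0, 1]")
    r_common = 1 - common_cached_fraction
    return rho * r_common + (1 - rho) * man_rate(K, N, M_v)


if __name__ == "__main__":
    # rho = 0 reduces to the plain MAN rate for the same cache.
    print(wtjc_two_term_rate(K=10, N=20, rho=0.0,
                             common_cached_fraction=0.0, M_v=4))
```

At $\rho = 0$ the common term vanishes and the function returns the MAN rate; at $\rho = 1$ only the single common broadcast remains.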

Proof

Place common content centrally

All users cache a fraction $M_{\text{c}}/F$ of the common component $W_{\text{common}}$. Since all demands involve $W_{\text{common}}$ (every file has it), a single broadcast of size $(1 - M_{\text{c}}/F) \cdot \rho F$ bits satisfies all users. Rate contribution: $(1 - M_{\text{c}}/F) \cdot \rho$.

MAN on variants

The variant content $\Delta_n$ has size $(1-\rho)F$ per file. With $K$ users, $N$ variants, and cache $M_{\text{v}}$, apply the MAN scheme on these variants. Rate contribution: $(1-\rho) \cdot K(1 - M_{\text{v}}/N)/(1 + K M_{\text{v}}/N)$, with $M_{\text{v}} = (M - M_{\text{c}}/F)/(1 - \rho)$, that is, the cache left after the common allocation, re-expressed in units of variant-sized files.
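As a rough illustration of this step, the sketch below computes the variant-side cache and its rate contribution. It assumes the conversion $M_{\text{v}} = (M - M_{\text{c}}/F)/(1 - \rho)$, meaning the cache left after the common allocation measured in variant-sized files; the argument `common_cache` (standing for $M_{\text{c}}/F$) and both function names are hypothetical.

```python
def variant_cache(M: float, common_cache: float, rho: float) -> float:
    """Leftover cache in units of variant-sized files (assumed conversion)."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1) for the variant problem")
    if not 0 <= common_cache <= M:
        raise ValueError("common_cache must lie in [0, M]")
    return (M - common_cache) / (1 - rho)


def variant_rate_contribution(K: int, N: int, M: float,
                              common_cache: float, rho: float) -> float:
    """(1 - rho) * K(1 - M_v/N) / (1 + K*M_v/N), with M_v capped at N."""
    # Caching more than all N variants brings no further benefit.
    M_v = min(variant_cache(M, common_cache, rho), N)
    mu = M_v / N
    return (1 - rho) * K * (1 - mu) / (1 + K * mu)


if __name__ == "__main__":
    print(variant_cache(M=4, common_cache=1, rho=0.5))
    print(variant_rate_contribution(K=10, N=20, M=4, common_cache=0, rho=0.5))
```

The closed form is exact when $K M_{\text{v}}/N$ is an integer; the sketch uses it at other points as well, as the text does.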

Optimize the split

Treating the cache allocation $(M_{\text{c}}, M_{\text{v}})$ as a free parameter, minimize the total rate over the split. The optimum balances the two rate terms; the closed form depends on $K, N, \rho$ in a relatively clean way.
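Where a closed form is not at hand, the split can be searched numerically. The following is a minimal sketch, not the paper's optimization: it uses the rate expressions from the two proof steps as written, assumes the leftover cache converts to variant units as $(M - M_{\text{c}}/F)/(1 - \rho)$, and scans a uniform grid. The names `total_rate`, `best_split`, and `common_cache` (standing for $M_{\text{c}}/F$) are hypothetical.

```python
def total_rate(K: int, N: int, M: float, rho: float,
               common_cache: float) -> float:
    """Common broadcast term plus MAN-on-variants term for one split."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    r_common = rho * (1 - common_cache)
    # Assumed conversion of the leftover cache to variant-sized files.
    M_v = min((M - common_cache) / (1 - rho), N)
    mu = M_v / N
    r_variant = (1 - rho) * K * (1 - mu) / (1 + K * mu)
    return r_common + r_variant


def best_split(K: int, N: int, M: float, rho: float,
               steps: int = 1000) -> tuple[float, float]:
    """Grid search over common_cache in [0, min(1, M)]; returns (split, rate)."""
    upper = min(1.0, M)
    candidates = [upper * i / steps for i in range(steps + 1)]
    best = min(candidates, key=lambda c: total_rate(K, N, M, rho, c))
    return best, total_rate(K, N, M, rho, best)


if __name__ == "__main__":
    split, rate = best_split(K=10, N=20, M=4, rho=0.5)
    print(f"common_cache = {split:.3f}, rate = {rate:.4f}")
```

A grid search is used only because it is easy to check by hand; any one-dimensional minimizer would serve equally well.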

Interference-alignment gain (sketched)

The full WTJC scheme further exploits the correlation via an interference-alignment delivery that reduces the variant-component rate below MAN. The detailed construction uses linear combinations of the $\Delta_n$ aligned to cancel at non-requesting users. We do not write out the algebra here; see the 2020 paper for the construction. $\blacksquare$

🎓 CommIT Contribution (2020)

Coded Caching with Correlated Files

K. Wan, D. Tuninetti, M. Ji, G. Caire — IEEE Transactions on Information Theory, vol. 66, no. 2

This result shows how file correlation — ubiquitous in real libraries (multiple resolutions of the same movie, language dubs, HDR/SDR variants) — reduces the coded-caching delivery rate substantially. The key innovation is an interference-alignment-based delivery that jointly encodes across correlated files, achieving gains beyond what naive MAN applied to each correlated group provides.

Key insight. The uncoded "variant" portion of the library behaves like an independent-files MAN problem; the "common" portion is a single broadcast. Optimal cache allocation between the two portions depends on the correlation $\rho$ — more correlation means allocating more cache to variants. The gain over independent-files MAN can be substantial (e.g., $2\times$ or more for video libraries with many resolutions per title).

The result illustrates the CommIT program's broader theme: extending the coded-caching machinery to richer library structures. Later work extends to heterogeneous caches, non-uniform popularity, and multi-resolution streaming.

Correlated-Files Gain vs. Correlation $\rho$

Normalized WTJC delivery rate $R(\rho)/R(0)$ as a function of the correlation parameter $\rho$. For $\rho = 0$ (independent files), the rate equals MAN's baseline. As $\rho$ grows, the rate drops because the common-component cache allocation shrinks the effective variant-library size. By $\rho = 0.8$ (typical for multi-resolution video libraries), the rate is roughly one-fifth of the independent-files rate, a substantial practical gain.

Parameters: number of users $K = 10$; number of variants $N = 20$; memory ratio $M/N = 0.2$.

Example: Streaming Video with Resolution Variants

A video library has 1000 titles, each encoded at 4 resolutions (480p, 720p, 1080p, 4K), so $N = 4000$ files. Users select a resolution based on their connection; the files for the same title share the lower-resolution encoding as a base layer, with correlation $\rho \approx 0.7$. For $K = 100$ users and per-user cache $M = 400$ files (10%), compare MAN (independent assumption) with WTJC.

Solution

MAN baseline

Treat as independent files. $t = KM/N = 100 \cdot 400 / 4000 = 10$ . $R_{\text{MAN}} = 100 \cdot 0.9 / 11 \approx 8.18$ .
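The baseline arithmetic is easy to reproduce. The helper below is a small sketch with a hypothetical name, `man_baseline`, that applies the same closed form used in this step.

```python
def man_baseline(K: int, N: int, M: float) -> tuple[float, float]:
    """Return (t, R_MAN) with t = K*M/N and R_MAN = K(1 - M/N) / (1 + t)."""
    t = K * M / N
    rate = K * (1 - M / N) / (1 + t)
    return t, rate


if __name__ == "__main__":
    t, rate = man_baseline(K=100, N=4000, M=400)
    print(f"t = {t:g}, R_MAN = {rate:.2f}")  # t = 10, R_MAN = 8.18
```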

WTJC with $\rho = 0.7$

Effective variant library: $N_{\text{eff}} \approx N(1 - \rho) = 1200$. Effective memory ratio in variants, with $\mu = M/N = 0.1$: $\mu_{\text{eff}} = \mu N / N_{\text{eff}} \approx 0.33$. Variant rate: $R_{\text{var}} = 100 \cdot 0.67 / (1 + 33) \approx 1.97$. Common rate: $\rho \cdot (1 - M_{\text{c}}/F) \approx 0.7$, since only a small share of the cache goes to the common part. Total WTJC rate: $\approx 0.7 + 1.97 = 2.67$, about $3\times$ better than MAN.
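The estimate in this step can be checked with a few lines. The sketch below mirrors the example's arithmetic as written: it shrinks the library to $N_{\text{eff}} = N(1 - \rho)$ and takes the common rate to be about $\rho$, instead of weighting the variant rate by $(1 - \rho)$ as in the theorem. The function name is hypothetical, and because no intermediate value is rounded, the variant rate comes out near 1.94, slightly below the hand-rounded figure.

```python
def example_wtjc_estimate(K: int, N: int, M: float,
                          rho: float) -> tuple[float, float, float]:
    """Return (variant_rate, total_rate, gain_over_man) for the heuristic."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    n_eff = N * (1 - rho)                  # effective variant library size
    mu_eff = min(M / n_eff, 1.0)           # effective memory ratio
    r_var = K * (1 - mu_eff) / (1 + K * mu_eff)
    r_common = rho                         # assumption: common cache share ~ 0
    total = r_common + r_var
    r_man = K * (1 - M / N) / (1 + K * M / N)   # independent-files baseline
    return r_var, total, r_man / total


if __name__ == "__main__":
    r_var, total, gain = example_wtjc_estimate(K=100, N=4000, M=400, rho=0.7)
    print(f"variant = {r_var:.2f}, total = {total:.2f}, gain = {gain:.1f}x")
```

The gain stays close to a factor of three either way, so the rounding in the hand calculation does not change the conclusion.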

Engineering reading

For a video CDN optimizing peak-hour load, recognizing resolution variants as correlated (instead of independent) provides a substantial delivery-rate reduction. The analysis justifies jointly encoding multi-resolution streams rather than treating them as separate files — a design choice now visible in HTTP adaptive streaming standards (DASH, HLS, CMAF).

Common Mistake: Correlation Coefficient Is a Model, Not a Measurement

Mistake:

Reading " $\rho = 0.7$ " as a precise experimentally measured quantity.

Correction:

The WTJC model is a stylized decomposition, $W_n = (W_{\text{common}}, \Delta_n)$, with $\rho$ the fraction of common content. Real files do not have a single scalar "correlation" in this operational sense; the decomposition is a design choice that approximates the true file relationship. In practice, $\rho$ is a tuning parameter, and the resulting rate is an upper bound on the rate that a scheme exploiting the full correlation structure could achieve (equivalently, the predicted gain is a lower bound).

For a multi-resolution video library, a better (but more complex) model is hierarchical: base + enhancement layers. The WTJC analysis generalizes but the closed-form rate loses its clean shape.

Correlated Files: A CommIT Extension