Correlated Files: A CommIT Extension
Example: Streaming Video with Resolution Variants
A video library has 1000 titles, each encoded at 4 resolutions (480p, 720p, 1080p, 4K), so $N = 4 \times 1000 = 4000$ files. Users select a resolution based on their connection; the files for the same title share the lower-resolution encoding as a base layer, with correlation $\rho = 0.7$. For $K$ users and a per-user cache of $M = 400$ files (10% of the library), compare MAN (independent assumption) with WTJC.
MAN baseline
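Treat the $N = 4000$ variants as independent files. With $t = KM/N = 0.1K$, the MAN rate is $R_{\text{MAN}} = \frac{K(1 - M/N)}{1 + KM/N} = \frac{0.9K}{1 + 0.1K}$.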
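The baseline arithmetic can be checked with a short Python sketch. It uses the standard MAN worst-case rate $(K - t)/(t + 1)$ at integer $t = KM/N$, which equals $K(1 - M/N)/(1 + KM/N)$. Between integer points, it uses memory sharing (linear interpolation) and assumes $K \le N$. The helper name `man_rate` and the user count `K = 20` are hypothetical, chosen only for illustration.

```python
import math


def man_rate(K: int, N: int, M: float) -> float:
    """Worst-case MAN delivery rate, in units of one file.

    Hypothetical helper: K users, N files treated as independent, a cache of
    M files per user, and K <= N. At integer t = K*M/N the rate is
    (K - t) / (t + 1); between integer t we use memory sharing, i.e. linear
    interpolation between the two neighbouring integer points.
    """
    if not 0 <= M <= N:
        raise ValueError("need 0 <= M <= N")
    if K > N:
        raise ValueError("this sketch assumes K <= N")
    t = K * M / N                 # number of users caching each subfile
    t_lo = math.floor(t)
    frac = t - t_lo
    r_lo = (K - t_lo) / (t_lo + 1)
    if frac == 0:
        return r_lo
    t_hi = t_lo + 1
    r_hi = (K - t_hi) / (t_hi + 1)
    return (1 - frac) * r_lo + frac * r_hi


N = 4 * 1000     # 1000 titles x 4 resolutions, all treated as independent
M = N // 10      # 10% per-user cache: 400 files
K = 20           # hypothetical number of users, for illustration only
print(f"t = {K * M / N}, R_MAN = {man_rate(K, N, M)}")  # t = 2.0, R_MAN = 6.0
```

With $K = 20$, $t = 2$ and the rate is $18/3 = 6$ file transmissions. The interpolation branch matters only when $KM/N$ is not an integer, where plugging non-integer $t$ into the closed form would understate the achievable rate.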
WTJC with $\rho = 0.7$
Effective variant library . Effective memory ratio in variants: . Variant rate: . Common rate: (small correction) . Total WTJC rate: , about better than MAN.
Engineering reading
For a video CDN optimizing peak-hour load, recognizing resolution variants as correlated (rather than independent) yields a substantial reduction in delivery rate. The analysis justifies jointly encoding multi-resolution streams rather than treating them as separate files. This design choice is the idea behind layered (scalable) video coding, and MPEG-DASH can describe such dependent representations, although most deployed DASH, HLS, and CMAF ladders still encode each resolution independently.
Common Mistake: Correlation Coefficient Is a Model, Not a Measurement
Mistake:
Reading "$\rho$" as a precise, experimentally measured quantity.
Correction:
The WTJC model is a stylized decomposition of each file into a common part, shared by all files of the same title, and a private part, with $\rho$ the fraction of common content. Real files do not have a single scalar "correlation" in this operational sense; the decomposition is a design choice that approximates the true relationship between files. In practice, $\rho$ is a tuning parameter. Provided the assumed common part is genuinely shared, the resulting rate reduction is a lower bound on what a scheme exploiting the full correlation structure could achieve.
For a multi-resolution video library, a better (but more complex) model is hierarchical: a base layer plus enhancement layers. The WTJC analysis generalizes to this model, but the closed-form rate loses its clean shape.
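Because $\rho$ is a modelling choice, it can help to make explicit what a given value asserts about the library. The Python sketch below counts, in units of one file, how much distinct content the stylized common/private decomposition implies for the example's 1000 titles and 4 resolutions. The class `CommonPrivateLibrary` is a hypothetical helper, not part of any library. The sketch only does storage bookkeeping under that decomposition and does not compute the WTJC delivery rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonPrivateLibrary:
    """Stylized common + private decomposition (hypothetical helper).

    Each of the `variants` encodings of a title is modelled as a common part
    of size rho, shared by all variants of that title, plus a private part of
    size 1 - rho. Sizes are measured in units of one file.
    """
    titles: int
    variants: int
    rho: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")

    def independent_size(self) -> float:
        # Every variant stored as a separate, unrelated file.
        return float(self.titles * self.variants)

    def common_size(self) -> float:
        # One shared common part per title.
        return self.titles * self.rho

    def private_size(self) -> float:
        # One private part per (title, variant) pair.
        return self.titles * self.variants * (1.0 - self.rho)

    def distinct_size(self) -> float:
        # Total distinct content the decomposition says must exist.
        return self.common_size() + self.private_size()


# The example library: 1000 titles x 4 resolutions. rho is swept to show how
# strongly the implied amount of distinct content depends on the modelling choice.
for rho in (0.0, 0.5, 0.7, 0.9):
    lib = CommonPrivateLibrary(titles=1000, variants=4, rho=rho)
    print(f"rho = {rho:.1f}: distinct = {lib.distinct_size():7.1f} "
          f"of {lib.independent_size():.0f} file units")
```

For $\rho = 0.7$ this gives 700 common plus 1200 private file units, i.e. 1900 rather than 4000. Any rate gain computed from the model inherits that assumption.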