Fundamental Limits of Shared-Link Coded Caching

Is the MAN Scheme Optimal?

The MAN scheme achieves a delivery load of RMAN=K(1βˆ’M/N)/(1+KM/N)R_{\text{MAN}} = K(1-M/N)/(1+KM/N). But is this the best possible? Can a cleverer placement or a more sophisticated coding scheme do better?

For the case of uncoded placement (each user stores uncoded subsets of the files), the answer is: the MAN scheme is exactly optimal for the worst-case demand. This was proved by Wan, Tuninetti, and Piantanida (2020) via a clever induction argument. For general (possibly coded) placement, the MAN scheme is optimal within a constant factor of 2 (Yu and Maddah-Ali, 2018). The gap between coded and uncoded placement is still an open problem in general.

In this section, we present the converse for uncoded placement, which reveals the combinatorial structure underlying the optimality of the MAN scheme.

Memory-Load Tradeoff: Coded vs Uncoded

Animated comparison of the delivery load for coded caching (MAN scheme) and uncoded caching as cache size increases. The coded multicasting gain 1+KM/N1 + KM/N is highlighted as a multiplier between the two curves.

Definition:

Uncoded Cache Placement

A cache placement is called uncoded if each user stores a subset of the file bits without any coding:

ZkβŠ†{Wn,i:n∈[N],β€…β€Ši∈[F]}Z_k \subseteq \{W_{n,i} : n \in [N],\; i \in [F]\}

where Wn,iW_{n,i} is the ii-th bit of file WnW_n. In other words, each cached bit is a bit of some file, not a coded combination of bits from different files.

The MAN placement is uncoded: each user stores sub-files Wn,TW_{n,\mathcal{T}} where k∈Tk \in \mathcal{T}, which are just subsets of file bits.

Uncoded placement is a restriction, but a natural one: practical caching systems typically store file fragments, not coded combinations. Coded placement (e.g., storing W1,1βŠ•W2,1W_{1,1} \oplus W_{2,1}) can sometimes reduce the delivery load, but the gains are marginal (at most a factor of 2) and the complexity is substantially higher.

Theorem: Converse for Uncoded Cache Placement

For the coded caching problem with Nβ‰₯KN \ge K files, KK users, cache size MM, and uncoded cache placement, the worst-case delivery load satisfies:

Rβˆ—(M)β‰₯K(1βˆ’M/N)1+KM/N=RMAN(M).R^*(M) \ge \frac{K(1-M/N)}{1 + KM/N} = R_{\text{MAN}}(M).

Combined with the achievability of the MAN scheme, this establishes:

Runcodedβˆ—(M)=K(1βˆ’M/N)1+KM/NR^*_{\text{uncoded}}(M) = \frac{K(1-M/N)}{1 + KM/N}

for Nβ‰₯KN \ge K and uncoded placement.

The converse says: any uncoded placement creates a side-information structure that is at best as useful as the MAN symmetric placement. The symmetric placement (where every tt-subset of users caches the same fraction) is the most "balanced" way to distribute side information, and any deviation from this balance can only hurt the worst-case delivery load.

Example: Converse Verification for K=3K=3, N=3N=3, M=1M=1

Verify the converse bound for K=N=3K = N = 3 and M=1M = 1 (t=KM/N=1t = KM/N = 1). Show that any uncoded placement requires delivery load Rβ‰₯1R \ge 1.

Theorem: Converse with Coded Placement (Yu-Maddah-Ali)

For the coded caching problem with arbitrary (possibly coded) placement, the worst-case delivery load satisfies:

Rβˆ—(M)β‰₯12β‹…K(1βˆ’M/N)1+KM/NR^*(M) \ge \frac{1}{2} \cdot \frac{K(1-M/N)}{1 + KM/N}

for Nβ‰₯KN \ge K. This means the MAN scheme with uncoded placement is optimal within a factor of 2, even when compared to schemes that use arbitrarily complex coded placement.

This is a remarkable result: coded placement cannot improve by more than a factor of 2 over uncoded placement. The intuition is that the fundamental limitation comes from the structure of the demand (which files are requested), not the form of the cache content. Coded placement can create more efficient side information, but the improvement is bounded because the multicast opportunities are ultimately constrained by the demand combinatorics.

Coded Caching: Achievability and Converse Bounds

Compare the MAN achievable load, the uncoded placement converse, the Yu-Maddah-Ali general converse, and the uncoded caching baseline.

Parameters
20
10

Definition:

Coded Caching with Correlated Files

When files are correlated (e.g., different quality versions of the same video, or spatially correlated sensor data), the joint entropy H(W1,…,WN)<NFH(W_1, \ldots, W_N) < NF. The correlation can be exploited in both phases:

  • Placement: Cache the common information (Gacs-Korner or Wyner common information) to avoid redundancy across files.
  • Delivery: Compress the delivery message conditioned on the cached content, exploiting the conditional entropy reduction.

Wan, Tuninetti, and Ji (with Caire) showed that the memory-load tradeoff for correlated files is:

Rβˆ—(M)=K(1βˆ’M/NΛ‰)1+KM/NΛ‰R^*(M) = \frac{K(1 - M/\bar{N})}{1 + KM/\bar{N}}

where Nˉ\bar{N} is an effective number of files that accounts for the correlation structure. When files are independent, Nˉ=N\bar{N} = N and we recover the MAN result.

πŸŽ“CommIT Contribution(2020)

Coded Caching with Correlated Files

K. Wan, D. Tuninetti, M. Ji, G. Caire β€” IEEE Trans. Inf. Theory

This work extends the MAN coded caching framework to correlated file libraries. The key insight is that file correlation reduces the effective library size, allowing the same coded multicasting gain with less cache memory. The authors establish matching achievability and converse for the correlated setting, generalizing the original MAN result.

coded-cachingcorrelated-sourcesmemory-load-tradeoffView Paper β†’

Definition:

Demand-Private Coded Caching

In demand-private coded caching, the delivery message XX must not reveal any user's demand to the other users. Formally, for any two demand vectors d\mathbf{d} and dβ€²\mathbf{d}' that differ only in user kk's demand:

I(dk;X,ZkΛ‰βˆ£dkΛ‰)=0I(d_k; X, Z_{\bar{k}} | d_{\bar{k}}) = 0

where ZkˉZ_{\bar{k}} denotes the caches of all users except kk.

Wan and Caire showed that demand privacy can be achieved with only a small increase in delivery load: the private scheme achieves Rprivate=RMAN+O(1/K)R_{\text{private}} = R_{\text{MAN}} + O(1/K), and the gap vanishes as Kβ†’βˆžK \to \infty.

πŸŽ“CommIT Contribution(2021)

Demand-Private Coded Caching

K. Wan, G. Caire β€” IEEE Trans. Inf. Theory

This work introduces the demand privacy constraint to coded caching. The main result is that the coded multicasting gain can be preserved even when each user's demand must be hidden from the others. The key technique is "virtual users" padding that masks the demand structure without significantly increasing the load.

coded-cachingdemand-privacyinformation-theoretic-securityView Paper β†’

Common Mistake: The N<KN < K Regime Is Different

Mistake:

Applying the MAN converse formula Rβˆ—=K(1βˆ’M/N)/(1+KM/N)R^* = K(1-M/N)/(1+KM/N) when N<KN < K (fewer files than users). In this regime, some users must request the same file, and the delivery load can be lower.

Correction:

When N<KN < K, the worst case is no longer all-distinct demands (which is impossible). The optimal scheme must account for demand collisions: if two users request the same file and one has cached different parts, the coded multicast can exploit this. The MAN scheme remains a good starting point but is not exactly optimal for N<KN < K. The optimal memory-load tradeoff for N<KN < K is fully characterized only for specific regimes.

Quick Check

The MAN scheme with uncoded placement is optimal (under the uncoded placement constraint) for all Nβ‰₯KN \ge K. What is the key property of the MAN placement that makes it optimal?

It caches the most popular files

It creates the most symmetric side-information structure across users

It uses coded placement to compress the cached content

⚠️Engineering Note

Practical Coded Caching: From Theory to Systems

Despite the elegant theory, practical deployment of coded caching faces challenges:

  1. Subpacketization: (Kt)\binom{K}{t} grows combinatorially. For K=20,t=4K = 20, t = 4: (204)=4845\binom{20}{4} = 4845 sub-files per file. Manageable for ∼\sim1 GB files, but K=50K = 50 is intractable.
  2. Asynchronous demands: Users don't all reveal demands simultaneously. Online coded caching (where the server must respond before seeing all demands) loses some of the multicast gain.
  3. Heterogeneous caches: Not all users have the same cache size. Optimal schemes for heterogeneous {Mk}\{M_k\} are more complex but exist.
  4. File popularity: In practice, some files are much more popular than others (Zipf distribution). Popularity-aware coded caching combines coded placement for popular files with uncoded caching for less popular files.

Current research focuses on reducing subpacketization via combinatorial designs and grouping strategies, which trade some multicast gain for practical feasibility.

Practical Constraints
  • β€’

    Subpacketization must satisfy Fβ‰₯(Kt)F \ge \binom{K}{t} bits per file

  • β€’

    Practical video files: 0.5-10 GB (enough for K≀20K \le 20 typically)

  • β€’

    CDN cache sizes: 1-100 TB per edge server

Key Takeaway

The MAN scheme is optimal under uncoded placement. For Nβ‰₯KN \ge K, the delivery load Rβˆ—=K(1βˆ’M/N)/(1+KM/N)R^* = K(1-M/N)/(1+KM/N) is the exact minimum under uncoded cache placement. Even with arbitrary coded placement, the load cannot be reduced by more than a factor of 2. File correlation reduces the effective library size, and demand privacy can be achieved with negligible overhead. The main practical barrier is subpacketization, not optimality.