Ferkans — Interactive Telecom Tutor

Is the MAN Scheme Optimal?

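The MAN scheme achieves a delivery load of $R_{\text{MAN}} = K(1-M/N)/(1+KM/N)$ at the cache sizes $M = tN/K$ for which $t = KM/N$ is an integer; memory sharing between these corner points covers the cache sizes in between. But is this the best possible? Can a cleverer placement or a more sophisticated coding scheme do better?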

For the case of uncoded placement (each user stores uncoded subsets of the files) and $N \ge K$, the answer is: the MAN scheme is exactly optimal for the worst-case demand. This was proved by Wan, Tuninetti, and Piantanida (2020) via a clever induction argument. For general (possibly coded) placement, the MAN scheme is optimal within a constant factor of 2 (Yu and Maddah-Ali, 2018). The exact gap between coded and uncoded placement remains an open problem in general.

In this section, we present the converse for uncoded placement, which reveals the combinatorial structure underlying the optimality of the MAN scheme.

Memory-Load Tradeoff: Coded vs Uncoded

Animated comparison of the delivery load for coded caching (MAN scheme) and uncoded caching as cache size increases. The coded multicasting gain $1 + KM/N$ is highlighted as a multiplier between the two curves.
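The two curves in this comparison can be reproduced numerically. The following is a minimal sketch, not the tutorial's own plotting code: the function names are hypothetical, and it assumes the MAN load is evaluated at the corner points $M = tN/K$ and interpolated linearly (memory sharing) in between.

```python
from fractions import Fraction

def man_corner_points(K, N):
    """(M, R) pairs of the MAN scheme for t = 0..K, where M = t*N/K."""
    return [(Fraction(t * N, K), Fraction(K - t, t + 1)) for t in range(K + 1)]

def man_load(K, N, M):
    """MAN load at cache size M, interpolating between adjacent corner points."""
    M = Fraction(M)
    pts = man_corner_points(K, N)
    for (m0, r0), (m1, r1) in zip(pts, pts[1:]):
        if m0 <= M <= m1:
            lam = (M - m0) / (m1 - m0)
            return (1 - lam) * r0 + lam * r1
    raise ValueError("M must lie in [0, N]")

def uncoded_load(K, N, M):
    """Baseline without coded delivery: every user is sent its missing fraction."""
    return K * (1 - Fraction(M) / N)

if __name__ == "__main__":
    K, N = 10, 20
    for M in (0, 2, 4, 10, 20):
        r_man, r_unc = man_load(K, N, M), uncoded_load(K, N, M)
        gain = r_unc / r_man if r_man else None  # equals 1 + K*M/N at corner points
        print(M, r_man, r_unc, gain)
```

At a corner point such as $K = 10$, $N = 20$, $M = 2$, the ratio of the two loads is exactly $1 + KM/N = 2$.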

Definition: Uncoded Cache Placement

A cache placement is called uncoded if each user stores a subset of the file bits without any coding:

$Z_k \subseteq \{W_{n,i} : n \in [N],\; i \in [F]\}$

where $W_{n,i}$ is the $i$-th bit of file $W_n$. In other words, each cached bit is a bit of some file, not a coded combination of bits from different files.

The MAN placement is uncoded: each file is split into sub-files $W_{n,\mathcal{T}}$ indexed by the $t$-subsets $\mathcal{T}$ of users, and user $k$ stores exactly the sub-files with $k \in \mathcal{T}$, which are just subsets of file bits.
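A small sketch can make the MAN placement concrete. It assumes users are labeled $1, \ldots, K$ and that every file is split into $\binom{K}{t}$ equal sub-files; the function names are illustrative and not taken from the tutorial.

```python
from fractions import Fraction
from itertools import combinations

def man_placement(K, t):
    """Return (subsets, caches): all t-subsets of users, and for each user
    the list of subsets T (with the user in T) whose sub-files it stores."""
    subsets = list(combinations(range(1, K + 1), t))
    caches = {k: [T for T in subsets if k in T] for k in range(1, K + 1)}
    return subsets, caches

def cached_fraction(K, t):
    """Fraction of each file stored by one user; by symmetry, user 1 suffices."""
    subsets, caches = man_placement(K, t)
    return Fraction(len(caches[1]), len(subsets))

if __name__ == "__main__":
    subsets, caches = man_placement(4, 2)
    print(subsets)                # the 6 two-user subsets
    print(caches[1])              # the 3 subsets that contain user 1
    print(cached_fraction(4, 2))  # t/K = M/N
```

Each user stores a fraction $t/K = M/N$ of every file, so the cache constraint of $MF$ bits is met with equality.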

Uncoded placement is a restriction, but a natural one: practical caching systems typically store file fragments, not coded combinations. Coded placement (e.g., storing $W_{1,1} \oplus W_{2,1}$) can sometimes reduce the delivery load, but the gain is bounded (at most a factor of 2) and the complexity is substantially higher.

Theorem: Converse for Uncoded Cache Placement

For the coded caching problem with $N \ge K$ files, $K$ users, cache size $M$, and uncoded cache placement, the worst-case delivery load satisfies:

$R^*_{\text{uncoded}}(M) \ge \frac{K(1-M/N)}{1 + KM/N} = R_{\text{MAN}}(M).$

Combined with the achievability of the MAN scheme, this establishes:

$R^*_{\text{uncoded}}(M) = \frac{K(1-M/N)}{1 + KM/N}$

for $N \ge K$ and uncoded placement.

The converse says: any uncoded placement creates a side-information structure that is at best as useful as the MAN symmetric placement. The symmetric placement (where every $t$-subset of users caches the same fraction) is the most "balanced" way to distribute side information, and any deviation from this balance can only hurt the worst-case delivery load.

Proof

Induction on $(K, t)$

The proof proceeds by induction on the number of users $K$. The base case $K = 1$ is trivial: $R^*(M) = 1 - M/N$. For the inductive step, consider $K$ users and fix a worst-case demand pattern where all users request distinct files.

Lower bound via acyclic index coding

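Given the uncoded placement and a demand pattern, the delivery problem reduces to an index coding problem. The server has $K$ messages (the requested files, of which each user is missing some bits), and each user has side information (its cached bits). The index coding converse gives:

$RF \ge \sum_{k=1}^{K} H(W_{d_k} | Z_1, \ldots, Z_k, W_{d_1}, \ldots, W_{d_{k-1}})$

for the ordering $1, \ldots, K$, and likewise for any other ordering of the users. The $k$-th term counts the bits of $W_{d_k}$ that none of the first $k$ users has cached. Averaging over all orderings and all distinct-demand vectors, and then applying the cache-size constraint, yields the desired bound for every uncoded placement.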

Tightness argument

The bound is achieved by the MAN delivery scheme, which encodes the missing sub-files using XOR operations that respect the side-information structure. Under the MAN placement, the fraction of a file cached by none of the first $k$ users is $\binom{K-k}{t}/\binom{K}{t}$, and the combinatorial identity $\sum_{k=1}^{K} \binom{K-k}{t}/\binom{K}{t} = \binom{K}{t+1}/\binom{K}{t} = (K-t)/(t+1)$ confirms the exact match with $R_{\text{MAN}}$ at $t = KM/N$.
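The tightness claim can be checked numerically. The sketch below assumes the MAN placement, under which a sub-file $W_{n,\mathcal{T}}$ is cached by none of the first $k$ users exactly when $\mathcal{T} \subseteq \{k+1, \ldots, K\}$; it sums those fractions over $k$ and compares the result with $(K-t)/(t+1)$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def acyclic_bound_man(K, t):
    """Sum over k of the fraction of a file cached by none of users 1..k
    under the MAN placement (math.comb returns 0 when t > K - k)."""
    return sum(Fraction(comb(K - k, t), comb(K, t)) for k in range(1, K + 1))

def man_load(K, t):
    """MAN delivery load at the corner point t = K*M/N."""
    return Fraction(K - t, t + 1)

if __name__ == "__main__":
    for K in range(1, 7):
        for t in range(K + 1):
            assert acyclic_bound_man(K, t) == man_load(K, t)
    print(acyclic_bound_man(3, 1))  # 2/3 + 1/3 + 0
```

The equality is the hockey-stick identity $\sum_{k=1}^{K} \binom{K-k}{t} = \binom{K}{t+1}$ divided by $\binom{K}{t}$.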

Example: Converse Verification for $K=3$, $N=3$, $M=1$

Verify the converse bound for $K = N = 3$ and $M = 1$ ($t = KM/N = 1$). Show that any uncoded placement requires delivery load $R \ge 1$.

Solution

MAN scheme achievability

$R_{\text{MAN}} = 3(1-1/3)/(1+1) = 2/2 = 1$.

Converse argument

With $N = K = 3$ , each user caches $MF = F$ bits out of $3F$ total bits. In the worst case, all users request different files: $d_1 = 1, d_2 = 2, d_3 = 3$ .

User 1 needs $F$ bits of file 1, minus what it has cached from file 1. With uncoded placement, user 1 caches at most $F$ bits distributed across the three files.

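By a counting argument: under a symmetric placement, each user holds $MF/N = F/3$ bits of its requested file, so the total number of "useful" cached bits (bits that directly satisfy a demand) is $\sum_k |Z_k \cap W_{d_k}| = KMF/N = F$. Without coded delivery, the server would therefore have to send $KF - F = 2F$ bits, i.e. $R = 2$. Coded delivery does better, but only down to the index coding bound: for the user ordering $(1, 2, 3)$, the bits of the requested files that are missing from the first $k$ caches amount to $2F/3 + F/3 + 0 = F$, so $R \ge 1$. Asymmetric placements are handled by averaging the same bound over all orderings and demand vectors. The MAN load $R = 1$ is therefore tight.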
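The achievability side of this example can be simulated directly. The sketch below is an illustration under stated assumptions rather than the tutorial's own code: it assumes $t = 1$, so each file has three sub-files and sub-file $j$ is cached by user $j$; it works in bytes instead of bits; and the helper names are hypothetical.

```python
import os
from fractions import Fraction

def xor(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def simulate_man_k3(sub_len=4):
    K = 3
    users = range(1, K + 1)
    # files[n][j]: sub-file of file n that user j caches (t = 1, so T = {j}).
    files = {n: {j: os.urandom(sub_len) for j in users} for n in users}
    demand = {1: 1, 2: 2, 3: 3}  # worst case: all users request distinct files
    # Placement: user k stores sub-file k of every file (one file's worth in total).
    cache = {k: {n: files[n][k] for n in users} for k in users}
    # Delivery: one XOR per pair {u, v}; u is missing W_{d_u,{v}}, v is missing W_{d_v,{u}}.
    sent = {(u, v): xor(files[demand[u]][v], files[demand[v]][u])
            for u, v in [(1, 2), (1, 3), (2, 3)]}
    all_decoded = True
    for k in users:
        parts = {k: cache[k][demand[k]]}  # the part of its own file already cached
        for (u, v), msg in sent.items():
            if k == u:
                parts[v] = xor(msg, cache[k][demand[v]])
            elif k == v:
                parts[u] = xor(msg, cache[k][demand[u]])
        recovered = b"".join(parts[j] for j in users)
        original = b"".join(files[demand[k]][j] for j in users)
        all_decoded = all_decoded and recovered == original
    load = Fraction(len(sent) * sub_len, K * sub_len)  # in units of one file
    return all_decoded, load

if __name__ == "__main__":
    print(simulate_man_k3())
```

Three XOR messages of one sub-file each add up to exactly one file, which matches $R = 1$.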

Theorem: Converse with Coded Placement (Yu-Maddah-Ali)

For the coded caching problem with arbitrary (possibly coded) placement, the worst-case delivery load satisfies:

$R^*(M) \ge \frac{1}{2} \cdot \frac{K(1-M/N)}{1 + KM/N}$

for $N \ge K$ . This means the MAN scheme with uncoded placement is optimal within a factor of 2, even when compared to schemes that use arbitrarily complex coded placement.

This is a remarkable result: coded placement cannot improve by more than a factor of 2 over uncoded placement. The intuition is that the fundamental limitation comes from the structure of the demand (which files are requested), not the form of the cache content. Coded placement can create more efficient side information, but the improvement is bounded because the multicast opportunities are ultimately constrained by the demand combinatorics.

Proof

Cut-set argument

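For any $s \in \{1, \ldots, \min(N, K)\}$, pick $s$ users and $\lfloor N/s \rfloor$ demand vectors in which these users together request $s\lfloor N/s \rfloor$ distinct files. The corresponding delivery messages and the $s$ caches must determine all of those files: $\lfloor N/s \rfloor RF + sMF \ge s\lfloor N/s \rfloor F$ (the deliveries plus the caches must cover all demanded files). This gives $R \ge s - sM/\lfloor N/s \rfloor$; for $s = 1$ it reduces to the trivial bound $R \ge 1 - M/N$.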

Refined bound via submodularity

The refined bound uses the submodularity of entropy and a careful accounting of how many bits of each file are "useful" across different demand patterns. By averaging over all $\binom{N}{K}$ worst-case demand patterns (where all $K$ users request distinct files), the factor-of-2 gap emerges from a combinatorial counting argument.
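To see how loose the plain cut-set argument is, it helps to evaluate it next to the MAN load. The sketch below assumes the standard cut-set form from the original Maddah-Ali and Niesen analysis, $R \ge \max_s \left(s - sM/\lfloor N/s \rfloor\right)$ over $s \in \{1, \ldots, \min(N, K)\}$; the function names are illustrative.

```python
from fractions import Fraction

def cut_set_bound(K, N, M):
    """max over s of s - s*M/floor(N/s); never below the trivial bound 0."""
    M = Fraction(M)
    best = Fraction(0)
    for s in range(1, min(N, K) + 1):
        best = max(best, s - s * M / (N // s))
    return best

def man_load(K, t):
    """MAN delivery load at the corner point t = K*M/N."""
    return Fraction(K - t, t + 1)

if __name__ == "__main__":
    K, N = 10, 20
    for t in range(K + 1):
        M = Fraction(t * N, K)
        print(t, M, cut_set_bound(K, N, M), man_load(K, t))
```

For $K = 10$, $N = 20$, $M = 2$ the cut-set bound is $5/2$ (attained at $s = 5$) while the MAN load is $9/2$, so the cut-set argument alone does not settle optimality.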

Coded Caching: Achievability and Converse Bounds

Compare the MAN achievable load, the uncoded placement converse, the Yu-Maddah-Ali general converse, and the uncoded caching baseline.

Parameters: number of files $N = 20$, number of users $K = 10$.
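The four curves in this comparison can be tabulated for the parameters $N = 20$, $K = 10$. This is a rough sketch, assuming the bounds exactly as stated in this section: the uncoded-placement converse coincides with the MAN load at the corner points, and the general converse is taken as half the MAN load. The dictionary keys are illustrative names.

```python
from fractions import Fraction

def bounds_table(N=20, K=10):
    """One row per corner point t = 0..K with the four loads compared in the text."""
    rows = []
    for t in range(K + 1):
        M = Fraction(t * N, K)
        man = Fraction(K - t, t + 1)
        rows.append({
            "t": t,
            "M": M,
            "uncoded_caching": K * (1 - M / N),  # baseline without coded delivery
            "man": man,                          # achievable load
            "uncoded_placement_converse": man,   # matches MAN for N >= K
            "general_converse": man / 2,         # factor-of-2 bound as stated here
        })
    return rows

if __name__ == "__main__":
    for row in bounds_table():
        print({key: str(value) for key, value in row.items()})
```

At $t = 1$ ($M = 2$) the rows read: uncoded caching $9$, MAN $9/2$, general converse $9/4$.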

Definition: Coded Caching with Correlated Files

When files are correlated (e.g., different quality versions of the same video, or spatially correlated sensor data), the joint entropy $H(W_1, \ldots, W_N) < NF$ . The correlation can be exploited in both phases:

Placement: Cache the common information (Gács-Körner or Wyner common information) to avoid redundancy across files.
Delivery: Compress the delivery message conditioned on the cached content, exploiting the conditional entropy reduction.

Wan, Tuninetti, Ji, and Caire showed that the memory-load tradeoff for correlated files is:

$R^*(M) = \frac{K(1 - M/\bar{N})}{1 + KM/\bar{N}}$

where $\bar{N}$ is an effective number of files that accounts for the correlation structure. When files are independent, $\bar{N} = N$ and we recover the MAN result.

🎓 CommIT Contribution (2020)

Coded Caching with Correlated Files

K. Wan, D. Tuninetti, M. Ji, G. Caire — IEEE Trans. Inf. Theory

This work extends the MAN coded caching framework to correlated file libraries. The key insight is that file correlation reduces the effective library size, allowing the same coded multicasting gain with less cache memory. The authors establish matching achievability and converse for the correlated setting, generalizing the original MAN result.


Definition: Demand-Private Coded Caching

In demand-private coded caching, the delivery message $X$ must not reveal any user's demand to the other users. Formally, for every user $k$, the demand $d_k$ must be independent of everything the other users observe, given their own demands:

$I(d_k; X, Z_{\bar{k}} | d_{\bar{k}}) = 0$

where $Z_{\bar{k}}$ denotes the caches of all users except $k$ .

Wan and Caire showed that demand privacy can be achieved with only a small increase in delivery load: the private scheme achieves $R_{\text{private}} = R_{\text{MAN}} + O(1/K)$, and the gap vanishes as $K \to \infty$.

🎓 CommIT Contribution (2021)

Demand-Private Coded Caching

K. Wan, G. Caire — IEEE Trans. Inf. Theory

This work introduces the demand privacy constraint to coded caching. The main result is that the coded multicasting gain can be preserved even when each user's demand must be hidden from the others. The key technique is "virtual users" padding that masks the demand structure without significantly increasing the load.


Common Mistake: The $N < K$ Regime Is Different

Mistake:

Applying the MAN converse formula $R^* = K(1-M/N)/(1+KM/N)$ when $N < K$ (fewer files than users). In this regime, some users must request the same file, and the delivery load can be lower.

Correction:

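When $N < K$, the worst case is no longer all-distinct demands (which is impossible). The optimal scheme must account for demand collisions: when several users request the same file, some of the MAN multicast messages become redundant and can be dropped. The MAN scheme remains a good starting point but is not exactly optimal for $N < K$. Under uncoded placement, Yu, Maddah-Ali, and Avestimehr (2018) give the exact worst-case load at $t = KM/N$ as $\left[\binom{K}{t+1} - \binom{K - \min(K, N)}{t+1}\right] / \binom{K}{t}$, which reduces to $R_{\text{MAN}}$ when $N \ge K$. With general (possibly coded) placement, the optimal memory-load tradeoff for $N < K$ is fully characterized only for specific regimes.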
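The size of the error made by applying the MAN formula when $N < K$ can be illustrated numerically. The sketch below relies on an assumption that is not derived in this section: the worst-case load under uncoded placement reported by Yu, Maddah-Ali, and Avestimehr (2018), $\left[\binom{K}{t+1} - \binom{K - \min(K, N)}{t+1}\right] / \binom{K}{t}$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def man_load(K, t):
    """MAN load C(K, t+1) / C(K, t) = (K - t) / (t + 1)."""
    return Fraction(comb(K, t + 1), comb(K, t))

def yma_load(K, N, t):
    """Assumed worst-case load under uncoded placement, valid for any N and K:
    redundant multicast messages are removed when N < K."""
    return Fraction(comb(K, t + 1) - comb(K - min(K, N), t + 1), comb(K, t))

if __name__ == "__main__":
    K, t = 4, 1
    for N in (1, 2, 3, 4, 8):
        print(N, yma_load(K, N, t), man_load(K, t))
```

For $K = 4$, $N = 2$, $t = 1$ the load is $5/4$ instead of the MAN value $3/2$; for $N \ge K$ the two expressions coincide.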

Quick Check

The MAN scheme with uncoded placement is optimal (under the uncoded placement constraint) for all $N \ge K$. What is the key property of the MAN placement that makes it optimal?

It caches the most popular files

It creates the most symmetric side-information structure across users

It uses coded placement to compress the cached content

Correction:

It creates the most symmetric side-information structure across users

The MAN placement ensures that every $t$-subset of users caches the same fraction of every file. This maximal symmetry creates the maximum number of multicast opportunities for any demand pattern.

⚠️Engineering Note

Practical Coded Caching: From Theory to Systems

Despite the elegant theory, practical deployment of coded caching faces challenges:

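Subpacketization: $\binom{K}{t}$ grows combinatorially. For $K = 20$, $t = 4$: $\binom{20}{4} = 4845$ sub-files per file. This is manageable for $\sim$1 GB files (roughly 200 KB per sub-file), but at the same ratio $t/K$, $K = 50$ gives $t = 10$ and $\binom{50}{10} \approx 1.03 \times 10^{10}$ sub-files, more than the number of bits in a 1 GB file.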
Asynchronous demands: Users don't all reveal demands simultaneously. Online coded caching (where the server must respond before seeing all demands) loses some of the multicast gain.
Heterogeneous caches: Not all users have the same cache size. Optimal schemes for heterogeneous $\{M_k\}$ are more complex but exist.
File popularity: In practice, some files are much more popular than others (Zipf distribution). Popularity-aware coded caching applies the coded (MAN-style) scheme to the popular files and conventional uncoded caching to the less popular ones.

Current research focuses on reducing subpacketization via combinatorial designs and grouping strategies, which trade some multicast gain for practical feasibility.

Practical Constraints

• Subpacketization must satisfy $F \ge \binom{K}{t}$ bits per file
• Practical video files: 0.5-10 GB (typically enough for $K \le 20$)
• CDN cache sizes: 1-100 TB per edge server
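The subpacketization constraint is easy to check for concrete numbers. The sketch below computes $\binom{K}{t}$ and the resulting sub-file size; the 1 GB file size (taken as $10^9$ bytes) and the function names are assumptions chosen for illustration.

```python
from math import comb

def subpacketization(K, t):
    """Number of sub-files per file in the MAN scheme."""
    return comb(K, t)

def subfile_bits(file_bytes, K, t):
    """Average sub-file size in bits; a value below 1 means the split is infeasible."""
    return 8 * file_bytes / comb(K, t)

if __name__ == "__main__":
    one_gb = 10**9  # assumed file size in bytes
    for K, t in [(10, 2), (20, 4), (50, 10)]:
        print(K, t, subpacketization(K, t), subfile_bits(one_gb, K, t))
```

For $K = 20$, $t = 4$ each sub-file of a 1 GB file holds about 1.65 million bits, while for $K = 50$, $t = 10$ the average drops below one bit, so the requirement $F \ge \binom{K}{t}$ fails.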

Key Takeaway

The MAN scheme is optimal under uncoded placement. For $N \ge K$, the delivery load $R^* = K(1-M/N)/(1+KM/N)$ is the exact minimum under uncoded cache placement. Even with arbitrary coded placement, the load cannot be reduced by more than a factor of 2. File correlation reduces the effective library size, and demand privacy can be achieved with negligible overhead. The main practical barrier is subpacketization, not optimality.
