Ferkans — Interactive Telecom Tutor

Why Privacy Matters in Coded Caching

Traditional coded caching has a subtle privacy issue: in the MAN delivery, each user must know the full demand vector $\mathbf{d} = (d_1, \ldots, d_K)$ to decode its own file. User 1 XOR-cancels a summand using the subfile $W_{d_2, \mathcal{S}}$ from its cache, but to find that subfile it must know the file index $d_2$. User 1 therefore learns what users 2, 3, ..., K are watching.

For a CDN operator, this may seem innocuous (users knowing each other's anonymous demands). But in real deployments, demands can reveal personal information: viewing habits, medical searches, political preferences. Information-theoretic privacy requires that the delivery mechanism leaks zero information about others' demands to any individual user.

Remarkably, the Wan-Caire CommIT result shows that demand privacy can be achieved at zero rate cost in the shared-link setting. The MAN rate $R = K(1-M/N)/(1+KM/N)$ remains achievable, with privacy added via shared randomness. This is the central message of Chapter 12.
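As a quick numerical check of the MAN rate formula, the sketch below evaluates $R = K(1-M/N)/(1+KM/N)$ exactly for $K = 10$, $N = 100$ (the parameters of the example later in this section) at a few cache sizes. The function name `man_rate` is our own; the formula is evaluated as stated, and the comment notes where it applies directly.

```python
from fractions import Fraction


def man_rate(K: int, M, N: int) -> Fraction:
    """MAN delivery load R = K(1 - M/N) / (1 + K*M/N), in units of one file.

    Evaluated directly at the corner points t = K*M/N in {0, 1, ..., K}
    (assuming N >= K); between corner points, memory sharing is used.
    """
    M = Fraction(M)
    return K * (1 - M / N) / (1 + K * M / N)


K, N = 10, 100
for M in (0, 10, 20, 50, 100):
    t = Fraction(K * M, N)
    print(f"M={M:3d}  t={t}  R={man_rate(K, M, N)}")
```

Using `Fraction` keeps the loads exact (e.g. $R = 8/3$ at $t = 2$), which makes the comparison with the private scheme's rate unambiguous.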

Definition:
Information-Theoretic Demand Privacy

A coded caching scheme achieves demand privacy if, for every user $k$, the delivery message $X_{\mathbf{d}}$ and user $k$'s cache $\mathcal{Z}_k$, conditioned on $k$'s own demand $d_k$, are statistically independent of the other users' demands $\mathbf{d}_{-k} = (d_1, \ldots, d_{k-1}, d_{k+1}, \ldots, d_K)$:
$I(\mathbf{d}_{-k};\; X_{\mathbf{d}}, \mathcal{Z}_k \mid d_k) \;=\; 0, \quad \forall k \in [K].$
Equivalently, user $k$'s posterior over $\mathbf{d}_{-k}$ given its observations equals its prior given $d_k$.
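The definition can be checked mechanically on small examples. In the sketch below, `cond_mutual_info` computes $I(A; B \mid C)$ from an enumerated joint pmf, and `leakage` applies it with $A = d_2$, $C = d_1$ and $B$ = user 1's observation, for $K = N = 2$ with independent uniform demands. Both helper names and the two observation models are hypothetical toys, not the MAN or Wan-Caire schemes: one model exposes $d_2$ in the clear, the other exposes $d_2$ shifted by a uniform key that user 1 does not hold. The cache is omitted because a fixed (deterministic) placement adds no information about the demands.

```python
import itertools
import math
from collections import defaultdict


def cond_mutual_info(joint):
    """I(A; B | C) in bits for a joint pmf given as {(a, b, c): prob}."""
    p_c, p_ac, p_bc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_c[c] += p
        p_ac[(a, c)] += p
        p_bc[(b, c)] += p
    return sum(
        p * math.log2(p * p_c[c] / (p_ac[(a, c)] * p_bc[(b, c)]))
        for (a, b, c), p in joint.items()
        if p > 0
    )


def leakage(observe, N=2, key_space=(None,)):
    """I(d_2; observation | d_1) for K = 2 users with independent uniform demands.

    `observe(d1, d2, key)` is a hypothetical model of what user 1 sees;
    `key` is drawn uniformly from `key_space` and is NOT known to user 1.
    """
    joint = defaultdict(float)
    p = 1 / (N * N * len(key_space))
    for d1, d2, key in itertools.product(range(N), range(N), key_space):
        joint[(d2, observe(d1, d2, key), d1)] += p
    return cond_mutual_info(joint)


# Toy model 1: user 1 sees d_2 in the clear, as in MAN decoding.
print(leakage(lambda d1, d2, key: (d1, d2)))  # 1.0
# Toy model 2: user 1 sees d_2 shifted by a uniform key it does not hold.
print(leakage(lambda d1, d2, key: (d1, (d2 + key) % 2), key_space=range(2)))  # 0.0
```

The first model leaks the full $H(d_2) = 1$ bit; the second satisfies the zero-leakage condition exactly.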

This is a strict, information-theoretic notion of privacy: no amount of computation can reveal $\mathbf{d}_{-k}$ from $X_{\mathbf{d}}$ and $\mathcal{Z}_k$. It is stronger than cryptographic privacy, which only protects against polynomial-time adversaries.

Definition:
Adversary Models

The privacy guarantee depends on who the adversary is:

(1) Single-user adversary. One user $k$ tries to learn $\mathbf{d}_{-k}$ from its cache and the received message. (Most common setting; Wan-Caire 2021.)
(2) Coalition of colluding users. A group $\mathcal{Z} \subseteq [K]$, $|\mathcal{Z}| = z$, jointly tries to learn the demands of the non-colluding users. (D2D setting; Wan-Sun-Ji-Tuninetti-Caire.)
(3) Server adversary. The server itself tries to learn the users' demands. In the shared-link model the server has the demand vector by construction, so this case does not apply; in D2D or cloud-RAN models it can matter.
(4) Eavesdropper adversary. An outside observer sees the broadcast messages but no cache. This case is simpler to analyze and is typically handled by standard encryption.

Chapter 12 focuses on (1) and (2), the interesting cases.

Why the MAN Scheme Leaks

In the standard MAN delivery of Chapter 2, user 1's decoding of $W_{d_1}$ from $X_{\mathcal{S}'} = \bigoplus_{k \in \mathcal{S}'} W_{d_k, \mathcal{S}' \setminus \{k\}}$ requires knowing the indices $\{d_k : k \in \mathcal{S}'\}$. From these indices, user 1 learns exactly what the other users in $\mathcal{S}'$ are watching. Concretely:

• For $\mathcal{S}' = \{1, 2, 3\}$, user 1 sees the message $W_{d_2, \{1,3\}} \oplus W_{d_3, \{1,2\}} \oplus W_{d_1, \{2,3\}}$.
• To recover $W_{d_1, \{2,3\}}$, user 1 must XOR out $W_{d_2, \{1,3\}}$ and $W_{d_3, \{1,2\}}$, both of which are in its cache.
• To locate those subfiles in its cache, user 1 must read the labels $d_2$ and $d_3$, so it learns which file each summand comes from.
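To make the label dependence concrete, here is a minimal simulation of the $\mathcal{S}' = \{1, 2, 3\}$ message for a toy instance with $K = N = 3$ and $t = 2$, so that subfiles are indexed by 2-subsets exactly as in the steps above. The helper names `deliver` and `decode`, the 4-byte subfile size, and the demand vector are illustrative assumptions; subfiles are random bytes.

```python
import itertools
import os

K, N, t = 3, 3, 2   # toy instance: 3 users, 3 files, t = KM/N = 2
SUB = 4             # bytes per subfile (illustrative)
users = range(1, K + 1)
subsets = [frozenset(s) for s in itertools.combinations(users, t)]

# Each file n is split into one random subfile W_{n,S} per t-subset S of users.
library = {n: {S: os.urandom(SUB) for S in subsets} for n in range(N)}

# MAN placement: user k caches W_{n,S} for every file n and every S containing k.
cache = {k: {(n, S): library[n][S] for n in range(N) for S in subsets if k in S}
         for k in users}


def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def deliver(d, S_prime):
    """X_{S'} = XOR over k in S' of W_{d_k, S' minus {k}}."""
    return xor(*(library[d[k]][S_prime - {k}] for k in S_prime))


def decode(k, X, d, S_prime):
    """User k XORs out the other summands; locating them needs every label d_j."""
    known = [cache[k][(d[j], S_prime - {j})] for j in S_prime if j != k]
    return xor(X, *known)


d = {1: 0, 2: 2, 3: 1}          # true demand vector (illustrative)
S_prime = frozenset(users)      # S' = {1, 2, 3}
X = deliver(d, S_prime)

for k in users:                 # every user recovers W_{d_k, S' minus {k}}
    assert decode(k, X, d, S_prime) == library[d[k]][S_prime - {k}]

# If user 1 guessed d_2 wrongly, it would cancel the wrong cached subfile;
# with random subfiles the result is almost surely not W_{d_1,{2,3}}.
wrong = {1: 0, 2: 1, 3: 1}
print(decode(1, X, wrong, S_prime) == library[d[1]][frozenset({2, 3})])
```

Decoding succeeds only when `decode` is handed the true labels, which is precisely why MAN decoding exposes $d_2$ and $d_3$ to user 1.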

Hence the MAN scheme reveals $\mathbf{d}_{-k}$ completely to each user $k$: the leakage is maximal, not zero.

The Wan-Caire scheme avoids this by masking the summand labels with shared randomness. We develop this in §12.2.

Example: Quantifying Leakage in MAN

For $K = 10$, $N = 100$ and cache size $M = 20$ (so $t = KM/N = 2$), under standard MAN delivery with user 1 observing $X_{\mathbf{d}}$ and its cache, compute the leakage $I(\mathbf{d}_{-1}; X_{\mathbf{d}}, \mathcal{Z}_1)$.

Solution

User 1's observations

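User 1 receives every broadcast message. The ones it decodes from are the $(t+1)$-subset messages $X_{\mathcal{S}'}$ with $1 \in \mathcal{S}'$ (3-subsets here, since $t = 2$). Together these involve every other user 2, 3, ..., 10 as a participant, so decoding reveals $d_2, \ldots, d_{10}$.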
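A short enumeration confirms the counting behind this step, assuming $t = 2$ (i.e. $M = 20$ for $N = 100$): it lists the $(t+1)$-subsets that contain user 1 and checks that their participants cover all other users.

```python
from itertools import combinations
from math import comb

K, t = 10, 2                 # t = KM/N = 2, i.e. M = 20 for N = 100 (assumed)
others = set(range(2, K + 1))

# Messages X_{S'} that user 1 decodes from: the (t+1)-subsets S' containing user 1.
msgs = [{1, *rest} for rest in combinations(sorted(others), t)]
participants = set().union(*msgs) - {1}

print(len(msgs), comb(K - 1, t))   # 36 36
print(participants == others)      # True
```

There are $\binom{9}{2} = 36$ such messages, and every user $2, \ldots, 10$ appears in at least one of them.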

Leakage

$I(\mathbf{d}_{-1}; X, \mathcal{Z}_1) = H(\mathbf{d}_{-1}) - H(\mathbf{d}_{-1} | X, \mathcal{Z}_1) = H(\mathbf{d}_{-1}) - 0 = H(\mathbf{d}_{-1})$ .

Numerical

With 9 other users each choosing one of 100 files (independent, uniform): $H(\mathbf{d}_{-1}) = 9 \log_2 100 \approx 59.8$ bits. User 1 learns ~60 bits about others' demands.
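The arithmetic can be reproduced directly, under the stated assumption of 9 independent demands, each uniform over 100 files:

```python
import math

K, N = 10, 100
# Entropy of 9 independent uniform demands over 100 files (assumed model).
H_others = (K - 1) * math.log2(N)
print(f"{H_others:.1f} bits")   # 59.8 bits
```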

Wan-Caire private scheme

With the CommIT private scheme, the leakage is exactly 0: user 1 learns nothing about $\mathbf{d}_{-1}$.

Definition:
Shared Randomness

The Wan-Caire private scheme relies on shared randomness: a secret key $\mathcal{K}$ shared between the server and the users and unknown to any external adversary, of which each user $k$ holds only its own share $\mathcal{K}_k$. Specifically, users receive a random permutation (or similar structure) via a pre-shared key. The server uses this randomness to mask the file-index labels in the delivery messages, so that a user can find the subfiles it needs for decoding but cannot tell which files the other users requested.

The key $\mathcal{K}$ is pre-distributed offline (during placement) at zero delivery-phase cost. Each user holds its own share of the key.
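The masking principle can be illustrated with the simplest information-theoretic tool, a one-time pad over $\mathbb{Z}_N$: adding a uniformly random key to a file index makes the masked label uniform whatever the index, while the key holder can undo the shift. This is a hypothetical toy (the names `mask`/`unmask` and $N = 5$ are ours), not the Wan-Caire construction itself. It also suggests why masking one index costs about $\log_2 N$ key bits, in line with the ~K log(N) figure in the Practical Constraints list.

```python
from collections import Counter
from fractions import Fraction

N = 5  # toy library size (assumed)


def mask(d: int, key: int) -> int:
    """One-time pad over Z_N: shift the file index by a secret key."""
    return (d + key) % N


def unmask(label: int, key: int) -> int:
    """Inverse shift, available only to the key holder."""
    return (label - key) % N


# The key holder recovers every index exactly.
assert all(unmask(mask(d, k), k) == d for d in range(N) for k in range(N))

# Without the key (uniform over Z_N), the masked label has the same uniform
# distribution for every demand d, so it carries zero information about d.
for d in range(N):
    dist = Counter(mask(d, k) for k in range(N))
    probs = {label: Fraction(c, N) for label, c in dist.items()}
    assert probs == {label: Fraction(1, N) for label in range(N)}

print("masked label is uniform for every demand")
```

The key must be uniform and used once per masked label; reusing it across rounds would let a user correlate labels and would void the zero-leakage argument.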

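In practice, shared randomness is established via secure key-exchange protocols (Diffie-Hellman, PKI), so the costs are operational (key management) rather than communication rate. Keys derived this way are only computationally secure, however; a fully information-theoretic guarantee requires the key shares to be delivered over a secure channel.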

Private coded caching setup: the server holds the library $\mathcal{W}$. Each user $k$ has a cache $\mathcal{Z}_k$ and a share $\mathcal{K}_k$ of the shared randomness. The server computes a delivery $X_{\mathbf{d}, \mathcal{K}}$, masked by the shared randomness, and broadcasts it to all users. Each user decodes its file using its cache and key share, but cannot learn the others' demands.

Key Takeaway

Information-theoretic demand privacy comes at no cost in rate. The Wan-Caire 2021 result establishes that the MAN rate $R = K(1-M/N)/(1+KM/N)$ is achievable with zero leakage about other users' demands. The only cost lies in the shared-randomness setup, which is asymptotically negligible for large file sizes. This is a remarkable positive result: privacy and rate are not in tension.

⚠️Engineering Note

Information-Theoretic vs Cryptographic Privacy

A key distinction:

Information-theoretic privacy. Adversary has unbounded computation. Security is absolute — no amount of computation can break it. Requires shared randomness as long as the private content.
Cryptographic privacy. Adversary bounded to polynomial time. Security rests on the hardness of some mathematical problem (factoring, discrete log, etc.). Public-key encryption such as RSA requires no pre-shared randomness, but is breakable given enough computation.

Information-theoretic privacy is stronger but more expensive (it needs shared randomness per message). For coded caching, the Wan-Caire scheme exploits the fact that shared randomness is essentially free in the placement phase: users establish keys offline, and the keys cost no delivery bandwidth.

For applications where post-quantum security matters (where cryptographic assumptions might eventually fail), information-theoretic privacy is preferred despite the shared-randomness overhead.

Practical Constraints

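• Shared randomness: about $K \log_2 N$ bits per delivery round
• Pre-distributed via key exchange (Diffie-Hellman typical)
• Wan-Caire achieves zero leakage with no delivery overhead
• Cryptographic schemes are easier to deploy, but public-key schemes such as RSA break under a quantum adversary (Shor's algorithm), while symmetric ciphers such as AES lose roughly half of their key strength (Grover's algorithm)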
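To see why the key cost is negligible for large files, the sketch below compares the quoted ~$K \log_2 N$ key bits per round with the MAN delivery load $R \cdot F$ bits, for $K = 10$, $M = 20$, $N = 100$ and a few hypothetical file sizes $F$. Taking the key size at face value is our assumption, and `key_overhead` is an illustrative name.

```python
import math


def key_overhead(K: int, M: float, N: int, F_bits: int):
    """Return (key bits, MAN delivery bits, key/delivery ratio) for one round."""
    key_bits = K * math.log2(N)                          # ~K log2 N (assumed)
    load_bits = K * (1 - M / N) / (1 + K * M / N) * F_bits  # MAN load R * F
    return key_bits, load_bits, key_bits / load_bits


K, M, N = 10, 20, 100
for F in (10**3, 10**6, 8 * 10**9):   # hypothetical file sizes in bits
    key_bits, load_bits, ratio = key_overhead(K, M, N, F)
    print(f"F={F:>13,d}  key={key_bits:6.1f} b  "
          f"load={load_bits:.3g} b  ratio={ratio:.2e}")
```

The key size does not grow with $F$ while the load does, so the ratio falls as $1/F$.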

The Demand Privacy Model