Ferkans — Interactive Telecom Tutor

Side Information at the User

Classical PIR (Chapter 13) and its extensions (Chapter 14) all assume the user starts from zero knowledge — they receive only the answers from databases. In many real systems, however, the user already has partial library content from past interactions:

Browser caches with previously-fetched files.
CDN edge caches with prefetched popular content.
Mobile clients with pinned files.

This side information can be exploited to reduce the active download from databases. Intuitively: the user can frame the query so that the answer combines the desired file with an already-known file — discarding the known side info recovers the desired file with less network bandwidth.
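The "combine, then discard" intuition can be illustrated with a toy XOR example. This is only a sketch of the decoding step under simplifying assumptions: a single database, equal-length files, and hypothetical names (`library`, `xor_bytes`, `cached_index`) that are not part of any scheme in this chapter. It makes no claim about privacy.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("files must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical library of K = 4 equal-length files (random bytes for illustration).
FILE_LEN = 16
library = [secrets.token_bytes(FILE_LEN) for _ in range(4)]

desired = 2        # index theta of the file the user wants
cached_index = 0   # index of a file the user already holds
cached_file = library[cached_index]  # the user's side information

# Database side: answer with the XOR of the two requested files.
answer = xor_bytes(library[desired], library[cached_index])

# User side: XOR out the known file to recover the desired one.
recovered = xor_bytes(answer, cached_file)
assert recovered == library[desired]
```

The answer has the length of one file, yet the query names two indices, so the request alone does not single out which of the two the user lacks.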

The capacity result was settled by Wei, Banawan, and Ulukus (2019) for the case of $M$ uncoded prefetched files. This section presents their result.

Definition:
PIR with Side Information

Setup:

$K$ files $W_1, \ldots, W_K$ , replicated across $N$ databases (as in Chapter 13).
The user has $M$ files of side information: a subset $\mathcal{S} \subset [K]$ of size $M$ , with $\theta \notin \mathcal{S}$ (the desired file is not in the cache).
The side-info set $\mathcal{S}$ may or may not be known to the databases. Two flavors: public side info (databases know $\mathcal{S}$ ) and private side info (databases don't know $\mathcal{S}$ ).

Privacy requirement (user privacy, classical): $I(\theta;\, Q^{(\theta, n)}) \;=\; 0 \quad \forall n \in [N].$

Goal: minimize the total download $D = \sum_n |A^{(\theta, n)}|$ subject to the user being able to decode $W_\theta$ from the answers and the side info.

PIR rate: $R = L / D$, where $L$ is the length of each file, measured in the same units as $D$.

Theorem: PIR with Side Information Capacity (Wei–Banawan–Ulukus 2019)

For PIR with $K$ files, $N$ databases, and $M$ uncoded prefetched files at the user (with the side-info set known to the databases), the PIR capacity is $C_{\text{PIR-SI}}(N, K, M) \;=\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K - M - 1}}\right)^{-1}.$ This is the Sun-Jafar formula with $K$ replaced by $K - M$. Setting $M = 0$ recovers classical PIR; setting $M = K - 1$ gives capacity $1$, since only one file remains unknown and every downloaded symbol can be a symbol of $W_\theta$.
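For computation it can help to sum the geometric series explicitly. The following closed form is a standard identity applied to the theorem's expression, valid for $N \geq 2$:

```latex
\sum_{i=0}^{K-M-1} \frac{1}{N^{i}} \;=\; \frac{1 - N^{-(K-M)}}{1 - N^{-1}}
\quad\Longrightarrow\quad
C_{\text{PIR-SI}}(N, K, M) \;=\; \frac{1 - 1/N}{1 - 1/N^{K-M}} .
```

Two sanity checks: at $M = K - 1$ the numerator and denominator coincide, giving $1$; as $K - M \to \infty$ the denominator tends to $1$, giving $1 - 1/N$.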

Proof

Achievability

Modify Sun-Jafar's scheme: the user uses the side info as if it were already-known interference symbols. The query design ensures that the "effective" file count is $K - M$ (the unknown files), and the Sun-Jafar achievability scheme runs with $K - M$ files.

Converse

The cut-set converse extends naturally: the entropy bound on the queries treats the side-info as a known constant. The recursive symmetrization gives $K - M$ as the effective file count.

Side info as a free trade-up

Each side-info file effectively reduces $K$ by 1. As $M$ grows, the rate improves toward $1$ (single-file retrieval). The improvement is monotone in $M$ .

Example: Side Info at $N = 4, K = 6$

Compute $C_{\text{PIR-SI}}(4, 6, M)$ for $M = 0, 1, 2, 3, 4, 5$ .

Solution

$M = 0$ (no side info)

$C(4, 6, 0) = (1 + 1/4 + 1/16 + 1/64 + 1/256 + 1/1024)^{-1} = 1024/1365 \approx 0.750$ .

$M = 1$

Effective $K = 5$. $C = (1 + 1/4 + 1/16 + 1/64 + 1/256)^{-1} = 256/341 \approx 0.751$ (a slight improvement, since the geometric series has almost saturated).

$M = 2, 3, 4, 5$

Effective file count $K - M = 4, 3, 2, 1$. $M = 2$: $C = 64/85 \approx 0.753$. $M = 3$: $C = 16/21 \approx 0.762$. $M = 4$: $C = 4/5 = 0.800$. $M = 5$: $C = 1$ (only one unknown file).

Pattern

Side info matters most at large $M$ (close to $K - 1$ ). At $M = K - 1$ , rate jumps to $1$ . At small $M$ , the improvement is marginal because Sun-Jafar is already near $1 - 1/N$ .
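The values in this example can be checked with exact rational arithmetic. The sketch below evaluates the theorem's formula directly; the function name `pir_si_capacity` is an illustrative choice, not something defined elsewhere in the text.

```python
from fractions import Fraction


def pir_si_capacity(n: int, k: int, m: int) -> Fraction:
    """Exact PIR-SI capacity: inverse of sum_{i=0}^{k-m-1} n^(-i)."""
    if n < 1 or not 0 <= m <= k - 1:
        raise ValueError("need n >= 1 and 0 <= m <= k - 1")
    return 1 / sum(Fraction(1, n**i) for i in range(k - m))


# Reproduce the example: N = 4 databases, K = 6 files, M = 0..5.
for m in range(6):
    c = pir_si_capacity(4, 6, m)
    print(f"M={m}: C = {c} ~ {float(c):.4f}")
```

Working with `Fraction` avoids rounding questions when adjacent values differ only in the third or fourth decimal place, as they do here for small $M$.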

Theorem: Private Side Information PIR (Kadhe et al. 2020)

For PIR where the side info $\mathcal{S}$ is private (databases don't know which $M$ files the user has cached), the capacity satisfies $C_{\text{PIR-PSI}}(N, K, M) \;\leq\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K - M - 1}}\right)^{-1},$ with equality (that is, no loss relative to public side info) in most regimes. Achievability holds via a randomized prefetching strategy. The exact capacity is settled when the side-info selection is uniformly random over $\binom{[K]}{M}$.

Proof

Why no rate loss

Privately holding the side info adds no rate cost compared to public side info — at least under uniform random selection. The user can encode $\mathcal{S}$ implicitly via a careful query construction; the uniform-random selection over $\binom{[K]}{M}$ ensures the "effective" privacy is satisfied.

When does private cost rate?

For non-uniform side-info selection (e.g., a user cache that reflects skewed file popularity), the capacity may be lower. The exact capacity in non-uniform regimes is an open problem.

Operational

For uniformly-random caches, privacy of the cache contents is free. For skewed caches, a rate cost is possible; its size is not yet characterized.

PIR with Side Information: Rate vs. $M$

Plot the PIR-SI capacity $C_{\text{PIR-SI}}(N, K, M)$ as a function of $M$ for fixed $N$ and $K$ . The rate increases monotonically from the classical Sun-Jafar value at $M = 0$ to $1$ at $M = K - 1$ . The marginal rate gain per additional cached file grows as $M$ approaches $K - 1$ .

Parameters: $N = 5$ databases, $K = 10$ files.
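The data behind such a plot can be regenerated in a few lines. The sketch below assumes the parameter values $N = 5$, $K = 10$ and uses illustrative names (`rates`, `gains`); it prints the capacity for each $M$ together with the marginal gain from the previous value, without drawing the plot.

```python
def pir_si_capacity(n: int, k: int, m: int) -> float:
    """PIR-SI capacity via the geometric-series sum with k - m terms."""
    return 1.0 / sum(n ** -i for i in range(k - m))


N, K = 5, 10  # assumed parameter values for this illustration

# Capacity for M = 0, 1, ..., K - 1.
rates = [pir_si_capacity(N, K, m) for m in range(K)]

# Marginal gain from caching one more file: gains[m] = C(M = m + 1) - C(M = m).
gains = [b - a for a, b in zip(rates, rates[1:])]

for m, r in enumerate(rates):
    delta = f"  (gain over M={m - 1}: {gains[m - 1]:.2e})" if m > 0 else ""
    print(f"M={m}: C = {r:.6f}{delta}")
```

The printed gains grow with $M$, which is the numerical counterpart of the statement that the curve steepens as $M$ approaches $K - 1$.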

PIR-SI Variants — Operational Comparison

Variant	Side-Info Type	Capacity	Best Use
Classical PIR ( $M = 0$ )	None	$(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$	Cold-start retrieval
Public side info	Databases know cached files	$(1 + 1/N + \cdots + 1/N^{K-M-1})^{-1}$	CDN with public cache state
Private side info	Databases don't know cached files	Same as public (uniform); open in general	Browser cache, mobile prefetch
$M \to K - 1$ limit	Almost everything cached	$\to 1$	Rare; very cheap retrieval

⚠️Engineering Note

Deploying PIR with Side Information

Practical guidelines:

Cache sizing: side info helps most when $K - M$ is small, that is, when $M$ is close to $K - 1$. For a $1\%$ cache ($M/K = 0.01$), the rate improvement is negligible. Even at a $50\%$ cache the gain stays small unless $K$ itself is small, because the geometric sum saturates after a few terms.
Cache content selection: for uniformly random caches, there is no extra rate cost for privacy. For popularity-driven caches (Zipf distribution), expect a small rate cost; its quantification is open.
Combine with $T$-colluding: the side-info and $T$-colluding extensions compose. With $M$ side-info files and $T$ colluding databases, the capacity is conjectured to be $C(N, K - M, T)$; that is, side info reduces the effective $K$ even under collusion. Verification is open.
Public vs. private: if the cache state is observable (e.g., HTTP cache headers), use public side-info. If hidden, use private side-info (small or zero rate cost).
Reuse PIR primitives: a deployed PIR system can add side-info support by changing only the query construction; the database protocol stays the same.
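For cache sizing, one way to make the guideline concrete is to ask how many cached files are needed to reach a target rate. The sketch below is a hypothetical helper (`min_cache_for_rate` is not from the cited papers); it evaluates the capacity through the closed form of the geometric sum, $(1 - 1/N) / (1 - N^{-(K-M)})$, and assumes $N \geq 2$ and the public-side-info capacity formula.

```python
from fractions import Fraction
from typing import Optional


def pir_si_capacity(n: int, k: int, m: int) -> Fraction:
    """Closed form of the PIR-SI capacity; requires n >= 2 and 0 <= m <= k - 1."""
    if n < 2 or not 0 <= m <= k - 1:
        raise ValueError("need n >= 2 and 0 <= m <= k - 1")
    return (1 - Fraction(1, n)) / (1 - Fraction(1, n ** (k - m)))


def min_cache_for_rate(n: int, k: int, target: Fraction) -> Optional[int]:
    """Smallest M in [0, k - 1] whose capacity meets the target, else None."""
    for m in range(k):
        if pir_si_capacity(n, k, m) >= target:
            return m
    return None  # only possible when target > 1


# Example: with N = 4 and K = 6, how many cached files reach rate 4/5?
print(min_cache_for_rate(4, 6, Fraction(4, 5)))
```

A linear scan suffices because the capacity is monotone in $M$ and $K$ is the only loop bound.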

Practical Constraints

• Cache fraction $M/K$: significant gain requires non-trivial fraction
• Uniform caches: privacy free
• Skewed caches: privacy may cost rate
• $T$-colluding composition: open in general

📋 Ref: Wei-Banawan-Ulukus 2019; Kadhe et al. 2020

Common Mistake: Cached Files Must Not Be the Desired File

Mistake:

Assume that PIR with side info works even if $\theta \in \mathcal{S}$ — i.e., the user already has the desired file but still wants to query "to be safe."

Correction:

PIR with side information assumes $\theta \notin \mathcal{S}$ explicitly. If the user already has $W_\theta$ , no PIR query is needed (and starting one would leak that the user has $W_\theta$ already, defeating the privacy purpose). The user must check whether $\theta$ is in their cache first; if yes, return the cached copy; if no, run PIR-SI. This pre-check is the standard CDN pattern (cache hit → respond locally; cache miss → fetch upstream). Don't conflate the two.
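The pre-check can be sketched as a thin wrapper around the retrieval call. Everything named here is hypothetical: `fetch`, the `cache` dictionary mapping file indices to contents, and the `run_pir_si` callback standing in for whatever PIR-SI client a deployment provides.

```python
from typing import Callable, Dict


def fetch(
    theta: int,
    cache: Dict[int, bytes],
    run_pir_si: Callable[[int, Dict[int, bytes]], bytes],
) -> bytes:
    """Return file theta: from the cache on a hit, via PIR-SI on a miss."""
    if theta in cache:
        # Cache hit: respond locally, no query is sent to any database.
        return cache[theta]
    # Cache miss: theta is not in the side-info set, as PIR-SI assumes.
    return run_pir_si(theta, cache)


# Usage with a stub in place of a real PIR-SI client (assumption for the demo).
calls = []


def stub_pir_si(theta: int, cache: Dict[int, bytes]) -> bytes:
    calls.append(theta)
    return b"from-network"


local = fetch(1, {1: b"local"}, stub_pir_si)   # hit: stub is not called
remote = fetch(2, {1: b"local"}, stub_pir_si)  # miss: stub is called once
```

Keeping the check outside the PIR-SI routine means the routine itself can rely on $\theta \notin \mathcal{S}$ without re-validating it.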

Key Takeaway

Side information at the user reduces the effective file count. PIR-SI capacity is $C(N, K - M)$ — Sun-Jafar's formula with $K$ replaced by $K - M$ . The improvement is monotone in $M$ , with the largest gains as $M \to K - 1$ . Public and private side info achieve the same capacity for uniformly-random caches; non-uniform private caches may incur a (small, open) rate cost.

Quick Check

For PIR with $N = 4$ databases, $K = 10$ files, and $M = 3$ side-info files, the capacity is:

Same as classical PIR with $K = 10$

Same as classical PIR with $K = 7$

$\sim 1.0$ regardless of $K$

Cannot be computed from the given info