PIR with Side Information

Side Information at the User

Classical PIR (Chapter 13) and its extensions (Chapter 14) all assume the user starts from zero knowledge — they receive only the answers from databases. In many real systems, however, the user already has partial library content from past interactions:

  • Browser caches with previously-fetched files.
  • CDN edge caches with prefetched popular content.
  • Mobile clients with pinned files.

This side information can be exploited to reduce the active download from databases. Intuitively: the user can frame the query so that the answer combines the desired file with an already-known file — discarding the known side info recovers the desired file with less network bandwidth.

The capacity result was settled by Wei, Banawan, and Ulukus (2019) for the case of MM uncoded prefetched files. This section presents their result.

Definition:

PIR with Side Information

Setup:

  • KK files W1,,WKW_1, \ldots, W_K, replicated across NN databases (as in Chapter 13).
  • The user has MM files of side information: a subset S[K]\mathcal{S} \subset [K] of size MM, with θS\theta \notin \mathcal{S} (the desired file is not in the cache).
  • The side-info set S\mathcal{S} may or may not be known to the databases. Two flavors: public side info (databases know S\mathcal{S}) and private side info (databases don't know S\mathcal{S}).

Privacy requirement (user privacy, classical): I(θ;Q(θ,n))  =  0n[N].I(\theta;\, Q^{(\theta, n)}) \;=\; 0 \quad \forall n \in [N].

Goal: minimize the total download D=nA(θ,n)D = \sum_n |A^{(\theta, n)}| subject to the user being able to decode WθW_\theta from the answers and the side info.

PIR rate: R=L/DR = L / D.

Theorem: PIR with Side Information Capacity (Wei–Banawan–Ulukus 2019)

For PIR with KK files, NN databases, and MM uncoded prefetched files at the user (with the side-info set known to the databases), the PIR capacity is CPIR-SI(N,K,M)  =  (1+1N++1NKM1)1.C_{\text{PIR-SI}}(N, K, M) \;=\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K - M - 1}}\right)^{-1}. The formula is identical to the Sun-Jafar formula but with KK replaced by KMK - M. Setting M=0M = 0 recovers classical PIR; setting M=K1M = K - 1 gives capacity 11 (the user only needs to retrieve a "missing" piece of one file).

Example: Side Info at N=4,K=6N = 4, K = 6

Compute CPIR-SI(4,6,M)C_{\text{PIR-SI}}(4, 6, M) for M=0,1,2,3,4,5M = 0, 1, 2, 3, 4, 5.

Theorem: Private Side Information PIR (Kadhe et al. 2020)

For PIR where the side info S\mathcal{S} is private (databases don't know which MM files the user has cached), the capacity is CPIR-PSI(N,K,M)    (1+1N++1NKM1)1C_{\text{PIR-PSI}}(N, K, M) \;\leq\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K - M - 1}}\right)^{-1} matching the public-side-info capacity in most regimes. Achievability holds via a randomized prefetching strategy. The exact capacity is settled when the side-info selection is uniformly random over ([K]M)\binom{[K]}{M}.

PIR with Side Information: Rate vs. MM

Plot the PIR-SI capacity CPIR-SI(N,K,M)C_{\text{PIR-SI}}(N, K, M) as a function of MM for fixed NN and KK. The rate increases monotonically from the classical Sun-Jafar value at M=0M = 0 to 11 at M=K1M = K - 1. The marginal rate gain per additional cached file grows as MM approaches K1K - 1.

Parameters
5
10

PIR-SI Variants — Operational Comparison

VariantSide-Info TypeCapacityBest Use
Classical PIR (M=0M = 0)None(1+1/N++1/NK1)1(1 + 1/N + \cdots + 1/N^{K-1})^{-1}Cold-start retrieval
Public side infoDatabases know cached files(1+1/N++1/NKM1)1(1 + 1/N + \cdots + 1/N^{K-M-1})^{-1}CDN with public cache state
Private side infoDatabases don't know cached filesSame as public (uniform); open in generalBrowser cache, mobile prefetch
MK1M \to K - 1 limitAlmost everything cached1\to 1Rare; very cheap retrieval
⚠️Engineering Note

Deploying PIR with Side Information

Practical guidelines:

  • Cache sizing: side info is most useful when MM is a substantial fraction of KK. For a 1%1\% cache (M/K=0.01M/K = 0.01), rate improvement is negligible. For a 50%50\% cache, improvement is significant.

  • Cache content selection: for uniformly- random caches, no extra rate cost for privacy. For popularity-driven caches (Zipf distribution), expect a small rate cost — quantification is open.

  • Combine with TT-colluding: the side-info and TT-colluding extensions compose. At M,TM, T, the capacity is conjectured to be C(N,KM,T)C(N, K - M, T) — i.e., side info reduces effective KK even under collusion. Verification open.

  • Public vs. private: if the cache state is observable (e.g., HTTP cache headers), use public side-info. If hidden, use private side-info (small or zero rate cost).

  • Reuse PIR primitives: a deployed PIR system can add side-info support by changing only the query construction; the database protocol stays the same.

Practical Constraints
  • Cache fraction M/KM/K: significant gain requires non-trivial fraction

  • Uniform caches: privacy free

  • Skewed caches: privacy may cost rate

  • TT-colluding composition: open in general

📋 Ref: Wei-Banawan-Ulukus 2019; Kadhe et al. 2020

Common Mistake: Cached Files Must Not Be the Desired File

Mistake:

Assume that PIR with side info works even if θS\theta \in \mathcal{S} — i.e., the user already has the desired file but still wants to query "to be safe."

Correction:

PIR with side information assumes θS\theta \notin \mathcal{S} explicitly. If the user already has WθW_\theta, no PIR query is needed (and starting one would leak that the user has WθW_\theta already, defeating the privacy purpose). The user must check whether θ\theta is in their cache first; if yes, return the cached copy; if no, run PIR-SI. This pre-check is the standard CDN pattern (cache hit → respond locally; cache miss → fetch upstream). Don't conflate the two.

Key Takeaway

Side information at the user reduces the effective file count. PIR-SI capacity is C(N,KM)C(N, K - M) — Sun-Jafar's formula with KK replaced by KMK - M. The improvement is monotone in MM, with the largest gains as MK1M \to K - 1. Public and private side info achieve the same capacity for uniformly-random caches; non-uniform private caches may incur a (small, open) rate cost.

Quick Check

For PIR with N=4N = 4 databases, K=10K = 10 files, and M=3M = 3 side-info files, the capacity is:

Same as classical PIR with K=10K = 10

Same as classical PIR with K=7K = 7

1.0\sim 1.0 regardless of KK

Cannot be computed from the given info