PIR with Side Information
Side Information at the User
Classical PIR (Chapter 13) and its extensions (Chapter 14) all assume the user starts from zero knowledge — they receive only the answers from databases. In many real systems, however, the user already has partial library content from past interactions:
- Browser caches with previously-fetched files.
- CDN edge caches with prefetched popular content.
- Mobile clients with pinned files.
This side information can be exploited to reduce the active download from databases. Intuitively: the user can frame the query so that the answer combines the desired file with an already-known file — discarding the known side info recovers the desired file with less network bandwidth.
The capacity result was settled by Wei, Banawan, and Ulukus (2019) for the case of uncoded prefetched files. This section presents their result.
Definition: PIR with Side Information
PIR with Side Information
Setup:
- files , replicated across databases (as in Chapter 13).
- The user has files of side information: a subset of size , with (the desired file is not in the cache).
- The side-info set may or may not be known to the databases. Two flavors: public side info (databases know ) and private side info (databases don't know ).
Privacy requirement (user privacy, classical):
Goal: minimize the total download subject to the user being able to decode from the answers and the side info.
PIR rate: .
Theorem: PIR with Side Information Capacity (Wei–Banawan–Ulukus 2019)
For PIR with files, databases, and uncoded prefetched files at the user (with the side-info set known to the databases), the PIR capacity is The formula is identical to the Sun-Jafar formula but with replaced by . Setting recovers classical PIR; setting gives capacity (the user only needs to retrieve a "missing" piece of one file).
Achievability
Modify Sun-Jafar's scheme: the user uses the side info as if it were already-known interference symbols. The query design ensures that the "effective" file count is (the unknown files), and the Sun-Jafar achievability scheme runs with files.
Converse
The cut-set converse extends naturally: the entropy bound on the queries treats the side-info as a known constant. The recursive symmetrization gives as the effective file count.
Side info as a free trade-up
Each side-info file effectively reduces by 1. As grows, the rate improves toward (single-file retrieval). The improvement is monotone in .
Example: Side Info at
Compute for .
$M = 0$ (no side info)
.
$M = 1$
Effective . (slight improvement, since geometric series almost saturated).
$M = 2, 3, 4, 5$
Effective . : . : . : . : (only one unknown file).
Pattern
Side info matters most at large (close to ). At , rate jumps to . At small , the improvement is marginal because Sun-Jafar is already near .
Theorem: Private Side Information PIR (Kadhe et al. 2020)
For PIR where the side info is private (databases don't know which files the user has cached), the capacity is matching the public-side-info capacity in most regimes. Achievability holds via a randomized prefetching strategy. The exact capacity is settled when the side-info selection is uniformly random over .
Why no rate loss
Privately holding the side info adds no rate cost compared to public side info — at least under uniform random selection. The user can encode implicitly via a careful query construction; the uniform-random selection over ensures the "effective" privacy is satisfied.
When does private cost rate?
For non-uniform side-info selection (e.g., user cache reflects skewed file popularity), the capacity may be lower. Open problem in non-uniform regimes.
Operational
For uniformly-random caches, privacy of the cache contents is free. For skewed caches, expect a small rate cost.
PIR with Side Information: Rate vs.
Plot the PIR-SI capacity as a function of for fixed and . The rate increases monotonically from the classical Sun-Jafar value at to at . The marginal rate gain per additional cached file grows as approaches .
Parameters
PIR-SI Variants — Operational Comparison
| Variant | Side-Info Type | Capacity | Best Use |
|---|---|---|---|
| Classical PIR () | None | Cold-start retrieval | |
| Public side info | Databases know cached files | CDN with public cache state | |
| Private side info | Databases don't know cached files | Same as public (uniform); open in general | Browser cache, mobile prefetch |
| limit | Almost everything cached | Rare; very cheap retrieval |
Deploying PIR with Side Information
Practical guidelines:
-
Cache sizing: side info is most useful when is a substantial fraction of . For a cache (), rate improvement is negligible. For a cache, improvement is significant.
-
Cache content selection: for uniformly- random caches, no extra rate cost for privacy. For popularity-driven caches (Zipf distribution), expect a small rate cost — quantification is open.
-
Combine with -colluding: the side-info and -colluding extensions compose. At , the capacity is conjectured to be — i.e., side info reduces effective even under collusion. Verification open.
-
Public vs. private: if the cache state is observable (e.g., HTTP cache headers), use public side-info. If hidden, use private side-info (small or zero rate cost).
-
Reuse PIR primitives: a deployed PIR system can add side-info support by changing only the query construction; the database protocol stays the same.
- •
Cache fraction : significant gain requires non-trivial fraction
- •
Uniform caches: privacy free
- •
Skewed caches: privacy may cost rate
- •
-colluding composition: open in general
Common Mistake: Cached Files Must Not Be the Desired File
Mistake:
Assume that PIR with side info works even if — i.e., the user already has the desired file but still wants to query "to be safe."
Correction:
PIR with side information assumes explicitly. If the user already has , no PIR query is needed (and starting one would leak that the user has already, defeating the privacy purpose). The user must check whether is in their cache first; if yes, return the cached copy; if no, run PIR-SI. This pre-check is the standard CDN pattern (cache hit → respond locally; cache miss → fetch upstream). Don't conflate the two.
Key Takeaway
Side information at the user reduces the effective file count. PIR-SI capacity is — Sun-Jafar's formula with replaced by . The improvement is monotone in , with the largest gains as . Public and private side info achieve the same capacity for uniformly-random caches; non-uniform private caches may incur a (small, open) rate cost.
Quick Check
For PIR with databases, files, and side-info files, the capacity is:
Same as classical PIR with
Same as classical PIR with
regardless of
Cannot be computed from the given info
.