Exercises
ex-ch13-01
EasyState the classical PIR threat model and give the privacy guarantee in mutual-information terms.
Threat model
honest-but-curious databases (each follows the protocol but analyzes its received query), non-colluding (no inter-database information sharing). User wants file with uniform.
Privacy guarantee
For each database : . The query distribution at any single database is independent of the desired index β perfect privacy in the information-theoretic sense.
ex-ch13-02
EasyCompute the Sun-Jafar PIR capacity for the following pairs: , , .
.
$(2, 3)$
.
$(5, 5)$
.
$(10, 10)$
.
Pattern
Capacity grows toward as grows, approaches as grows.
ex-ch13-03
EasyWhy does the trivial "download everything" PIR achieve rate ? What's the privacy guarantee?
Rate
File size , total download (all files). Rate .
Privacy
Database knows the user wants some file, but doesn't know which β query is identical regardless of . Privacy holds (information-theoretically).
Why suboptimal
Sun-Jafar achieves for any . The trivial scheme wastes bandwidth by downloading unwanted files.
ex-ch13-04
EasyWhy must classical PIR have databases? What goes wrong with ?
$N = 1$ trivial PIR
With one database, the user must download from it. The query for necessarily distinguishes from other files (else the database can't return the right answer). Privacy is impossible with information-theoretic guarantees.
Workaround
PIR is feasible computationally (Kushilevitz-Ostrovsky 1997) using homomorphic encryption β but with weaker guarantees and higher per-query cost. Information-theoretic PIR strictly requires non-colluding databases.
ex-ch13-05
MediumSketch the Sun-Jafar PIR scheme for , , file length chunks each. Specify the queries, answers, and rate.
Setup
, . User wants ().
User generates random labels
Random permutation of indices ensures privacy. The Sun-Jafar paper gives the precise symmetrization, here we use a representative scheme.
Query DB1
Ask for: β 4 chunks.
Query DB2
Ask for: β 4 chunks.
Decoding
From DB1: . From DB2: , , , . XOR with : gets . Has all of . β
Rate
Total download: 8 chunks. File size: 4 chunks. Rate . The capacity-optimal rate is , so this scheme is sub-optimal β illustrating that the Sun-Jafar capacity-achieving scheme is more involved than a naive symmetric XOR.
ex-ch13-06
MediumShow that the Sun-Jafar capacity formula can be rewritten as . Verify for .
Use the geometric series sum formula.
Geometric series
.
Inverting
.
Verify $N = 3, K = 4$
. Direct: . β
ex-ch13-07
MediumDerive the asymptotic limits of as (a) with fixed , and (b) with fixed .
Use the closed form .
(a) $N \to \infty$
and . . With infinite databases, no privacy overhead.
(b) $K \to \infty$
stays bounded above 0; . . With infinite library, the asymptote depends on .
Operational
Increase for unbounded gain; increase for diminishing returns capped at .
ex-ch13-08
MediumShow that no PIR scheme can achieve rate β even ignoring privacy. What does this mean operationally?
Upper bound
. The download must contain at least bits (the file's information). Hence and .
Sun-Jafar approaches 1
as . So PIR can be almost free of overhead with many databases β but never strictly free.
Operational
Any PIR scheme has some overhead (the privacy "tax"). The capacity formula quantifies the minimum tax: always (the asymptote).
ex-ch13-09
MediumExplain why classical PIR cannot use a single database with information-theoretic privacy.
Single-database limitation
With one database, the query must let the database compute as a function. If the query is independent of (privacy requirement), the database can't tell which file to send β the answer must contain all files (rate ).
Computational alternative
Kushilevitz-Ostrovsky 1997: single-database PIR with computational privacy uses homomorphic encryption. The database learns no information about under cryptographic assumptions (e.g., DDH) but processes encrypted queries β much more expensive per query.
Trade-off
Information-theoretic: requires , no computational assumptions. Computational: works with , relies on cryptographic hardness.
ex-ch13-10
MediumCompare PIR (this chapter) with secure aggregation (Chapter 10) in terms of what is hidden and who the adversary is.
What is hidden
SecAgg: individual gradient values (server can compute the sum but not the components). PIR: identity of accessed file (database can serve files but not learn which one was requested).
Adversary
SecAgg: the server (and possibly colluding users). PIR: each individual database (or colluding databases in the -colluding extension).
Communication structure
SecAgg: users contribute to one server-side aggregate. Direction: users β server. PIR: one user retrieves from databases. Direction: user β databases.
Algebraic kinship
Both rely on finite-field IA (Chapter 4). Both have cut-set converses (Chapter 2 Β§2.4 recipe). Different applications of the same machinery.
ex-ch13-11
HardSketch the Sun-Jafar capacity converse: prove for any classical PIR scheme.
Use induction on .
Setup
Let be the supremum of achievable rates. We prove by induction on .
Base case $K = 1$
With one file, no privacy overhead is needed β user just downloads from any database. . Matches the formula: .
Inductive step $K \to K + 1$
Given a scheme for files with rate . Conditioning on knowing one file (say ) and applying the privacy constraint to the remaining files, the residual download rate satisfies a Sun-Jafar-style bound.
Combine
The recursion gives . Plugging in the inductive hypothesis: , matching the formula.
Notes
The full proof is in Sun-Jafar 2017 Β§IV. It involves careful tracking of conditional entropies and the "shaping" of query distributions under the privacy constraint.
ex-ch13-12
HardWhy does the Sun-Jafar scheme require file length ? Why not smaller?
Block-length necessity
The Sun-Jafar scheme uses "levels" of query structure. At level , each database returns symbols, and the user combines across databases for total. The file must be split into chunks small enough that this counting works out.
Counting
Over all levels, the file is split into chunks. This is the smallest block length for capacity achievement.
Sub-optimal smaller schemes
Schemes with achieve sub-capacity rate. The gap closes as increases (rate-distortion-like behavior).
Practical workaround
For small files, aggregate files into one "virtual file" and run PIR on the virtual file. This trades latency for capacity-achievement.
ex-ch13-13
HardCompose Sun-Jafar PIR with -colluding extension (Chapter 14 Β§14.2). Derive the capacity formula informally and identify when the protocol becomes infeasible.
Heuristic
With colluders, the privacy constraint is harder. Each unit of additional collusion "consumes" one unit of the privacy structure. Heuristically, the capacity becomes .
Verification
Sun-Jafar 2018 (cited reference) confirms this is the exact capacity. At : recovers Sun-Jafar 2017.
Infeasibility
: β equivalent to downloading everything (full collusion defeats PIR). : similarly.
Practical: for meaningful rate. Production PIR with provides robust protection at acceptable rate.
ex-ch13-14
HardDiscuss why PIR's privacy guarantee is non-adaptive: the user's queries are designed before observing answers. Could an adaptive variant improve the rate?
Non-adaptive nature
In the standard formulation, the user's queries depend only on and private randomness β not on database responses. Adaptive variants would let later queries depend on earlier responses.
Adaptive doesn't help (mostly)
Sun-Jafar's converse is information- theoretic β adaptivity cannot bypass it. The capacity remains for adaptive PIR as well.
Where adaptivity matters
Adaptive PIR can improve latency (process queries in stages) and robustness (handle partial database failures gracefully). These are engineering properties, not information-theoretic capacity.
Side information variants
Cache-aided PIR (Chapter 15) is effectively a form of adaptivity: the user has prior cached information and adapts queries accordingly. Capacity does improve with side information β the cache is a form of "adaptive" knowledge.
ex-ch13-15
ChallengeOpen problem. The Sun-Jafar PIR scheme uses file length . Can capacity-achieving PIR be constructed for smaller block lengths? Sketch what tradeoffs arise.
Block length minimum
Sun-Jafar requires . Below this, the scheme is sub-capacity. A cleaner construction at smaller block length would be useful in deployment.
Approximate-capacity at smaller L
Approximate-capacity schemes at block length exist with rate close to (but not matching) . The gap shrinks as grows.
Optimal block-length-rate tradeoff
The full characterization of the rate achievable at finite is open. Recent work (Banawan et al. 2018+) provides partial results but the optimal scheme at sub-Sun-Jafar block length remains an active research area.
Operational implication
For deployment, is usually manageable (with small enough). For very large libraries, block-length reduction would be a real engineering win. Worth tracking the literature.