IA for Private Information Retrieval — A Preview
Why PIR Is an IA Problem
Private Information Retrieval (PIR) is the topic of Chapter 13: a user wants one file from a library replicated across databases, without revealing which file to any single database. The trick that makes PIR efficient — beating the trivial rate of downloading the whole library — is again finite-field interference alignment. Each query to a database is a linear combination of the user's query over all files; the user cancels out the "interference" (undesired files) using the algebraic structure of the queries across databases. The point is that the privacy requirement and the efficiency requirement are in tension: stronger privacy (resistance against more colluding databases) lowers the achievable rate. That tension is exactly the golden thread of this book.
This section previews the PIR capacity result (Sun–Jafar 2017) and shows where the alignment occurs. Chapter 13 will prove the capacity formula rigorously and extend to coded storage, - colluding databases, and symmetric PIR. Reading this preview is enough to understand why Chapter 4 — a "tool" chapter — is placed in Part I: the tool is used in at least three later chapters.
Definition: Classical PIR Problem
Classical PIR Problem
A classical PIR system has:
- A library of files , each replicated on databases.
- A user who wants to retrieve a single file with chosen uniformly.
- A privacy requirement: the query sent to database must be statistically independent of , .
- A download budget: database returns as a deterministic function of .
The PIR rate is the fraction of useful bits in the total download: . The PIR capacity is the supremum of achievable rates.
Private Information Retrieval (PIR)
A protocol that lets a user retrieve file from databases without revealing to any single database. The PIR rate is bits-of-file per bit-downloaded; the classical PIR capacity is (Sun-Jafar 2017).
Theorem: Classical PIR Capacity (Sun–Jafar)
For replicated databases and files, the classical PIR capacity is The capacity is matched by an explicit finite-field IA achievability scheme over for .
The formula counts "coded interference": each database returns sums of sub-files scaled so that, when all answers are combined, the undesired file components align into a common -dimensional subspace and can be cancelled. The factor at depth measures how much of the -th "interference layer" survives the alignment; the geometric sum is the total surviving overhead.
Operationally, the capacity approaches as (more databases means less structural overhead) and drops towards as (a single database cannot hide the query). Chapter 13 develops the full converse proof.
Achievability sketch
Split each file into equal-size chunks, indexed by . The user generates a random permutation of and sends to database a query that is a linear combination of chunks, with coefficients chosen so that (i) the interference from undesired files aligns into a common subspace, and (ii) each database's view is independent of . The user decodes by projecting along the alignment subspace.
Converse (cut-set-style)
For any scheme, a careful entropy-inequality argument bounds the total download: after conditioning on the desired file's partial reveal, the remaining download is at least of the file size on average — yielding the geometric-series sum. The details are in Sun–Jafar 2017 Thm. 1; Chapter 13 gives a full presentation.
Match
Achievability equals converse at every point, establishing the capacity exactly.
Example: PIR Capacity for ,
Compute the PIR capacity for 2 databases and 2 files, and describe an explicit scheme achieving the rate.
Capacity
.
Scheme
Split each file into two chunks: . Suppose the user wants .
- Query to DB 1: random chunk of plus random chunk of , e.g., .
- Query to DB 2: different chunk of XOR the same chunk of , e.g., .
Databases return the requested sums without knowing (the queries look uniform). The user recovers (directly), (directly), and (directly). Total download: chunks, file size chunks, rate . ✓
Generality
The same construction generalizes to arbitrary — each "level" of the alignment cancels a fraction of the interference, and the total cost is the geometric sum.
PIR Capacity vs. Databases (Preview)
Plot the classical PIR capacity as a function of for several library sizes . As , (no privacy overhead). As grows, the capacity decreases — there is more "interference" to hide. Chapter 13 develops coded-storage PIR (-colluding, symmetric) and explores further tradeoffs.
Parameters
Range of databases to plot
Number of files (library size)
Why This Matters: The Full Treatment Is Chapter 13
This preview has two goals: to justify Chapter 4's position in Part I (a tool chapter feeding multiple later parts) and to plant the vocabulary needed for Part IV. Chapter 13 begins with the Sun-Jafar scheme in detail, then treats -colluding databases ( databases may collude without learning the query), coded-storage PIR (files stored via MDS codes rather than replicated), and symmetric PIR (the user learns only the requested file). Each extension refines the IA alignment argument of this section.
Key Takeaway
Finite-field IA is the common thread. Coded matrix multiplication (§4.2), coded caching delivery (§4.3), and classical PIR (§4.4) are all IA constructions — the same algebraic machinery, specialized to different communication objectives. The tradeoff in each case is between the parameter whose privacy / redundancy we buy (storage, users, colluders) and the parameter whose cost we pay (communication rate, download, delivery rate). Knowing Section 4.1's alignment recipe is enough to understand why the numerical capacity formulas of Chapters 5, 7, and 13 come out the way they do.
Common Mistake: PIR Is NOT "One-Time Pad Over All Files"
Mistake:
Claim that private retrieval can be achieved trivially by downloading the entire library — "no single database learns the query because every query looks the same".
Correction:
Downloading the entire library does hide the query, but at rate (one file's worth of useful data per files of download). The Sun-Jafar capacity is which approaches as — strictly better than for . The non-trivial content of PIR is precisely that the rate can be beaten via finite-field IA.
PIR in Modern Cloud-Storage Systems
Classical information-theoretic PIR has seen limited deployment in production clouds — the required non-colluding databases is a strong trust assumption. Computationally-secure PIR (Kushilevitz-Ostrovsky 1997 and successors) is the variant used in some niche systems (ProtonMail message retrieval, certain academic medical-record stores). The information- theoretic PIR of Part IV of this book remains the benchmark — it represents what is achievable without computational assumptions and so establishes how much the computationally- secure variants give up.
Encrypted query systems (CryptDB, SealPIR) offer a middle ground but with their own tradeoffs in latency and database- side compute.
- •
Classical IT-PIR requires non-colluding databases — often unrealistic
- •
Computationally-secure PIR (SealPIR, XPIR) adds latency, weaker guarantees
- •
Multi-server PIR in production systems remains a research area
Quick Check
For databases and files, the classical PIR capacity is:
for files; plug in to get .