IA for Private Information Retrieval — A Preview

Why PIR Is an IA Problem

Private Information Retrieval (PIR) is the topic of Chapter 13: a user wants one file from a library replicated across NN databases, without revealing which file to any single database. The trick that makes PIR efficient — beating the trivial rate of downloading the whole library — is again finite-field interference alignment. Each query to a database is a linear combination of the user's query over all FF files; the user cancels out the "interference" (undesired files) using the algebraic structure of the queries across databases. The point is that the privacy requirement and the efficiency requirement are in tension: stronger privacy (resistance against more colluding databases) lowers the achievable rate. That tension is exactly the golden thread of this book.

This section previews the PIR capacity result (Sun–Jafar 2017) and shows where the alignment occurs. Chapter 13 will prove the capacity formula rigorously and extend to coded storage, TT- colluding databases, and symmetric PIR. Reading this preview is enough to understand why Chapter 4 — a "tool" chapter — is placed in Part I: the tool is used in at least three later chapters.

Definition:

Classical PIR Problem

A classical PIR system has:

  • A library of FF files W1,,WFW_1, \ldots, W_F, each replicated on NN databases.
  • A user who wants to retrieve a single file WθW_{\theta} with θ[F]\theta \in [F] chosen uniformly.
  • A privacy requirement: the query Q(k)Q^{(k)} sent to database kk must be statistically independent of θ\theta, I(θ;Q(k))=0I(\theta; Q^{(k)}) = 0.
  • A download budget: database kk returns A(k)A^{(k)} as a deterministic function of (Q(k),W1,,WF)(Q^{(k)}, W_1, \ldots, W_F).

The PIR rate is the fraction of useful bits in the total download: RPIR=W/kA(k)R_{\text{PIR}} = |W| / \sum_k |A^{(k)}|. The PIR capacity CPIR(N,F)C_{\text{PIR}}(N, F) is the supremum of achievable rates.

Private Information Retrieval (PIR)

A protocol that lets a user retrieve file WθW_\theta from NN databases without revealing θ\theta to any single database. The PIR rate is bits-of-file per bit-downloaded; the classical PIR capacity is CPIR=(1+1/N++1/NF1)1C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{F-1})^{-1} (Sun-Jafar 2017).

Theorem: Classical PIR Capacity (Sun–Jafar)

For N2N \geq 2 replicated databases and F2F \geq 2 files, the classical PIR capacity is CPIR(N,F)  =  (1+1N+1N2++1NF1)1  =  11/N11/NF.C_{\text{PIR}}(N, F) \;=\; \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{F-1}}\right)^{-1} \;=\; \frac{1 - 1/N}{1 - 1/N^F}. The capacity is matched by an explicit finite-field IA achievability scheme over Fq\mathbb{F}_q for qNq \geq N.

The formula counts "coded interference": each database returns sums of sub-files scaled so that, when all NN answers are combined, the undesired file components align into a common (F1)/N(F - 1)/N-dimensional subspace and can be cancelled. The factor 1/Ni1/N^i at depth ii measures how much of the ii-th "interference layer" survives the alignment; the geometric sum is the total surviving overhead.

Operationally, the capacity approaches 11 as NN \to \infty (more databases means less structural overhead) and drops towards 1/F1/F as N1N \to 1 (a single database cannot hide the query). Chapter 13 develops the full converse proof.

Example: PIR Capacity for N=2N = 2, F=2F = 2

Compute the PIR capacity for 2 databases and 2 files, and describe an explicit scheme achieving the rate.

PIR Capacity vs. Databases (Preview)

Plot the classical PIR capacity CPIR(N,F)C_{\text{PIR}}(N, F) as a function of NN for several library sizes FF. As NN \to \infty, CPIR1C_{\text{PIR}} \to 1 (no privacy overhead). As FF grows, the capacity decreases — there is more "interference" to hide. Chapter 13 develops coded-storage PIR (TT-colluding, symmetric) and explores further tradeoffs.

Parameters
15

Range of databases to plot

5

Number of files (library size)

Why This Matters: The Full Treatment Is Chapter 13

This preview has two goals: to justify Chapter 4's position in Part I (a tool chapter feeding multiple later parts) and to plant the vocabulary needed for Part IV. Chapter 13 begins with the Sun-Jafar scheme in detail, then treats TT-colluding databases (T\leq T databases may collude without learning the query), coded-storage PIR (files stored via MDS codes rather than replicated), and symmetric PIR (the user learns only the requested file). Each extension refines the IA alignment argument of this section.

Key Takeaway

Finite-field IA is the common thread. Coded matrix multiplication (§4.2), coded caching delivery (§4.3), and classical PIR (§4.4) are all IA constructions — the same algebraic machinery, specialized to different communication objectives. The tradeoff in each case is between the parameter whose privacy / redundancy we buy (storage, users, colluders) and the parameter whose cost we pay (communication rate, download, delivery rate). Knowing Section 4.1's alignment recipe is enough to understand why the numerical capacity formulas of Chapters 5, 7, and 13 come out the way they do.

Common Mistake: PIR Is NOT "One-Time Pad Over All Files"

Mistake:

Claim that private retrieval can be achieved trivially by downloading the entire library — "no single database learns the query because every query looks the same".

Correction:

Downloading the entire library does hide the query, but at rate 1/F1/F (one file's worth of useful data per FF files of download). The Sun-Jafar capacity is (11/N)/(11/NF)(1 - 1/N) / (1 - 1/N^F) which approaches 11 as NN \to \inftystrictly better than 1/F1/F for N2N \geq 2. The non-trivial content of PIR is precisely that the rate can be beaten via finite-field IA.

🔧Engineering Note

PIR in Modern Cloud-Storage Systems

Classical information-theoretic PIR has seen limited deployment in production clouds — the required N2N \geq 2 non-colluding databases is a strong trust assumption. Computationally-secure PIR (Kushilevitz-Ostrovsky 1997 and successors) is the variant used in some niche systems (ProtonMail message retrieval, certain academic medical-record stores). The information- theoretic PIR of Part IV of this book remains the benchmark — it represents what is achievable without computational assumptions and so establishes how much the computationally- secure variants give up.

Encrypted query systems (CryptDB, SealPIR) offer a middle ground but with their own tradeoffs in latency and database- side compute.

Practical Constraints
  • Classical IT-PIR requires non-colluding databases — often unrealistic

  • Computationally-secure PIR (SealPIR, XPIR) adds latency, weaker guarantees

  • Multi-server PIR in production systems remains a research area

📋 Ref: Microsoft SealPIR 2018; Beimel survey 2007

Quick Check

For N=4N = 4 databases and F=3F = 3 files, the classical PIR capacity is:

(1+1/4+1/16)10.762(1 + 1/4 + 1/16)^{-1} \approx 0.762

1/F=1/31/F = 1/3

11/N=0.751 - 1/N = 0.75

1/N=0.251/N = 0.25