PIR Capacity and the Matching Converse
Why We Need a Converse
Section 13.2 established the Sun-Jafar achievability: an explicit scheme reaching rate . The natural question: is this rate the best possible? Could a cleverer scheme achieve a higher rate?
A rigorous answer requires a converse: a proof that no scheme can do better. Without it, the achievability is just an upper bound β one might always hope for an improvement. The converse closes the rate region: no scheme, no matter how clever, can exceed the Sun-Jafar rate.
This section develops the converse. The technique is the cut-set / data-processing inequality machinery from Chapter 2's recipe, specialized to the PIR problem. The point is that the same template β "cut, entropy bound, symmetrize, normalize" β applies here as it did to coded matrix multiplication and secure aggregation.
Theorem: SunβJafar PIR Capacity
For any databases and files, each replicated across all databases, the PIR capacity is exactly This is achieved by the Sun-Jafar scheme (Β§13.2) and matched by the cut-set converse (this section). Hence is information-theoretically tight.
Asymptotic behavior:
- as (more databases β no privacy overhead).
- as (more files β fixed asymptote depending on ).
- : β converging to from above.
The capacity formula has a beautiful structure: it is the reciprocal of a geometric sum. Each term corresponds to one "level" of interference cancellation in the Sun-Jafar achievability β the -th level requires overhead per useful bit. Summing the overheads and inverting gives the rate.
Operationally: the PIR cost is bounded by in the worst case (large ), and approaches zero as grows. For practical deployments (), the capacity is between and β much better than the trivial .
Achievability
Sun-Jafar scheme of Β§13.2. Achieves .
Converse β setup
Use the cut-set / chain-rule argument. Consider the user's view β the union of all answers β and bound the entropy from below.
Converse β entropy bound
For correctness: , hence .
For privacy: (i.e., the answer's distribution is independent of ). Using the chain rule on and carefully tracking which contributions are from desired vs. interferer files, one obtains a lower bound on aggregate .
Converse β divide and conclude
Total download . Hence . This matches the Sun-Jafar achievability β closing the rate region.
Note on technical details
The full converse proof is in Sun-Jafar 2017 Section IV. It uses an inductive argument on the file count : assuming the capacity bound for files, derive the bound for files. The induction step uses the privacy constraint and the chain rule.
Key Takeaway
PIR capacity is exactly . Achievability via finite-field IA; converse via cut-set / chain-rule. The formula has a clean structure: more databases increase the rate; more files decrease it. The asymptote is as and as . This is the central result of Part IV.
PIR Capacity
Explore the PIR capacity as a function of (databases) for several (file counts). The capacity surface shows: (i) capacity is monotone increasing in approaching , (ii) capacity is monotone decreasing in approaching , (iii) the exact geometric-series structure of the formula. Adjust to see how library size affects the achievable rate.
Parameters
Example: PIR Capacity Across Combinations
Compute for several representative pairs and interpret operationally.
Small $N$, small $K$
. . .
Moderate $N$, large $K$
. With many databases, even large libraries are PIR-efficient.
Large $N$, any $K$
for all practical . PIR overhead becomes negligible.
Operational guidance
For PIR to be efficient, prioritize more databases over fewer files. Doubling roughly halves the privacy overhead. Doubling has a smaller effect on rate (asymptotically zero).
Theorem: PIR Converse via Cut-Set
Any PIR scheme with replicated databases and files satisfies The bound is tight: Sun-Jafar achieves it.
The converse follows the four-step Chapter 2 template:
- Cut: separate the user from the databases' information.
- Entropy bound: the user's downloaded data must contain bits about ; privacy adds extra overhead.
- Symmetrize: averaging over the random choice of removes any file-dependent asymmetry.
- Normalize: divide by to convert from absolute bits to rate.
The privacy constraint forces each database's answer to be independent of , which forces the user's downloaded information to "spread" across files in a structured way β exactly the geometric-series overhead.
Step 1 β Cut and entropy
For any PIR scheme: (correctness). The user's view must contain in full.
Step 2 β Per-database privacy
Privacy: for each . Hence each database's answer has the same marginal distribution regardless of β its size cannot depend on .
Step 3 β Sun-Jafar's induction
Inductively bound the per-database answer size. Base case : trivial (the single file is fully downloaded). Induction step : the privacy constraint forces the answer for files to contain at least of the -th file's information, on top of the previous -file overhead.
Step 4 β Sum the geometric series
Summing the per-level overheads: . Hence . Achievability matches; capacity established.
PIR Capacity in Context
| Privacy primitive | Capacity / overhead | Asymptote |
|---|---|---|
| Sun-Jafar PIR (this chapter) | as | |
| Coded shuffling (Ch. 2 Β§2.3) | as | |
| Bonawitz secure agg. (Ch. 10) | per round | Quadratic |
| CCESA (Ch. 12) | Sub-quadratic |
The Same Cut-Set Recipe
The PIR converse follows the same four-step template as Chapter 2's coded-shuffling converse and Chapter 10's Caire et al. optimality:
- Identify the cut (databases vs. user).
- Apply an entropy bound (correctness + privacy).
- Symmetrize (over ).
- Normalize (divide by total download).
The specific entropy quantities differ β coded shuffling uses output-conditional entropy; PIR uses query-independent entropy β but the structure is identical. Mastering the recipe once, in Chapter 2, lets the reader follow later converses much faster.
Common Mistake: PIR Capacity Approaches , Not
Mistake:
Read the formula and conclude that PIR capacity approaches 1 for all parameter regimes.
Correction:
The PIR capacity approaches only as (with fixed). For fixed and , the capacity approaches β not 1. The structure of the geometric sum: as . Inverting: .
Operational implication: with databases, even infinite-library PIR has overhead. To beat overhead, need databases.
Designing PIR Systems with the Capacity Formula
System designers can use the capacity formula as a deployment guide:
-
Provisioning databases: target to keep PIR overhead under . For typical files, this requires databases.
-
Library size sensitivity: doubling affects less than doubling . Prefer scaling out the database count over scaling down the library partitioning.
-
Latency vs. rate: more databases mean more parallel queries (higher throughput) but also more network round-trips (higher latency). Sun-Jafar's per-database symmetric design helps amortize this cost.
The capacity formula is a clean target β production systems typically achieve β of capacity in practice, with the gap due to constant-factor overheads and finite block length.
- β’
Target for rate at
- β’
Library size sensitivity: weak for large
- β’
Production rate: 90β95% of capacity typical
Historical Note: Sun-Jafar 2017: A Field-Defining Result
2017βpresentHua Sun and Syed Jafar's 2017 paper "The Capacity of Private Information Retrieval" settled a 22-year-old open problem in information-theoretic cryptography. Before Sun-Jafar, the best known PIR rates were sub-capacity by a constant factor; the optimality question was open.
The paper's clean two-part structure (achievability via finite-field IA + converse via inductive cut-set bound) became the template for the entire post-2017 PIR literature. Coded-storage PIR (Chapter 14) followed within a year; cache-aided PIR (Chapter 15) within two; symmetric PIR, multi-message PIR, and secure PIR all built on the same algebraic foundation.
The result earned Sun and Jafar the 2018 IEEE Information Theory Society Paper Award. The capacity formula is now standard in every modern treatment of PIR.
Quick Check
As the number of files with fixed, the PIR capacity approaches:
(no privacy overhead)
(depends on )
(PIR becomes infeasible)
(linear decay)
Correct. The geometric sum as . Inverting: .