PIR Capacity and the Matching Converse

Why We Need a Converse

Section 13.2 established the Sun-Jafar achievability: an explicit scheme reaching rate R=(1+1/N+β‹―+1/NKβˆ’1)βˆ’1R = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}. The natural question: is this rate the best possible? Could a cleverer scheme achieve a higher rate?

A rigorous answer requires a converse: a proof that no scheme can do better. Without it, the achievability is just an upper bound β€” one might always hope for an improvement. The converse closes the rate region: no scheme, no matter how clever, can exceed the Sun-Jafar rate.

This section develops the converse. The technique is the cut-set / data-processing inequality machinery from Chapter 2's recipe, specialized to the PIR problem. The point is that the same template β€” "cut, entropy bound, symmetrize, normalize" β€” applies here as it did to coded matrix multiplication and secure aggregation.

Theorem: Sun–Jafar PIR Capacity

For any Nβ‰₯2N \geq 2 databases and Kβ‰₯1K \geq 1 files, each replicated across all databases, the PIR capacity is exactly CPIR(N,K)β€…β€Š=β€…β€Š1βˆ’1/N1βˆ’1/NKβ€…β€Š=β€…β€Š(1+1N+1N2+β‹―+1NKβˆ’1)βˆ’1.C_{\text{PIR}}(N, K) \;=\; \frac{1 - 1/N}{1 - 1/N^K} \;=\; \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}. This is achieved by the Sun-Jafar scheme (Β§13.2) and matched by the cut-set converse (this section). Hence CPIRC_{\text{PIR}} is information-theoretically tight.

Asymptotic behavior:

  • CPIR(N,K)β†’1C_{\text{PIR}}(N, K) \to 1 as Nβ†’βˆžN \to \infty (more databases β†’ no privacy overhead).
  • CPIR(N,K)β†’1βˆ’1/NC_{\text{PIR}}(N, K) \to 1 - 1/N as Kβ†’βˆžK \to \infty (more files β†’ fixed asymptote depending on NN).
  • CPIR(2,K)C_{\text{PIR}}(2, K): 1,2/3,4/7,8/15,…1, 2/3, 4/7, 8/15, \ldots β€” converging to 1/21/2 from above.

The capacity formula has a beautiful structure: it is the reciprocal of a geometric sum. Each term 1/Ni1/N^i corresponds to one "level" of interference cancellation in the Sun-Jafar achievability β€” the ii-th level requires 1/Ni1/N^i overhead per useful bit. Summing the overheads and inverting gives the rate.

Operationally: the PIR cost is bounded by 1βˆ’1/N1 - 1/N in the worst case (large KK), and approaches zero as NN grows. For practical deployments (N=4,K∈[10,100]N = 4, K \in [10, 100]), the capacity is between 0.70.7 and 0.750.75 β€” much better than the trivial 1/K1/K.

Key Takeaway

PIR capacity is exactly (1+1/N+β‹―+1/NKβˆ’1)βˆ’1(1 + 1/N + \cdots + 1/N^{K-1})^{-1}. Achievability via finite-field IA; converse via cut-set / chain-rule. The formula has a clean structure: more databases increase the rate; more files decrease it. The asymptote is 1βˆ’1/N1 - 1/N as Kβ†’βˆžK \to \infty and 11 as Nβ†’βˆžN \to \infty. This is the central result of Part IV.

PIR Capacity CPIR(N,K)C_{\text{PIR}}(N, K)

Explore the PIR capacity as a function of NN (databases) for several KK (file counts). The capacity surface shows: (i) capacity is monotone increasing in NN approaching 11, (ii) capacity is monotone decreasing in KK approaching 1βˆ’1/N1 - 1/N, (iii) the exact geometric-series structure of the formula. Adjust KK to see how library size affects the achievable rate.

Parameters
5
15

Example: PIR Capacity Across (N,K)(N, K) Combinations

Compute CPIR(N,K)C_{\text{PIR}}(N, K) for several representative (N,K)(N, K) pairs and interpret operationally.

Theorem: PIR Converse via Cut-Set

Any PIR scheme with NN replicated databases and KK files satisfies Rβ€…β€Šβ‰€β€…β€Š(1+1N+β‹―+1NKβˆ’1)βˆ’1.R \;\leq\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}. The bound is tight: Sun-Jafar achieves it.

The converse follows the four-step Chapter 2 template:

  1. Cut: separate the user from the databases' information.
  2. Entropy bound: the user's downloaded data must contain LL bits about WΞΈW_\theta; privacy adds extra overhead.
  3. Symmetrize: averaging over the random choice of ΞΈ\theta removes any file-dependent asymmetry.
  4. Normalize: divide by DD to convert from absolute bits to rate.

The privacy constraint forces each database's answer to be independent of ΞΈ\theta, which forces the user's downloaded information to "spread" across files in a structured way β€” exactly the geometric-series overhead.

PIR Capacity in Context

Privacy primitiveCapacity / overheadAsymptote
Sun-Jafar PIR (this chapter)(1+1/N+β‹―+1/NKβˆ’1)βˆ’1(1 + 1/N + \cdots + 1/N^{K-1})^{-1}β†’1\to 1 as Nβ†’βˆžN \to \infty
Coded shuffling (Ch. 2 Β§2.3)N(1βˆ’ΞΌ)/(1+NΞΌ)N(1 - \mu)/(1 + N\mu)β†’0\to 0 as ΞΌβ†’1\mu \to 1
Bonawitz secure agg. (Ch. 10)O(n2)O(n^2) per roundQuadratic
CCESA (Ch. 12)O(nn/log⁑n)O(n\sqrt{n/\log n})Sub-quadratic

The Same Cut-Set Recipe

The PIR converse follows the same four-step template as Chapter 2's coded-shuffling converse and Chapter 10's Caire et al. optimality:

  1. Identify the cut (databases vs. user).
  2. Apply an entropy bound (correctness + privacy).
  3. Symmetrize (over ΞΈ\theta).
  4. Normalize (divide by total download).

The specific entropy quantities differ β€” coded shuffling uses output-conditional entropy; PIR uses query-independent entropy β€” but the structure is identical. Mastering the recipe once, in Chapter 2, lets the reader follow later converses much faster.

Common Mistake: PIR Capacity Approaches 1βˆ’1/N1 - 1/N, Not 11

Mistake:

Read the formula and conclude that PIR capacity approaches 1 for all parameter regimes.

Correction:

The PIR capacity approaches 11 only as Nβ†’βˆžN \to \infty (with KK fixed). For fixed NN and Kβ†’βˆžK \to \infty, the capacity approaches 1βˆ’1/N1 - 1/N β€” not 1. The structure of the geometric sum: βˆ‘i=0Kβˆ’1(1/N)iβ†’1/(1βˆ’1/N)=N/(Nβˆ’1)\sum_{i=0}^{K-1} (1/N)^i \to 1 / (1 - 1/N) = N/(N-1) as Kβ†’βˆžK \to \infty. Inverting: CPIRβ†’(Nβˆ’1)/N=1βˆ’1/NC_{\text{PIR}} \to (N-1)/N = 1 - 1/N.

Operational implication: with N=2N = 2 databases, even infinite-library PIR has β‰₯50%\geq 50\% overhead. To beat 50%50\% overhead, need Nβ‰₯3N \geq 3 databases.

⚠️Engineering Note

Designing PIR Systems with the Capacity Formula

System designers can use the capacity formula as a deployment guide:

  • Provisioning databases: target CPIRβ‰₯0.9C_{\text{PIR}} \geq 0.9 to keep PIR overhead under 10%10\%. For typical K∼100K \sim 100 files, this requires Nβ‰₯10N \geq 10 databases.

  • Library size sensitivity: doubling KK affects CPIRC_{\text{PIR}} less than doubling NN. Prefer scaling out the database count over scaling down the library partitioning.

  • Latency vs. rate: more databases mean more parallel queries (higher throughput) but also more network round-trips (higher latency). Sun-Jafar's per-database symmetric design helps amortize this cost.

The capacity formula is a clean target β€” production systems typically achieve 9090–95%95\% of capacity in practice, with the gap due to constant-factor overheads and finite block length.

Practical Constraints
  • β€’

    Target Nβ‰₯10N \geq 10 for β‰₯90%\geq 90\% rate at K∼100K \sim 100

  • β€’

    Library size sensitivity: weak for large NN

  • β€’

    Production rate: 90–95% of capacity typical

πŸ“‹ Ref: Sun-Jafar 2017 Β§V; Microsoft SealPIR design notes

Historical Note: Sun-Jafar 2017: A Field-Defining Result

2017–present

Hua Sun and Syed Jafar's 2017 paper "The Capacity of Private Information Retrieval" settled a 22-year-old open problem in information-theoretic cryptography. Before Sun-Jafar, the best known PIR rates were sub-capacity by a constant factor; the optimality question was open.

The paper's clean two-part structure (achievability via finite-field IA + converse via inductive cut-set bound) became the template for the entire post-2017 PIR literature. Coded-storage PIR (Chapter 14) followed within a year; cache-aided PIR (Chapter 15) within two; symmetric PIR, multi-message PIR, and secure PIR all built on the same algebraic foundation.

The result earned Sun and Jafar the 2018 IEEE Information Theory Society Paper Award. The capacity formula CPIR=(1+1/N+β‹―+1/NKβˆ’1)βˆ’1C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1} is now standard in every modern treatment of PIR.

Quick Check

As the number of files Kβ†’βˆžK \to \infty with NN fixed, the PIR capacity CPIR(N,K)C_{\text{PIR}}(N, K) approaches:

11 (no privacy overhead)

1βˆ’1/N1 - 1/N (depends on NN)

00 (PIR becomes infeasible)

1/K1/K (linear decay)