Ferkans — Interactive Telecom Tutor

Why We Need a Converse

Section 13.2 established the Sun-Jafar achievability: an explicit scheme reaching rate $R = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ . The natural question: is this rate the best possible? Could a cleverer scheme achieve a higher rate?

A rigorous answer requires a converse: a proof that no scheme can do better. Without it, the achievability is only a lower bound on the capacity, and one might always hope for an improvement. The converse closes the gap: no scheme, no matter how clever, can exceed the Sun-Jafar rate.

This section develops the converse. The technique is the cut-set / data-processing inequality machinery from Chapter 2's recipe, specialized to the PIR problem. The point is that the same template — "cut, entropy bound, symmetrize, normalize" — applies here as it did to coded matrix multiplication and secure aggregation.

Theorem: Sun–Jafar PIR Capacity

For any $N \geq 2$ databases and $K \geq 1$ files, each replicated across all databases, the PIR capacity is exactly $C_{\text{PIR}}(N, K) \;=\; \frac{1 - 1/N}{1 - 1/N^K} \;=\; \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}.$ This rate is achieved by the Sun-Jafar scheme (§13.2) and matched by the cut-set converse (this section). Hence the achievable rate and the converse bound coincide, and $C_{\text{PIR}}$ is characterized exactly.
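The equality of the two forms in the theorem is the finite geometric-series identity. A short derivation, included as a reading aid (the symbol $r$ is introduced here for convenience and is not part of the section's notation):

```latex
\begin{align*}
\sum_{i=0}^{K-1} r^{i} &= \frac{1 - r^{K}}{1 - r},
  \qquad r = \frac{1}{N},\quad N \geq 2, \\
\left(\sum_{i=0}^{K-1} \frac{1}{N^{i}}\right)^{-1}
  &= \frac{1 - 1/N}{1 - 1/N^{K}} = C_{\text{PIR}}(N, K).
\end{align*}
```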

Asymptotic behavior:

$C_{\text{PIR}}(N, K) \to 1$ as $N \to \infty$ (more databases → no privacy overhead).
$C_{\text{PIR}}(N, K) \to 1 - 1/N$ as $K \to \infty$ (more files → fixed asymptote depending on $N$ ).
$C_{\text{PIR}}(2, K)$ : $1, 2/3, 4/7, 8/15, \ldots$ — converging to $1/2$ from above.
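These limits can be checked numerically. The following is a minimal sketch using exact rational arithmetic; the function name `pir_capacity` is chosen here for illustration and does not come from the source.

```python
from fractions import Fraction

def pir_capacity(n_databases: int, n_files: int) -> Fraction:
    """Exact C_PIR(N, K) = (1 - 1/N) / (1 - 1/N^K) as a rational number."""
    if n_databases < 2 or n_files < 1:
        raise ValueError("need N >= 2 and K >= 1")
    n = Fraction(n_databases)
    return (1 - 1 / n) / (1 - 1 / n ** n_files)

# The sequence for N = 2 and K = 1, 2, 3, 4.
first_four = [pir_capacity(2, k) for k in range(1, 5)]
print([str(c) for c in first_four])

# N fixed, K large: the value sits just above 1 - 1/N.
print(float(pir_capacity(2, 60)), float(pir_capacity(5, 60)))

# K fixed, N large: the value approaches 1.
print(float(pir_capacity(1000, 5)))
```

Exact fractions are used so that the "from above" behaviour is visible: the gap to $1 - 1/N$ is positive for every finite $K$, even when it is far below floating-point resolution.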

The capacity formula has a beautiful structure: it is the reciprocal of a geometric sum. Each term $1/N^i$ corresponds to one "level" of interference cancellation in the Sun-Jafar achievability — the $i$ -th level requires $1/N^i$ overhead per useful bit. Summing the overheads and inverting gives the rate.

Operationally: the capacity never falls below $1 - 1/N$ (the large- $K$ worst case), so the rate loss $1 - C_{\text{PIR}}$ is at most $1/N$ and vanishes as $N$ grows. For practical deployments ( $N = 4, K \in [10, 100]$ ), the capacity sits just above $0.75$ , far better than the trivial download-everything rate $1/K$ .

Proof

Achievability

The Sun-Jafar scheme of §13.2 achieves $R = 1 / \sum_{i=0}^{K-1} (1/N)^i$ .

Converse — setup

Use the cut-set / chain-rule argument. Consider the user's view — the union of all answers $\{A^{(\theta, n)}\}_n$ — and bound the entropy from below.

Converse — entropy bound

For correctness: $H(W_\theta \mid A) = 0$ , hence $H(A) \geq H(W_\theta) = L$ .

For privacy: $H(A^{(\theta, n)} \mid \theta) = H(A^{(\theta, n)})$ (i.e., the answer's distribution is independent of $\theta$ ). Applying the chain rule to $A$ and tracking which contributions come from the desired file and which from the interfering files, one obtains the aggregate lower bound $H(A) \geq L \cdot \sum_{i=0}^{K-1} N^{-i}$ .

Converse — divide and conclude

Total download $D \geq H(A) \geq L \cdot \sum_{i=0}^{K-1} (1/N)^i$ . Hence $R = L / D \leq (1 + 1/N + \cdots + 1/N^{K-1})^{-1} = C_{\text{PIR}}$ . This matches the Sun-Jafar achievability — closing the rate region. $\blacksquare$
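As a concrete instance of this final step (the values $N = 2$, $K = 3$ are chosen here purely for illustration):

```latex
\begin{align*}
\sum_{i=0}^{K-1} N^{-i} \,\Big|_{N=2,\;K=3}
  &= 1 + \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{7}{4}, \\
D &\geq \tfrac{7}{4}\, L, \\
R = \frac{L}{D} &\leq \tfrac{4}{7} \approx 0.571.
\end{align*}
```

So privately retrieving a file of $L$ bits from two databases holding three files costs at least $1.75\,L$ bits of download.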

Note on technical details

The full converse proof is in Sun-Jafar 2017 Section IV. It uses an inductive argument on the file count $K$ : assuming the capacity bound for $K - 1$ files, derive the bound for $K$ files. The induction step uses the privacy constraint and the chain rule.
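To make the inductive idea concrete, here is a sketch of the smallest nontrivial case, $K = 2$. It is a simplified illustration rather than the paper's full argument, under the following assumptions: queries are suppressed from the notation (as elsewhere in this section), the two files are independent with $L$ bits each, privacy is taken to hold at each database even when conditioned on $W_1$, and $A^{(k,1:N)}$ is shorthand introduced here for all $N$ answers when file $k$ is desired.

```latex
\begin{align*}
H\big(A^{(1,1:N)}\big)
  &= H(W_1) + H\big(A^{(1,1:N)} \mid W_1\big)
     && \text{(correctness: } W_1 \text{ is determined by the answers)} \\
  &\geq L + \frac{1}{N} \sum_{n=1}^{N} H\big(A^{(1,n)} \mid W_1\big)
     && \text{(a joint entropy is at least each marginal)} \\
  &= L + \frac{1}{N} \sum_{n=1}^{N} H\big(A^{(2,n)} \mid W_1\big)
     && \text{(privacy at database } n\text{)} \\
  &\geq L + \frac{1}{N}\, H\big(A^{(2,1:N)} \mid W_1\big)
     && \text{(subadditivity)} \\
  &\geq L + \frac{1}{N}\, H(W_2 \mid W_1) = L\left(1 + \frac{1}{N}\right)
     && \text{(correctness for } \theta = 2\text{; independent files)}
\end{align*}
```

Since $D \geq H\big(A^{(1,1:N)}\big)$, this gives $R \leq (1 + 1/N)^{-1}$, the $K = 2$ case of the capacity formula.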

Key Takeaway

PIR capacity is exactly $(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ . Achievability is via finite-field interference alignment (IA); the converse is via a cut-set / chain-rule argument. The formula has a clean structure: more databases increase the rate; more files decrease it. The asymptote is $1 - 1/N$ as $K \to \infty$ and $1$ as $N \to \infty$ . This is the central result of Part IV.

PIR Capacity $C_{\text{PIR}}(N, K)$

Explore the PIR capacity as a function of $N$ (databases) for several $K$ (file counts). The capacity surface shows: (i) capacity is monotone increasing in $N$ approaching $1$ , (ii) capacity is monotone decreasing in $K$ approaching $1 - 1/N$ , (iii) the exact geometric-series structure of the formula. Adjust $K$ to see how library size affects the achievable rate.

Parameters

$K$ (number of files): 5
$N$ max (databases): 15
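Outside the interactive page, the same curves can be tabulated with a few lines of Python. This is a minimal sketch: the helper names `pir_capacity` and `capacity_curve` are chosen here for illustration, and the values $K = 5$ and a maximum of $15$ databases mirror the parameter settings shown above.

```python
def pir_capacity(n_databases, n_files):
    """C_PIR(N, K) as the reciprocal of 1 + 1/N + ... + 1/N^(K-1)."""
    return 1.0 / sum(n_databases ** -i for i in range(n_files))

def capacity_curve(n_files, n_max):
    """List of (N, C_PIR(N, K)) pairs for N = 2 .. n_max."""
    return [(n, pir_capacity(n, n_files)) for n in range(2, n_max + 1)]

# One row per file count; each row sweeps the number of databases.
for k in (2, 5, 10):
    row = "  ".join(f"N={n}:{c:.3f}" for n, c in capacity_curve(k, 15))
    print(f"K={k:2d}  {row}")
```

Reading along a row shows the increase in $N$; reading down a column shows the decrease in $K$.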

Example: PIR Capacity Across $(N, K)$ Combinations

Compute $C_{\text{PIR}}(N, K)$ for several representative $(N, K)$ pairs and interpret operationally.

Solution

Small $N$, small $K$

$C_{\text{PIR}}(2, 2) = 2/3 \approx 0.667$ . $C_{\text{PIR}}(3, 2) = 3/4 = 0.75$ . $C_{\text{PIR}}(2, 5) = 16/31 \approx 0.516$ .

Moderate $N$, large $K$

$C_{\text{PIR}}(10, 100) \approx 1 - 1/10 = 0.9$ . With many databases, even large libraries are PIR-efficient.

Large $N$, any $K$

$C_{\text{PIR}}(100, K) \approx 0.99$ for all practical $K$ . PIR overhead becomes negligible.
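The values in this example can be reproduced exactly. The sketch below sums the geometric series with rational arithmetic; `pir_capacity` is an illustrative name, and $K = 50$ stands in for "any practical $K$" in the last case.

```python
from fractions import Fraction

def pir_capacity(n_databases, n_files):
    """Exact reciprocal of the geometric sum 1 + 1/N + ... + 1/N^(K-1)."""
    geometric_sum = sum(Fraction(1, n_databases ** i) for i in range(n_files))
    return 1 / geometric_sum

for n, k in [(2, 2), (3, 2), (2, 5), (10, 100), (100, 50)]:
    c = pir_capacity(n, k)
    # Show the exact fraction only when it is short enough to read.
    exact = f"{c}" if c.denominator < 10**6 else "(long fraction)"
    print(f"N={n:3d} K={k:3d}  C_PIR={float(c):.6f}  {exact}")
```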

Operational guidance

For PIR to be efficient, prioritize more databases over fewer files. Doubling $N$ roughly halves the privacy overhead. Doubling $K$ has a smaller effect on rate (asymptotically zero).
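A quick numerical comparison of the two scaling options follows. It is a sketch under one assumption: "privacy overhead" is interpreted here as the rate loss $1 - C_{\text{PIR}}$, and the baseline $(N, K) = (4, 20)$ is an arbitrary illustrative choice.

```python
def pir_capacity(n_databases, n_files):
    """C_PIR(N, K) = (1 - 1/N) / (1 - 1/N^K)."""
    return (1 - 1 / n_databases) / (1 - 1 / n_databases ** n_files)

def overhead(n_databases, n_files):
    """Rate loss relative to non-private download (assumed meaning of overhead)."""
    return 1 - pir_capacity(n_databases, n_files)

base = overhead(4, 20)       # baseline deployment
double_n = overhead(8, 20)   # twice as many databases
double_k = overhead(4, 40)   # twice as many files

print(f"baseline overhead  : {base:.6f}")
print(f"doubling N         : {double_n:.6f} (ratio {double_n / base:.3f})")
print(f"doubling K         : {double_k:.6f} (change {double_k - base:.2e})")
```

For small libraries (say $K = 2$) doubling $K$ has a more visible effect; the near-zero change appears once $1/N^K$ is already negligible.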

Theorem: PIR Converse via Cut-Set

Any PIR scheme with $N$ replicated databases and $K$ files satisfies $R \;\leq\; \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}.$ The bound is tight: Sun-Jafar achieves it.

The converse follows the four-step Chapter 2 template:

Cut: separate the user from the databases' information.
Entropy bound: the user's downloaded data must contain $L$ bits about $W_\theta$ ; privacy adds extra overhead.
Symmetrize: averaging over the random choice of $\theta$ removes any file-dependent asymmetry.
Normalize: divide $L$ by $D$ to convert from absolute bits to rate.

The privacy constraint forces each database's answer to be independent of $\theta$ , which forces the user's downloaded information to "spread" across files in a structured way — exactly the geometric-series overhead.

Proof

Step 1 — Cut and entropy

For any PIR scheme: $H(\{A^{(\theta, n)}\}_n) \geq L$ (correctness). The user's view must contain $W_\theta$ in full.

Step 2 — Per-database privacy

Privacy: $I(\theta; A^{(\theta, n)}) = 0$ for each $n$ . Hence each database's answer has the same marginal distribution regardless of $\theta$ — its size cannot depend on $\theta$ .

Step 3 — Sun-Jafar's induction

Inductively bound the per-database answer size. Base case $K = 1$ : trivial (the single file is fully downloaded). Induction step $K \to K + 1$ : the privacy constraint forces the answer for $K + 1$ files to contain at least $1/N^K$ of the $(K+1)$ -th file's information, on top of the previous $K$ -file overhead.
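The bookkeeping behind the induction can be unrolled in code. This sketch tracks only the arithmetic of the bound on $D/L$, not the information-theoretic argument itself; the function names are introduced here for illustration. It accumulates the per-level term $1/N^K$ described above and cross-checks it against the equivalent nested form $B_K = 1 + B_{K-1}/N$.

```python
from fractions import Fraction

def additive_bound(n_databases, n_files):
    """Start from B_1 = 1 and add 1/N^k when going from k to k + 1 files."""
    bound = Fraction(1)
    for k in range(1, n_files):
        bound += Fraction(1, n_databases ** k)
    return bound

def nested_bound(n_databases, n_files):
    """Equivalent nested form: B_1 = 1, B_k = 1 + B_{k-1} / N."""
    bound = Fraction(1)
    for _ in range(n_files - 1):
        bound = 1 + bound / n_databases
    return bound

for k in range(1, 6):
    b = additive_bound(2, k)
    print(f"K={k}: D/L >= {b}, so R <= {1 / b}")
```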

Step 4 — Sum the geometric series

Summing the per-level overheads: $D \geq L \cdot \sum_{i=0}^{K-1} N^{-i}$ . Hence $R \leq (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ . Achievability matches; capacity established. $\blacksquare$

PIR Capacity in Context

Privacy primitive	Capacity / overhead	Asymptote
Sun-Jafar PIR (this chapter)	$(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$	$\to 1$ as $N \to \infty$
Coded shuffling (Ch. 2 §2.3)	$N(1 - \mu)/(1 + N\mu)$	$\to 0$ as $\mu \to 1$
Bonawitz secure agg. (Ch. 10)	$O(n^2)$ per round	Quadratic
CCESA (Ch. 12)	$O(n\sqrt{n/\log n})$	Sub-quadratic

The Same Cut-Set Recipe

The PIR converse follows the same four-step template as Chapter 2's coded-shuffling converse and Chapter 10's Caire et al. optimality:

Identify the cut (databases vs. user).
Apply an entropy bound (correctness + privacy).
Symmetrize (over $\theta$ ).
Normalize (divide by total download).

The specific entropy quantities differ — coded shuffling uses output-conditional entropy; PIR uses query-independent entropy — but the structure is identical. Mastering the recipe once, in Chapter 2, lets the reader follow later converses much faster.

Common Mistake: PIR Capacity Approaches $1 - 1/N$ , Not $1$

Mistake:

Read the formula and conclude that PIR capacity approaches 1 for all parameter regimes.

Correction:

The PIR capacity approaches $1$ only as $N \to \infty$ (with $K$ fixed). For fixed $N$ and $K \to \infty$ , the capacity approaches $1 - 1/N$ — not 1. The structure of the geometric sum: $\sum_{i=0}^{K-1} (1/N)^i \to 1 / (1 - 1/N) = N/(N-1)$ as $K \to \infty$ . Inverting: $C_{\text{PIR}} \to (N-1)/N = 1 - 1/N$ .

Operational implication: with $N = 2$ databases, the capacity falls toward $1/2$ as the library grows, so a large library gives up nearly $50\%$ of the download to privacy. Keeping the large-library capacity above $1/2$ requires $N \geq 3$ databases.

⚠️ Engineering Note

Designing PIR Systems with the Capacity Formula

System designers can use the capacity formula as a deployment guide:

Provisioning databases: target $C_{\text{PIR}} \geq 0.9$ to keep PIR overhead under $10\%$ . For typical $K \sim 100$ files, this requires $N \geq 10$ databases.
Library size sensitivity: doubling $K$ affects $C_{\text{PIR}}$ less than doubling $N$ . Prefer scaling out the database count over scaling down the library partitioning.
Latency vs. rate: more databases mean more parallel queries (higher throughput) but also more network round-trips (higher latency). Sun-Jafar's per-database symmetric design helps amortize this cost.
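The provisioning rule in the first item can be turned into a small search. This is a minimal sketch: `min_databases` and its `n_limit` cutoff are hypothetical helpers introduced here, and the target is passed as an exact `Fraction` because a float such as `0.9` is not exactly nine tenths and would skew the boundary case.

```python
from fractions import Fraction

def pir_capacity(n_databases, n_files):
    """Exact C_PIR(N, K) = (1 - 1/N) / (1 - 1/N^K)."""
    n = Fraction(n_databases)
    return (1 - 1 / n) / (1 - 1 / n ** n_files)

def min_databases(target_rate, n_files, n_limit=1000):
    """Smallest N >= 2 with C_PIR(N, K) >= target_rate, or None if none below n_limit.

    A linear scan suffices because the capacity increases with N.
    """
    for n in range(2, n_limit + 1):
        if pir_capacity(n, n_files) >= target_rate:
            return n
    return None

print(min_databases(Fraction(9, 10), 100))  # target rate 0.9 with K = 100 files
print(min_databases(Fraction(3, 4), 100))   # target rate 0.75 with K = 100 files
```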

The capacity formula is a clean target — production systems typically achieve $90$ – $95\%$ of capacity in practice, with the gap due to constant-factor overheads and finite block length.

Practical Constraints

• Target $N \geq 10$ for $\geq 90\%$ rate at $K \sim 100$
• Library size sensitivity: weak for large $N$
• Production rate: 90–95% of capacity typical

📋 Ref: Sun-Jafar 2017 §V; Microsoft SealPIR design notes

Historical Note: Sun-Jafar 2017: A Field-Defining Result

2017–present

Hua Sun and Syed Jafar's 2017 paper "The Capacity of Private Information Retrieval" settled a 22-year-old open problem in information-theoretic cryptography. Before Sun-Jafar, the best known PIR rates were sub-capacity by a constant factor; the optimality question was open.

The paper's clean two-part structure (achievability via finite-field IA + converse via inductive cut-set bound) became the template for the entire post-2017 PIR literature. Coded-storage PIR (Chapter 14) followed within a year; cache-aided PIR (Chapter 15) within two; symmetric PIR, multi-message PIR, and secure PIR all built on the same algebraic foundation.

The result earned Sun and Jafar the 2018 IEEE Information Theory Society Paper Award. The capacity formula $C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ is now standard in every modern treatment of PIR.

Quick Check

As the number of files $K \to \infty$ with $N$ fixed, the PIR capacity $C_{\text{PIR}}(N, K)$ approaches:

$1$ (no privacy overhead)

$1 - 1/N$ (depends on $N$ )

$0$ (PIR becomes infeasible)

$1/K$ (linear decay)