The Sun–Jafar PIR Scheme

From Trivial to Capacity-Achieving

Section 13.1 presented the trivial rate-1/K1/K PIR (download all files) and showed the Sun-Jafar capacity is much higher. This section constructs the Sun-Jafar capacity-achieving scheme.

The construction is striking in its simplicity. The user's queries are random linear combinations over Fq\mathbb{F}_q; each database returns a single linear combination of its files weighted by the query. The combinations are designed so that the desired file's contribution survives across the NN answers, while the other files' contributions cancel out via finite-field interference alignment — the same machinery from Chapter 4.

The point is that PIR is structurally an interference-channel problem: K1K - 1 "interferers" (undesired files) need to be aligned into the nullspace at the user's decoder, leaving only the desired file's contribution. The Sun-Jafar scheme achieves this with explicit, deterministic constructions over Fq\mathbb{F}_q.

Example: Sun–Jafar PIR for N=2,K=2N = 2, K = 2 (Walked Through)

Construct a capacity-achieving PIR scheme for N=2N = 2 databases and K=2K = 2 files. The capacity is CPIR(2,2)=(1+1/2)1=2/3C_{\text{PIR}}(2, 2) = (1 + 1/2)^{-1} = 2/3. Specify the user's queries, the databases' answers, and verify privacy + correctness.

Definition:

Sun–Jafar Capacity-Achieving PIR Scheme

The Sun-Jafar PIR scheme for NN databases and KK files achieves the capacity CPIR(N,K)=(1+1/N++1/NK1)1C_{\text{PIR}}(N, K) = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}. The construction uses files of length L=NKL = N^K chunks each (file extension to enable the structure):

  1. Symbol partitioning. Each file WkW_k is split into NKN^K symbols over Fq\mathbb{F}_q, indexed by [N]K\boldsymbol{\ell} \in [N]^K: Wk={wk()}W_k = \{w_k(\boldsymbol{\ell})\}_{\boldsymbol{\ell}}.

  2. Query construction. The user's randomness generates a random permutation π\pi of [N]K[N]^K (equivalent to a random labeling of the NKN^K "positions" among files). For each database nn and "level" i[K]i \in [K], the user generates a query asking for specific symbols. The query structure satisfies:

    • At "level 1" (single-symbol queries): each database returns 11 symbol of the desired file plus K1K - 1 symbols of other files (in carefully arranged combinations).
    • At "level 2" (two-symbol XORs): each database returns (K11)\binom{K-1}{1} XORs of one desired chunk with one undesired chunk.
    • At "level ii": the user requests (K1i1)\binom{K-1}{i-1} XOR-combinations of ii chunks (i1i-1 undesired + 1 desired).
  3. Counting answers. Database nn returns i=1K(K1i1)NKi\sum_{i=1}^K \binom{K-1}{i-1} N^{K-i} symbols in total. Aggregate download: D=Ni(K1i1)NKiD = N \cdot \sum_i \binom{K-1}{i-1} N^{K-i}.

  4. Decoding. From the NN answers, the user recovers all NKN^K symbols of WθW_\theta by successive elimination: Level-KK XORs remove level-(K1)(K-1) contributions, and so on down to level 1. The "interferers" (other files' symbols) align across the levels and cancel.

The rate is R=L/D=NK/D=(1+1/N++1/NK1)1=CPIRR = L / D = N^K / D = (1 + 1/N + \cdots + 1/N^{K-1})^{-1} = C_{\text{PIR}}. The privacy is enforced by the random labeling π\pi, which uniformly randomizes the queries over all possible structures.

The construction is called the "Sun-Jafar scheme" or sometimes the "specific-PIR scheme". The explicit construction uses KK levels of progressively-larger XOR groups, each canceling the interferers from the level below. This is the PIR-specific instantiation of the finite-field IA framework from Chapter 4.

Sun–Jafar PIR Scheme

The capacity-achieving construction for classical PIR. Each file is split into NKN^K chunks; queries request progressive levels of XOR combinations (single-chunk, 2-chunk XORs, ..., KK-chunk XORs) structured so the desired file's contribution survives while interferers cancel via finite-field IA. Rate matches CPIR=(1+1/N++1/NK1)1C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}.

Sun–Jafar PIR Protocol

Complexity: O(NK)O(N^K) symbols decoded; O(KNK)O(K \cdot N^K) field operations.
Input: Files {Wk}k=1K\{W_k\}_{k=1}^K, each of length
NKN^K symbols over Fq\mathbb{F}_q. User's desired
index θ\theta.
Setup (User):
1. Generate uniform random permutation π\pi of
[N]K[N]^K.
Query Phase:
2. for each database n=1,,Nn = 1, \ldots, N do
3. \quad for each level i=1,,Ki = 1, \ldots, K do
4. \qquad Construct (K1i1)\binom{K-1}{i-1} queries
at level ii for database nn. Each query is a
linear combination over Fq\mathbb{F}_q of
ii symbols: one from the desired file
WθW_\theta + i1i - 1 from interferer files.
The specific symbol indices are determined by
π\pi and i,ni, n.
5. \quad end for
6. end for
Answer Phase:
7. for each database nn do
8. \quad Compute the requested linear
combinations from its stored files.
9. \quad Send back the answers A(θ,n)A^{(\theta, n)}.
10. end for
Decoding (User):
11. for i=K,K1,,1i = K, K-1, \ldots, 1 do
12. \quad For each level-ii XOR received, subtract
out the previously-decoded interferer symbols
from level i+1,,Ki+1, \ldots, K. The remainder is
a level-ii desired-file chunk (with
contribution from ii desired symbols).
13. end for
14. return WθW_\theta.

The block length NKN^K is exponential in KK — the construction uses very large files (NKN^K chunks per file). This is necessary for exact capacity achievement; for finite block length, the rate is slightly below capacity. In practice, choose K,NK, N small enough that NKN^K is manageable.

Sun–Jafar PIR Protocol Flow

Animation of the Sun-Jafar PIR scheme for N=2N = 2, K=2K = 2. Shows the user's random labeling, the query construction at each level, the database answers, and the user's decoding. Highlights how the interferer-cancellation works via the level structure.

Sun-Jafar Rate vs. NN and KK

Plot the achieved rate of the Sun-Jafar scheme R=NK/DR = N^K / D against NN for fixed KK, and against KK for fixed NN. The rate matches the capacity CPIR=(1+1/N++1/NK1)1C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}. The plot shows: (i) rate increases with NN approaching 11, (ii) rate decreases with KK approaching 11/N1 - 1/N for large KK. Operationally: more databases is much better; more files is slightly worse.

Parameters
6
10

PIR Is Finite-Field IA in Disguise

The Sun-Jafar construction can be seen as a specialization of the finite-field IA framework from Chapter 4. Specifically:

  • Each "level" ii corresponds to a "subspace alignment" in the IA sense: at level ii, the user demands that the ii-fold interferers from K1K - 1 unwanted files align into a common subspace where the user can cancel them.
  • The level-KK structure is the highest alignment: K1K - 1 interferer chunks pack into a single XOR with one desired chunk.
  • The capacity formula CPIR=(1+1/N++1/NK1)1C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1} has the geometric structure characteristic of IA-based achievability.

The point is that PIR is a concrete IA application — one of the few where the finite-field IA framework gives explicit, capacity-achieving, deployable schemes. This is why Section 4.4 of Chapter 4 introduced PIR as a forward reference: it is the cleanest success story of finite-field IA.

Theorem: Sun–Jafar Achievability

For any N2N \geq 2 databases and K1K \geq 1 files, the Sun-Jafar scheme (Algorithm 13.2.1) achieves PIR rate R  =  (1+1N+1N2++1NK1)1.R \;=\; \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}. The scheme uses files of length L=NKL = N^K symbols over Fq\mathbb{F}_q for qNq \geq N. Privacy holds information-theoretically against each individual database.

Total download: D=Ni=1K(K1i1)NKiD = N \cdot \sum_{i=1}^K \binom{K-1}{i-1} N^{K-i}. Algebraic simplification using the binomial identity gives D=NKi=0K1(1/N)iD = N^K \cdot \sum_{i=0}^{K-1} (1/N)^i. Hence R=L/D=1/i=0K1(1/N)i=(1+1/N++1/NK1)1R = L / D = 1 / \sum_{i=0}^{K-1} (1/N)^i = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}.

Operationally: each "level" ii contributes 1/Ni1/N^i to the total overhead — the geometric sum is the structural cost of privacy. As NN grows, higher-order terms vanish faster, and the overhead approaches the lower bound 11/N1 - 1/N.

Example: Sun–Jafar at N=2,K=3N = 2, K = 3

Compute the Sun-Jafar rate for N=2N = 2 databases and K=3K = 3 files. Compare with the trivial download-everything baseline.

Common Mistake: Sun-Jafar Requires Large Files

Mistake:

Apply Sun-Jafar PIR to small files (e.g., a few bytes per record).

Correction:

The Sun-Jafar construction requires file length L=NKL = N^K symbols. For N=4,K=5N = 4, K = 5, this is 45=10244^5 = 1024 symbols per file — fine for medium records but large for small ones. For fine-grained files (single-byte records), one must aggregate many records into a "virtual file" of NKN^K symbols, then run PIR on the virtual file.

Alternatively, computational PIR variants (which use cryptographic primitives instead of information theory) can handle arbitrary file sizes at the cost of weaker privacy guarantees.

⚠️Engineering Note

PIR Deployment: Where Sun-Jafar Fits

Sun-Jafar PIR is the information-theoretic benchmark — it gives the maximum-rate retrieval that can be achieved without computational assumptions. Production deployments include:

  • Multi-cloud database queries: a user's query to AWS + GCP + Azure replicas can use Sun-Jafar PIR if the providers are non-colluding (a strong assumption).
  • Privacy-preserving genome queries: research queries against genome databases at multiple institutions.
  • Encrypted DNS over multiple resolvers: query patterns split across non-colluding DNS servers.

Computational PIR (single-database, based on homomorphic encryption) is more common in deployments because it doesn't require the non-colluding assumption — but with computational rather than information-theoretic privacy. The Sun-Jafar capacity is the benchmark against which computational PIR is measured.

Practical Constraints
  • Information-theoretic PIR: N2N \geq 2 non-colluding databases required

  • Sun-Jafar block length: L=NKL = N^K — requires large files

  • Production: niche; computational PIR more common

📋 Ref: Beimel survey 2007; Microsoft SealPIR for computational case

Key Takeaway

The Sun-Jafar scheme achieves PIR capacity via finite-field IA across KK levels of XOR structure. Each level cancels one layer of interferer contributions, and the geometric sum over levels gives the exact capacity formula. The construction is explicit, deterministic, and information-theoretically optimal. Section 13.3 proves the matching converse — closing the rate region of classical PIR.

Quick Check

The Sun-Jafar PIR scheme uses file length:

L=KL = K (one symbol per file)

L=NKL = N^K — exponential in KK

L=KlogKL = K \log K

L=NL = N (one per database)