Symmetric PIR (SPIR)
Two-Sided Privacy
Classical PIR (Chapter 13, §14.1, §14.2) provides one-sided privacy: the databases learn nothing about , but the user is free to learn anything they can compute from the answers — in particular, the user might learn linear combinations of files for .
In settings like medical research or inter-organizational data exchange, the databases also need privacy: the user should learn and only . Anything else (other files, derived information) must remain hidden.
This is symmetric PIR (SPIR). The additional database-privacy requirement fundamentally changes the achievable rate: for all (Sun-Jafar 2018c, with earlier bounds by Gertner et al. 1998).
Definition: Symmetric PIR (SPIR)
Symmetric PIR (SPIR)
A PIR scheme is symmetric (an SPIR scheme) if it satisfies both:
-
User privacy (as in classical PIR):
-
Database privacy: That is, the user's combined view (queries + answers) reveals nothing about the files for .
SPIR additionally requires shared randomness: a common-randomness symbol is shared across all databases (but unknown to the user). This randomness is essential for hiding the other files; without it, SPIR is impossible.
Theorem: SPIR Capacity (Sun–Jafar 2018c)
For symmetric PIR with files replicated across databases, , , with shared randomness across databases: The capacity is independent of (unlike classical PIR, which depends on ).
Achievability sketch
The achievability uses an additional layer of common randomness at the databases. Each database adds a randomized mask to its answer such that any single answer is uninformative (about every file), but the user can decode from the combination of all answers. The rate is achievable because database's answer must be devoted to canceling the mask.
Converse
Two constraints jointly: user privacy () and database privacy (). Cut-set bound on the user's view of the answers gives regardless of — the database-privacy constraint forces one database's answer to be "wasted" in canceling the mask. Yields .
Why $K$-independence?
In classical PIR, the user can extract information about all files (just not about ). The capacity depends on because more files means more interference to align. In SPIR, the user is prohibited from learning about other files at all — so the -dependence disappears.
Common-randomness requirement
SPIR is impossible without shared randomness across databases. With a random oracle (e.g., a public-coin protocol), the rate is bounded by as above. Without it, no positive rate is achievable.
Example: SPIR vs. Classical at
Compare with for .
$K = 2$
. (for any ). Gap: — small.
$K = 5$
. . Gap: — negligible.
$K = 10, 50$
For , classical PIR is (within of SPIR). The asymptote is essentially achieved by .
Operational
For small , classical PIR has a non-trivial advantage over SPIR (the user can be greedy with derived information). For large , the gap closes — both approach . SPIR is the natural rate floor.
SPIR Capacity vs. Classical PIR
Plot and on the same axes as a function of for fixed . The classical curve starts above the SPIR floor and asymptotes to it as . The gap quantifies the cost of two-sided privacy in small-library settings.
Parameters
Theorem: SPIR Requires Common Randomness
SPIR with any positive rate requires the databases to share a non-trivial common randomness : bits per file. Without shared randomness (), the only achievable rate is .
Why randomness is needed
Database privacy requires that the user's view be marginally independent of . Without randomness, the answers are deterministic functions of the queries and files — making this independence impossible (the user can always construct a function that depends on given deterministic answers).
How much randomness?
The lower bound comes from a counting argument: the randomness must mask the information about other files in each database's answer, which requires at least bits of mutual unpredictability.
Where does $S$ come from?
In practice: a pre-distributed key, a common public seed (with PRG), or a secure multi-party setup phase. The randomness is pure protocol overhead — it's part of the SPIR cost.
Classical PIR vs. SPIR — Operational Comparison
| Property | Classical PIR | SPIR |
|---|---|---|
| User privacy | Yes | Yes |
| Database privacy | No (user learns derived info on other files) | Yes (user learns only ) |
| Common randomness needed | No | Yes () |
| Capacity | ||
| -dependence | Yes | No |
| Operational use | Read-only retrieval | Genomic, medical, financial PIR |
Deploying SPIR
Practical guidelines for SPIR deployments:
-
Common randomness setup: requires a pre-distributed shared secret across databases. This is typically the largest operational hurdle. Methods: trusted setup phase, secret-sharing of a master key, tamper-resistant HSM.
-
Rate cost: for large . The SPIR cost is concentrated in small-library settings.
-
Use cases:
- Genomic research (researcher learns one record; database hides everything else)
- Encrypted DNS resolution (user learns one record; resolver hides everything else)
- Medical record retrieval (compliance with HIPAA-style minimization)
-
Anti-use cases: SPIR is overkill when the user is trusted not to abuse derived information. Classical PIR with proper access controls suffices.
- •
Shared randomness:
- •
Trusted setup or HSM-distributed master key
- •
Use cases: genomic, medical, financial
- •
Anti-use: trusted-user contexts (use classical PIR)
Common Mistake: SPIR Doesn't Add Authentication
Mistake:
Treat SPIR as covering all "two-sided" properties — including the user authenticating that the database is honest.
Correction:
SPIR is about information theoretic privacy of the databases against the user — it bounds what the user learns, not what the user receives. The databases could still send malformed or incorrect answers (and SPIR provides no guarantee against this). For database authentication, use cryptographic signatures on the answers; for honest-user constraints, use access controls. SPIR + auth + access is the proper stack.
Key Takeaway
SPIR adds two-sided privacy with capacity (independent of ). Compared with classical PIR, the rate cost is small for large and notable for small . SPIR requires shared randomness across databases — typically via trusted setup or HSM-distributed keys. Use SPIR when both sides need privacy guarantees; use classical PIR when only the user needs privacy.
Historical Note: Origin of SPIR — Gertner et al.
Symmetric PIR was introduced by Gertner, Ishai, Kushilevitz, and Malkin in 1998 — motivated by a database-privacy goal that classical PIR did not address. Their original formulation considered SPIR with a constant number of databases and proved a rate is achievable. The exact capacity was settled by Sun and Jafar in 2018 — completing the SPIR analog of their classical PIR result. Wang and Skoglund (2019) extended SPIR to MDS-coded storage. The historical arc parallels classical PIR (Chor et al. 1995 → Sun-Jafar 2017): a 20-year gap between problem definition and capacity characterization.
Quick Check
For SPIR with databases and files, the capacity is:
(close to )
(regardless of )
(close to )
Same as classical PIR
. The -independence is the signature property of SPIR.