Ferkans — Interactive Telecom Tutor

Two-Sided Privacy

Classical PIR (Chapter 13, §14.1, §14.2) provides one-sided privacy: the databases learn nothing about $\theta$ , but the user is free to learn anything they can compute from the answers — in particular, the user might learn linear combinations of files $W_j$ for $j \neq \theta$ .

In settings like medical research or inter-organizational data exchange, the databases also need privacy: the user should learn $W_\theta$ and only $W_\theta$ . Anything else (other files, derived information) must remain hidden.

This is symmetric PIR (SPIR). The additional database-privacy requirement fundamentally changes the achievable rate: $C_{\text{SPIR}}(N, K) = 1 - 1/N$ for all $K \geq 2$ (Sun-Jafar 2018c, with earlier bounds by Gertner et al. 1998).


Definition: Symmetric PIR (SPIR)

A PIR scheme is symmetric (an SPIR scheme) if it satisfies both:

- User privacy (as in classical PIR): $I\!\left(\theta;\, Q^{(\theta, n)}\right) \;=\; 0 \quad \forall n \in [N].$
- Database privacy: $I\!\left(\{W_k\}_{k \neq \theta};\, A^{(\theta, 1)}, \ldots, A^{(\theta, N)}, Q^{(\theta, 1)}, \ldots, Q^{(\theta, N)}\right) \;=\; 0.$ That is, the user's combined view (queries + answers) reveals nothing about the files $W_k$ for $k \neq \theta$.

SPIR additionally requires shared randomness: a common-randomness symbol $S$ is shared across all $N$ databases (but unknown to the user). This randomness is essential for hiding the other files; without it, SPIR is impossible.
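To make the two conditions concrete, the sketch below brute-forces the user's view for a toy two-database scheme with 1-bit files. The scheme (a random subset query whose answer is XORed with a shared mask bit) and all function names are illustrative assumptions, not the construction from the cited papers. It also assumes an honest user who follows the protocol.

```python
from collections import Counter
from itertools import product


def view_distribution(files, theta, masked=True):
    """Exact distribution of the user's view (q1, q2, a1, a2).

    Toy setting (an assumption for illustration): N = 2 databases,
    1-bit files, and an honest user who sends a uniformly random
    subset q1 to database 1 and q1 with position theta flipped to
    database 2.
    """
    K = len(files)
    dist = Counter()
    for q1 in product((0, 1), repeat=K):
        q2 = tuple(b ^ int(i == theta) for i, b in enumerate(q1))
        # The shared mask bit S is uniform; with masked=False it is fixed to 0.
        for s in ((0, 1) if masked else (0,)):
            a1, a2 = s, s
            for w, b1, b2 in zip(files, q1, q2):
                a1 ^= w & b1
                a2 ^= w & b2
            dist[(q1, q2, a1, a2)] += 1
    return dist


def leaks_other_files(K, theta, masked):
    """True if the view distribution changes when only files k != theta change."""
    for w_theta in (0, 1):
        seen = set()
        for others in product((0, 1), repeat=K - 1):
            files = list(others[:theta]) + [w_theta] + list(others[theta:])
            seen.add(frozenset(view_distribution(files, theta, masked).items()))
        if len(seen) > 1:
            return True
    return False


print(leaks_other_files(K=3, theta=1, masked=True))   # with shared mask
print(leaks_other_files(K=3, theta=1, masked=False))  # without shared mask
```

In this toy setting the masked view has the same distribution for every choice of the other files, which is the database-privacy condition. With the mask removed, the view changes with the other files, so the condition fails.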


Theorem: SPIR Capacity (Sun–Jafar 2018c)

For symmetric PIR with $K$ files replicated across $N$ databases, $N \geq 2$ , $K \geq 2$ , with shared randomness across databases: $C_{\text{SPIR}}(N, K) \;=\; 1 - \frac{1}{N}.$ The capacity is independent of $K$ (unlike classical PIR, which depends on $K$ ).

Proof

Achievability sketch

The achievability scheme uses common randomness $S$ shared by the databases. Each database adds a randomized mask derived from $S$ to its answer, so that any single answer is uninformative about every file, yet the user can decode $W_\theta$ from the combination of all $N$ answers. The rate is $1 - 1/N$ because the equivalent of one database's answer is spent canceling the mask, leaving $N - 1$ answers' worth of useful download.
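The masking idea can be sketched for $N = 2$ as follows. This is a minimal toy scheme under stated assumptions: files are $L$-bit integers, the user is honest, and the databases draw a fresh mask `S` for every retrieval. The function names are hypothetical, and this is not claimed to be the scheme from the cited paper.

```python
import secrets


def make_queries(K, theta):
    """User side: a uniformly random subset for database 1, and the same
    subset with position theta toggled for database 2."""
    q1 = [secrets.randbelow(2) for _ in range(K)]
    q2 = list(q1)
    q2[theta] ^= 1
    return q1, q2


def answer(files, query, mask):
    """Database side: XOR of the selected files, XORed with the shared mask."""
    acc = mask
    for w, bit in zip(files, query):
        if bit:
            acc ^= w
    return acc


def retrieve(files, theta, L):
    """Run one retrieval end to end (both databases simulated locally)."""
    S = secrets.randbits(L)  # shared by the databases, hidden from the user
    q1, q2 = make_queries(len(files), theta)
    a1 = answer(files, q1, S)
    a2 = answer(files, q2, S)
    return a1 ^ a2  # the masks cancel, and so do all files except W_theta


L = 16
files = [secrets.randbits(L) for _ in range(4)]
assert all(retrieve(files, t, L) == files[t] for t in range(len(files)))
```

The user downloads two $L$-bit answers to recover one $L$-bit file, so the rate of this sketch is $1/2 = 1 - 1/N$ for $N = 2$, and the mask uses $L$ bits of shared randomness.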

Converse

The converse combines two constraints: user privacy ( $I(\theta; Q^{(\theta, n)}) = 0$ ) and database privacy ( $I(W_{k \neq \theta}; \text{view}) = 0$ ). A cut-set bound on the user's view of the answers gives $L \leq (N - 1) \cdot D / N$ regardless of $K$, where $L$ is the file length and $D$ is the total download: the database-privacy constraint forces one database's worth of answer to be "wasted" on canceling the mask. Since $R = L/D$, this yields $R \leq 1 - 1/N$.

Why $K$-independence?

In classical PIR, the user is free to extract information about all $K$ files; the only constraint is that the databases learn nothing about $\theta$. The capacity depends on $K$ because more files means more interference to align. In SPIR, the user is prohibited from learning anything about the other files, so the $K$-dependence disappears.

Common-randomness requirement

SPIR is impossible without shared randomness across databases. With enough shared randomness that stays hidden from the user, the rate is bounded by $1 - 1/N$ as above. Without it, no positive rate is achievable.

Example: SPIR vs. Classical at $N = 5$

Compare $C_{\text{PIR}}(5, K)$ with $C_{\text{SPIR}}(5, K)$ for $K = 2, 5, 10, 50$ .

Solution

$K = 2$

$C_{\text{PIR}}(5, 2) = (1 + 1/5)^{-1} = 5/6 \approx 0.833$ . $C_{\text{SPIR}}(5, K) = 1 - 1/5 = 0.800$ (for any $K$ ). Gap: $\approx 0.033$ — small.

$K = 5$

$C_{\text{PIR}}(5, 5) = (1 + 1/5 + 1/25 + 1/125 + 1/625)^{-1} = 625/781 \approx 0.8003$. $C_{\text{SPIR}}(5, 5) = 0.800$. Gap: $\approx 0.0003$, negligible.

$K = 10, 50$

For $K \geq 10$, classical PIR is below $0.8000001$ (within $10^{-7}$ of SPIR). The $1 - 1/N$ asymptote is essentially reached by $K = 5$, where the gap is already about $3 \times 10^{-4}$.

Operational

For small $K$, classical PIR has a non-trivial advantage over SPIR (the user can be greedy with derived information). For large $K$, the gap closes: $C_{\text{PIR}}$ decreases toward $1 - 1/N$, which SPIR attains for every $K \geq 2$. SPIR is the natural rate floor.
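The comparison above can be reproduced with exact rational arithmetic. The sketch below uses only the two capacity formulas stated in this section; the function names `c_pir` and `c_spir` are illustrative.

```python
from fractions import Fraction


def c_pir(N, K):
    """Classical PIR capacity: (1 + 1/N + ... + 1/N^(K-1))^(-1)."""
    return 1 / sum(Fraction(1, N**k) for k in range(K))


def c_spir(N):
    """SPIR capacity: 1 - 1/N, independent of K (for K >= 2)."""
    return 1 - Fraction(1, N)


N = 5
for K in (2, 5, 10, 50):
    gap = c_pir(N, K) - c_spir(N)
    print(f"K={K:3d}  C_PIR={float(c_pir(N, K)):.10f}  "
          f"C_SPIR={float(c_spir(N)):.3f}  gap={float(gap):.3e}")
```

Working with `Fraction` keeps the small gaps at large $K$ exact until they are converted to floats for printing; for $N = 5$, $K = 2$ the classical capacity comes out as exactly $5/6$.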

SPIR Capacity vs. Classical PIR

Plot $C_{\text{PIR}}(N, K)$ and $C_{\text{SPIR}}(N, K) = 1 - 1/N$ on the same axes as a function of $K$ for fixed $N$ . The classical curve starts above the SPIR floor and asymptotes to it as $K \to \infty$ . The gap quantifies the cost of two-sided privacy in small-library settings.

Parameters: $N$ (databases) = 5; $K_{\max}$ (file count range) = 20.
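The asymptote and the size of the gap follow from summing the geometric series in the classical capacity formula. A short derivation, using only the two capacity expressions from this section:

```latex
\begin{align*}
C_{\text{PIR}}(N, K) &= \left(\sum_{k=0}^{K-1} N^{-k}\right)^{-1}
  = \frac{1 - 1/N}{1 - N^{-K}}, \\
C_{\text{PIR}}(N, K) - C_{\text{SPIR}}(N, K)
  &= \left(1 - \frac{1}{N}\right) \frac{N^{-K}}{1 - N^{-K}}
  \;\xrightarrow{K \to \infty}\; 0.
\end{align*}
```

The gap therefore shrinks geometrically in $K$, by roughly a factor of $N$ for each additional file.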

Theorem: SPIR Requires Common Randomness

SPIR with any positive rate requires the databases to share a non-trivial common randomness $S$ : $H(S) \;\geq\; \frac{1}{N - 1} \cdot L$ bits per file. Without shared randomness ( $H(S) = 0$ ), the only achievable rate is $R = 0$ .

Proof

Why randomness is needed

Database privacy requires that the user's view be marginally independent of $\{W_k\}_{k \neq \theta}$ . Without randomness, the answers are deterministic functions of the queries and files — making this independence impossible (the user can always construct a function that depends on $W_j$ given deterministic answers).

How much randomness?

The lower bound $H(S) \geq L / (N-1)$ comes from a counting argument: the randomness must mask the information about other files in each database's answer, which requires at least $L/(N-1)$ bits of mutual unpredictability.
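As a worked instance of the bound (the file size below is a made-up illustrative value):

```latex
\begin{align*}
N = 2:&\quad H(S) \geq \frac{L}{2 - 1} = L, \\
N = 5,\ L = 10^{6}\ \text{bits}:&\quad H(S) \geq \frac{10^{6}}{5 - 1} = 2.5 \times 10^{5}\ \text{bits}.
\end{align*}
```

With two databases the shared randomness must be at least as long as the file itself; adding databases lowers the requirement.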

Where does $S$ come from?

In practice: a pre-distributed key, a common seed expanded with a PRG (kept secret from the user), or a secure multi-party setup phase. The randomness is pure protocol overhead and is part of the SPIR cost.
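One way the pre-distributed-key option could look is sketched below. Everything here is a hypothetical illustration rather than a prescribed design: the 32-byte key, the `retrieval_id` that the databases are assumed to agree on and never reuse, and the choice of SHAKE-256 as the PRG.

```python
import hashlib


def derive_mask(shared_key: bytes, retrieval_id: bytes, length: int) -> bytes:
    """Expand a pre-shared key into a `length`-byte mask for one retrieval.

    Assumptions (not from the source): a 32-byte key provisioned to every
    database during setup, and a retrieval_id that all databases agree on
    and never reuse. SHAKE-256 plays the role of the PRG.
    """
    if len(shared_key) != 32:
        raise ValueError("expected a 32-byte shared key")
    xof = hashlib.shake_256()
    xof.update(shared_key)
    # Length-prefix the id so distinct (key, id) pairs never collide as input.
    xof.update(len(retrieval_id).to_bytes(4, "big"))
    xof.update(retrieval_id)
    return xof.digest(length)


key = bytes(range(32))  # placeholder key for the example only
mask_db1 = derive_mask(key, b"retrieval-0001", 64)
mask_db2 = derive_mask(key, b"retrieval-0001", 64)
assert mask_db1 == mask_db2                                  # databases agree
assert derive_mask(key, b"retrieval-0002", 64) != mask_db1   # fresh per retrieval
```

A mask expanded from a short key hides the other files only computationally; the entropy bound $H(S) \geq L/(N-1)$ applies to information-theoretic SPIR. Reusing a mask across retrievals would let the user cancel it and learn combinations of other files, which is why the sketch ties each mask to a single-use identifier.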

Classical PIR vs. SPIR — Operational Comparison

Property	Classical PIR	SPIR
User privacy	Yes	Yes
Database privacy	No (user learns derived info on other files)	Yes (user learns only $W_\theta$ )
Common randomness needed	No	Yes ( $H(S) \geq L/(N-1)$ )
Capacity	$(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$	$1 - 1/N$
$K$ -dependence	Yes	No
Operational use	Read-only retrieval	Genomic, medical, financial PIR

⚠️Engineering Note

Deploying SPIR

Practical guidelines for SPIR deployments:

Common randomness setup: requires a pre-distributed shared secret across databases. This is typically the largest operational hurdle. Methods: trusted setup phase, secret-sharing of a master key, tamper-resistant HSM.
Rate cost: $C_{\text{SPIR}}/C_{\text{PIR}} = (1 - 1/N)(1 + 1/N + \cdots + 1/N^{K-1}) = 1 - N^{-K} \approx 1$ for large $K$. The SPIR cost is concentrated in small-library settings.
Use cases:
- Genomic research (researcher learns one record; database hides everything else)
- Encrypted DNS resolution (user learns one record; resolver hides everything else)
- Medical record retrieval (compliance with HIPAA-style minimization)
Anti-use cases: SPIR is overkill when the user is trusted not to abuse derived information. Classical PIR with proper access controls suffices.

Practical Constraints

- Shared randomness: $H(S) \geq L/(N-1)$
- Trusted setup or HSM-distributed master key
- Use cases: genomic, medical, financial
- Anti-use: trusted-user contexts (use classical PIR)

📋 Ref: Sun-Jafar 2018; Wang-Skoglund 2019

Common Mistake: SPIR Doesn't Add Authentication

Mistake:

Treat SPIR as covering all "two-sided" properties — including the user authenticating that the database is honest.

Correction:

SPIR is about information-theoretic privacy of the databases against the user: it bounds what the user can learn about the other files, not whether the answers the user receives are correct. The databases could still send malformed or incorrect answers, and SPIR provides no guarantee against this. For database authentication, use cryptographic signatures on the answers; for honest-user constraints, use access controls. SPIR + auth + access is the proper stack.

Key Takeaway

SPIR adds two-sided privacy with capacity $1 - 1/N$ (independent of $K$ ). Compared with classical PIR, the rate cost is small for large $K$ and notable for small $K$ . SPIR requires shared randomness across databases — typically via trusted setup or HSM-distributed keys. Use SPIR when both sides need privacy guarantees; use classical PIR when only the user needs privacy.

Historical Note: Origin of SPIR — Gertner et al.

Symmetric PIR was introduced by Gertner, Ishai, Kushilevitz, and Malkin in 1998 — motivated by a database-privacy goal that classical PIR did not address. Their original formulation considered SPIR with a constant number of databases and proved a $\Theta(1)$ rate is achievable. The exact capacity $1 - 1/N$ was settled by Sun and Jafar in 2018 — completing the SPIR analog of their classical PIR result. Wang and Skoglund (2019) extended SPIR to MDS-coded storage. The historical arc parallels classical PIR (Chor et al. 1995 → Sun-Jafar 2017): a 20-year gap between problem definition and capacity characterization.

Quick Check

For SPIR with $N = 4$ databases and $K = 100$ files, the capacity is:

- $\approx 0.999$ (close to $1$)
- $0.75$ (regardless of $K$)
- $\approx 0.01$ (close to $1/K$)
- Same as classical PIR