-Colluding PIR
From Independent to Colluding Adversaries
Classical Sun-Jafar PIR (Chapter 13) assumes that no two databases share information about received queries. This non-collusion assumption is fragile: a single side channel (shared operator, log aggregation, hypervisor breach) can void it.
-colluding PIR strengthens the threat model: any subset of databases may pool their queries, and the protocol must remain information-theoretically private against this combined view. Setting recovers the classical case; makes PIR impossible (capacity drops to the trivial ). The capacity formula was settled by Sun and Jafar in 2018.
Definition: -Colluding PIR (Formal)
-Colluding PIR (Formal)
A PIR scheme is -private if for every desired index and every subset with : That is, the joint view of any databases — queries and answers combined — is statistically independent of .
The classical Sun-Jafar setting is (each database alone learns nothing). The fully- colluding setting is (no privacy achievable beyond the trivial baseline).
Theorem: -Colluding PIR Capacity (Sun–Jafar 2018)
For PIR with files replicated across non-Byzantine databases, with privacy required against any colluding databases, the PIR capacity is Setting gives Sun-Jafar's classical formula; setting gives (the trivial baseline).
Achievability via Shamir-style sharing
Modify Sun-Jafar's scheme: each query symbol is constructed as a Shamir-style sharing — a polynomial of degree evaluated at points. Any databases see only random-looking queries (information- theoretically), satisfying -privacy. The decoding works because the polynomial is designed to leak the desired symbol only when all databases respond.
Converse
The cut-set converse extends naturally: the entropy of any -subset of queries is (privacy → independence constraint), and the recursive symmetrization yields the geometric formula with replacing .
Asymptotic analysis
For large with fixed: . For large with fixed : . Generalizes Sun-Jafar's limit of (set ).
Example: -Colluding Capacity at
Compute for and verify it interpolates between Sun-Jafar () and the trivial baseline ().
$T = 1$ (classical)
. Same as Chapter 13.
$T = 2$
. About lower than .
$T = 3$
. About lower than .
$T = 4$ (almost-fully-colluding)
.
Trivial baseline at $T = N = 5$
— exactly the trivial rate . Verifies the formula's limiting behavior.
-Colluding PIR Capacity
Plot the -colluding capacity as a function of for fixed and . The curve starts at the Sun-Jafar capacity () and decays monotonically to the trivial baseline at . The shape illustrates the rate cost of collusion tolerance.
Parameters
Shamir Sharing as the -Privacy Primitive
The achievability scheme uses Shamir secret sharing as a black-box primitive (Chapter 3). The query polynomial of degree ensures that any evaluations reveal nothing about the coefficients (Shamir's privacy property), which in turn ensures that any database queries reveal nothing about .
The coding-theoretic intuition: Shamir's -secret-sharing achieves -privacy with optimal rate . The PIR scheme inherits this optimal trade-off structure, modified by the geometric-series capacity formula.
Operational implication: every unit of additional collusion tolerance () costs a multiplicative factor of in the geometric series — equivalent to the Shamir overhead.
-Colluding PIR vs. Trivial and Classical
| Privacy Strength | Capacity Formula | Numerical at | |
|---|---|---|---|
| (classical) | Each DB alone learns nothing | ||
| Any pair of DBs learns nothing | |||
| Any DBs learn nothing | Geometric in | ||
| (trivial) | All DBs learn nothing — no privacy possible |
Side-Channel Risks and Choosing
Choosing in a -colluding PIR deployment:
-
(no collusion tolerance): adequate when databases are operated by mutually-independent administrators with no shared infrastructure.
-
or : protects against small- scale collusion (compromised admin pair, hypervisor leak). Recommended baseline for cross-cloud deployments.
-
: defends against majority-honest threat model (similar to BFT). Significantly lower rate but standard for high-security deployments.
-
: defends against everything except universal collusion. The rate is barely above the trivial baseline — usually not worth it.
Side channels to consider when picking : shared log aggregators, cross-tenant hypervisors, MITM on database-to-database backplane, and accidental telemetry leaks.
- •
: independent admins, no shared infra
- •
: cross-cloud baseline
- •
: high-security; major rate loss
- •
: marginal benefit over trivial
Common Mistake: Collusion Tolerance Doesn't Help Against Byzantine Databases
Mistake:
Assume that -colluding PIR also handles Byzantine (malicious) databases that return incorrect answers.
Correction:
-colluding PIR strengthens the privacy threat model — it bounds what an adversary learns from compromised databases. It does not bound what an adversary can do (e.g., return arbitrary garbage). For Byzantine robustness, extend with verifiability layers (Tajeddine et al., 2019; Banawan & Ulukus, 2017) or use a separate Byzantine-tolerance scheme like those covered in Chapter 11. The two properties — privacy and robustness — must be addressed independently.
Key Takeaway
-colluding PIR generalizes Sun-Jafar with replacing in the capacity formula. Each unit of additional collusion tolerance costs a multiplicative factor in the geometric series. Production deployments should pick based on the side-channel attack surface — typically or for cross-cloud baselines. Byzantine robustness is separate and requires additional layers.
Quick Check
For PIR with , the rate gap between (classical) and is approximately:
(small)
(moderate)
(large)
Zero, because Sun-Jafar already handles all collusion.
. . Gap . The requirement costs about a quarter of the rate.