TT-Colluding PIR

From Independent to Colluding Adversaries

Classical Sun-Jafar PIR (Chapter 13) assumes that no two databases share information about received queries. This non-collusion assumption is fragile: a single side channel (shared operator, log aggregation, hypervisor breach) can void it.

TT-colluding PIR strengthens the threat model: any subset of TT databases may pool their queries, and the protocol must remain information-theoretically private against this combined view. Setting T=1T = 1 recovers the classical case; T=N1T = N - 1 makes PIR impossible (capacity drops to the trivial 1/K1/K). The capacity formula was settled by Sun and Jafar in 2018.

Definition:

TT-Colluding PIR (Formal)

A PIR scheme is TT-private if for every desired index θ[K]\theta \in [K] and every subset T[N]\mathcal{T} \subseteq [N] with TT|\mathcal{T}| \leq T: I ⁣(θ;{Q(θ,n),A(θ,n)}nT)  =  0.I\!\left(\theta;\, \{Q^{(\theta, n)}, A^{(\theta, n)}\}_{n \in \mathcal{T}}\right) \;=\; 0. That is, the joint view of any TT databases — queries and answers combined — is statistically independent of θ\theta.

The classical Sun-Jafar setting is T=1T = 1 (each database alone learns nothing). The fully- colluding setting is T=NT = N (no privacy achievable beyond the trivial baseline).

Theorem: TT-Colluding PIR Capacity (Sun–Jafar 2018)

For PIR with KK files replicated across NN non-Byzantine databases, with privacy required against any TT colluding databases, the PIR capacity is CPIR(N,K,T)  =  (1+TN+T2N2++TK1NK1)1.C_{\text{PIR}}(N, K, T) \;=\; \left(1 + \frac{T}{N} + \frac{T^2}{N^2} + \cdots + \frac{T^{K-1}}{N^{K-1}}\right)^{-1}. Setting T=1T = 1 gives Sun-Jafar's classical formula; setting T=NT = N gives C1/KC \to 1/K (the trivial baseline).

Example: TT-Colluding Capacity at N=5,K=3N = 5, K = 3

Compute CPIR(5,3,T)C_{\text{PIR}}(5, 3, T) for T=1,2,3,4T = 1, 2, 3, 4 and verify it interpolates between Sun-Jafar (T=1T = 1) and the trivial baseline (T=NT = N).

TT-Colluding PIR Capacity

Plot the TT-colluding capacity CPIR(N,K,T)C_{\text{PIR}}(N, K, T) as a function of TT for fixed NN and KK. The curve starts at the Sun-Jafar capacity (T=1T = 1) and decays monotonically to the trivial baseline 1/K1/K at T=NT = N. The shape illustrates the rate cost of collusion tolerance.

Parameters
8
5

Shamir Sharing as the TT-Privacy Primitive

The achievability scheme uses Shamir secret sharing as a black-box primitive (Chapter 3). The query polynomial of degree TT ensures that any TT evaluations reveal nothing about the coefficients (Shamir's privacy property), which in turn ensures that any TT database queries reveal nothing about θ\theta.

The coding-theoretic intuition: Shamir's (T,N)(T, N)-secret-sharing achieves TT-privacy with optimal rate NTN - T. The PIR scheme inherits this optimal trade-off structure, modified by the geometric-series capacity formula.

Operational implication: every unit of additional collusion tolerance (TT) costs a multiplicative factor of T/1T / 1 in the geometric series — equivalent to the Shamir overhead.

TT-Colluding PIR vs. Trivial and Classical

TTPrivacy StrengthCapacity FormulaNumerical at N=5,K=4N=5, K=4
T=1T = 1 (classical)Each DB alone learns nothing(1+1/N++1/NK1)1(1 + 1/N + \cdots + 1/N^{K-1})^{-1}0.801\approx 0.801
T=2T = 2Any pair of DBs learns nothing(1+2/N++2K1/NK1)1(1 + 2/N + \cdots + 2^{K-1}/N^{K-1})^{-1}0.616\approx 0.616
T=N1=4T = N - 1 = 4Any N1N-1 DBs learn nothingGeometric in 4/54/50.366\approx 0.366
T=N=5T = N = 5 (trivial)All DBs learn nothing — no privacy possible1/K1/K0.2500.250
⚠️Engineering Note

Side-Channel Risks and Choosing TT

Choosing TT in a TT-colluding PIR deployment:

  • T=1T = 1 (no collusion tolerance): adequate when databases are operated by mutually-independent administrators with no shared infrastructure.

  • T=2T = 2 or 33: protects against small- scale collusion (compromised admin pair, hypervisor leak). Recommended baseline for cross-cloud deployments.

  • T=N/2T = N/2: defends against majority-honest threat model (similar to BFT). Significantly lower rate but standard for high-security deployments.

  • T=N1T = N - 1: defends against everything except universal collusion. The rate is barely above the trivial baseline — usually not worth it.

Side channels to consider when picking TT: shared log aggregators, cross-tenant hypervisors, MITM on database-to-database backplane, and accidental telemetry leaks.

Practical Constraints
  • T=1T = 1: independent admins, no shared infra

  • T=23T = 2-3: cross-cloud baseline

  • T=N/2T = N/2: high-security; major rate loss

  • T=N1T = N - 1: marginal benefit over trivial

📋 Ref: Sun-Jafar 2018; cloud security best practices

Common Mistake: Collusion Tolerance Doesn't Help Against Byzantine Databases

Mistake:

Assume that TT-colluding PIR also handles Byzantine (malicious) databases that return incorrect answers.

Correction:

TT-colluding PIR strengthens the privacy threat model — it bounds what an adversary learns from compromised databases. It does not bound what an adversary can do (e.g., return arbitrary garbage). For Byzantine robustness, extend with verifiability layers (Tajeddine et al., 2019; Banawan & Ulukus, 2017) or use a separate Byzantine-tolerance scheme like those covered in Chapter 11. The two properties — privacy and robustness — must be addressed independently.

Key Takeaway

TT-colluding PIR generalizes Sun-Jafar with T/NT/N replacing 1/N1/N in the capacity formula. Each unit of additional collusion tolerance costs a multiplicative factor in the geometric series. Production deployments should pick TT based on the side-channel attack surface — typically T=2T = 2 or 33 for cross-cloud baselines. Byzantine robustness is separate and requires additional layers.

Quick Check

For PIR with N=4,K=5N = 4, K = 5, the rate gap between T=1T = 1 (classical) and T=2T = 2 is approximately:

0.10\sim 0.10 (small)

0.20\sim 0.20 (moderate)

0.40\sim 0.40 (large)

Zero, because Sun-Jafar already handles all collusion.