Ferkans — Interactive Telecom Tutor

From Independent to Colluding Adversaries

Classical Sun-Jafar PIR (Chapter 13) assumes that no two databases share information about received queries. This non-collusion assumption is fragile: a single side channel (shared operator, log aggregation, hypervisor breach) can void it.

$T$-colluding PIR strengthens the threat model: any subset of up to $T$ databases may pool their queries, and the protocol must remain information-theoretically private against this combined view. Setting $T = 1$ recovers the classical case; setting $T = N$ rules out any non-trivial scheme (capacity drops to the trivial $1/K$, achieved by downloading every file). The capacity formula was settled by Sun and Jafar in 2018.

Definition:
$T$ -Colluding PIR (Formal)

A PIR scheme is $T$ -private if for every desired index $\theta \in [K]$ and every subset $\mathcal{T} \subseteq [N]$ with $|\mathcal{T}| \leq T$ : $I\!\left(\theta;\, \{Q^{(\theta, n)}, A^{(\theta, n)}\}_{n \in \mathcal{T}}\right) \;=\; 0.$ That is, the joint view of any $T$ databases — queries and answers combined — is statistically independent of $\theta$ .

The classical Sun-Jafar setting is $T = 1$ (each database alone learns nothing). The fully-colluding setting is $T = N$ (no rate above the trivial baseline $1/K$ is achievable).

Theorem: $T$ -Colluding PIR Capacity (Sun–Jafar 2018)

For PIR with $K$ files replicated across $N$ non-Byzantine databases, with privacy required against any $T$ colluding databases, the PIR capacity is $C_{\text{PIR}}(N, K, T) \;=\; \left(1 + \frac{T}{N} + \frac{T^2}{N^2} + \cdots + \frac{T^{K-1}}{N^{K-1}}\right)^{-1}.$ Setting $T = 1$ gives Sun-Jafar's classical formula; setting $T = N$ gives exactly $C = 1/K$ (the trivial baseline).
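The formula in the theorem is easy to evaluate directly. The following is a minimal sketch; the function name `pir_capacity` is an illustrative choice and does not come from the source.

```python
def pir_capacity(n: int, k: int, t: int) -> float:
    """T-colluding PIR capacity: the inverse of sum_{i=0}^{K-1} (T/N)^i."""
    if k < 1 or not (1 <= t <= n):
        raise ValueError("require K >= 1 and 1 <= T <= N")
    ratio = t / n
    return 1.0 / sum(ratio ** i for i in range(k))


print(round(pir_capacity(5, 3, 1), 3))  # classical case T = 1 -> 0.806
print(round(pir_capacity(5, 3, 5), 3))  # fully colluding T = N -> 0.333 = 1/K
```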

Proof

Achievability via Shamir-style sharing

Modify Sun-Jafar's scheme: each query symbol is constructed as a Shamir-style $(T, N)$ sharing, i.e., a polynomial of degree $T$ evaluated at $N$ points. Any $T$ databases see only random-looking queries (information-theoretically), satisfying $T$-privacy. The decoding works because the polynomial is designed to leak the desired symbol only when all $N$ databases respond.

Converse

The cut-set converse extends naturally: the entropy of any $T$ -subset of queries is $\geq H(\theta)$ (privacy → independence constraint), and the recursive symmetrization yields the geometric formula with $T/N$ replacing $1/N$ .

Asymptotic analysis

For $T < N$ the geometric series sums in closed form, so the capacity equals $C_{\text{PIR}}(N, K, T) = (1 - T/N)/(1 - (T/N)^K)$ for every $N$, not only in a large-$N$ limit. For large $K$ with $T/N < 1$ fixed, $(T/N)^K \to 0$ and $C_{\text{PIR}}(N, K, T) \to 1 - T/N$. This generalizes Sun-Jafar's $K \to \infty$ limit of $1 - 1/N$ (set $T = 1$).
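As a numerical sanity check, the geometric sum can be compared with the expression $(1 - T/N)/(1 - (T/N)^K)$ and with the large-$K$ limit $1 - T/N$. The sketch below assumes $T < N$; the values $N = 10$, $T = 3$ are chosen only for illustration.

```python
def capacity_sum(n, k, t):
    """Capacity evaluated term by term from the geometric series."""
    r = t / n
    return 1.0 / sum(r ** i for i in range(k))


def capacity_closed_form(n, k, t):
    """Closed form of the same series; valid only for T < N."""
    r = t / n
    return (1 - r) / (1 - r ** k)


n, t = 10, 3  # illustrative values, not from the source
for k in (2, 5, 20, 100):
    s = capacity_sum(n, k, t)
    c = capacity_closed_form(n, k, t)
    # Columns: K, series value, closed-form value, large-K limit 1 - T/N
    print(k, round(s, 6), round(c, 6), round(1 - t / n, 6))
```

The first two columns agree for every $K$, and both approach the third column as $K$ grows.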

Example: $T$ -Colluding Capacity at $N = 5, K = 3$

Compute $C_{\text{PIR}}(5, 3, T)$ for $T = 1, \dots, 5$ and verify that it interpolates between the Sun-Jafar capacity ($T = 1$) and the trivial baseline ($T = N = 5$).

Solution

$T = 1$ (classical)

$C(5, 3, 1) = (1 + 0.2 + 0.04)^{-1} = 1/1.24 \approx 0.806$ . Same as Chapter 13.

$T = 2$

$C(5, 3, 2) = (1 + 0.4 + 0.16)^{-1} = 1/1.56 \approx 0.641$ . About $20\%$ lower than $T = 1$ .

$T = 3$

$C(5, 3, 3) = (1 + 0.6 + 0.36)^{-1} = 1/1.96 \approx 0.510$ . About $37\%$ lower than $T = 1$ .

$T = 4$ (almost-fully-colluding)

$C(5, 3, 4) = (1 + 0.8 + 0.64)^{-1} = 1/2.44 \approx 0.410$ .

Trivial baseline at $T = N = 5$

$C(5, 3, 5) = (1 + 1 + 1)^{-1} = 1/3 \approx 0.333$, exactly the trivial rate $1/K = 1/3$. This confirms the formula's limiting behavior.
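The worked values above can be reproduced with exact rational arithmetic, which avoids rounding questions. This is a small sketch using Python's standard `fractions` module; `capacity_exact` is a name chosen here for illustration.

```python
from fractions import Fraction


def capacity_exact(n, k, t):
    """Capacity as an exact fraction: 1 / sum_{i=0}^{K-1} (T/N)^i."""
    r = Fraction(t, n)
    return 1 / sum(r ** i for i in range(k))


baseline = capacity_exact(5, 3, 1)
for t in range(1, 6):
    c = capacity_exact(5, 3, t)
    drop = 1 - c / baseline  # relative loss compared with T = 1
    print(t, c, f"{float(c):.3f}", f"{float(drop):.1%}")
```

The exact capacities are $25/31$, $25/39$, $25/49$, $25/61$ and $1/3$ for $T = 1, \dots, 5$, matching the decimals in the solution.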

$T$ -Colluding PIR Capacity

Plot the $T$ -colluding capacity $C_{\text{PIR}}(N, K, T)$ as a function of $T$ for fixed $N$ and $K$ . The curve starts at the Sun-Jafar capacity ( $T = 1$ ) and decays monotonically to the trivial baseline $1/K$ at $T = N$ . The shape illustrates the rate cost of collusion tolerance.

Parameters: $N = 8$ databases, $K = 5$ files.
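Without the interactive plot, the curve can be tabulated offline. The sketch below uses the parameter values $N = 8$, $K = 5$ and prints a rough text bar chart; the bar scaling (40 characters for rate 1) is an arbitrary display choice.

```python
N, K = 8, 5  # parameter values listed for the plot


def capacity(n, k, t):
    """T-colluding PIR capacity from the geometric-series formula."""
    r = t / n
    return 1.0 / sum(r ** i for i in range(k))


curve = [(t, capacity(N, K, t)) for t in range(1, N + 1)]
for t, c in curve:
    bar = "#" * round(40 * c)  # crude bar: 40 characters would mean rate 1
    print(f"T={t}  C={c:.3f}  {bar}")
```

The printed rates fall strictly with $T$ and end at $1/K = 0.2$ for $T = N$.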

Shamir Sharing as the $T$ -Privacy Primitive

The achievability scheme uses Shamir secret sharing as a black-box primitive (Chapter 3). The query polynomial of degree $T$ ensures that any $T$ evaluations reveal nothing about the coefficients (Shamir's privacy property), which in turn ensures that any $T$ database queries reveal nothing about $\theta$ .

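The coding-theoretic intuition: a $T$-private sharing across $N$ shares can carry at most $N - T$ symbols of useful information, since $T$ shares' worth of symbols is consumed by randomness. The resulting rate is $(N - T)/N = 1 - T/N$. The PIR scheme inherits this trade-off: $1 - T/N$ is the large-$K$ limit of the geometric-series capacity formula.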

Operational implication: raising the collusion tolerance from $1$ to $T$ multiplies the common ratio of the geometric series by $T$ (from $1/N$ to $T/N$), which is the PIR counterpart of the Shamir overhead.
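To make the privacy primitive concrete, here is a minimal sketch of degree-$T$ Shamir sharing over a prime field. It illustrates only the secret-sharing step, not the full PIR query construction. The modulus `P`, the evaluation points $1, \dots, N$ and the function names are assumptions made for this example.

```python
import secrets

P = 2**31 - 1  # a prime modulus, chosen only for illustration


def share(secret, t, n):
    """Split `secret` into n shares with a random polynomial of degree t.

    Any t shares are statistically independent of the secret;
    any t + 1 shares determine it.
    """
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % P
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def reconstruct(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


shares = share(1234, t=2, n=5)
print(reconstruct(shares[:3]))  # any t + 1 = 3 shares suffice -> 1234
```

With $t = 2$, any two of the five shares are consistent with every possible secret, which is the property the $T$-private queries rely on.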

$T$ -Colluding PIR vs. Trivial and Classical

$T$	Privacy Strength	Capacity Formula	Numerical at $N=5, K=4$
$T = 1$ (classical)	Each DB alone learns nothing	$(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$	$\approx 0.801$
$T = 2$	Any pair of DBs learns nothing	$(1 + 2/N + \cdots + 2^{K-1}/N^{K-1})^{-1}$	$\approx 0.616$
$T = N - 1 = 4$	Any $N-1$ DBs learn nothing	$(1 + 4/5 + \cdots + (4/5)^{K-1})^{-1}$	$\approx 0.339$
$T = N = 5$ (trivial)	All $N$ DBs together learn nothing; only full download achieves this	$1/K$	$0.250$

⚠️Engineering Note

Side-Channel Risks and Choosing $T$

Choosing $T$ in a $T$ -colluding PIR deployment:

$T = 1$ (no collusion tolerance): adequate when databases are operated by mutually-independent administrators with no shared infrastructure.
$T = 2$ or $3$: protects against small-scale collusion (compromised admin pair, hypervisor leak). Recommended baseline for cross-cloud deployments.
$T = N/2$: matches an honest-majority threat model (similar to BFT). Significantly lower rate, but standard for high-security deployments.
$T = N - 1$ : defends against everything except universal collusion. The rate is barely above the trivial baseline — usually not worth it.

Side channels to consider when picking $T$ : shared log aggregators, cross-tenant hypervisors, MITM on database-to-database backplane, and accidental telemetry leaks.
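One way to turn the guidance above into a number is to fix a minimum acceptable rate and look up the largest $T$ that still meets it. The helper `max_collusion_tolerance` and the rate thresholds below are illustrative assumptions, not part of any standard tool.

```python
def capacity(n, k, t):
    """T-colluding PIR capacity from the geometric-series formula."""
    r = t / n
    return 1.0 / sum(r ** i for i in range(k))


def max_collusion_tolerance(n, k, min_rate):
    """Largest T in [1, N] whose capacity is still >= min_rate, else None.

    Hypothetical helper: capacity decreases in T, so the last T that
    passes the check is the largest admissible one.
    """
    best = None
    for t in range(1, n + 1):
        if capacity(n, k, t) >= min_rate:
            best = t
    return best


print(max_collusion_tolerance(5, 4, 0.5))  # -> 2
print(max_collusion_tolerance(5, 4, 0.9))  # -> None (even T = 1 is too slow)
```

Note that this bounds only the information-theoretic rate; the side-channel surface still has to be assessed separately.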

Practical Constraints

• $T = 1$: independent admins, no shared infra
• $T = 2$ or $3$: cross-cloud baseline
• $T = N/2$: high-security; major rate loss
• $T = N - 1$: marginal benefit over trivial

📋 Ref: Sun-Jafar 2018; cloud security best practices

Common Mistake: Collusion Tolerance Doesn't Help Against Byzantine Databases

Mistake:

Assume that $T$ -colluding PIR also handles Byzantine (malicious) databases that return incorrect answers.

Correction:

$T$ -colluding PIR strengthens the privacy threat model — it bounds what an adversary learns from compromised databases. It does not bound what an adversary can do (e.g., return arbitrary garbage). For Byzantine robustness, extend with verifiability layers (Tajeddine et al., 2019; Banawan & Ulukus, 2017) or use a separate Byzantine-tolerance scheme like those covered in Chapter 11. The two properties — privacy and robustness — must be addressed independently.

Key Takeaway

$T$-colluding PIR generalizes Sun-Jafar with $T/N$ replacing $1/N$ in the capacity formula. Raising $T$ increases the common ratio of the geometric series, so capacity falls monotonically from the classical value toward $1/K$. Production deployments should pick $T$ based on the side-channel attack surface, typically $T = 2$ or $3$ for cross-cloud baselines. Byzantine robustness is separate and requires additional layers.

Quick Check

For PIR with $N = 4, K = 5$ , the rate gap between $T = 1$ (classical) and $T = 2$ is approximately:

$\sim 0.10$ (small)

$\sim 0.20$ (moderate)

$\sim 0.40$ (large)

Zero, because Sun-Jafar already handles all collusion.

Correction:

$\sim 0.20$ (moderate)

$C(4, 5, 1) = (1 + 0.25 + 0.0625 + 0.0156 + 0.0039)^{-1} \approx 0.751$. $C(4, 5, 2) = (1 + 0.5 + 0.25 + 0.125 + 0.0625)^{-1} \approx 0.516$. Gap $\approx 0.235$. The $T = 2$ requirement costs about $31\%$ of the rate.
