Extensions: A Preview of Chapters 14–15

Beyond Classical PIR

Sections 13.1–13.3 established classical PIR: $N$ replicated databases, non-colluding, honest-but-curious. The Sun-Jafar capacity closes this case. Real systems often deviate from the classical assumptions:

Coded storage: databases store MDS-coded versions of the files, not full replicas. This reduces total storage cost from $K \cdot N$ file-equivalents to $K \cdot N / r$ for an $(N, r)$-MDS code. The capacity formula changes accordingly.
Colluding databases: any $T$ databases may pool their queries. Privacy must hold against this stronger adversary. Capacity decreases as $T$ grows.
Symmetric PIR (SPIR): the user must learn only $W_\theta$ — not the other files. Two-sided privacy.
Cache-aided PIR: the user has cached some files locally. Cached content can be exploited to reduce the download. CommIT contribution in Chapter 15.

Section 13.4 previews each of these extensions. Chapters 14 and 15 develop them in detail. The point is that PIR is a rich problem family — the classical Sun-Jafar result is the starting point, not the endpoint.

Definition:
PIR with Coded Storage (Preview)

In PIR with coded storage, the $K$ files are not replicated across $N$ databases — instead, they are stored using an $(N, r)$ MDS code: each database stores one $1/r$ -share of every file, such that any $r$ databases collectively store the full library.

Storage cost: $K \cdot (N/r)$ file-units across all databases (vs. $K \cdot N$ for replication). Reduction factor: $r$ .
PIR rate: $C_{\text{PIR-MDS}}(N, K, r) =$ modified geometric formula (Chapter 14 Theorem 14.2.1).
Trade-off: lower storage cost but typically lower PIR rate. The exact rate depends on $r$ .

Coded-storage PIR is the canonical extension when storage cost is a concern. Real cloud deployments routinely use Reed-Solomon coded storage anyway (for redundancy); making the PIR scheme compatible with coded storage is a natural fit.
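The storage accounting in the definition above can be checked with a few lines of exact arithmetic. This is only a sketch: the function names are illustrative and not part of any library, and the parameter values $K = 4$, $N = 5$, $r = 3$ are chosen to match the worked example later in this section.

```python
from fractions import Fraction

def replicated_storage(K: int, N: int) -> Fraction:
    """Total storage, in file-units, when each of N databases holds all K files."""
    return Fraction(K * N)

def mds_storage(K: int, N: int, r: int) -> Fraction:
    """Total storage, in file-units, when each database holds a 1/r share of every file."""
    if not 1 <= r <= N:
        raise ValueError("an (N, r) MDS code needs 1 <= r <= N")
    return Fraction(K * N, r)

K, N, r = 4, 5, 3
rep = replicated_storage(K, N)   # 20 file-units
mds = mds_storage(K, N, r)       # 20/3 file-units
print(rep, mds, rep / mds)       # the last value is the reduction factor r
```

Setting `r = 1` reproduces the replicated cost, which is a quick sanity check on the formula.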

Coded-Storage PIR

PIR variant where the files are stored across databases via an $(N, r)$ -MDS code (each database holds one share of every file). Reduces aggregate storage by factor $r$ over replication; PIR rate is reduced accordingly.

Definition:
$T$ -Colluding PIR (Preview)

In $T$ -colluding PIR, the privacy guarantee is strengthened: any subset of $T$ databases may pool their queries, and the privacy must hold against this combined adversary. Specifically: $I\!\left(\theta;\, \{Q^{(\theta, n)}\}_{n \in \mathcal{T}}\right) \;=\; 0$ for every $\mathcal{T} \subseteq [N]$ with $|\mathcal{T}| \leq T$ .
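To make the collusion constraint concrete, the sketch below uses the classic two-database XOR scheme, which is not the capacity-achieving scheme of this chapter and is brought in purely for illustration. The user sends a uniformly random subset of file indices to one database and the same subset with $\theta$ flipped to the other. The toy size `K = 3`, the file contents, and all function names are assumptions made for this example. The code enumerates the user's randomness and compares the query distributions across values of $\theta$: each database alone sees the same distribution ($T = 1$ privacy holds), but the pooled view of both databases does not ($T = 2$ privacy fails).

```python
from collections import Counter
from itertools import product

K = 3  # number of files (toy size, chosen for illustration)

def queries(theta, s):
    """Return the bitmask queries for database 0 and database 1.

    s is the user's private randomness: a uniformly random subset of the
    K file indices, encoded as a bitmask. Database 1 receives the same
    subset with the desired index theta flipped.
    """
    return s, s ^ (1 << theta)

def answer(files, q):
    """A database XORs together the files selected by the query bitmask."""
    out = 0
    for k, w in enumerate(files):
        if (q >> k) & 1:
            out ^= w
    return out

# Correctness: the XOR of the two answers is the desired file.
files = [0b1010, 0b0111, 0b1100]  # hypothetical 4-bit file contents
for theta, s in product(range(K), range(1 << K)):
    q0, q1 = queries(theta, s)
    assert answer(files, q0) ^ answer(files, q1) == files[theta]

def query_distribution(theta, colluders):
    """Distribution of the queries seen jointly by the given databases."""
    return Counter(
        tuple(queries(theta, s)[n] for n in colluders) for s in range(1 << K)
    )

# T = 1: each database alone sees the same distribution for every theta.
single_private = all(
    query_distribution(t, (n,)) == query_distribution(0, (n,))
    for t in range(K) for n in (0, 1)
)
# T = 2: the pooled view changes with theta, so the constraint is violated.
pair_private = all(
    query_distribution(t, (0, 1)) == query_distribution(0, (0, 1))
    for t in range(K)
)
print(single_private, pair_private)  # True False
```

Equal query distributions for every $\theta$ is exactly the zero-mutual-information condition when $\theta$ is independent of the user's randomness, which is why comparing `Counter` objects suffices here.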

The PIR capacity becomes $C_{\text{PIR}}(N, K, T) \;=\; \left(1 + \frac{T}{N} + \frac{T^2}{N^2} + \cdots + \frac{T^{K-1}}{N^{K-1}}\right)^{-1}$ (Sun-Jafar 2018). Setting $T = 1$ recovers the classical case; setting $T = N$ (all databases collude) makes every term equal to $1$ and collapses the capacity to the trivial rate $1/K$, which is achieved by downloading the entire library.

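Operationally: the capacity depends on $T$ and $N$ only through the ratio $T/N$. Each additional tolerated colluder raises every non-constant term of the geometric series and therefore lowers the capacity; the loss is not a fixed factor per colluder but follows the geometric-series structure of the formula.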
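A minimal sketch of the capacity formula, using exact fractions so the boundary cases can be checked without rounding. The function name `pir_capacity` is illustrative, and the values $N = 5$, $K = 4$ are just sample parameters.

```python
from fractions import Fraction

def pir_capacity(N: int, K: int, T: int = 1) -> Fraction:
    """Capacity (1 + T/N + ... + (T/N)^(K-1))^(-1) as an exact fraction."""
    if not (1 <= T <= N and K >= 1):
        raise ValueError("need 1 <= T <= N and K >= 1")
    ratio = Fraction(T, N)
    return 1 / sum(ratio ** k for k in range(K))

N, K = 5, 4
for T in range(1, N + 1):
    c = pir_capacity(N, K, T)
    print(f"T={T}: C = {c} ≈ {float(c):.3f}")
```

With these parameters the loop runs from $125/156 \approx 0.801$ at $T = 1$ down to exactly $1/4 = 1/K$ at $T = N = 5$, and the capacity is strictly decreasing in $T$.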

$T$ -Colluding PIR

PIR variant where any $T$ databases may collude to learn the user's desired index. Privacy must hold against this stronger adversary. The capacity is Sun-Jafar's classical formula with the ratio $1/N$ replaced by $T/N$.

Definition:
Symmetric PIR (SPIR) (Preview)

In symmetric PIR (SPIR), two privacy requirements hold:

User privacy (as in classical PIR): databases learn nothing about $\theta$ .
Database privacy: the user learns only $W_\theta$ , not the other files. The user's knowledge after the protocol is exactly $W_\theta$ , and nothing about $W_j$ for $j \neq \theta$ .

SPIR is strictly stronger than classical PIR (which allows the user to learn arbitrary derived information about other files). The extra requirement reduces the capacity: $C_{\text{SPIR}}(N, K) = 1 - 1/N$ for all $K \geq 2$ (Wang, Banawan, Ulukus 2018). This value does not depend on $K$ and coincides with the $K \to \infty$ limit of the classical capacity.

Operationally: SPIR is for settings where both sides need privacy — e.g., genomic research where both the researcher's query and the database's other records must be protected.
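The rate penalty of SPIR relative to classical PIR can be quantified with the closed form of the geometric series, $(1 - 1/N)/(1 - N^{-K})$. The sketch below is illustrative only: the function names and the choice $N = 5$ are assumptions for the example.

```python
def classical_capacity(N: int, K: int) -> float:
    """Closed form of (1 + 1/N + ... + 1/N^(K-1))^(-1), valid for N >= 2."""
    if N < 2 or K < 1:
        raise ValueError("need N >= 2 and K >= 1")
    return (1 - 1 / N) / (1 - N ** (-K))

def spir_capacity(N: int) -> float:
    """SPIR capacity 1 - 1/N (independent of K)."""
    return 1 - 1 / N

N = 5
for K in (2, 3, 4, 8):
    gap = classical_capacity(N, K) - spir_capacity(N)
    print(f"K={K}: classical={classical_capacity(N, K):.6f}, gap={gap:.2e}")
```

The gap is positive for every finite $K$ and shrinks geometrically, so for libraries of even moderate size the cost of database privacy, measured in rate alone, is small.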

Symmetric PIR (SPIR)

PIR variant where both sides have privacy: the databases learn nothing about $\theta$ , and the user learns nothing about other files beyond $W_\theta$ . Capacity is $1 - 1/N$ — strictly less than classical PIR.

Definition:
Cache-Aided PIR (Preview)

In cache-aided PIR, the user has a local cache containing partial information about the library — typically prefetched during a placement phase before the user knows their desired $\theta$ . The PIR protocol exploits the cached content to reduce the download requested from the databases.

Two flavors:

Cached files known to databases: the cache content is publicly known (e.g., a CDN delivered cached files in a prefetch round). The PIR protocol can exploit the cache to reduce the active download.
Cached files unknown to databases: the user's cache content is private — the databases don't know which files are cached. This adds an additional layer of complexity.

Cache-aided PIR is the topic of Chapter 15. The CommIT group's contribution (Wan/Tuninetti/Caire 2021, also covered in Chapter 7) extends to demand privacy in cached settings, connecting PIR to the coded-caching framework of Book CC.

Cache-Aided PIR

PIR variant where the user has a local cache of partial library content. The cache reduces the required download from databases. CommIT-relevant variants (Wan/Tuninetti/Caire 2021) handle demand-privacy in cached settings.

PIR Variants and Their Capacities

PIR Variant	Storage	Privacy	Capacity / formula	Chapter
Classical (Sun-Jafar)	Replicated, $K \cdot N$	Non-colluding	$(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$	13
Coded-storage PIR	MDS-coded, $K \cdot N / r$	Non-colluding	Modified formula	14 §1
$T$ -Colluding PIR	Replicated	$T$ -colluding	$(1 + T/N + \cdots + T^{K-1}/N^{K-1})^{-1}$	14 §2
Symmetric PIR	Replicated	Two-sided	$1 - 1/N$	14 §3
Cache-aided PIR	Replicated + user cache	Variable	$O$ -tilde improvements	15
Cache-aided + demand-private (CommIT)	Replicated + cache	User cache hidden from DBs	Open characterization	15

PIR Capacity Across Variants

Plot the PIR capacity for the four major variants (Classical, Coded-storage, $T$-Colluding, Symmetric) as a function of $N$ for fixed $K$. Each variant trades off storage, privacy, and rate differently. The classical Sun-Jafar capacity is the highest (most permissive assumptions). SPIR sits just below it at $1 - 1/N$, while $T$-colluding and coded-storage PIR can fall well below both, depending on $T$ and $r$. The plot illustrates the privacy-rate Pareto frontier.

Parameters: $K = 5$ files; $N_{\max} = 12$ databases; $T = 2$ colluders (for $T$-colluding).
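The data behind such a plot can be tabulated directly from the formulas stated in this section. The sketch below uses the parameter values listed above ($K = 5$, $N$ up to $12$, $T = 2$) and leaves out the coded-storage curve, since its formula is deferred to Chapter 14. Variable and function names are illustrative.

```python
def geometric_capacity(ratio: float, K: int) -> float:
    """(1 + ratio + ... + ratio^(K-1))^(-1)."""
    return 1.0 / sum(ratio ** k for k in range(K))

K, N_MAX, T = 5, 12, 2
rows = []
for N in range(2, N_MAX + 1):       # N = 2 is the smallest N with T <= N
    classical = geometric_capacity(1 / N, K)
    colluding = geometric_capacity(T / N, K)
    spir = 1 - 1 / N
    rows.append((N, classical, colluding, spir))

print(" N  classical  T-colluding    SPIR")
for N, classical, colluding, spir in rows:
    print(f"{N:2d}  {classical:9.4f}  {colluding:11.4f}  {spir:.4f}")
```

The `rows` list can be passed to any plotting library; the table alone already shows all three curves rising toward $1$ as $N$ grows, with the $T$-colluding curve starting at $1/K$ when $N = T$.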

Example: PIR Capacities at $N = 5, K = 4$

Compute the capacity for each PIR variant at $N = 5$ databases, $K = 4$ files. Compare with the classical Sun-Jafar baseline.

Solution

Classical Sun-Jafar

$C_{\text{PIR}}(5, 4) = (1 + 1/5 + 1/25 + 1/125)^{-1} = (1.248)^{-1} \approx 0.801$ .

$T = 2$-colluding

$C_{\text{PIR}}(5, 4, 2) = (1 + 2/5 + 4/25 + 8/125)^{-1} = (1.624)^{-1} \approx 0.616$ . About $23\%$ lower than non-colluding.

Symmetric PIR

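$C_{\text{SPIR}}(5, 4) = 1 - 1/5 = 0.800$ . This is almost identical to the classical value because the classical capacity equals $(1 - 1/N)/(1 - N^{-K})$ , and $N^{-K} = 1/625$ is already negligible. As $K$ grows, the classical capacity decreases toward SPIR's $1 - 1/N$ .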

Coded-storage at $r = 3$

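$C_{\text{PIR-MDS}}(5, 4, 3) \approx 0.46$ , using the known MDS-coded PIR capacity $(1 + r/N + \cdots + (r/N)^{K-1})^{-1}$ (Banawan-Ulukus 2018) with $r/N = 3/5$ , i.e. $(2.176)^{-1}$ ; the formal statement is in Chapter 14. Lower rate, but aggregate storage drops by the factor $r = 3$ (from $20$ to $20/3$ file-units).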

Operational

Classical: best rate, weakest privacy. $T$ -colluding: trade rate for stronger privacy. SPIR: trade rate for two-sided privacy. Coded-storage: trade rate for less storage. Each variant occupies a different point on the privacy-storage-rate Pareto frontier.
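The comparison with the classical baseline can be reproduced numerically. In the sketch below, the classical, $T$-colluding, and SPIR entries use the formulas stated in this section. The coded-storage entry is an assumption: it uses the MDS-coded PIR capacity $(1 + r/N + \cdots + (r/N)^{K-1})^{-1}$ from the literature (Banawan-Ulukus 2018), which may be stated differently in Chapter 14.

```python
def geometric_capacity(ratio: float, K: int) -> float:
    """(1 + ratio + ... + ratio^(K-1))^(-1)."""
    return 1.0 / sum(ratio ** k for k in range(K))

N, K, T, r = 5, 4, 2, 3
baseline = geometric_capacity(1 / N, K)
variants = {
    "classical": baseline,
    "T-colluding (T=2)": geometric_capacity(T / N, K),
    "SPIR": 1 - 1 / N,
    # ASSUMPTION: MDS-coded capacity with code rate r/N, taken from the
    # literature rather than from this section.
    "coded-storage (r=3, assumed)": geometric_capacity(r / N, K),
}
for name, c in variants.items():
    loss = 100 * (1 - c / baseline)
    print(f"{name:30s} C = {c:.3f}   loss vs classical = {loss:5.1f}%")
```

Expressing each capacity as a percentage loss against the classical baseline makes the trade-offs directly comparable across variants.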

Cross-Cutting Themes Across PIR Extensions

Three themes recur across the PIR extensions:

Storage vs. rate trade-off: any reduction in aggregate storage (coded-storage PIR) costs a corresponding reduction in PIR rate.
Privacy vs. rate trade-off: stronger privacy ( $T$ -colluding, SPIR, demand-private cache) costs rate. The exact cost is quantified by the variant-specific capacity formula.
Side information improves rate: when the user has cached content (Chapter 15), the PIR rate can exceed the classical Sun-Jafar rate — extra information at the user is always beneficial.

Each chapter of Part IV explores one of these themes in depth. The information-theoretic framework remains constant: cut-set converse + finite-field IA achievability. The variants differ in which constraints are tight.

Common Mistake: PIR Variants Don't Compose Trivially

Mistake:

Compose two PIR extensions (e.g., coded-storage + $T$ -colluding) by simply applying both constraints, expecting the capacity to be the product of the two individual capacities.

Correction:

PIR variants combine non-trivially. For example, coded-storage + $T$ -colluding PIR has a capacity formula that is not simply the product of the two individual capacities; it requires its own analysis (Tajeddine, El Rouayheb 2018; Sun, Jafar 2018b). In general, the joint capacity is no higher than either individual capacity, because each additional constraint can only shrink the achievable rate region. Always verify that the joint capacity is characterized in the literature before deploying compound PIR variants.

⚠️Engineering Note

Choosing a PIR Variant for Production

Production guidance for PIR variant selection:

Multi-cloud PIR (typical): classical Sun-Jafar with $N \geq 3$ non-colluding providers. Highest rate; standard threat model.
Untrusted-collaboration PIR (cross-org): $T$ -colluding with a small $T \geq 2$ (e.g. $T = 2$ or $3$ ) to tolerate a small number of databases pooling their queries; $T = 1$ is just the classical non-colluding case.
Two-sided privacy (medical, financial): Symmetric PIR. Lower rate ( $1 - 1/N$ ) but both database and user are protected.
Storage-constrained (large libraries): Coded-storage PIR with $r$ chosen to balance storage cost vs. PIR rate.
Cache-augmented (wireless edge): cache-aided PIR (Chapter 15). Extra information at the user reduces required download.

Production deployments often combine variants: coded-storage + $T$ -colluding for cross-cloud queries with mild collusion tolerance.
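For capacity planning, a rate translates into download volume. The sketch below assumes the usual PIR rate definition (desired file size divided by total download) and a capacity-achieving scheme, so the figures are lower bounds on download rather than measurements. The deployment parameters ($N = 3$ providers, $K = 100$ files, 1 MB records) and all names are hypothetical.

```python
from fractions import Fraction
import math

def geometric_capacity(ratio: Fraction, K: int) -> Fraction:
    """(1 + ratio + ... + ratio^(K-1))^(-1) as an exact fraction."""
    return 1 / sum(ratio ** k for k in range(K))

def download_bytes(file_bytes: int, capacity: Fraction) -> int:
    """Lower bound on total bytes downloaded to retrieve one file.

    ASSUMPTION: rate = file size / total download, scheme operates at capacity.
    """
    return math.ceil(file_bytes / capacity)

FILE_BYTES = 1_000_000   # hypothetical 1 MB record
N, K = 3, 100            # hypothetical deployment: 3 providers, 100 files
plans = {
    "classical": geometric_capacity(Fraction(1, N), K),
    "2-colluding": geometric_capacity(Fraction(2, N), K),
    "SPIR": 1 - Fraction(1, N),
}
for name, c in plans.items():
    print(f"{name:12s} rate ≈ {float(c):.4f}   "
          f"download ≥ {download_bytes(FILE_BYTES, c):,} bytes")
```

Exact fractions are used so the ceiling is not distorted by floating-point rounding. With these hypothetical numbers, tolerating two colluders out of three roughly doubles the download relative to the classical case.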

Practical Constraints

• Multi-cloud: $N \geq 3$ , classical Sun-Jafar
• Cross-org: $T$ -colluding for collusion tolerance
• Medical/financial: SPIR for two-sided privacy
• Storage-constrained: coded-storage trade-off

📋 Ref: Beimel survey 2007; Microsoft SealPIR variants

Key Takeaway

Classical Sun-Jafar PIR is the starting point; Chapters 14–15 explore extensions. Each extension trades the classical assumptions for a different operational constraint — coded storage, colluding databases, two-sided privacy, side information — and quantifies the capacity reduction. The CommIT-relevant contribution in Chapter 15 connects cache-aided PIR to the coded-caching framework of Book CC.

Why This Matters: Looking Ahead: Wireless Variants and AirComp

Part IV (PIR) emphasizes coded / cached structure for privacy-preserving retrieval. Part V (Chapters 16–18) shifts to the wireless setting: AirComp (Chapter 16), wireless FL (Chapter 17), and open problems (Chapter 18). The AirComp framework can be applied to PIR over wireless channels — with specific implications for the retrieval rate. Chapter 17 carries the fifth and final CommIT-group contribution: information-theoretically secure federated representation learning over wireless channels.

Quick Check

For a setting with strong privacy requirements where both the user and the databases need protection, the right PIR variant is:

Classical Sun-Jafar (highest rate, single-sided privacy).

Symmetric PIR (SPIR) — both the user and the databases have privacy guarantees.

$T$ -colluding PIR (handles colluding databases).

Coded-storage PIR — saves storage.