Ferkans — Interactive Telecom Tutor

ex-ch13-01

Easy

State the classical PIR threat model and give the privacy guarantee in mutual-information terms.

Solution

Threat model

$N$ honest-but-curious databases, each storing all $K$ files: every database follows the protocol but analyzes the query it receives. The databases are non-colluding (they share no information with one another). The user wants file $W_\theta$, with $\theta$ uniform on $[K]$.

Privacy guarantee

For each database $n$: $I(\theta; Q^{(\theta, n)}) = 0$. The query distribution at any single database is independent of the desired index, which is perfect privacy in the information-theoretic sense.
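To make the zero-mutual-information condition concrete, here is a minimal sketch that evaluates $I(\theta; Q)$ by brute-force enumeration. The query construction is an assumption chosen for illustration (a textbook two-database random-subset scheme, not the Sun-Jafar scheme): database 0 receives a uniformly random subset $S$ of $[K]$, and database 1 receives $S$ with the membership of $\theta$ flipped. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import log2

def mutual_information(joint):
    """I(X; Y) in bits, given a dict {(x, y): probability}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(float(p) * log2(float(p / (px[x] * py[y])))
               for (x, y), p in joint.items() if p > 0)

def query_joint(K, database):
    """Joint law of (theta, query seen by one database) for the toy scheme.

    Subsets of [K] are encoded as bitmasks. Database 0 sees S;
    database 1 sees S with bit theta flipped.
    """
    joint = Counter()
    p = Fraction(1, K * 2**K)  # theta uniform on [K], S uniform on subsets
    for theta in range(K):
        for S in range(2**K):
            q = S if database == 0 else S ^ (1 << theta)
            joint[(theta, q)] += p
    return joint

def naive_joint(K):
    """Non-private baseline: the query is theta itself."""
    return {(theta, theta): Fraction(1, K) for theta in range(K)}

K = 3
leak_db0 = mutual_information(query_joint(K, 0))
leak_db1 = mutual_information(query_joint(K, 1))
leak_naive = mutual_information(naive_joint(K))
print(leak_db0, leak_db1, leak_naive)
```

Each database alone sees a uniformly random subset, so its leakage is zero, while the naive query leaks all $\log_2 K$ bits of the index.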

ex-ch13-02

Easy

Compute the Sun-Jafar PIR capacity for the following $(N, K)$ pairs: $(2, 3)$ , $(5, 5)$ , $(10, 10)$ .

Hint

$C_{\text{PIR}} = (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ .

Solution

$(2, 3)$

$C = (1 + 1/2 + 1/4)^{-1} = (7/4)^{-1} = 4/7 \approx 0.571$ .

$(5, 5)$

$C = (1 + 0.2 + 0.04 + 0.008 + 0.0016)^{-1} = (1.2496)^{-1} \approx 0.800$ .

$(10, 10)$

$C = (1 + 0.1 + 0.01 + \cdots + 10^{-9})^{-1} \approx (1.111)^{-1} \approx 0.900$ .

Pattern

Capacity grows toward $1$ as $N$ grows, and falls toward $1 - 1/N$ as $K$ grows.
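The three values above can be reproduced exactly with rational arithmetic. This is a minimal sketch; `pir_capacity` is a hypothetical helper name that evaluates the series from the hint.

```python
from fractions import Fraction

def pir_capacity(N, K):
    """Sun-Jafar capacity (1 + 1/N + ... + 1/N^(K-1))^(-1) as an exact fraction."""
    return 1 / sum(Fraction(1, N**i) for i in range(K))

for N, K in [(2, 3), (5, 5), (10, 10)]:
    C = pir_capacity(N, K)
    print(f"(N, K) = ({N}, {K}): C = {C} ~ {float(C):.4f}")
```

Exact fractions avoid rounding noise when comparing values such as $625/781$ against the rounded figure $0.800$.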

ex-ch13-03

Easy

Why does the trivial "download everything" PIR achieve rate $1/K$ ? What's the privacy guarantee?

Solution

Rate

File size $L$ , total download $K \cdot L$ (all files). Rate $R = L / (KL) = 1/K$ .

Privacy

Database knows the user wants some file, but doesn't know which — query is identical regardless of $\theta$ . Privacy holds (information-theoretically).

Why suboptimal

Sun-Jafar achieves $C_{\text{PIR}} > 1/K$ for any $N \geq 2$ and $K \geq 2$. The trivial scheme wastes bandwidth by downloading all $K - 1$ unwanted files in full.

ex-ch13-04

Easy

Why must classical PIR have $N \geq 2$ databases? What goes wrong with $N = 1$ ?

Solution

$N = 1$ trivial PIR

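With one database, every symbol must come from that database. If the query depends on $\theta$ in any way the database can detect, privacy is lost; if it does not, the database cannot single out $W_\theta$ and must return enough to recover every file. Information-theoretic privacy with $N = 1$ is therefore possible only by downloading the whole library, at rate $1/K$.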

Workaround

$N = 1$ PIR at better rates is feasible computationally (Kushilevitz-Ostrovsky 1997) using homomorphic encryption, but with weaker (computational) guarantees and higher per-query cost. Information-theoretic PIR at any rate above $1/K$ requires $N \geq 2$ non-colluding databases.

ex-ch13-05

Medium

Sketch the Sun-Jafar PIR scheme for $N = 2$ , $K = 2$ , file length $L = 4$ chunks each. Specify the queries, answers, and rate.

Solution

Setup

$W_1 = (a_1, a_2, a_3, a_4)$ , $W_2 = (b_1, b_2, b_3, b_4)$ . User wants $W_1$ ( $\theta = 1$ ).

User generates random labels

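Each file's four chunks are relabeled by a private, uniformly random permutation, chosen independently per file. Below, $a_1, \dots, a_4$ and $b_1, \dots, b_4$ denote the chunks after this relabeling, so neither database can tell which positions of the original files it is being asked for.

Query DB1

Ask for: $a_1, b_1, a_3 \oplus b_2$ (3 chunks).

Query DB2

Ask for: $a_2, b_2, a_4 \oplus b_1$ (3 chunks).

Decoding

From DB1: $a_1$, $b_1$, $a_3 \oplus b_2$. From DB2: $a_2$, $b_2$, $a_4 \oplus b_1$. XOR the two mixed chunks with $b_2$ and $b_1$, which arrived in the clear from the other database, to get $a_3$ and $a_4$. The user now has all of $W_1$. ✓ Each database sees one chunk of each file plus one XOR of two further chunks, which is the same pattern it would see for $\theta = 2$, so its query reveals nothing about $\theta$.

Rate

Total download: 6 chunks. File size: 4 chunks. Rate $R = 4/6 = 2/3$, which equals the capacity $C(2, 2) = (1 + 1/2)^{-1} = 2/3$. For contrast, a layout that also fetches $b_3, b_4$ in the clear would download 8 chunks for rate $1/2$, no better than the trivial scheme, and would break the symmetry between the two files at the database that serves them.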
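The capacity value $C(2, 2) = 2/3$ can be checked against a small simulation. The sketch below assumes a 6-chunk layout in which DB1 returns $a_1, b_1, a_3 \oplus b_2$ and DB2 returns $a_2, b_2, a_4 \oplus b_1$ after private relabeling (with the roles of $a$ and $b$ swapped when $\theta = 2$). Chunks are modeled as single bits, and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def queries(theta, perm_a, perm_b):
    """Index-level queries: (single a index, single b index, (a, b) indices of the XOR)."""
    if theta == 1:  # desired file is a; b chunks act as interference
        db1 = (perm_a[0], perm_b[0], (perm_a[2], perm_b[1]))
        db2 = (perm_a[1], perm_b[1], (perm_a[3], perm_b[0]))
    else:           # desired file is b; roles swapped
        db1 = (perm_a[0], perm_b[0], (perm_a[1], perm_b[2]))
        db2 = (perm_a[1], perm_b[1], (perm_a[0], perm_b[3]))
    return db1, db2

def answer(query, a, b):
    """A database returns two chunks in the clear and one XOR."""
    i, j, (k, l) = query
    return a[i], b[j], a[k] ^ b[l]

def decode(theta, perm_a, perm_b, ans1, ans2):
    """Recover the desired file, cancelling interference seen in the clear."""
    a1, b1, x1 = ans1
    a2, b2, x2 = ans2
    out = [None] * 4
    if theta == 1:
        p = perm_a
        out[p[0]], out[p[1]] = a1, a2
        out[p[2]], out[p[3]] = x1 ^ b2, x2 ^ b1
    else:
        p = perm_b
        out[p[0]], out[p[1]] = b1, b2
        out[p[2]], out[p[3]] = x1 ^ a2, x2 ^ a1
    return out

def query_distribution(theta, db):
    """Distribution of one database's query over all private permutations."""
    perms = list(permutations(range(4)))
    return Counter(queries(theta, pa, pb)[db] for pa in perms for pb in perms)

a, b = [1, 0, 1, 1], [0, 0, 1, 0]
perm_a, perm_b = (2, 0, 3, 1), (1, 3, 0, 2)
recovered = {}
for theta in (1, 2):
    q1, q2 = queries(theta, perm_a, perm_b)
    recovered[theta] = decode(theta, perm_a, perm_b,
                              answer(q1, a, b), answer(q2, a, b))
downloaded = 3 + 3  # three chunks from each database
print(recovered, "rate =", f"{len(a)}/{downloaded}")
```

Comparing `query_distribution(1, db)` with `query_distribution(2, db)` for each database confirms that the query law does not depend on $\theta$, which is the privacy condition.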

ex-ch13-06

Medium

Show that the Sun-Jafar capacity formula can be rewritten as $C_{\text{PIR}} = (1 - 1/N)/(1 - 1/N^K)$ . Verify for $N = 3, K = 4$ .

Hint

Use the geometric series sum formula.

Solution

Geometric series

$\sum_{i=0}^{K-1} (1/N)^i = (1 - (1/N)^K) / (1 - 1/N)$ .

Inverting

$C_{\text{PIR}} = (1 - 1/N) / (1 - 1/N^K)$ .

Verify $N = 3, K = 4$

$C = (1 - 1/3)/(1 - 1/81) = (2/3)/(80/81) = 162/240 = 27/40 = 0.675$ . Direct: $(1 + 1/3 + 1/9 + 1/27)^{-1} = (40/27)^{-1} = 27/40 = 0.675$ . ✓
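The identity between the series form and the closed form can also be checked mechanically over a grid of parameters. This is a minimal sketch with hypothetical helper names, using exact fractions so that equality is meaningful.

```python
from fractions import Fraction

def capacity_series(N, K):
    """(1 + 1/N + ... + 1/N^(K-1))^(-1)."""
    return 1 / sum(Fraction(1, N) ** i for i in range(K))

def capacity_closed(N, K):
    """(1 - 1/N) / (1 - 1/N^K)."""
    return (1 - Fraction(1, N)) / (1 - Fraction(1, N) ** K)

print(capacity_series(3, 4), capacity_closed(3, 4))

# Collect any (N, K) pair where the two forms disagree.
mismatches = [(N, K) for N in range(2, 8) for K in range(1, 8)
              if capacity_series(N, K) != capacity_closed(N, K)]
print(mismatches)
```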

ex-ch13-07

Medium

Derive the asymptotic limits of $C_{\text{PIR}}(N, K)$ as (a) $N \to \infty$ with fixed $K$ , and (b) $K \to \infty$ with fixed $N$ .

Hint

Use the closed form $(1 - 1/N)/(1 - 1/N^K)$ .

Solution

(a) $N \to \infty$

$1/N \to 0$ and $1/N^K \to 0$ . $C \to (1 - 0)/(1 - 0) = 1$ . With infinite databases, no privacy overhead.

(b) $K \to \infty$

$1/N$ stays bounded above 0; $1/N^K \to 0$ . $C \to (1 - 1/N)/1 = 1 - 1/N$ . With infinite library, the asymptote depends on $N$ .

Operational

Adding databases always helps: capacity increases in $N$ and tends to $1$. Adding files always costs: capacity decreases in $K$, but never falls below $1 - 1/N$.
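The two limits can be seen numerically by evaluating the closed form along each axis. This is a minimal sketch; the fixed values $K = 4$ and $N = 3$ are arbitrary choices for illustration.

```python
def capacity(N, K):
    """Closed form (1 - 1/N) / (1 - 1/N^K), in floating point."""
    return (1 - 1 / N) / (1 - N ** (-K))

K_fixed, N_fixed = 4, 3

# (a) N grows with K fixed: values climb toward 1.
grow_N = [capacity(N, K_fixed) for N in (2, 10, 100, 1000)]

# (b) K grows with N fixed: values fall toward 1 - 1/N.
grow_K = [capacity(N_fixed, K) for K in (1, 2, 5, 10, 30)]

print(grow_N)
print(grow_K, "limit:", 1 - 1 / N_fixed)
```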

ex-ch13-08

Medium

Show that no PIR scheme can achieve rate $R > 1$ — even ignoring privacy. What does this mean operationally?

Solution

Upper bound

$R = L / D$ . The download $D$ must contain at least $L$ bits (the file's information). Hence $D \geq L$ and $R \leq 1$ .

Sun-Jafar approaches 1

$C_{\text{PIR}}(N, K) \to 1$ as $N \to \infty$. So PIR can be almost free of overhead with many databases, but for $K \geq 2$ and finite $N$ it is never strictly free.

Operational

For $K \geq 2$, any PIR scheme has some overhead (the privacy "tax"). The capacity formula quantifies the minimum tax: it is $1 - C_{\text{PIR}}$, which is at most $1/N$ and approaches $1/N$ as $K \to \infty$.

ex-ch13-09

Medium

Explain why classical PIR cannot use a single database with information-theoretic privacy.

Solution

Single-database limitation

With one database, the query $Q^{(\theta, 1)}$ must let the database compute $W_\theta$ as a function. If the query is independent of $\theta$ (privacy requirement), the database can't tell which file to send — the answer must contain all files (rate $1/K$ ).

Computational alternative

Kushilevitz-Ostrovsky 1997: single-database PIR with computational privacy, built on homomorphic encryption. The database learns nothing about $\theta$ under a cryptographic hardness assumption (quadratic residuosity in the original construction), but it must process encrypted queries, which is much more expensive per query.

Trade-off

Information-theoretic: requires $N \geq 2$ , no computational assumptions. Computational: works with $N = 1$ , relies on cryptographic hardness.

ex-ch13-10

Medium

Compare PIR (this chapter) with secure aggregation (Chapter 10) in terms of what is hidden and who the adversary is.

Solution

What is hidden

SecAgg: individual gradient values $\mathbf{g}_k$ (server can compute the sum but not the components). PIR: identity of accessed file $\theta$ (database can serve files but not learn which one was requested).

Adversary

SecAgg: the server (and possibly $T$ colluding users). PIR: each individual database (or $T$ colluding databases in the $T$ -colluding extension).

Communication structure

SecAgg: $n$ users contribute to one server-side aggregate. Direction: users → server. PIR: one user retrieves from $N$ databases. Direction: user ↔ databases.

Algebraic kinship

Both rely on finite-field IA (Chapter 4). Both have cut-set converses (Chapter 2 §2.4 recipe). Different applications of the same machinery.

ex-ch13-11

Hard

Sketch the Sun-Jafar capacity converse: prove $R \leq (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ for any classical PIR scheme.

Hint

Use induction on $K$ .

Solution

Setup

Let $R^*(N, K)$ be the supremum of achievable rates. We prove $R^* \leq (1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ by induction on $K$ .

Base case $K = 1$

With one file, no privacy overhead is needed — user just downloads $W_1$ from any database. $R^*(N, 1) = 1$ . Matches the formula: $(1)^{-1} = 1$ .

Inductive step $K \to K + 1$

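Take any scheme for $K + 1$ files with rate $R$. The user must download at least $L$ symbols to decode the desired file, say $W_1$. By privacy, each database's answer has the same distribution whichever file is requested, so the answers must also carry information about the other $K$ files. Conditioning on $W_1$ and applying the privacy constraint to the remaining $K$ files shows that this residual download, normalized by $L$, is at least $1/N$ times the normalized download of a $K$-file scheme.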

Combine

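The recursion gives $1/R \geq 1 + \frac{1}{N \cdot R^*(N, K)}$, that is, $R \leq \left(1 + \frac{1}{N \cdot R^*(N, K)}\right)^{-1}$. Plugging in the inductive hypothesis $1/R^*(N, K) \geq \sum_{i=0}^{K-1} N^{-i}$: $R \leq (1 + 1/N \cdot \sum_{i=0}^{K-1} N^{-i})^{-1} = (\sum_{i=0}^{K} N^{-i})^{-1}$, matching the formula. $\blacksquare$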
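Written in terms of the normalized download cost $D = 1/R$, the induction is the recursion $D(N, K + 1) \geq 1 + D(N, K)/N$ with $D(N, 1) = 1$. The sketch below unrolls that recursion with equality and compares it to the geometric series. It only checks the algebra of the induction, not the entropy argument behind each step, and the function names are hypothetical.

```python
from fractions import Fraction

def download_bound(N, K):
    """Unroll D(1) = 1, D(K + 1) = 1 + D(K) / N, where D = 1 / R."""
    D = Fraction(1)
    for _ in range(K - 1):
        D = 1 + D / N
    return D

def series(N, K):
    """1 + 1/N + ... + 1/N^(K-1)."""
    return sum(Fraction(1, N**i) for i in range(K))

# Collect any (N, K) pair where the unrolled recursion and the series differ.
disagreements = [(N, K) for N in range(2, 7) for K in range(1, 9)
                 if download_bound(N, K) != series(N, K)]
print(disagreements, 1 / download_bound(2, 2))
```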

Notes

The full proof is in Sun-Jafar 2017 §IV. It involves careful tracking of conditional entropies and the "shaping" of query distributions under the privacy constraint.

ex-ch13-12

Hard

Why does the Sun-Jafar scheme require file length $L = N^K$ ? Why not smaller?

Solution

Block-length necessity

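The Sun-Jafar scheme uses $K$ "levels" of query structure, where level $i$ consists of sums of $i$ symbols taken from $i$ different files. At level $i$, each database returns $(N-1)^{i-1} \binom{K-1}{i-1}$ sums that involve a new symbol of the desired file, so across $N$ databases the user collects $N (N-1)^{i-1} \binom{K-1}{i-1}$ desired symbols. The file must be split into enough chunks for this counting to work out.

Counting

Over all levels, the file is split into $\sum_{i=1}^{K} N (N-1)^{i-1} \binom{K-1}{i-1} = N \cdot N^{K-1} = N^K$ chunks, by the binomial theorem. This is the block length at which this construction's counting closes. It is a property of the scheme rather than a proven minimum for capacity-achieving PIR.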
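The symbol counting can be checked directly. The sketch below assumes the per-level counts of the Sun-Jafar construction: at level $i$ each database returns $(N-1)^{i-1} \binom{K}{i}$ sums in total, of which $(N-1)^{i-1} \binom{K-1}{i-1}$ involve the desired file. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def desired_symbols(N, K):
    """Desired-file symbols collected across all N databases and K levels."""
    return sum(N * (N - 1) ** (i - 1) * comb(K - 1, i - 1)
               for i in range(1, K + 1))

def total_download(N, K):
    """All symbols downloaded across all N databases and K levels."""
    return sum(N * (N - 1) ** (i - 1) * comb(K, i)
               for i in range(1, K + 1))

def capacity(N, K):
    return 1 / sum(Fraction(1, N**i) for i in range(K))

for N, K in [(2, 2), (3, 2), (3, 4)]:
    L, D = desired_symbols(N, K), total_download(N, K)
    print(f"N={N}, K={K}: L={L} (N^K={N**K}), download={D}, rate={Fraction(L, D)}")
```

Under these counts the ratio of desired symbols to total download equals the capacity, which is why the construction is capacity-achieving at $L = N^K$.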

Sub-optimal smaller schemes

Schemes with $L < N^K$ achieve sub-capacity rate. The gap closes as $L$ increases (rate-distortion-like behavior).

Practical workaround

For small files, aggregate $N^K$ files into one "virtual file" and run PIR on the virtual file. This trades latency for capacity-achievement.

ex-ch13-13

Hard

Compose Sun-Jafar PIR with $T$ -colluding extension (Chapter 14 §14.2). Derive the capacity formula informally and identify when the protocol becomes infeasible.

Solution

Heuristic

With $T$ colluders, the privacy constraint is harder to satisfy: any $T$ databases together must learn nothing about $\theta$. Heuristically, a coalition of $T$ out of $N$ databases plays the role that a single database played before, so $1/N$ is replaced by $T/N$ and the capacity becomes $C_{\text{PIR}}(N, K, T) = (1 + T/N + (T/N)^2 + \cdots + (T/N)^{K-1})^{-1}$.

Verification

Sun-Jafar 2018 (cited reference) confirms this is the exact capacity. At $T = 1$ : recovers Sun-Jafar 2017.

Infeasibility

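$T = N$: every term of the sum equals $1$, so $C = 1/K$, equivalent to downloading everything (full collusion defeats PIR). $T = N - 1$: $C$ stays strictly above $1/K$ but approaches it as $N$ grows, since $T/N = 1 - 1/N \to 1$.

Practical: $C \geq 1 - T/N$ for every $K$, so $T \leq N/2$ guarantees a rate of at least $1/2$. Small collusion thresholds such as $T = 2, 3$ protect against limited collusion at acceptable rate when $N$ is comfortably larger than $T$.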
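The effect of the collusion threshold on the rate can be tabulated from the formula. This is a minimal sketch; `tpir_capacity` is a hypothetical helper name, and the choice $N = 6$, $K = 5$ is arbitrary.

```python
from fractions import Fraction

def tpir_capacity(N, K, T):
    """(1 + T/N + ... + (T/N)^(K-1))^(-1) as an exact fraction."""
    rho = Fraction(T, N)
    return 1 / sum(rho**i for i in range(K))

N, K = 6, 5
table = {T: tpir_capacity(N, K, T) for T in range(1, N + 1)}
for T, C in table.items():
    # Second column is the large-K floor 1 - T/N for comparison.
    print(f"T={T}: C={float(C):.4f}  floor 1 - T/N = {1 - T / N:.4f}")
```

The rate degrades smoothly as $T$ grows and reaches the download-everything rate $1/K$ only at $T = N$.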

ex-ch13-14

Hard

Discuss why PIR's privacy guarantee is non-adaptive: the user's queries are designed before observing answers. Could an adaptive variant improve the rate?

Solution

Non-adaptive nature

In the standard formulation, the user's queries depend only on $\theta$ and private randomness — not on database responses. Adaptive variants would let later queries depend on earlier responses.

Adaptive doesn't help (mostly)

Sun-Jafar's converse is information-theoretic, so adaptivity cannot bypass it. The capacity remains $(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$ for adaptive PIR as well.

Where adaptivity matters

Adaptive PIR can improve latency (process queries in stages) and robustness (handle partial database failures gracefully). These are engineering properties, not information-theoretic capacity.

Side information variants

Cache-aided PIR (Chapter 15) is effectively a form of adaptivity: the user has prior cached information and adapts queries accordingly. Capacity does improve with side information — the cache is a form of "adaptive" knowledge.

ex-ch13-15

Challenge

Open problem. The Sun-Jafar PIR scheme uses file length $L = N^K$ . Can capacity-achieving PIR be constructed for smaller block lengths? Sketch what tradeoffs arise.

Solution

Block length minimum

Sun-Jafar requires $L = N^K$ . Below this, the scheme is sub-capacity. A cleaner construction at smaller block length would be useful in deployment.

Approximate-capacity at smaller L

Approximate-capacity schemes at $L = O(\log K)$ block length exist with rate close to (but not matching) $C_{\text{PIR}}$ . The gap shrinks as $L$ grows.

Optimal block-length-rate tradeoff

The full characterization of the rate achievable at finite $L$ is open. Recent work (Banawan et al. 2018+) provides partial results but the optimal scheme at sub-Sun-Jafar block length remains an active research area.

Operational implication

For deployment, $L = N^K$ is usually manageable (with $N, K$ small enough). For very large libraries, block-length reduction would be a real engineering win. Worth tracking the literature.
