Ferkans — Interactive Telecom Tutor

ex14-1

Easy

Compute $C_{\text{PIR-MDS}}(N, K, r)$ for $N = 6$ , $K = 3$ , $r = 2$ .

Show Hint

Apply $C = (1 + r/N + (r/N)^2)^{-1}$ .

$r/N = 2/6 = 1/3$ .

Solution

Apply formula

$C = (1 + 1/3 + 1/9)^{-1} = (13/9)^{-1} = 9/13 \approx 0.692$ .

Sanity

Compare with $1 - r/N = 1 - 1/3 = 0.667$ (asymptotic limit). At $K = 3$ , capacity is slightly above the asymptote.
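The arithmetic above can be checked with exact fractions. This is a minimal sketch, assuming the capacity is the reciprocal of the $K$-term geometric series with ratio $r/N$, as in the hint; the function name `pir_mds_capacity` is illustrative and not part of the tutorial.

```python
from fractions import Fraction

def pir_mds_capacity(n: int, k: int, r: int) -> Fraction:
    """Reciprocal of 1 + r/N + ... + (r/N)^(K-1), kept as an exact fraction."""
    ratio = Fraction(r, n)
    return 1 / sum(ratio**i for i in range(k))

c = pir_mds_capacity(6, 3, 2)
print(c, float(c))                 # exact value and its decimal form
print(float(1 - Fraction(2, 6)))   # asymptotic limit 1 - r/N for comparison
```

Keeping the result as a `Fraction` avoids rounding until the final display step.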

ex14-2

Easy

Compute $C_{\text{PIR}}(N, K, T)$ for $N = 4$ , $K = 3$ , $T = 2$ .

Show Hint

$T/N = 1/2$ .

Sum the geometric series.

Solution

Apply formula

$C = (1 + 1/2 + 1/4)^{-1} = (7/4)^{-1} = 4/7 \approx 0.571$ .

Compare with classical

$C_{\text{PIR}}(4, 3, 1) = (1 + 1/4 + 1/16)^{-1} = 16/21 \approx 0.762$ . Going from $T = 1$ to $T = 2$ costs $\approx 25\%$ rate.
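The relative loss can also be computed exactly rather than from rounded decimals. A short sketch under the same formula as the hint (ratio $T/N$, $K$ terms); `colluding_capacity` is a hypothetical helper name.

```python
from fractions import Fraction

def colluding_capacity(n: int, k: int, t: int) -> Fraction:
    """(1 + T/N + ... + (T/N)^(K-1))^-1 as an exact fraction."""
    ratio = Fraction(t, n)
    return 1 / sum(ratio**i for i in range(k))

c1 = colluding_capacity(4, 3, 1)   # classical, T = 1
c2 = colluding_capacity(4, 3, 2)   # T = 2
relative_loss = (c1 - c2) / c1
print(c1, c2, relative_loss)       # 16/21 4/7 1/4
```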

ex14-3

Easy

Compute $C_{\text{SPIR}}(N, K)$ for $N = 5, K = 100$ and explain why $K$ doesn't appear in the formula.

Show Hint

Use $C_{\text{SPIR}} = 1 - 1/N$ .

Recall the database-privacy requirement.

Solution

Apply formula

$C_{\text{SPIR}}(5, 100) = 1 - 1/5 = 0.8$ .

$K$-independence

SPIR's database-privacy requirement forces the user to learn only $W_\theta$ — regardless of how many other files exist. The other files contribute zero to the achievable rate. Consequently, $K$ doesn't enter the formula — only $N$ matters.

ex14-4

Medium

For $N = 12, K = 4$ , compute the (storage, rate) pairs, with storage measured in file-units, for $r = 1, 2, 3, 4, 6$ . Plot or tabulate the Pareto frontier.

Show Hint

Aggregate storage = $K \cdot N / r$ file-units.

Apply the coded-storage capacity formula.

Solution

$r = 1$ (replication)

Storage: $4 \cdot 12 = 48$ units. Rate: $C(12, 4, 1) = (1 + 1/12 + 1/144 + 1/1728)^{-1} \approx 0.917$ .

$r = 2$

Storage: $24$ units. Rate: $(1 + 2/12 + 4/144 + 8/1728)^{-1} \approx 0.834$ .

$r = 3$

Storage: $16$ units. Rate: $(1 + 3/12 + 9/144 + 27/1728)^{-1} \approx 0.753$ .

$r = 4$

Storage: $12$ units. Rate: $(1 + 4/12 + 16/144 + 64/1728)^{-1} = 0.675$ .

$r = 6$

Storage: $8$ units. Rate: $(1 + 6/12 + 36/144 + 216/1728)^{-1} \approx 0.533$ .

Pareto observation

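All five points lie on the Pareto frontier: reducing storage always reduces the rate. Rate as a function of storage is concave, so returns diminish: going from $8$ to $16$ file-units raises the rate by about $0.22$ , while going from $16$ to $48$ adds only about $0.16$ . Rate per file-unit of storage increases monotonically with $r$ (from about $0.019$ at $r = 1$ to about $0.067$ at $r = 6$ ), so it has no interior peak. $r = 3$ is a reasonable middle operating point for cost-conscious deployments: one third of the replication storage for a bit over $80\%$ of the replication rate.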
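The table for this exercise can be generated directly. A minimal sketch, assuming aggregate storage $K \cdot N / r$ and the coded-storage formula from the hints; the helper names are illustrative.

```python
def coded_capacity(n: int, k: int, r: int) -> float:
    """(1 + r/N + ... + (r/N)^(K-1))^-1."""
    ratio = r / n
    return 1.0 / sum(ratio**i for i in range(k))

def storage_rate_table(n, k, r_values):
    """Return (r, aggregate storage in file-units, rate) for each r."""
    return [(r, k * n / r, coded_capacity(n, k, r)) for r in r_values]

table = storage_rate_table(12, 4, [1, 2, 3, 4, 6])
for r, storage, rate in table:
    # rate / storage is the rate obtained per file-unit of storage
    print(f"r={r}  storage={storage:5.1f}  rate={rate:.3f}  rate/storage={rate / storage:.4f}")
```

Printing the rate per file-unit alongside each pair makes it easy to see how the trade-off moves as $r$ grows.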

ex14-5

Medium

A PIR deployment uses $N = 6$ databases. The operator estimates $T_{\text{actual}} = 2$ databases could be compromised in the worst case (e.g., shared hypervisor). Compute the capacity loss from using $T = 2$ vs. $T = 1$ for $K = 4$ files.

Show Hint

Compute capacity for both $T$ values.

Express the difference as percentage of classical capacity.

Solution

$T = 1$ (classical)

$C(6, 4, 1) = (1 + 1/6 + 1/36 + 1/216)^{-1} = 216/259 \approx 0.834$ .

$T = 2$

$C(6, 4, 2) = (1 + 2/6 + 4/36 + 8/216)^{-1} = 216/320 = 27/40 = 0.675$ .

Capacity loss

Absolute loss: $0.834 - 0.675 = 0.159$ . Relative loss: $0.159 / 0.834 \approx 19.1\%$ .

Operational

Going from $T = 1$ to $T = 2$ costs $\sim 19\%$ rate. The operator should decide whether the security improvement is worth the bandwidth.
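To help the operator weigh other collusion levels as well, the same computation can be swept over $T$. A sketch assuming the $T$-colluding formula used in this exercise; `colluding_capacity` and `relative_loss` are hypothetical helper names.

```python
from fractions import Fraction

def colluding_capacity(n: int, k: int, t: int) -> Fraction:
    """(1 + T/N + ... + (T/N)^(K-1))^-1 as an exact fraction."""
    ratio = Fraction(t, n)
    return 1 / sum(ratio**i for i in range(k))

def relative_loss(n: int, k: int, t: int) -> Fraction:
    """Fraction of the T = 1 capacity given up by tolerating t colluders."""
    base = colluding_capacity(n, k, 1)
    return (base - colluding_capacity(n, k, t)) / base

for t in range(1, 6):
    c = colluding_capacity(6, 4, t)
    print(f"T={t}  C={float(c):.3f}  loss vs T=1: {float(relative_loss(6, 4, t)):.1%}")
```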

ex14-6

Medium

Argue that without shared randomness across databases, SPIR cannot achieve any positive rate. Use the database-privacy condition.

Show Hint

Without randomness, answers are deterministic functions of queries and files.

What constraints does $I(W_{k\neq\theta}; \text{user view}) = 0$ impose?

Solution

Setup

Without shared randomness, the answers $A^{(\theta, n)} = f_n(Q^{(\theta, n)}, W_1, \ldots, W_K)$ are deterministic functions of the query and the files.

Database privacy implication

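Database privacy requires $I(W_{j \neq \theta}; \{Q^{(\theta, n)}, A^{(\theta, n)}\}) = 0$ . Since the user decodes $W_\theta$ from its view and the answers are deterministic, this can hold only if every answer is a function of the query and $W_\theta$ alone. User privacy rules that out: the same query must be possible under every $\theta$ , so the same answer would have to depend on $W_\theta$ alone for every choice of $\theta$ at once, which (for $K \geq 2$ ) forces it to be independent of all files and hence useless for retrieval. Any answer that does help decode $W_\theta$ therefore also depends on some $W_j$ with $j \neq \theta$ , and with nothing to mask that dependence the user's view leaks information about $W_j$ , violating database privacy.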

Conclusion

Database privacy with deterministic answers and a single query realization is impossible. Therefore positive-rate SPIR requires shared randomness as a masking mechanism.

ex14-7

Medium

For PIR with coded storage and $T$ -colluding, the Freij-Hollanti scheme requires $r + T - 1 < N$ . Tabulate the feasible $(r, T)$ pairs for $N = 8$ .

Show Hint

List all $r \in [1, 8], T \in [1, 8]$ with $r + T \leq N$ .

Solution

Feasibility condition

$r + T - 1 < 8$ , i.e., $r + T \leq 8$ .

Feasible pairs

$(r, T)$ : $(1, 1), (1, 2), \ldots, (1, 7), (2, 1), (2, 2), \ldots, (2, 6), \ldots, (7, 1)$ . Total: $7 + 6 + \cdots + 1 = \binom{8}{2} = 28$ feasible pairs (the triangle of the $7 \times 7$ grid on or below the anti-diagonal $r + T = 8$ ).

Operational

At $N = 8$ , one can choose $(r, T) = (3, 3)$ for $3\times$ storage savings and $3$ -collusion tolerance, or $(r, T) = (2, 4)$ for $2\times$ storage savings and $4$ -collusion tolerance. Many combinations are available.
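The feasible set can be enumerated mechanically. A minimal sketch that applies the condition $r + T - 1 < N$ literally; `feasible_pairs` is an illustrative name.

```python
def feasible_pairs(n: int):
    """All (r, T) with r, T >= 1 and r + T - 1 < n."""
    return [(r, t)
            for r in range(1, n + 1)
            for t in range(1, n + 1)
            if r + t - 1 < n]

pairs = feasible_pairs(8)
print(len(pairs))   # 28
for r in range(1, 8):
    # list the admissible collusion levels T for each storage parameter r
    print(r, [t for (rr, t) in pairs if rr == r])
```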

ex14-8

Medium

As $K \to \infty$ (large library), compare the asymptotic rates of: (a) classical PIR with $N = 5$ , (b) $T = 2$ -colluding PIR with $N = 5$ , (c) coded-storage with $r = 2, N = 5$ , (d) SPIR with $N = 5$ .

Show Hint

Each formula has a known $K \to \infty$ limit.

Solution

(a) Classical

$C \to 1 - 1/N = 1 - 1/5 = 0.8$ .

(b) $T = 2$

$C \to 1 - T/N = 1 - 2/5 = 0.6$ .

(c) Coded $r = 2$

$C \to 1 - r/N = 1 - 2/5 = 0.6$ (same as $T = 2$ , by symmetry of the formula — coincidence, not equivalence).

(d) SPIR

$C = 1 - 1/N = 0.8$ for all $K$ . No asymptotic statement needed.

Observation

Classical PIR and SPIR have the same asymptotic rate ( $1 - 1/N$ ). For large $K$ , the SPIR cost is essentially zero. For small $K$ , classical is strictly better (and SPIR is necessary if both sides need privacy).
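The limits in (a) to (c) all follow from one closed form. Writing $\rho$ for the relevant ratio ( $1/N$ , $T/N$ , or $r/N$ ; the symbol $\rho$ is introduced here only for convenience), the geometric series sums to

$$\left(\sum_{i=0}^{K-1} \rho^i\right)^{-1} = \frac{1 - \rho}{1 - \rho^K} \;\xrightarrow{K \to \infty}\; 1 - \rho, \qquad 0 \le \rho < 1.$$

For $N = 5$ this gives $0.8$ when $\rho = 1/5$ and $0.6$ when $\rho = 2/5$ , and the correction term $\rho^K$ shows that the approach to the limit is exponentially fast in $K$ .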

ex14-9

Medium

Sketch how to use a Shamir-style $(T, N)$ polynomial to construct queries for $T$ -colluding PIR. Why does the polynomial degree need to be exactly $T$ (not less, not more)?

Show Hint

Shamir privacy: any $T$ polynomial evaluations look uniformly random to an evaluator.

Decoding: requires polynomial interpolation.

Solution

Construction

Each query symbol is shared through a polynomial $p(x)$ of degree $T$ over $\mathbb{F}_q$ whose constant term carries the secret (the query symbol tied to $\theta$ ) and whose remaining $T$ coefficients are drawn uniformly at random. The polynomial is evaluated at $N$ distinct nonzero points, and one evaluation is sent to each of the $N$ databases.

Privacy from degree $T$

Shamir's privacy guarantee: any $T$ evaluations of a degree- $T$ polynomial are statistically independent of the polynomial's secret-encoded coefficient (the desired index $\theta$ ).

Why not lower degree?

A polynomial of degree $< T$ would not provide $T$ -privacy: $T$ evaluations determine it uniquely, revealing $\theta$ .

Why not higher degree?

A higher degree wastes rate: it buys privacy against more than $T$ colluders, which the threat model does not require, while the user needs at least $\deg + 1$ evaluations to interpolate and decode. Excess degree therefore means lower rate, and degree exactly $T$ is the sweet spot.
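The sharing primitive behind this construction can be sketched in a few lines. This shows only Shamir-style sharing and interpolation of one field element, not a full PIR query; the prime `P`, the evaluation points $1, \ldots, N$, and the function names are assumptions made for illustration.

```python
import secrets

P = 2**31 - 1  # a prime standing in for the field size q (illustrative choice)

def share(secret: int, t: int, n: int):
    """Evaluate a random polynomial of degree at most t, with constant term
    `secret`, at the nonzero points x = 1..n."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t)]
    def p(x):
        acc = 0
        for c in reversed(coeffs):       # Horner's rule
            acc = (acc * x + c) % P
        return acc
    return [(x, p(x)) for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation at x = 0; needs at least t + 1 points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

shares = share(secret=42, t=2, n=5)
print(reconstruct(shares[:3]))   # 42: any t + 1 = 3 shares suffice
```

With only $t$ shares the interpolation is underdetermined, which is the source of the privacy guarantee; raising the degree would raise the number of evaluations needed by `reconstruct`.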

ex14-10

Hard

Prove that the shared randomness $S$ in any SPIR scheme must satisfy $H(S) \geq L / (N - 1)$ bits per file, where $L$ is the file length.

Show Hint

Database privacy requires marginal independence of the user's view from $W_{k \neq \theta}$ .

Use the chain rule on the database-privacy mutual information.

The randomness must mask each database's contribution to the user's view of other files.

Solution

Setup

Let the answers $A^{(\theta, 1)}, \ldots, A^{(\theta, N)}$ each be a function of $(Q^{(\theta, n)}, W_1, \ldots, W_K, S)$ .

Database-privacy chain

$H(\{W_{k \neq \theta}\} | A^{(\theta, 1)}, \ldots, A^{(\theta, N)}, Q^{(\theta, *)}) = H(\{W_{k \neq \theta}\})$ by privacy. Equivalently, $I(\{W_{k \neq \theta}\}; A^{(\theta, 1)}, \ldots, A^{(\theta, N)}, Q^{(\theta, *)}) = 0$ : the user, who never sees $S$ , must find the answers marginally independent of the undesired files.

Counting bits

Each answer carries information about the structure (depending on $\theta$ ) but must be statistically masked from $W_{k \neq \theta}$ . The masking requires entropy from $S$ proportional to the file content $L$ — at minimum $L/(N-1)$ bits.

Tight construction

Sun-Jafar's SPIR construction achieves exactly $H(S) = L/(N-1)$ , proving the bound is tight.

ex14-11

Hard

For $N = 10, K = 5, r = 3, T = 3$ , compute: (a) classical PIR capacity, (b) coded-storage capacity ( $r = 3$ ), (c) $T$ -colluding capacity ( $T = 3$ ), (d) the Freij-Hollanti achievable joint rate, (e) the product of (b) and (c) divided by (a) — the "naive composition" estimate. Compare (d) and (e).

Show Hint

Use the explicit formulas; check $r + T - 1 = 5 < 10$ feasibility.

Naive composition is not the right formula but a useful baseline.

Solution

(a) Classical

$C(10, 5, 1) = (1 + 0.1 + 0.01 + 0.001 + 0.0001)^{-1} = 1/1.1111 \approx 0.900$ .

(b) Coded $r = 3$

$C_{\text{PIR-MDS}}(10, 5, 3) = (1 + 0.3 + 0.09 + 0.027 + 0.0081)^{-1} = 1/1.4251 \approx 0.702$ .

(c) $T = 3$

$C_{\text{PIR}}(10, 5, 3) = (1 + 0.3 + 0.09 + 0.027 + 0.0081)^{-1} \approx 0.702$ with $T = 3$ (same as (b) by formula coincidence).

(d) Freij-Hollanti joint

$r + T - 1 = 5$ . $R = (10 - 5)/10 \cdot (1 - (5/10)^5)^{-1} = 0.5 \cdot (1 - 0.03125)^{-1} = 0.5 \cdot 1.0323 \approx 0.516$ .

(e) Naive composition

$(0.702)(0.702) / 0.900 \approx 0.548$ . Predicted by naive product rule.

Comparison

Joint rate: $0.516$ . Naive composition: $0.548$ . Naive composition overestimates the joint rate by $\sim 6\%$ : meeting both constraints at once costs more rate than the product rule suggests.
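All five quantities share one geometric-series form, so they can be computed side by side. A sketch assuming the formulas used in this solution; note that $(1 - \rho)/(1 - \rho^K)$ with $\rho = (r + T - 1)/N$ is the same expression as the reciprocal of the $K$-term series, which is how part (d) is evaluated here. The helper name is illustrative.

```python
def geometric_capacity(ratio: float, k: int) -> float:
    """(1 + ratio + ... + ratio^(K-1))^-1, equal to (1 - ratio) / (1 - ratio^K)."""
    return 1.0 / sum(ratio**i for i in range(k))

N, K, r, T = 10, 5, 3, 3
classical = geometric_capacity(1 / N, K)          # (a)
coded = geometric_capacity(r / N, K)              # (b)
colluding = geometric_capacity(T / N, K)          # (c)
joint = geometric_capacity((r + T - 1) / N, K)    # (d)
naive = coded * colluding / classical             # (e)

for name, value in [("classical", classical), ("coded", coded),
                    ("colluding", colluding), ("joint", joint), ("naive", naive)]:
    print(f"{name:10s} {value:.4f}")
print(f"naive / joint = {naive / joint:.3f}")
```

Because the script carries full precision, its value for (e) can differ in the third decimal from a hand computation that starts from rounded inputs.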

ex14-12

Hard

For $N = 4$ databases, plot $C_{\text{PIR}}(4, K) - C_{\text{SPIR}}(4, K)$ as a function of $K$ for $K = 2, 3, \ldots, 20$ . At what $K$ does the gap drop below $0.001$ ?

Show Hint

$C_{\text{SPIR}}(4, K) = 0.75$ (constant).

$C_{\text{PIR}}(4, K)$ approaches $0.75$ from above as $K$ grows.

Solution

Setup

SPIR: $0.75$ for all $K$ . Classical: $C(4, K) = \left(\sum_{i=0}^{K-1} (1/4)^i\right)^{-1} = (1 - 1/4)/(1 - (1/4)^K) = 0.75 / (1 - 4^{-K})$ .

Gap formula

$\text{Gap}(K) = 0.75 / (1 - 4^{-K}) - 0.75 = 0.75 \cdot 4^{-K} / (1 - 4^{-K}) \approx 0.75 \cdot 4^{-K}$ for large $K$ .

Crossing $0.001$

$0.75 \cdot 4^{-K} = 0.001 \Rightarrow 4^{-K} = (4/3) \cdot 10^{-3} \Rightarrow -K \log_{10} 4 = \log_{10}(4/3) - 3 \Rightarrow K \approx 4.8$ . So the gap first drops below $0.001$ at $K = 5$ .

Verification

$K = 4$ : gap $= 0.75 / 255 \approx 0.00294$ , still above $0.001$ . $K = 5$ : gap $= 0.75 \cdot 4^{-5}/(1 - 4^{-5}) = 0.75 / 1023 \approx 0.000733$ . This confirms $K = 5$ is the crossover.

Operational

For any library with $\geq 5$ files at $N = 4$ databases, SPIR's rate cost over classical PIR is below $0.1\%$ — essentially free.
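The values needed for the plot, and the crossover itself, can be produced with a short script. A sketch using the closed form derived in this solution; `gap` is an illustrative helper name.

```python
def gap(n: int, k: int) -> float:
    """C_PIR(n, k) - C_SPIR(n), using C_PIR = (1 - 1/n) / (1 - n^-k)."""
    spir = 1 - 1 / n
    return spir / (1 - n**-k) - spir

gaps = {k: gap(4, k) for k in range(2, 21)}
crossover = min(k for k, g in gaps.items() if g < 1e-3)

for k, g in gaps.items():
    print(f"K={k:2d}  gap={g:.3e}")
print("first K with gap < 0.001:", crossover)   # 5
```

Plotting `gaps` on a logarithmic vertical axis gives a nearly straight line, reflecting the $4^{-K}$ decay.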

ex14-13

Hard

A health-care provider deploys PIR with $N = 5$ medical-record databases. Two threat scenarios: (1) one database operator is curious; (2) up to two operators may collude (e.g., shared admin team). Both require user privacy. Additionally, HIPAA requires that the user should not learn other patients' records. Specify the correct PIR variant and parameters; quantify the rate cost vs. classical Sun-Jafar at $N = 5, K = 1000$ (large library).

Show Hint

Two-sided privacy: SPIR.

$T = 2$ collusion + SPIR + classical replicated storage.

Solution

Identify constraints

User privacy: yes (PIR baseline).
Database privacy: yes (HIPAA-style).
Collusion tolerance: $T = 2$ .
Storage: replicated (no constraint mentioned).

Variant choice

SPIR + $T = 2$ -colluding. Combines two of the §14.4 extensions.

Rate estimation

SPIR with $T$ -colluding: outer bound is $1 - T/N = 1 - 2/5 = 0.6$ (the $K$ -independent SPIR-style limit adjusted for collusion). Achievable rate is $\leq 0.6$ .

Comparison with classical

Classical Sun-Jafar at $K = 1000, N = 5$ : $C \approx 0.8$ (asymptotic). Loss from full requirement set: $\sim 25\%$ rate.

Operational

The $0.6$ rate corresponds to downloading $\sim 1.67\times$ the target file, a bandwidth cost that is likely acceptable for medical applications in exchange for HIPAA compliance.

ex14-14

Hard

Consider an SPIR deployment with $N = 4$ databases needing $H(S) = L/3$ bits of shared randomness per file. With files of size $L = 1$ MB and a library of $K = 10000$ files, compute: (a) total shared randomness across the deployment; (b) one method to distribute it without a trusted setup.

Show Hint

Each file requires $L/(N-1) = L/3$ bits.

Shamir secret sharing of a master key, then derive per-file masks via PRG.

Solution

(a) Total shared randomness

Per-file: $L/3 = 1\text{ MB}/3 \approx 333\text{ kB}$ . Across $10000$ files: $\sim 3.33\text{ GB}$ of shared randomness needed.

(b) Distribution method

Generate master seed $K_{\text{seed}} \in \{0, 1\}^{256}$ .
Distribute $K_{\text{seed}}$ via $(N-1, N)$ -Shamir secret sharing across the $N$ databases. Each database gets a share that, combined with $N-1$ others, reconstructs $K_{\text{seed}}$ .
Each database derives per-file masks via $\text{PRG}(K_{\text{seed}}, \text{file\_id})$ .
For each PIR query, the databases use the corresponding pseudo-random masks (deterministic given $K_{\text{seed}}$ and query parameters).
The user never learns $K_{\text{seed}}$ .

Compute and storage trade-off

Storage: $N \cdot 256$ bits for the shares — negligible. Compute: $O(L)$ PRG evaluations per query. Avoids distributing $3.33$ GB of true randomness.

Caveat

With a PRG, the SPIR scheme is only computationally secure, not information-theoretically secure. For IT-secure SPIR, true randomness must be distributed (e.g., via an OTP-like setup).
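The mask-derivation step can be sketched as follows. The tutorial does not name a PRG; SHAKE-256 from Python's `hashlib` is used here purely as a stand-in, and the seed value, the file-id encoding, and the function name are all assumptions for illustration. The caveat above applies: this yields computational security only.

```python
import hashlib

def derive_mask(seed: bytes, file_id: int, n_bytes: int) -> bytes:
    """Expand (seed, file_id) into n_bytes of pseudo-random mask with SHAKE-256."""
    xof = hashlib.shake_256()
    xof.update(seed)
    xof.update(file_id.to_bytes(8, "big"))   # fixed-width encoding of the file id
    return xof.digest(n_bytes)

L_BYTES = 10**6                       # file size, 1 MB
N = 4                                 # number of databases
NUM_FILES = 10_000

mask_len = -(-L_BYTES // (N - 1))     # ceil(L / (N - 1)) bytes per file
seed = bytes(32)                      # placeholder 256-bit seed; a real seed must be random
mask = derive_mask(seed, file_id=17, n_bytes=mask_len)

total_gb = NUM_FILES * L_BYTES / (N - 1) / 1e9
print(len(mask))                      # 333334
print(f"{total_gb:.2f} GB of masks replaced by a 32-byte seed")
```

Every database that holds the same seed derives the same mask for a given file id, so no per-file randomness needs to be shipped.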

ex14-15

Challenge

The capacity of coded-storage + $T$ -colluding PIR for general $K$ is open. Sketch the structure of the cut-set converse argument and explain why the achievable rate (Freij-Hollanti) and the outer bound do not match for finite $K$ .

Show Hint

The achievable rate uses both MDS and Shamir simultaneously.

The cut-set converse may not capture the joint constraint optimally.

Solution

Cut-set converse setup

Choose any subset $\mathcal{S} \subseteq [N]$ of size $r + T - 1$ databases.

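From the MDS structure: any $r$ of these databases already determine each file, so the joint storage of $\mathcal{S}$ has entropy $L$ per file even though it holds $L \cdot (r + T - 1)/r$ symbols per file.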
From the $T$ -collusion privacy: the entropy of the queries to $\mathcal{S}$ is $\geq H(\theta)$ if $|\mathcal{S}| \geq T$ .
Symmetrize, normalize, recurse.

Where the gap arises

The cut-set converse uses one cut at a time. For finite $K$ , the joint constraint may be tighter when applied across multiple cuts simultaneously, but this is not captured by single-cut techniques.

Why the gap matters

The achievable rate ( $\sim$ Freij-Hollanti) is conjectured to be optimal, but no matching converse is known. Closing this gap requires either: (i) a tighter outer bound (multi-cut) or (ii) a tighter achievable scheme.

Open status

$K = 2$ : capacity is settled. $K \geq 3$ : gap remains open. Recent work (e.g., Jia-Sun-Jafar 2019) has narrowed the gap but not closed it.

Suggested approach

Connection to coded-computing capacity (Chapters 5, 8) might offer new techniques. The structure of the joint problem resembles a coded secret-sharing instance — known to have rate $\sim 1 - r/N$ in similar contexts.
