Exercises
ex14-1
EasyCompute for , , .
Apply .
.
Apply formula
.
Sanity
Compare with (asymptotic limit). At , capacity is slightly above the asymptote.
ex14-2
EasyCompute for , , .
.
Sum the geometric series.
Apply formula
.
Compare with classical
. Going from to costs rate.
ex14-3
EasyCompute for and explain why doesn't appear in the formula.
Use .
Recall the database-privacy requirement.
Apply formula
.
$K$-independence
SPIR's database-privacy requirement forces the user to learn only β regardless of how many other files exist. The other files contribute zero to the achievable rate. Consequently, doesn't enter the formula β only matters.
ex14-4
MediumFor , compute the storage-rate pairs (in (file-units, rate)) for . Plot or tabulate the Pareto frontier.
Aggregate storage = file-units.
Apply the coded-storage capacity formula.
$r = 1$ (replication)
Storage: units. Rate: .
$r = 2$
Storage: units. Rate: .
$r = 3$
Storage: units. Rate: .
$r = 4$
Storage: units. Rate: .
$r = 6$
Storage: units. Rate: .
Pareto observation
The Pareto frontier is convex (in storage, rate space). The rate/storage efficiency peaks around β a useful operating point for cost-conscious deployments.
ex14-5
MediumA PIR deployment uses databases. The operator estimates databases could be compromised in the worst case (e.g., shared hypervisor). Compute the capacity loss from using vs. for files.
Compute capacity for both values.
Express the difference as percentage of classical capacity.
$T = 1$ (classical)
.
$T = 2$
.
Capacity loss
Absolute loss: . Relative loss: .
Operational
Going from to costs rate. The operator should decide whether the security improvement is worth the bandwidth.
ex14-6
MediumArgue that without shared randomness across databases, SPIR cannot achieve any positive rate. Use the database-privacy condition.
Without randomness, answers are deterministic functions of queries and files.
What constraints does impose?
Setup
Without shared randomness, the answers are deterministic functions of the query and the files.
Database privacy implication
For any function of the user's view, depends deterministically on for . The user can construct a function that equals one bit of β violating database privacy.
Conclusion
Database privacy with deterministic answers and a single query realization is impossible. Therefore positive-rate SPIR requires shared randomness as a masking mechanism.
ex14-7
MediumFor PIR with coded storage and -colluding, the Freij-Hollanti scheme requires . Tabulate the feasible pairs for .
List all with .
Feasibility condition
, i.e., .
Feasible pairs
: . Total: feasible pairs (anti-diagonal of the grid below the diagonal ).
Operational
At : can have for x storage savings and -collusion tolerance. Or for x storage and -collusion. Many combinations available.
ex14-8
MediumAs (large library), compare the asymptotic rates of: (a) classical PIR with , (b) -colluding PIR with , (c) coded-storage with , (d) SPIR with .
Each formula has a known limit.
(a) Classical
.
(b) $T = 2$
.
(c) Coded $r = 2$
(same as , by symmetry of the formula β coincidence, not equivalence).
(d) SPIR
for all . No asymptotic statement needed.
Observation
Classical PIR and SPIR have the same asymptotic rate (). For large , the SPIR cost is essentially zero. For small , classical is strictly better (and SPIR is necessary if both sides need privacy).
ex14-9
MediumSketch how to use a Shamir-style polynomial to construct queries for -colluding PIR. Why does the polynomial degree need to be exactly (not less, not more)?
Shamir privacy: any polynomial evaluations look uniformly random to an evaluator.
Decoding: requires polynomial interpolation.
Construction
Each query symbol is constructed as a polynomial of degree over , with coefficients chosen pseudo-randomly. The polynomial is evaluated at distinct points, with the evaluations sent to the databases.
Privacy from degree $T$
Shamir's privacy guarantee: any evaluations of a degree- polynomial are statistically independent of the polynomial's secret-encoded coefficient (the desired index ).
Why not lower degree?
Polynomial of degree would not provide -privacy: evaluations determine the polynomial uniquely, revealing .
Why not higher degree?
Higher degree wastes the rate: the polynomial can be reconstructed by the databases jointly, but the user needs to wait for at least evaluations to decode. Excess degree means lower rate. Degree exactly is the sweet spot.
ex14-10
HardProve that the shared randomness in any SPIR scheme must satisfy bits per file, where is the file length.
Database privacy requires marginal independence of the user's view from .
Use the chain rule on the database-privacy mutual information.
The randomness must mask each database's contribution to the user's view of other files.
Setup
Let the answers each be a function of .
Database-privacy chain
by privacy. Equivalently: each answer must be conditionally independent of given and .
Counting bits
Each answer carries information about the structure (depending on ) but must be statistically masked from . The masking requires entropy from proportional to the file content β at minimum bits.
Tight construction
Sun-Jafar's SPIR construction achieves exactly , proving the bound is tight.
ex14-11
HardFor , compute: (a) classical PIR capacity, (b) coded-storage capacity (), (c) -colluding capacity (), (d) the Freij-Hollanti achievable joint rate, (e) the product of (b) and (c) divided by (a) β the "naive composition" estimate. Compare (d) and (e).
Use the explicit formulas; check feasibility.
Naive composition is not the right formula but a useful baseline.
(a) Classical
.
(b) Coded $r = 3$
.
(c) $T = 3$
(same as (b) by formula coincidence).
(d) Freij-Hollanti joint
. .
(e) Naive composition
. Predicted by naive product rule.
Comparison
Joint actual: . Naive composition: . Naive composition overestimates the joint rate by . The actual joint rate is harder to achieve than the product rule suggests.
ex14-12
HardFor databases, plot as a function of for . At what does the gap drop below ?
(constant).
approaches from above as grows.
Setup
SPIR: for all . Classical: .
Gap formula
for large .
Crossing $0.001$
. So at the gap drops below .
Verification
: gap . Confirms is the crossover.
Operational
For any library with files at databases, SPIR's rate cost over classical PIR is below β essentially free.
ex14-13
HardA health-care provider deploys PIR with medical-record databases. Two threat scenarios: (1) one database operator is curious; (2) up to two operators may collude (e.g., shared admin team). Both require user privacy. Additionally, HIPAA requires that the user should not learn other patients' records. Specify the correct PIR variant and parameters; quantify the rate cost vs. classical Sun-Jafar at (large library).
Two-sided privacy: SPIR.
collusion + SPIR + classical replicated storage.
Identify constraints
- User privacy: yes (PIR baseline).
- Database privacy: yes (HIPAA-style).
- Collusion tolerance: .
- Storage: replicated (no constraint mentioned).
Variant choice
SPIR + -colluding. Combines two of the Β§14.4 extensions.
Rate estimation
SPIR with -colluding: outer bound is (the -independent SPIR-style limit adjusted for collusion). Achievable rate is .
Comparison with classical
Classical Sun-Jafar at : (asymptotic). Loss from full requirement set: rate.
Operational
Acceptable for medical applications β the rate corresponds to downloading the target file. Acceptable bandwidth cost for HIPAA compliance.
ex14-14
HardConsider an SPIR deployment with databases needing bits of shared randomness per file. With files of size MB and a library of files, compute: (a) total shared randomness across the deployment; (b) one method to distribute it without a trusted setup.
Each file requires bits.
Shamir secret sharing of a master key, then derive per-file masks via PRG.
(a) Total shared randomness
Per-file: . Across files: of shared randomness needed.
(b) Distribution method
- Generate master seed .
- Distribute via -Shamir secret sharing across the databases. Each database gets a share that, combined with others, reconstructs .
- Each database derives per-file masks via .
- For each PIR query, the databases use the corresponding pseudo-random masks (deterministic given and query parameters).
- The user never learns .
Compute and storage trade-off
Storage: bits for the shares β negligible. Compute: PRG evaluations per query. Avoids distributing GB of true randomness.
Caveat
With a PRG, the SPIR is now computationally secure, not information-theoretically secure. For IT-secure SPIR, true randomness must be distributed (e.g., via OTP- like setup).
ex14-15
ChallengeThe capacity of coded-storage + -colluding PIR for general is open. Sketch the structure of the cut-set converse argument and explain why the achievable rate (Freij-Hollanti) and the outer bound do not match for finite .
The achievable rate uses both MDS and Shamir simultaneously.
The cut-set converse may not capture the joint constraint optimally.
Cut-set converse setup
Choose any subset of size databases.
- From the MDS structure: the entropy of their joint storage is .
- From the -collusion privacy: the entropy of the queries to is if .
- Symmetrize, normalize, recurse.
Where the gap arises
The cut-set converse uses one cut at a time. For finite , the joint constraint may be tighter when applied across multiple cuts simultaneously β but this is not captured by single- cut techniques.
Why the gap matters
The achievable rate ( Freij-Hollanti) is conjectured to be optimal, but no matching converse is known. Closing this gap requires either: (i) a tighter outer bound (multi-cut) or (ii) a tighter achievable scheme.
Open status
: capacity is settled. : gap remains open. Recent work (e.g., Jia-Sun-Jafar 2019) has narrowed the gap but not closed it.
Suggested approach
Connection to coded-computing capacity (Chapters 5, 8) might offer new techniques. The structure of the joint problem resembles a coded secret-sharing instance β known to have rate in similar contexts.