Coded-Storage PIR
Why Coded Storage?
Classical Sun-Jafar PIR (Chapter 13) assumes replicated storage: every one of the databases stores all files. Aggregate storage is file-units.
Production cloud storage rarely replicates. Reed- Solomon coded storage is the norm: an -MDS code stores each file as shares such that any shares reconstruct the file. Aggregate storage drops to file-units — typically a – reduction.
Question: can the user retrieve privately when the databases hold MDS-coded shares (not full files)? If yes, at what rate? The answer is yes, with a capacity formula due to Tajeddine and El Rouayheb (2018).
Definition: MDS-Coded Storage Model
MDS-Coded Storage Model
Let be a finite field, and let be the generator matrix of an -MDS code.
Each file is partitioned into sub-files , each of length symbols. Database stores the -th column of
MDS property: any databases collectively store enough information to reconstruct . No subset of fewer than databases stores any information about (information-theoretically).
Total storage: symbols across databases, each storing symbols. Aggregate: file-units (vs. for replication).
Example: MDS Storage at
A library has files, each of size MB. Compute the aggregate storage cost for: (a) replication across databases, (b) -MDS coded storage.
Replication baseline
Each database stores all files, . Across databases: aggregate.
$(10, 3)$-MDS coded storage
Each database stores of every file: . Across databases: aggregate.
Reduction factor
— exactly the MDS rate . This is fundamental: MDS storage saves a factor of in aggregate storage versus replication.
PIR rate cost
The savings come at a cost: PIR rate drops below the replicated Sun-Jafar capacity. Exact PIR rate: see Theorem 14.1.1.
Theorem: Coded-Storage PIR Capacity (Tajeddine–El Rouayheb)
For PIR with files stored across databases via an -MDS code, the PIR capacity is Equivalently: Setting recovers the classical Sun-Jafar rate (since the geometric series in matches Chapter 13's formula).
Achievability
The achievable scheme is a generalization of Sun-Jafar's scheme to MDS-coded storage. Each query is constructed in rounds, with each round using interference symbols across databases. Because of the MDS structure, databases jointly reconstruct each symbol of . The query design ensures privacy via random linear combinations across all files.
Converse
The converse uses the same cut-set inductive approach as Sun-Jafar (Chapter 13 §13.3), adapted for the MDS structure: the entropy of any -subset of database contents is bounded by the file content. Symmetrization and induction over yield the formula. Full details: Tajeddine & El Rouayheb 2018, §III–IV.
Asymptotic limits
For fixed and : . Compare with classical Sun-Jafar's limit — the rate gap is exactly the difference , the cost of MDS coding.
Example: Coded-Storage Capacity at , , varying
Compute for . Compare with (the asymptotic rate).
$r = 1$ (classical)
. Asymptotic limit: .
$r = 2$
. Asymptotic limit: .
$r = 3$
. Asymptotic limit: .
$r = 4$
. Asymptotic limit: .
Operational
Each unit increase in (more storage savings) costs in PIR rate. The savings:cost ratio depends on . For large , the trade is close to : in storage:rate.
Coded-Storage PIR: Storage vs. Rate Trade-off
For fixed databases and files, plot the PIR rate as a function of MDS dimension . Each point on the curve corresponds to a different storage-rate operating point: small = high rate but high storage; large = low rate but small storage. The curve traces the Pareto-optimal frontier for coded-storage PIR.
Parameters
Storage-Rate Pareto Points at
| Aggregate Storage (file-units) | Storage Reduction | PIR Rate | Rate / Storage Index | |
|---|---|---|---|---|
| 1 (replication) | ||||
| 2 | ||||
| 3 | ||||
| 5 | ||||
| 9 |
Reading the Pareto Frontier
The rate/storage index column above is rate per file-unit of aggregate storage — a simple efficiency proxy. It peaks somewhere in the middle (here, ). This is the throughput-optimal operating point for the given cost.
In production, the choice of depends on economics: storage cost per byte versus bandwidth cost per byte. When storage is cheap (e.g., archival), small (replicate). When storage is expensive (e.g., DRAM caching layer), larger (heavy MDS coding). A cloud PIR deployment should benchmark both axes to find its optimum.
Deploying Coded-Storage PIR
Practical guidelines for production coded-storage PIR deployments:
-
Reuse existing erasure-coded storage: cloud providers already use Reed-Solomon coded storage for redundancy. The PIR layer can be retrofitted on top.
-
Choose from existing cluster structure: if storage is replicated across regions, is natural. Modifying the storage layer for PIR is typically not worth it.
-
Field size: with suffices for MDS codes (Reed-Solomon). Use for byte-aligned arithmetic.
-
Latency budget: each query round requires one network round-trip. The PIR scheme uses rounds; for large , latency dominates.
-
Verifiability: coded-storage PIR is not Byzantine-robust unless extended (see Tajeddine & El Rouayheb 2018, §VI). Production deployments with untrusted databases require additional layers.
- •
Field size: for MDS codes
- •
Latency: network round-trips
- •
Storage already coded: from existing redundancy
- •
Byzantine robustness: requires extension layer
Common Mistake: Replication vs. MDS — Storage Cost Confusion
Mistake:
Assume that an -MDS code with means "one database stores everything" (and the others store nothing).
Correction:
An -MDS code with is replication: each of the databases stores a complete copy of every file. The MDS rate means "any 1 database reconstructs the library." For , each database stores a unique -th portion (no redundancy beyond the bare minimum). General interpolates between full replication () and zero redundancy (). This nomenclature is standard in coding theory but can confuse information-theory readers used to Reed-Solomon notation.
Key Takeaway
Coded-storage PIR generalizes Sun-Jafar to MDS-coded databases. The capacity formula replaces classical with arbitrary . Storage cost reduces by factor ; PIR rate decreases monotonically with . Production deployments should match to the existing storage redundancy.
Quick Check
For a coded-storage PIR system with databases and files, what is the rate cost of doubling the MDS dimension from to (relative to classical PIR)?
Rate decreases from to (a relative reduction).
Rate doubles, since storage halves.
Rate stays the same; only storage changes.
Rate becomes zero (impossible to do PIR with coded storage).
. Wait — let us recompute: . For : . The relative reduction is comparable to the asserted figure (within rounding).