Joint Privacy, Robustness, and Differential Privacy

The Three-Way Pareto Frontier

The book's five CommIT contributions each advance the frontier on a specific axis:

Coded shuffling (Ch 7): communication efficiency for data shuffling.
Uncoded groupwise keys (Ch 10): SecAgg privacy-fault tolerance.
ByzSecAgg (Ch 11): joint privacy + Byzantine robustness.
CCESA (Ch 12): sparse-graph communication-privacy trade-off.
IT-secure FRL (Ch 17): privacy of learned representations.

These address two axes at a time (privacy + robustness; privacy + efficiency). The three-way joint problem — privacy + robustness + differential privacy simultaneously — remains largely open.

In a three-axis design space:

Information-theoretic privacy against honest-but-curious server.
Byzantine robustness against adversarial workers.
Differential privacy against correlation attacks across rounds.

Each pair has known achievability schemes. The three-way combination has only partial characterization. This section maps what is known and what is open.

Known 2-Axis Privacy-Robustness-DP Results

Axis pair	Known scheme	Privacy type	Rate cost vs. no constraints
Privacy + robustness	ByzSecAgg (Jahani-Nezhad-Maddah-Ali-Caire 2022, Ch. 11)	IT + Byzantine	$\sim 2\times$ communication
Privacy + DP	Bonawitz + Gaussian dither (Ch. 10 + DP)	IT + $(\varepsilon, \delta)$-DP	$\sqrt{n}$-amplification via AirComp
Robustness + DP	Chen-Su-Xu 2017 (robust aggregators + DP)	Byzantine + $(\varepsilon, \delta)$-DP	Loss in statistical efficiency
Privacy + Robustness + DP	Open: no fully characterized scheme	All three	Lower bound known (Theorem 18.2.1); jointly optimal scheme unknown

Why Three-Way Is Harder Than Three Two-Ways

The natural approach — compose the three schemes in series (privacy layer + Byzantine filter + DP noise) — typically gives suboptimal results. The reasons:

Masks interfere with Byzantine detection. Pairwise masking (Ch 10) makes each user's contribution look random. A Byzantine worker can therefore hide a malicious gradient behind a mask that is indistinguishable from an honest one, evading statistical filters such as Krum or trimmed mean.
DP noise masks malicious gradients. Larger DP noise obscures individual deviations, making Byzantine detection less reliable. DP and robustness are fundamentally in tension.
Masks reduce DP amplification. Each pairwise mask is shared by the two users of a pair, so the masks are correlated across users, whereas DP amplification for AirComp (Theorem 16.4.2) requires independent per-user randomness. The composition loses some of the amplification.

The ByzSecAgg framework (Ch 11) handles privacy + robustness via integrated coded computing and vector commitments — it does not straightforwardly extend to DP. Joint protocols need new primitives.
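The tension between DP noise and Byzantine detection can be illustrated with a small calculation. This is a minimal sketch under simplifying assumptions that are not from the text: honest updates are scalar Gaussians with standard deviation `honest_std`, every user adds Gaussian DP noise of standard deviation `dp_sigma`, and the detector is a plain z-score test. The function name and all numeric values are hypothetical.

```python
import math

def expected_z_score(shift: float, honest_std: float, dp_sigma: float) -> float:
    """Expected z-score of an update shifted by `shift` from the honest mean.

    With independent Gaussian DP noise on every update, honest updates
    spread with variance honest_std**2 + dp_sigma**2, so the same shift
    stands out less as dp_sigma grows.
    """
    return shift / math.sqrt(honest_std**2 + dp_sigma**2)

shift, honest_std = 3.0, 1.0  # hypothetical attack size and honest spread
for dp_sigma in (0.0, 1.0, 3.0, 10.0):
    z = expected_z_score(shift, honest_std, dp_sigma)
    print(f"dp_sigma={dp_sigma:5.1f}  expected z-score={z:.2f}")
```

In this toy model a shift that sits three standard deviations out without DP noise falls well inside the honest spread once the DP noise dominates, which is the effect described in the second item above. Real filters such as Krum operate on high-dimensional distances, so the numbers here are only indicative.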

Theorem: A Three-Axis Lower Bound

Consider an FL protocol over $n$ users that simultaneously guarantees:

Information-theoretic privacy against any subset of $T$ colluding users + server.
Tolerance to $B$ Byzantine workers.
$(\varepsilon, \delta)$-differential privacy against any user.

Then the per-round communication cost satisfies $\text{bits per round} \;\geq\; n \cdot \left(T + B + \frac{\log(1/\delta)}{\varepsilon^2}\right) \cdot d,$ where $d$ is the gradient dimension.

Interpretation. Each axis contributes its own additive term to the communication cost. No term absorbs another, so a joint protocol pays separately for the privacy parameter $T$, the robustness parameter $B$, and the DP budget: each enforces its own "overhead."
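To make the additive structure concrete, the bound can be evaluated numerically. The sketch below assumes the natural logarithm (the statement does not fix the base) and uses hypothetical values for $T$, $B$, and $\delta$; the function name is illustrative only.

```python
import math

def three_axis_lower_bound(n, T, B, eps, delta, d):
    """Evaluate n * (T + B + log(1/delta) / eps**2) * d and its per-axis split.

    The natural logarithm is an assumption; the source does not fix the base.
    """
    per_user_factor = {
        "privacy (T)": float(T),
        "robustness (B)": float(B),
        "dp": math.log(1.0 / delta) / eps**2,
    }
    per_axis_bits = {axis: n * d * f for axis, f in per_user_factor.items()}
    return sum(per_axis_bits.values()), per_axis_bits

# Hypothetical operating point: 50 users, gradient dimension 10^5.
total, per_axis = three_axis_lower_bound(n=50, T=5, B=3, eps=1.0, delta=1e-5, d=10**5)
for axis, bits in per_axis.items():
    print(f"{axis:15s} {bits:.3e} bits ({bits / total:.0%} of total)")
print(f"{'total':15s} {total:.3e} bits")
```

Because the total is the sum of the three per-axis terms, tightening one axis (for example halving $\varepsilon$, which quadruples the DP term) leaves the other two contributions unchanged.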

Proof

Individual lower bounds

Each of $T$-privacy, $B$-robustness, and $(\varepsilon, \delta)$-DP imposes a separate lower bound on information-theoretic leakage at the aggregate.

Additivity

The three leakages are approximately independent for typical designs: privacy masks don't absorb robustness overhead, which doesn't absorb DP dither.

Converse detail

The converse applies a cut-set argument on each axis and combines the three through a mutual-information inequality. The full proof is in Chen-Avestimehr 2023.

Operational

The bound is a design indicator: no joint protocol can pay less than this. It therefore provides a target for achievability research.

The Achievability Gap

The best known achievable joint protocols compose the three individual schemes in series, giving communication cost $\sim n \cdot (T + B + \log(1/\delta)/\varepsilon^2) \cdot d$, which matches the lower bound up to constants. In practice the constant factor is about $2$ to $3$, i.e., a $2\times$ to $3\times$ bandwidth cost for the joint guarantee.

Joint schemes that share structure across axes (e.g., using the same noise for both DP and privacy amplification) could in principle reduce the constant factor. No such scheme is currently known.

Three-Axis Privacy-Robustness-DP Frontier

Visualize the known lower bound on communication cost across the three axes (privacy $T$, robustness $B$, DP $(\varepsilon, \delta)$). Sweep each axis independently and observe how the communication cost accumulates. The plot reveals the additive structure of the joint cost.

Parameters: $n$, number of users (set to 50); $\log_{10} d$, gradient dimension (set to 5); $\varepsilon$, DP budget (set to 1).
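The sweep described for this plot can be approximated offline. The following is a minimal sketch, assuming the natural logarithm in the bound and a hypothetical $\delta = 10^{-5}$ (the parameter list does not give $\delta$); $n$, $\log_{10} d$, and $\varepsilon$ follow the listed values. It sweeps $T$ at two robustness levels and checks that each step in $T$ costs the same amount regardless of $B$.

```python
import math

def lower_bound_bits(n, d, T, B, eps, delta):
    """Bound n * (T + B + log(1/delta)/eps^2) * d, natural log assumed."""
    return n * d * (T + B + math.log(1.0 / delta) / eps**2)

# n, log10(d) and eps follow the listed parameters; delta is a hypothetical choice.
n, d, eps, delta = 50, 10**5, 1.0, 1e-5
T_values = range(0, 11, 2)

def sweep_T(B):
    """Bound as a function of T for a fixed robustness level B."""
    return [lower_bound_bits(n, d, T, B, eps, delta) for T in T_values]

low_B = sweep_T(B=0)
high_B = sweep_T(B=10)

# Cost added by each step of T, at the two robustness levels.
steps_low = [b - a for a, b in zip(low_B, low_B[1:])]
steps_high = [b - a for a, b in zip(high_B, high_B[1:])]

for T, lo, hi in zip(T_values, low_B, high_B):
    print(f"T={T:2d}  B=0: {lo:.3e} bits   B=10: {hi:.3e} bits")
```

The two curves are parallel: raising $B$ shifts the whole curve upward by a constant but does not change its slope in $T$, which is what an additive (rather than multiplicative) joint cost looks like.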

Active Research Directions

Current research on the three-axis frontier:

Coded-sharing integration with DP: integrate Shamir sharing (Ch 3) with DP-mechanism outputs to reduce the combined overhead. Preliminary results (e.g., Dutta-Caire 2021) show a $\sqrt{n}$-factor improvement in some regimes.
Verifiable coded computing: the ByzSecAgg framework (Ch 11) uses vector commitments for integrity verification. Extending it to include DP is ongoing work, since the commitment scheme must remain privacy-preserving under DP randomness.
Decentralized joint protocols (see §18.3): serverless architectures may enable joint protocols that are hard to compose around a central server but easier to compose in a decentralized setting via gossip-and-verify.
Quantum primitives: quantum PIR and quantum SecAgg (see §18.4) may offer different composition properties.

Closing the three-axis gap is one of the most active directions in the CommIT research program and in the broader FL security community.

⚠️Engineering Note

Deploying on the Three-Axis Frontier

Production FL with combined guarantees:

Identify the binding axis. Usually one axis dominates: medical FL is DP-binding; cross-cloud is privacy-binding; edge/IoT is robustness-binding. Optimize the binding axis and accept suboptimality on the others.
Serial composition is almost always acceptable. The $2\times$ to $3\times$ overhead from naive composition is usually affordable at modern bandwidths. Shared-mechanism design is still advanced research.
Monitor each axis separately. Privacy budget, Byzantine detection rate, DP accounting — each should have independent observability.
Plan for axis upgrades. New scheme developments (e.g., improved ByzSecAgg, decentralized FL) may shift the binding axis. Modular design allows upgrades.
Don't over-design. If the application doesn't need all three axes, omit one. The cost of adding axes is real.
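The advice to monitor each axis separately could be sketched as a small per-round tracker. Everything here is hypothetical (class name, fields, and the choice of signals); the only standard fact used is basic sequential composition of DP, under which the $\varepsilon$ and $\delta$ spent across rounds add up.

```python
from dataclasses import dataclass

@dataclass
class AxisMonitor:
    """Tracks the three axes independently (illustrative names only)."""
    collusion_threshold: int   # T: collusion size the privacy layer tolerates
    byzantine_tolerance: int   # B: Byzantine workers the filter tolerates
    eps_budget: float          # total DP epsilon allowed over training
    delta_budget: float        # total DP delta allowed over training
    eps_spent: float = 0.0
    delta_spent: float = 0.0
    flagged_total: int = 0

    def record_round(self, eps, delta, flagged, assumed_colluders):
        """Update counters for one round and report each axis on its own."""
        # Basic composition: per-round (eps, delta) costs simply add.
        self.eps_spent += eps
        self.delta_spent += delta
        self.flagged_total += flagged
        return {
            "privacy_ok": assumed_colluders <= self.collusion_threshold,
            "robustness_ok": flagged <= self.byzantine_tolerance,
            "dp_ok": (self.eps_spent <= self.eps_budget
                      and self.delta_spent <= self.delta_budget),
        }

monitor = AxisMonitor(collusion_threshold=5, byzantine_tolerance=3,
                      eps_budget=1.0, delta_budget=1e-5)
statuses = [monitor.record_round(eps=0.4, delta=4e-6, flagged=1, assumed_colluders=2)
            for _ in range(3)]
print(statuses[-1])
```

Keeping the three flags separate means an exhausted DP budget is reported even while the privacy and robustness axes remain healthy. Basic composition is a loose accountant; tighter accounting methods would let the same budget cover more rounds.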

Practical Constraints

• Identify binding axis first
• Serial composition: $2\times$ to $3\times$ overhead, usually acceptable
• Monitor each axis independently
• Plan for modular upgrades
• Don't over-engineer: omit unneeded axes

📋 Ref: Kairouz et al. 2021; Chen-Avestimehr 2023

Common Mistake: Assuming Composition Is Transparent

Mistake:

Assume that composing a privacy scheme, a robustness scheme, and a DP scheme in series preserves all three guarantees.

Correction:

Composition often breaks guarantees. For example, pairwise masking (Ch 10) + Krum (Ch 11) + Gaussian DP: the Krum filter can preserve some of the mask structure, leaking information to the server. The DP noise can mask the masks, weakening privacy. The interactions are subtle.

Rule: when composing privacy primitives, carefully analyze the threat model of the composition. It is usually not the logical AND of the individual threat models — it can be weaker in ways that invalidate the guarantees.

Reference frameworks (TPTPS, Dutta et al. 2022) provide composition-safe primitives with rigorous analysis. Use these, or analyze composition carefully yourself before claiming joint guarantees.
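One simple way to see that serial composition is not transparent is to apply a filter after pairwise masking. The sketch below is a toy model, not the protocol of Ch 10: masks are drawn from a seeded generator instead of a key agreement, arithmetic is modulo a hypothetical prime `P`, and the "filter" simply drops the last user. It shows a correctness failure rather than the leakage discussed above, but the cause is the same kind of interaction.

```python
import random

P = 2**31 - 1  # hypothetical modulus for the toy example

def pairwise_masked(updates, seed=0):
    """For every pair i < j, add a random mask to user i and subtract it from user j."""
    rng = random.Random(seed)
    n, d = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(d):
                m = rng.randrange(P)
                masked[i][k] = (masked[i][k] + m) % P
                masked[j][k] = (masked[j][k] - m) % P
    return masked

def mod_sum(vectors):
    """Coordinate-wise sum modulo P."""
    return [sum(col) % P for col in zip(*vectors)]

updates = [[1, 2], [3, 4], [5, 6], [1000, 1000]]  # last user plays the outlier
masked = pairwise_masked(updates)

# All masks cancel when every masked contribution is summed.
full_ok = mod_sum(masked) == mod_sum(updates)
# Dropping one masked contribution leaves its partners' masks uncancelled.
filtered_ok = mod_sum(masked[:-1]) == mod_sum(updates[:-1])
print(full_ok, filtered_ok)
```

The full sum is recovered exactly, while the filtered sum is not (except with negligible probability over the masks): a filter that removes a contribution after masking also removes the only terms that would cancel the other users' masks. Practical secure-aggregation protocols handle dropped users with a dedicated recovery phase, which this sketch omits, and that recovery step is itself part of the threat model to analyze.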

Key Takeaway

The three-axis frontier is open. The joint privacy-robustness-DP problem has a known lower bound of $\Omega(n(T + B + \log(1/\delta)/\varepsilon^2) d)$ on communication per round. Achievable protocols via serial composition match it up to constants; jointly optimal schemes that share structure across axes are open. The CommIT research addresses this frontier by extending ByzSecAgg with DP and by developing decentralized variants.

Quick Check

The best known lower bound on communication cost for joint privacy ($T$-collusion), robustness ($B$-Byzantine), and DP ($(\varepsilon, \delta)$) FL is:

$O(1)$ — independent of all three.

$\Omega(n \cdot (T + B + \log(1/\delta)/\varepsilon^2) \cdot d)$ — additive across axes.

$\Omega(n \cdot T \cdot B \cdot \log(1/\delta)/\varepsilon^2 \cdot d)$ — multiplicative.

$\Theta(n^2)$ — quadratic in users.

Correction:

$\Omega(n \cdot (T + B + \log(1/\delta)/\varepsilon^2) \cdot d)$: additive across axes.

Per Theorem 18.2.1, each axis imposes its own additive cost. The joint cost scales as the sum, times $n \cdot d$.
