Exercises
ex-ch08-01
Easy. For coded convolution with $a$ split into $K_a$ chunks and $b$ into $K_b$ chunks, compute the recovery threshold for both the standard polynomial code and the entangled polynomial code.
Standard: $K_a K_b$; entangled: $K_a + K_b - 1$.
Both
Standard polynomial code: $K_{\text{std}} = K_a K_b$. Entangled polynomial code: $K_{\text{ent}} = K_a + K_b - 1$. A reduction in recovery threshold at the same per-worker storage.
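A minimal numeric sketch of the two thresholds, assuming the formulas reconstructed above; the chunk counts `Ka` and `Kb` are illustrative, not taken from the exercise statement.

```python
# Recovery thresholds for coded convolution (illustrative chunk counts).
Ka, Kb = 3, 4

K_std = Ka * Kb        # standard code: one response per pairwise chunk product
K_ent = Ka + Kb - 1    # entangled code: deg(p_a * p_b) = (Ka-1)+(Kb-1), plus 1

print(K_std, K_ent)    # 12 6
```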
ex-ch08-02
Easy. For a function that is a diagonal quadratic in each input, $f(x) = \sum_i a_i x_i^2$, compute the LCC recovery threshold.
$f$ is quadratic in each $x_i$; total degree $d = 2$.
Degree
$d = 2$ (quadratic in each variable, but each term involves only one variable, so the total degree stays 2). Number of inputs: $K$.
LCC threshold
$K_{\text{LCC}} = d(K-1) + 1 = 2(K-1) + 1 = 2K - 1$.
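A one-line sketch of the threshold arithmetic, assuming the generic LCC formula $K_{\text{LCC}} = d(K-1)+1$; the input count used here is illustrative.

```python
def lcc_threshold(d: int, K: int) -> int:
    """Generic LCC recovery threshold for a degree-d polynomial on K inputs."""
    return d * (K - 1) + 1

print(lcc_threshold(d=2, K=10))  # 19 responses for the diagonal quadratic
```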
ex-ch08-03
Easy. State why ReLU cannot be exactly computed via LCC at bounded per-worker storage and bounded recovery threshold.
Non-polynomial
ReLU is piecewise-linear but not polynomial; it cannot be expressed as a finite-degree polynomial on $\mathbb{R}$. LCC requires $f$ to be a polynomial of finite degree.
Workarounds
(i) Approximate ReLU by a low-degree polynomial and apply LCC, accepting bounded error (see the sketch below); (ii) use cryptographic MPC (garbled circuits) for the ReLU step separately; (iii) replicate the ReLU layer across workers (hybrid coding).
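A sketch of workaround (i), assuming a least-squares Chebyshev fit via `numpy`; the interval $[-1, 1]$ and degree 4 are illustrative choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Fit a low-degree polynomial to ReLU on a bounded interval (workaround (i)).
xs = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(xs, 0.0)

coeffs = C.chebfit(xs, relu, deg=4)   # least-squares Chebyshev fit
approx = C.chebval(xs, coeffs)

print(f"max |error| on [-1,1]: {np.abs(approx - relu).max():.4f}")
```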
ex-ch08-04
Easy. Compare the LCC recovery threshold for bilinear ($d = 2$) vs. cubic ($d = 3$) functions with $K$ inputs.
Both cases: $K_{\text{LCC}} = d(K-1)+1$.
Bilinear
$K_{\text{LCC}} = 2(K-1)+1 = 2K-1$.
Cubic
$K_{\text{LCC}} = 3(K-1)+1 = 3K-2$.
Ratio
Cubic requires $(3K-2)/(2K-1) \approx 3/2$ times as many responses for large $K$: the threshold scales linearly with the function degree.
ex-ch08-05
Medium. Construct an entangled polynomial code for convolution with $a$ having $K_a = 3$ chunks and $b$ having $K_b = 4$ chunks. Specify the encoding polynomials and verify the recovery threshold.
Encoding: $p_a(x) = \sum_{i=0}^{2} a_i x^i$, $p_b(x) = \sum_{j=0}^{3} b_j x^j$.
Encoding
$p_a(x) = a_0 + a_1 x + a_2 x^2$ (degree 2). $p_b(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3$ (degree 3).
Product
$p_a(x)\,p_b(x)$ has degree 5, so 6 evaluations suffice; its coefficients $\sum_{i+j=k} a_i b_j$ are exactly the chunk-level convolution $a * b$.
Recovery threshold
$K = 6$.
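A sketch verifying the construction end to end with scalar chunks, assuming real-valued arithmetic; the worker count and the straggler pattern are illustrative.

```python
import numpy as np

# Verify the Ka=3, Kb=4 coded-convolution construction with scalar chunks.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(4)

N = 9                                   # illustrative number of workers
xs = np.arange(1, N + 1, dtype=float)   # distinct evaluation points

# Worker i computes p_a(x_i) * p_b(x_i); polyval wants high-order-first coeffs.
responses = np.polyval(a[::-1], xs) * np.polyval(b[::-1], xs)

# Master interpolates the degree-5 product from any 6 responses.
survivors = [0, 2, 3, 5, 7, 8]          # 3 stragglers dropped
V = np.vander(xs[survivors], 6, increasing=True)
coeffs = np.linalg.solve(V, responses[survivors])

assert np.allclose(coeffs, np.convolve(a, b))  # coefficients = a * b
print(coeffs)
```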
ex-ch08-06
Medium. Why is LCC a constant factor worse than specialized codes (polynomial, entangled) for common operations? Give an example where LCC is attractive despite this penalty.
LCC is optimal for generic polynomials; it does not exploit operation-specific structure.
Reason
LCC treats the target as a generic polynomial of total degree $d$ and pays the generic threshold $d(K-1)+1$. Specialized codes exploit operation-specific algebraic structure (e.g., convolution's sparse monomial basis) to reduce the effective degree.
When LCC wins
Higher-degree operations (quartic or quintic polynomial activations in privacy-preserving neural networks), arbitrary multi-tensor contractions, or federated computation of higher-order statistics (skewness, kurtosis).
Practical recommendation
Use specialized codes per operation type; fall back to LCC only for operations without a specialized scheme. Production frameworks dispatch on operation type.
ex-ch08-07
Medium. Derive the degree of the composed polynomial $f(u(z))$ in the LCC construction, where $u(z)$ is the Lagrange-encoding polynomial of the $K$ inputs and $f$ has total degree $d$.
$f(u(z))$ has degree $d(K-1)$.
Compose degrees
$u(z)$ has degree $K-1$ as a polynomial in $z$ (it interpolates $K$ points). $f$, of total degree $d$, is applied to $u(z)$; the degree of the composition is $d(K-1)$.
Lagrange interpolation
Any polynomial of degree $d(K-1)$ is uniquely determined by $d(K-1)+1$ distinct evaluations. Hence $K_{\text{LCC}} = d(K-1)+1$.
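A numeric sanity check of the composition degree, using $f(t) = t^d$ as a simple degree-$d$ stand-in; $K$ and $d$ are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Check deg f(u(z)) = d*(K-1) with f(t) = t**d and a random u of degree K-1.
K, d = 5, 3
u = P(np.random.default_rng(1).standard_normal(K))  # degree K-1 = 4

composed = u ** d                                    # f(u(z)) for f(t) = t^d
print(composed.degree(), d * (K - 1))                # 12 12
```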
ex-ch08-08
Medium. For $N$ workers and a matrix multiplication with partition parameters $m$ and $n$ (so the output has $mn$ blocks), compare the recovery thresholds of: (a) standard polynomial code, (b) entangled polynomial code (MatDot variant), (c) LCC.
Only the standard polynomial code is optimal for general matmul at storage $1/m$ of $A$ and $1/n$ of $B$ per worker.
(a) Standard polynomial code
$K = mn$ at per-worker storage $1/m$ of $A$ and $1/n$ of $B$. Optimal at this storage.
(b) MatDot (entangled for matmul)
$K = 2p - 1$ at higher per-worker storage: $1/p$ of each matrix, partitioned along the inner dimension. Better threshold, but more storage per worker.
(c) LCC
Treating the blocked product as a generic degree-2 polynomial gives $K = 2(mn-1)+1 = 2mn-1$, worse than the standard polynomial code; LCC is not the right tool for this structured operation.
Conclusion
For matmul at this storage point, use the standard polynomial code. MatDot buys a smaller threshold at the cost of more storage. LCC is not the right tool here.
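The three thresholds side by side for illustrative partition parameters; the LCC line uses the generic degree-2 modeling assumed in part (c).

```python
# Threshold comparison for one blocked matmul (illustrative m, n, p).
m = n = p = 2

K_poly   = m * n               # standard polynomial code
K_matdot = 2 * p - 1           # MatDot, at higher per-worker storage
K_lcc    = 2 * (m * n - 1) + 1 # generic degree-2 LCC modeling (assumed)

print(K_poly, K_matdot, K_lcc)  # 4 3 7
```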
ex-ch08-09
Medium. Propose a polynomial approximation of the sigmoid function on a bounded symmetric interval with degree at most 5. Discuss the tradeoff.
Taylor series around 0 or Chebyshev approximation.
Chebyshev series, degree 5
Fit a degree-5 Chebyshev series to $\sigma(x) = 1/(1+e^{-x})$ on the interval; by the odd symmetry of $\sigma(x) - \tfrac{1}{2}$, only the constant and the odd-degree terms are nonzero. (For comparison, the degree-5 Taylor polynomial about 0 is $\tfrac{1}{2} + \tfrac{x}{4} - \tfrac{x^3}{48} + \tfrac{x^5}{480}$.)
Error
Max error on the interval: on the order of $10^{-2}$ (about 1% of the output range). Higher-degree approximations reduce the error at the cost of LCC recovery-threshold overhead.
LCC implication
With degree 5, $K_{\text{LCC}} = 5(K-1)+1$. For $K$ inputs, that is $5K-4$ responses: a substantial overhead. Use this only when coded, straggler-resilient sigmoid evaluation is critical and the polynomial-approximation error is acceptable.
When to use
Privacy-preserving logistic regression, where the sigmoid appears once per prediction and polynomial approximation (degree 3–5) is standard practice.
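A sketch of the degree-5 Chebyshev fit; the interval $[-4, 4]$ is an illustrative choice, not taken from the exercise statement.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Degree-5 least-squares Chebyshev fit of the sigmoid on [-4, 4].
xs = np.linspace(-4.0, 4.0, 4001)
sigmoid = 1.0 / (1.0 + np.exp(-xs))

coeffs = C.chebfit(xs, sigmoid, deg=5)
err = np.abs(C.chebval(xs, coeffs) - sigmoid).max()
print(f"max |error| on [-4,4]: {err:.4f}")  # on the order of 1e-2
```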
ex-ch08-10
Medium. For a 4-way tensor contraction (a matrix-multiplication-like contraction on tensors of order 3), propose an entangled polynomial code achieving the analogous recovery threshold under multi-axis partitioning.
Each tensor has three partition axes.
Encoding
Each tensor gets a polynomial encoding with one exponent stride per free axis and a shared stride for the contracted axis. The product polynomial has degree equal to the sum of the two encoding degrees, with the contracted index counted only once.
Recovery threshold
The total degree of the product polynomial, treated as a single-variable polynomial (by interleaving exponents across axes), plus one, gives the recovery threshold $K$.
Status
Yu et al. (2020) give the precise construction for 3-tensor contractions; 4-way contractions follow the same pattern.
ex-ch08-11
Hard. Prove (sketch) the LCC lower bound $K^* \ge d(K-1)+1$ via an entropy argument on polynomial values.
Count degrees of freedom.
Counting argument
The composed polynomial $f(u(z))$ has degree $d(K-1)$. Its coefficient vector has $d(K-1)+1$ independent entries. Any scheme recovering this polynomial requires at least that many linearly independent worker responses.
Output vs. polynomial coefficients
The target is one evaluation of this polynomial; recovering it generically requires recovering the whole polynomial. Hence $K^* \ge d(K-1)+1$.
Tight achievability
The LCC construction matches this bound with equality via Lagrange interpolation on the worker evaluations. The full proof is in Yu et al. 2019, Thm. 1.
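A degrees-of-freedom check of the counting argument: with only $d(K-1)$ evaluations, the interpolation system is rank-deficient. $K$ and $d$ are illustrative.

```python
import numpy as np

# A degree d*(K-1) polynomial has d*(K-1)+1 coefficients, so d*(K-1)
# evaluations leave an underdetermined linear system.
K, d = 4, 2
deg = d * (K - 1)                        # degree of f(u(z)) = 6

xs = np.arange(1, deg + 1, dtype=float)  # only deg = 6 evaluation points
V = np.vander(xs, deg + 1, increasing=True)
print(V.shape, np.linalg.matrix_rank(V))  # (6, 7), rank 6 < 7 unknowns
```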
ex-ch08-12
Hard. Describe how LCC composes with $T$-privacy (hiding the inputs from any $T$ colluding workers). What is the recovery-threshold cost?
Add $T$ random masks to the Lagrange polynomial.
Construction
Extend the Lagrange polynomial with $T$ additional random terms: $u(z) = \sum_{k=1}^{K} X_k \ell_k(z) + \sum_{t=1}^{T} Z_t \ell_{K+t}(z)$, where the $Z_t$ are uniformly random vectors and the $\ell_{K+t}$ are Lagrange bases on $T$ additional "masking" points.
New degree
$u$ has degree $K+T-1$. $f(u(z))$ has degree $d(K+T-1)$.
Recovery threshold
$K_{\text{LCC}}^{(T)} = d(K+T-1)+1$. Cost: $dT$ extra responses for $T$-privacy.
Comparison
The privacy cost scales linearly in $T$ and in $d$; for high-degree functions, privacy is especially expensive. Soleymani et al. 2021 formalize this result.
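A minimal real-valued sketch of the masked construction (actual privacy requires finite-field arithmetic; this only checks the threshold arithmetic), with illustrative $K = 2$, $T = 1$, and $f(t) = t^2$.

```python
import numpy as np

# Masked Lagrange encoding over the reals: K = 2 secrets, T = 1 mask, d = 2.
rng = np.random.default_rng(2)
X, Z = rng.standard_normal(2), rng.standard_normal(1)

alphas = np.array([0.0, 1.0, 2.0])       # K input points + T masking points
u = np.polyfit(alphas, np.concatenate([X, Z]), deg=2)  # degree K+T-1 = 2

f = lambda t: t ** 2                     # degree-2 target function
betas = np.linspace(3, 7, 5)             # d*(K+T-1)+1 = 5 worker points
responses = f(np.polyval(u, betas))      # what the 5 fastest workers return

recovered = np.polyfit(betas, responses, deg=4)        # interpolate f(u(z))
print(np.allclose(np.polyval(recovered, alphas[:2]), f(X)))  # True
```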
ex-ch08-13
Hard. Discuss the "polynomial approximation + LCC" strategy for non-polynomial deep learning: identify the tradeoffs between approximation degree, accumulated error across layers, and total recovery-threshold overhead.
Approximation error
A degree-$d$ Chebyshev approximation of a smooth non-polynomial function has max error decaying exponentially in $d$. For deep networks with $L$ layers, the accumulated error scales as $O(L\epsilon)$ to first order, where $\epsilon$ is the per-layer error.
Recovery threshold
Per-layer LCC threshold: $d(K-1)+1$. Coding a deep network with $L$ layers end to end makes the effective degree $d^L$ (composition), so the overall LCC threshold becomes $d^L(K-1)+1$: exponential in the depth!
Per-layer vs. end-to-end
Per-layer coding (one LCC instance per layer) avoids the degree explosion; the overall overhead is $L$ times the per-layer overhead. Much more practical.
Tradeoff
Choose $d$ from the per-layer error budget and the per-layer LCC overhead. For moderate depth $L$ and small degree $d$, the per-layer threshold overhead is modest, but the accumulated error may be unacceptable. Research direction: end-to-end analysis of polynomially approximated deep learning.
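The end-to-end vs. per-layer overhead for illustrative $d$, $L$, $K$, using the formulas above.

```python
# End-to-end vs. per-layer LCC overhead (illustrative d, L, K).
d, L, K = 3, 4, 10

end_to_end = d ** L * (K - 1) + 1   # composed degree d**L
per_layer  = L * (d * (K - 1) + 1)  # L independent LCC instances

print(end_to_end, per_layer)        # 730 112
```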
ex-ch08-14
Hard. Sketch the composition of coded convolution (Chapter 8, §8.2) with coded gradients (Chapter 6) for distributed CNN training. What is the per-layer recovery threshold?
Per-layer structure
Forward: convolution → activation → ... → loss. Backward: loss → derivative of activation → transposed convolution → ... Each convolution is codable via entangled polynomial codes (threshold $K_a + K_b - 1$). Each activation is non-polynomial (replicated or approximated).
Gradient coding integration
The per-layer gradient aggregates over mini-batches. Gradient coding (§6.2) applies to the aggregation; the per-layer convolutions are coded independently.
Per-layer threshold
For a coded convolution combined with $s$-straggler gradient aggregation: $K_{\text{conv}} = K_a + K_b - 1$ and $K_{\text{grad}} = N - s$. The overall per-layer operation waits for $\max(K_{\text{conv}}, K_{\text{grad}})$; typically the gradient-coding requirement is the more relaxed of the two, so $K_{\text{conv}}$ dominates.
Open direction
Jointly optimizing across layers (re-balancing partition counts per layer to minimize end-to-end latency) is an active research problem.
ex-ch08-15
Challenge. Open problem. Propose an approximate coded-computing framework that handles general non-polynomial functions with bounded error. What information-theoretic quantities would characterize the optimal rate-accuracy tradeoff?
Think rate-distortion theory for coded computing.
Framework sketch
Define: input dimension $n$, function $f$ with Lipschitz constant $L$, approximation tolerance $\epsilon$, number of workers $N$. Goal: achieve $\|\hat{f} - f\| \le \epsilon$ at the minimum recovery threshold $K$.
Rate-distortion analogy
Define a "coded rate" . The optimal rate as a function of and is . For polynomial , (exact LCC). For non-polynomial with Lipschitz , the rate-distortion curve is an open characterization.
Partial results
Jahani-Nezhad & Maddah-Ali 2022 (Berrut Approximated Coded Computing) give partial answers using rational-function approximations. The optimal rate-distortion curve is a research frontier.
Status
This is one of the open problems of Chapter 18. Interested researchers should combine rate-distortion theory with coded-computing information theory, explicit constructions, and convergence analysis.