Exercises
ex-ch08-01
Easy. For coded convolution with $a$ split into $K_a$ chunks and $b$ into $K_b$ chunks, compute the recovery threshold for both the standard polynomial code and the entangled polynomial code.
Standard: $K_a K_b$; entangled: $K_a + K_b - 1$.
Both
Standard polynomial code: $K_{\text{std}} = K_a K_b$. Entangled polynomial code: $K_{\text{ent}} = K_a + K_b - 1$. A reduction in recovery threshold at the same per-worker storage.
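A minimal numeric sketch of the two thresholds, assuming the formulas reconstructed above; the chunk counts `Ka` and `Kb` are illustrative, not taken from the exercise statement.

```python
# Recovery thresholds for coded convolution (illustrative chunk counts).
Ka, Kb = 3, 4

K_std = Ka * Kb        # standard code: one response per pairwise chunk product
K_ent = Ka + Kb - 1    # entangled code: deg(p_a * p_b) = (Ka-1)+(Kb-1), plus 1

print(K_std, K_ent)    # 12 6
```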
ex-ch08-02
Easy. For a function that is a diagonal quadratic in each input, $f(x) = \sum_i a_i x_i^2$, compute the LCC recovery threshold.
$f$ is quadratic in each $x_i$; total degree $d = 2$.
Degree
$d = 2$ (quadratic in each variable, but each term involves only one variable, so the total degree stays 2). Number of inputs: $K$.
LCC threshold
$K_{\text{LCC}} = d(K-1) + 1 = 2(K-1) + 1 = 2K - 1$.
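A one-line sketch of the threshold arithmetic, assuming the generic LCC formula $K_{\text{LCC}} = d(K-1)+1$; the input count used here is illustrative.

```python
def lcc_threshold(d: int, K: int) -> int:
    """Generic LCC recovery threshold for a degree-d polynomial on K inputs."""
    return d * (K - 1) + 1

print(lcc_threshold(d=2, K=10))  # 19 responses for the diagonal quadratic
```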
ex-ch08-03
Easy. State why ReLU cannot be exactly computed via LCC at bounded per-worker storage and bounded recovery threshold.
Non-polynomial
ReLU is piecewise-linear but not polynomial; it cannot be expressed as a finite-degree polynomial on $\mathbb{R}$. LCC requires $f$ to be a polynomial of finite degree.
Workarounds
(i) Approximate ReLU by a low-degree polynomial and apply LCC, accepting bounded error (see the sketch below); (ii) use cryptographic MPC (garbled circuits) for the ReLU step separately; (iii) replicate the ReLU layer across workers (hybrid coding).
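A sketch of workaround (i), assuming a least-squares Chebyshev fit via `numpy`; the interval $[-1, 1]$ and degree 4 are illustrative choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Fit a low-degree polynomial to ReLU on a bounded interval (workaround (i)).
xs = np.linspace(-1.0, 1.0, 2001)
relu = np.maximum(xs, 0.0)

coeffs = C.chebfit(xs, relu, deg=4)   # least-squares Chebyshev fit
approx = C.chebval(xs, coeffs)

print(f"max |error| on [-1,1]: {np.abs(approx - relu).max():.4f}")
```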
ex-ch08-04
Easy. Compare the LCC recovery threshold for bilinear ($d = 2$) vs. cubic ($d = 3$) functions with $K$ inputs.
Both cases: $K_{\text{LCC}} = d(K-1)+1$.
Bilinear
$K_{\text{LCC}} = 2(K-1)+1 = 2K-1$.
Cubic
$K_{\text{LCC}} = 3(K-1)+1 = 3K-2$.
Ratio
Cubic requires $(3K-2)/(2K-1) \approx 3/2$ times as many responses for large $K$: the threshold scales linearly with the function degree.
ex-ch08-05
Medium. Construct an entangled polynomial code for convolution with $a$ having $K_a = 3$ chunks and $b$ having $K_b = 4$ chunks. Specify the encoding polynomials and verify the recovery threshold.
Encoding: $p_a(x) = \sum_{i=0}^{2} a_i x^i$, $p_b(x) = \sum_{j=0}^{3} b_j x^j$.
Encoding
$p_a(x) = a_0 + a_1 x + a_2 x^2$ (degree 2). $p_b(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3$ (degree 3).
Product
$p_a(x)\,p_b(x)$ has degree 5, so 6 evaluations suffice; its coefficients $\sum_{i+j=k} a_i b_j$ are exactly the chunk-level convolution $a * b$.
Recovery threshold
$K = 6$.
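A sketch verifying the construction end to end with scalar chunks, assuming real-valued arithmetic; the worker count and the straggler pattern are illustrative.

```python
import numpy as np

# Verify the Ka=3, Kb=4 coded-convolution construction with scalar chunks.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(4)

N = 9                                   # illustrative number of workers
xs = np.arange(1, N + 1, dtype=float)   # distinct evaluation points

# Worker i computes p_a(x_i) * p_b(x_i); polyval wants high-order-first coeffs.
responses = np.polyval(a[::-1], xs) * np.polyval(b[::-1], xs)

# Master interpolates the degree-5 product from any 6 responses.
survivors = [0, 2, 3, 5, 7, 8]          # 3 stragglers dropped
V = np.vander(xs[survivors], 6, increasing=True)
coeffs = np.linalg.solve(V, responses[survivors])

assert np.allclose(coeffs, np.convolve(a, b))  # coefficients = a * b
print(coeffs)
```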
ex-ch08-06
Medium. Why is LCC a constant factor worse than specialized codes (polynomial, entangled) for common operations? Give an example where LCC is attractive despite this penalty.
LCC is optimal for generic polynomials; it does not exploit operation-specific structure.
Reason
LCC treats the target as a generic polynomial of total degree $d$ and pays the generic threshold $d(K-1)+1$. Specialized codes exploit operation-specific algebraic structure (e.g., convolution's sparse monomial basis) to reduce the effective degree.
When LCC wins
Higher-degree operations (quartic or quintic polynomial activations in privacy-preserving neural networks), arbitrary multi-tensor contractions, or federated computation of higher-order statistics (skewness, kurtosis).
Practical recommendation
Use specialized codes per operation type; fall back to LCC only for operations without a specialized scheme. Production frameworks dispatch on operation type.
ex-ch08-07
Medium. Derive the degree of the composed polynomial $f(u(z))$ in the LCC construction, where $u(z)$ is the Lagrange-encoding polynomial of the $K$ inputs and $f$ has total degree $d$.
$f(u(z))$ has degree $d(K-1)$.
Compose degrees
$u(z)$ has degree $K-1$ as a polynomial in $z$ (it interpolates $K$ points). $f$, of total degree $d$, is applied to $u(z)$; the degree of the composition is $d(K-1)$.
Lagrange interpolation
Any polynomial of degree $d(K-1)$ is uniquely determined by $d(K-1)+1$ distinct evaluations. Hence $K_{\text{LCC}} = d(K-1)+1$.
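A numeric sanity check of the composition degree, using $f(t) = t^d$ as a simple degree-$d$ stand-in; $K$ and $d$ are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Check deg f(u(z)) = d*(K-1) with f(t) = t**d and a random u of degree K-1.
K, d = 5, 3
u = P(np.random.default_rng(1).standard_normal(K))  # degree K-1 = 4

composed = u ** d                                    # f(u(z)) for f(t) = t^d
print(composed.degree(), d * (K - 1))                # 12 12
```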
ex-ch08-08
Medium. For $N$ workers and a matrix multiplication with partition parameters $m$ and $n$ (so the output has $mn$ blocks), compare the recovery thresholds of: (a) standard polynomial code, (b) entangled polynomial code (MatDot variant), (c) LCC.
Only the standard polynomial code is optimal for general matmul at storage $1/m$ of $A$ and $1/n$ of $B$ per worker.
(a) Standard polynomial code
$K = mn$ at per-worker storage $1/m$ of $A$ and $1/n$ of $B$. Optimal at this storage.
(b) MatDot (entangled for matmul)
$K = 2p - 1$ at higher per-worker storage: $1/p$ of each matrix, partitioned along the inner dimension. Better threshold, but more storage per worker.
(c) LCC
Treating the blocked product as a generic degree-2 polynomial gives $K = 2(mn-1)+1 = 2mn-1$, worse than the standard polynomial code; LCC is not the right tool for this structured operation.
Conclusion
For matmul at this storage point, use the standard polynomial code. MatDot buys a smaller threshold at the cost of more storage. LCC is not the right tool here.
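The three thresholds side by side for illustrative partition parameters; the LCC line uses the generic degree-2 modeling assumed in part (c).

```python
# Threshold comparison for one blocked matmul (illustrative m, n, p).
m = n = p = 2

K_poly   = m * n               # standard polynomial code
K_matdot = 2 * p - 1           # MatDot, at higher per-worker storage
K_lcc    = 2 * (m * n - 1) + 1 # generic degree-2 LCC modeling (assumed)

print(K_poly, K_matdot, K_lcc)  # 4 3 7
```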
ex-ch08-09
Medium. Propose a polynomial approximation of the sigmoid function on a bounded symmetric interval with degree at most 5. Discuss the tradeoff.
Taylor series around 0 or Chebyshev approximation.
Chebyshev series, degree 5
Fit a degree-5 Chebyshev series to $\sigma(x) = 1/(1+e^{-x})$ on the interval; by the odd symmetry of $\sigma(x) - \tfrac{1}{2}$, only the constant and the odd-degree terms are nonzero. (For comparison, the degree-5 Taylor polynomial about 0 is $\tfrac{1}{2} + \tfrac{x}{4} - \tfrac{x^3}{48} + \tfrac{x^5}{480}$.)
Error
Max error on the interval: on the order of $10^{-2}$ (about 1% of the output range). Higher-degree approximations reduce the error at the cost of LCC recovery-threshold overhead.
LCC implication
With degree 5, $K_{\text{LCC}} = 5(K-1)+1$. For $K$ inputs, that is $5K-4$ responses: a substantial overhead. Use this only when coded, straggler-resilient sigmoid evaluation is critical and the polynomial-approximation error is acceptable.
When to use
Privacy-preserving logistic regression, where the sigmoid appears once per prediction and polynomial approximation (degree 3–5) is standard practice.
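A sketch of the degree-5 Chebyshev fit; the interval $[-4, 4]$ is an illustrative choice, not taken from the exercise statement.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Degree-5 least-squares Chebyshev fit of the sigmoid on [-4, 4].
xs = np.linspace(-4.0, 4.0, 4001)
sigmoid = 1.0 / (1.0 + np.exp(-xs))

coeffs = C.chebfit(xs, sigmoid, deg=5)
err = np.abs(C.chebval(xs, coeffs) - sigmoid).max()
print(f"max |error| on [-4,4]: {err:.4f}")  # on the order of 1e-2
```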
ex-ch08-10
Medium. For a 4-way tensor contraction (a matrix-multiplication-like contraction on tensors of order 3), propose an entangled polynomial code achieving the analogous recovery threshold under multi-axis partitioning.
Each tensor has three partition axes.
Encoding
Each tensor gets a polynomial encoding with one exponent stride per free axis and a shared stride for the contracted axis. The product polynomial has degree equal to the sum of the two encoding degrees, with the contracted index counted only once.
Recovery threshold
The total degree of the product polynomial, treated as a single-variable polynomial (by interleaving exponents across axes), plus one, gives the recovery threshold $K$.
Status
Yu et al. (2020) give the precise construction for 3-tensor contractions; 4-way contractions follow the same pattern.
ex-ch08-11
Hard. Prove (sketch) the LCC lower bound $K^* \ge d(K-1)+1$ via an entropy argument on polynomial values.
Count degrees of freedom.
Counting argument
The composed polynomial $f(u(z))$ has degree $d(K-1)$. Its coefficient vector has $d(K-1)+1$ independent entries. Any scheme recovering this polynomial requires at least that many linearly independent worker responses.
Output vs. polynomial coefficients
The target is one evaluation of this polynomial; recovering it generically requires recovering the whole polynomial. Hence $K^* \ge d(K-1)+1$.
Tight achievability
The LCC construction matches this bound with equality via Lagrange interpolation on the worker evaluations. The full proof is in Yu et al. 2019, Thm. 1.
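A degrees-of-freedom check of the counting argument: with only $d(K-1)$ evaluations, the interpolation system is rank-deficient. $K$ and $d$ are illustrative.

```python
import numpy as np

# A degree d*(K-1) polynomial has d*(K-1)+1 coefficients, so d*(K-1)
# evaluations leave an underdetermined linear system.
K, d = 4, 2
deg = d * (K - 1)                        # degree of f(u(z)) = 6

xs = np.arange(1, deg + 1, dtype=float)  # only deg = 6 evaluation points
V = np.vander(xs, deg + 1, increasing=True)
print(V.shape, np.linalg.matrix_rank(V))  # (6, 7), rank 6 < 7 unknowns
```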
ex-ch08-12
Hard. Describe how LCC composes with $T$-privacy (hiding the inputs from any $T$ colluding workers). What is the recovery-threshold cost?
Add $T$ random masks to the Lagrange polynomial.
Construction
Extend the Lagrange polynomial with $T$ additional random terms: $u(z) = \sum_{k=1}^{K} X_k \ell_k(z) + \sum_{t=1}^{T} Z_t \ell_{K+t}(z)$, where the $Z_t$ are uniformly random vectors and the $\ell_{K+t}$ are Lagrange bases on $T$ additional "masking" points.
New degree
$u$ has degree $K+T-1$. $f(u(z))$ has degree $d(K+T-1)$.
Recovery threshold
$K_{\text{LCC}}^{(T)} = d(K+T-1)+1$. Cost: $dT$ extra responses for $T$-privacy.
Comparison
The privacy cost scales linearly in $T$ and in $d$; for high-degree functions, privacy is especially expensive. Soleymani et al. 2021 formalize this result.
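A minimal real-valued sketch of the masked construction (actual privacy requires finite-field arithmetic; this only checks the threshold arithmetic), with illustrative $K = 2$, $T = 1$, and $f(t) = t^2$.

```python
import numpy as np

# Masked Lagrange encoding over the reals: K = 2 secrets, T = 1 mask, d = 2.
rng = np.random.default_rng(2)
X, Z = rng.standard_normal(2), rng.standard_normal(1)

alphas = np.array([0.0, 1.0, 2.0])       # K input points + T masking points
u = np.polyfit(alphas, np.concatenate([X, Z]), deg=2)  # degree K+T-1 = 2

f = lambda t: t ** 2                     # degree-2 target function
betas = np.linspace(3, 7, 5)             # d*(K+T-1)+1 = 5 worker points
responses = f(np.polyval(u, betas))      # what the 5 fastest workers return

recovered = np.polyfit(betas, responses, deg=4)        # interpolate f(u(z))
print(np.allclose(np.polyval(recovered, alphas[:2]), f(X)))  # True
```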
ex-ch08-13
Hard. Discuss the "polynomial approximation + LCC" strategy for non-polynomial deep learning: identify the tradeoffs between approximation degree, accumulated error across layers, and total recovery-threshold overhead.
Approximation error
A degree-$d$ Chebyshev approximation of a smooth non-polynomial function has max error decaying exponentially in $d$. For deep networks with $L$ layers, the accumulated error scales as $O(L\epsilon)$ to first order, where $\epsilon$ is the per-layer error.
Recovery threshold
Per-layer LCC threshold: $d(K-1)+1$. Coding a deep network with $L$ layers end to end makes the effective degree $d^L$ (composition), so the overall LCC threshold becomes $d^L(K-1)+1$: exponential in the depth!
Per-layer vs. end-to-end
Per-layer coding (one LCC instance per layer) avoids the degree explosion; the overall overhead is $L$ times the per-layer overhead. Much more practical.
Tradeoff
Choose $d$ from the per-layer error budget and the per-layer LCC overhead. For moderate depth $L$ and small degree $d$, the per-layer threshold overhead is modest, but the accumulated error may be unacceptable. Research direction: end-to-end analysis of polynomially approximated deep learning.
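The end-to-end vs. per-layer overhead for illustrative $d$, $L$, $K$, using the formulas above.

```python
# End-to-end vs. per-layer LCC overhead (illustrative d, L, K).
d, L, K = 3, 4, 10

end_to_end = d ** L * (K - 1) + 1   # composed degree d**L
per_layer  = L * (d * (K - 1) + 1)  # L independent LCC instances

print(end_to_end, per_layer)        # 730 112
```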
ex-ch08-14
Hard. Sketch the composition of coded convolution (Chapter 8, §8.2) with coded gradients (Chapter 6) for distributed CNN training. What is the per-layer recovery threshold?
Per-layer structure
Forward: convolution → activation → ... → loss. Backward: loss → derivative of activation → transposed convolution → ... Each convolution is codable via entangled polynomial codes (threshold $K_a + K_b - 1$). Each activation is non-polynomial (replicated or approximated).
Gradient coding integration
The per-layer gradient aggregates over mini-batches. Gradient coding (§6.2) applies to the aggregation; the per-layer convolutions are coded independently.
Per-layer threshold
For a coded convolution combined with $s$-straggler gradient aggregation: $K_{\text{conv}} = K_a + K_b - 1$ and $K_{\text{grad}} = N - s$. The overall per-layer operation waits for $\max(K_{\text{conv}}, K_{\text{grad}})$; typically the gradient-coding requirement is the more relaxed of the two, so $K_{\text{conv}}$ dominates.
Open direction
Jointly optimizing across layers (re-balancing partition counts per layer to minimize end-to-end latency) is an active research problem.
ex-ch08-15
Challenge. Open problem. Propose an approximate coded-computing framework that handles general non-polynomial functions with bounded error. What information-theoretic quantities would characterize the optimal rate-accuracy tradeoff?
Think rate-distortion theory for coded computing.
Framework sketch
Define: input dimension $n$, function $f$ with Lipschitz constant $L$, approximation tolerance $\epsilon$, number of workers $N$. Goal: achieve $\|\hat{f} - f\| \le \epsilon$ at the minimum recovery threshold $K$.
Rate-distortion analogy
Define a "coded rate" . The optimal rate as a function of and is . For polynomial , (exact LCC). For non-polynomial with Lipschitz , the rate-distortion curve is an open characterization.
Partial results
Jahani-Nezhad & Maddah-Ali 2022 (Berrut Approximated Coded Computing) give partial answers using rational-function approximations. The optimal rate-distortion curve is a research frontier.
Status
This is one of the open problems of Chapter 18. Interested researchers should combine rate-distortion theory with coded-computing information theory, explicit constructions, and convergence analysis.