The Subpacketization Bottleneck

Why MAN Does Not Scale to Large KK

The MAN scheme has a serious practical limitation that has kept it out of production systems: subpacketization grows exponentially in the number of users. Each file must be split into (Kt)\binom{K}{t} subfiles, and each subfile must be tracked, transmitted, and decoded. For K=50K = 50, t=10t = 10: (5010)1010\binom{50}{10} \approx 10^{10} subfiles per file. At this scale, individual subfiles are too small to contain useful data (each ~1 bit), and the coding overhead overwhelms the savings.

This chapter addresses the subpacketization bottleneck. The CommIT contribution — graph-coloring delivery schemes — achieves most of the MAN coded-multicasting gain with polynomial subpacketization F=O(Kc)F = O(K^c) for constant cc. This makes coded caching deployable at realistic CDN scale (K100-1000K \approx 100\text{-}1000 users per cluster).

Theorem: MAN Subpacketization Is Exponential

The MAN scheme requires subpacketization F=(Kt)F = \binom{K}{t} where t=Kμt = K \mu. Using Stirling's approximation, F  =  (KKμ)    2Khb(μ)/KKμK(1μ),F \;=\; \binom{K}{K\mu} \;\geq\; 2^{K h_b(\mu)} / \sqrt{K \cdot K\mu \cdot K(1-\mu)}, where hb(μ)=μlog2μ(1μ)log2(1μ)h_b(\mu) = -\mu \log_2 \mu - (1-\mu)\log_2(1-\mu) is the binary entropy function. For any fixed μ(0,1)\mu \in (0, 1), FF grows exponentially in KK.

(Kt)\binom{K}{t} is the number of subsets of size tt from a KK-element set. For t=K/2t = K/2, this is (KK/2)2K\binom{K}{K/2} \sim 2^K — doubling KK squares FF. For t=K/10t = K/10: F2Khb(0.1)20.47KF \sim 2^{K h_b(0.1)} \approx 2^{0.47 K} — still exponential.

Subpacketization: MAN vs PDA vs Graph-Coloring

Log-scale plot of required subpacketization FF vs KK for three schemes at fixed target tt: MAN (exponential), PDA (polynomial), graph-coloring (K2K^2). At K50K \approx 50 MAN becomes infeasible for typical file sizes; PDA and graph-coloring remain practical to K1000K \approx 1000+.

Parameters
2
100

Subpacketization Feasibility vs File Size

For fixed memory ratio, plot the subpacketization of MAN (red) and PDA (blue) vs KK. Horizontal line at file size (in bits): above this line, subfiles become smaller than practical granularity. MAN crosses this line at modest KK; PDA stays feasible at much larger scales.

Parameters
0.2
8

Example: Where Does MAN Break?

A 1 GB video file (8×1098 \times 10^9 bits). Minimum practical subfile size: 1 KB (8×1038 \times 10^3 bits) due to header overhead. Max practical F=106F = 10^6. At what KK does MAN break at μ=0.1\mu = 0.1?

Key Takeaway

MAN's subpacketization (Kt)\binom{K}{t} limits practical deployment to K50K \approx 50. For larger networks, polynomial- subpacketization schemes are essential. The CommIT graph- coloring schemes (§14.3) and PDA framework (§14.2) are the answers.

🚨Critical Engineering Note

Subpacketization in Practice

Realistic numbers for subpacketization planning:

  1. File granularity. HTTP byte-range minimum: 1 KB per subfile. Smaller than this, header overhead dominates.
  2. Feasible file sizes. Video: 100 MB-2 GB per file. Software: 10 MB-1 GB. Documents: 1 MB-100 MB.
  3. Max feasible FF: 10^4 for small files, 10^6 for large.
  4. Max practical KK at μ=0.1\mu = 0.1: with MAN, K50K \leq 50. With PDA, K1000K \leq 1000+.
  5. Subfile indexing overhead. Each subfile has a label (log2F\log_2 F bits of metadata). For large FF, this is non-negligible.

Production coded-caching deployments must plan subpacketization at design time. Picking the wrong subpacketization scheme leads to 100× overhead penalty at runtime.

Practical Constraints
  • HTTP byte-range min: 1 KB

  • Subfile label: ~log_2 F bytes per subfile

  • MAN feasible only for K ≤ 50 at μ = 0.1

  • PDA / graph-coloring extend to K = 1000+

Common Mistake: Theoretical Rates Hide Subpacketization

Mistake:

Citing MAN rate R=K(1μ)/(1+Kμ)R = K(1-\mu)/(1+K\mu) as achievable for a 1000-user network without addressing subpacketization.

Correction:

At K=1000K = 1000, μ=0.1\mu = 0.1: MAN subpacketization is astronomical. The rate formula is mathematically correct but operationally unachievable without a different scheme. Honest presentation: "MAN achieves rate RR if each file can be split into (Kt)\binom{K}{t} subfiles; this is feasible only for K50K \leq 50."

For large KK, polynomial-subpacketization schemes achieve a constant-factor approximation of the MAN rate. The relevant achievable rate is typically 1.31.3-22× the MAN rate, not the MAN rate itself.