Centralized vs. Distributed Processing

The Central Question of Cell-Free Processing

Chapters 11 and 12 established that cell-free massive MIMO eliminates cell boundaries and that user-centric clustering makes the architecture scalable. But we left a fundamental question unanswered: how should the distributed APs process the received signals? At one extreme, every AP forwards its raw baseband samples to the CPU, which performs centralized MMSE combining: optimal, but demanding enormous fronthaul capacity. At the other extreme, each AP applies local combining and forwards only a scalar estimate per user: minimal fronthaul, but suboptimal performance. This chapter develops the full spectrum between these extremes and identifies when each operating point is appropriate.

Definition: Centralized Processing

Consider a cell-free network with $M$ APs, each equipped with $N$ antennas, serving $K$ single-antenna users. Let $\mathbf{y}_m \in \mathbb{C}^N$ denote the received signal at AP $m$. In centralized processing, the CPU collects $\{\mathbf{y}_1, \ldots, \mathbf{y}_M\}$ and forms the network-wide received vector

$$\mathbf{y} = \begin{bmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_M \end{bmatrix} \in \mathbb{C}^{MN}$$

The CPU then applies a centralized combining vector $\mathbf{v}_k \in \mathbb{C}^{MN}$ to detect user $k$:

$$\hat{s}_k = \mathbf{v}_{k}^{H} \mathbf{y}$$

The centralized MMSE combining vector is

$$\mathbf{v}_{k}^{\text{c-MMSE}} = \left( \sum_{j=1}^{K} p_j \hat{\mathbf{g}}_j \hat{\mathbf{g}}_j^H + \mathbf{C}_{\mathbf{y}} + \sigma^2 \mathbf{I}_{MN} \right)^{-1} \hat{\mathbf{g}}_k$$

where $\hat{\mathbf{g}}_k = [\hat{\mathbf{g}}_{1k}^T, \ldots, \hat{\mathbf{g}}_{Mk}^T]^T$ is the stacked channel estimate and $\mathbf{C}_{\mathbf{y}} = \sum_{j=1}^{K} p_j \mathbf{C}_j$ is the total estimation-error covariance, with $\mathbf{C}_j$ the per-user error covariance defined in the theorem below.

Centralized MMSE is optimal in the sense that it maximizes the per-user SINR under the UatF framework. The price is that every AP must forward $N$ complex samples per channel use to the CPU.
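As a concrete sketch of this combiner, the snippet below forms $\mathbf{v}_k^{\text{c-MMSE}}$ with a single linear solve over the $MN$-dimensional space. The dimensions, i.i.d. channel estimates, and identity-scaled error covariance are illustrative stand-ins, not the chapter's channel model:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 10, 4, 5        # toy sizes: APs, antennas per AP, users
MN = M * N
p = np.ones(K)            # uplink transmit powers
sigma2 = 1.0              # noise power

# Stand-in channel estimates: column k stacks the per-AP estimates g_hat_{mk}.
g_hat = (rng.standard_normal((MN, K)) + 1j * rng.standard_normal((MN, K))) / np.sqrt(2)
C_y = 0.1 * np.eye(MN)    # stand-in for the total estimation-error covariance

def c_mmse_combiner(k):
    """Centralized MMSE combining vector for user k (one weight per network antenna)."""
    A = C_y + sigma2 * np.eye(MN, dtype=complex)
    for j in range(K):
        A += p[j] * np.outer(g_hat[:, j], g_hat[:, j].conj())
    return np.linalg.solve(A, g_hat[:, k])

v0 = c_mmse_combiner(0)
print(v0.shape)           # (40,): the combiner spans all M*N = 40 network antennas
```

The solve replaces an explicit $MN \times MN$ matrix inversion, which is the dominant cost that makes this level of cooperation scale poorly with network size.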

Definition: Distributed Processing

In distributed processing, each AP $m$ applies a local combining vector $\mathbf{a}_{mk} \in \mathbb{C}^N$ to its own received signal:

$$\hat{s}_{mk} = \mathbf{a}_{mk}^H \mathbf{y}_m$$

AP $m$ then forwards the scalar $\hat{s}_{mk}$ (one complex number per user) to the CPU. The CPU forms the final estimate by linearly combining the local estimates:

$$\hat{s}_k = \sum_{m \in \mathcal{M}_k} \alpha_{mk} \, \hat{s}_{mk}$$

where $\alpha_{mk}$ are weighting coefficients and $\mathcal{M}_k$ is the set of APs serving user $k$.

Distributed processing reduces the fronthaul load from $MN$ complex samples to $\lvert\mathcal{M}_k\rvert$ complex scalars per user per channel use. The question is how much SINR we lose.
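The two-stage structure (local combining at each AP, scalar forwarding, linear recombination at the CPU) can be sketched as follows. The i.i.d. channels, normalized maximum-ratio local combiner, and equal CPU weights are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 10, 4, 5        # toy sizes: APs, antennas per AP, users
sigma2 = 0.1
k = 0                      # user of interest

# Illustrative i.i.d. channels: g[m] is the N x K channel matrix at AP m.
g = (rng.standard_normal((M, N, K)) + 1j * rng.standard_normal((M, N, K))) / np.sqrt(2)
s = np.exp(2j * np.pi * rng.random(K))          # unit-modulus user symbols
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
y = np.einsum('mnk,k->mn', g, s) + noise        # received signal at each AP

# Local combining: normalized maximum ratio as a simple stand-in for a_mk,
# so each local estimate is unbiased for s_k. Only ONE complex scalar per
# served user crosses the fronthaul, instead of N raw samples.
a = g[:, :, k] / (np.abs(g[:, :, k]) ** 2).sum(axis=1, keepdims=True)
s_local = np.einsum('mn,mn->m', a.conj(), y)

# CPU stage: equal weights alpha_mk = 1/M (LSFD would optimize these instead).
alpha = np.ones(M) / M
s_hat = alpha @ s_local
print(abs(s_hat - s[k]))                        # residual estimation error
```

Note that only `s_local` (one scalar per AP) reaches the CPU; the channel realizations `g` stay local to each AP.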

Centralized Processing

A cell-free processing architecture where all APs forward raw received signals (or sufficient statistics) to a central processing unit, which applies network-wide combining. Achieves the best SINR but requires the highest fronthaul capacity.

Related: Distributed Processing, Fronthaul

Distributed Processing

A cell-free processing architecture where each AP applies local combining and forwards only scalar estimates to the CPU. Minimizes fronthaul load at the cost of suboptimal interference suppression.

Related: Centralized Processing, Local Combining

Theorem: Centralized MMSE SINR (UatF Bound)

Under centralized MMSE combining with the UatF bound, the uplink SINR of user $k$ is

$$\text{SINR}_k^{(4)} = p_k \hat{\mathbf{g}}_k^H \left( \sum_{j \neq k} p_j \hat{\mathbf{g}}_j \hat{\mathbf{g}}_j^H + \sum_{j=1}^{K} p_j \mathbf{C}_j + \sigma^2 \mathbf{I}_{MN} \right)^{-1} \hat{\mathbf{g}}_k$$

where $\mathbf{C}_j = \mathrm{diag}(\mathbf{C}_{1j}, \ldots, \mathbf{C}_{Mj})$ is the block-diagonal estimation error covariance, $\mathbf{C}_{mj} = \beta_{mj} \mathbf{R}_{mj} - \gamma_{mj} \hat{\mathbf{R}}_{mj}$, and $\hat{\mathbf{R}}_{mj}$ depends on the pilot scheme.

The centralized MMSE receiver sees the entire $MN$-dimensional signal space and can optimally balance desired signal amplification against interference suppression across all APs simultaneously. This is the same MMSE receiver as in co-located massive MIMO, but now the antenna elements are distributed across the coverage area.
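The theorem can be checked numerically: the closed form equals the SINR achieved by the centralized MMSE combiner. The sketch below uses toy dimensions and identity-scaled error covariances (illustrative assumptions, not the chapter's channel model):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, K = 10, 4, 5
MN = M * N
p = np.ones(K)
sigma2 = 1.0
k = 0

# Illustrative stand-ins: i.i.d. channel estimates; each p_j*C_j term is 0.1*I.
g_hat = (rng.standard_normal((MN, K)) + 1j * rng.standard_normal((MN, K))) / np.sqrt(2)
sum_pC = K * 0.1 * np.eye(MN)     # sum_j p_j C_j with p_j = 1, C_j = 0.1*I

# B is the matrix inside the inverse: interference + estimation error + noise.
B = sum_pC + sigma2 * np.eye(MN, dtype=complex)
for j in range(K):
    if j != k:
        B += p[j] * np.outer(g_hat[:, j], g_hat[:, j].conj())

# Closed-form SINR of the theorem.
sinr_closed = (p[k] * g_hat[:, k].conj() @ np.linalg.solve(B, g_hat[:, k])).real

# The MMSE combiner v = (B + p_k g_k g_k^H)^{-1} g_k attains exactly this SINR
# (by the matrix inversion lemma it is proportional to B^{-1} g_k).
A = B + p[k] * np.outer(g_hat[:, k], g_hat[:, k].conj())
v = np.linalg.solve(A, g_hat[:, k])
sinr_v = (p[k] * abs(v.conj() @ g_hat[:, k]) ** 2 / (v.conj() @ B @ v)).real
print(sinr_closed, sinr_v)        # the two values agree
```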

Theorem: Distributed Processing SINR with Weighted Combining

Under distributed processing with local combining vectors $\{\mathbf{a}_{mk}\}$ and CPU weights $\{\alpha_{mk}\}$, the UatF SINR of user $k$ is

$$\text{SINR}_k^{\text{dist}} = \frac{p_k \left| \sum_{m \in \mathcal{M}_k} \alpha_{mk} \, \mathbb{E}[\mathbf{a}_{mk}^H \mathbf{g}_{mk}] \right|^2}{\sum_{j=1}^{K} p_j \sum_{m \in \mathcal{M}_k} |\alpha_{mk}|^2 \, \mathrm{Var}(\mathbf{a}_{mk}^H \mathbf{g}_{mj}) + \sigma^2 \sum_{m \in \mathcal{M}_k} |\alpha_{mk}|^2 \, \mathbb{E}[\|\mathbf{a}_{mk}\|^2]}$$

where the expectation is over small-scale fading. The denominator separates into beamforming gain uncertainty, inter-user interference, and noise amplification.

Each AP provides a noisy estimate of user $k$'s symbol. The quality of these estimates varies across APs due to different path losses and interference environments. The CPU weights $\alpha_{mk}$ should emphasize APs with strong, reliable estimates and de-emphasize APs with weak or interference-dominated signals.
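The moments in this bound can be estimated by Monte Carlo. The sketch below assumes a deliberately symmetric toy model (i.i.d. $\mathcal{CN}(0,1)$ channels, local maximum-ratio combining $\mathbf{a}_{mk} = \mathbf{g}_{mk}$, equal weights $\alpha_{mk} = 1/M$, all APs serving user $k$); these are my assumptions, chosen so the bound reduces to the closed form $pMN/(pK + \sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, K = 20, 4, 5
p, sigma2 = 1.0, 1.0
T = 100_000            # Monte Carlo channel realizations

# Symmetric toy model: i.i.d. CN(0,1) channels, local MR combining
# a_mk = g_mk, equal CPU weights alpha_mk = 1/M, all M APs serve user k = 0.
g = (rng.standard_normal((T, N, K)) + 1j * rng.standard_normal((T, N, K))) / np.sqrt(2)
a = g[:, :, 0]                                   # MR combiner for user 0

inner = np.einsum('tn,tnk->tk', a.conj(), g)     # a^H g_j per realization
mean_des = inner[:, 0].mean()                    # E[a^H g_k], approx N
var_j = inner.var(axis=0)                        # Var(a^H g_j) for each user j
e_norm = (np.abs(a) ** 2).sum(axis=1).mean()     # E[||a||^2], approx N

# Plug the estimated moments into the theorem (identical across APs here).
alpha = 1.0 / M
num = p * abs(M * alpha * mean_des) ** 2
den = p * M * alpha**2 * var_j.sum() + sigma2 * M * alpha**2 * e_norm
sinr_mc = num / den

# This symmetric model admits the closed form p*M*N / (p*K + sigma2).
print(sinr_mc, p * M * N / (p * K + sigma2))
```

The $MN$ factor in the numerator is the coherent array gain that survives even without instantaneous CSI at the CPU, which is why distributed MR remains viable at high AP density.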

Centralized vs. Distributed Processing

| Aspect | Centralized (Level 4) | Distributed (Levels 1–3) |
| --- | --- | --- |
| Fronthaul per AP | $N$ complex samples per channel use | $\lvert\mathcal{D}_m\rvert$ complex scalars per channel use |
| CPU computation | $O((MN)^2 K)$ for MMSE | $O(MK)$ for weighted sum |
| Interference suppression | Network-wide: suppresses all inter-user interference jointly | Local: suppresses only intra-AP interference |
| CSI requirement at CPU | $\hat{\mathbf{g}}_{mk}$ for all $m, k$ | Only large-scale statistics (for LSFD weights) |
| Scalability | Poor: matrix inversion scales with $MN$ | Good: per-AP computation bounded |
| Performance | Optimal (MMSE bound) | Depends on cooperation level (L1 < L2 < L3 < L4) |

Example: Fronthaul Load: Centralized vs. Distributed

Consider a cell-free network with $M = 100$ APs, each with $N = 4$ antennas, serving $K = 20$ users. Each AP serves a user-centric cluster of size $\lvert\mathcal{D}_m\rvert = 10$ users on average. Compare the fronthaul load per coherence block of $\tau_c = 200$ samples for centralized and distributed processing. Assume 32-bit floating point for real and imaginary parts (64 bits per complex sample).
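A direct way to work the example is to count bits (a sketch; since the pilot length $\tau_p$ is not specified, all $\tau_c$ samples are counted for both schemes):

```python
# Fronthaul bits per coherence block for the example's numbers.
M, N = 100, 4          # APs and antennas per AP
tau_c = 200            # samples per coherence block
D_m = 10               # average user-centric cluster size per AP
bits = 64              # bits per complex sample (2 x 32-bit floats)

# Centralized: each AP forwards N raw complex samples per channel use.
per_ap_central = N * tau_c * bits          # bits per AP per block
total_central = M * per_ap_central

# Distributed: each AP forwards one complex scalar per served user per
# channel use (pilot samples need not be forwarded; ignored here since
# tau_p is not given in the example).
per_ap_dist = D_m * tau_c * bits           # bits per AP per block
total_dist = M * per_ap_dist

print(per_ap_central, total_central)       # 51200 5120000
print(per_ap_dist, total_dist)             # 128000 12800000
```

With these parameters the centralized count per block is actually the smaller one, because the average cluster size $\lvert\mathcal{D}_m\rvert = 10$ exceeds $N = 4$: distributed fronthaul scales with how many users an AP serves, not with its antenna count, and the comparison tilts toward distributed once pilot samples are excluded or clusters are small.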

Centralized vs. Local MMSE: Per-User SINR CDF

Compare the cumulative distribution of per-user SINR under centralized MMSE (Level 4) and local MMSE (Level 2) combining. Adjust the number of APs and antennas per AP to observe how the performance gap changes with network density.

Parameters: 100 access points, 4 antennas per access point, 20 users, user-centric cluster size 20.

Common Mistake: Centralized Processing Is Not Always Worth the Cost

Mistake:

Assuming that centralized MMSE (Level 4) is always the right choice because it maximizes SINR.

Correction:

The SINR gain from centralized processing shrinks as the number of antennas per AP increases. With $N \geq 4$ antennas, local MMSE at each AP already suppresses most intra-cluster interference. The remaining gap to centralized MMSE may not justify the $N\times$ increase in fronthaul load. The system designer must evaluate the performance-fronthaul tradeoff for the specific deployment scenario.

Historical Note: From Cloud-RAN to Cell-Free: The Distributed Processing Journey

2010–2020

The idea of centralizing baseband processing appeared in the Cloud-RAN (C-RAN) architecture proposed by China Mobile Research Institute around 2010. In C-RAN, remote radio heads (RRHs) forward digitized baseband signals to a centralized baseband unit (BBU) pool via high-capacity fronthaul links (typically CPRI over fiber). The cell-free massive MIMO paradigm, introduced by Ngo et al. in 2017, can be viewed as a Cloud-RAN where the RRHs are single-antenna (or few-antenna) APs deployed at very high density. The evolution from Cloud-RAN to cell-free to user-centric cell-free mirrors the engineering community's gradual recognition that full centralization does not scale: the question has always been how much to centralize. The four cooperation levels formalized by Björnson and Sanguinetti in 2020 provide the definitive answer to this question.

Quick Check

In a cell-free network with $M = 50$ APs, $N = 4$ antennas per AP, and $K = 10$ users, what is the per-AP fronthaul dimension for centralized processing (complex samples per channel use)?

$10$ (one per user)

$4$ (one per antenna)

$40$ ($N \times K$)

$200$ (full $MN$ dimension)

Why This Matters: O-RAN Functional Splits and Cooperation Levels

The cooperation levels map directly to O-RAN functional split options. Level 4 (centralized MMSE) corresponds to Split 7.2x, where the O-RU forwards frequency-domain IQ samples, the analog of our $\mathbf{y}_m$. Level 2 (local MMSE) corresponds to a higher split (e.g., Split 6), where the O-RU performs local equalization and forwards soft bits or symbol estimates. The O-RAN Alliance's ongoing work on cell-free RAN explicitly addresses these tradeoffs. In practice, the fronthaul technology (fiber, millimeter-wave wireless, or Ethernet) determines which split is feasible, and therefore which cooperation level is achievable.

Key Takeaway

Centralized vs. distributed processing is not a binary choice. The four cooperation levels (L1–L4) provide a continuum from fully local to fully centralized processing. The optimal operating point depends on the fronthaul capacity, the number of antennas per AP, and the interference environment. As a rule of thumb: invest in centralized processing when APs have few antennas ($N = 1$–$2$) and the network is interference-limited; use distributed processing when APs are well-equipped ($N \geq 4$) and fronthaul is the bottleneck.