Subarray-Based Processing
The Computational Wall at
Once the aperture carries thousands of antennas, the complexity of full-aperture MMSE estimation becomes prohibitive. A naive per-user MMSE on an $N$-element array requires $O(N^3)$ flops per coherence block per user, which for $N$ in the thousands is multiple orders of magnitude above any realistic baseband budget. The redeeming structural fact, grounded in Section 18.1, is that the VR of every user is spatially localized: no user benefits from processing antennas far outside its VR. This motivates partitioning the array into subarrays and decoupling the estimation problem at subarray level, trading global optimality for an enormous complexity reduction.
Definition: Subarray Partition of an XL-MIMO Array
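A subarray partition of an $N$-element array is a collection of $S$ disjoint index sets $\mathcal{A}_1, \dots, \mathcal{A}_S$, each of cardinality $N_s$, with $\bigcup_{s} \mathcal{A}_s = \{1, \dots, N\}$ and hence $S N_s = N$. Typical partitions split a UPA into equal rectangular tiles; we denote by $S$ the number of subarrays and by $N_s$ the per-subarray antenna count.
For a given user $k$ with visibility region $\mathcal{V}_k$, the active subarrays are those that intersect the user's VR:

$$\mathcal{S}_k = \{\, s : \mathcal{A}_s \cap \mathcal{V}_k \neq \emptyset \,\}.$$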
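The partition and the active-subarray rule can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names `tile_partition` and `active_subarrays`, the row-major antenna indexing, and the example VR patch are assumptions made here, not notation from the text.

```python
def tile_partition(rows, cols, tile_rows, tile_cols):
    """Split a rows x cols UPA into equal rectangular tiles.

    Antenna (r, c) gets the flat index r * cols + c (row-major, an assumption).
    Returns a list of disjoint frozensets of antenna indices, one per subarray.
    """
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile size must divide the array size")
    tiles = []
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            tiles.append(frozenset(
                r * cols + c
                for r in range(r0, r0 + tile_rows)
                for c in range(c0, c0 + tile_cols)
            ))
    return tiles


def active_subarrays(tiles, visibility_region):
    """Indices of the subarrays that share at least one antenna with the VR."""
    vr = set(visibility_region)
    return [s for s, tile in enumerate(tiles) if not tile.isdisjoint(vr)]


# Toy case: 8 x 8 array cut into 4 x 4 tiles -> 4 subarrays of 16 antennas.
tiles = tile_partition(8, 8, 4, 4)

# Hypothetical VR: rows 0-2, columns 2-5, which straddles the two upper tiles.
vr = {r * 8 + c for r in range(0, 3) for c in range(2, 6)}
active = active_subarrays(tiles, vr)
print(active)  # [0, 1]
```

Keeping the tiles as index sets rather than array slices means the same two functions work unchanged if the partition is later aligned with irregular panel boundaries.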
The subarray grid is a receiver-side processing construct, not a hardware constraint, although it matches the natural structure of panel-based arrays where each panel has its own digital baseband unit. When the hardware is already partitioned (e.g., a wall of panels), the obvious choice is to align the partition with the panel boundaries.
Theorem: Complexity Reduction from Subarray Decomposition
Assume the subarray partition has $S$ equal tiles of size $N_s = N/S$, each subarray processes its own pilots with an independent MMSE estimator, and the CPU combines subarray outputs with a linear weighted sum (no cross-subarray covariance computation). Then the total flop count to compute all $K$ channel estimates is $O(K N^3 / S^2)$, compared with the full-aperture MMSE complexity of $O(K N^3)$. The subarray decomposition therefore yields a factor-$S^2$ speedup.
The MMSE estimator for user $k$ inverts an $N \times N$ covariance matrix, at cost $O(N^3)$. Splitting into $S$ subarrays inverts $S$ independent $N_s \times N_s$ matrices at cost $O(S N_s^3) = O(N^3 / S^2)$. Two dimensions are saved because each subarray processes $N_s = N/S$ antennas instead of $N$, and the number of subarrays scales additively (one loop), not multiplicatively.
Full-aperture cost
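MMSE for one user inverts an $N \times N$ matrix and applies the result to the received pilot vector. The inverse is $O(N^3)$, matrix-vector multiplications are $O(N^2)$. Dominated by the inverse: $O(N^3)$ per user. Total over $K$ users: $O(K N^3)$.
Subarray cost
For each subarray $s$, the local MMSE inverts an $N_s \times N_s$ matrix at cost $O(N_s^3)$. Summing across the $S$ subarrays: $O(S N_s^3)$. Substituting $N_s = N/S$ gives $O(N^3 / S^2)$ per user. Across $K$ users: $O(K N^3 / S^2)$.
Speedup ratio
$\dfrac{K N^3}{K N^3 / S^2} = S^2$. For $N = 4096$ and $S = 64$ (64 tiles of 64 antennas each), this is a speedup factor of $64^2 = 4096$.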
Why Active-Subarray Pruning Matters
The subarray decomposition alone does not yet use the VR structure. We gain a second multiplier, $S / |\mathcal{S}_k|$ per user, by processing only the active subarrays $\mathcal{S}_k$ for user $k$ and skipping the rest. On a large aperture where each user touches only about a tenth of the $S$ subarrays, this is another 10x reduction. The full complexity of the VR-aware subarray pipeline is therefore $O\big(\sum_{k=1}^{K} |\mathcal{S}_k| \, N_s^3\big)$, where $N_s$ is the per-subarray antenna count.
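To make the three leading-order counts concrete, the following sketch tabulates them for a given configuration. It is an illustrative sketch only: the function `flop_counts` and the example load (4 users, each touching 6 of 64 subarrays) are assumptions chosen here for illustration, and all constant factors are dropped, as in the big-O expressions.

```python
def flop_counts(n_antennas, n_subarrays, active_per_user):
    """Leading-order flop counts per coherence block (constants dropped).

    active_per_user: number of active subarrays for each user, one entry per user.
    """
    if n_antennas % n_subarrays:
        raise ValueError("equal tiles require n_subarrays to divide n_antennas")
    n_users = len(active_per_user)
    n_s = n_antennas // n_subarrays      # antennas per subarray
    per_tile = n_s ** 3                  # one local matrix inverse
    return {
        "full": n_users * n_antennas ** 3,              # K * N^3
        "subarray": n_users * n_subarrays * per_tile,   # K * S * N_s^3
        "pruned": sum(active_per_user) * per_tile,      # sum_k |S_k| * N_s^3
    }


# Hypothetical load: 4 users, each seeing 6 of 64 subarrays on a 4096-element array.
counts = flop_counts(4096, 64, [6, 6, 6, 6])
print(counts["full"] // counts["subarray"])    # 4096, i.e. 64**2
print(counts["subarray"] / counts["pruned"])   # 64 / 6, roughly 10.7
```

The pruned count is written as a sum over users rather than a product with an average, so users with very different VR sizes are handled without extra bookkeeping.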
Subarray-Based Channel Estimation Pipeline
Complexity: $O\big(\sum_{k} |\mathcal{S}_k| \, N_s^3\big)$, where $\mathcal{S}_k$ is the set of active subarrays of user $k$ and $N_s$ the per-subarray antenna count; embarrassingly parallel across subarrays. Steps 1 to 5 run independently per subarray and can be mapped to panel-local baseband units; only the user-level zero-fill in step 10 requires a central aggregator. The algorithm has no cross-subarray matrix inversion, which is what unlocks the complexity reduction of the theorem on Complexity Reduction from Subarray Decomposition.
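The core of such a pipeline, local MMSE on each active subarray followed by zero-fill elsewhere, might look as follows in NumPy. This is a minimal sketch under stated assumptions, not the step-by-step algorithm of this section: it assumes the despread pilot observation has the form y = h + noise with white noise of known variance, a known per-user covariance `R`, and an already detected active set; the name `subarray_mmse` is hypothetical.

```python
import numpy as np


def subarray_mmse(y, R, tiles, active, noise_var):
    """VR-pruned subarray MMSE sketch for one user.

    y         : (N,) despread pilot observation, assumed y = h + noise
    R         : (N, N) channel covariance; only its diagonal blocks are read
    tiles     : list of index arrays partitioning range(N)
    active    : indices of the tiles that intersect the user's VR
    noise_var : noise variance per antenna
    """
    h_hat = np.zeros_like(y, dtype=complex)   # inactive tiles stay zero-filled
    for s in active:
        idx = tiles[s]
        R_ss = R[np.ix_(idx, idx)]            # in-tile covariance block
        A = R_ss + noise_var * np.eye(len(idx))
        # Linear solve instead of an explicit inverse: same cost order, better conditioning.
        h_hat[idx] = R_ss @ np.linalg.solve(A, y[idx])
    return h_hat


# Toy run: 16-antenna linear array, 4 tiles of 4, exponential correlation (assumed model).
rng = np.random.default_rng(0)
N, S = 16, 4
tiles = np.split(np.arange(N), S)
n = np.arange(N)
R = 0.7 ** np.abs(n[:, None] - n[None, :])
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h_hat = subarray_mmse(y, R, tiles, active=[0, 1], noise_var=0.1)
```

Each loop iteration touches only its own tile of `y` and `R`, so the iterations can run on separate panel-local units; with a single tile covering the whole array the function reduces to the full-aperture MMSE.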
NMSE vs Number of Subarrays (With and Without VR Pruning)
Compare the NMSE of full-aperture MMSE, plain subarray MMSE, and VR-pruned subarray MMSE as a function of the subarray count $S$. Notice that plain subarray MMSE degrades as $S$ grows (smaller tiles lose covariance information), while VR-pruned subarray MMSE remains near the full-aperture NMSE until the tiles become smaller than the VR boundary features.
Example: A 4096-Element Array: How Much Do We Save?
An XL-MIMO array has $N = 4096$ elements arranged as a UPA. A design partitions it into equal subarray tiles, giving $S = 64$ subarrays of $N_s = N/S = 64$ antennas each. Users touch on average $\bar{S}_{\mathrm{a}}$ active subarrays. Compare the per-user, per-coherence-block flop count of: (a) full-aperture MMSE, (b) plain subarray MMSE, (c) VR-pruned subarray MMSE.
Full-aperture MMSE
Dominant cost $N^3 = 4096^3 \approx 6.9 \times 10^{10}$ flops per user per coherence block. Scaled by the number of users $K$ and repeated every coherence interval, this is orders of magnitude above any realistic baseband budget.
Plain subarray MMSE
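Per user: $S N_s^3 = 64 \cdot 64^3 \approx 1.7 \times 10^{7}$ flops. Speedup over (a): $S^2 = 64^2 = 4096$. The per-coherence-block cost over all $K$ users drops from about $6.9 \times 10^{10}\,K$ flops to about $1.7 \times 10^{7}\,K$ flops, which is tractable.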
VR-pruned subarray MMSE
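Only the active subarrays, on average $\bar{S}_{\mathrm{a}}$ of the 64, are processed. Per user: $\bar{S}_{\mathrm{a}} \cdot 64^3 \approx 2.6 \times 10^{5}\,\bar{S}_{\mathrm{a}}$ flops. Extra speedup over (b): $64 / \bar{S}_{\mathrm{a}}$. Total over all $K$ users: about $2.6 \times 10^{5}\,K \bar{S}_{\mathrm{a}}$ flops per coherence block, well within budget.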
Interpretation
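The subarray decomposition provides the $S^2 = 4096$-fold speedup; the VR-aware pruning adds another factor equal to the ratio of total to active subarrays (about 10x when each user touches roughly a tenth of the subarrays). The resulting estimator runs in real time on commodity hardware and loses at most about 1 dB of NMSE relative to the intractable full-aperture MMSE (see the interactive plot above).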
Full-Aperture vs Subarray vs VR-Pruned Subarray MMSE
| Attribute | Full-aperture MMSE | Plain subarray MMSE | VR-pruned subarray MMSE |
|---|---|---|---|
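| Per-user flops | $O(N^3)$ | $O(N^3 / S^2)$ | $O(\lvert\mathcal{S}_k\rvert \, N^3 / S^3)$ |
| Parallelism | Serial inverse | Embarrassingly parallel across the $S$ subarrays | Embarrassingly parallel across the active subarrays |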
| NMSE (stationary channel) | Optimal | 0.3-1 dB penalty | 0.3-1 dB penalty |
| NMSE (VR covering a small fraction of the array) | Wastes pilots on dead antennas | Same as full if tiles cover VR | Near full-aperture, ~10x cheaper |
| Requires VR detector? | No | No | Yes (Section 18.5) |
| Cross-subarray coupling? | Full | None | None |
| Typical use case | Regular massive MIMO panels | XL-MIMO with blockage / multipath clustering |
What Subarray Processing Does Not Buy You
Subarray decomposition is a computational decoupling, not an information-theoretic one. The subarray estimators ignore the cross-subarray covariance blocks, which are non-zero whenever the spatial correlation is non-trivial. In practice the loss is small (roughly 0.3-1 dB) because most of the per-user covariance mass lives within a single subarray, but the approximation is visible at high SNR where fine-grained correlation matters. If absolute fidelity is needed in that high-SNR regime, use a two-stage estimator: subarray MMSE first, then a low-rank refinement that couples neighbouring subarrays.
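The cost of dropping the cross-subarray blocks can be evaluated in closed form for a toy model. The sketch below is illustrative only: it assumes a 64-antenna linear array with an exponential correlation model, an observation of the form y = h + white noise, and contiguous equal tiles. None of these choices comes from the text, and the function name `block_mmse_nmse_db` is hypothetical. Because each tile runs a genuine MMSE on its own observations, its error covariance is the usual $R_{ss} - R_{ss}(R_{ss} + \sigma^2 I)^{-1} R_{ss}$, and the total error is the sum of the per-tile traces.

```python
import numpy as np


def block_mmse_nmse_db(R, n_subarrays, noise_var):
    """Theoretical NMSE (dB) of independent per-tile MMSE estimators.

    The array is cut into n_subarrays contiguous, equal tiles; each tile uses
    only its own diagonal block of R, so cross-tile covariance is ignored.
    """
    N = R.shape[0]
    mse = 0.0
    for idx in np.split(np.arange(N), n_subarrays):
        R_ss = R[np.ix_(idx, idx)]
        A = R_ss + noise_var * np.eye(len(idx))
        # Trace of the per-tile MMSE error covariance.
        mse += np.trace(R_ss - R_ss @ np.linalg.solve(A, R_ss))
    return 10.0 * np.log10(mse / np.trace(R))


# Assumed toy model: exponential correlation with coefficient 0.9.
N = 64
n = np.arange(N)
R = 0.9 ** np.abs(n[:, None] - n[None, :])

curve = {S: block_mmse_nmse_db(R, S, noise_var=0.1) for S in (1, 2, 4, 8, 16)}
for S, val in curve.items():
    print(f"S = {S:2d}: NMSE = {val:6.2f} dB")
```

Here `S = 1` is the full-aperture reference, and since the tilings are nested the NMSE can only grow with `S`; the size of the gap depends entirely on the assumed correlation model and SNR, so the printed values should not be read as the penalties quoted in this section.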
Common Mistake: Do Not Make Subarrays Smaller Than VR Features
Mistake:
Push the subarray count $S$ as large as possible to maximize the $S^2$ speedup.
Correction:
When the subarray size drops below the typical VR cluster diameter, individual tiles no longer see enough antennas to estimate the in-tile covariance reliably, and the VR detector starts flipping whole subarrays on and off based on a handful of samples. The sweet spot is a tile side a few times the expected VR border thickness. Smaller tiles force the detector to rely on the MRF prior to glue fragments back together, which works but eats into the prior's noise-cleaning budget.
Quick Check
An XL-MIMO array has antennas. It is partitioned into square subarrays. What is the flop-count speedup of plain subarray MMSE over full-aperture MMSE, per user?
16
256
1024
4096
By the theorem on Complexity Reduction from Subarray Decomposition, the speedup is $S^2$, where $S$ is the number of subarrays. Each subarray inverts an $N_s$-antenna matrix with $N_s = N/S$, so the total cost is $S N_s^3 = N^3 / S^2$.
Quick Check
Why should subarray tiles not be made much smaller than the typical VR cluster diameter?
The MRF prior cannot run on small tiles
Local in-tile covariance estimation becomes unreliable, and VR detection starts flipping tiles on evidence too noisy to trust
The subarray count must be a power of two
The fronthaul cost grows faster than
Each tile must contain enough antennas to form a reliable per-tile covariance estimate and make a stable activation decision. When the tile size drops below the VR boundary feature scale, the detector starts toggling whole tiles on a handful of noisy samples. The sweet spot is tiles slightly larger than the characteristic VR boundary thickness (Section 18.3).
Align Subarrays with Hardware Panels
A production XL-MIMO array is rarely a single monolithic panel. Ericsson, Nokia, and Huawei commercial XL-MIMO products expose the array as a grid of panels, each with its own baseband unit and its own front-haul link to the central processor. The natural subarray partition is one subarray per panel:
- Cross-panel traffic stays at the fronthaul level (a weighted sum of per-panel estimates), not at the baseband level.
- The panel boundary matches a natural discontinuity in the spatial covariance (different oscillators, different calibration).
- A panel that is blocked or powered down simply drops out of the active set without any algorithm reconfiguration.
- Panel size: 4-16 antennas per panel at sub-6 GHz; 64-256 at mmWave
- Inter-panel fronthaul: ~1 Gbps per panel for weighted-sum output
- Per-panel computation budget: ms for