Ferkans — Interactive Telecom Tutor

The End of Spatial Stationarity

Throughout Chapters 1–16 we treated the base-station array as a compact, homogeneous sensor: every antenna element saw every user through the same fading process, perhaps with spatial correlation. That picture rests on a hidden assumption: the physical aperture $D$ of the array is small compared to both the propagation range and the scatterer spread. When $D$ grows to metres — the so-called extra-large or XL-MIMO regime — the assumption fails. A user on one side of the array may be hidden from the other side by blockage, by range-dependent path loss across the aperture, or simply because its scatterers subtend a limited angular window that only a fraction of the array can resolve. The energy a user radiates lives on a spatial subset of the array which we call its visibility region (VR).

Every piece of machinery we built for stationary massive MIMO (orthogonal pilots, MMSE estimation with a single $\mathbf{R}_k$, channel hardening, favorable propagation) degrades once VRs shrink below the full aperture. Worse, failing to estimate the VR and treating out-of-VR antennas as useful only pumps noise into the combiner. This chapter develops a principled pipeline: detect the VR, estimate the channel on the VR, and decode. The CommIT contribution of Xu and Caire (a 2D Markov prior on the VR mask) gives this pipeline its statistical backbone.

Definition:
The XL-MIMO Regime

An array is called extra-large MIMO (XL-MIMO) when its physical aperture $D$ is large enough that at least one of the following fails over the typical user range $r$ :

Far-field: $r \geq d_F = 2D^2/\lambda$ , so the wavefront is planar across the array.
Power stationarity: the per-antenna received power $|\mathbf{H}_{k,n}|^2$ is approximately constant across antennas $n$ .
Angular stationarity: the angular power spectrum seen by antenna $n$ is (approximately) the same for every $n$ .

Representative numbers at $f_c = 3.5$ GHz ($\lambda = 8.6$ cm) with $D = 2$ m give $d_F \approx 93$ m, so a user anywhere within roughly 93 m of the array sits in the near field and may simultaneously see only part of the array. The same effect appears on a smaller scale at mmWave: $D = 30$ cm at $f_c = 28$ GHz ($\lambda \approx 1.07$ cm) gives $d_F \approx 17$ m.
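The far-field boundary quoted above follows directly from $d_F = 2D^2/\lambda$. The short sketch below evaluates it for the two apertures mentioned in the text; the function name `fraunhofer_distance` is an illustrative choice, not part of any library.

```python
# Fraunhofer (far-field) distance d_F = 2 * D^2 / lambda.
C = 299_792_458.0  # speed of light in m/s


def fraunhofer_distance(aperture_m: float, carrier_hz: float) -> float:
    """Return d_F in metres for an aperture D (m) and carrier frequency (Hz)."""
    wavelength_m = C / carrier_hz
    return 2.0 * aperture_m**2 / wavelength_m


d_sub6 = fraunhofer_distance(2.0, 3.5e9)    # D = 2 m at 3.5 GHz
d_mmwave = fraunhofer_distance(0.30, 28e9)  # D = 30 cm at 28 GHz

print(f"3.5 GHz, D = 2 m   : d_F = {d_sub6:.1f} m")
print(f"28 GHz,  D = 30 cm : d_F = {d_mmwave:.1f} m")
```

Both distances are comparable to or larger than typical user ranges in small cells and indoor deployments, which is why the far-field condition is the first of the three to fail.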

The name is pragmatic. "XL-MIMO" emphasizes that the aperture — not the antenna count — is what breaks the stationary model. A dense 256-element sub-wavelength array with $D = 10\lambda$ is still comfortably stationary; a sparse 64-element array spread over $D = 2$ m is not.

Definition:
Visibility Region and Binary Mask

Let $\mathbf{H}_{k} \in \mathbb{C}^{N_t}$ denote the full-aperture uplink channel of user $k$ . The visibility region of user $k$ is the index set of antennas whose received signal is above the noise floor with non-negligible probability:

$\mathcal{V}_k = \left\{ n \in \{1,\ldots,N_t\} : \mathbb{E}\!\left[|\mathbf{H}_{k,n}|^2\right] \geq \eta\,\sigma^2 \right\},$

for a small threshold $\eta > 0$ (typically $\eta \in [0.1, 1]$ ). Its binary mask is

$\mathbf{m}_k \in \{0,1\}^{N_t}, \qquad m_{k,n} = \mathbb{1}\{ n \in \mathcal{V}_k \}.$

On a uniform planar array with $N_t = N_1 N_2$ elements, we reshape $\mathbf{m}_k$ into an $N_1 \times N_2$ 2D mask $\mathbf{M}_k \in \{0,1\}^{N_1 \times N_2}$. The observed channel is the element-wise product

$\mathbf{H}_{k}^{\text{obs}} = \mathbf{m}_k \odot \mathbf{H}_{k}^{\text{full}},$

where $\mathbf{H}_{k}^{\text{full}}$ is the idealized stationary channel that would be observed if the whole aperture were illuminated.

The threshold $\eta$ is a modelling choice, not a physical constant. Soft masks $m_{k,n} \in [0,1]$ are natural generalizations but do not change the structure of the problem. In the joint estimation of Section 18.5, we replace the hard mask by the variational marginal $q_n = \Pr[m_{k,n} = 1 \mid \mathbf{Y}_p]$ .
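As a minimal sketch of the definition, the code below thresholds per-antenna average powers to obtain the hard mask and reshapes it for a planar array. The power values, the choice $\eta = 0.5$, and the row-major antenna ordering are assumptions made for illustration; the text does not fix an ordering. Indices are zero-based in the code, whereas the text counts antennas from 1.

```python
def visibility_mask(avg_power, noise_var, eta=0.5):
    """Hard mask: m_n = 1 if E|H_n|^2 >= eta * sigma^2, else 0."""
    return [1 if p >= eta * noise_var else 0 for p in avg_power]


def reshape_mask(mask, n1, n2):
    """Reshape a length n1*n2 mask into n1 rows of n2 entries (row-major, assumed)."""
    if len(mask) != n1 * n2:
        raise ValueError("mask length must equal n1 * n2")
    return [mask[i * n2:(i + 1) * n2] for i in range(n1)]


# Hypothetical average powers E|H_n|^2 for a 2 x 4 array, with sigma^2 = 1.
avg_power = [0.01, 0.02, 0.9, 1.1, 0.03, 0.6, 1.2, 0.8]

mask = visibility_mask(avg_power, noise_var=1.0, eta=0.5)
mask_2d = reshape_mask(mask, 2, 4)
vr = [n for n, m in enumerate(mask) if m]  # the index set V_k (zero-based)

print(mask_2d)
print(vr)
```

Lowering `eta` enlarges the VR and raising it shrinks it, which is the sense in which the threshold is a modelling choice.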

Visibility region (VR)

The subset of antennas of an XL-MIMO array on which a given user's received power is above the noise floor. Equivalent to the support of the binary mask $\mathbf{m}_k$ . For stationary massive MIMO, $\mathcal{V}_k = \{1,\ldots,N_t\}$ and the concept collapses to the classical model.

Spatial non-stationarity

The property that the second-order statistics of the channel ( $\mathbb{E}[|\mathbf{H}_{k,n}|^2]$ , angular spread, delay spread) vary across the antennas of a single array. In XL-MIMO this arises from near-field range variation across the aperture, blockage of subsets of the array, and user-dependent multipath clustering.

Three Physical Causes of VRs

VRs arise from three distinct physical mechanisms, often acting together:

Aperture-range geometry. For a user at range $r$ with aperture $D$ and $r/d_F \lesssim 1$ , the spherical wavefront curvature causes the per-antenna amplitude to roll off as $\propto 1/r_n$ where $r_n$ is the distance from the user to antenna $n$ . Far edges of the array see weaker signals.
Blockage. A human body, a vehicle, a pillar, or a wall can block the line of sight between the user and a contiguous block of antennas. Blockages are essentially binary and strongly correlated spatially.
Multipath clustering. Finite scatterers around the user illuminate only a limited angular window. The corresponding subset of the array that sees the cluster defines the VR. The VR boundary is soft but still localized.

All three mechanisms produce spatially contiguous VRs. This is the statistical regularity the 2D Markov prior of Section 18.2 exploits.
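The first mechanism is easy to quantify. The sketch below assumes a uniform linear array, free-space propagation with amplitude proportional to $1/r_n$, and a hypothetical user placed 1 m in front of one edge of a 2 m aperture. It ignores phase, element patterns, blockage, and multipath, so it illustrates the geometric roll-off only.

```python
import math


def antenna_positions(aperture_m, n_antennas):
    """x-coordinates of a uniform linear array centred at the origin."""
    step = aperture_m / (n_antennas - 1)
    return [-aperture_m / 2 + i * step for i in range(n_antennas)]


def relative_power_db(user_xy, positions):
    """Per-antenna power in dB relative to the strongest antenna (amplitude ~ 1/r_n)."""
    ux, uy = user_xy
    dist = [math.hypot(ux - x, uy) for x in positions]
    r_min = min(dist)
    return [20.0 * math.log10(r_min / r) for r in dist]


positions = antenna_positions(2.0, 64)                # D = 2 m, 64 elements
power_db = relative_power_db((-1.0, 1.0), positions)  # user 1 m in front of the left edge

n_within_3db = sum(p >= -3.0 for p in power_db)
print(f"near edge: {power_db[0]:.2f} dB, far edge: {power_db[-1]:.2f} dB")
print(f"antennas within 3 dB of the strongest: {n_within_3db} of {len(positions)}")
```

In this geometry the far edge is about 7 dB below the near edge, and the set of antennas within 3 dB of the strongest is one contiguous half of the array.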

Visibility Region on a 2D Array

Generate a synthetic VR on an $N_1 \times N_2$ UPA. The heatmap shows the per-antenna received power $|\mathbf{H}_{k,n}|^2$ after masking by the VR. Try changing the VR size, shape, and SNR. Observe how the VR is a contiguous spatial blob, not a random subset of antennas.

Parameters: $N_1$ (horizontal) = 32; $N_2$ (vertical) = 32; VR area fraction = 0.35; blob smoothing $\sigma$ = 4; SNR = 10 dB.
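For readers without access to the interactive plot, a contiguous synthetic VR can be generated offline. The sketch below is one plausible construction and not necessarily the one used by the widget: it smooths white Gaussian noise with a separable Gaussian kernel of width $\sigma$ and switches on the antennas with the largest smoothed values until the requested area fraction is reached. All function names are illustrative.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(field, kernel):
    """Convolve every row with the kernel, clamping indices at the array edges."""
    radius = len(kernel) // 2
    out = []
    for row in field:
        n = len(row)
        out.append([
            sum(w * row[min(max(j + d, 0), n - 1)]
                for d, w in zip(range(-radius, radius + 1), kernel))
            for j in range(n)
        ])
    return out


def transpose(field):
    return [list(col) for col in zip(*field)]


def synthetic_vr_mask(n1, n2, area_fraction, sigma, seed=0):
    """Return an n1 x n2 binary mask with round(area_fraction * n1 * n2) ones."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n2)] for _ in range(n1)]
    kernel = gaussian_kernel(sigma)
    # Separable smoothing: along rows, then along columns.
    smooth = transpose(smooth_rows(transpose(smooth_rows(noise, kernel)), kernel))
    n_on = round(area_fraction * n1 * n2)
    ranked = sorted(
        ((v, i, j) for i, row in enumerate(smooth) for j, v in enumerate(row)),
        reverse=True,
    )
    mask = [[0] * n2 for _ in range(n1)]
    for _, i, j in ranked[:n_on]:
        mask[i][j] = 1
    return mask


mask = synthetic_vr_mask(32, 32, area_fraction=0.35, sigma=4)
for row in mask:
    print("".join("#" if m else "." for m in row))
```

Thresholding a smoothed field can produce more than one blob; a larger $\sigma$ makes a single connected region more likely. Setting $\sigma$ close to zero recovers a nearly random subset, which is the contrast the demo is meant to highlight.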

Theorem: LS Estimation Penalty from VR Mismatch

Consider the LS channel estimate under orthogonal pilots $\mathbf{S}_{i,k} \mathbf{S}_{i,k}^{H} = \tau_p \mathbf{I}$, with the receiver applying a hard mask $\hat{\mathbf{m}}_k$ to the estimate. Assume the true channel on the true VR $\mathcal{V}_k$ is $\mathbf{H}_{k}[n] \sim \mathcal{CN}(0, \sigma_c^2)$ and zero off $\mathcal{V}_k$, and let $\tau_p \geq K$. Then the normalized mean-square error is

$\text{NMSE}(\hat{\mathbf{m}}_k) = \frac{1}{|\mathcal{V}_k|\sigma_c^2} \mathbb{E}\!\left[\bigl\| \hat{\mathbf{H}}_k - \mathbf{H}_{k} \bigr\|^2\right] = \frac{1}{\text{SNR}}\,\frac{|\hat{\mathcal{V}}_k|}{|\mathcal{V}_k|} + \frac{|\mathcal{V}_k \setminus \hat{\mathcal{V}}_k|}{|\mathcal{V}_k|},$

where $\text{SNR} = \tau_p \sigma_c^2/\sigma^2$ and $\hat{\mathcal{V}}_k$ is the support of $\hat{\mathbf{m}}_k$ .

The first term is the noise penalty: every antenna the receiver declares active passes estimation noise of variance $\sigma^2/\tau_p$, whether or not it carries signal, so a bigger $\hat{\mathcal{V}}_k$ means more noise. Antennas declared active outside the true VR contribute nothing but this noise. The second term is missed energy: every antenna inside the true VR that the receiver misses loses its signal entirely. Shrinking $\hat{\mathcal{V}}_k$ trades one penalty for the other, and for $\text{SNR} > 1$ the optimum is the true VR.

Proof

Split the error into three antenna sets

Partition the index set $\{1,\ldots,N_t\}$ into $A = \mathcal{V}_k \cap \hat{\mathcal{V}}_k$ (hit), $B = \hat{\mathcal{V}}_k \setminus \mathcal{V}_k$ (false alarm), $C = \mathcal{V}_k \setminus \hat{\mathcal{V}}_k$ (miss), and $D$ (the rest; here $D$ denotes an index set, not the aperture). On $D$, $\hat{\mathbf{H}}_k[n]$ is zero by the mask and $\mathbf{H}_{k}[n]$ is zero because antenna $n$ lies outside the true VR, so this set contributes nothing.

Compute errors on A, B, C

On $A$ : $\hat{\mathbf{H}}_k[n] = \mathbf{H}_{k}[n] + w_n / \sqrt{\tau_p}$ with $w_n \sim \mathcal{CN}(0, \sigma^2)$ , so $\mathbb{E}|\hat{\mathbf{H}}_k[n]- \mathbf{H}_{k}[n]|^2 = \sigma^2/\tau_p$ . On $B$ : $\hat{\mathbf{H}}_k[n] = w_n/\sqrt{\tau_p}$ but $\mathbf{H}_{k}[n] = 0$ , so $\mathbb{E}|\cdot|^2 = \sigma^2/\tau_p$ . On $C$ : $\hat{\mathbf{H}}_k[n] = 0$ but $\mathbf{H}_{k}[n]$ is a fresh $\mathcal{CN}(0, \sigma_c^2)$ , contributing $\sigma_c^2$ .

Normalize

Total error is $(|A| + |B|)\sigma^2/\tau_p + |C|\sigma_c^2$ . Normalizing by $|\mathcal{V}_k|\sigma_c^2$ and noting $|A|+|B| = |\hat{\mathcal{V}}_k|$ yields the stated formula. $\blacksquare$
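The closed form can be checked numerically. The sketch below is a Monte Carlo simulation under the theorem's assumptions, with $\sigma_c^2 = 1$ and per-antenna LS noise variance $\sigma^2/\tau_p = 1/\text{SNR}$. The array size, the two index sets, the trial count, and the seed are arbitrary illustrative choices.

```python
import random


def closed_form_nmse(true_vr, est_vr, snr):
    """NMSE = |V_hat| / (SNR |V|) + |V minus V_hat| / |V|."""
    return len(est_vr) / (snr * len(true_vr)) + len(true_vr - est_vr) / len(true_vr)


def simulated_nmse(true_vr, est_vr, n_antennas, snr, trials=2000, seed=1):
    """Empirical NMSE of the masked LS estimate with sigma_c^2 = 1."""
    rng = random.Random(seed)
    chan_std = 0.5 ** 0.5            # per real dimension, total variance 1
    noise_std = (0.5 / snr) ** 0.5   # per real dimension, total variance 1/snr
    total = 0.0
    for _ in range(trials):
        for n in range(n_antennas):
            if n in true_vr:
                h = complex(rng.gauss(0.0, chan_std), rng.gauss(0.0, chan_std))
            else:
                h = 0j               # no signal outside the true VR
            if n in est_vr:
                w = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
                h_hat = h + w        # LS estimate on a declared antenna
            else:
                h_hat = 0j           # masked out by the receiver
            total += abs(h_hat - h) ** 2
    return total / (trials * len(true_vr))


true_vr = set(range(8, 24))   # 16 antennas
est_vr = set(range(12, 28))   # 12 hits, 4 false alarms, 4 misses

theory = closed_form_nmse(true_vr, est_vr, snr=10.0)
empirical = simulated_nmse(true_vr, est_vr, n_antennas=32, snr=10.0)
print(f"closed form: {theory:.3f}, simulated: {empirical:.3f}")
```

With these sets the closed form gives $16/160 + 4/16 = 0.35$, and the simulated value should agree to within Monte Carlo error.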

Key Takeaway

VR mismatch is a two-sided penalty. Declaring too many antennas active floods the combiner with noise ($|\hat{\mathcal{V}}_k|/(\text{SNR}\,|\mathcal{V}_k|)$); declaring too few throws away signal energy ($|\mathcal{V}_k \setminus \hat{\mathcal{V}}_k|/|\mathcal{V}_k|$). Every VR detection algorithm in this chapter is, at heart, an effort to balance these two costs.

Example: How Large is the VR Mismatch Penalty?

An XL-MIMO array has $N_t = 1024$ antennas. A user's true VR has $|\mathcal{V}_k| = 256$ antennas. The operating SNR is $10$ dB, that is, $\text{SNR} = 10$ in linear scale. Compare the NMSE of three detectors: (a) "All on" ($\hat{\mathcal{V}}_k = \{1,\ldots,1024\}$), (b) "True VR" ($\hat{\mathcal{V}}_k = \mathcal{V}_k$), (c) "Too small" with $|\hat{\mathcal{V}}_k| = 192$, containing $144$ true-VR antennas and $48$ out-of-VR antennas.

Solution

All on

$|\hat{\mathcal{V}}_k|/|\mathcal{V}_k| = 1024/256 = 4$ , miss = 0. $\text{NMSE} = 4/10 + 0 = 0.40$ .

True VR

$|\hat{\mathcal{V}}_k|/|\mathcal{V}_k| = 1$ , miss = 0. $\text{NMSE} = 1/10 + 0 = 0.10$ .

Too small

$|\hat{\mathcal{V}}_k| = 192$ , miss $= 256 - 144 = 112$ , so miss fraction $= 112/256 = 0.4375$ . $\text{NMSE} = 192/(256 \cdot 10) + 0.4375 = 0.075 + 0.4375 \approx 0.51$ .

Interpretation

The correct VR is 4x more accurate than "all on" and 5x more accurate than the undersized estimate. Crucially, over-detection degrades linearly with $|\hat{\mathcal{V}}_k|$ , while under-detection degrades linearly with the miss count but with a much larger coefficient ( $\sigma_c^2 \gg \sigma^2/\tau_p$ for useful SNRs). The break-even point favors being slightly generous — but not as generous as the full aperture. $\blacksquare$
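The asymmetry between the two error types can be made explicit by comparing marginal costs. The sketch below recomputes the three cases from hit and false-alarm counts and then evaluates what one extra false alarm costs against what one extra miss costs. The helper name `nmse_from_counts` is illustrative. A miss removes one declared antenna (saving $1/(\text{SNR}\,|\mathcal{V}_k|)$) but adds $1/|\mathcal{V}_k|$ of lost energy, so its net cost is $(1 - 1/\text{SNR})/|\mathcal{V}_k|$.

```python
def nmse_from_counts(n_hit, n_false_alarm, n_true, snr):
    """NMSE = |V_hat| / (SNR |V|) + misses / |V|, with |V_hat| = hits + false alarms."""
    n_declared = n_hit + n_false_alarm
    n_miss = n_true - n_hit
    return n_declared / (snr * n_true) + n_miss / n_true


SNR = 10.0     # 10 dB in linear scale
N_TRUE = 256   # size of the true VR

cases = {
    "all on": nmse_from_counts(256, 1024 - 256, N_TRUE, SNR),
    "true VR": nmse_from_counts(256, 0, N_TRUE, SNR),
    "too small": nmse_from_counts(144, 48, N_TRUE, SNR),
}
for name, value in cases.items():
    print(f"{name:10s} NMSE = {value:.4f}")

# Marginal NMSE change from one extra false alarm versus one extra miss.
cost_false_alarm = 1.0 / (SNR * N_TRUE)
cost_miss = (1.0 - 1.0 / SNR) / N_TRUE
ratio = cost_miss / cost_false_alarm   # equals SNR - 1
print(f"one miss costs as much as {ratio:.1f} false alarms")
```

At 10 dB a single miss costs as much as nine false alarms, which is the quantitative sense in which a detector should lean slightly generous. The ratio equals $\text{SNR} - 1$, so the bias toward generosity weakens as the SNR drops toward 0 dB.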

Common Mistake: Averaging Across VRs Destroys Information

Mistake:

A tempting shortcut is to compute a single spatial covariance $\bar{\mathbf{R}} = \frac{1}{K} \sum_k \mathbf{R}_k$ and use it as a stationary surrogate for every user. This mimics the massive-MIMO pipeline and seems to retain the "many antennas → good estimates" benefit.

Correction:

Averaging mixes disjoint VRs and hands the estimator a covariance with support covering the union of all VRs. The resulting MMSE estimator pumps noise through every antenna, cancelling the VR advantage. The right object is the per-user spatial covariance $\mathbf{R}_k$ estimated from that user's pilot correlator alone, regularized by the 2D Markov prior of Section 18.2. The computational cost is controlled by the subarray decomposition of Section 18.3.
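A toy calculation shows the size of the effect. The sketch below makes a strong simplifying assumption that is not in the text: each $\mathbf{R}_k$ is diagonal, equal to $\sigma_c^2$ on the user's VR and zero elsewhere, so the MMSE estimator reduces to a per-antenna gain. Two hypothetical users with disjoint VRs share an 8-antenna array, and the effective noise variance is $\sigma^2/\tau_p = 0.1$.

```python
def mmse_error(true_var, assumed_var, noise_var):
    """Per-antenna MSE of the linear estimate g * y, with y = h + noise,
    where the gain g is designed for assumed_var but the channel has true_var."""
    gain = assumed_var / (assumed_var + noise_var)
    return (1.0 - gain) ** 2 * true_var + gain ** 2 * noise_var


def nmse(true_vars, assumed_vars, noise_var):
    """Total estimation error normalized by the total channel power."""
    err = sum(mmse_error(t, a, noise_var) for t, a in zip(true_vars, assumed_vars))
    return err / sum(true_vars)


NOISE_VAR = 0.1                   # sigma^2 / tau_p, i.e. SNR = 10 with sigma_c^2 = 1
r_user1 = [1.0] * 4 + [0.0] * 4   # diagonal of R_1: VR is antennas 0..3
r_user2 = [0.0] * 4 + [1.0] * 4   # diagonal of R_2: VR is antennas 4..7
r_avg = [(a + b) / 2 for a, b in zip(r_user1, r_user2)]

nmse_matched = nmse(r_user1, r_user1, NOISE_VAR)   # per-user covariance
nmse_averaged = nmse(r_user1, r_avg, NOISE_VAR)    # averaged surrogate

print(f"per-user covariance : NMSE = {nmse_matched:.4f}")
print(f"averaged covariance : NMSE = {nmse_averaged:.4f}")
```

The averaged surrogate has support on all eight antennas, so it under-weights user 1's true VR and passes noise through the four antennas where user 1 has no signal. In this toy case the NMSE rises from $1/11$ to $1/6$, and the gap widens as more users with disjoint VRs enter the average.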

Historical Note: From Holographic MIMO to XL-MIMO

2014–present

The idea that spatial non-stationarity would eventually dominate massive MIMO has a long prehistory. Marzetta's original 2010 proposal assumed a compact antenna panel where the stationary Rayleigh model was a good approximation. By 2014 Payami and Tufvesson had measured non-stationarity on a 7.4 m linear array at 2.6 GHz and reported that users illuminated only 20–30% of the aperture — the first empirical demonstration of VRs. Theoretical follow-ups by Amiri, Angjelichinoski, de Carvalho, and Popovski (2018–2020) introduced the term "visibility region" and formulated sparsity-based estimation. Björnson and Sanguinetti (2019) argued that holographic and XL-MIMO arrays will be the dominant regime for 6G. By 2022 the CommIT group (Xu and Caire) had developed the 2D Markov prior framework that is the backbone of this chapter.

Why This Matters: XL-MIMO in 6G Radio Architectures

Two 6G deployment candidates force XL-MIMO into the spotlight: distributed massive MIMO panels covering entire facades or ceilings (sometimes called holographic surfaces), and sub-THz arrays where the wavelength is so short that even a 20 cm panel holds thousands of elements and sits in the near field for any indoor user. In both scenarios the stationary model from Part I is hopelessly optimistic, and the techniques of this chapter become mandatory. The connection to RIS (Chapter 21) is also direct: an RIS is physically an XL aperture with its own visibility structure determined by the active user location.

Quick Check

An XL-MIMO array has $N_t = 2048$ antennas. A user's VR covers 512 antennas. The operating SNR is 0 dB. Which of the four NMSE values below corresponds to a detector that correctly identifies the VR (no false alarms, no misses)?

0.25

1.00

4.00

0.00