Joint VR and Channel Estimation
Why Estimate Both at Once
Sections 18.1–18.4 have given us three separate components: a VR mask with a 2D Markov prior (18.2), a subarray processing pipeline (18.3), and a near-field sparse channel representation (18.4). Running them sequentially (first detect the VR, then estimate the channel) is tempting but wastes information. A sequential scheme uses the raw pilot correlator for VR detection and throws away the channel coefficients; a proper scheme feeds the channel estimate back into the VR detector, because a large coherent signal on an antenna is stronger evidence of visibility than a raw energy test. The principled machinery for this feedback is the EM algorithm, with the 2D Markov prior playing the role of a structured latent variable and the near-field dictionary supplying the observation model.
Definition: Joint Posterior of Mask and Channel
Stack all pilot observations into $\mathbf{Y}$. For user $k$, let $\mathbf{m}_k \in \{0,1\}^N$ denote the latent binary mask and $\mathbf{h}_k$ the near-field channel in polar coordinates. Under the Gaussian observation model and with the 2D Markov prior on $\mathbf{m}_k$ (Definition: 2D Markov Random Field Prior on the VR Mask), the joint posterior factors as
$$p(\mathbf{m}_k, \mathbf{h}_k \mid \mathbf{Y}) \propto p(\mathbf{Y} \mid \mathbf{m}_k, \mathbf{h}_k)\, p(\mathbf{h}_k)\, p(\mathbf{m}_k).$$
The Gaussian likelihood is
$$p(\mathbf{Y} \mid \mathbf{m}_k, \mathbf{h}_k) \propto \exp\!\Big(-\tfrac{1}{\sigma^2}\,\big\|\mathbf{Y} - (\mathbf{m}_k \odot \mathbf{h}_k)\,\mathbf{s}_k^{\mathsf{H}}\big\|_F^2\Big),$$
where $\mathbf{s}_k$ is the pilot sequence and $\odot$ applies the mask element-wise.
The log-joint is a sum of a discrete-MRF term in $\mathbf{m}_k$, a continuous quadratic term in $\mathbf{h}_k$, and a bilinear coupling between the two (quadratic in the pair). Maximizing jointly over $(\mathbf{m}_k, \mathbf{h}_k)$ is NP-hard in general, but alternating maximization (EM) converges to a good local optimum in a few iterations.
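The bilinear structure is easy to see numerically: for a fixed mask, the problem reduces to an ordinary least-squares fit in the channel, which is the workhorse inside the M-step. A minimal sketch with toy sizes; `Phi` is a stand-in for the near-field polar dictionary, and all dimensions and parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, G = 16, 8                       # antennas, dictionary atoms (toy sizes)
Phi = rng.standard_normal((N, G))  # stand-in for the near-field polar dictionary
h = rng.standard_normal(G)
m = (rng.random(N) > 0.5).astype(float)   # a random binary mask
sigma2 = 0.1
y = m * (Phi @ h) + np.sqrt(sigma2) * rng.standard_normal(N)

def log_likelihood(y, m, h):
    """Gaussian log-likelihood up to constants: mask applied element-wise."""
    r = y - m * (Phi @ h)
    return -0.5 * np.dot(r, r) / sigma2

# For a FIXED mask the problem is quadratic in h: minimizing
# ||y - (m[:,None]*Phi) @ h|| maximizes the likelihood above.
A = m[:, None] * Phi
h_ls = np.linalg.lstsq(A, y, rcond=None)[0]
```

Because `h_ls` is the exact maximizer for the given mask, its likelihood can never fall below that of any other channel vector, including the true one; this is the quadratic half of the alternation.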
Theorem: Monotone Ascent of EM for the Joint Problem
Let $q^{(t)}$ denote the variational distribution over the mask $\mathbf{m}$ at EM iteration $t$, and let $\hat{\mathbf{h}}^{(t)}$ denote the channel estimate. Define the evidence lower bound
$$\mathcal{L}(q, \mathbf{h}) = \mathbb{E}_q\big[\log p(\mathbf{Y}, \mathbf{m}, \mathbf{h})\big] + H(q).$$
The EM updates, namely the E-step $q^{(t+1)} = \arg\max_{q} \mathcal{L}(q, \hat{\mathbf{h}}^{(t)})$ subject to the mean-field factorization $q(\mathbf{m}) = \prod_n q_n(m_n)$, and the M-step $\hat{\mathbf{h}}^{(t+1)} = \arg\max_{\mathbf{h}} \mathcal{L}(q^{(t+1)}, \mathbf{h})$, produce a sequence $\mathcal{L}\big(q^{(t)}, \hat{\mathbf{h}}^{(t)}\big)$ that is monotonically non-decreasing.
Each step maximizes the ELBO with respect to a different argument, so the ELBO cannot decrease. Because the ELBO is bounded above by the maximized log-evidence $\max_{\mathbf{h}} \log p(\mathbf{Y}, \mathbf{h})$, the sequence converges. The limit is a stationary point, not necessarily the global maximum, but in practice the 2D Markov prior regularizes the landscape well and good initializations reach high-quality solutions within 3–5 iterations.
E-step non-decrease
By definition $q^{(t+1)} = \arg\max_{q} \mathcal{L}(q, \hat{\mathbf{h}}^{(t)})$, so $\mathcal{L}\big(q^{(t+1)}, \hat{\mathbf{h}}^{(t)}\big) \ge \mathcal{L}\big(q^{(t)}, \hat{\mathbf{h}}^{(t)}\big)$.
M-step non-decrease
Similarly $\hat{\mathbf{h}}^{(t+1)} = \arg\max_{\mathbf{h}} \mathcal{L}(q^{(t+1)}, \mathbf{h})$, so $\mathcal{L}\big(q^{(t+1)}, \hat{\mathbf{h}}^{(t+1)}\big) \ge \mathcal{L}\big(q^{(t+1)}, \hat{\mathbf{h}}^{(t)}\big)$.
Chain and bound
Chaining the two steps: $\mathcal{L}\big(q^{(t+1)}, \hat{\mathbf{h}}^{(t+1)}\big) \ge \mathcal{L}\big(q^{(t+1)}, \hat{\mathbf{h}}^{(t)}\big) \ge \mathcal{L}\big(q^{(t)}, \hat{\mathbf{h}}^{(t)}\big)$. Because $\mathcal{L}(q, \mathbf{h}) \le \max_{\mathbf{h}'} \log p(\mathbf{Y}, \mathbf{h}')$, the monotone sequence converges.
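The monotone-ascent guarantee can be checked numerically on a stripped-down instance of the model: a scalar channel, an i.i.d. Bernoulli mask with a flat prior (no MRF coupling, so the mean-field E-step is exact), and constants dropped from the ELBO. All sizes and parameters below are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 50
m_true = (rng.random(N) < 0.5).astype(float)
h_true = 2.0
sigma2 = 1.0
y = m_true * h_true + np.sqrt(sigma2) * rng.standard_normal(N)

def elbo(q, h):
    """E_q[log p(y, m | h)] + H(q); flat Bernoulli prior, constants dropped."""
    ll = q * (-0.5 * (y - h) ** 2) + (1 - q) * (-0.5 * y ** 2)
    qc = np.clip(q, 1e-12, 1 - 1e-12)
    ent = -(qc * np.log(qc) + (1 - qc) * np.log(1 - qc))
    return np.sum(ll / sigma2 + ent)

q = np.full(N, 0.5)     # flat initial mask belief
h = 0.1                 # deliberately poor initial channel estimate
elbos = []
for _ in range(10):
    # E-step: exact per-element posterior (model factorizes given h).
    logit = (y * h - 0.5 * h ** 2) / sigma2
    q = 1.0 / (1.0 + np.exp(-logit))
    # M-step: weighted average maximizes the expected log-likelihood in h.
    h = np.sum(q * y) / np.sum(q)
    elbos.append(elbo(q, h))
```

Plotting or asserting on `elbos` shows the sequence never decreases, exactly as the theorem states.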
Joint VR + Channel Estimation via EM
Complexity: for $T$ EM outer iterations and $S$ BP sweeps per iteration, the per-user cost is $T$ BP-based E-steps plus $T$ sparse M-steps over the polar dictionary, well within the subarray complexity budget of Section 18.3.

Step 5 is the core CommIT contribution: loopy BP on the 2D Ising graph enforces spatial smoothness of the VR and acts as a structured regularizer on the mask. Without it, the mean-field update would amount to independent sigmoid thresholding per antenna, the same as sequential VR detection, and strictly worse than the joint scheme (Theorem: Monotone Ascent of EM for the Joint Problem still holds, but the operating point is worse).
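Step 5's spatial smoothing can be sketched with a mean-field surrogate for loopy BP on the 2D Ising grid; full BP passes messages rather than marginals, but the checkerboard update pattern and the role of the coupling are the same. Grid size, coupling strength `beta`, and signal levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W = 8, 8            # toy antenna grid
beta = 0.8             # Ising coupling strength (assumed smoothness)
sigma2 = 0.5
signal = 1.5           # coherent per-antenna amplitude when visible

# Ground-truth mask: a contiguous block, a crude visibility region.
m_true = np.zeros((H, W))
m_true[2:6, 1:5] = 1.0
y = m_true * signal + np.sqrt(sigma2) * rng.standard_normal((H, W))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Per-antenna log-likelihood ratio: evidence of visibility from the signal.
llr = (y * signal - 0.5 * signal**2) / sigma2
q = sigmoid(llr)       # iteration-0 marginals: independent thresholding

parity_grid = np.add.outer(np.arange(H), np.arange(W)) % 2
for _ in range(10):    # checkerboard sweeps: half the grid updates at once
    for parity in (0, 1):
        s = 2.0 * q - 1.0                  # E[m] in the +/-1 spin convention
        nb = np.zeros_like(q)
        nb[1:, :] += s[:-1, :]
        nb[:-1, :] += s[1:, :]
        nb[:, 1:] += s[:, :-1]
        nb[:, :-1] += s[:, 1:]
        upd = parity_grid == parity
        q[upd] = sigmoid(llr[upd] + 2.0 * beta * nb[upd])

accuracy = ((q > 0.5) == m_true.astype(bool)).mean()
```

Isolated per-antenna errors from the raw `llr` get voted away by their neighbors, which is exactly the structured-regularizer role the text assigns to BP; without the coupling term, the loop reduces to the independent sigmoid thresholding described above.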
Joint EM vs Sequential Estimation: NMSE vs VR Mismatch
Compare four estimators across the operating SNR: (i) genie (true VR + MMSE), (ii) sequential (hard-threshold VR detector + MMSE on the detected support), (iii) Xu–Caire joint EM (this section), (iv) LS on the full aperture. At moderate SNR the joint EM lies within 1 dB of the genie; the sequential detector suffers a 4–6 dB penalty near the boundary of VR ambiguity.
Example: How Many EM Iterations Do We Need?
For a fixed panel size and operating SNR, how does the NMSE of the joint EM estimator evolve with the EM iteration index $t$, and what is a reasonable stopping rule?
Iteration 0 (polar-OMP init)
At $t = 0$ the mask is essentially uniform, and the M-step is ordinary polar-OMP with a very loose prior. The typical NMSE is poor, much worse than LS restricted to the true VR.
Iteration 1 (first BP pass)
The BP sweeps clean the mask using the channel estimate from iteration 0. NMSE drops sharply; this first pass delivers the bulk of the improvement.
Iterations 2β3
The mask and channel now mutually reinforce; NMSE falls essentially to the genie floor for this operating point.
Iterations 4β5
Diminishing returns; further NMSE improvement is negligible. The paper recommends a convergence-based stopping rule (a small threshold on the iteration-to-iteration change), which usually triggers within the 3–5 iterations quoted above.
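The stopping logic is generic and can be isolated from the estimator itself. A minimal driver, where `em_step` is a hypothetical callback standing in for one full E-step plus M-step, and the relative-change criterion is one reasonable choice since the text's exact rule is not reproduced here:

```python
def run_em(em_step, x0, tol=1e-2, max_iter=5):
    """Generic EM driver: stop when the relative change in the estimate
    drops below tol (an assumed criterion), or after max_iter iterations."""
    x = x0
    for t in range(1, max_iter + 1):
        x_new = em_step(x)
        done = abs(x) > 0 and abs(x_new - x) <= tol * abs(x)
        x = x_new
        if done:
            break
    return x, t

# Toy step whose error shrinks 10x per iteration, mimicking the sharp
# iteration-1 gain followed by diminishing returns.
target = 10.0
toy_step = lambda x: x + 0.9 * (target - x)
x_hat, iters = run_em(toy_step, 1.0)
```

On this toy trajectory the rule fires after three iterations, matching the qualitative "most of the gain is in the first pass" behavior of the example.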
Pilot Overhead in the XL-MIMO Regime
A subtle payoff of the joint estimator is that it shrinks the pilot overhead needed to achieve a target NMSE. With the Markov prior exploited, pilot lengths well below those suggested by orthogonal pilot allocation suffice at moderate SNR. The remaining non-orthogonality is absorbed by the MMSE / sparse step, which the 2D prior makes robust. In short: the 2D Markov prior partially substitutes for pilot orthogonality, a surprisingly aggressive form of pilot decontamination specific to XL-MIMO.
Why This Matters: Connection to Classical Pilot Contamination (Ch. 3)
Pilot contamination (Chapter 3) arises when two users in different cells share the same pilot; the estimator cannot separate them because their covariance subspaces overlap. In XL-MIMO, users in the same cell share the pilot resource too, but the VR structure hands the estimator an extra separator: two users with disjoint VRs can share the same pilot sequence, because the spatial evidence on different antennas distinguishes them. The joint EM of this section exploits that spatial separation automatically; the result is a form of spatial pilot decontamination that only works when visibility regions are sufficiently non-overlapping. The CommIT group's work on spatially correlated pilot decontamination from Chapter 3 is the sibling story in the stationary regime: same principle, different structural prior.
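The disjoint-VR separation can be illustrated in a few lines: two users transmit the same pilot simultaneously, and the masks alone pull the superimposed estimate apart. All names and sizes are illustrative, and the mask-aware correlation step below is a simple stand-in for the full joint EM:

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 32, 4                        # antennas, pilot length (toy sizes)
s = np.ones(L)                      # the SAME pilot sequence for both users
m1 = np.zeros(N); m1[:12] = 1.0     # user 1 visible on the first 12 antennas
m2 = np.zeros(N); m2[20:] = 1.0     # user 2 visible on the last 12: disjoint
h1 = rng.standard_normal(N)
h2 = rng.standard_normal(N)
sigma2 = 0.01

# Both users transmit the shared pilot at the same time.
Y = np.outer(m1 * h1 + m2 * h2, s) \
    + np.sqrt(sigma2) * rng.standard_normal((N, L))

# Pilot correlation yields the superposition of both channels...
g = Y @ s / (s @ s)
# ...and the disjoint masks separate the two users spatially.
h1_hat = m1 * g
h2_hat = m2 * g
nmse1 = np.sum((h1_hat - m1 * h1) ** 2) / np.sum((m1 * h1) ** 2)
```

Because no antenna sees both users, the cross term vanishes exactly; with overlapping VRs the masked estimates would instead inherit a contamination floor on the shared antennas.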
Common Mistake: Do Not Freeze the Mask Too Early
Mistake:
After the first BP pass, the marginals look well-separated, so round them to a hard 0/1 mask and finish with an ordinary LS on the hard support.
Correction:
Hard-thresholding between EM iterations destroys the soft evidence that the M-step needs to refine the channel. The penalty is largest at the boundary of the VR, where the marginals $q_n(m_n = 1)$ sit around 0.3–0.7, precisely the antennas where the joint information flow matters most. Keep the marginals soft throughout EM and hard-threshold only at the very end, and only if your downstream combiner requires a binary mask. The NMSE comparison above shows the 2–4 dB penalty of premature thresholding at moderate SNR.
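The boundary effect has a concrete failure mode that a short sketch makes visible: rounding ambiguous marginals to 0/1 can leave the M-step with fewer effective observations than unknowns, while soft weighting keeps those boundary rows in play. Everything here is a toy instance; the sizes, the marginal profile `q`, and the stand-in dictionary are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, G = 20, 6                       # antennas, dictionary atoms (toy sizes)
Phi = rng.standard_normal((N, G))  # stand-in near-field dictionary
h = rng.standard_normal(G)
m = np.zeros(N); m[:10] = 1.0      # true VR: first 10 antennas
sigma2 = 0.01
y = m * (Phi @ h) + np.sqrt(sigma2) * rng.standard_normal(N)

# Soft marginals: confident interior, ambiguous VR boundary (q near 0.5).
q = np.full(N, 0.05)
q[:5] = 0.95
q[5:10] = 0.45

def m_step(w):
    """M-step maximizing E_q[log-likelihood]: weighted LS with sqrt(w)
    weights, since E_q[(y - m*Phi@h)^2] = q*(y - Phi@h)^2 + (1-q)*y^2."""
    A = np.sqrt(w)[:, None] * Phi
    return np.linalg.lstsq(A, np.sqrt(w) * y, rcond=None)[0], A

h_soft, A_soft = m_step(q)                        # keep marginals soft
h_hard, A_hard = m_step((q > 0.5).astype(float))  # premature 0/1 rounding
```

Rounding at $q = 0.45$ discards half the VR: the hard support keeps only 5 informative rows for 6 unknowns, so the M-step becomes rank-deficient, while the soft weights retain all boundary evidence at reduced weight.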
Deploying the Joint Estimator in Real Time
A production XL-MIMO deployment running the joint EM estimator of Algorithm: Joint VR + Channel Estimation via EM must balance three constraints: coherence block length (measured in symbols), per-user baseband budget on the embedded DSP, and fronthaul capacity. A practical blueprint:
- Outer loop. Run 3–5 EM iterations; more rarely improve NMSE measurably.
- Inner BP sweeps. Use a checkerboard schedule so the sweeps parallelize across half the grid at a time.
- Polar dictionary cached. The dictionary depends only on array geometry and wavelength, not on channel or user state, so it can be precomputed once and reused for months.
- Subarray fallback. If the per-user budget is tight, run the joint EM on the active subarrays only (Section 18.3), which cuts the M-step cost substantially without hurting NMSE noticeably.
- Graceful degradation. When SNR drops below the fallback threshold, fall back to polar-OMP with a flat mask prior; the MRF benefit vanishes under very noisy evidence, and the BP sweeps waste cycles.
Key knobs at a glance:
- Outer EM iterations
- Inner BP sweeps
- Per-user runtime budget on the embedded DSP
- Fallback SNR threshold: below it, revert to polar-OMP with a uniform mask
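The blueprint can be captured as a small configuration object with the graceful-degradation switch. The field names and default values below are placeholders, since the text's specific thresholds are not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class JointEMConfig:
    """Deployment knobs from the blueprint (illustrative defaults only)."""
    max_outer_iters: int = 4            # outer EM loop, 3-5 in practice
    bp_sweeps: int = 8                  # checkerboard BP sweeps per E-step
    use_active_subarrays: bool = True   # Section 18.3 cost fallback
    snr_fallback_db: float = -5.0       # placeholder threshold (assumption)

def choose_estimator(cfg: JointEMConfig, snr_db: float) -> str:
    """Graceful degradation: the MRF benefit vanishes under noisy evidence."""
    if snr_db < cfg.snr_fallback_db:
        return "polar-omp-flat-prior"
    return "joint-em"
```

A scheduler would call `choose_estimator` once per coherence block, so the fallback costs nothing when SNR is healthy.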