Prerequisites & Notation
Before You Begin
This chapter sits at the intersection of three separate communities: (i) the massive MIMO system-design literature of Parts I–IV of this book, (ii) the model-based estimation and inference theory of Book FSI, and (iii) the modern deep-learning toolbox. The reader is expected to be comfortable with all three viewpoints and, crucially, with the tension between them: when a closed-form MMSE estimator exists, it beats any black-box network; when it does not, the learned alternative is the best tool we have.
- Linear MMSE channel estimation with known spatial covariance (Review ch03)
Self-check: Can you write the MMSE estimator and identify the two ingredients that make it optimal? (A worked form follows this list.)
- FDD CSI feedback overhead, Type I / Type II codebooks, JSDM (Review ch08)
Self-check: Can you explain why FDD massive MIMO feedback overhead scales with the product of the number of antennas and the number of quantization bits per coefficient? (A back-of-envelope count follows this list.)
- Deep unfolding: turning an iterative algorithm into a trainable network by making each iteration a layer (Review ch18)
Self-check: Can you describe how unrolling $K$ iterations of ISTA produces a $K$-layer network with learnable step sizes and soft-threshold parameters? (A sketch follows this list.)
- Approximate message passing (AMP) and orthogonal AMP (OAMP) (Review ch20)
Self-check: Can you state the AMP iteration and explain the role of the Onsager correction term? (The iteration is restated after this list.)
- Mutual information and rate-distortion as information-theoretic primitives (Review ch13)
Self-check: Can you write the rate-distortion function and identify the optimization variable? (The function is restated after this list.)
- Stochastic gradient descent, backpropagation, standard DL architectures (MLP, CNN, Transformer)
Self-check: Can you implement a minimal training loop in PyTorch or JAX with a custom loss, without relying on a high-level trainer? (A minimal loop is sketched after this list.)
- Markov decision processes, Bellman equation, policy gradients
Self-check: Can you write the Bellman equation and explain why policy gradient methods optimize a stochastic policy rather than a deterministic one? (Both equations are restated after this list.)
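For the MMSE self-check, here is a worked form under the standard Gaussian pilot model, written for a vectorized channel $\mathbf{h}$ with known spatial covariance $\mathbf{R}$ and noise variance $\sigma^2$; this is a minimal sketch consistent with the notation table below, not a restatement of ch03.

```latex
% Pilot model: y = \Phi h + n,  h ~ CN(0, R),  n ~ CN(0, \sigma^2 I)
\hat{\mathbf{h}}_{\mathrm{MMSE}}
  = \mathbf{R} \boldsymbol{\Phi}^{\mathsf{H}}
    \left( \boldsymbol{\Phi} \mathbf{R} \boldsymbol{\Phi}^{\mathsf{H}} + \sigma^2 \mathbf{I} \right)^{-1} \mathbf{y}
```

The two ingredients that make it optimal: the known spatial covariance $\mathbf{R}$, and the joint Gaussianity of channel and observation, under which this linear estimator coincides with the conditional mean $\mathbb{E}[\mathbf{h} \mid \mathbf{y}]$.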
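For the feedback-overhead self-check, a back-of-envelope count; the specific numbers are illustrative assumptions, not values from ch08.

```latex
% N_t antenna ports, b quantization bits per real dimension,
% 2 real dimensions per complex coefficient:
\text{payload} \approx N_t \cdot 2 \cdot b
  \quad \text{bits, e.g. } 64 \cdot 2 \cdot 4 = 512 \text{ bits per subband per report}
```

The product structure is the point: doubling either the antenna count or the per-coefficient resolution doubles the payload, which is why Type II codebooks compress in a beam-domain basis rather than feeding back raw per-antenna coefficients.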
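For the deep-unfolding self-check, a minimal LISTA-style sketch in PyTorch for a generic sparse-recovery problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$; the layer count, initial step sizes, and thresholds are illustrative assumptions, not the construction of ch18.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    """K iterations of ISTA unrolled into K layers, with a learnable
    step size and soft-threshold parameter per layer."""

    def __init__(self, A: torch.Tensor, num_layers: int = 10):
        super().__init__()
        self.register_buffer("A", A)  # (m, n) measurement matrix, fixed
        self.step = nn.Parameter(torch.full((num_layers,), 0.1))     # per-layer step sizes
        self.thresh = nn.Parameter(torch.full((num_layers,), 0.01))  # per-layer thresholds

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, m) observations; returns x: (batch, n) estimates
        x = y.new_zeros(y.shape[0], self.A.shape[1])
        for k in range(len(self.step)):
            grad = (x @ self.A.T - y) @ self.A   # gradient of 0.5 * ||y - A x||^2
            z = x - self.step[k] * grad          # gradient step
            x = torch.sign(z) * torch.clamp(z.abs() - self.thresh[k], min=0.0)  # soft threshold
        return x
```

Training this end-to-end on (y, x) pairs replaces ISTA's fixed schedule with parameters chosen by data, which is the whole idea of unfolding.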
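For the AMP self-check, the iteration in its standard form, with denoiser $\eta_t$, measurement ratio $\delta = m/n$, and $\langle\cdot\rangle$ denoting the empirical average of the denoiser derivative:

```latex
% AMP for y = A x + n, A of size m x n:
\mathbf{x}^{t+1} = \eta_t\!\left( \mathbf{A}^{\mathsf{H}} \mathbf{z}^{t} + \mathbf{x}^{t} \right),
\qquad
\mathbf{z}^{t} = \mathbf{y} - \mathbf{A} \mathbf{x}^{t}
  + \frac{1}{\delta}\, \mathbf{z}^{t-1}
    \left\langle \eta_{t-1}'\!\left( \mathbf{A}^{\mathsf{H}} \mathbf{z}^{t-1} + \mathbf{x}^{t-1} \right) \right\rangle
```

The last term is the Onsager correction: it cancels the correlation built up by feeding estimates back through $\mathbf{A}$, so that the denoiser input $\mathbf{A}^{\mathsf{H}}\mathbf{z}^{t} + \mathbf{x}^{t}$ behaves like the true signal plus white Gaussian noise. OAMP achieves the same decoupling by construction, constraining its linear estimator to be de-correlated and its denoiser to be divergence-free.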
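For the rate-distortion self-check, the function itself; the optimization variable is the test channel $p(\hat{x} \mid x)$, not the source distribution:

```latex
R(D) = \min_{p(\hat{x} \mid x)\,:\;\mathbb{E}\left[ d(X, \hat{X}) \right] \le D} I(X; \hat{X})
```

In the CSI-feedback setting of this chapter, $d$ is the NMSE distortion, and $R(D)$ lower-bounds the feedback payload $B$ achievable at a given reconstruction quality.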
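For the training-loop self-check, a minimal PyTorch sketch with a custom NMSE loss and no high-level trainer; the toy model, data shapes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

def nmse_loss(h_hat: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Normalized MSE: per-sample error energy over reference energy, batch-averaged."""
    err = (h_hat - h).flatten(1).pow(2).sum(dim=1)
    ref = h.flatten(1).pow(2).sum(dim=1)
    return (err / ref).mean()

# Toy regression data standing in for (pilot observation, true channel) pairs.
torch.manual_seed(0)
y = torch.randn(1024, 64)    # observations (illustrative shape)
h = torch.randn(1024, 128)   # targets (illustrative shape)

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    for i in range(0, len(y), 128):        # plain minibatching, no trainer
        yb, hb = y[i:i + 128], h[i:i + 128]
        loss = nmse_loss(model(yb), hb)
        opt.zero_grad()
        loss.backward()
        opt.step()
```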
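For the MDP self-check, the Bellman equation for $V^{\pi}$ together with the score-function policy gradient; the latter is why a stochastic policy is optimized, since $\nabla_{\boldsymbol{\phi}} \log \pi_{\boldsymbol{\phi}}(a \mid s)$ requires a differentiable density over actions, which a deterministic policy does not provide.

```latex
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[
    r(s, a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}\!\left[ V^{\pi}(s') \right]
  \right]
% Score-function (REINFORCE) policy gradient:
\nabla_{\boldsymbol{\phi}} J(\boldsymbol{\phi})
  = \mathbb{E}_{\pi_{\boldsymbol{\phi}}}\!\left[
      \nabla_{\boldsymbol{\phi}} \log \pi_{\boldsymbol{\phi}}(a \mid s)\, Q^{\pi}(s, a)
    \right]
```

Stochastic policies also provide exploration without any external mechanism, which matters for the PPO agent of s04.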
Notation for This Chapter
Symbols introduced or specialized in this chapter are listed below. Customizable symbols use tokens from the massive MIMO notation registry; machine-learning-specific symbols (network weights, loss functions, learning rate) are written in raw LaTeX because they are not part of that registry. See the Global Notation Table for the master list.
| Symbol | Meaning | Introduced |
|---|---|---|
| $\mathbf{H}$ | True channel matrix (target of estimation / feedback / prediction) | s01 |
| $\hat{\mathbf{H}}$ | Estimated or reconstructed channel matrix (output of the learned network) | s01 |
| $\boldsymbol{\Phi}$ | Pilot matrix used for uplink training | s01 |
| $\mathbf{y}$ | Pilot observation vector at the BS | s01 |
| $f_{\boldsymbol{\theta}}(\cdot)$ | Neural network with trainable parameters $\boldsymbol{\theta}$ | s01 |
| $\mathcal{L}$ | Training loss (NMSE, cross-entropy, negative log-likelihood, etc.) | s01 |
| $\mathrm{NMSE}$ | Normalized mean-squared error | s01 |
| $\mathbf{z}$ | Latent code in the CSI feedback encoder-decoder | s02 |
| $B$ | Feedback payload size in bits per channel instance | s02 |
| $R(D)$ | Rate-distortion function: minimum bits per channel sample to achieve NMSE $D$ | s02 |
| $m_t^{\star}$ | Optimal beam index at time slot $t$ within a predefined codebook | s03 |
| $s_t$, $a_t$, $r_t$ | MDP state, action, and reward at time $t$ | s04 |
| $\pi_{\boldsymbol{\phi}}$ | Stochastic policy parametrized by $\boldsymbol{\phi}$ (PPO actor network) | s04 |
| $V^{\pi}(s)$ | Value function of policy $\pi$ at state $s$ | s04 |
| $\rho$ | Per-user SNR (linear scale) | s01 |
| $K$ | Number of users sharing the resource block | s04 |
| $M$ | Number of BS transmit antennas | s01 |