Chapter Summary

Key Points

1. Probability spaces and axioms. A probability space $(\Omega, \mathcal{F}, P)$ provides the rigorous foundation for every stochastic model in communications: the sample space $\Omega$ enumerates all possible outcomes (e.g., all realizable channel states), the $\sigma$-algebra $\mathcal{F}$ defines the measurable events, and the probability measure $P$ assigns consistent probabilities satisfying Kolmogorov's axioms: $P(\Omega) = 1$, non-negativity, and countable additivity. Conditional probability and Bayes' theorem $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ underpin maximum-likelihood and MAP detection, where the receiver inverts the channel to estimate transmitted symbols. The law of total probability decomposes complex system analyses — such as computing outage probability over a random fading channel — into tractable conditional calculations.
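A minimal numerical sketch of Bayes-rule (MAP) detection, using an assumed binary symmetric channel with illustrative crossover probability and prior:

```python
# MAP detection on a binary symmetric channel (BSC) via Bayes' theorem.
# Assumed illustrative parameters: crossover p = 0.1, prior P(X=1) = 0.3.
p = 0.1          # channel crossover probability P(Y != X)
prior1 = 0.3     # prior P(X = 1)

# Likelihoods of observing Y = 1 under each hypothesis
lik_y1_given_x1 = 1 - p   # P(Y=1 | X=1)
lik_y1_given_x0 = p       # P(Y=1 | X=0)

# Law of total probability: P(Y=1) = sum_x P(Y=1 | X=x) P(X=x)
p_y1 = lik_y1_given_x1 * prior1 + lik_y1_given_x0 * (1 - prior1)

# Bayes' theorem: P(X=1 | Y=1) = P(Y=1 | X=1) P(X=1) / P(Y=1)
post1 = lik_y1_given_x1 * prior1 / p_y1

# MAP decision on observing Y = 1: pick the symbol with the larger posterior
map_decision = 1 if post1 >= 0.5 else 0
print(post1, map_decision)
```

Note how the prior shifts the decision: even though $P(X=1) = 0.3$, observing $Y = 1$ through a reliable channel makes $X = 1$ the MAP choice.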

2. Random variables, distributions, and moments. A random variable $X : \Omega \to \mathbb{R}$ maps outcomes to numbers, and its CDF $F_X(x) = P(X \le x)$ fully characterizes its statistical behavior. For continuous random variables the PDF $f_X(x) = dF_X/dx$ enables computation of the expectation $\mathbb{E}[X] = \int x\,f_X(x)\,dx$ and variance $\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2]$, which quantify signal power and noise spread in communication links. The Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ models thermal noise, the exponential distribution governs inter-arrival times and instantaneous SNR under Rayleigh fading, and the chi-squared distribution arises when summing squared Gaussian components — each playing a distinct role in receiver analysis and system design.
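The chi-squared construction above can be checked numerically; the parameters below (sample size, degrees of freedom) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200_000, 4            # illustrative sample size and number of Gaussians

# Sum of k squared N(0,1) variables -> chi-squared with k degrees of freedom
g = rng.standard_normal((n, k))
chi2 = (g ** 2).sum(axis=1)

# Chi-squared(k) has mean k and variance 2k
emp_mean = chi2.mean()
emp_var = chi2.var()
print(emp_mean, emp_var)   # near 4 and 8
```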

3. Functions of random variables and fading distributions. Transforming a random variable via $Y = g(X)$ — whether through nonlinear device characteristics or envelope detection — requires the change-of-variables (Jacobian) formula $f_Y(y) = f_X(g^{-1}(y))\,\left|dg^{-1}/dy\right|$ for monotonic mappings. The Rayleigh distribution, arising as the envelope $|Z|$ of a circularly symmetric complex Gaussian $Z \sim \mathcal{CN}(0, 2\sigma^2)$, models non-line-of-sight (NLOS) fading; the Ricean distribution adds a deterministic line-of-sight component, with the $K$-factor quantifying the ratio of specular to scattered power; and the Nakagami-$m$ distribution provides a flexible shape parameter that spans severe fading ($m = 1/2$), Rayleigh ($m = 1$), and near-deterministic channels ($m \to \infty$). These fading models directly determine the outage probability $P_{\mathrm{out}} = P(\gamma < \gamma_{\mathrm{th}})$ and average bit-error-rate performance of wireless links.
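Under Rayleigh fading the instantaneous SNR is exponential, so the outage probability has the closed form $P_{\mathrm{out}} = 1 - e^{-\gamma_{\mathrm{th}}/\bar{\gamma}}$; a sketch with an assumed threshold and average SNR compares it against Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma_bar = 10.0   # assumed average SNR (linear scale)
gamma_th = 1.0     # assumed outage threshold

# Rayleigh fading: instantaneous SNR is exponential with mean gamma_bar,
# so P_out = P(gamma < gamma_th) = 1 - exp(-gamma_th / gamma_bar)
p_out_exact = 1 - np.exp(-gamma_th / gamma_bar)

# Monte Carlo check: draw the exponential SNR directly
gamma = rng.exponential(gamma_bar, size=1_000_000)
p_out_mc = np.mean(gamma < gamma_th)
print(p_out_exact, p_out_mc)
```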

4. Moment-generating and characteristic functions. The moment-generating function $M_X(s) = \mathbb{E}[e^{sX}]$ and the characteristic function $\Phi_X(\omega) = \mathbb{E}[e^{j\omega X}]$ encode the entire distribution in a single transform, with moments extracted as derivatives at the origin: $\mathbb{E}[X^n] = M_X^{(n)}(0)$. For independent random variables the MGF of a sum factors as a product, $M_{X+Y}(s) = M_X(s)\,M_Y(s)$, which greatly simplifies the analysis of diversity combining schemes (MRC, EGC) where the total SNR is a sum of independent branch SNRs. The characteristic function, guaranteed to exist for all distributions, connects to the PDF through the inverse Fourier transform and is the natural tool for proving the central limit theorem and analyzing OFDM sub-carrier statistics.
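Because independent MGFs multiply, the MRC sum of $L$ i.i.d. exponential branch SNRs (each with MGF $(1 - \theta s)^{-1}$) is Gamma-distributed with mean $L\theta$ and variance $L\theta^2$; a sketch with illustrative parameters checks these two moments:

```python
import numpy as np

rng = np.random.default_rng(2)
L, theta, n = 3, 0.5, 500_000   # branches, mean branch SNR, sample size (assumed)

# Exponential branch MGF: M(s) = 1/(1 - theta*s); independence makes the
# MGF of the MRC output SNR M(s)**L, i.e. a Gamma(L, theta) distribution
# with mean L*theta and variance L*theta**2.
snr_sum = rng.exponential(theta, size=(n, L)).sum(axis=1)
emp_mean, emp_var = snr_sum.mean(), snr_sum.var()
print(emp_mean, emp_var)   # near 1.5 and 0.75
```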

5. Random vectors, covariance matrices, and complex Gaussians. A random vector $\mathbf{x} \in \mathbb{C}^n$ has its second-order statistics captured by the covariance matrix $\mathbf{R} = \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$, which is Hermitian positive semidefinite — precisely the matrix class whose eigendecomposition was studied in Chapter 1. The multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \mathbf{R})$ and its complex counterpart $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$ are uniquely determined by their first and second moments, with circular symmetry requiring the pseudo-covariance to vanish: $\mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = \mathbf{0}$. In MIMO systems the channel vector $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ encodes spatial correlation, and the eigenstructure of $\mathbf{R}$ directly determines beamforming gain, spatial multiplexing capability, and the capacity of correlated MIMO channels.
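A sketch of coloring i.i.d. circularly symmetric samples with a Cholesky factor of $\mathbf{R}$ (a hypothetical $2 \times 2$ correlation matrix) and recovering $\mathbf{R}$ from the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
# Assumed 2x2 spatial correlation matrix (Hermitian positive semidefinite)
R = np.array([[1.0, 0.5], [0.5, 1.0]])

# Color unit-variance circularly symmetric CN(0, I) samples with R^(1/2)
L = np.linalg.cholesky(R)
w = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
h = L @ w

# Sample covariance E[h h^H] should recover R; the pseudo-covariance
# E[h h^T] should be near zero (circular symmetry)
R_hat = (h @ h.conj().T).real / n
pseudo = h @ h.T / n
print(R_hat)
```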

6. Convergence concepts and concentration inequalities. The law of large numbers (LLN) guarantees that sample averages converge to ensemble means — justifying ergodic capacity as a meaningful metric when a codeword spans many independent fading realizations. The central limit theorem (CLT) establishes that normalized sums of i.i.d. random variables tend to a Gaussian, which explains why aggregate interference in dense networks and OFDM time-domain samples are well modeled as Gaussian. The Chernoff bound $P(X \ge a) \le \min_{s > 0} e^{-sa} M_X(s)$ provides exponentially tight tail estimates that are essential for bounding error rates in coded systems, while Chebyshev's inequality offers distribution-free guarantees on concentration around the mean.
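For a standard Gaussian the Chernoff minimization is solvable in closed form ($s = a$, giving $e^{-a^2/2}$); the sketch below compares it with the exact tail and the Chebyshev bound at an illustrative threshold:

```python
import math

a = 2.0  # illustrative threshold for a standard Gaussian tail

# Gaussian MGF: M(s) = exp(s^2 / 2); the Chernoff exponent e^{-sa} M(s)
# is minimized at s = a, giving the bound P(X >= a) <= exp(-a^2 / 2)
chernoff = math.exp(-a * a / 2)

# Exact tail via the Q-function, Q(a) = 0.5 * erfc(a / sqrt(2))
q_exact = 0.5 * math.erfc(a / math.sqrt(2))

# Chebyshev (distribution-free): P(|X| >= a) <= 1/a^2, hence also
# an upper bound on the one-sided tail
chebyshev = 1 / (a * a)
print(q_exact, chernoff, chebyshev)
```

The ordering illustrates the point in the text: Chebyshev is loose but assumption-free, while the Chernoff bound tracks the true exponential decay of the tail.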

7. Stochastic processes: stationarity and spectral analysis. A stochastic process $\{X(t),\, t \in \mathbb{R}\}$ assigns a random variable to each time instant; wide-sense stationarity (WSS) requires a constant mean $\mathbb{E}[X(t)] = \mu$ and an autocorrelation function that depends only on the time lag, $R_X(\tau) = \mathbb{E}[X(t)\,X^*(t - \tau)]$. The Wiener–Khinchin theorem establishes that the power spectral density (PSD) is the Fourier transform of the autocorrelation, $S_X(f) = \int R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau$, providing the bridge between time-domain correlation and frequency-domain power distribution. For wireless channels, the Doppler spectrum $S_H(f)$ and its inverse transform — the time-domain correlation $R_H(\Delta t)$ — quantify how rapidly the channel varies, directly governing pilot spacing, coherence time, and the validity of quasi-static fading assumptions.
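Wiener–Khinchin can be checked on a toy WSS process (a first-order moving average of white noise, parameters illustrative): the transform of the estimated autocorrelation must match the filter's $|H(f)|^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000

# WSS process: y[n] = x[n] + 0.5 x[n-1] with x unit-variance white noise
x = rng.standard_normal(N + 1)
y = x[1:] + 0.5 * x[:-1]

# Sample autocorrelation at lags 0 and 1 (all other lags are zero in theory):
# R_y(0) = 1 + 0.25 = 1.25 and R_y(1) = 0.5
r0 = np.mean(y * y)
r1 = np.mean(y[1:] * y[:-1])

# Wiener-Khinchin: the PSD is the (discrete-time) Fourier transform of R_y,
# S_y(f) = r0 + 2 r1 cos(2 pi f), which should match |1 + 0.5 e^{-j2 pi f}|^2
f = np.linspace(0, 0.5, 6)
psd_wk = r0 + 2 * r1 * np.cos(2 * np.pi * f)
psd_theory = np.abs(1 + 0.5 * np.exp(-2j * np.pi * f)) ** 2
print(psd_wk, psd_theory)
```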

8. Gaussian processes and white noise. A Gaussian process is fully specified by its mean function and autocorrelation, since all finite-dimensional distributions are jointly Gaussian — a property inherited from the multivariate Gaussian theory of random vectors. White Gaussian noise, with flat PSD $S_N(f) = N_0/2$ and delta autocorrelation $R_N(\tau) = (N_0/2)\,\delta(\tau)$, is the canonical noise model in communications: passing it through a filter of bandwidth $W$ produces band-limited (colored) noise with finite power $\sigma^2 = N_0 W$. The additive white Gaussian noise (AWGN) channel $Y(t) = X(t) + N(t)$ serves as the fundamental reference model against which all practical systems are benchmarked, and its sufficient statistics (matched-filter outputs) are themselves Gaussian, enabling clean derivations of optimal detection and capacity.
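A sketch of band-limiting discrete white noise with an ideal low-pass filter (all parameters illustrative) to confirm the output power $N_0 W$, i.e. the integral of $N_0/2$ over $[-W, W]$:

```python
import numpy as np

rng = np.random.default_rng(5)
N0, fs, W = 2.0, 100.0, 10.0   # assumed noise level, sample rate, bandwidth
n = 1 << 17

# Sampled white Gaussian noise with two-sided PSD N0/2 has variance (N0/2)*fs
x = np.sqrt(N0 / 2 * fs) * rng.standard_normal(n)

# Ideal low-pass of bandwidth W: zero every FFT bin with |f| > W
f = np.fft.fftfreq(n, d=1 / fs)
X = np.fft.fft(x)
X[np.abs(f) > W] = 0
y = np.fft.ifft(X).real

# Output power should approach N0 * W = 20
power = np.mean(y ** 2)
print(power)
```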

9. Markov chains and Poisson processes. A discrete-time Markov chain satisfies the memoryless property $P(X_{n+1} \mid X_n, \ldots, X_0) = P(X_{n+1} \mid X_n)$, making the transition matrix $\mathbf{P}$ with entries $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ the complete descriptor of the chain's dynamics. Finite-state Markov channels model the bursty error behavior of fading links — the Gilbert–Elliott model partitions the channel into "good" and "bad" states with distinct error rates, capturing the temporal error correlation that i.i.d. models miss. The Poisson process, with independent and stationary increments and inter-arrival times $\sim \mathrm{Exp}(\lambda)$, models random-access attempts, call arrivals, and interference events in cellular networks, providing the probabilistic backbone of teletraffic engineering and stochastic geometry.
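A sketch of a Gilbert–Elliott chain with assumed transition probabilities, solving $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$ for the stationary occupancy and checking it by simulation:

```python
import numpy as np

rng = np.random.default_rng(6)

# Gilbert-Elliott chain with assumed transition probabilities:
# P(good -> bad) = 0.1, P(bad -> good) = 0.4; state 0 = good, 1 = bad
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution solves pi = pi P with pi summing to 1;
# for a 2-state chain pi_bad = p_gb / (p_gb + p_bg) = 0.1 / 0.5 = 0.2
A = np.vstack([P.T - np.eye(2), np.ones(2)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]

# Simulate the chain and compare the empirical time spent in the bad state
state, steps, bad = 0, 100_000, 0
for _ in range(steps):
    state = rng.choice(2, p=P[state])
    bad += state
print(pi, bad / steps)
```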

Looking Ahead

Chapter 3 introduces information theory and coding, translating the probability foundations of this chapter into fundamental performance limits for communication systems. Shannon entropy $H(X) = -\sum p(x) \log p(x)$ quantifies the irreducible uncertainty of a source; mutual information $I(X; Y) = H(X) - H(X \mid Y)$ measures the information conveyed through a noisy channel; and together they define the channel capacity $C = \max_{p(x)} I(X; Y)$ — the ultimate data rate achievable with vanishing error probability. The AWGN channel capacity $C = W \log_2(1 + \mathrm{SNR})$ and its MIMO generalization $C = \log_2 \det(\mathbf{I} + \mathrm{SNR} \cdot \mathbf{H}\mathbf{H}^H)$ are direct consequences of the Gaussian and complex Gaussian distributions studied here, while the channel coding theorem guarantees that codes approaching these limits exist — motivating the design of practical codes (turbo, LDPC, polar) that will be examined alongside their information-theoretic underpinnings.
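The two capacity formulas can be evaluated directly; with $\mathbf{H} = \mathbf{I}$ (a hypothetical uncoupled channel, parameters illustrative) the MIMO expression reduces to parallel AWGN capacities:

```python
import numpy as np

W = 1e6            # assumed bandwidth in Hz
snr_db = 10.0
snr = 10 ** (snr_db / 10)

# AWGN capacity: C = W log2(1 + SNR), in bits per second
c_awgn = W * np.log2(1 + snr)

# MIMO capacity C = log2 det(I + SNR * H H^H), in bits/s/Hz; with H = I
# (two parallel, non-interfering channels) it reduces to 2 log2(1 + SNR)
H = np.eye(2)
sign, logdet = np.linalg.slogdet(np.eye(2) + snr * H @ H.conj().T)
c_mimo = logdet / np.log(2)
print(c_awgn, c_mimo)
```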