Distributions and Sampling

Why Statistics Is Central to Wireless Simulation

Every wireless system is inherently stochastic: noise is random, fading is random, user locations are random, and data bits are random. To simulate these systems faithfully, you need to draw samples from the correct distributions, verify your random number generators produce the right statistics, and generate correlated random vectors to model realistic channel conditions.

This section shows you how scipy.stats gives you a unified interface to over 100 distributions, how to fit distributions to data, and how to generate correlated Gaussian vectors via the Cholesky decomposition.

Definition:

Probability Density Function (PDF)

A continuous random variable X has probability density function f_X(x) satisfying:

P(a \le X \le b) = \int_a^b f_X(x)\,dx, \quad f_X(x) \ge 0, \quad \int_{-\infty}^{\infty} f_X(x)\,dx = 1

In scipy.stats, every distribution object provides pdf(x), cdf(x), rvs(size), fit(data), and ppf(q) (quantile function).

from scipy.stats import norm
X = norm(loc=0, scale=1)      # standard normal
X.pdf(0.0)                    # f(0) = 0.3989...
X.cdf(1.96)                   # P(X <= 1.96) = 0.975
samples = X.rvs(size=10000)   # draw 10000 samples

The loc and scale parameters shift and scale any distribution. For norm, loc is the mean and scale is the standard deviation.
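A quick sketch of the loc/scale convention, using expon purely as an illustration: scale stretches the standard form of the distribution, while loc shifts it.

```python
from scipy.stats import expon, norm

# scale stretches the unit-rate exponential: Exp with scale=2 has mean 2
Y = expon(loc=0, scale=2.0)
mean_Y = Y.mean()                      # 2.0

# for norm, loc/scale are exactly the mean and standard deviation
X = norm(loc=3.0, scale=0.5)
mu_X, sigma_X = X.mean(), X.std()      # 3.0, 0.5
```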

Definition:

CDF and Quantile (Percent-Point) Function

The cumulative distribution function (CDF) is:

F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt

Its inverse F_X^{-1}(q) is the quantile function (ppf in SciPy): for 0 < q < 1, F_X^{-1}(q) returns the value x such that P(X \le x) = q.

from scipy.stats import norm
norm.ppf(0.975)    # 1.96, the 97.5th percentile
norm.isf(0.025)    # 1.96, inverse survival function

The quantile function is essential for computing confidence intervals and critical values in hypothesis testing.
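A minimal sketch of that use, assuming a normal sample and the known-variance form of the interval: ppf supplies the 1.96 critical value for a two-sided 95% confidence interval on the mean.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

z = norm.ppf(0.975)                        # two-sided 95% critical value, 1.9599...
half_width = z * x.std(ddof=1) / np.sqrt(len(x))
ci = (x.mean() - half_width, x.mean() + half_width)
```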

Definition:

Gaussian (Normal) Distribution

The Gaussian distribution X \sim \mathcal{N}(\mu, \sigma^2) has PDF:

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

The Q-function Q(x) = 1 - \Phi(x) gives the tail probability and appears throughout BER analysis.

import numpy as np
from scipy.stats import norm
from scipy.special import erfc

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))
# Equivalent:
Q_scipy = lambda x: norm.sf(x)           # survival function

For BPSK over AWGN, the BER is exactly Q\!\left(\sqrt{2E_b/N_0}\right). This is the single most important formula in digital communications.
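A Monte Carlo sanity check of this formula is a useful habit. The sketch below assumes unit-energy bits, real AWGN with variance N_0/2, and an arbitrary operating point Eb/N0 = 4 (about 6 dB):

```python
import numpy as np
from scipy.stats import norm

Q = lambda x: norm.sf(x)

ebn0 = 4.0                                  # Eb/N0 on a linear scale (~6 dB)
ber_theory = Q(np.sqrt(2 * ebn0))           # exact BPSK-over-AWGN BER

rng = np.random.default_rng(1)
N = 200_000
bits = rng.integers(0, 2, N)
s = 1.0 - 2.0 * bits                        # BPSK mapping: 0 -> +1, 1 -> -1
noise = rng.standard_normal(N) / np.sqrt(2 * ebn0)   # noise std for unit Eb
bits_hat = (s + noise) < 0                  # threshold detector: decide 1 if r < 0
ber_sim = np.mean(bits_hat != bits.astype(bool))
```

With 200k bits the simulated BER should land within a few percent of the theoretical value.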

Definition:

Circularly Symmetric Complex Gaussian

A circularly symmetric complex Gaussian (CSCG) random variable Z \sim \mathcal{CN}(0, \sigma^2) is defined as:

Z = X + jY, \quad X, Y \sim \mathcal{N}(0, \sigma^2/2) \text{ independent}

The envelope |Z| follows a Rayleigh distribution and the power |Z|^2 follows an exponential distribution with mean \sigma^2.

import numpy as np

rng = np.random.default_rng(42)
sigma2 = 1.0
z = rng.standard_normal(10000) * np.sqrt(sigma2/2) \
  + 1j * rng.standard_normal(10000) * np.sqrt(sigma2/2)
# |z| is Rayleigh, |z|^2 is exponential
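A sketch that checks both claims with a KS test; the scale arguments follow from the sigma2/2 variance per real dimension:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(42)
sigma2 = 1.0
n = 10_000
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(sigma2 / 2)

# envelope |z| ~ Rayleigh(scale = sqrt(sigma2/2)); power |z|^2 ~ Exp(mean = sigma2)
d_env, p_env = kstest(np.abs(z), 'rayleigh', args=(0, np.sqrt(sigma2 / 2)))
d_pow, p_pow = kstest(np.abs(z)**2, 'expon', args=(0, sigma2))
```

Large p-values (and KS statistics near zero) indicate the samples are consistent with the stated distributions.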

Definition:

Multivariate Gaussian and Covariance Matrix

A random vector \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{R}) has joint PDF:

f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\mathbf{R}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \mathbf{R}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)

where \mathbf{R} = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T] is the covariance matrix (positive semi-definite).

import numpy as np
from scipy.stats import multivariate_normal

R = np.array([[1.0, 0.8], [0.8, 1.0]])
rv = multivariate_normal(mean=[0, 0], cov=R)
samples = rv.rvs(size=5000)
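As a sanity check (sample size bumped up here for a tight estimate), the empirical covariance of the draws should approach R:

```python
import numpy as np
from scipy.stats import multivariate_normal

R = np.array([[1.0, 0.8], [0.8, 1.0]])
rv = multivariate_normal(mean=[0, 0], cov=R, seed=7)
samples = rv.rvs(size=50_000)          # shape (50000, 2): one row per sample

R_hat = np.cov(samples.T)              # empirical 2x2 covariance matrix
max_err = np.abs(R_hat - R).max()      # should shrink like 1/sqrt(N)
```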

Definition:

Generating Correlated Gaussian Samples via Cholesky

To generate \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{R}):

  1. Compute the Cholesky factor: \mathbf{R} = \mathbf{L}\mathbf{L}^T
  2. Draw i.i.d. standard normals: \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
  3. Set \mathbf{x} = \boldsymbol{\mu} + \mathbf{L}\mathbf{z}

Then \mathrm{Cov}[\mathbf{x}] = \mathbf{L}\,\mathbf{I}\,\mathbf{L}^T = \mathbf{R}.

L = np.linalg.cholesky(R)    # lower triangular
z = rng.standard_normal((n, N_samples))
x = mu[:, None] + L @ z      # each column is a correlated sample

This is far more efficient than multivariate_normal.rvs() when you need millions of samples: compute \mathbf{L} once (O(n^3)), then each sample costs only O(n^2).
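A self-contained sketch of the three steps, with an empirical check of both mean and covariance (mu and R here are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.8], [0.8, 1.0]])
mu = np.array([1.0, -1.0])
n, N_samples = 2, 100_000

L = np.linalg.cholesky(R)                # step 1: R = L @ L.T, L lower triangular
z = rng.standard_normal((n, N_samples))  # step 2: i.i.d. N(0, 1) entries
x = mu[:, None] + L @ z                  # step 3: each column ~ N(mu, R)

mean_err = np.abs(x.mean(axis=1) - mu).max()
cov_err = np.abs(np.cov(x) - R).max()    # np.cov treats rows as variables
```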

Theorem: Linear Transformation of Gaussian Vectors

If \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{R}) and \mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}, then:

\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{A}\mathbf{R}\mathbf{A}^T)

In particular, if \mathbf{R} = \mathbf{L}\mathbf{L}^T and \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), then \mathbf{x} = \mathbf{L}\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}).

A linear transformation of a Gaussian is still Gaussian. The mean transforms linearly and the covariance picks up \mathbf{A} on both sides (the "sandwich" formula).
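A numeric check of the sandwich formula, with A, b, mu, and R chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)
R = np.array([[2.0, 0.5], [0.5, 1.0]])
mu = np.array([1.0, 2.0])
A = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, 3.0])

# draw x ~ N(mu, R) via Cholesky, then apply the affine map
x = mu[:, None] + np.linalg.cholesky(R) @ rng.standard_normal((2, 200_000))
y = A @ x + b[:, None]

mean_theory = A @ mu + b                 # mean transforms linearly
cov_theory = A @ R @ A.T                 # "sandwich" covariance
mean_err = np.abs(y.mean(axis=1) - mean_theory).max()
cov_err = np.abs(np.cov(y) - cov_theory).max()
```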

Theorem: Maximum Likelihood Estimation for Gaussian Parameters

Given N i.i.d. samples x_1, \dots, x_N from \mathcal{N}(\mu, \sigma^2), the maximum likelihood estimators are:

\hat{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^N x_i = \bar{x}, \qquad \hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2

The MLE for \sigma^2 is biased; the unbiased estimator divides by N-1 (Bessel's correction), which is what np.var(x, ddof=1) computes.

The log-likelihood is a concave quadratic in \mu and concave in \log\sigma^2. Setting the gradient to zero gives the sample mean and the (biased) sample variance.
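A sketch contrasting the biased MLE with Bessel's correction; since norm.fit returns the MLEs, its fitted scale squared should match np.var with ddof=0:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

mu_ml = x.mean()
var_ml = np.var(x, ddof=0)             # MLE: divides by N (biased low)
var_unbiased = np.var(x, ddof=1)       # Bessel's correction: divides by N-1

loc_hat, scale_hat = norm.fit(x)       # closed-form Gaussian MLEs (loc, scale)
```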

Theorem: Goodness of Fit via KS Statistic

The Kolmogorov-Smirnov statistic between the empirical CDF \hat{F}_N(x) and a reference CDF F_0(x) is:

D_N = \sup_x |\hat{F}_N(x) - F_0(x)|

Under H_0: X \sim F_0, \sqrt{N}\,D_N \to K where K has the Kolmogorov distribution. scipy.stats.kstest returns (D_N, p).

The KS statistic measures the worst-case vertical gap between the empirical and theoretical CDFs. A small D_N (large p-value) means the data is consistent with F_0.
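A minimal kstest sketch: data actually drawn from N(0, 1) versus exponential data, both tested against the standard normal reference:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(11)
good = rng.standard_normal(5_000)      # truly N(0, 1)
bad = rng.exponential(size=5_000)      # not N(0, 1)

d_good, p_good = kstest(good, 'norm')  # expect small D, large p
d_bad, p_bad = kstest(bad, 'norm')     # expect large D, tiny p: H0 rejected
```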

Example: Exploring Distributions with scipy.stats

Use scipy.stats to create Rayleigh, Ricean, and Nakagami distributions, compute their PDFs, and verify that rvs() samples match the theoretical PDF.

Example: Generating Correlated Fading Samples

Generate 10000 pairs of correlated Rayleigh fading coefficients with correlation \rho = 0.8 using the Cholesky method, and verify the sample correlation matches the target.
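One way to sketch this, with a subtlety worth flagging: applying the Cholesky factor with coefficient rho to the underlying Gaussians gives the complex coefficients correlation rho, but the resulting power correlation is rho^2, not rho, so hitting a target envelope correlation requires correlating the Gaussians more strongly. This sketch measures the power correlation against the rho^2 prediction:

```python
import numpy as np

rng = np.random.default_rng(9)
rho, N = 0.8, 100_000
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

# correlate real and imaginary parts independently with the same factor
zr = L @ rng.standard_normal((2, N))
zi = L @ rng.standard_normal((2, N))
h = (zr + 1j * zi) / np.sqrt(2)        # two CN(0, 1) rows with correlation rho

# |h| rows are correlated Rayleigh envelopes; power correlation should be rho^2
power_corr = np.corrcoef(np.abs(h[0])**2, np.abs(h[1])**2)[0, 1]
```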

Example: Fitting a Distribution to Channel Measurement Data

Given simulated channel power data, fit Rayleigh, Rice, and Nakagami distributions and determine which best fits the data using the KS test.

Distribution Explorer

Compare PDF, CDF, and histogram of scipy.stats distributions. Adjust distribution parameters and sample size to see how well the empirical distribution matches the theoretical one.


Correlated Gaussian Sampling via Cholesky

Visualize how the Cholesky decomposition transforms i.i.d. Gaussian samples into correlated pairs. Adjust the correlation coefficient and observe the scatter plot and marginal histograms.


Central Limit Theorem in Action

Watch the distribution of the sample mean converge to a Gaussian as the number of averaged samples increases. Start with any distribution (uniform, exponential, Rayleigh) and see the CLT at work.


Wireless Distribution Family Tree

Relationships between common distributions in wireless communications. A complex Gaussian envelope yields Rayleigh; adding a LOS component gives Rice; Nakagami generalizes both.

Distributions and Sampling (ch09/python/distributions.py)

Complete scipy.stats distribution exploration, fitting, and Cholesky sampling.

Quick Check

To generate \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) via Cholesky, you compute \mathbf{R} = \mathbf{L}\mathbf{L}^T and then set \mathbf{x} = \mathbf{L}\mathbf{z} where \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}). What happens if \mathbf{R} is not positive definite?

The samples will have the wrong mean

np.linalg.cholesky raises a LinAlgError

The samples will be uncorrelated

The resulting distribution will be Rayleigh instead of Gaussian

Common Mistake: Using Global Seeds Instead of RNG Objects

Mistake:

Using np.random.seed(42) and then np.random.randn(...) for reproducibility. This uses the legacy global state, which is not thread-safe and causes subtle bugs in parallel simulations.

Correction:

Use the new np.random.default_rng(42) and call methods on the returned Generator object. Each parallel worker should get its own RNG with a different seed (or use SeedSequence.spawn()).

Common Mistake: Complex Gaussian Variance Convention

Mistake:

Writing z = rng.standard_normal(N) + 1j * rng.standard_normal(N) to get \mathcal{CN}(0, 1). This produces \mathrm{Var}[z] = 2, not 1.

Correction:

Divide by \sqrt{2}: z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2). Now E[|z|^2] = 1 as intended.
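A quick numeric illustration of the two conventions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

z_wrong = rng.standard_normal(N) + 1j * rng.standard_normal(N)                 # Var ~ 2
z_right = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # Var ~ 1

p_wrong = np.mean(np.abs(z_wrong)**2)
p_right = np.mean(np.abs(z_right)**2)
```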

Why This Matters: From Distributions to Channel Models

The distributions in this section are not abstract mathematics: they are the building blocks of wireless channel models. A Rayleigh fading channel coefficient is literally h \sim \mathcal{CN}(0, 1); a Ricean channel with K-factor adds a deterministic LOS component. The Cholesky method for generating correlated samples is exactly how correlated MIMO channels are simulated in Section 9.4.

See full treatment in Generating Fading Channels

Historical Note: Gauss and the Normal Distribution

19th century

Carl Friedrich Gauss derived the normal distribution in 1809 while studying the distribution of astronomical observation errors. The characteristic bell curve bears his name in many languages ("Gauss-Verteilung" in German). The Central Limit Theorem, which explains why the Gaussian is so ubiquitous, was proven rigorously by Lyapunov in 1901.

Historical Note: Lord Rayleigh and the Distribution of Wave Amplitudes

19th century

John William Strutt, 3rd Baron Rayleigh, derived the Rayleigh distribution in 1880 while studying the intensity of sound waves resulting from many superimposed sources. The same distribution describes the envelope of a narrowband signal that is the sum of many scattered multipath components, the physical basis of Rayleigh fading in wireless communications.

Probability Density Function (PDF)

A function f_X(x) such that P(a \le X \le b) = \int_a^b f_X(x)\,dx.

Related: Cumulative Distribution Function (CDF)

Cumulative Distribution Function (CDF)

FX(x)=P(X≀x)F_X(x) = P(X \le x), a non-decreasing function from 0 to 1.

Related: Probability Density Function (PDF)

Cholesky Decomposition

Factorization of a positive-definite matrix as \mathbf{R} = \mathbf{L}\mathbf{L}^T where \mathbf{L} is lower triangular. Used to generate correlated random samples.

Related: Covariance Matrix

Covariance Matrix

\mathbf{R} = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T], a positive semi-definite matrix encoding pairwise correlations.

Related: Cholesky Decomposition

Maximum Likelihood Estimation (MLE)

A method that finds parameter values maximizing the likelihood function L(\theta) = \prod_i f(x_i; \theta).

Q-function

The tail probability of the standard normal: Q(x) = 1 - \Phi(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2}). Appears in BER expressions for many modulation schemes.