Random Number Generation

Randomness in Scientific Computing

Random number generation underpins Monte Carlo simulation, stochastic optimization, statistical testing, and noise modeling in communications. NumPy's modern Generator API (introduced in 1.17) provides fast, reproducible, and statistically sound random number generation β€” replacing the legacy np.random.seed / np.random.randn interface.

This section covers the modern API exclusively.

Definition:

The Generator API

The modern way to generate random numbers in NumPy:

rng = np.random.default_rng(seed=42)   # create a Generator

# Gaussian (normal)
x = rng.standard_normal(1000)         # shape (1000,), mean=0, std=1
y = rng.normal(loc=3.0, scale=0.5, size=(100, 4))

# Uniform
u = rng.uniform(low=0, high=1, size=500)

# Integer
idx = rng.integers(low=0, high=10, size=20)

Each Generator instance carries its own state, so multiple generators can run independently (unlike the legacy global state).

Definition:

Key Distributions for Scientific Computing

Distribution NumPy Method Use Case
Gaussian rng.normal(mu, sigma, size) Noise, Bayesian priors
Uniform rng.uniform(lo, hi, size) Initialization, sampling
Rayleigh rng.rayleigh(scale, size) Fading channel envelopes
Exponential rng.exponential(scale, size) Inter-arrival times
Poisson rng.poisson(lam, size) Photon counting, events
Chi-squared rng.chisquare(df, size) Goodness-of-fit tests

Complex Gaussian: NumPy has no built-in complex normal, but it is trivially constructed:

def complex_gaussian(rng, shape, power=1.0):
    """Circularly-symmetric complex Gaussian: CN(0, power)."""
    std = np.sqrt(power / 2)
    return rng.normal(0, std, shape) + 1j * rng.normal(0, std, shape)

Definition:

Seeding and Reproducibility

A seed initializes the random number generator to a known state, making results reproducible:

# Reproducible: same seed -> same sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
assert np.array_equal(rng1.standard_normal(10),
                      rng2.standard_normal(10))

# Independent streams for parallel work
seed_seq = np.random.SeedSequence(42)
child_seeds = seed_seq.spawn(4)
rngs = [np.random.default_rng(s) for s in child_seeds]
# Each rng produces a statistically independent stream

Never use np.random.seed() in new code β€” it sets global state and is not thread-safe.

Definition:

Random Matrices

Generating random matrices with special structure:

rng = np.random.default_rng(42)

# Random orthogonal matrix (Haar-distributed)
from scipy.stats import ortho_group
Q = ortho_group.rvs(dim=4, random_state=rng)
# Q @ Q.T β‰ˆ I

# Random unitary matrix
from scipy.stats import unitary_group
U = unitary_group.rvs(dim=4, random_state=rng)
# U @ U.conj().T β‰ˆ I

# Quick alternative: QR decomposition of random matrix
A = rng.standard_normal((4, 4))
Q, R = np.linalg.qr(A)
# Q is orthogonal, but not Haar-uniform (signs may be biased)

Theorem: Central Limit Theorem (Computational Verification)

Let X1,X2,…,XnX_1, X_2, \ldots, X_n be i.i.d. random variables with mean ΞΌ\mu and variance Οƒ2\sigma^2. Then the standardized sample mean XΛ‰nβˆ’ΞΌΟƒ/n\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} converges in distribution to N(0,1)\mathcal{N}(0, 1) as nβ†’βˆžn \to \infty.

No matter what distribution you start with (uniform, exponential, Poisson), the average of enough samples always looks Gaussian. This is why Gaussian noise appears everywhere in science.

Example: Complex Gaussian Noise for Communications

Generate circularly-symmetric complex Gaussian noise with power Οƒ2=0.1\sigma^2 = 0.1 and verify that real and imaginary parts each have variance Οƒ2/2\sigma^2 / 2.

Example: Reproducible Parallel Simulations

Run 4 independent Monte Carlo simulations with reproducible but statistically independent random streams using SeedSequence.

Distribution Explorer

Generate and visualize samples from different probability distributions. Compare histograms with theoretical PDFs.

Parameters

Historical Note: Mersenne Twister to PCG

2019

NumPy's legacy RNG used the Mersenne Twister (MT19937), introduced in 1997 by Matsumoto and Nishimura. While it has a huge period (219937βˆ’12^{19937} - 1), it has known statistical weaknesses and is slow on modern CPUs. NumPy 1.17 (2019) introduced the Generator API with PCG64 (Permuted Congruential Generator) as the default bit generator β€” faster, smaller state, and better statistical properties.

Quick Check

What is wrong with using np.random.seed(42) in modern code?

It sets global state, is not thread-safe, and prevents independent streams

It does not produce reproducible results

The seed value 42 is too small

It is slower than default_rng

Common Mistake: Using Legacy np.random Functions

Mistake:

Using the global random state in library or parallel code:

np.random.seed(42)
x = np.random.randn(100)  # global state β€” not thread-safe!

Correction:

Use the Generator API with explicit state:

rng = np.random.default_rng(42)
x = rng.standard_normal(100)  # local state, reproducible, thread-safe

Generator

NumPy's modern random number generator object (np.random.Generator), replacing the legacy global state.

Related: SeedSequence, PCG64, BitGenerator

Random Generation

python
Modern RNG API, complex Gaussian, correlated vectors, reproducibility patterns.
# Code from: ch05/python/random_generation.py
# Load from backend supplements endpoint