Random Number Generation
Randomness in Scientific Computing
Random number generation underpins Monte Carlo simulation, stochastic
optimization, statistical testing, and noise modeling in communications.
NumPy's modern Generator API (introduced in 1.17) provides fast,
reproducible, and statistically sound random number generation β
replacing the legacy np.random.seed / np.random.randn interface.
This section covers the modern API exclusively.
Definition: The Generator API
The Generator API
The modern way to generate random numbers in NumPy:
rng = np.random.default_rng(seed=42) # create a Generator
# Gaussian (normal)
x = rng.standard_normal(1000) # shape (1000,), mean=0, std=1
y = rng.normal(loc=3.0, scale=0.5, size=(100, 4))
# Uniform
u = rng.uniform(low=0, high=1, size=500)
# Integer
idx = rng.integers(low=0, high=10, size=20)
Each Generator instance carries its own state, so multiple
generators can run independently (unlike the legacy global state).
Definition: Key Distributions for Scientific Computing
Key Distributions for Scientific Computing
| Distribution | NumPy Method | Use Case |
|---|---|---|
| Gaussian | rng.normal(mu, sigma, size) |
Noise, Bayesian priors |
| Uniform | rng.uniform(lo, hi, size) |
Initialization, sampling |
| Rayleigh | rng.rayleigh(scale, size) |
Fading channel envelopes |
| Exponential | rng.exponential(scale, size) |
Inter-arrival times |
| Poisson | rng.poisson(lam, size) |
Photon counting, events |
| Chi-squared | rng.chisquare(df, size) |
Goodness-of-fit tests |
Complex Gaussian: NumPy has no built-in complex normal, but it is trivially constructed:
def complex_gaussian(rng, shape, power=1.0):
"""Circularly-symmetric complex Gaussian: CN(0, power)."""
std = np.sqrt(power / 2)
return rng.normal(0, std, shape) + 1j * rng.normal(0, std, shape)
Definition: Seeding and Reproducibility
Seeding and Reproducibility
A seed initializes the random number generator to a known state, making results reproducible:
# Reproducible: same seed -> same sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
assert np.array_equal(rng1.standard_normal(10),
rng2.standard_normal(10))
# Independent streams for parallel work
seed_seq = np.random.SeedSequence(42)
child_seeds = seed_seq.spawn(4)
rngs = [np.random.default_rng(s) for s in child_seeds]
# Each rng produces a statistically independent stream
Never use np.random.seed() in new code β it sets global state
and is not thread-safe.
Definition: Random Matrices
Random Matrices
Generating random matrices with special structure:
rng = np.random.default_rng(42)
# Random orthogonal matrix (Haar-distributed)
from scipy.stats import ortho_group
Q = ortho_group.rvs(dim=4, random_state=rng)
# Q @ Q.T β I
# Random unitary matrix
from scipy.stats import unitary_group
U = unitary_group.rvs(dim=4, random_state=rng)
# U @ U.conj().T β I
# Quick alternative: QR decomposition of random matrix
A = rng.standard_normal((4, 4))
Q, R = np.linalg.qr(A)
# Q is orthogonal, but not Haar-uniform (signs may be biased)
Theorem: Central Limit Theorem (Computational Verification)
Let be i.i.d. random variables with mean and variance . Then the standardized sample mean converges in distribution to as .
No matter what distribution you start with (uniform, exponential, Poisson), the average of enough samples always looks Gaussian. This is why Gaussian noise appears everywhere in science.
Example: Complex Gaussian Noise for Communications
Generate circularly-symmetric complex Gaussian noise with power and verify that real and imaginary parts each have variance .
Generate and verify
rng = np.random.default_rng(42)
sigma2 = 0.1
n_samples = 100_000
noise = np.sqrt(sigma2 / 2) * (
rng.standard_normal(n_samples)
+ 1j * rng.standard_normal(n_samples)
)
print(f"Total power: {np.mean(np.abs(noise)**2):.4f}") # ~0.1
print(f"Real variance: {np.var(noise.real):.4f}") # ~0.05
print(f"Imag variance: {np.var(noise.imag):.4f}") # ~0.05
Example: Reproducible Parallel Simulations
Run 4 independent Monte Carlo simulations with reproducible but
statistically independent random streams using SeedSequence.
Spawn independent generators
from concurrent.futures import ProcessPoolExecutor
def monte_carlo_pi(rng, n_samples=100_000):
"""Estimate pi using random points in unit square."""
x = rng.uniform(0, 1, n_samples)
y = rng.uniform(0, 1, n_samples)
return 4 * np.mean(x**2 + y**2 <= 1)
# Reproducible independent streams
ss = np.random.SeedSequence(42)
child_seeds = ss.spawn(4)
rngs = [np.random.default_rng(s) for s in child_seeds]
estimates = [monte_carlo_pi(rng) for rng in rngs]
print(f"Pi estimates: {estimates}")
print(f"Average: {np.mean(estimates):.6f}") # ~3.1416
Distribution Explorer
Generate and visualize samples from different probability distributions. Compare histograms with theoretical PDFs.
Parameters
Historical Note: Mersenne Twister to PCG
2019NumPy's legacy RNG used the Mersenne Twister (MT19937), introduced in 1997 by Matsumoto and Nishimura. While it has a huge period (), it has known statistical weaknesses and is slow on modern CPUs. NumPy 1.17 (2019) introduced the Generator API with PCG64 (Permuted Congruential Generator) as the default bit generator β faster, smaller state, and better statistical properties.
Quick Check
What is wrong with using np.random.seed(42) in modern code?
It sets global state, is not thread-safe, and prevents independent streams
It does not produce reproducible results
The seed value 42 is too small
It is slower than default_rng
The modern approach is np.random.default_rng(42) which creates a local generator.
Common Mistake: Using Legacy np.random Functions
Mistake:
Using the global random state in library or parallel code:
np.random.seed(42)
x = np.random.randn(100) # global state β not thread-safe!
Correction:
Use the Generator API with explicit state:
rng = np.random.default_rng(42)
x = rng.standard_normal(100) # local state, reproducible, thread-safe
Generator
NumPy's modern random number generator object (np.random.Generator), replacing the legacy global state.
Related: SeedSequence, PCG64, BitGenerator
Random Generation
# Code from: ch05/python/random_generation.py
# Load from backend supplements endpoint