Testing Scientific Code
Why Test Scientific Code?
Scientific code that produces wrong answers is worse than code that crashes β at least a crash tells you something is wrong. A simulation that silently computes the wrong BER, or an estimator with an off-by-one index, can waste months of research. Automated tests are the only scalable defense.
This section covers pytest β the de facto standard for Python testing β
with a focus on techniques specific to numerical and stochastic code.
Historical Note: From unittest to pytest
2004-presentPython's standard library has included unittest since Python 2.1 (2001),
modeled after Java's JUnit. The pytest framework, created by Holger
Krekel in 2004 as "py.test," took a radically different approach: plain
assert statements instead of self.assertEqual, automatic test
discovery, and powerful fixtures. By 2015, pytest had become the
dominant testing framework in the scientific Python ecosystem, used by
NumPy, SciPy, pandas, scikit-learn, and most major projects.
Definition: pytest Test Function
pytest Test Function
A test function is any function whose name starts with test_ in a
file whose name starts with test_ or ends with _test.py:
# test_channel.py
import numpy as np
def test_channel_shape():
H = make_rayleigh_channel(4, 2)
assert H.shape == (4, 2)
def test_channel_dtype():
H = make_rayleigh_channel(4, 2)
assert H.dtype == np.complex128
Run with pytest test_channel.py -v. pytest discovers and runs both
tests, reporting pass/fail with detailed diffs on failure.
Definition: pytest Fixture
pytest Fixture
A fixture is a function decorated with @pytest.fixture that
provides setup (and optional teardown) for tests:
import pytest
import numpy as np
@pytest.fixture
def channel_4x2():
"""A reproducible 4x2 Rayleigh channel."""
rng = np.random.default_rng(42)
return (rng.standard_normal((4, 2))
+ 1j * rng.standard_normal((4, 2))) / np.sqrt(2)
def test_zf_shape(channel_4x2):
x_hat = zf_detect(channel_4x2, np.ones(4) + 0j)
assert x_hat.shape == (2,)
Fixtures are injected by name β pytest matches function parameter names to available fixtures automatically.
Definition: Parametrized Tests
Parametrized Tests
@pytest.mark.parametrize generates multiple test cases from a
list of parameter values:
@pytest.mark.parametrize("n_rx, n_tx", [
(2, 2), (4, 2), (8, 4), (16, 8),
])
def test_channel_shape(n_rx, n_tx):
H = make_rayleigh_channel(n_rx, n_tx)
assert H.shape == (n_rx, n_tx)
This generates four separate test cases, each reported independently. Parametrized tests are essential for testing across different antenna configurations, SNR values, or modulation orders.
Theorem: Floating-Point Comparison Theorem
Two floating-point computations of the same mathematical expression
can differ by up to where
is machine epsilon (
for float64) and is the condition number of the computation.
Therefore, testing floating-point equality with == is almost always
wrong. Instead, use:
where atol is the absolute tolerance and rtol is the relative
tolerance.
Machine arithmetic is like measuring with a ruler that has finite
precision marks. Two people measuring the same distance get slightly
different readings. assert_allclose checks that the readings agree
within the ruler's precision, not that they are identical.
Theorem: Reproducibility via Seeded RNG
Let be a deterministic function and a pseudorandom number generator initialized with seed . Then for any fixed seed , the sequence is deterministic, and produces the same output on every run. This makes stochastic algorithms testable.
A seeded RNG is not random β it is a very long, predetermined sequence of numbers that merely looks random. By fixing the seed, you turn a stochastic test into a deterministic one.
Always use np.random.default_rng(seed) instead of np.random.seed(seed) β the global state is a shared mutable resource.
Pass the Generator object to functions rather than relying on global state.
Example: Testing Numerical Code with assert_allclose
Write tests for a function that computes the inverse of a matrix, verifying that within numerical tolerance.
Write the test
import numpy as np
from numpy.testing import assert_allclose
def test_matrix_inverse():
rng = np.random.default_rng(42)
A = rng.standard_normal((5, 5))
A_inv = np.linalg.inv(A)
# A @ A_inv should be close to identity
result = A @ A_inv
expected = np.eye(5)
assert_allclose(result, expected, atol=1e-12)
Understand the tolerance
The default rtol=1e-7 works for well-conditioned matrices.
For ill-conditioned matrices (high condition number), you need
a larger tolerance:
def test_ill_conditioned():
# Hilbert matrix: notoriously ill-conditioned
from scipy.linalg import hilbert, invhilbert
n = 8
H = hilbert(n)
H_inv = np.linalg.inv(H)
expected = invhilbert(n)
# Need much looser tolerance due to ill-conditioning
assert_allclose(H_inv, expected, rtol=1e-2)
Example: Testing Stochastic Code
Write a test for a function that estimates BER via Monte Carlo simulation, ensuring the result is statistically reasonable.
Seeded deterministic test
def test_ber_bpsk_seeded():
"""Test BPSK BER at known SNR with fixed seed."""
rng = np.random.default_rng(12345)
ber = simulate_bpsk_ber(snr_db=10.0, n_bits=100_000, rng=rng)
# Theoretical BER for BPSK at 10 dB: ~3.87e-6
# With 100k bits, Monte Carlo estimate has high variance
# Use a wide tolerance for the seeded value
assert 0 <= ber <= 0.01 # Sanity check
Statistical property test
def test_ber_decreases_with_snr():
"""BER should decrease monotonically with SNR."""
rng = np.random.default_rng(42)
snrs = [0, 5, 10, 15]
bers = [simulate_bpsk_ber(snr, 50_000, rng=rng) for snr in snrs]
for i in range(len(bers) - 1):
assert bers[i] >= bers[i + 1], (
f"BER should decrease: BER({snrs[i]} dB)={bers[i]:.4e} "
f"> BER({snrs[i+1]} dB)={bers[i+1]:.4e}"
)
Example: Property-Based Testing with Hypothesis
Use the hypothesis library to automatically generate test cases
that verify algebraic properties of matrix operations.
Test that transpose is an involution
from hypothesis import given, settings
from hypothesis import strategies as st
import numpy as np
@given(
n=st.integers(min_value=1, max_value=50),
m=st.integers(min_value=1, max_value=50),
seed=st.integers(min_value=0, max_value=2**32 - 1),
)
@settings(max_examples=100)
def test_transpose_involution(n, m, seed):
rng = np.random.default_rng(seed)
A = rng.standard_normal((n, m))
assert_allclose(A.T.T, A)
Test that matrix multiply is associative
@given(
n=st.integers(min_value=1, max_value=20),
seed=st.integers(min_value=0, max_value=2**32 - 1),
)
def test_matmul_associative(n, seed):
rng = np.random.default_rng(seed)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
assert_allclose((A @ B) @ C, A @ (B @ C), rtol=1e-10)
Parametrized Test Visualizer
See how pytest parametrize generates test cases across multiple parameter dimensions and which combinations pass or fail.
Parameters
pytest Test Discovery and Execution Flow
Common Mistake: Testing Floating-Point Equality with ==
Mistake:
def test_inverse():
A = np.array([[1, 2], [3, 4]], dtype=float)
result = np.linalg.inv(A) @ A
assert (result == np.eye(2)).all() # FAILS!
Correction:
def test_inverse():
A = np.array([[1, 2], [3, 4]], dtype=float)
result = np.linalg.inv(A) @ A
np.testing.assert_allclose(result, np.eye(2), atol=1e-14)
Common Mistake: Using np.random.seed() for Test Reproducibility
Mistake:
def test_noisy():
np.random.seed(42) # Global state β affects other tests!
noise = np.random.randn(100)
...
Correction:
def test_noisy():
rng = np.random.default_rng(42) # Local generator
noise = rng.standard_normal(100)
...
The Generator API is independent of global state, so tests
cannot interfere with each other.
Quick Check
How does pytest inject a fixture into a test function?
By inheritance β the test class must inherit from the fixture class
By matching parameter names β if a test parameter matches a fixture name, it is injected
By explicit import β you must import the fixture in the test file
By decorator β each test must be decorated with @use_fixture
pytest inspects the test function signature and injects fixtures whose names match parameter names.
Quick Check
What does np.testing.assert_allclose(a, b, atol=1e-8, rtol=1e-5) check?
|a - b| <= 1e-8 for every element
|a - b| <= 1e-8 + 1e-5 * |b| for every element
|a - b| / |b| <= 1e-5 for every element
max(|a - b|) <= 1e-8
|a - b| <= 1e-8 + 1e-5 * |b| for every elementThe formula combines absolute and relative tolerance: |a - b| <= atol + rtol * |b|.
fixture
A pytest mechanism for providing test setup, teardown, and shared resources via dependency injection based on function parameter names.
Related: parametrize
parametrize
A pytest decorator that generates multiple test cases by substituting different parameter values into a single test function.
Related: fixture
hypothesis (testing)
A property-based testing library that generates random inputs to verify that algebraic or structural properties always hold.
Related: parametrize
Key Takeaway
Never compare floating-point results with ==. Use
np.testing.assert_allclose with appropriate atol and rtol
for numerical code. The tolerance should reflect the condition
number of the computation, not just machine epsilon.
pytest Examples for Numerical Code
# Code from: ch04/python/pytest_examples.py
# Load from backend supplements endpointProperty-Based Testing with Hypothesis
# Code from: ch04/python/property_testing.py
# Load from backend supplements endpoint