Testing Scientific Code

Why Test Scientific Code?

Scientific code that produces wrong answers is worse than code that crashes — at least a crash tells you something is wrong. A simulation that silently computes the wrong BER, or an estimator with an off-by-one index, can waste months of research. Automated tests are the only scalable defense.

This section covers pytest — the de facto standard for Python testing — with a focus on techniques specific to numerical and stochastic code.

Historical Note: From unittest to pytest

2004-present

Python's standard library has included unittest since Python 2.1 (2001), modeled after Java's JUnit. The pytest framework, created by Holger Krekel in 2004 as "py.test," took a radically different approach: plain assert statements instead of self.assertEqual, automatic test discovery, and powerful fixtures. By 2015, pytest had become the dominant testing framework in the scientific Python ecosystem, used by NumPy, SciPy, pandas, scikit-learn, and most major projects.

Definition:
pytest Test Function

A test function is any function whose name starts with test_ in a file whose name starts with test_ or ends with _test.py:

# test_channel.py
import numpy as np

def test_channel_shape():
    H = make_rayleigh_channel(4, 2)
    assert H.shape == (4, 2)

def test_channel_dtype():
    H = make_rayleigh_channel(4, 2)
    assert H.dtype == np.complex128

Run with pytest test_channel.py -v. pytest discovers and runs both tests, reporting pass/fail with detailed diffs on failure.

Definition:
pytest Fixture

A fixture is a function decorated with @pytest.fixture that provides setup (and optional teardown) for tests:

import pytest
import numpy as np

@pytest.fixture
def channel_4x2():
    """A reproducible 4x2 Rayleigh channel."""
    rng = np.random.default_rng(42)
    return (rng.standard_normal((4, 2))
            + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)

def test_zf_shape(channel_4x2):
    x_hat = zf_detect(channel_4x2, np.ones(4) + 0j)
    assert x_hat.shape == (2,)

Fixtures are injected by name — pytest matches function parameter names to available fixtures automatically.

Definition:
Parametrized Tests

@pytest.mark.parametrize generates multiple test cases from a list of parameter values:

@pytest.mark.parametrize("n_rx, n_tx", [
    (2, 2), (4, 2), (8, 4), (16, 8),
])
def test_channel_shape(n_rx, n_tx):
    H = make_rayleigh_channel(n_rx, n_tx)
    assert H.shape == (n_rx, n_tx)

This generates four separate test cases, each reported independently. Parametrized tests are essential for testing across different antenna configurations, SNR values, or modulation orders.

Theorem: Floating-Point Comparison Theorem

Two floating-point computations of the same mathematical expression can differ by up to $O(\epsilon_{\text{mach}} \cdot \kappa)$ where $\epsilon_{\text{mach}}$ is machine epsilon ( $\approx 2.2 \times 10^{-16}$ for float64) and $\kappa$ is the condition number of the computation. Therefore, testing floating-point equality with == is almost always wrong. Instead, use:

$|a - b| \leq \texttt{atol} + \texttt{rtol} \cdot |b|$

where atol is the absolute tolerance and rtol is the relative tolerance.

Machine arithmetic is like measuring with a ruler that has finite precision marks. Two people measuring the same distance get slightly different readings. assert_allclose checks that the readings agree within the ruler's precision, not that they are identical.

Theorem: Reproducibility via Seeded RNG

Let $f$ be a deterministic function and $G(s)$ a pseudorandom number generator initialized with seed $s$ . Then for any fixed seed $s_0$ , the sequence $G(s_0)$ is deterministic, and $f(G(s_0))$ produces the same output on every run. This makes stochastic algorithms testable.

A seeded RNG is not random — it is a very long, predetermined sequence of numbers that merely looks random. By fixing the seed, you turn a stochastic test into a deterministic one.

Show Hint

Always use np.random.default_rng(seed) instead of np.random.seed(seed) — the global state is a shared mutable resource.

Pass the Generator object to functions rather than relying on global state.

Example: Testing Numerical Code with assert_allclose

Write tests for a function that computes the inverse of a matrix, verifying that $A \cdot A^{-1} \approx I$ within numerical tolerance.

Solution

Write the test

import numpy as np
from numpy.testing import assert_allclose

def test_matrix_inverse():
    rng = np.random.default_rng(42)
    A = rng.standard_normal((5, 5))
    A_inv = np.linalg.inv(A)

    # A @ A_inv should be close to identity
    result = A @ A_inv
    expected = np.eye(5)
    assert_allclose(result, expected, atol=1e-12)

Understand the tolerance

The default rtol=1e-7 works for well-conditioned matrices. For ill-conditioned matrices (high condition number), you need a larger tolerance:

def test_ill_conditioned():
    # Hilbert matrix: notoriously ill-conditioned
    from scipy.linalg import hilbert, invhilbert
    n = 8
    H = hilbert(n)
    H_inv = np.linalg.inv(H)
    expected = invhilbert(n)
    # Need much looser tolerance due to ill-conditioning
    assert_allclose(H_inv, expected, rtol=1e-2)

Example: Testing Stochastic Code

Write a test for a function that estimates BER via Monte Carlo simulation, ensuring the result is statistically reasonable.

Solution

Seeded deterministic test

def test_ber_bpsk_seeded():
    """Test BPSK BER at known SNR with fixed seed."""
    rng = np.random.default_rng(12345)
    ber = simulate_bpsk_ber(snr_db=10.0, n_bits=100_000, rng=rng)
    # Theoretical BER for BPSK at 10 dB: ~3.87e-6
    # With 100k bits, Monte Carlo estimate has high variance
    # Use a wide tolerance for the seeded value
    assert 0 <= ber <= 0.01  # Sanity check

Statistical property test

def test_ber_decreases_with_snr():
    """BER should decrease monotonically with SNR."""
    rng = np.random.default_rng(42)
    snrs = [0, 5, 10, 15]
    bers = [simulate_bpsk_ber(snr, 50_000, rng=rng) for snr in snrs]
    for i in range(len(bers) - 1):
        assert bers[i] >= bers[i + 1], (
            f"BER should decrease: BER({snrs[i]} dB)={bers[i]:.4e} "
            f"> BER({snrs[i+1]} dB)={bers[i+1]:.4e}"
        )

Example: Property-Based Testing with Hypothesis

Use the hypothesis library to automatically generate test cases that verify algebraic properties of matrix operations.

Solution

Test that transpose is an involution

from hypothesis import given, settings
from hypothesis import strategies as st
import numpy as np

@given(
    n=st.integers(min_value=1, max_value=50),
    m=st.integers(min_value=1, max_value=50),
    seed=st.integers(min_value=0, max_value=2**32 - 1),
)
@settings(max_examples=100)
def test_transpose_involution(n, m, seed):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, m))
    assert_allclose(A.T.T, A)

Test that matrix multiply is associative

@given(
    n=st.integers(min_value=1, max_value=20),
    seed=st.integers(min_value=0, max_value=2**32 - 1),
)
def test_matmul_associative(n, seed):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    C = rng.standard_normal((n, n))
    assert_allclose((A @ B) @ C, A @ (B @ C), rtol=1e-10)

Parametrized Test Visualizer

See how pytest parametrize generates test cases across multiple parameter dimensions and which combinations pass or fail.

Parameters

pytest Test Discovery and Execution Flow — The pytest pipeline: discovery (find test files and functions), collection (resolve parametrize and fixtures), execution (run tests with captured output), and reporting (pass/fail/error summary).

Common Mistake: Testing Floating-Point Equality with ==

Mistake:

def test_inverse():
    A = np.array([[1, 2], [3, 4]], dtype=float)
    result = np.linalg.inv(A) @ A
    assert (result == np.eye(2)).all()  # FAILS!

Correction:

def test_inverse():
    A = np.array([[1, 2], [3, 4]], dtype=float)
    result = np.linalg.inv(A) @ A
    np.testing.assert_allclose(result, np.eye(2), atol=1e-14)

Common Mistake: Using np.random.seed() for Test Reproducibility

Mistake:

def test_noisy():
    np.random.seed(42)  # Global state — affects other tests!
    noise = np.random.randn(100)
    ...

Correction:

def test_noisy():
    rng = np.random.default_rng(42)  # Local generator
    noise = rng.standard_normal(100)
    ...

The Generator API is independent of global state, so tests cannot interfere with each other.

Quick Check

How does pytest inject a fixture into a test function?

By inheritance — the test class must inherit from the fixture class

By matching parameter names — if a test parameter matches a fixture name, it is injected

By explicit import — you must import the fixture in the test file

By decorator — each test must be decorated with @use_fixture

Correction:

By matching parameter names — if a test parameter matches a fixture name, it is injected

pytest inspects the test function signature and injects fixtures whose names match parameter names.

Quick Check

What does np.testing.assert_allclose(a, b, atol=1e-8, rtol=1e-5) check?

|a - b| <= 1e-8 for every element

|a - b| <= 1e-8 + 1e-5 * |b| for every element

|a - b| / |b| <= 1e-5 for every element

max(|a - b|) <= 1e-8

Correction:

|a - b| <= 1e-8 + 1e-5 * |b| for every element

The formula combines absolute and relative tolerance: |a - b| <= atol + rtol * |b|.

fixture

A pytest mechanism for providing test setup, teardown, and shared resources via dependency injection based on function parameter names.

Related: parametrize

parametrize

A pytest decorator that generates multiple test cases by substituting different parameter values into a single test function.

Related: fixture

hypothesis (testing)

A property-based testing library that generates random inputs to verify that algebraic or structural properties always hold.

Related: parametrize

Key Takeaway

Never compare floating-point results with ==. Use np.testing.assert_allclose with appropriate atol and rtol for numerical code. The tolerance should reflect the condition number of the computation, not just machine epsilon.

pytest Examples for Numerical Code

python

Complete pytest test suite demonstrating fixtures, parametrize, assert_allclose, and seeded RNG patterns for scientific code.

# Code from: ch04/python/pytest_examples.py
# Load from backend supplements endpoint

Property-Based Testing with Hypothesis

python

Hypothesis-based property tests for linear algebra operations: associativity, involutions, and distribution laws.

# Code from: ch04/python/property_testing.py
# Load from backend supplements endpoint

Runtime Validation Debugging and Profiling

Testing Scientific Code

Why Test Scientific Code?

Historical Note: From unittest to pytest

Definition: pytest Test Function

Definition: pytest Fixture

Definition: Parametrized Tests

Theorem: Floating-Point Comparison Theorem

Theorem: Reproducibility via Seeded RNG

Example: Testing Numerical Code with assert_allclose

Write the test

Understand the tolerance

Example: Testing Stochastic Code

Seeded deterministic test

Statistical property test

Example: Property-Based Testing with Hypothesis

Test that transpose is an involution

Test that matrix multiply is associative

Parametrized Test Visualizer

Parameters

pytest Test Discovery and Execution Flow

Common Mistake: Testing Floating-Point Equality with ==

Common Mistake: Using np.random.seed() for Test Reproducibility

Quick Check

Quick Check

fixture

parametrize

hypothesis (testing)

Key Takeaway

pytest Examples for Numerical Code

Property-Based Testing with Hypothesis

Definition:
pytest Test Function

Definition:
pytest Fixture

Definition:
Parametrized Tests