Testing Scientific Code

Why Test Scientific Code?

Scientific code that produces wrong answers is worse than code that crashes β€” at least a crash tells you something is wrong. A simulation that silently computes the wrong BER, or an estimator with an off-by-one index, can waste months of research. Automated tests are the only scalable defense.

This section covers pytest β€” the de facto standard for Python testing β€” with a focus on techniques specific to numerical and stochastic code.

Historical Note: From unittest to pytest

2004-present

Python's standard library has included unittest since Python 2.1 (2001), modeled after Java's JUnit. The pytest framework, created by Holger Krekel in 2004 as "py.test," took a radically different approach: plain assert statements instead of self.assertEqual, automatic test discovery, and powerful fixtures. By 2015, pytest had become the dominant testing framework in the scientific Python ecosystem, used by NumPy, SciPy, pandas, scikit-learn, and most major projects.

Definition:

pytest Test Function

A test function is any function whose name starts with test_ in a file whose name starts with test_ or ends with _test.py:

# test_channel.py
import numpy as np

def test_channel_shape():
    H = make_rayleigh_channel(4, 2)
    assert H.shape == (4, 2)

def test_channel_dtype():
    H = make_rayleigh_channel(4, 2)
    assert H.dtype == np.complex128

Run with pytest test_channel.py -v. pytest discovers and runs both tests, reporting pass/fail with detailed diffs on failure.

Definition:

pytest Fixture

A fixture is a function decorated with @pytest.fixture that provides setup (and optional teardown) for tests:

import pytest
import numpy as np

@pytest.fixture
def channel_4x2():
    """A reproducible 4x2 Rayleigh channel."""
    rng = np.random.default_rng(42)
    return (rng.standard_normal((4, 2))
            + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)

def test_zf_shape(channel_4x2):
    x_hat = zf_detect(channel_4x2, np.ones(4) + 0j)
    assert x_hat.shape == (2,)

Fixtures are injected by name β€” pytest matches function parameter names to available fixtures automatically.

Definition:

Parametrized Tests

@pytest.mark.parametrize generates multiple test cases from a list of parameter values:

@pytest.mark.parametrize("n_rx, n_tx", [
    (2, 2), (4, 2), (8, 4), (16, 8),
])
def test_channel_shape(n_rx, n_tx):
    H = make_rayleigh_channel(n_rx, n_tx)
    assert H.shape == (n_rx, n_tx)

This generates four separate test cases, each reported independently. Parametrized tests are essential for testing across different antenna configurations, SNR values, or modulation orders.

Theorem: Floating-Point Comparison Theorem

Two floating-point computations of the same mathematical expression can differ by up to O(Ο΅machβ‹…ΞΊ)O(\epsilon_{\text{mach}} \cdot \kappa) where Ο΅mach\epsilon_{\text{mach}} is machine epsilon (β‰ˆ2.2Γ—10βˆ’16\approx 2.2 \times 10^{-16} for float64) and ΞΊ\kappa is the condition number of the computation. Therefore, testing floating-point equality with == is almost always wrong. Instead, use:

∣aβˆ’bβˆ£β‰€atol+rtolβ‹…βˆ£b∣|a - b| \leq \texttt{atol} + \texttt{rtol} \cdot |b|

where atol is the absolute tolerance and rtol is the relative tolerance.

Machine arithmetic is like measuring with a ruler that has finite precision marks. Two people measuring the same distance get slightly different readings. assert_allclose checks that the readings agree within the ruler's precision, not that they are identical.

Theorem: Reproducibility via Seeded RNG

Let ff be a deterministic function and G(s)G(s) a pseudorandom number generator initialized with seed ss. Then for any fixed seed s0s_0, the sequence G(s0)G(s_0) is deterministic, and f(G(s0))f(G(s_0)) produces the same output on every run. This makes stochastic algorithms testable.

A seeded RNG is not random β€” it is a very long, predetermined sequence of numbers that merely looks random. By fixing the seed, you turn a stochastic test into a deterministic one.

Example: Testing Numerical Code with assert_allclose

Write tests for a function that computes the inverse of a matrix, verifying that Aβ‹…Aβˆ’1β‰ˆIA \cdot A^{-1} \approx I within numerical tolerance.

Example: Testing Stochastic Code

Write a test for a function that estimates BER via Monte Carlo simulation, ensuring the result is statistically reasonable.

Example: Property-Based Testing with Hypothesis

Use the hypothesis library to automatically generate test cases that verify algebraic properties of matrix operations.

Parametrized Test Visualizer

See how pytest parametrize generates test cases across multiple parameter dimensions and which combinations pass or fail.

Parameters

pytest Test Discovery and Execution Flow

pytest Test Discovery and Execution Flow
The pytest pipeline: discovery (find test files and functions), collection (resolve parametrize and fixtures), execution (run tests with captured output), and reporting (pass/fail/error summary).

Common Mistake: Testing Floating-Point Equality with ==

Mistake:

def test_inverse():
    A = np.array([[1, 2], [3, 4]], dtype=float)
    result = np.linalg.inv(A) @ A
    assert (result == np.eye(2)).all()  # FAILS!

Correction:

def test_inverse():
    A = np.array([[1, 2], [3, 4]], dtype=float)
    result = np.linalg.inv(A) @ A
    np.testing.assert_allclose(result, np.eye(2), atol=1e-14)

Common Mistake: Using np.random.seed() for Test Reproducibility

Mistake:

def test_noisy():
    np.random.seed(42)  # Global state β€” affects other tests!
    noise = np.random.randn(100)
    ...

Correction:

def test_noisy():
    rng = np.random.default_rng(42)  # Local generator
    noise = rng.standard_normal(100)
    ...

The Generator API is independent of global state, so tests cannot interfere with each other.

Quick Check

How does pytest inject a fixture into a test function?

By inheritance β€” the test class must inherit from the fixture class

By matching parameter names β€” if a test parameter matches a fixture name, it is injected

By explicit import β€” you must import the fixture in the test file

By decorator β€” each test must be decorated with @use_fixture

Quick Check

What does np.testing.assert_allclose(a, b, atol=1e-8, rtol=1e-5) check?

|a - b| <= 1e-8 for every element

|a - b| <= 1e-8 + 1e-5 * |b| for every element

|a - b| / |b| <= 1e-5 for every element

max(|a - b|) <= 1e-8

fixture

A pytest mechanism for providing test setup, teardown, and shared resources via dependency injection based on function parameter names.

Related: parametrize

parametrize

A pytest decorator that generates multiple test cases by substituting different parameter values into a single test function.

Related: fixture

hypothesis (testing)

A property-based testing library that generates random inputs to verify that algebraic or structural properties always hold.

Related: parametrize

Key Takeaway

Never compare floating-point results with ==. Use np.testing.assert_allclose with appropriate atol and rtol for numerical code. The tolerance should reflect the condition number of the computation, not just machine epsilon.

pytest Examples for Numerical Code

python
Complete pytest test suite demonstrating fixtures, parametrize, assert_allclose, and seeded RNG patterns for scientific code.
# Code from: ch04/python/pytest_examples.py
# Load from backend supplements endpoint

Property-Based Testing with Hypothesis

python
Hypothesis-based property tests for linear algebra operations: associativity, involutions, and distribution laws.
# Code from: ch04/python/property_testing.py
# Load from backend supplements endpoint