Function Design for Scientific Code

Why Function Design Matters in Scientific Computing

Scientific code is often written as a long script: load data, transform, compute, plot. This works for one-off explorations but falls apart when you need to:

  • Re-run with different parameters (e.g., sweep over regularization strengths)
  • Test individual components (unit tests on a 500-line script?)
  • Share code with collaborators who use different data

Well-designed functions are the first step toward reusable, testable, composable scientific software. This section covers the patterns that make functions robust in production research code.

Definition:

Pure Function

A pure function is a function that:

  1. Always returns the same output for the same input (deterministic)
  2. Has no side effects β€” it does not modify external state, write to files, print to stdout, or mutate its arguments
# Pure: depends only on inputs, no side effects
def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

# Impure: modifies the input array in place
def normalize_inplace(x: np.ndarray) -> None:
    x -= x.mean()
    x /= x.std()

Pure functions are easier to test, parallelize, and reason about. In scientific code, prefer pure functions for data transformations and reserve side effects for I/O boundaries.

Definition:

First-Class Functions

In Python, functions are first-class objects: they can be assigned to variables, stored in data structures, passed as arguments, and returned from other functions.

def square(x):
    return x ** 2

# Assign to a variable
f = square
print(f(5))  # 25

# Store in a list
transforms = [square, np.sqrt, np.log]

# Pass as argument
results = list(map(square, [1, 2, 3, 4]))

This property is the foundation for closures, decorators, and higher-order functions β€” the main topics of this chapter.

Definition:

Function Signature and Type Hints

A function signature specifies the function's name, parameters, default values, and return type. In scientific Python, type hints serve as executable documentation:

def compute_ber(
    tx_bits: np.ndarray,
    rx_bits: np.ndarray,
    *,                          # force keyword-only after this
    ignore_prefix: int = 0,
) -> float:
    """Compute bit error rate between transmitted and received bits."""
    tx = tx_bits[ignore_prefix:]
    rx = rx_bits[ignore_prefix:]
    return np.mean(tx != rx)

The * forces ignore_prefix to be keyword-only, preventing positional errors like compute_ber(tx, rx, 100).

Definition:

NumPy-Style Docstrings

The NumPy docstring convention is the standard for scientific Python projects. It provides structured sections for parameters, returns, examples, and references:

def add_awgn(
    signal: np.ndarray,
    snr_db: float,
) -> np.ndarray:
    """Add white Gaussian noise to a signal.

    Parameters
    ----------
    signal : np.ndarray
        Input signal (1-D complex or real).
    snr_db : float
        Signal-to-noise ratio in decibels.

    Returns
    -------
    np.ndarray
        Noisy signal with same shape as input.

    Examples
    --------
    >>> rng = np.random.default_rng(42)
    >>> x = np.ones(1000)
    >>> y = add_awgn(x, snr_db=10.0)
    >>> 10 * np.log10(np.mean(x**2) / np.mean((y - x)**2))
    10.0  # approximately
    """
    ...

Tools like Sphinx, pdoc, and VS Code IntelliSense parse these docstrings automatically.

Definition:

Variadic Arguments: *args and **kwargs

Python supports variadic arguments for flexible function signatures:

  • *args collects extra positional arguments into a tuple
  • **kwargs collects extra keyword arguments into a dict
def run_experiment(name: str, *configs, **overrides):
    """Run an experiment with multiple configs and parameter overrides."""
    for config in configs:
        params = {**config, **overrides}
        print(f"{name}: {params}")

run_experiment(
    "BER Sweep",
    {"modulation": "QPSK", "snr": 10},
    {"modulation": "16QAM", "snr": 10},
    num_trials=1000,
)

In scientific code, **kwargs is commonly used to forward parameters to plotting functions or solver backends.

Common Mistake: The Mutable Default Argument Trap

Mistake:

Using a mutable object (list, dict, set) as a default argument:

def collect_results(result, cache=[]):   # BUG!
    cache.append(result)
    return cache

print(collect_results(1))  # [1]
print(collect_results(2))  # [1, 2]  β€” cache persists!

The default [] is created once at function definition time and shared across all calls.

Correction:

Use None as the default and create a new object inside the function:

def collect_results(result, cache=None):
    if cache is None:
        cache = []
    cache.append(result)
    return cache

This is the standard Python idiom. NumPy, SciPy, and scikit-learn all use this pattern extensively.

Common Mistake: Accidental Array Mutation via Views

Mistake:

NumPy slices return views, not copies. Modifying a slice inside a function mutates the caller's array:

def zero_negative(x: np.ndarray) -> np.ndarray:
    result = x          # This is NOT a copy!
    result[result < 0] = 0
    return result

data = np.array([1, -2, 3, -4])
clean = zero_negative(data)
print(data)  # [1, 0, 3, 0]  β€” original mutated!

Correction:

Explicitly copy when you intend to leave the original unchanged:

def zero_negative(x: np.ndarray) -> np.ndarray:
    result = x.copy()
    result[result < 0] = 0
    return result

Or use a pure functional approach: np.where(x < 0, 0, x).

Example: Pure vs Impure Data Transformation Pipeline

Write a data preprocessing pipeline for simulation results. Compare a pure functional approach with a stateful approach. The pipeline should:

  1. Remove NaN values
  2. Apply log-scale transformation
  3. Normalize to zero mean and unit variance

Example: Designing a Flexible API with *args and **kwargs

Design a plot_results function that accepts multiple result arrays and forwards styling options to Matplotlib, following the NumPy docstring convention.

Example: Keyword-Only Arguments for Safety

Write a resample_signal function where critical parameters like target_rate and method must be passed as keyword arguments to prevent positional errors.

Pure Function Examples for Scientific Computing

python
Complete examples of pure vs impure functions for signal processing and data transformation, with unit tests.
# Code from: ch02/python/pure_functions.py
# Load from backend supplements endpoint

NumPy-Style Docstring Templates

python
Template functions with complete NumPy-style docstrings for common scientific computing patterns.
# Code from: ch02/python/docstring_examples.py
# Load from backend supplements endpoint

Quick Check

Which of the following functions is pure?

def f(x):
    print(f"Processing {x}")
    return x ** 2
def f(x):
    return x ** 2
def f(x):
    x.append(0)
    return sum(x)
counter = 0
def f(x):
    global counter
    counter += 1
    return x ** 2

Quick Check

What does the following code print on the third call?

def add_item(item, items=[]):
    items.append(item)
    return items

add_item('a')
add_item('b')
print(add_item('c'))

['c']

['a', 'b', 'c']

['b', 'c']

Raises TypeError

Function Dispatch Flow Visualization

Visualize how Python dispatches function calls: argument binding, default value resolution, *args/**kwargs packing, and return. Adjust parameters to see how different calling conventions work.

Parameters

pure function

A function with no side effects that returns the same output for the same inputs. Pure functions are deterministic and do not modify external state.

Related: side effect

side effect

Any observable change a function makes beyond returning a value: printing, writing files, modifying global variables, mutating arguments, or making network requests.

Related: pure function

first-class object

An entity that can be assigned to variables, passed as arguments, returned from functions, and stored in data structures. In Python, functions, classes, and modules are all first-class objects.

Historical Note: From Lambda Calculus to Python Functions

1930s-1950s

The idea of functions as first-class objects traces back to Alonzo Church's lambda calculus (1930s), a formal system for expressing computation through function abstraction and application. Python's lambda keyword is a direct nod to this heritage. Lisp (1958) was the first programming language to treat functions as first-class citizens, and this idea eventually made its way into Python through the influence of functional programming languages.

Key Takeaway

Design functions for composability. Prefer pure functions with explicit inputs and outputs. Use None for mutable defaults, keyword-only arguments (*) for safety-critical parameters, and NumPy-style docstrings for documentation. Reserve side effects for I/O boundaries.