Function Design for Scientific Code
Why Function Design Matters in Scientific Computing
Scientific code is often written as a long script: load data, transform, compute, plot. This works for one-off explorations but falls apart when you need to:
- Re-run with different parameters (e.g., sweep over regularization strengths)
- Test individual components (unit tests on a 500-line script?)
- Share code with collaborators who use different data
Well-designed functions are the first step toward reusable, testable, composable scientific software. This section covers the patterns that make functions robust in production research code.
Definition: Pure Function
Pure Function
A pure function is a function that:
- Always returns the same output for the same input (deterministic)
- Has no side effects: it does not modify external state, write to files, print to stdout, or mutate its arguments
# Pure: depends only on inputs, no side effects
def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

# Impure: modifies the input array in place
def normalize_inplace(x: np.ndarray) -> None:
    x -= x.mean()
    x /= x.std()
Pure functions are easier to test, parallelize, and reason about. In scientific code, prefer pure functions for data transformations and reserve side effects for I/O boundaries.
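This boundary pattern can be sketched as a pure core wrapped by a thin impure shell (the `main` function and the random stand-in for file loading are illustrative):

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    # Pure core: output depends only on the input array
    return (x - x.mean()) / x.std()

def main() -> None:
    # Impure shell: randomness, printing, and (in real code) file I/O live here
    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=1000)  # stand-in for loading a file
    result = normalize(data)
    print(f"mean={result.mean():.3f}, std={result.std():.3f}")

main()
```

Only `main` needs integration-style testing; `normalize` can be verified with a plain unit test.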
Definition: First-Class Functions
First-Class Functions
In Python, functions are first-class objects: they can be assigned to variables, stored in data structures, passed as arguments, and returned from other functions.
def square(x):
    return x ** 2

# Assign to a variable
f = square
print(f(5))  # 25

# Store in a list
transforms = [square, np.sqrt, np.log]

# Pass as argument
results = list(map(square, [1, 2, 3, 4]))
This property is the foundation for closures, decorators, and higher-order functions, the main topics of this chapter.
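As a preview of closures, a function can also build and return new functions; a small sketch (the `make_scaler` name is made up for illustration):

```python
import numpy as np

def make_scaler(factor: float):
    """Return a new function that multiplies its input by `factor`."""
    def scale(x):
        return factor * np.asarray(x)
    return scale  # returned like any other value

double = make_scaler(2.0)
triple = make_scaler(3.0)
print(double([1, 2, 3]))  # [2. 4. 6.]
print(triple([1, 2, 3]))  # [3. 6. 9.]
```

Each returned function remembers the `factor` it was created with.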
Definition: Function Signature and Type Hints
Function Signature and Type Hints
A function signature specifies the function's name, parameters, default values, and return type. In scientific Python, type hints serve as executable documentation:
def compute_ber(
    tx_bits: np.ndarray,
    rx_bits: np.ndarray,
    *,  # force keyword-only after this
    ignore_prefix: int = 0,
) -> float:
    """Compute bit error rate between transmitted and received bits."""
    tx = tx_bits[ignore_prefix:]
    rx = rx_bits[ignore_prefix:]
    return np.mean(tx != rx)
The * forces ignore_prefix to be keyword-only, preventing
positional errors like compute_ber(tx, rx, 100).
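A quick check of that guard (the sample bit arrays here are chosen for illustration):

```python
import numpy as np

def compute_ber(tx_bits, rx_bits, *, ignore_prefix: int = 0) -> float:
    tx = tx_bits[ignore_prefix:]
    rx = rx_bits[ignore_prefix:]
    return float(np.mean(tx != rx))

tx = np.array([0, 1, 1, 0])
rx = np.array([0, 1, 0, 0])  # one bit flipped out of four
print(compute_ber(tx, rx))  # 0.25

try:
    compute_ber(tx, rx, 100)  # third positional argument is rejected
except TypeError as err:
    print("TypeError:", err)
```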
Definition: NumPy-Style Docstrings
NumPy-Style Docstrings
The NumPy docstring convention is the standard for scientific Python projects. It provides structured sections for parameters, returns, examples, and references:
def add_awgn(
    signal: np.ndarray,
    snr_db: float,
) -> np.ndarray:
    """Add white Gaussian noise to a signal.

    Parameters
    ----------
    signal : np.ndarray
        Input signal (1-D complex or real).
    snr_db : float
        Signal-to-noise ratio in decibels.

    Returns
    -------
    np.ndarray
        Noisy signal with same shape as input.

    Examples
    --------
    >>> rng = np.random.default_rng(42)
    >>> x = np.ones(1000)
    >>> y = add_awgn(x, snr_db=10.0)
    >>> 10 * np.log10(np.mean(x**2) / np.mean((y - x)**2))
    10.0  # approximately
    """
    ...
Tools like Sphinx, pdoc, and VS Code IntelliSense parse these docstrings automatically.
Definition: Variadic Arguments: *args and **kwargs
Variadic Arguments: *args and **kwargs
Python supports variadic arguments for flexible function signatures:
- *args collects extra positional arguments into a tuple
- **kwargs collects extra keyword arguments into a dict
def run_experiment(name: str, *configs, **overrides):
    """Run an experiment with multiple configs and parameter overrides."""
    for config in configs:
        params = {**config, **overrides}
        print(f"{name}: {params}")

run_experiment(
    "BER Sweep",
    {"modulation": "QPSK", "snr": 10},
    {"modulation": "16QAM", "snr": 10},
    num_trials=1000,
)
In scientific code, **kwargs is commonly used to forward
parameters to plotting functions or solver backends.
Common Mistake: The Mutable Default Argument Trap
Mistake:
Using a mutable object (list, dict, set) as a default argument:
def collect_results(result, cache=[]):  # BUG!
    cache.append(result)
    return cache

print(collect_results(1))  # [1]
print(collect_results(2))  # [1, 2]  (cache persists!)

The default [] is created once at function definition time and shared across all calls.
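The shared default can be observed directly through the function's `__defaults__` attribute, which holds the actual default objects created at `def` time:

```python
def collect_results(result, cache=[]):  # same buggy signature as above
    cache.append(result)
    return cache

collect_results(1)
collect_results(2)
# One list was created when `def` ran, and both calls mutated it
print(collect_results.__defaults__)  # ([1, 2],)
```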
Correction:
Use None as the default and create a new object inside the function:
def collect_results(result, cache=None):
    if cache is None:
        cache = []
    cache.append(result)
    return cache
This is the standard Python idiom. NumPy, SciPy, and scikit-learn all use this pattern extensively.
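When `None` is itself a legitimate value for the parameter, a common variant of the same idiom uses a private sentinel object as the default (the `_MISSING` name is just a convention chosen here):

```python
_MISSING = object()  # unique sentinel; distinct from None and every user value

def collect_results(result, cache=_MISSING):
    if cache is _MISSING:  # true only when the caller omitted `cache`
        cache = []
    cache.append(result)
    return cache

print(collect_results(1))  # [1]
print(collect_results(2))  # [2]  (no state leaks between calls)
```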
Common Mistake: Accidental Array Mutation via Views
Mistake:
NumPy slices return views, not copies, and plain assignment merely binds another name to the same array. Modifying the data through either inside a function mutates the caller's array:
def zero_negative(x: np.ndarray) -> np.ndarray:
    result = x  # This is NOT a copy!
    result[result < 0] = 0
    return result

data = np.array([1, -2, 3, -4])
clean = zero_negative(data)
print(data)  # [1, 0, 3, 0]  (original mutated!)
Correction:
Explicitly copy when you intend to leave the original unchanged:
def zero_negative(x: np.ndarray) -> np.ndarray:
    result = x.copy()
    result[result < 0] = 0
    return result
Or use a pure functional approach: np.where(x < 0, 0, x).
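The two fixes can be checked against each other, and the caller's array stays intact (a minimal sketch):

```python
import numpy as np

def zero_negative(x: np.ndarray) -> np.ndarray:
    result = x.copy()  # defensive copy keeps the caller's array untouched
    result[result < 0] = 0
    return result

data = np.array([1, -2, 3, -4])
clean = zero_negative(data)
# The copy-based version and np.where agree element for element
assert np.array_equal(clean, np.where(data < 0, 0, data))
print(data)  # data is unchanged
```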
Example: Pure vs Impure Data Transformation Pipeline
Write a data preprocessing pipeline for simulation results. Compare a pure functional approach with a stateful approach. The pipeline should:
- Remove NaN values
- Apply log-scale transformation
- Normalize to zero mean and unit variance
Impure approach (mutates state)
class DataPipeline:
    def __init__(self, data):
        self.data = data  # shared mutable state

    def remove_nans(self):
        self.data = self.data[~np.isnan(self.data)]

    def log_transform(self):
        self.data = np.log(self.data + 1e-10)

    def normalize(self):
        self.data = (self.data - self.data.mean()) / self.data.std()

# Problem: order matters, state is implicit, hard to test individually
pipe = DataPipeline(raw_data)
pipe.remove_nans()
pipe.log_transform()
pipe.normalize()
result = pipe.data
Pure approach (composable functions)
def remove_nans(x: np.ndarray) -> np.ndarray:
    return x[~np.isnan(x)]

def log_transform(x: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    return np.log(x + eps)

def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

# Each function is independently testable
result = normalize(log_transform(remove_nans(raw_data)))

# Or compose them:
from functools import reduce
pipeline = [remove_nans, log_transform, normalize]
result = reduce(lambda data, fn: fn(data), pipeline, raw_data)
Why pure wins
- Each function can be unit tested in isolation
- Functions can be reordered or swapped easily
- No hidden coupling through shared state
- Easy to parallelize (no race conditions on shared data)
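The `reduce` pattern can also be wrapped once in a small left-to-right `compose` helper (a sketch; `compose` is not a standard-library function):

```python
from functools import reduce
from typing import Callable
import numpy as np

def compose(*funcs: Callable) -> Callable:
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, fn: fn(acc), funcs, x)

def remove_nans(x):
    return x[~np.isnan(x)]

def log_transform(x, eps=1e-10):
    return np.log(x + eps)

def normalize(x):
    return (x - x.mean()) / x.std()

pipeline = compose(remove_nans, log_transform, normalize)
raw = np.array([1.0, np.nan, 10.0, 100.0])
print(pipeline(raw))  # three log-values normalized to zero mean, unit variance
```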
Example: Designing a Flexible API with *args and **kwargs
Design a plot_results function that accepts multiple result arrays
and forwards styling options to Matplotlib, following the NumPy
docstring convention.
Implementation with forwarded kwargs
import matplotlib.pyplot as plt
import numpy as np
def plot_results(
    *datasets: np.ndarray,
    labels: list[str] | None = None,
    title: str = "Simulation Results",
    xlabel: str = "Index",
    ylabel: str = "Value",
    **plot_kwargs,
) -> plt.Figure:
    """Plot one or more datasets on the same axes.

    Parameters
    ----------
    *datasets : np.ndarray
        One or more 1-D arrays to plot.
    labels : list[str] or None
        Legend labels for each dataset.
    title, xlabel, ylabel : str
        Axis labels and title.
    **plot_kwargs
        Forwarded to ``ax.plot()`` (e.g., linewidth, marker).

    Returns
    -------
    plt.Figure
        The created figure.
    """
    fig, ax = plt.subplots()
    for i, data in enumerate(datasets):
        label = labels[i] if labels else f"Dataset {i}"
        ax.plot(data, label=label, **plot_kwargs)
    ax.set(title=title, xlabel=xlabel, ylabel=ylabel)
    ax.legend()
    return fig
Usage
snr_values = np.linspace(0, 20, 50)
ber_zf = np.exp(-snr_values / 5)
ber_mmse = np.exp(-snr_values / 3)
fig = plot_results(
    ber_zf, ber_mmse,
    labels=["ZF", "MMSE"],
    title="BER Comparison",
    xlabel="SNR (dB)",
    ylabel="BER",
    linewidth=2,
    marker='o',
    markersize=3,
)
Example: Keyword-Only Arguments for Safety
Write a resample_signal function where critical parameters like
target_rate and method must be passed as keyword arguments to
prevent positional errors.
Using the bare * separator
def resample_signal(
    signal: np.ndarray,
    original_rate: float,
    *,  # everything after is keyword-only
    target_rate: float,
    method: str = "linear",
    preserve_energy: bool = True,
) -> np.ndarray:
    """Resample a signal to a new sample rate.

    Parameters
    ----------
    signal : np.ndarray
        Input signal.
    original_rate : float
        Original sample rate in Hz.
    target_rate : float
        Desired sample rate in Hz (keyword-only).
    method : str
        Interpolation method: 'linear', 'cubic', 'nearest'.
    preserve_energy : bool
        If True, scale amplitude to preserve signal energy.

    Returns
    -------
    np.ndarray
        Resampled signal.
    """
    ratio = target_rate / original_rate
    n_new = int(len(signal) * ratio)
    x_old = np.linspace(0, 1, len(signal))
    x_new = np.linspace(0, 1, n_new)
    # Note: np.interp is linear interpolation; supporting the other
    # `method` values would require e.g. scipy.interpolate.
    resampled = np.interp(x_new, x_old, signal)
    if preserve_energy:
        resampled *= np.sqrt(1 / ratio)
    return resampled
Why keyword-only matters
# This works: clear and unambiguous
resample_signal(sig, 44100, target_rate=22050, method="cubic")

# This raises TypeError, preventing positional mistakes:
resample_signal(sig, 44100, 22050, "cubic")
# TypeError: resample_signal() takes 2 positional arguments
# but 4 were given
Pure Function Examples for Scientific Computing
# Code from: ch02/python/pure_functions.py

NumPy-Style Docstring Templates
# Code from: ch02/python/docstring_examples.py

Quick Check
Which of the following functions is pure?
def f(x):
    print(f"Processing {x}")
    return x ** 2

def f(x):
    return x ** 2

def f(x):
    x.append(0)
    return sum(x)

counter = 0
def f(x):
    global counter
    counter += 1
    return x ** 2
def f(x):
    return x ** 2
No side effects, deterministic output for any input.
Quick Check
What does the following code print on the third call?
def add_item(item, items=[]):
    items.append(item)
    return items

add_item('a')
add_item('b')
print(add_item('c'))
['c']
['a', 'b', 'c']
['b', 'c']
Raises TypeError
['a', 'b', 'c']

The mutable default [] is created once and accumulates items across all calls.
Function Dispatch Flow Visualization
Visualize how Python dispatches function calls: argument binding, default value resolution, *args/**kwargs packing, and return. Adjust parameters to see how different calling conventions work.
pure function
A function with no side effects that returns the same output for the same inputs. Pure functions are deterministic and do not modify external state.
Related: side effect
side effect
Any observable change a function makes beyond returning a value: printing, writing files, modifying global variables, mutating arguments, or making network requests.
Related: pure function
first-class object
An entity that can be assigned to variables, passed as arguments, returned from functions, and stored in data structures. In Python, functions, classes, and modules are all first-class objects.
Historical Note: From Lambda Calculus to Python Functions
1930s-1950s: The idea of functions as first-class objects traces back to Alonzo Church's lambda calculus (1930s), a formal system for expressing computation through function abstraction and application. Python's lambda keyword is a direct nod to this heritage. Lisp (1958) was the first programming language to treat functions as first-class citizens, and the idea eventually made its way into Python through the influence of functional programming languages.
Key Takeaway
Design functions for composability. Prefer pure functions with
explicit inputs and outputs. Use None for mutable defaults,
keyword-only arguments (*) for safety-critical parameters, and
NumPy-style docstrings for documentation. Reserve side effects
for I/O boundaries.