The __array__ and __array_ufunc__ Protocols
Making Custom Objects Play Nicely with NumPy
NumPy defines several protocols that let custom objects integrate seamlessly
with NumPy functions. When you call np.sin(my_object), NumPy doesn't need
to know about your class — it asks your object how to handle the operation
through well-defined dunder methods.
This is the foundation of libraries like CuPy, JAX, and Dask that provide NumPy-compatible interfaces without being NumPy.
Definition: The __array__ Protocol
The __array__ method tells NumPy how to convert your object to an ndarray.
NumPy calls this whenever it needs to operate on your object:
import numpy as np

class PhysicalQuantity:
    """A value with units that converts to a raw NumPy array."""

    def __init__(self, value: np.ndarray, unit: str = "V"):
        self._value = np.asarray(value)
        self.unit = unit

    def __array__(self, dtype=None, copy=None) -> np.ndarray:
        # Conversion hook: hand NumPy the underlying ndarray
        if dtype is not None:
            return self._value.astype(dtype)
        return self._value

    def __repr__(self) -> str:
        return f"PhysicalQuantity({self._value}, unit='{self.unit}')"
Now np.array(PhysicalQuantity([1, 2, 3])) works, and so does
np.mean(PhysicalQuantity([1, 2, 3])).
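A minimal sketch of what that conversion looks like in practice. It repeats a stripped-down copy of the class so the block runs on its own. Note that on this path the results are plain NumPy values, so the unit is not carried along.

```python
import numpy as np


class PhysicalQuantity:
    """Stripped-down copy of the class above, repeated so this block runs alone."""

    def __init__(self, value, unit="V"):
        self._value = np.asarray(value)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # Conversion hook: hand NumPy the underlying ndarray.
        if dtype is not None:
            return self._value.astype(dtype)
        return self._value


q = PhysicalQuantity([1, 2, 3])

arr = np.asarray(q)                     # NumPy calls q.__array__()
as_float = np.asarray(q, dtype=float)   # requested dtype is honored
avg = np.mean(q)                        # q is converted first, then averaged

print(type(arr).__name__)   # ndarray
print(as_float.dtype)       # float64
print(avg)                  # 2.0
```

Because `__array__` only converts, anything that should keep the unit has to go through one of the dispatch protocols instead.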
Definition: The __array_ufunc__ Protocol
__array_ufunc__ intercepts NumPy universal function (ufunc) calls on custom
objects. It gives your class full control over how operations like np.add,
np.multiply, and np.sin are handled:
class PhysicalQuantity:
    # ... (previous code) ...

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        """Intercept NumPy ufunc calls to preserve units."""
        # Convert all PhysicalQuantity inputs to raw arrays
        raw_inputs = []
        units = set()
        for inp in inputs:
            if isinstance(inp, PhysicalQuantity):
                raw_inputs.append(inp._value)
                units.add(inp.unit)
            else:
                raw_inputs.append(inp)
        # Apply the ufunc to raw arrays
        result = getattr(ufunc, method)(*raw_inputs, **kwargs)
        # Wrap result back in PhysicalQuantity
        if len(units) == 1:
            return PhysicalQuantity(result, unit=units.pop())
        return result  # mixed units: return raw
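The signature is always (self, ufunc, method, *inputs, **kwargs), where
method is a string naming how the ufunc was invoked: '__call__' for a direct
call, or 'reduce', 'reduceat', 'accumulate', 'outer', or 'at' for the
corresponding ufunc method.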
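To see which method string arrives for which call, here is a small sketch. The Recorder class is a hypothetical helper invented for this illustration: it logs each invocation and then runs the real ufunc on the unwrapped data.

```python
import numpy as np


class Recorder:
    """Hypothetical helper: logs how each ufunc was invoked, then defers to NumPy."""

    calls = []  # (ufunc name, method) pairs, shared across instances

    def __init__(self, value):
        self._value = np.asarray(value, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Recorder.calls.append((ufunc.__name__, method))
        # Unwrap Recorder inputs so the real ufunc runs on plain arrays.
        raw = [i._value if isinstance(i, Recorder) else i for i in inputs]
        return getattr(ufunc, method)(*raw, **kwargs)


x = Recorder([1.0, 2.0, 3.0])

np.add(x, 1.0)            # direct call
np.add.reduce(x)          # sum over the axis
np.add.accumulate(x)      # running sum
np.multiply.outer(x, x)   # outer product

for name, method in Recorder.calls:
    print(name, method)
```

The same `__array_ufunc__` body serves all four calls because `getattr(ufunc, method)` resolves to the ufunc itself for '__call__' and to the matching ufunc method otherwise.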
Definition: The __array_function__ Protocol
While __array_ufunc__ handles ufuncs (element-wise operations and their
reductions), __array_function__ (NEP 18) intercepts most other public NumPy
functions, including np.concatenate, np.linalg.norm, np.fft.fft, etc.:
import numpy as np

# Registry mapping NumPy functions to PhysicalQuantity-aware implementations
HANDLED_FUNCTIONS = {}

def implements(np_function):
    """Register an implementation for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class PhysicalQuantity:
    # ... (previous code) ...

    def __array_function__(self, func, types, args, kwargs):
        # Decline anything without a registered implementation
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.concatenate)
def concatenate(arrays, axis=0):
    raw = [np.asarray(a) for a in arrays]
    units = {a.unit for a in arrays if isinstance(a, PhysicalQuantity)}
    result = np.concatenate(raw, axis=axis)
    if len(units) == 1:
        return PhysicalQuantity(result, unit=units.pop())
    return result
This is how libraries like pint and astropy.units provide transparent
NumPy integration.
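The registry pattern above can be exercised end to end with the sketch below. It repeats a reduced PhysicalQuantity so the block is self-contained, and it also shows a consequence that is easy to miss: once a class defines __array_function__, NumPy functions without a registered handler raise TypeError rather than silently converting through __array__.

```python
import numpy as np

HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an implementation for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class PhysicalQuantity:
    """Reduced copy of the class from the text, for illustration only."""

    def __init__(self, value, unit="V"):
        self._value = np.asarray(value, dtype=float)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        return self._value if dtype is None else self._value.astype(dtype)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.concatenate)
def concatenate(arrays, axis=0):
    raw = [np.asarray(a) for a in arrays]
    units = {a.unit for a in arrays if isinstance(a, PhysicalQuantity)}
    result = np.concatenate(raw, axis=axis)
    if len(units) == 1:
        return PhysicalQuantity(result, unit=units.pop())
    return result


a = PhysicalQuantity([1.0, 2.0], "V")
b = PhysicalQuantity([3.0], "V")

joined = np.concatenate([a, b])   # routed to the registered handler
print(type(joined).__name__, joined.unit, joined._value)

try:
    np.mean(a)                    # no handler registered for np.mean
except TypeError as exc:
    print("TypeError:", exc)
```

Returning NotImplemented for unregistered functions is a deliberate choice here: it fails loudly instead of returning a result that has quietly lost its unit.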
Theorem: NumPy Ufunc Dispatch Priority
When a NumPy ufunc is called with mixed types (e.g., np.add(custom_obj, ndarray)),
NumPy tries the __array_ufunc__ override of each argument in turn. If one returns
NotImplemented, NumPy tries the next. The dispatch priority is:
- Subclasses are tried before their superclasses
- Otherwise, arguments are tried left to right
- If every override returns NotImplemented, NumPy raises TypeError
- Setting __array_ufunc__ = None opts a class out entirely, forcing TypeError immediately
This ensures that custom types can always override default NumPy behavior without monkey-patching.
Think of it like Python's __radd__ mechanism but generalized to all
ufuncs. The key design principle is that custom types always get a chance
to handle the operation before NumPy falls back to converting them to
arrays.
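The dispatch rules can be checked with three toy classes. All three are hypothetical and exist only for this sketch: one always declines, one always handles, and one opts out.

```python
import numpy as np


class Declines:
    """Hypothetical type that never handles a ufunc."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented


class Handles:
    """Hypothetical type that answers every ufunc with a marker string."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return f"handled {ufunc.__name__}"


class OptedOut:
    """Hypothetical type that refuses ufuncs altogether."""

    __array_ufunc__ = None


arr = np.arange(3)

# Declines is leftmost, so it is asked first; after it returns
# NotImplemented, Handles is asked and its result is returned.
print(np.add(Declines(), Handles()))   # handled add

# Declines is the only override here (a plain ndarray does not count
# as one), so every override has declined and NumPy raises TypeError.
try:
    np.add(Declines(), arr)
except TypeError:
    print("TypeError: all overrides returned NotImplemented")

# __array_ufunc__ = None makes the call fail without trying anything.
try:
    np.add(arr, OptedOut())
except TypeError:
    print("TypeError: operand opted out of ufuncs")
```

Whatever the winning `__array_ufunc__` returns becomes the result of the ufunc call, which is why the first call can return a string.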
Example: Building a Unit-Aware Array Class
Create a PhysicalQuantity class that wraps a NumPy array with a unit
string. It should work transparently with NumPy ufuncs (addition, multiplication,
trigonometric functions) while preserving or computing the correct units.
Core class with __array__ and __array_ufunc__
import numpy as np

class PhysicalQuantity:
    def __init__(self, value, unit="dimensionless"):
        self._value = np.asarray(value, dtype=float)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        return self._value if dtype is None else self._value.astype(dtype)

    def __repr__(self):
        return f"PQ({self._value}, '{self.unit}')"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raw = [np.asarray(i) if isinstance(i, PhysicalQuantity) else i
               for i in inputs]
        units = [i.unit for i in inputs if isinstance(i, PhysicalQuantity)]
        result = getattr(ufunc, method)(*raw, **kwargs)
        # Determine output unit
        if ufunc in (np.add, np.subtract):
            if len(set(units)) > 1:
                raise ValueError(f"Cannot {ufunc.__name__} different units: {units}")
            return PhysicalQuantity(result, units[0])
        elif ufunc in (np.multiply,):
            new_unit = "*".join(units) if len(units) > 1 else units[0]
            return PhysicalQuantity(result, new_unit)
        elif ufunc in (np.sin, np.cos, np.exp, np.log):
            return PhysicalQuantity(result, "dimensionless")
        return result
Testing NumPy interop
voltage = PhysicalQuantity([1.0, 2.0, 3.0], "V")
current = PhysicalQuantity([0.1, 0.2, 0.3], "A")

# Addition (same units)
v_sum = np.add(voltage, PhysicalQuantity([0.5, 0.5, 0.5], "V"))
print(v_sum)   # PQ([1.5 2.5 3.5], 'V')

# Multiplication (different units)
power = np.multiply(voltage, current)
print(power)   # PQ([0.1 0.4 0.9], 'V*A')

# Mean (np.mean is not a ufunc; the object is converted via __array__)
print(np.mean(voltage))   # 2.0, a plain NumPy float with no unit

# Error on mismatched units
try:
    np.add(voltage, current)
except ValueError as exc:
    print(exc)   # Cannot add different units: ['V', 'A']
[Interactive figure: NumPy Interop Benchmark. Compares the performance overhead of custom array classes using __array_ufunc__ vs. plain NumPy arrays, showing how the overhead scales with array size and operation complexity.]
[Figure: NumPy Protocol Dispatch Flow]
Common Mistake: Forgetting copy Semantics in __array__
Mistake:
class MyArray:
    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None, copy=None):
        return self._data  # Returns a reference, not a copy!

obj = MyArray(np.array([1, 2, 3]))
arr = np.array(obj)
arr[0] = 999  # Can modify obj._data too (NumPy 2.x trusts __array__ to honor copy=True)
Correction:
class MyArray:
    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # NumPy 2.0+: caller explicitly wants no copy
            return self._data.view()
        return self._data.copy()  # Safe: return a copy by default
NumPy 2.0 added the copy parameter to __array__. Handle it to support
both zero-copy views and safe copies.
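A short sketch of how the two branches behave. It calls __array__ directly rather than through np.array or np.asarray, so the result does not depend on which NumPy version is installed. The __init__ is an assumption added for the sketch, and dtype handling is left out for brevity.

```python
import numpy as np


class MyArray:
    def __init__(self, data):
        # Assumed constructor; the text does not show one.
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            return self._data.view()   # zero-copy view onto the same buffer
        return self._data.copy()       # independent copy


obj = MyArray([1, 2, 3])

view = obj.__array__(copy=False)
safe = obj.__array__(copy=True)

print(np.shares_memory(view, obj._data))   # True
print(np.shares_memory(safe, obj._data))   # False

safe[0] = 999
print(obj._data[0])                        # 1, the original is untouched
```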
The __cuda_array_interface__ for GPU Interop
Just as __array__ enables NumPy interop, __cuda_array_interface__ enables
GPU array interop between CuPy, PyTorch, Numba CUDA, and other GPU libraries:
class GPUSignal:
    """Signal stored on GPU with CuPy interop."""

    def __init__(self, data_gpu):
        self._data = data_gpu  # CuPy array

    @property
    def __cuda_array_interface__(self):
        return self._data.__cuda_array_interface__
This protocol exposes the GPU memory pointer, shape, dtype, and strides, enabling zero-copy data exchange between GPU libraries.
Universal Function (ufunc)
A NumPy function that operates element-wise on arrays (e.g., np.add, np.sin, np.exp). Custom classes can intercept ufunc calls via __array_ufunc__.
Related: Array Protocol
Array Protocol
A set of dunder methods (__array__, __array_ufunc__, __array_function__) that let custom objects integrate with NumPy's dispatch system.
Related: Universal Function (ufunc)
Quick Check
What happens when you call np.add(custom_obj, np_array) and custom_obj.__array_ufunc__ returns NotImplemented?
- NumPy raises TypeError immediately
- NumPy falls back to calling __array__ on custom_obj
- NumPy checks np_array's __array_ufunc__ next
- The operation returns NotImplemented
NumPy checks each input's __array_ufunc__ in order. If all return NotImplemented, it raises TypeError.
NumPy Interop Protocols
# Code from: ch03/python/numpy_interop.py
Key Takeaway
NumPy's __array__, __array_ufunc__, and __array_function__ protocols let
custom classes integrate seamlessly with the NumPy ecosystem. Implement
__array__ for basic conversion, __array_ufunc__ for element-wise operations,
and __array_function__ for full NumPy API coverage. The __cuda_array_interface__
attribute extends this pattern to GPU interop.