The Python Data Model
Why the Data Model Matters for Scientific Computing
NumPy arrays support A @ B for matrix multiplication, A[i:j] for
slicing, and len(A) for shape queries β none of this is magic. These
operations work because NumPy implements specific dunder methods
(__matmul__, __getitem__, __len__) defined by Python's data model.
Understanding the data model lets you read NumPy and PyTorch source code, write custom classes that integrate seamlessly with scientific libraries, and debug unexpected behavior when operator overloading goes wrong.
Definition: The Python Data Model
The Python Data Model
The Python data model (sometimes called the "object model") is the set of interfaces that objects can implement to interact with the most fundamental features of the language. These interfaces are defined by special methods (dunder methods) such as:
__init__(self, ...)β object initialization (constructor)__repr__(self)β unambiguous string representation (for debugging)__str__(self)β human-readable string representation__len__(self)β called bylen(obj)__getitem__(self, key)β called byobj[key]__iter__(self)β makes an object iterable
Every operator, built-in function, and language construct in Python dispatches to a corresponding dunder method.
Definition: Dunder Methods for Operator Overloading
Dunder Methods for Operator Overloading
Python maps operators to dunder methods:
| Operator | Method | Example |
|---|---|---|
+ |
__add__ |
a + b calls a.__add__(b) |
* |
__mul__ |
a * b calls a.__mul__(b) |
@ |
__matmul__ |
A @ B calls A.__matmul__(B) |
[] |
__getitem__ |
a[i] calls a.__getitem__(i) |
== |
__eq__ |
a == b calls a.__eq__(b) |
< |
__lt__ |
a < b calls a.__lt__(b) |
in |
__contains__ |
x in a calls a.__contains__(x) |
The @ operator (PEP 465, Python 3.5+) was added specifically for
matrix multiplication and is used by NumPy (np.ndarray.__matmul__)
and PyTorch (torch.Tensor.__matmul__).
dunder method
A method with double-underscore prefix and suffix (e.g., __init__,
__repr__). These methods implement Python's object protocols and
are called implicitly by the interpreter.
Related: protocol
protocol
An informal interface defined by a set of dunder methods. For
example, the "iterable protocol" requires __iter__ and __next__.
Objects that implement the right dunder methods are said to
"satisfy" the protocol.
Related: dunder method
Example: A Scientific Vector Class with Dunder Methods
Build a Vector class that supports +, * (scalar), @ (dot product),
len(), indexing, and a clean repr. This mimics how NumPy arrays work.
Define the class with core dunders
class Vector:
"""A simple vector class demonstrating the data model."""
def __init__(self, components: list[float]) -> None:
self._data = list(components)
def __repr__(self) -> str:
return f"Vector({self._data})"
def __len__(self) -> int:
return len(self._data)
def __getitem__(self, index: int) -> float:
return self._data[index]
Add arithmetic operators
def __add__(self, other: 'Vector') -> 'Vector':
if len(self) != len(other):
raise ValueError(f"Dimension mismatch: {len(self)} vs {len(other)}")
return Vector([a + b for a, b in zip(self._data, other._data)])
def __mul__(self, scalar: float) -> 'Vector':
return Vector([x * scalar for x in self._data])
def __rmul__(self, scalar: float) -> 'Vector':
return self.__mul__(scalar)
def __matmul__(self, other: 'Vector') -> float:
"""Dot product via @ operator."""
return sum(a * b for a, b in zip(self._data, other._data))
Use it
v = Vector([1.0, 2.0, 3.0])
w = Vector([4.0, 5.0, 6.0])
print(v + w) # Vector([5.0, 7.0, 9.0])
print(2.0 * v) # Vector([2.0, 4.0, 6.0]) β uses __rmul__
print(v @ w) # 32.0 β dot product
print(len(v)) # 3
print(v[1]) # 2.0
This is exactly how NumPy works under the hood, except with C-level arrays instead of Python lists.
Python Data Model Demo
# Code from: ch01/python/data_model_demo.py
# Load from backend supplements endpointHistorical Note: The @ Operator: PEP 465
2015Before Python 3.5 (2015), there was no dedicated matrix multiplication
operator. NumPy users had to write np.dot(A, B) or A.dot(B),
which made chained operations like
unreadable. PEP 465, championed by Nathaniel Smith, introduced @
specifically for scientific computing. Today A @ B is the standard
way to write matrix multiplication in Python.
Common Mistake: Confusing repr and str
Mistake:
Implementing only __str__ and wondering why debugging is hard,
or implementing __repr__ to return something that looks "pretty"
but is ambiguous.
Correction:
__repr__ should return an unambiguous string that ideally could
recreate the object: Vector([1.0, 2.0]). __str__ is for
human-friendly display. If only one is implemented, implement
__repr__ β Python falls back to it when __str__ is missing,
but not vice versa.
Python Object Protocol Map
Explore how Python operators map to dunder methods. Select an
operator category to see which methods are called and in what order
(including fallback to reflected methods like __radd__).
Parameters
Connection to NumPy and PyTorch
NumPy's ndarray implements over 50 dunder methods. When you write
A + B with NumPy arrays, Python calls A.__add__(B), which dispatches
to optimized C code for element-wise addition. PyTorch tensors work
identically. The key protocols are:
__array__β allowsnp.array(obj)to convert custom objects__array_ufunc__β lets custom classes intercept NumPy ufuncs__cuda_array_interface__β enables CuPy interoperability
We will use these protocols in Chapters 3, 5, and 11.
Quick Check
What does v @ w call when v is a custom class instance?
v.__matmul__(w)
v.__mul__(w)
v.dot(w)
matmul(v, w)
v.__matmul__(w)The @ operator dispatches to the __matmul__ dunder method on the left operand.
Key Takeaway
Everything in Python is an object, and every operation dispatches to a dunder method. Understanding this mechanism is the key to reading and writing code that integrates with NumPy, PyTorch, and other scientific libraries.