The Python Data Model

Why the Data Model Matters for Scientific Computing

NumPy arrays support A @ B for matrix multiplication, A[i:j] for slicing, and len(A) for the length of the first axis. None of this is magic: these operations work because NumPy implements specific dunder methods (__matmul__, __getitem__, __len__) defined by Python's data model.

Understanding the data model lets you read NumPy and PyTorch source code, write custom classes that integrate seamlessly with scientific libraries, and debug unexpected behavior when operator overloading goes wrong.

Definition:
The Python Data Model

The Python data model (sometimes called the "object model") is the set of interfaces that objects can implement to interact with the most fundamental features of the language. These interfaces are defined by special methods (dunder methods) such as:

__init__(self, ...): initializes a new instance (the instance itself is created beforehand by __new__)
__repr__(self): unambiguous string representation, used by repr() and for debugging
__str__(self): human-readable string representation, used by str() and print()
__len__(self): called by len(obj)
__getitem__(self, key): called by obj[key]
__iter__(self): called by iter(obj) and by for loops; returns an iterator

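Most operators, many built-in functions, and language constructs such as for loops and with statements dispatch to a corresponding dunder method, which Python looks up on the object's type. A few constructs, such as the identity test is, cannot be overloaded.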
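To see this dispatch in action, here is a minimal sketch using a hypothetical Polyline class (not part of any library). It implements four of the methods listed above, and len(), indexing, list(), and repr() then work without further code. The last lines illustrate a detail of the data model: special methods are looked up on the type, so attaching __len__ to an instance does not change what len() returns.

```python
class Polyline:
    """Hypothetical example class: an ordered sequence of (x, y) points."""

    def __init__(self, points):
        self._points = list(points)

    def __repr__(self):
        return f"Polyline({self._points!r})"

    def __len__(self):
        return len(self._points)

    def __getitem__(self, index):
        return self._points[index]

    def __iter__(self):
        return iter(self._points)


p = Polyline([(0, 0), (1, 2), (3, 1)])
print(len(p))    # len() calls type(p).__len__(p): 3
print(p[1])      # p[1] calls type(p).__getitem__(p, 1): (1, 2)
print(list(p))   # list() iterates via __iter__: [(0, 0), (1, 2), (3, 1)]
print(repr(p))   # Polyline([(0, 0), (1, 2), (3, 1)])

# Special methods are looked up on the type, not on the instance:
p.__len__ = lambda: 99
print(len(p))    # still 3
```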

Definition:
Dunder Methods for Operator Overloading

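Python maps operators to dunder methods as follows (simplified: the method is actually looked up on the operand's type, and arithmetic and comparison operators can fall back to a reflected method of the right operand, such as __radd__):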

Operator	Method	Example
`+`	`__add__`	`a + b` calls `a.__add__(b)`
`*`	`__mul__`	`a * b` calls `a.__mul__(b)`
`@`	`__matmul__`	`A @ B` calls `A.__matmul__(B)`
`[]`	`__getitem__`	`a[i]` calls `a.__getitem__(i)`
`==`	`__eq__`	`a == b` calls `a.__eq__(b)`
`<`	`__lt__`	`a < b` calls `a.__lt__(b)`
`in`	`__contains__`	`x in a` calls `a.__contains__(x)`
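The table shows the simple case. The sketch below, built on a hypothetical Meters class, shows the fallback path: when the left operand's __add__ returns NotImplemented, Python tries the right operand's __radd__, and raises TypeError only if both decline. (If the right operand's type is a subclass of the left operand's type and overrides the reflected method, Python tries that reflected method first.)

```python
class Meters:
    """Hypothetical unit-carrying number, used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Meters({self.value!r})"

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # tells Python to try other.__radd__(self)

    def __radd__(self, other):
        # Called for `other + self` after type(other).__add__ returned NotImplemented
        return self.__add__(other)

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return self.value == other.value


print(Meters(2) + Meters(3))          # Meters(5)
print(5 + Meters(1))                  # int.__add__ declines, Meters.__radd__ runs: Meters(6)
print(sum([Meters(1), Meters(2)]))    # sum() starts from 0, so it relies on __radd__: Meters(3)
try:
    Meters(1) + "m"                   # both sides decline
except TypeError as exc:
    print("TypeError:", exc)
```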

The @ operator (PEP 465, Python 3.5+) was added specifically for matrix multiplication and is used by NumPy (np.ndarray.__matmul__) and PyTorch (torch.Tensor.__matmul__).
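For illustration only, here is a hypothetical pure-Python Matrix class whose __matmul__ computes a naive row-by-column product; NumPy and PyTorch do the same job with compiled kernels. It also shows that A @= B falls back to __matmul__ when the class defines no __imatmul__.

```python
class Matrix:
    """Hypothetical pure-Python matrix used only to illustrate __matmul__."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __repr__(self):
        return f"Matrix({self.rows!r})"

    def __matmul__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other.rows))  # columns of the right operand
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
print(A @ B)   # Matrix([[19, 22], [43, 50]])
A @= B         # no __imatmul__, so Python evaluates A = A @ B
print(A)       # Matrix([[19, 22], [43, 50]])
```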

dunder method

A method with a double-underscore prefix and suffix (e.g., __init__, __repr__). These methods implement Python's object protocols and are usually called implicitly by the interpreter rather than directly by your code.

Related: protocol

protocol

An informal interface defined by a set of dunder methods. For example, the iterator protocol requires __iter__ and __next__, while an iterable needs only an __iter__ method that returns an iterator. Objects that implement the right dunder methods are said to "satisfy" the protocol.

Related: dunder method
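The distinction between an iterator and an iterable matters in practice: an iterator is consumed by a single pass, while an iterable can hand out a fresh iterator each time it is looped over. A minimal sketch with two hypothetical classes:

```python
class Countdown:
    """Iterator: __iter__ returns self and __next__ produces values."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        self.current -= 1
        return self.current + 1


class Samples:
    """Iterable: __iter__ returns a new iterator on every call."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


print(list(Countdown(3)))   # [3, 2, 1]
s = Samples([0.1, 0.2])
print(list(s), list(s))     # [0.1, 0.2] [0.1, 0.2]
c = Countdown(2)
print(list(c), list(c))     # [2, 1] []  (the iterator is exhausted after one pass)
```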

Example: A Scientific Vector Class with Dunder Methods

Build a Vector class that supports +, * (scalar), @ (dot product), len(), indexing, and a clean repr. This loosely mimics the interface of NumPy arrays.

Solution

Define the class with core dunders

class Vector:
    """A simple vector class demonstrating the data model."""

    def __init__(self, components: list[float]) -> None:
        self._data = list(components)

    def __repr__(self) -> str:
        return f"Vector({self._data})"

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> float:
        return self._data[index]

Add arithmetic operators

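    def __add__(self, other: 'Vector') -> 'Vector':
        if not isinstance(other, Vector):
            return NotImplemented  # let Python try other.__radd__
        if len(self) != len(other):
            raise ValueError(f"Dimension mismatch: {len(self)} vs {len(other)}")
        return Vector([a + b for a, b in zip(self._data, other._data)])

    def __mul__(self, scalar: float) -> 'Vector':
        if not isinstance(scalar, (int, float)):
            return NotImplemented  # only scalar multiplication is supported
        return Vector([x * scalar for x in self._data])

    def __rmul__(self, scalar: float) -> 'Vector':
        # 2.0 * v: float.__mul__ returns NotImplemented, so Python calls v.__rmul__(2.0)
        return self.__mul__(scalar)

    def __matmul__(self, other: 'Vector') -> float:
        """Dot product via @ operator."""
        if not isinstance(other, Vector):
            return NotImplemented
        if len(self) != len(other):
            raise ValueError(f"Dimension mismatch: {len(self)} vs {len(other)}")
        return sum(a * b for a, b in zip(self._data, other._data))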

Use it

v = Vector([1.0, 2.0, 3.0])
w = Vector([4.0, 5.0, 6.0])
print(v + w)        # Vector([5.0, 7.0, 9.0])
print(2.0 * v)      # Vector([2.0, 4.0, 6.0]) — uses __rmul__
print(v @ w)         # 32.0 — dot product
print(len(v))        # 3
print(v[1])          # 2.0

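NumPy relies on the same mechanism: ndarray defines these dunder methods too, but implements them in C on typed memory buffers (dispatching to ufuncs such as np.add) instead of looping over Python lists.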

Python Data Model Demo


A complete Vector class demonstrating all major dunder methods, including comparison operators, iteration, and the `@` operator.

# Code from: ch01/python/data_model_demo.py

Historical Note: The @ Operator: PEP 465

2015

Before Python 3.5 (2015), there was no dedicated matrix multiplication operator. NumPy users had to write np.dot(A, B) or A.dot(B), which made chained operations like $(\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b}$ unreadable. PEP 465, championed by Nathaniel Smith, introduced @ specifically for scientific computing. Today A @ B is the standard way to write matrix multiplication in Python.

Common Mistake: Confusing repr and str

Mistake:

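Implementing only __str__ and then finding that the REPL, debuggers, and containers such as lists still show the unhelpful default <... object at 0x...> form, because they use __repr__; or implementing __repr__ to return something that looks "pretty" but is ambiguous.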

Correction:

__repr__ should return an unambiguous string that ideally could recreate the object, such as Vector([1.0, 2.0]). __str__ is for human-friendly display. If you implement only one, implement __repr__: the default object.__str__ falls back to __repr__, but not vice versa.
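A short sketch of the fallback rule, using two hypothetical classes: Point defines only __repr__, and Label defines only __str__.

```python
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

    def __repr__(self) -> str:
        # Unambiguous; evaluating this string builds a point with the same coordinates
        return f"Point({self.x!r}, {self.y!r})"


class Label:
    def __str__(self) -> str:
        return "a friendly label"  # only __str__, no __repr__


p = Point(1.0, 2.0)
print(str(p))         # Point(1.0, 2.0): str() falls back to __repr__
print([p])            # [Point(1.0, 2.0)]: containers show repr() of their items
print(str(Label()))   # a friendly label
print(repr(Label()))  # default <... Label object at 0x...> form: no fallback to __str__
```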


Connection to NumPy and PyTorch

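NumPy's ndarray implements over 50 dunder methods. When you write A + B with NumPy arrays, Python calls A.__add__(B), which hands the work to the np.add ufunc and its optimized C loops for element-wise addition. PyTorch tensors follow the same pattern, dispatching to PyTorch's own compiled kernels. Beyond the operators, the key interoperability protocols are:

__array__: lets np.array(obj) and np.asarray(obj) convert custom objects
__array_ufunc__: lets custom classes intercept NumPy ufuncs such as np.add
__cuda_array_interface__: an attribute (not a method) describing a GPU buffer, so that libraries such as CuPy, Numba, and PyTorch can share device arrays without copying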
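As a minimal sketch of the first protocol, assuming NumPy is installed, the hypothetical Vector below implements only __array__ and can then be passed to np.asarray and to NumPy functions such as np.sum. The copy parameter follows the NumPy 2 form of the hook; this sketch always builds a fresh array and does not honor a copy=False request.

```python
import numpy as np


class Vector:
    """Hypothetical minimal Vector that NumPy converts via __array__."""

    def __init__(self, components):
        self._data = [float(x) for x in components]

    def __repr__(self):
        return f"Vector({self._data})"

    def __array__(self, dtype=None, copy=None):
        # Called by np.array(v) / np.asarray(v). Always returns a new array,
        # so a copy=False request is not honored in this sketch.
        return np.array(self._data, dtype=dtype)


v = Vector([1.0, 2.0, 3.0])
arr = np.asarray(v)
print(type(arr).__name__, arr.dtype)
print(np.sum(v))   # NumPy converts v through __array__ before reducing
```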

We will use these protocols in Chapters 3, 5, and 11.

Quick Check

What does v @ w call when v is a custom class instance?

v.__matmul__(w)

v.__mul__(w)

v.dot(w)

matmul(v, w)