The Python Data Model

Why the Data Model Matters for Scientific Computing

NumPy arrays support A @ B for matrix multiplication, A[i:j] for slicing, and len(A) for the length of the first axis. None of this is magic: these operations work because NumPy implements specific dunder methods (__matmul__, __getitem__, __len__) defined by Python's data model.

Understanding the data model lets you read NumPy and PyTorch source code, write custom classes that integrate seamlessly with scientific libraries, and debug unexpected behavior when operator overloading goes wrong.

Definition:

The Python Data Model

The Python data model (sometimes called the "object model") is the set of interfaces that objects can implement to interact with the most fundamental features of the language. These interfaces are defined by special methods (dunder methods) such as:

  • __init__(self, ...): object initialization (commonly called the constructor, although __new__ is what creates the instance)
  • __repr__(self): unambiguous string representation (for debugging)
  • __str__(self): human-readable string representation
  • __len__(self): called by len(obj)
  • __getitem__(self, key): called by obj[key]
  • __iter__(self): called by iter(obj), which makes the object usable in for loops

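Most operators, many built-in functions, and language constructs such as for loops and with statements dispatch to a corresponding dunder method. A few are fixed by the language and cannot be overloaded, for example the identity test is, the boolean operators and/or, and assignment to a plain name.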
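
As a minimal sketch of these interfaces, the class below implements each method from the list above so that built-in functions and syntax work on it. The name Samples and its behavior are assumptions made for illustration; it is not part of any library.

```python
class Samples:
    """Hypothetical container of measurements, for illustration only."""

    def __init__(self, values):
        # Runs after the instance is created, to set up its state.
        self._values = list(values)

    def __repr__(self):
        # Unambiguous form, used by repr() and the interactive prompt.
        return f"Samples({self._values!r})"

    def __str__(self):
        # Human-readable form, used by str() and print().
        return ", ".join(str(v) for v in self._values)

    def __len__(self):
        # Called by len(obj).
        return len(self._values)

    def __getitem__(self, key):
        # Called by obj[key]; key may be an int or a slice.
        if isinstance(key, slice):
            return Samples(self._values[key])
        return self._values[key]

    def __iter__(self):
        # Called by iter(obj), and therefore by for loops and sum().
        return iter(self._values)


s = Samples([0.5, 1.5, 2.5])
print(repr(s))      # Samples([0.5, 1.5, 2.5])
print(str(s))       # 0.5, 1.5, 2.5
print(len(s))       # 3
print(s[0])         # 0.5
print(repr(s[1:]))  # Samples([1.5, 2.5])
print(sum(s))       # 4.5
```

Returning a new Samples from a slice is one possible design choice; returning a plain list would also satisfy obj[key].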

Definition:

Dunder Methods for Operator Overloading

Python maps operators to dunder methods:

Operator   Method         Example
+          __add__        a + b calls a.__add__(b)
*          __mul__        a * b calls a.__mul__(b)
@          __matmul__     A @ B calls A.__matmul__(B)
[]         __getitem__    a[i] calls a.__getitem__(i)
==         __eq__         a == b calls a.__eq__(b)
<          __lt__         a < b calls a.__lt__(b)
in         __contains__   x in a calls a.__contains__(x)

The @ operator (PEP 465, Python 3.5+) was added specifically for matrix multiplication and is used by NumPy (np.ndarray.__matmul__) and PyTorch (torch.Tensor.__matmul__).
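
The operator-to-method mapping can be checked directly on built-in types, where the operator form and the explicit method call give the same result. The snippet below is a small sketch of that idea. The Temperature class is a hypothetical example that opts in to == and < by defining the corresponding methods.

```python
# Operator syntax and the explicit dunder call agree on built-in types.
assert 2 + 3 == (2).__add__(3)
assert 2 * 3 == (2).__mul__(3)
assert [10, 20, 30][1] == [10, 20, 30].__getitem__(1)
assert (20 in [10, 20, 30]) == [10, 20, 30].__contains__(20)


class Temperature:
    """Hypothetical value type that supports == and <."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __eq__(self, other):
        # Called by a == b.
        if not isinstance(other, Temperature):
            return NotImplemented
        return self.kelvin == other.kelvin

    def __lt__(self, other):
        # Called by a < b, and indirectly by min() and sorted().
        if not isinstance(other, Temperature):
            return NotImplemented
        return self.kelvin < other.kelvin


cold, warm = Temperature(77.0), Temperature(293.15)
print(cold < warm)                # True
print(cold == Temperature(77.0))  # True
print(min(warm, cold).kelvin)     # 77.0
```

Returning NotImplemented for unsupported operand types lets Python try the other operand before giving up.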

dunder method

A method with double-underscore prefix and suffix (e.g., __init__, __repr__). These methods implement Python's object protocols and are called implicitly by the interpreter.

Related: protocol

protocol

An informal interface defined by a set of dunder methods. For example, the "iterable protocol" requires only __iter__, while the "iterator protocol" requires both __iter__ and __next__. Objects that implement the right dunder methods are said to "satisfy" the protocol.

Related: dunder method
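
To make the idea of a protocol concrete, here is a sketch of the two iteration protocols side by side. An iterable needs only __iter__; the iterator it returns needs __iter__ and __next__. Both class names are hypothetical.

```python
class Countdown:
    """Hypothetical iterable: implements only __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator each time, so the object can be looped over repeatedly.
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator: implements __iter__ and __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [3, 2, 1] again, because each call to iter(c) starts over
```

Neither class inherits from a special base class. Satisfying the protocol is purely a matter of defining the right methods.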

Example: A Scientific Vector Class with Dunder Methods

Build a Vector class that supports +, * (scalar), @ (dot product), len(), indexing, and a clean repr. This mimics how NumPy arrays work.

Python Data Model Demo

A complete Vector class demonstrating all major dunder methods, including comparison operators, iteration, and the `@` operator.
# Code from: ch01/python/data_model_demo.py
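
The contents of data_model_demo.py are not reproduced here. As a stand-in, the following is a minimal sketch of what a Vector class with the features described above could look like. Every name and design choice in it is an assumption made for illustration, not the book's code.

```python
class Vector:
    """Illustrative sketch only; not the contents of data_model_demo.py."""

    def __init__(self, components):
        self._data = [float(x) for x in components]

    def __repr__(self):
        return f"Vector({self._data!r})"

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __iter__(self):
        return iter(self._data)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return self._data == other._data

    def __add__(self, other):
        # Element-wise addition of two vectors of equal length.
        if not isinstance(other, Vector):
            return NotImplemented
        if len(self) != len(other):
            raise ValueError("vectors must have the same length")
        return Vector(a + b for a, b in zip(self, other))

    def __mul__(self, scalar):
        # Scalar multiplication: v * 2.0
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(a * scalar for a in self)

    def __rmul__(self, scalar):
        # Reflected form, so that 2.0 * v also works.
        return self.__mul__(scalar)

    def __matmul__(self, other):
        # Dot product: v @ w
        if not isinstance(other, Vector):
            return NotImplemented
        if len(self) != len(other):
            raise ValueError("vectors must have the same length")
        return sum(a * b for a, b in zip(self, other))


v = Vector([1, 2, 3])
w = Vector([4, 5, 6])
print(v + w)         # Vector([5.0, 7.0, 9.0])
print(2 * v)         # Vector([2.0, 4.0, 6.0])
print(v @ w)         # 32.0
print(len(v), v[0])  # 3 1.0
```

Unlike a NumPy array, this sketch stores its components in a Python list and loops in pure Python, so it illustrates the interface rather than the performance.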

Historical Note: The @ Operator: PEP 465

2015

Before Python 3.5 (2015), there was no dedicated matrix multiplication operator. NumPy users had to write np.dot(A, B) or A.dot(B), which made chained operations like $(\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b}$ hard to read. PEP 465, championed by Nathaniel Smith, introduced @ specifically for scientific computing. Today A @ B is the standard way to write matrix multiplication in Python.

Common Mistake: Confusing repr and str

Mistake:

Implementing only __str__ and wondering why debugging is hard, or implementing __repr__ to return something that looks "pretty" but is ambiguous.

Correction:

__repr__ should return an unambiguous string that ideally could recreate the object: Vector([1.0, 2.0]). __str__ is for human-friendly display. If only one is implemented, implement __repr__: Python falls back to it when __str__ is missing, but not vice versa.
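
The one-way fallback can be seen in a short experiment. The two classes below are hypothetical and exist only to show which method each built-in ends up calling.

```python
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


class OnlyStr:
    def __str__(self):
        return "a friendly description"


a, b = OnlyRepr(), OnlyStr()

# str() falls back to __repr__ when __str__ is not defined.
print(str(a))   # OnlyRepr()
print(repr(a))  # OnlyRepr()

# repr() does not fall back to __str__; it uses the default inherited from object.
print(str(b))                   # a friendly description
print(repr(b).startswith("<"))  # True: the default form looks like <... OnlyStr object at 0x...>

# Containers display their elements with repr(), which is what you see while debugging.
print([a])  # [OnlyRepr()]
```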

Python Object Protocol Map

Explore how Python operators map to dunder methods. Select an operator category to see which methods are called and in what order (including fallback to reflected methods like __radd__).
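
The fallback to reflected methods can also be traced in plain code. In the sketch below, Meters is a hypothetical class. When the left operand's __add__ returns NotImplemented, Python tries the right operand's __radd__, and raises TypeError only if that fails too.

```python
class Meters:
    """Hypothetical length type used to trace operator dispatch."""

    def __init__(self, value):
        self.value = float(value)

    def __repr__(self):
        return f"Meters({self.value!r})"

    def __add__(self, other):
        # Tried first for: Meters + something
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # let Python try the other operand

    def __radd__(self, other):
        # Tried for: something + Meters, after something.__add__ returns NotImplemented
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented


print(Meters(1.5) + 2)  # Meters(3.5), via Meters.__add__
print(2 + Meters(1.5))  # Meters(3.5), via int.__add__ -> NotImplemented -> Meters.__radd__

try:
    Meters(1.0) + "two"
except TypeError as exc:
    # Meters.__add__ returns NotImplemented and str has no __radd__, so Python raises.
    print(type(exc).__name__)  # TypeError
```

Defining __radd__ is also what makes sum() work on a list of such objects, because sum() starts from the integer 0.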

Connection to NumPy and PyTorch

NumPy's ndarray implements over 50 dunder methods. When you write A + B with NumPy arrays, Python calls A.__add__(B), which dispatches to optimized C code for element-wise addition. PyTorch tensors follow the same pattern. Beyond the standard operators, the key interoperability protocols are:

  • __array__: allows np.array(obj) to convert custom objects
  • __array_ufunc__: lets custom classes intercept NumPy ufuncs
  • __cuda_array_interface__: enables CuPy interoperability

We will use these protocols in Chapters 3, 5, and 11.
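
As a small preview of the first protocol, the sketch below defines a hypothetical SensorLog class whose __array__ method lets NumPy convert it. It assumes NumPy is installed. The copy parameter is accepted because NumPy 2.x passes it, but this sketch ignores it and always builds a new array.

```python
import numpy as np


class SensorLog:
    """Hypothetical container that NumPy can convert via __array__."""

    def __init__(self, readings):
        self._readings = list(readings)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this from np.array(obj) and np.asarray(obj).
        # The copy argument is ignored here; a new array is always built.
        return np.array(self._readings, dtype=dtype)


log = SensorLog([1.0, 2.0, 3.0])
arr = np.asarray(log)
print(type(arr).__name__, arr.shape)  # ndarray (3,)
print(np.mean(log))                   # 2.0
```

Functions such as np.mean accept the object directly because they convert their argument to an array first.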

Quick Check

What does v @ w call when v is a custom class instance?

v.__matmul__(w)

v.__mul__(w)

v.dot(w)

matmul(v, w)

Key Takeaway

Everything in Python is an object, and most operators, built-in functions, and language constructs dispatch to dunder methods. Understanding this mechanism is the key to reading and writing code that integrates with NumPy, PyTorch, and other scientific libraries.