ndarray Internals

Why Understanding ndarray Internals Matters

NumPy arrays feel like Python lists at first glance, but their power comes from a radically different memory model. Understanding how data is laid out in memory, what strides are, and when you get a view vs a copy is the difference between code that runs in milliseconds and code that takes minutes.

This section peels back the abstraction to show you the raw machinery inside every np.ndarray.

Definition:

The ndarray Object

An np.ndarray is a fixed-size, homogeneous, n-dimensional container for numerical data. Every ndarray consists of:

  1. A data buffer: a contiguous block of raw bytes in memory
  2. A dtype: how to interpret each element (e.g., float64 = 8 bytes)
  3. A shape: a tuple of axis lengths, e.g., (3, 4) means 3 rows, 4 columns
  4. Strides: a tuple of byte steps to advance one position along each axis

import numpy as np
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]], dtype=np.float64)
print(a.shape)    # (2, 3)
print(a.dtype)    # float64
print(a.strides)  # (24, 8) - 24 bytes to the next row, 8 bytes to the next column
print(a.nbytes)   # 48 - total bytes = 2 * 3 * 8

Definition:

C-Contiguous vs Fortran-Contiguous

C-contiguous (row-major): elements in each row are stored consecutively. The last index changes fastest. This is NumPy's default.

Fortran-contiguous (column-major): elements in each column are stored consecutively. The first index changes fastest. Used by MATLAB and Fortran.

c = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='C')
f = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='F')

print(c.strides)  # (24, 8)  - row-major
print(f.strides)  # (8, 16)  - column-major

print(c.flags['C_CONTIGUOUS'])  # True
print(f.flags['F_CONTIGUOUS'])  # True

The memory layout affects cache performance: iterate along contiguous axes for speed.
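The two orders are two sides of the same coin: transposing a C-contiguous array reverses its strides, which yields an F-contiguous view of the same buffer. A small sketch (using int64 explicitly, since the default integer size is platform-dependent):

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='C')

# Transposing swaps shape and strides without touching the data buffer,
# so the transpose of a C-contiguous array is Fortran-contiguous.
t = c.T
print(t.shape)                  # (3, 2)
print(t.strides)                # (8, 24): c's strides, reversed
print(t.flags['F_CONTIGUOUS'])  # True
print(np.shares_memory(c, t))   # True: a view, no copy
```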

Definition:

Strides

The strides attribute is a tuple of integers telling NumPy how many bytes to step in memory to advance one position along each axis.

For a 2-D array with shape (m, n) and dtype float64 (8 bytes):

  • C-contiguous: strides = (n * 8, 8) - skip an entire row to go down, 8 bytes to go right
  • F-contiguous: strides = (8, m * 8) - skip 8 bytes to go down, an entire column to go right

Strides enable views without copying: slicing just changes the strides and the starting pointer.

a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.strides)       # (32, 8)
print(a[::2].strides)     # (64, 8) - every other row: row stride doubled
print(a[:, ::2].strides)  # (32, 16) - every other column: column stride doubled
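Strides are flexible enough to describe even overlapping windows over one buffer. A sketch using numpy.lib.stride_tricks.sliding_window_view (available since NumPy 1.20):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(6, dtype=np.float64)  # strides: (8,)

# Each length-3 window starts 8 bytes after the previous one, and elements
# within a window are also 8 bytes apart: two axes, one shared buffer.
w = sliding_window_view(a, 3)
print(w.shape)                 # (4, 3)
print(w.strides)               # (8, 8)
print(np.shares_memory(a, w))  # True: zero-copy
```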

Definition:

View vs Copy

A view shares the same data buffer as the original array. Modifying the view modifies the original. A copy allocates a new data buffer.

Rules of thumb:

Operation                                  Result
Basic slicing: a[1:3], a[::2]              View
Transpose: a.T                             View
Reshape (when possible): a.reshape(...)    View
Fancy indexing: a[[0, 2, 4]]               Copy
Boolean indexing: a[mask]                  Copy
Explicit copy: a.copy()                    Copy

Use np.shares_memory(a, b) to check at runtime:

a = np.arange(10)
b = a[::2]           # view
c = a[[0, 2, 4]]     # copy

print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False

Definition:

Data Types (dtype)

Every ndarray element has the same dtype. Common scientific dtypes:

dtype       Size  Range / Precision
float32     4 B   ~7 decimal digits, up to ±3.4 × 10^38
float64     8 B   ~16 decimal digits, up to ±1.8 × 10^308
complex64   8 B   two float32 (real + imag)
complex128  16 B  two float64 (real + imag)
int32       4 B   -2^31 to 2^31 - 1
int64       8 B   -2^63 to 2^63 - 1
bool_       1 B   True / False

a = np.array([1.0, 2.0, 3.0])   # default float64
b = a.astype(np.float32)        # downcast: saves memory
c = np.array([1+2j, 3+4j])      # complex128
print(c.real.dtype)             # float64

Rule: use float64 unless memory or GPU constraints force float32.
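To make the precision gap concrete, here is a small demo: adding 1 to 10^8 survives in float64 but falls below float32's ~7-digit resolution and is rounded away.

```python
import numpy as np

# float32 keeps ~7 significant decimal digits, so the +1 is lost.
x64 = np.float64(1e8) + np.float64(1.0)
x32 = np.float32(1e8) + np.float32(1.0)

print(x64 - 1e8)  # 1.0: float64 has digits to spare
print(x32 - 1e8)  # 0.0: the increment vanished in float32
```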

Theorem: Stride Computation Formula

For a C-contiguous array with shape (d_0, d_1, ..., d_{n-1}) and element size s bytes, the stride along axis k is:

s_k = s * d_{k+1} * d_{k+2} * ... * d_{n-1}

In particular, the last axis always has stride s, and the first axis has the largest stride.

To move one step along axis k, you must skip over all elements in the remaining axes - that is, d_{k+1} * d_{k+2} * ... * d_{n-1} elements, each of size s bytes.
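The formula is easy to check against NumPy itself; c_strides below is a throwaway helper written for this sketch, not a NumPy API:

```python
import numpy as np

def c_strides(shape, itemsize):
    """Stride along axis k: itemsize times the product of the trailing dims."""
    strides = []
    for k in range(len(shape)):
        step = itemsize
        for d in shape[k + 1:]:
            step *= d
        strides.append(step)
    return tuple(strides)

a = np.zeros((2, 3, 4), dtype=np.float64)
print(c_strides(a.shape, a.itemsize))  # (96, 32, 8)
print(a.strides)                       # (96, 32, 8): matches the formula
```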

Theorem: View vs Copy Rule

A NumPy operation produces a view if and only if the result can be described by the same data buffer with a different combination of (offset, shape, strides). Otherwise, it must produce a copy.

Basic slicing with start:stop:step can always be expressed as new strides (multiply by step) and new shape (number of selected elements). Fancy indexing with an arbitrary list of indices cannot be expressed as a regular stride pattern, so NumPy must allocate new memory.
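You can watch this rule in action through the base attribute and the OWNDATA flag (a quick sketch, with int64 fixed for deterministic strides):

```python
import numpy as np

a = np.arange(10, dtype=np.int64)

# Basic slicing: same buffer, new offset/shape/strides, hence a view.
b = a[1:8:2]
print(b.base is a)         # True: b borrows a's buffer
print(b.strides)           # (16,): step 2 doubles the 8-byte stride

# An arbitrary index list fits no single stride pattern, so NumPy copies.
c = a[[0, 2, 3]]
print(c.flags['OWNDATA'])  # True: c has its own freshly allocated buffer
```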

Theorem: Row-Major Iteration is Cache-Friendly

For a C-contiguous array, iterating over the last axis in the innermost loop accesses memory sequentially and maximizes CPU cache utilization. Putting the first axis in the innermost loop jumps a full row stride on every access and causes cache misses.

Modern CPUs load data in cache lines (typically 64 bytes = 8 float64s). Sequential access loads a full cache line and uses every element. Strided access may load a cache line but use only one element before evicting it.

Example: Strides Under Slicing

Given a = np.arange(24, dtype=np.float64).reshape(4, 6), compute the strides and shape of b = a[::2, 1::3] without running the code.
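After working it out on paper, you can confirm your answer by running the code:

```python
import numpy as np

a = np.arange(24, dtype=np.float64).reshape(4, 6)
print(a.strides)   # (48, 8)

# Step 2 over rows doubles the row stride; step 3 over columns triples
# the column stride; rows (0, 2) and columns (1, 4) remain.
b = a[::2, 1::3]
print(b.shape)     # (2, 2)
print(b.strides)   # (96, 24)
```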

Example: Detecting Views with np.shares_memory

Determine which of the following operations produce views and which produce copies for a = np.arange(10):

  1. b = a[2:7]
  2. c = a[[1, 3, 5]]
  3. d = a.reshape(2, 5)
  4. e = a[a > 3]
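A runnable check for all four cases:

```python
import numpy as np

a = np.arange(10)

checks = [
    ('b = a[2:7]', a[2:7]),            # basic slice
    ('c = a[[1, 3, 5]]', a[[1, 3, 5]]),  # fancy indexing
    ('d = a.reshape(2, 5)', a.reshape(2, 5)),  # reshape of contiguous data
    ('e = a[a > 3]', a[a > 3]),        # boolean mask
]
for label, arr in checks:
    kind = 'view' if np.shares_memory(a, arr) else 'copy'
    print(f"{label}: {kind}")
# Slicing and reshape give views; fancy and boolean indexing give copies.
```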

Example: C vs Fortran Order Performance

Create a large matrix in both C and Fortran order. Sum along rows (axis 1) and along columns (axis 0). Which is faster for each order?
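One way to run the experiment (the matrix size here is arbitrary, and absolute timings are machine-dependent):

```python
import timeit

import numpy as np

c = np.random.default_rng(0).random((2000, 2000))  # C order
f = np.asfortranarray(c)                           # same values, F order

for name, arr in [('C', c), ('F', f)]:
    rows = timeit.timeit(lambda: arr.sum(axis=1), number=20)
    cols = timeit.timeit(lambda: arr.sum(axis=0), number=20)
    print(f"{name}-order: sum rows {rows:.3f}s, sum cols {cols:.3f}s")
# Expectation: summing along the contiguous axis (rows for C order,
# columns for F order) streams memory sequentially and tends to be faster.
```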

Memory Layout Explorer

Visualize how ndarray elements are stored in memory for different shapes, dtypes, and memory orders. See strides, contiguity flags, and the raw byte layout.

ndarray Memory Layout
C-contiguous (row-major) vs Fortran-contiguous (column-major) memory layout for a 3x4 array. Colors indicate which elements are stored consecutively in memory.

Historical Note: From Numeric to NumPy

2005

NumPy traces its lineage to Numeric (1995), one of the first array libraries for Python, created by Jim Hugunin. A competing library, Numarray, offered better support for large arrays. In 2005, Travis Oliphant unified both into NumPy, which became the foundation of the entire scientific Python ecosystem. The C-contiguous default was inherited from Numeric's C implementation.

Historical Note: Row-Major vs Column-Major: A Language War

1950s-1970s

C and its descendants (C++, Python/NumPy) use row-major order. Fortran, MATLAB, R, and Julia use column-major order. This split dates back to the 1950s: Fortran's array layout was optimized for the IBM 704's memory architecture. When Dennis Ritchie designed C in the 1970s, he chose the opposite convention for consistency with pointer arithmetic and multi-dimensional array decay.

Quick Check

For a C-contiguous float64 array with shape (3, 5), what are the strides in bytes?

(40, 8)

(8, 40)

(24, 8)

(8, 8)

Common Mistake: Accidental Mutation Through Views

Mistake:

Modifying a slice without realizing it is a view, which silently changes the original array:

a = np.arange(10)
b = a[3:7]
b[:] = 0       # a is now [0, 1, 2, 0, 0, 0, 0, 7, 8, 9]!

Correction:

Use .copy() explicitly when you need an independent array:

b = a[3:7].copy()
b[:] = 0       # a is unchanged

Common Mistake: Assuming Fancy Indexing Creates a View

Mistake:

Assigning through fancy indexing and expecting the original to update in a view-like manner:

a = np.arange(10)
b = a[[1, 3, 5]]  # COPY, not a view
b[0] = 99         # a is unchanged!

Correction:

To modify elements via fancy indexing, assign directly to the original:

a[[1, 3, 5]] = [99, 99, 99]   # modifies a directly

ndarray

NumPy's core data structure: a fixed-size, homogeneous, n-dimensional array with a contiguous memory buffer.

Related: Data Types (dtype), Strides, Shape

stride

The number of bytes to step in memory to advance one position along a given axis of an ndarray.

Related: view, Contiguity and Stride Formula

view

An ndarray that shares its data buffer with another array. Modifications to the view are reflected in the original.

Related: Zero-Copy NumPy <-> PyTorch Conversion, Shares Memory

contiguous

An array whose elements are stored in a single unbroken block of memory, without gaps or out-of-order elements.

Related: C Contiguous, Fortran Contiguous

ndarray Internals

Hands-on exploration of memory layout, strides, views vs copies, and np.shares_memory.
Code from: ch05/python/ndarray_internals.py

Why This Matters: GPU Arrays Use the Same Memory Model

CuPy and PyTorch tensors adopt the same concepts: contiguous memory buffers, strides, dtypes, and views. Understanding ndarray internals transfers directly to GPU programming. In Chapter 12, you will use tensor.is_contiguous() and tensor.stride(); the same mental model applies.

See full treatment in Chapter 12

Key Takeaway

Basic slicing (a[::2], a[1:5]) creates views; fancy indexing (a[[0,2,4]]) and boolean indexing (a[mask]) create copies. Always use np.shares_memory() when in doubt.