Indexing and Slicing — Advanced

Beyond Basic Slicing

Basic indexing (a[1:5], a[:, 0]) covers 80% of use cases. But scientific computing demands more: selecting scattered elements, masking invalid data, building open meshes for multi-dimensional lookups, and expressing tensor contractions concisely.

This section covers the advanced indexing tools that make NumPy a language for array manipulation, not just a container.

Definition:
Basic Indexing (Produces Views)

Basic indexing uses integers, slices (start:stop:step), or np.newaxis / ... (Ellipsis). It always returns a view.

a = np.arange(20).reshape(4, 5)
a[1, 3]          # scalar, integer indexing
a[1:3]           # rows 1-2, view
a[:, ::2]        # every other column, view
a[..., -1]       # last column, Ellipsis expands remaining axes
a[np.newaxis, :] # shape (1, 4, 5), adds axis

Key property: basic indexing never copies data.

Definition:
Fancy (Advanced) Indexing (Produces Copies)

Fancy indexing uses arrays of integers or booleans as indices. It always returns a copy.

a = np.arange(20).reshape(4, 5)

# Integer array indexing — select specific rows
rows = np.array([0, 2, 3])
print(a[rows])            # shape (3, 5), COPY

# Multi-dimensional fancy indexing
row_idx = np.array([0, 1, 3])
col_idx = np.array([2, 4, 0])
print(a[row_idx, col_idx])  # elements (0,2), (1,4), (3,0)

Fancy indexing can select arbitrary subsets, but beware: the result is always a new array.

Definition:
Boolean Indexing (Masking)

Boolean indexing selects elements where a boolean array is True. It always returns a copy (a 1-D array of selected elements).

a = np.array([1.0, -2.5, 3.7, -0.1, 4.2])
mask = a > 0
print(a[mask])           # [1.0, 3.7, 4.2]

# Common pattern: replace negative values with zero
a[a < 0] = 0.0           # modifies a in place via boolean assignment

Boolean indexing is the NumPy equivalent of SQL's WHERE clause.

Definition:
np.ix_ — Open Mesh Indexing

np.ix_ creates an open mesh from multiple 1-D index arrays, enabling selection of rectangular sub-arrays via fancy indexing:

a = np.arange(20).reshape(4, 5)
rows = [0, 2, 3]
cols = [1, 4]

# Without np.ix_: gets 3 elements (paired indexing)
# a[rows, cols]  -> ERROR: shape mismatch

# With np.ix_: gets 3x2 sub-matrix
sub = a[np.ix_(rows, cols)]
print(sub.shape)  # (3, 2)
# sub = [[1, 4], [11, 14], [16, 19]]

np.ix_ reshapes each index array to broadcast correctly: rows becomes shape (3, 1), cols becomes shape (1, 2).

Definition:
np.einsum — Einstein Summation

np.einsum expresses tensor operations using Einstein summation notation. Repeated indices are summed over; free indices define the output shape.

A = np.random.randn(3, 4)
B = np.random.randn(4, 5)

# Matrix multiply: C_ik = sum_j A_ij B_jk
C = np.einsum('ij,jk->ik', A, B)

# Trace: sum of diagonal
tr = np.einsum('ii->', np.eye(4))   # 4.0

# Batch matrix multiply: (batch, m, k) @ (batch, k, n) -> (batch, m, n)
X = np.random.randn(10, 3, 4)
Y = np.random.randn(10, 4, 5)
Z = np.einsum('bij,bjk->bik', X, Y)  # shape (10, 3, 5)

# Outer product
u = np.array([1, 2, 3])
v = np.array([4, 5])
outer = np.einsum('i,j->ij', u, v)  # shape (3, 2)

Theorem: Fancy Indexing Shape Rule

When indexing an array a with integer arrays idx0, idx1, ..., all index arrays must be broadcastable to a common shape, and the output shape is that common broadcast shape (plus any non-indexed axes from a).

Think of fancy indexing as a mapping: for each position in the output, the index arrays tell NumPy which element of a to fetch. The index arrays are broadcast together to define the mapping grid.

Theorem: Einstein Summation Rules

In np.einsum('subscripts', *operands):

Each operand gets a subscript string of single-letter indices
Indices appearing in two or more operands but not in the output are summed over (contracted)
Indices appearing in exactly one operand and in the output are free indices (define output axes)
The output shape is determined by the free indices in the order they appear in the output subscript

Repeated index = contraction (generalized dot product). Free index = that axis survives in the output. This notation can express any linear algebra operation.

Example: Masking Invalid Data

Given a signal array with some NaN values, compute the mean of valid samples using boolean indexing.

Solution

Create masked computation

signal = np.array([1.2, np.nan, 3.4, np.nan, 5.6, 7.8])
valid_mask = ~np.isnan(signal)
valid_mean = signal[valid_mask].mean()
print(f"Mean of valid samples: {valid_mean:.2f}")  # 4.50

# Equivalent using np.nanmean:
print(f"np.nanmean: {np.nanmean(signal):.2f}")     # 4.50

Example: Extracting Sub-matrices with np.ix_

From a 6x6 correlation matrix, extract the 3x3 sub-matrix corresponding to variables 0, 2, and 5.

Solution

Use np.ix_ for rectangular selection

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 6))
corr = np.corrcoef(X.T)   # 6x6 correlation matrix

indices = [0, 2, 5]
sub_corr = corr[np.ix_(indices, indices)]
print(sub_corr.shape)     # (3, 3)
print(sub_corr)
# A valid correlation sub-matrix (symmetric, ones on diagonal)

Example: Pairwise Distances with einsum

Compute the pairwise Euclidean distance matrix for $n$ points in $d$ dimensions using np.einsum.

Solution

Expand the squared distance formula

The squared distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$ is:

$\|\mathbf{x}_i - \mathbf{x}_j\|^2 = \|\mathbf{x}_i\|^2 + \|\mathbf{x}_j\|^2 - 2 \mathbf{x}_i \cdot \mathbf{x}_j$

Implementation

X = rng.standard_normal((100, 3))  # 100 points in 3-D

# Squared norms using einsum
sq_norms = np.einsum('ij,ij->i', X, X)  # shape (100,)

# Cross terms
cross = np.einsum('ij,kj->ik', X, X)    # shape (100, 100)

# Pairwise squared distances
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * cross
D = np.sqrt(np.maximum(D2, 0))  # clamp for numerical safety
print(D.shape)  # (100, 100)

Einsum Operation Explorer

Visualize different einsum operations: see the input shapes, subscript notation, contracted indices, and output shape.

Parameters

Quick Check

What does a[np.array([True, False, True, False])] return for a = np.array([10, 20, 30, 40])?

np.array([10, 30])

np.array([20, 40])

np.array([10, 20, 30, 40])

A view of a

Correction:

np.array([10, 30])

Boolean indexing selects elements where True.

Quick Check

What does np.einsum('ij,ij->', A, B) compute for matrices A and B of the same shape?

The sum of element-wise products (Frobenius inner product)

Matrix multiplication A @ B

Element-wise product (Hadamard)

The trace of A @ B

Correction:

The sum of element-wise products (Frobenius inner product)

Repeated indices i,j are summed; no free indices means scalar output.

Common Mistake: Fancy Index Shape Mismatch

Mistake:

Using two index arrays of different shapes expecting paired indexing:

a = np.arange(20).reshape(4, 5)
rows = [0, 2, 3]
cols = [1, 4]
a[rows, cols]  # ValueError: shape mismatch

Correction:

Use np.ix_ for rectangular sub-array selection, or ensure index arrays have the same length for paired element selection:

a[np.ix_(rows, cols)]  # 3x2 sub-array

fancy indexing

Indexing an array with integer or boolean arrays. Always produces a copy.

einsum

Einstein summation: a compact notation for expressing tensor contractions, dot products, outer products, and other linear algebra operations.

Advanced Indexing

python

Fancy indexing, boolean masking, np.ix_, multi-dimensional indexing patterns.

# Code from: ch05/python/advanced_indexing.py
# Load from backend supplements endpoint

Einsum Cookbook

python

np.einsum recipes: matrix multiply, trace, batch operations, outer products.

# Code from: ch05/python/einsum_cookbook.py
# Load from backend supplements endpoint

Key Takeaway

Basic slicing returns views; fancy and boolean indexing return copies. Use np.ix_ for rectangular sub-array selection. np.einsum can express virtually any linear algebra operation in a single line.

ndarray Internals Broadcasting

Indexing and Slicing — Advanced

Beyond Basic Slicing

Definition: Basic Indexing (Produces Views)

Definition: Fancy (Advanced) Indexing (Produces Copies)

Definition: Boolean Indexing (Masking)

Definition: np.ix_ — Open Mesh Indexing

Definition: np.einsum — Einstein Summation

Theorem: Fancy Indexing Shape Rule

Theorem: Einstein Summation Rules

Example: Masking Invalid Data

Create masked computation

Example: Extracting Sub-matrices with np.ix_

Use np.ix_ for rectangular selection

Example: Pairwise Distances with einsum

Expand the squared distance formula

Implementation

Einsum Operation Explorer

Parameters

Quick Check

Quick Check

Common Mistake: Fancy Index Shape Mismatch

fancy indexing

einsum

Advanced Indexing

Einsum Cookbook

Key Takeaway

Definition:
Basic Indexing (Produces Views)

Definition:
Fancy (Advanced) Indexing (Produces Copies)

Definition:
Boolean Indexing (Masking)

Definition:
np.ix_ — Open Mesh Indexing

Definition:
np.einsum — Einstein Summation