Indexing and Slicing β Advanced
Beyond Basic Slicing
Basic indexing (a[1:5], a[:, 0]) covers 80% of use cases. But
scientific computing demands more: selecting scattered elements,
masking invalid data, building open meshes for multi-dimensional
lookups, and expressing tensor contractions concisely.
This section covers the advanced indexing tools that make NumPy a language for array manipulation, not just a container.
Definition: Basic Indexing (Produces Views)
Basic Indexing (Produces Views)
Basic indexing uses integers, slices (start:stop:step), or
np.newaxis / ... (Ellipsis). It always returns a view.
a = np.arange(20).reshape(4, 5)
a[1, 3] # scalar, integer indexing
a[1:3] # rows 1-2, view
a[:, ::2] # every other column, view
a[..., -1] # last column, Ellipsis expands remaining axes
a[np.newaxis, :] # shape (1, 4, 5), adds axis
Key property: basic indexing never copies data.
Definition: Fancy (Advanced) Indexing (Produces Copies)
Fancy (Advanced) Indexing (Produces Copies)
Fancy indexing uses arrays of integers or booleans as indices. It always returns a copy.
a = np.arange(20).reshape(4, 5)
# Integer array indexing β select specific rows
rows = np.array([0, 2, 3])
print(a[rows]) # shape (3, 5), COPY
# Multi-dimensional fancy indexing
row_idx = np.array([0, 1, 3])
col_idx = np.array([2, 4, 0])
print(a[row_idx, col_idx]) # elements (0,2), (1,4), (3,0)
Fancy indexing can select arbitrary subsets, but beware: the result is always a new array.
Definition: Boolean Indexing (Masking)
Boolean Indexing (Masking)
Boolean indexing selects elements where a boolean array is True.
It always returns a copy (a 1-D array of selected elements).
a = np.array([1.0, -2.5, 3.7, -0.1, 4.2])
mask = a > 0
print(a[mask]) # [1.0, 3.7, 4.2]
# Common pattern: replace negative values with zero
a[a < 0] = 0.0 # modifies a in place via boolean assignment
Boolean indexing is the NumPy equivalent of SQL's WHERE clause.
Definition: np.ix_ β Open Mesh Indexing
np.ix_ β Open Mesh Indexing
np.ix_ creates an open mesh from multiple 1-D index arrays,
enabling selection of rectangular sub-arrays via fancy indexing:
a = np.arange(20).reshape(4, 5)
rows = [0, 2, 3]
cols = [1, 4]
# Without np.ix_: gets 3 elements (paired indexing)
# a[rows, cols] -> ERROR: shape mismatch
# With np.ix_: gets 3x2 sub-matrix
sub = a[np.ix_(rows, cols)]
print(sub.shape) # (3, 2)
# sub = [[1, 4], [11, 14], [16, 19]]
np.ix_ reshapes each index array to broadcast correctly:
rows becomes shape (3, 1), cols becomes shape (1, 2).
Definition: np.einsum β Einstein Summation
np.einsum β Einstein Summation
np.einsum expresses tensor operations using Einstein summation
notation. Repeated indices are summed over; free indices define
the output shape.
A = np.random.randn(3, 4)
B = np.random.randn(4, 5)
# Matrix multiply: C_ik = sum_j A_ij B_jk
C = np.einsum('ij,jk->ik', A, B)
# Trace: sum of diagonal
tr = np.einsum('ii->', np.eye(4)) # 4.0
# Batch matrix multiply: (batch, m, k) @ (batch, k, n) -> (batch, m, n)
X = np.random.randn(10, 3, 4)
Y = np.random.randn(10, 4, 5)
Z = np.einsum('bij,bjk->bik', X, Y) # shape (10, 3, 5)
# Outer product
u = np.array([1, 2, 3])
v = np.array([4, 5])
outer = np.einsum('i,j->ij', u, v) # shape (3, 2)
Theorem: Fancy Indexing Shape Rule
When indexing an array a with integer arrays idx0, idx1, ...,
all index arrays must be broadcastable to a common shape, and
the output shape is that common broadcast shape (plus any non-indexed
axes from a).
Think of fancy indexing as a mapping: for each position in the output,
the index arrays tell NumPy which element of a to fetch. The index
arrays are broadcast together to define the mapping grid.
Theorem: Einstein Summation Rules
In np.einsum('subscripts', *operands):
- Each operand gets a subscript string of single-letter indices
- Indices appearing in two or more operands but not in the output are summed over (contracted)
- Indices appearing in exactly one operand and in the output are free indices (define output axes)
- The output shape is determined by the free indices in the order they appear in the output subscript
Repeated index = contraction (generalized dot product). Free index = that axis survives in the output. This notation can express any linear algebra operation.
Example: Masking Invalid Data
Given a signal array with some NaN values, compute the mean of
valid samples using boolean indexing.
Create masked computation
signal = np.array([1.2, np.nan, 3.4, np.nan, 5.6, 7.8])
valid_mask = ~np.isnan(signal)
valid_mean = signal[valid_mask].mean()
print(f"Mean of valid samples: {valid_mean:.2f}") # 4.50
# Equivalent using np.nanmean:
print(f"np.nanmean: {np.nanmean(signal):.2f}") # 4.50
Example: Extracting Sub-matrices with np.ix_
From a 6x6 correlation matrix, extract the 3x3 sub-matrix corresponding to variables 0, 2, and 5.
Use np.ix_ for rectangular selection
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 6))
corr = np.corrcoef(X.T) # 6x6 correlation matrix
indices = [0, 2, 5]
sub_corr = corr[np.ix_(indices, indices)]
print(sub_corr.shape) # (3, 3)
print(sub_corr)
# A valid correlation sub-matrix (symmetric, ones on diagonal)
Example: Pairwise Distances with einsum
Compute the pairwise Euclidean distance matrix for points in
dimensions using np.einsum.
Expand the squared distance formula
The squared distance between points and is:
Implementation
X = rng.standard_normal((100, 3)) # 100 points in 3-D
# Squared norms using einsum
sq_norms = np.einsum('ij,ij->i', X, X) # shape (100,)
# Cross terms
cross = np.einsum('ij,kj->ik', X, X) # shape (100, 100)
# Pairwise squared distances
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * cross
D = np.sqrt(np.maximum(D2, 0)) # clamp for numerical safety
print(D.shape) # (100, 100)
Einsum Operation Explorer
Visualize different einsum operations: see the input shapes, subscript notation, contracted indices, and output shape.
Parameters
Quick Check
What does a[np.array([True, False, True, False])] return for
a = np.array([10, 20, 30, 40])?
np.array([10, 30])
np.array([20, 40])
np.array([10, 20, 30, 40])
A view of a
np.array([10, 30])Boolean indexing selects elements where True.
Quick Check
What does np.einsum('ij,ij->', A, B) compute for matrices A and B
of the same shape?
The sum of element-wise products (Frobenius inner product)
Matrix multiplication A @ B
Element-wise product (Hadamard)
The trace of A @ B
Repeated indices i,j are summed; no free indices means scalar output.
Common Mistake: Fancy Index Shape Mismatch
Mistake:
Using two index arrays of different shapes expecting paired indexing:
a = np.arange(20).reshape(4, 5)
rows = [0, 2, 3]
cols = [1, 4]
a[rows, cols] # ValueError: shape mismatch
Correction:
Use np.ix_ for rectangular sub-array selection, or ensure index
arrays have the same length for paired element selection:
a[np.ix_(rows, cols)] # 3x2 sub-array
fancy indexing
Indexing an array with integer or boolean arrays. Always produces a copy.
Related: basic indexing, boolean indexing
einsum
Einstein summation: a compact notation for expressing tensor contractions, dot products, outer products, and other linear algebra operations.
Related: Broadcasting, tensor contraction
Advanced Indexing
# Code from: ch05/python/advanced_indexing.py
# Load from backend supplements endpointEinsum Cookbook
# Code from: ch05/python/einsum_cookbook.py
# Load from backend supplements endpointKey Takeaway
Basic slicing returns views; fancy and boolean indexing return copies.
Use np.ix_ for rectangular sub-array selection. np.einsum can
express virtually any linear algebra operation in a single line.