Exercises
ex-sp-ch05-01
Easy: Create a 5x4 array of float64 zeros. Print its shape, dtype, strides, and total number of bytes. Verify it is C-contiguous.
Use np.zeros((5, 4)) and inspect .flags["C_CONTIGUOUS"].
Solution
import numpy as np
a = np.zeros((5, 4), dtype=np.float64)
print(f"shape: {a.shape}") # (5, 4)
print(f"dtype: {a.dtype}") # float64
print(f"strides: {a.strides}") # (32, 8)
print(f"nbytes: {a.nbytes}") # 160
print(f"C-contig: {a.flags['C_CONTIGUOUS']}") # True
ex-sp-ch05-02
Easy: Given a = np.arange(20).reshape(4, 5), use boolean indexing to
extract all elements greater than 10 and less than 18.
Combine conditions with & and parentheses.
Solution
a = np.arange(20).reshape(4, 5)
mask = (a > 10) & (a < 18)
result = a[mask]
print(result) # [11 12 13 14 15 16 17]
ex-sp-ch05-03
Easy: Add a bias vector b = np.array([1, 2, 3]) to every row of a
matrix X = np.ones((5, 3)) using broadcasting (no loops).
Broadcasting aligns shapes from the right: (5, 3) + (3,) works directly.
Solution
X = np.ones((5, 3))
b = np.array([1, 2, 3])
result = X + b # shape (5, 3)
print(result)
# [[2. 3. 4.]
# [2. 3. 4.]
# ...]
ex-sp-ch05-04
Easy: Replace a Python loop that computes the ReLU function
(max(0, x) for each element) with a single vectorized call.
Use np.maximum(x, 0) or np.where(x > 0, x, 0).
Solution
x = np.array([-2, -1, 0, 1, 2, 3], dtype=np.float64)
relu = np.maximum(x, 0)
print(relu) # [0. 0. 0. 1. 2. 3.]
ex-sp-ch05-05
Easy: Generate 1000 samples from a Gaussian distribution with mean 5 and standard deviation 2 using the modern Generator API. Compute and print the sample mean and standard deviation.
Use np.random.default_rng(42).normal(5, 2, 1000).
Solution
rng = np.random.default_rng(42)
samples = rng.normal(loc=5.0, scale=2.0, size=1000)
print(f"mean: {samples.mean():.3f}") # ~5.0
print(f"std: {samples.std():.3f}") # ~2.0
ex-sp-ch05-06
Medium: For a = np.arange(24).reshape(2, 3, 4), compute the strides by hand
for both C-order and Fortran-order. Verify with NumPy.
C-order stride for axis k: product of all subsequent dimensions times element size.
F-order stride for axis k: product of all preceding dimensions times element size.
C-order strides
Element size: 8 bytes (int64 on most systems).
- Axis 0: 3 × 4 × 8 = 96 bytes
- Axis 1: 4 × 8 = 32 bytes
- Axis 2: 8 bytes
Strides: (96, 32, 8).
F-order strides
- Axis 0: 8 bytes
- Axis 1: 2 × 8 = 16 bytes
- Axis 2: 2 × 3 × 8 = 48 bytes
Strides: (8, 16, 48).
Verification
a_c = np.arange(24).reshape(2, 3, 4)
a_f = np.asfortranarray(a_c)
print(a_c.strides) # (96, 32, 8)
print(a_f.strides) # (8, 16, 48)
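The stride formulas can also be evaluated directly from the shape; a small sketch, assuming 8-byte (int64) elements:

```python
import numpy as np

shape = (2, 3, 4)
itemsize = 8  # bytes per int64 element

# C-order: stride for axis k = product of all later dims * itemsize
c_strides = tuple(int(np.prod(shape[k + 1:])) * itemsize for k in range(3))
# F-order: stride for axis k = product of all earlier dims * itemsize
f_strides = tuple(int(np.prod(shape[:k])) * itemsize for k in range(3))

print(c_strides)  # (96, 32, 8)
print(f_strides)  # (8, 16, 48)
```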
ex-sp-ch05-07
Medium: Use np.ix_ to extract a 3x3 sub-matrix from a 6x6 matrix,
selecting rows [0, 2, 5] and columns [1, 3, 4]. Verify the result
is a copy.
np.ix_([0,2,5], [1,3,4]) creates the open mesh.
Solution
a = np.arange(36).reshape(6, 6)
sub = a[np.ix_([0, 2, 5], [1, 3, 4])]
print(sub.shape) # (3, 3)
print(sub)
# [[ 1  3  4]
#  [13 15 16]
#  [31 33 34]]
print(np.shares_memory(a, sub)) # False — copy
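For contrast, basic slicing of the same matrix returns a view rather than a copy, while integer (fancy) indexing always copies; a quick illustrative check:

```python
import numpy as np

a = np.arange(36).reshape(6, 6)
view = a[0:3, 1:4]       # basic slicing: a view into a's buffer
fancy = a[[0, 2, 5]]     # integer (fancy) indexing: always a copy
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, fancy))  # False
```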
ex-sp-ch05-08
Medium: Compute the outer product of two vectors u = [1, 2, 3] and
v = [4, 5, 6, 7] using: (a) broadcasting, (b) np.outer,
and (c) np.einsum. Verify all three give the same result.
Broadcasting: u[:, None] * v[None, :].
einsum: np.einsum("i,j->ij", u, v).
Three approaches
u = np.array([1, 2, 3])
v = np.array([4, 5, 6, 7])
r1 = u[:, None] * v[None, :] # broadcasting
r2 = np.outer(u, v) # explicit
r3 = np.einsum('i,j->ij', u, v) # einsum
assert np.array_equal(r1, r2)
assert np.array_equal(r1, r3)
print(r1.shape) # (3, 4)
ex-sp-ch05-09
Medium: Replace the following loop with a vectorized version using np.select:
result = np.empty(len(x))
for i in range(len(x)):
    if x[i] < -1:
        result[i] = -1
    elif x[i] > 1:
        result[i] = 1
    else:
        result[i] = x[i] ** 3
Define three conditions and three choices for np.select.
Vectorized solution
x = np.linspace(-3, 3, 1000)
conditions = [x < -1, x > 1]
choices = [-1, 1]
result = np.select(conditions, choices, default=x ** 3)
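To confirm the vectorized version reproduces the loop exactly, the two can be run side by side; an illustrative check:

```python
import numpy as np

x = np.linspace(-3, 3, 1000)

# Original loop
expected = np.empty(len(x))
for i in range(len(x)):
    if x[i] < -1:
        expected[i] = -1
    elif x[i] > 1:
        expected[i] = 1
    else:
        expected[i] = x[i] ** 3

# Vectorized version
result = np.select([x < -1, x > 1], [-1, 1], default=x ** 3)
assert np.array_equal(result, expected)
```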
ex-sp-ch05-10
Medium: Create a structured array with fields (id: int32, x: float64, y: float64, label: U5) and populate it with 10 random points. Sort by the x field.
Use np.dtype with a list of (name, type) tuples.
Solution
dt = np.dtype([('id', np.int32), ('x', np.float64),
               ('y', np.float64), ('label', 'U5')])
rng = np.random.default_rng(42)
points = np.zeros(10, dtype=dt)
points['id'] = np.arange(10)
points['x'] = rng.uniform(-5, 5, 10)
points['y'] = rng.uniform(-5, 5, 10)
points['label'] = [f"P{i}" for i in range(10)]
sorted_pts = np.sort(points, order='x')
print(sorted_pts[['id', 'x']])
ex-sp-ch05-11
Medium: Generate 4 independent streams of 1000 Gaussian samples using
SeedSequence.spawn(). Verify that the streams are uncorrelated
by computing pairwise correlation coefficients.
Use np.corrcoef on the stacked streams.
Solution
ss = np.random.SeedSequence(42)
rngs = [np.random.default_rng(s) for s in ss.spawn(4)]
streams = np.array([rng.standard_normal(1000) for rng in rngs])
corr = np.corrcoef(streams)
print(corr)
# Diagonal is 1.0; off-diagonal values should be close to 0
off_diag = corr[~np.eye(4, dtype=bool)]
print(f"Max off-diagonal correlation: {np.abs(off_diag).max():.4f}")
ex-sp-ch05-12
Hard: Implement a function sliding_window_view(a, window_size) that returns
a 2-D array where row i contains a[i:i+window_size], using only
stride manipulation (no loops, no copying).
Hint: np.lib.stride_tricks.as_strided can create a view with custom
strides.
The output shape is (len(a) - window_size + 1, window_size).
The strides should be (a.strides[0], a.strides[0]) — same stride along both axes.
Stride trick implementation
from numpy.lib.stride_tricks import as_strided
def sliding_window_view(a, window_size):
    shape = (len(a) - window_size + 1, window_size)
    strides = (a.strides[0], a.strides[0])
    return as_strided(a, shape=shape, strides=strides)
# Or use the built-in (NumPy >= 1.20):
# np.lib.stride_tricks.sliding_window_view(a, window_size)
a = np.arange(10)
windows = sliding_window_view(a, 3)
print(windows)
# [[0, 1, 2], [1, 2, 3], ..., [7, 8, 9]]
print(np.shares_memory(a, windows)) # True — it is a view!
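The built-in mentioned in the comment can serve as a cross-check; unlike raw as_strided it returns a read-only view by default, which prevents accidental writes through overlapping windows:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

a = np.arange(10)
windows = swv(a, 3)
print(windows.shape)                  # (8, 3)
print(np.shares_memory(a, windows))   # True — still a view
print(windows.flags.writeable)        # False — read-only by default
```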
ex-sp-ch05-13
Hard: Compute the pairwise Euclidean distance matrix for n points in d
dimensions using three different methods: (a) broadcasting, (b) einsum,
and (c) scipy.spatial.distance.cdist. Benchmark all three for
n=1000, d=3.
Broadcasting: diff = X[:, None, :] - X[None, :, :].
einsum for squared norms: np.einsum("ij,ij->i", X, X).
Three implementations
import time
from scipy.spatial.distance import cdist
rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 3))
# (a) Broadcasting
t0 = time.perf_counter()
diff = X[:, None, :] - X[None, :, :]
D1 = np.sqrt((diff ** 2).sum(axis=-1))
print(f"Broadcasting: {time.perf_counter()-t0:.4f}s")
# (b) einsum
t0 = time.perf_counter()
sq = np.einsum('ij,ij->i', X, X)
cross = np.einsum('ij,kj->ik', X, X)
D2 = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2*cross, 0))
print(f"einsum: {time.perf_counter()-t0:.4f}s")
# (c) cdist
t0 = time.perf_counter()
D3 = cdist(X, X, metric='euclidean')
print(f"cdist: {time.perf_counter()-t0:.4f}s")
assert np.allclose(D1, D2, atol=1e-10)
assert np.allclose(D1, D3, atol=1e-10)
ex-sp-ch05-14
Hard: Use np.einsum to implement batch matrix multiplication:
given A of shape (batch, m, k) and B of shape (batch, k, n),
compute C of shape (batch, m, n) where C[i] = A[i] @ B[i].
Verify against np.matmul.
einsum subscripts: "bij,bjk->bik".
Solution
rng = np.random.default_rng(42)
batch, m, k, n = 32, 4, 5, 3
A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))
C_einsum = np.einsum('bij,bjk->bik', A, B)
C_matmul = np.matmul(A, B) # or A @ B
assert np.allclose(C_einsum, C_matmul)
print(f"Shape: {C_einsum.shape}") # (32, 4, 3)
ex-sp-ch05-15
Hard: Write a function that generates circularly-symmetric complex Gaussian
noise with a given spatial covariance matrix C (shape
(n_antennas, n_antennas)). Generate 10,000 samples and verify that the
sample covariance matches C.
Use the Cholesky decomposition: C = L L^H.
Generate i.i.d. CN(0, 1) samples and color them with L.
Solution
def correlated_complex_gaussian(rng, C, n_samples):
    n = C.shape[0]
    L = np.linalg.cholesky(C)
    # i.i.d. CN(0, 1) samples: unit variance split between real and imag parts
    z = (rng.standard_normal((n_samples, n))
         + 1j * rng.standard_normal((n_samples, n))) / np.sqrt(2)
    # Right-multiply by L^H so that E[X^H X] / N = L L^H = C
    return z @ L.conj().T  # shape (n_samples, n)
rng = np.random.default_rng(42)
n_ant = 4
# Create a valid positive-definite Hermitian covariance
A = rng.standard_normal((n_ant, n_ant)) + 1j * rng.standard_normal((n_ant, n_ant))
C = A @ A.conj().T / n_ant
samples = correlated_complex_gaussian(rng, C, 10_000)
C_est = samples.conj().T @ samples / 10_000
print(f"Max error: {np.abs(C - C_est).max():.4f}") # should be small
ex-sp-ch05-16
Hard: Create a memory-mapped file containing a 10,000 x 100 float32 array. Write data in chunks of 1,000 rows, then memory-map it read-only and compute column-wise statistics without loading the entire file.
Use np.memmap with mode="w+" for writing.
Solution
import tempfile, os
rng = np.random.default_rng(42)
path = os.path.join(tempfile.gettempdir(), 'test_memmap.bin')
# Write in chunks
mm_w = np.memmap(path, dtype=np.float32, mode='w+',
                 shape=(10_000, 100))
for start in range(0, 10_000, 1000):
    mm_w[start:start+1000] = rng.standard_normal((1000, 100)).astype(np.float32)
mm_w.flush()
del mm_w
# Read-only access
mm_r = np.memmap(path, dtype=np.float32, mode='r',
                 shape=(10_000, 100))
col_means = mm_r.mean(axis=0)  # the OS pages data in on demand
col_stds = mm_r.std(axis=0)
print(f"Mean range: [{col_means.min():.3f}, {col_means.max():.3f}]")
print(f"Std range: [{col_stds.min():.3f}, {col_stds.max():.3f}]")
os.remove(path)
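If the reduction itself must stay memory-bounded, rather than relying on the OS to page the mapping in and out, the column statistics can be accumulated explicitly chunk by chunk. A self-contained sketch with a smaller file (file name and sizes are illustrative):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.gettempdir(), 'demo_chunked.bin')
rows, cols, chunk = 2_000, 100, 500
rng = np.random.default_rng(0)

# Write a small float32 file in chunks
mm = np.memmap(path, dtype=np.float32, mode='w+', shape=(rows, cols))
for start in range(0, rows, chunk):
    mm[start:start + chunk] = rng.standard_normal((chunk, cols)).astype(np.float32)
mm.flush()

# Accumulate column sums in float64, one chunk resident at a time
s = np.zeros(cols)
sq = np.zeros(cols)
for start in range(0, rows, chunk):
    block = mm[start:start + chunk].astype(np.float64)
    s += block.sum(axis=0)
    sq += (block ** 2).sum(axis=0)
means = s / rows
stds = np.sqrt(sq / rows - means ** 2)

# Cross-check against a full in-memory reduction (fine at this size)
assert np.allclose(means, np.asarray(mm, dtype=np.float64).mean(axis=0))
del mm
os.remove(path)
```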
ex-sp-ch05-17
Challenge: Implement a vectorized 2-D convolution (no loops) using stride tricks.
Given an image of shape (H, W) and a kernel of shape (kH, kW),
create a view of all patches using as_strided, then compute the
convolution with a single einsum call.
Output shape: (H - kH + 1, W - kW + 1).
Use as_strided to create shape (out_H, out_W, kH, kW) with appropriate strides.
Then np.einsum("ijkl,kl->ij", patches, kernel).
Solution
from numpy.lib.stride_tricks import as_strided
def conv2d_strided(image, kernel):
    # Without flipping the kernel this is strictly cross-correlation,
    # the convention used by most ML libraries.
    H, W = image.shape
    kH, kW = kernel.shape
    out_H, out_W = H - kH + 1, W - kW + 1
    s0, s1 = image.strides  # bytes per row, bytes per col
    patches = as_strided(
        image,
        shape=(out_H, out_W, kH, kW),
        strides=(s0, s1, s0, s1),
    )
    return np.einsum('ijkl,kl->ij', patches, kernel)
# Test
rng = np.random.default_rng(42)
img = rng.standard_normal((100, 100))
kern = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64) # Sobel
edges = conv2d_strided(img, kern)
print(f"Output shape: {edges.shape}") # (98, 98)
ex-sp-ch05-18
Challenge: Benchmark the performance difference between iterating over rows (axis 0) vs columns (axis 1) for both C-contiguous and Fortran-contiguous arrays. Create a 5000x5000 matrix and measure the time to compute row sums and column sums for each layout. Explain the results in terms of cache behavior.
Use np.asfortranarray to create F-contiguous version.
Time a.sum(axis=0) and a.sum(axis=1) for each layout.
Benchmark and analysis
import time
n = 5000
rng = np.random.default_rng(42)
c_arr = rng.standard_normal((n, n))
f_arr = np.asfortranarray(c_arr)
results = {}
for name, arr in [('C', c_arr), ('F', f_arr)]:
    for axis, label in [(0, 'col_sum'), (1, 'row_sum')]:
        t0 = time.perf_counter()
        for _ in range(10):
            _ = arr.sum(axis=axis)
        elapsed = (time.perf_counter() - t0) / 10
        results[f"{name}_{label}"] = elapsed
for k, v in results.items():
print(f"{k}: {v:.4f}s")
# Expected: C_row_sum < C_col_sum and F_col_sum < F_row_sum.
# Summing along the contiguous axis streams memory sequentially, so every
# cache line fetched from RAM is fully used; summing across it strides
# through memory and wastes most of each cache line.
ex-sp-ch05-19
Challenge: Implement a Monte Carlo simulation that estimates pi using vectorized NumPy operations. Run it with 10^7 samples. Then run 100 independent trials with different seeds to estimate the standard error and construct a 95% confidence interval for pi.
The fraction of random points in the unit square that fall inside the unit circle is pi/4.
Use SeedSequence.spawn(100) for independent trials.
Solution
def estimate_pi(rng, n):
    x = rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)
    return 4 * np.mean(x**2 + y**2 <= 1)
# Single high-precision estimate
rng = np.random.default_rng(42)
pi_hat = estimate_pi(rng, 10_000_000)
print(f"Pi estimate: {pi_hat:.6f}")
# 100 independent trials
ss = np.random.SeedSequence(42)
rngs = [np.random.default_rng(s) for s in ss.spawn(100)]
estimates = np.array([estimate_pi(rng, 10_000_000) for rng in rngs])
mean = estimates.mean()
se = estimates.std() / np.sqrt(len(estimates))
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"Mean: {mean:.6f}, SE: {se:.6f}")
print(f"95% CI: ({ci[0]:.6f}, {ci[1]:.6f})")
ex-sp-ch05-20
Challenge: Use np.einsum to compute the following tensor operations without
any loops:
- Trace of a matrix: tr(A) = sum_i A[i, i]
- Matrix-vector product: y = A x
- Quadratic form: x^T A x
- Batch outer products: given vectors u[i] and v[i] (both shape (batch, d)),
  compute all outer products u[i] v[i]^T
- Tensor contraction: contract a 4-D tensor T[i, j, k, l] with a matrix
  M[k, m] along axis k
Verify each result against explicit NumPy operations.
Trace: "ii->"; matrix-vector: "ij,j->i"; quadratic: "i,ij,j->".
All five operations
rng = np.random.default_rng(42)
n, d, batch = 5, 4, 10
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
# 1. Trace
tr_ein = np.einsum('ii->', A)
assert np.isclose(tr_ein, np.trace(A))
# 2. Matrix-vector
mv_ein = np.einsum('ij,j->i', A, x)
assert np.allclose(mv_ein, A @ x)
# 3. Quadratic form
qf_ein = np.einsum('i,ij,j->', x, A, x)
assert np.isclose(qf_ein, x @ A @ x)
# 4. Batch outer products
u = rng.standard_normal((batch, d))
v = rng.standard_normal((batch, d))
bo_ein = np.einsum('bi,bj->bij', u, v)
bo_exp = u[:, :, None] * v[:, None, :]
assert np.allclose(bo_ein, bo_exp)
# 5. Tensor contraction
T = rng.standard_normal((3, 4, 5, 6))
M = rng.standard_normal((5, 7))
tc_ein = np.einsum('ijkl,km->ijml', T, M)
tc_exp = np.tensordot(T, M, axes=([2], [0]))
# tensordot puts contracted result at end; transpose to match
assert np.allclose(tc_ein, tc_exp.transpose(0, 1, 3, 2))