Exercises

ex-sp-ch14-01

Easy

Write a @numba.njit function that computes the element-wise ReLU activation $\text{ReLU}(x) = \max(0, x)$ using an explicit loop. Benchmark it against np.maximum(0, x) for $N = 10^6$.

ex-sp-ch14-02

Easy

Use jax.grad to compute the derivative of $f(x) = x^3 \sin(x)$ and verify it against the analytical derivative $f'(x) = 3x^2 \sin(x) + x^3 \cos(x)$ at $x = 2.0$.
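
As an independent reference for the check, the analytical derivative can be compared with a central finite difference in plain Python. This is only a sketch: it does not use JAX, and the helper names (`f_prime_analytic`, `central_difference`) and the step size `h` are illustrative assumptions, not part of the exercise.

```python
import math


def f(x: float) -> float:
    """The function from the exercise: f(x) = x^3 * sin(x)."""
    return x**3 * math.sin(x)


def f_prime_analytic(x: float) -> float:
    """Analytical derivative via the product rule."""
    return 3 * x**2 * math.sin(x) + x**3 * math.cos(x)


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Second-order central difference; h is an assumed step size."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 2.0
analytic = f_prime_analytic(x0)
numeric = central_difference(f, x0)
print(f"analytic = {analytic:.8f}, central difference = {numeric:.8f}")
```

The value returned by jax.grad at the same point should agree with `analytic` up to floating-point precision (JAX defaults to 32-bit floats, so a looser tolerance may be needed).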

ex-sp-ch14-03

Easy

Use multiprocessing.Pool to compute the sum of squares for 4 arrays of size $10^6$ each, in parallel. Verify the results match the sequential computation.
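
One possible starting point is sketched below. It assumes NumPy arrays filled with random normal values; the helper names (`sum_of_squares`, `make_arrays`, `parallel_sums`) and the seed are illustrative choices. The worker function is defined at module level so that it can be pickled and sent to the worker processes.

```python
import multiprocessing as mp

import numpy as np


def sum_of_squares(arr: np.ndarray) -> float:
    """Sum of squared entries of a 1-D array."""
    return float(np.dot(arr, arr))


def make_arrays(n_arrays: int = 4, size: int = 1_000_000, seed: int = 0):
    """Create reproducible test arrays (seed is an arbitrary assumption)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal(size) for _ in range(n_arrays)]


def parallel_sums(arrays, processes: int = 4):
    """Map sum_of_squares over the arrays, one task per array."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, arrays)


if __name__ == "__main__":
    arrays = make_arrays()
    sequential = [sum_of_squares(a) for a in arrays]
    parallel = parallel_sums(arrays)
    # Compare with a tolerance rather than exact equality.
    assert np.allclose(sequential, parallel)
    print(parallel)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with the spawn method, where the main module is re-imported in each child process.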

ex-sp-ch14-04

Easy

Load the standard C math library with ctypes and call cos(1.0) from Python. Verify it matches math.cos(1.0).
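
A minimal sketch of this pattern follows, assuming a Unix-like system where `ctypes.util.find_library` can locate the math library (the fallback to the C library and the helper name `load_c_cos` are assumptions for illustration). Declaring `argtypes` and `restype` is the important step: without it, ctypes treats the return value as a C `int`.

```python
import ctypes
import ctypes.util
import math


def load_c_cos():
    """Return the C library's cos as a Python callable."""
    # On most Linux and macOS systems this finds libm; the fallback to
    # the C library covers platforms where the math functions live there.
    lib_name = ctypes.util.find_library("m") or ctypes.util.find_library("c")
    if lib_name is None:
        raise OSError("could not locate the C math library")
    libm = ctypes.CDLL(lib_name)
    # Declare the C signature: double cos(double).
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double
    return libm.cos


c_cos = load_c_cos()
print(c_cos(1.0), math.cos(1.0))
```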

ex-sp-ch14-05

Medium

Create a @numba.vectorize ufunc that computes the softplus function $\text{softplus}(x) = \ln(1 + e^x)$ in a numerically stable way (avoid overflow for large $x$). Test with target='parallel'.
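
One common way to avoid the overflow is the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)), in which the exponential never receives a positive argument. The NumPy sketch below illustrates only the identity; the function name `softplus_stable` is an assumption, and porting it to a @numba.vectorize scalar kernel is left as the exercise intends.

```python
import numpy as np


def softplus_stable(x):
    """Softplus via max(x, 0) + log1p(exp(-|x|)), which cannot overflow."""
    x = np.asarray(x, dtype=np.float64)
    # exp is only evaluated at non-positive arguments, so it stays in [0, 1].
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))


xs = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(softplus_stable(xs))
```

For large positive inputs the result approaches x itself, and for large negative inputs it approaches 0, which gives two easy sanity checks for the ufunc.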

ex-sp-ch14-06

Medium

Use jax.vmap to compute the Jacobian of a function $f: \mathbb{R}^3 \to \mathbb{R}^2$ defined as $f(x) = [\sin(x_1 x_2),\, x_2 e^{x_3}]$ at the point $x = [1, 2, 3]$. Verify against jax.jacobian.
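
For a hand check of whatever the code returns, the Jacobian of this function can be written out by differentiating each component with respect to each input. The worked form below follows directly from the definition of f given in the exercise:

```latex
J_f(x) =
\begin{bmatrix}
x_2 \cos(x_1 x_2) & x_1 \cos(x_1 x_2) & 0 \\
0 & e^{x_3} & x_2 e^{x_3}
\end{bmatrix},
\qquad
J_f(1, 2, 3) =
\begin{bmatrix}
2\cos 2 & \cos 2 & 0 \\
0 & e^{3} & 2e^{3}
\end{bmatrix}.
```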

ex-sp-ch14-07

Medium

Compare ProcessPoolExecutor and ThreadPoolExecutor for a CPU-bound task (computing eigenvalues of random matrices) and an I/O-bound task (reading files). Measure speedup for each.
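
A small timing harness along the following lines may help structure the comparison. It is a sketch under assumptions: `cpu_task` is a pure-Python stand-in for the eigenvalue computation, and the names `time_executor`, the worker count, and the task sizes are illustrative only.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n: int) -> int:
    """Stand-in CPU-bound work: sum of squares below n in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_executor(executor_cls, func, inputs, max_workers: int = 4):
    """Run func over inputs with the given executor; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as executor:
        results = list(executor.map(func, inputs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    inputs = [2_000_000] * 8
    t0 = time.perf_counter()
    expected = [cpu_task(n) for n in inputs]
    t_seq = time.perf_counter() - t0
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, results = time_executor(cls, cpu_task, inputs)
        assert results == expected
        print(f"{cls.__name__}: {elapsed:.3f} s, speedup {t_seq / elapsed:.2f}x")
```

The same `time_executor` helper can be reused for the I/O-bound case by passing a file-reading function and a list of paths.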

ex-sp-ch14-08

Medium

Use joblib.Parallel with backend='loky' to parallelize a grid search over 100 hyperparameter combinations. Compare wall-clock time with sequential execution.

ex-sp-ch14-09

Hard

Implement a Numba CUDA kernel for matrix multiplication $C = A \cdot B$ using shared memory tiling. Compare against CuPy's cupy.matmul for $N = 1024$.

ex-sp-ch14-10

Hard

Implement Newton's method for finding roots using JAX's automatic differentiation. Use jax.jit to compile the iteration loop and find the root of $f(x) = x^3 - 2x - 5$ starting from $x_0 = 2$.
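
For reference, the Newton update and its first step for this function can be worked out by hand. In an implementation the derivative would come from automatic differentiation rather than the closed form used here:

```latex
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
        = x_k - \frac{x_k^3 - 2x_k - 5}{3x_k^2 - 2},
\qquad
x_0 = 2:\quad f(2) = -1,\quad f'(2) = 10,\quad x_1 = 2 - \frac{-1}{10} = 2.1 .
```

Since $f(2) < 0$ and $f(2.1) = 0.061 > 0$, the root lies between 2 and 2.1, which is a useful sanity check on the compiled loop.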

ex-sp-ch14-11

Hard

Write a pybind11 extension that implements a moving average filter on a NumPy array. Compare its performance against np.convolve and scipy.ndimage.uniform_filter1d for $N = 10^6$.

ex-sp-ch14-12

Challenge

Build a complete acceleration benchmark suite that tests the same algorithm (e.g., pairwise Euclidean distances for $N$ points in $\mathbb{R}^d$) across all methods: pure Python, NumPy, Numba (CPU + GPU), JAX, multiprocessing, and a ctypes C implementation. Plot speedup vs. $N$ for each method.
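
As a possible seed for the suite, the sketch below gives the two simplest baselines for the pairwise-distance example: a pure Python double loop and a NumPy version based on the expansion ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b. The function names are hypothetical, and the remaining backends would be checked against these for correctness before timing.

```python
import math

import numpy as np


def pairwise_python(points):
    """Pure Python baseline: explicit double loop over all point pairs."""
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            out[i][j] = math.sqrt(sq)
    return out


def pairwise_numpy(X):
    """NumPy baseline using squared norms and one matrix product."""
    X = np.asarray(X, dtype=np.float64)
    sq_norms = np.sum(X * X, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Rounding can leave tiny negative values; clamp before the square root.
    np.maximum(d2, 0.0, out=d2)
    return np.sqrt(d2)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
print(np.allclose(pairwise_numpy(X), np.array(pairwise_python(X.tolist())), atol=1e-6))
```

The expansion trades a little accuracy near zero distance for speed, so comparisons between backends are better done with a tolerance than with exact equality.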