Chapter Summary
Key Points
1. Use Numba for tight numerical loops. @numba.njit compiles Python loops to native machine code via LLVM, achieving 50-200x speedups over pure Python. Always use nopython=True (or @njit), warm up the compiled function before benchmarking, and use parallel=True with prange for multi-core scaling. @numba.vectorize creates custom ufuncs, and @numba.cuda.jit targets NVIDIA GPUs. (A minimal Numba sketch follows this list.)
2. Use JAX for functional numerical computing. JAX provides composable transformations: jax.jit for XLA compilation, jax.grad for automatic differentiation, and jax.vmap for automatic vectorization. Write pure functions with jax.numpy, and compose transformations freely: jax.jit(jax.vmap(jax.grad(f))) gives a compiled, batched gradient. JAX arrays are immutable; use x.at[i].set(v) instead of x[i] = v. (A JAX sketch follows this list.)
3. Choose the right parallelism tool. The GIL prevents CPU-bound thread parallelism in CPython. Use multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor for CPU-bound work, ThreadPoolExecutor for I/O-bound work, and joblib.Parallel for scikit-learn-style loops. Amdahl's Law sets the speedup ceiling: S(N) = 1 / (s + (1 - s)/N), where s is the serial fraction and N is the number of workers. (A process-pool sketch follows this list.)
4. Interface with C/C++ when needed. Use ctypes for quick calls to existing shared libraries (no compilation step), cffi for safer C API wrapping, pybind11 for new C++ extensions with automatic NumPy conversion, and f2py for Fortran code. Always pass contiguous arrays and batch operations to amortize FFI call overhead. (A ctypes sketch follows this list.)
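The sketch below illustrates the Numba pattern from point 1: an @njit-compiled loop with parallel=True and prange, plus an explicit warm-up call before timing. The function name and array size are illustrative, not taken from the chapter.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_of_squares(x):
    # Tight numerical loop compiled to machine code via LLVM;
    # prange splits iterations across CPU cores when parallel=True.
    total = 0.0
    for i in prange(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.random.rand(1_000_000)
sum_of_squares(x)             # first call triggers JIT compilation (warm-up)
result = sum_of_squares(x)    # subsequent calls run at native speed
```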
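Next, a minimal sketch of the JAX composition from point 2; the quadratic loss function is a stand-in chosen only to make the gradient easy to check.

```python
import jax
import jax.numpy as jnp

def loss(w):
    # Pure function: no in-place mutation, no global state.
    return jnp.sum(w ** 2)

# grad differentiates, vmap maps over the leading batch axis,
# jit compiles the whole pipeline with XLA.
batched_grad = jax.jit(jax.vmap(jax.grad(loss)))

w_batch = jnp.arange(6.0).reshape(3, 2)   # batch of 3 parameter vectors
grads = batched_grad(w_batch)             # gradient of sum(w**2) is 2*w

# Arrays are immutable: functional update instead of w_batch[0, 0] = 10.0
w_updated = w_batch.at[0, 0].set(10.0)
```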
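The following sketch shows the process-pool pattern and the Amdahl's Law ceiling from point 3; the chunk sizes and the 10% serial fraction are invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(chunk):
    # Stand-in for CPU-heavy work on one chunk of data.
    return sum(i * i for i in chunk)

def amdahl_speedup(s, n):
    # Speedup ceiling S(N) = 1 / (s + (1 - s)/N) for serial fraction s.
    return 1.0 / (s + (1.0 - s) / n)

if __name__ == "__main__":
    chunks = [range(k * 100_000, (k + 1) * 100_000) for k in range(8)]
    # Separate processes sidestep the GIL for CPU-bound work.
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(cpu_bound, chunks))
    total = sum(partials)

    # With 8 workers and 10% serial code, the ceiling is about 4.7x.
    print(amdahl_speedup(0.10, 8))
```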
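Finally, a ctypes sketch for point 4, calling cos from the system C math library; it assumes a standard libm can be located, which holds on typical Linux and macOS installs but is not guaranteed everywhere.

```python
import ctypes
import ctypes.util

# Load an existing shared library: no compilation step required.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments and results correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))   # 1.0
```

For array arguments, converting with numpy.ascontiguousarray and passing ndarray.ctypes.data_as(...) hands the C side a contiguous buffer, and batching many elements into one call keeps the FFI overhead amortized.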
Looking Ahead
Chapter 15 applies these acceleration techniques to large-scale data processing with Pandas and Dask. The multiprocessing patterns from Section 14.3 appear again in Dask's distributed scheduler, and the JIT compilation from Section 14.1 can accelerate custom Pandas apply functions via Numba.