Chapter Summary

Key Points

  • 1.

    ndarray internals determine performance. Every NumPy array is a thin wrapper around a data buffer, described by shape, strides, and dtype metadata. C-contiguous (row-major) layout is the default for newly created arrays, so iterating along the last axis is cache-friendly. Strides also explain why basic slicing returns views (same buffer, different strides) while fancy indexing returns copies.

  • 2.

    Advanced indexing unlocks expressive data selection. Boolean indexing provides SQL-like WHERE filtering. np.ix_ creates open meshes for rectangular sub-array selection. np.einsum expresses products, contractions, traces, and transposes in a single line of subscript notation; master its rules and you can replace pages of loop-based code.

  • 3.

    Broadcasting eliminates explicit loops and data replication. Dimensions align from the right; size-1 axes stretch to match. The pattern a[:, None] op b[None, :] computes outer operations without replicating either input in memory; only the result is allocated. Broadcasting replaces np.tile / np.repeat in virtually every elementwise computation.

  • 4.

    Vectorization typically delivers 50-200x speedups. Python loops over arrays pay roughly 100 ns per element in interpreter overhead, whereas vectorized NumPy operations run in compiled C at roughly 1-5 ns per element. Use np.where for conditionals and np.select for multi-branch logic, and avoid np.vectorize when performance matters (it is essentially a Python loop in disguise).

  • 5.

    The modern RNG API provides reproducible, independent random streams. Use np.random.default_rng(seed) instead of the legacy np.random.seed(). Construct complex Gaussian noise from two independent real Gaussians, each carrying half the total variance. Use SeedSequence.spawn() for parallel-safe independent streams.

  • 6.

    Structured arrays and memory-mapped files extend NumPy to heterogeneous and out-of-core data. Structured dtypes store mixed-type records; np.memmap enables array operations on files larger than RAM; HDF5 and zarr add metadata, compression, and cloud support.
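As a quick recap of key point 1, the following minimal sketch inspects strides and checks which indexing forms share memory with the original array. The array contents and the int64 dtype are assumptions chosen only for illustration.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # C-contiguous by default

# Strides are byte steps per axis: one row is 4 elements * 8 bytes,
# one column step is a single 8-byte element.
row_stride, col_stride = a.strides

# Basic slicing reuses the same buffer with different strides (a view).
view = a[:, ::2]
view_shares = np.shares_memory(a, view)

# Fancy (integer-array) indexing gathers elements into a new buffer (a copy).
copy = a[:, [0, 2]]
copy_shares = np.shares_memory(a, copy)

# Writing through the view changes the original; writing to the copy does not.
view[0, 0] = 100
copy[0, 1] = -1
```

After these statements, a[0, 0] holds 100 while a[0, 2] is unchanged, which is the practical consequence of the view/copy distinction.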
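Key point 2 can be illustrated with a short sketch that combines a boolean mask, np.ix_, and np.einsum. The arrays x, A, and B are hypothetical examples, not data from the chapter.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

# Boolean mask: keep rows whose first column is at least 10 (a WHERE-style filter).
rows_ge_10 = x[x[:, 0] >= 10]

# np.ix_ builds an open mesh, selecting the rectangle rows {0, 3} x columns {1, 4}.
block = x[np.ix_([0, 3], [1, 4])]

# einsum: a matrix product and a trace written as subscript strings.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
matmul = np.einsum("ij,jk->ik", A, B)   # same result as A @ B
trace = np.einsum("ii->", np.eye(3))    # sum of the diagonal
```

Note that x[[0, 3], [1, 4]] without np.ix_ would pair the indices and return only the two elements x[0, 1] and x[3, 4], not the 2 x 2 rectangle.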
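The broadcasting pattern from key point 3 is sketched below, next to the explicit-replication version it replaces. The input values are assumptions chosen for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0])

# a[:, None] has shape (3, 1); b[None, :] has shape (1, 2); the result is (3, 2).
outer_sum = a[:, None] + b[None, :]

# The same result via explicit replication, which allocates two extra (3, 2) arrays.
tiled = np.tile(a[:, None], (1, 2)) + np.tile(b, (3, 1))

# np.broadcast_to exposes the mechanism: the stretched axis has stride 0,
# so every "row" points at the same underlying data and nothing is copied.
stretched = np.broadcast_to(b, (3, 2))
```

Only the (3, 2) result is allocated in the broadcast version; the stretched operands exist purely as stride metadata.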
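For key point 4, a minimal sketch of replacing per-element branching with np.where and np.select might look as follows. The thresholds and the input array are hypothetical, and no timings are claimed here; actual speedups depend on array size and hardware.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Loop version, shown only for comparison.
relu_loop = np.array([v if v > 0 else 0.0 for v in x])

# np.where: a two-way conditional evaluated elementwise in compiled code.
relu_vec = np.where(x > 0, x, 0.0)

# np.select: multi-branch logic; the first matching condition wins,
# and elements matching no condition take the default.
conditions = [x < -1, x > 1]
choices = [-1.0, 1.0]
clipped = np.select(conditions, choices, default=x)
```

In this particular case np.clip(x, -1, 1) gives the same result as the np.select call; np.select earns its keep when the branches are not a simple clamp.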
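Key point 5 can be sketched as below. The seed, the sample count, and the total noise variance sigma2 are assumptions for illustration; the scaling by sqrt(sigma2 / 2) assumes circularly symmetric noise whose real and imaginary parts each carry half the variance.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seeded Generator, reproducible

# Complex Gaussian noise with total variance sigma2, built from two
# independent real Gaussians of variance sigma2 / 2 each.
sigma2 = 2.0
n = 100_000
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
measured_power = np.mean(np.abs(noise) ** 2)  # should be close to sigma2

# Independent child streams, e.g. one per parallel worker.
children = np.random.SeedSequence(12345).spawn(4)
worker_rngs = [np.random.default_rng(s) for s in children]
first_draws = [r.standard_normal() for r in worker_rngs]
```

Each worker generator derives from the same root seed, so the whole parallel run is reproducible while the streams remain statistically independent.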
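Finally, key point 6 is sketched with a small structured array and a memory-mapped file. The field names, the file name demo.dat, and the array shape are hypothetical; a real out-of-core workload would use a file far larger than this one.

```python
import os
import tempfile

import numpy as np

# Structured dtype: each record holds an int id, a float value, and a short label.
record = np.dtype([("id", np.int32), ("value", np.float64), ("label", "U8")])
table = np.array([(1, 0.5, "a"), (2, 1.5, "b")], dtype=record)
values = table["value"]  # field access returns a view onto the records

# np.memmap: an array backed by a file on disk; data is paged in on demand.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 8))
mm[10, :] = 1.0
mm.flush()  # push pending writes to disk

# Reopen read-only and touch only the slice that is needed.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 8))
row_sum = float(ro[10].sum())
```

A raw memmap file stores no shape or dtype information, so both must be supplied again on reopening; this is one of the gaps that formats such as HDF5 and zarr fill with metadata.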

Looking Ahead

Chapter 6 takes the NumPy foundation and builds on it with SciPy: linear algebra routines, sparse matrices, optimization, signal processing, and statistical functions. Nearly every SciPy function accepts or returns NumPy arrays; the ndarray is the universal currency of scientific Python.