References & Further Reading

References

  1. C. R. Harris, K. J. Millman, S. J. van der Walt, et al., Array programming with NumPy, Nature, vol. 585, pp. 357-362, 2020

    The definitive paper on NumPy's design and ecosystem. Covers the ndarray data model, broadcasting, ufuncs, and NumPy's role as the foundation of scientific Python.

  2. NumPy Development Team, NumPy Reference Documentation, https://numpy.org/doc/stable/, 2024

    The official reference for all NumPy functions. The "NumPy Fundamentals" section on indexing, broadcasting, and byte-swapping is essential reading.
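A minimal sketch of the broadcasting rules that the "NumPy Fundamentals" section describes (an illustration, not code from the cited docs): trailing dimensions are aligned, and size-1 axes are stretched without copying data.

```python
import numpy as np

# Broadcasting aligns trailing dimensions: a (3, 1) column and a (4,) row
# combine into a (3, 4) result without materializing intermediate copies.
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)
grid = col * 10 + row              # shape (3, 4)
print(grid[2, 3])  # row index 2 -> 20, column index 3 -> +3, so 23
```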

  3. S. van der Walt, S. C. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, Computing in Science & Engineering, vol. 13, no. 2, 2011

    Earlier paper explaining the ndarray memory model, strides, and broadcasting in detail. The figures on memory layout and cache performance are particularly instructive.

  4. T. E. Oliphant, A Guide to NumPy, Trelgol Publishing, 2006

    The original book by NumPy's creator. Covers the transition from Numeric and Numarray to NumPy and explains design decisions behind the array interface.

  5. R. Kern, K. Sheppard, NEP 19 — Random Number Generator Policy, https://numpy.org/neps/nep-0019-rng-policy.html, 2018

    The design proposal for NumPy's modern Generator API, explaining why the legacy np.random global state was replaced and how SeedSequence enables reproducible parallel streams.
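The pattern NEP 19 proposes, sketched briefly: seed an explicit `Generator` instead of mutating global state, and spawn independent child streams for parallel work.

```python
import numpy as np

# Modern API: an explicit Generator object replaces the global
# np.random state, so reproducibility is local and thread-safe.
rng = np.random.default_rng(12345)
x = rng.standard_normal(3)

# SeedSequence.spawn yields statistically independent child seeds,
# e.g. one per worker process, all reproducible from the parent seed.
parent = np.random.SeedSequence(12345)
children = parent.spawn(4)
streams = [np.random.default_rng(s) for s in children]
draws = [g.random() for g in streams]
```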

  6. A. Riley, A Basic Guide to Einsum, https://ajcr.net/Basic-guide-to-einsum/, 2018

    An accessible introduction to np.einsum with visual examples of contraction, trace, outer product, and batch operations.
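The four operations mentioned there, in one short sketch (my own examples, not taken from the guide):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# Contraction (matrix product): sum over the shared index j.
C = np.einsum('ij,jk->ik', A, B)       # same result as A @ B

# Trace: a repeated index with no output index sums the diagonal.
t = np.einsum('ii->', np.eye(3))       # 3.0

# Outer product: no shared indices; both are kept in the output.
outer = np.einsum('i,j->ij', np.ones(2), np.arange(3))

# Batch operation: the leading batch index b is carried through.
X = np.random.default_rng(0).random((5, 2, 3))
Y = np.random.default_rng(1).random((5, 3, 4))
Z = np.einsum('bij,bjk->bik', X, Y)    # shape (5, 2, 4), same as X @ Y
```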

Further Reading

  • NumPy internals and C-level implementation

    NumPy source code, numpy/core/src/multiarray/

    Reading the C source for ndarray creation, stride computation, and ufunc dispatch is the most direct way to understand NumPy's performance characteristics.

  • Advanced einsum and tensor networks

    opt_einsum library (https://dgasmith.github.io/opt_einsum/)

    For complex multi-operand contractions, opt_einsum searches for an efficient pairwise contraction order (exhaustively for small problems, via greedy heuristics for large ones), often yielding 10-1000x speedups over a naive single-shot einsum in tensor network computations.
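NumPy itself exposes the same path-optimization idea through `np.einsum_path` and the `optimize=` argument to `np.einsum`; a small sketch of the pattern:

```python
import numpy as np

# A three-operand chain contraction. With optimize='optimal', NumPy
# searches for a cheap pairwise contraction order instead of
# contracting all operands in one pass.
rng = np.random.default_rng(0)
A = rng.random((8, 64))
B = rng.random((64, 64))
C = rng.random((64, 8))

path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)
# result matches the plain chained product A @ B @ C
```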

  • Memory-mapped computing at scale

    Dask documentation (https://docs.dask.org/)

    Dask extends NumPy to out-of-core and distributed computing, automatically chunking arrays and scheduling operations across cores or machines.
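Before reaching for Dask, the core out-of-core idea can be sketched with plain NumPy: memory-map a file-backed array and reduce it in chunks so only one slice is resident at a time (file path and chunk size here are arbitrary choices for illustration).

```python
import os
import tempfile

import numpy as np

# Create a file-backed array on disk; data lives in the file, not RAM.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
n = 1_000_000
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(n,))
mm[:] = 1.0
mm.flush()

# Chunked reduction: each iteration touches only a 100k-element slice.
total = 0.0
for start in range(0, n, 100_000):
    total += mm[start:start + 100_000].sum()
print(total)  # 1000000.0
```

Dask automates exactly this chunking, plus scheduling across cores or machines.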

  • GPU arrays with the NumPy API

    CuPy documentation (https://cupy.dev/)

    CuPy provides a NumPy-compatible API on NVIDIA GPUs. If you know NumPy, you already know 90% of CuPy — just replace `import numpy as np` with `import cupy as cp`.
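The drop-in swap is often written as an import alias with a CPU fallback, so the same code runs with or without a GPU (a common pattern, sketched here; `cupy` is only used if it is installed):

```python
# Use CuPy when available, otherwise fall back to NumPy transparently.
try:
    import cupy as xp   # NumPy-compatible API backed by NVIDIA GPUs
except ImportError:
    import numpy as xp  # identical calls run on the CPU

a = xp.arange(6).reshape(2, 3)
norms = xp.sqrt((a * a).sum(axis=1))  # same expression in both libraries
first = float(norms[0])               # sqrt(0 + 1 + 4) = sqrt(5)
```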