References & Further Reading
References
- C. R. Harris, K. J. Millman, S. J. van der Walt, et al., Array programming with NumPy, Nature, vol. 585, pp. 357-362, 2020
The definitive paper on NumPy's design and ecosystem. Covers the ndarray data model, broadcasting, ufuncs, and NumPy's role as the foundation of scientific Python.
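Two of the concepts the paper centers on, broadcasting and ufuncs, can be illustrated in a few lines (a minimal sketch, not from the paper itself):

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)

# np.add is a ufunc: it applies elementwise after the shapes are broadcast.
grid = col + row
assert grid.shape == (3, 4)
assert grid[2, 3] == 5   # 2 + 3
```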
- NumPy Development Team, NumPy Reference Documentation, https://numpy.org/doc/stable/, 2024
The official reference for all NumPy functions. The "NumPy Fundamentals" section on indexing, broadcasting, and byte-swapping is essential reading.
- S. van der Walt, S. C. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, Computing in Science & Engineering, vol. 13, no. 2, 2011
Earlier paper explaining the ndarray memory model, strides, and broadcasting in detail. The figures on memory layout and cache performance are particularly instructive.
- T. E. Oliphant, A Guide to NumPy, Trelgol Publishing, 2006
The original book by NumPy's creator. Covers the transition from Numeric and Numarray to NumPy and explains design decisions behind the array interface.
- R. Kern, K. Sheppard, NEP 19 — Random Number Generator Policy, https://numpy.org/neps/nep-0019-rng-policy.html, 2018
The design proposal for NumPy's modern Generator API, explaining why the legacy np.random global state was replaced and how SeedSequence enables reproducible parallel streams.
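The API the NEP led to looks like this in practice: an explicit `Generator` object replaces the global state, and `SeedSequence.spawn` derives independent child streams for parallel workers (a minimal sketch of the standard usage):

```python
import numpy as np

# Modern API: an explicit Generator instead of the legacy np.random global state.
rng = np.random.default_rng(42)
sample = rng.standard_normal(3)
assert sample.shape == (3,)

# SeedSequence.spawn gives each worker a statistically independent stream.
seed = np.random.SeedSequence(42)
streams = [np.random.default_rng(child) for child in seed.spawn(4)]
draws = [r.random() for r in streams]
assert len(set(draws)) == 4   # independent streams, distinct draws
```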
- A. Riley, A basic introduction to NumPy's einsum, https://ajcr.net/Basic-guide-to-einsum/, 2018
An accessible introduction to np.einsum with visual examples of contraction, trace, outer product, and batch operations.
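The operations that guide walks through each reduce to one subscript string; here is a short sketch showing trace, outer product, and matrix-vector product side by side with their conventional equivalents:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.arange(3)

# Repeated index with no output: sum over the diagonal (trace).
assert np.einsum('ii', a) == np.trace(a)

# Two free indices, no contraction: outer product.
assert np.array_equal(np.einsum('i,j->ij', b, b), np.outer(b, b))

# Contract j: matrix-vector product.
assert np.array_equal(np.einsum('ij,j->i', a, b), a @ b)
```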
Further Reading
NumPy internals and C-level implementation
NumPy source code, numpy/core/src/multiarray/
Reading the C source for ndarray creation, stride computation, and ufunc dispatch is the most direct way to understand NumPy's performance characteristics.
Advanced einsum and tensor networks
opt_einsum library (https://dgasmith.github.io/opt_einsum/)
For complex multi-operand contractions, opt_einsum searches for a near-optimal contraction order; for tensor network computations this is often 10-1000x faster than contracting the operands naively left to right.
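NumPy itself exposes the same idea through `np.einsum_path`, which reports the contraction order and estimated cost before any work is done; a small sketch (the shapes here are arbitrary illustration, not from opt_einsum's docs):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 30))
b = rng.random((30, 5))
c = rng.random((5, 60))

# Inspect the contraction order NumPy's optimizer would choose.
path, report = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')

# Reuse the precomputed path for the actual contraction.
result = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)
assert result.shape == (10, 60)
```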
Memory-mapped computing at scale
Dask documentation (https://docs.dask.org/)
Dask extends NumPy to out-of-core and distributed computing, automatically chunking arrays and scheduling operations across cores or machines.
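For the single-machine case, NumPy's own `np.memmap` already covers out-of-core access: the array lives on disk and pages are faulted in lazily as slices are touched. A minimal sketch (the file path here is a throwaway temp file):

```python
import os
import tempfile

import numpy as np

# Create a disk-backed array; only touched pages ever occupy RAM.
path = os.path.join(tempfile.mkdtemp(), 'big.dat')
mm = np.memmap(path, dtype=np.float64, mode='w+', shape=(1000, 100))
mm[:] = 1.0
mm.flush()

# Reopen read-only; slicing pulls in only the rows actually accessed.
ro = np.memmap(path, dtype=np.float64, mode='r', shape=(1000, 100))
assert ro[500].sum() == 100.0
```

Dask generalizes this pattern by chunking the array and scheduling the chunks across threads, processes, or a cluster.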
GPU arrays with the NumPy API
CuPy documentation (https://cupy.dev/)
CuPy provides a NumPy-compatible API on NVIDIA GPUs. If you know NumPy, you already know 90% of CuPy — just replace `import numpy as np` with `import cupy as cp`.
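The drop-in substitution is commonly written with a guarded import so the same code runs on either backend; a minimal sketch (this falls back to NumPy when no GPU stack is present):

```python
# Pick cupy when available, numpy otherwise; the array API is the same.
try:
    import cupy as xp        # requires an NVIDIA GPU and the CUDA toolkit
except ImportError:
    import numpy as xp       # identical calls, CPU execution

x = xp.linspace(0.0, 1.0, 5)
y = xp.sin(x) ** 2 + xp.cos(x) ** 2
assert xp.allclose(y, 1.0)   # the identity holds under either backend
```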