References & Further Reading

References

  1. S. K. Lam, A. Pitrou, and S. Seibert, Numba: A LLVM-Based Python JIT Compiler, LLVM-HPC Workshop, SC15, 2015

    The original Numba paper describing the architecture of the LLVM-based JIT compiler for NumPy-centric Python code. Covers type inference, compilation pipeline, and GPU code generation.
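
    The decorator-driven workflow the paper describes can be sketched in a few lines. This is an illustrative example, not code from the paper, and it falls back to plain Python if Numba is not installed:

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the function via LLVM
except ImportError:
    # Fallback so the sketch still runs without Numba installed.
    def njit(func):
        return func

@njit
def pairwise_sum(a, b):
    # Numba's type inference specializes this loop for the
    # NumPy dtypes observed at the first call.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

x = np.arange(4.0)
print(pairwise_sum(x, x))  # [0. 2. 4. 6.]
```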

  2. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, JAX: Composable Transformations of Python+NumPy Programs, 2018

    The JAX documentation and design paper. Describes the functional transformation approach (jit, grad, vmap, pmap) and the XLA compilation backend.
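
    A minimal sketch of those composable transformations, assuming JAX is installed (`loss` is a made-up toy function, not from the paper):

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # A tiny scalar function of the parameters w.
    return jnp.sum((w * x) ** 2)

grad_loss = jax.grad(loss)                    # reverse-mode autodiff
fast_grad = jax.jit(grad_loss)                # XLA compilation
batched = jax.vmap(loss, in_axes=(None, 0))   # vectorize over a batch of x

w = jnp.array([1.0, 2.0])
x = jnp.array([3.0, 4.0])
print(fast_grad(w, x))  # d/dw sum((w*x)^2) = 2*w*x^2 -> [18. 64.]
```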

  3. G. M. Amdahl, Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities, AFIPS Conference Proceedings, 1967

    The seminal paper on parallel speedup limitations. Shows that the serial fraction of a program fundamentally limits the achievable speedup regardless of the number of processors.
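
    The bound itself fits in one line; the helper below is an illustrative sketch, not notation from the paper:

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work parallelizes
    perfectly across n processors and (1 - p) remains serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel code, 1024 processors yield under 20x,
# and the limit as n -> infinity is 1 / (1 - p) = 20x.
print(round(amdahl_speedup(0.95, 1024), 1))  # 19.6
```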

  4. W. Jakob, J. Rhinelander, and D. Moldovan, pybind11 -- Seamless Operability Between C++11 and Python, 2017

    The pybind11 documentation. Describes the header-only C++ library for creating Python bindings with automatic type conversion, NumPy support, and STL container handling.

  5. J. L. Gustafson, Reevaluating Amdahl's Law, Communications of the ACM, 1988

    Introduces scaled speedup (Gustafson's Law), showing that parallel speedup can grow linearly if the problem size scales with the number of processors.
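
    The contrast with Amdahl's bound is easy to see numerically; this helper is an illustrative sketch of Gustafson's formulation, not code from the paper:

```python
def gustafson_speedup(p, n):
    """Scaled speedup when the parallel share of the workload grows with n:
    a serial fraction (1 - p) plus a parallel fraction p spread over n."""
    return (1.0 - p) + p * n

# With the problem scaled to keep all 1024 processors busy, the same
# 95% parallel fraction gives near-linear speedup instead of ~20x.
print(round(gustafson_speedup(0.95, 1024), 2))  # 972.85
```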

  6. C. Lattner and V. Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO, 2004

    The foundational LLVM paper describing the modular compiler infrastructure that Numba uses for code generation.

  7. A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Automatic Differentiation in Machine Learning: A Survey, JMLR, 2018

    Comprehensive survey of automatic differentiation techniques including forward mode, reverse mode, and their implementations in modern ML frameworks.
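
    Forward mode, one of the techniques the survey covers, can be sketched with dual numbers in plain Python (an illustrative toy, not the survey's notation):

```python
class Dual:
    """Dual number a + b*eps with eps**2 = 0; the eps coefficient
    carries the derivative through each arithmetic operation."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

def f(x):
    return x * x * x + x  # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

x = Dual(2.0, 1.0)  # seed the derivative with 1.0 to differentiate w.r.t. x
y = f(x)
print(y.value, y.deriv)  # 10.0 13.0  (f(2) = 10, f'(2) = 13)
```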

Further Reading

  • Numba documentation and tutorials

    Numba documentation (https://numba.readthedocs.io/)

    The official Numba documentation covers all decorators, supported Python and NumPy features, CUDA programming, and performance tips.

  • JAX documentation

    JAX documentation (https://jax.readthedocs.io/)

    Comprehensive guide to JAX's functional transformations, XLA compilation, and the growing ecosystem (Flax, Optax, Haiku).

  • Cython for static compilation

    K. W. Smith, *Cython: A Guide for Python Programmers*, O'Reilly, 2015

    Cython offers an alternative to Numba: you annotate Python code with C types and compile it ahead of time to a C extension. This gives more control than Numba, at the cost of a separate compilation step.

  • High-performance Python

    M. Gorelick and I. Ozsvald, *High Performance Python*, 2nd ed., O'Reilly, 2020

    Covers profiling, Cython, Numba, multiprocessing, and distributed computing with a focus on practical optimization strategies.