References & Further Reading

References

  1. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, et al., *PyTorch: An Imperative Style, High-Performance Deep Learning Library*, NeurIPS, 2019

    The foundational paper describing PyTorch's design philosophy: define-by-run autograd, eager execution, and the tensor abstraction. Essential reading for understanding why PyTorch works the way it does; a short define-by-run sketch follows this list.

  2. A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, *Automatic Differentiation in Machine Learning: A Survey*, Journal of Machine Learning Research, 2018

    Comprehensive survey of forward-mode and reverse-mode automatic differentiation. Explains why reverse mode (backpropagation) is optimal for scalar-output functions with many parameters: one reverse pass yields the entire gradient, whereas forward mode needs one pass per input dimension. The two modes are contrasted in a sketch after this list.

  3. D. H. Brandwood, *A Complex Gradient Operator and Its Application in Adaptive Array Theory*, IEE Proceedings F, 1983

    The paper that brought Wirtinger calculus to engineering. Shows that the conjugate Wirtinger derivative is the correct gradient for optimizing real-valued functions of complex variables; a sketch after this list shows the convention at work in PyTorch.

  4. PyTorch Contributors, *torch.linalg Documentation*, 2025. [Link]

    Official reference for all torch.linalg functions, including SVD, eigendecomposition, solve, Cholesky, and their batched variants; the batching convention is sketched after this list.

  5. DMLC Community, *DLPack: Open In Memory Tensor Structure*, 2023. [Link]

    The specification for the DLPack tensor exchange protocol. Describes the memory layout contract that enables zero-copy sharing between NumPy, PyTorch, CuPy, JAX, and TensorFlow; the zero-copy behavior is checked in a sketch after this list.

  6. Consortium for Python Data API Standards, *Python Array API Standard*, 2023. [Link]

    The specification for a common array API across Python libraries. Enables backend-agnostic code that works with NumPy, PyTorch, CuPy, and JAX without modification; a portable function is sketched after this list.
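
To make reference 1's define-by-run point concrete, here is a minimal sketch (any recent PyTorch build should suffice). The autograd graph is whatever the Python code actually executes, so data-dependent loops and branches are differentiated without any special machinery:

    import torch

    x = torch.tensor(3.0, requires_grad=True)

    # Define-by-run: the graph is recorded as this code runs, so the
    # data-dependent loop and branch below simply become part of it.
    y = x
    while y.norm() < 10:          # loop length depends on the data
        y = y * 2
    z = y - 1 if y > 0 else -y    # branch depends on the data

    z.backward()                  # reverse-mode AD over the recorded graph
    print(x.grad)                 # tensor(4.), since z = 4*x - 1 at x = 3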
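
A small sketch of the contrast surveyed in reference 2, assuming the torch.func API from PyTorch 2.x: reverse mode recovers a million-entry gradient in one pass, while each forward-mode pass yields only a single directional derivative:

    import torch

    def loss(w):                      # scalar output, many parameters
        return (w ** 2).sum()

    w = torch.randn(1_000_000)

    # Reverse mode: ONE pass yields the full gradient.
    grad = torch.func.grad(loss)(w)   # same shape as w

    # Forward mode: one pass yields ONE directional derivative, so
    # assembling the full gradient this way takes a pass per input.
    v = torch.randn_like(w)
    _, directional = torch.func.jvp(loss, (w,), (v,))
    print(grad.shape, directional)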
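
Brandwood's result (reference 3) can be observed directly in PyTorch's complex autograd; a minimal check with f(z) = |z|^2:

    import torch

    z = torch.tensor(2.0 + 3.0j, requires_grad=True)
    loss = (z * z.conj()).real    # f(z) = |z|^2, a real-valued loss
    loss.backward()

    # The conjugate Wirtinger derivative is df/dz* = z. The gradient
    # PyTorch stores points along it (here 2z), so plain gradient
    # descent, z -= lr * z.grad, moves downhill exactly as Brandwood's
    # analysis prescribes.
    print(z.grad)                 # tensor(4.+6.j), i.e. 2z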
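
The batched variants in reference 4 all follow one convention: leading dimensions are batch dimensions. A quick sketch:

    import torch

    # Eight independent 4x4 systems, solved in a single call.
    A = torch.randn(8, 4, 4) + 4 * torch.eye(4)   # diagonal shift improves
    b = torch.randn(8, 4)                         # conditioning
    x = torch.linalg.solve(A, b)                  # shape (8, 4)

    # The same batching convention applies to factorizations.
    U, S, Vh = torch.linalg.svd(A)
    print(x.shape, S.shape)                       # (8, 4) and (8, 4)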
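
The zero-copy contract specified in reference 5 can be checked in a few lines (NumPy >= 1.22 and a recent PyTorch are assumed). The two objects below view one buffer, so writes through either side are visible through the other:

    import numpy as np
    import torch

    a = np.arange(6.0)
    t = torch.from_dlpack(a)    # NumPy -> PyTorch, no bytes copied
    t[0] = 42.0
    print(a[0])                 # 42.0 -- same underlying memory

    b = np.from_dlpack(t)       # PyTorch -> NumPy, also zero-copy
    b[1] = -1.0
    print(t[1])                 # tensor(-1., dtype=torch.float64)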
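
A sketch of the backend-agnostic style reference 6 enables. Because native __array_namespace__ support still varies across libraries, the third-party array-api-compat shim is assumed here:

    import numpy as np
    import torch
    from array_api_compat import array_namespace  # pip install array-api-compat

    def standardize(x):
        # Look up the Array API namespace of whichever library made x,
        # then use only standard functions: one definition, any backend.
        xp = array_namespace(x)
        return (x - xp.mean(x)) / xp.std(x)

    print(standardize(np.arange(5.0)))     # NumPy in, NumPy out
    print(standardize(torch.arange(5.0)))  # Tensor in, Tensor out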

Further Reading

  • PyTorch internals and the dispatcher

    E. Z. Yang, *PyTorch Internals* (blog series), 2019

    Deep dive into how PyTorch dispatches operations to different backends (CPU, CUDA, MPS) and how autograd integrates with the dispatcher. Essential background for reasoning about performance; a toy illustration of device dispatch follows this list.

  • Complex-valued neural networks

    C. Trabelsi et al., *Deep Complex Networks*, ICLR 2018

    Extends deep learning to complex-valued parameters and activations. Uses the Wirtinger calculus framework that PyTorch implements in its complex autograd; a tiny complex training loop is sketched after this list.

  • Differentiable programming for scientific computing

    M. Innes et al., *A Differentiable Programming System to Bridge Machine Learning and Scientific Computing*, arXiv:1907.07587

    Argues that automatic differentiation should be a first-class tool in scientific computing, not just deep learning. PyTorch's autograd is one implementation of this vision; a toy differentiable simulation follows this list.

  • GPU-accelerated linear algebra

    NVIDIA cuSOLVER documentation

    PyTorch's GPU linear algebra (SVD, eigh, solve) is built largely on cuSOLVER, with MAGMA available as an alternative backend. Understanding its algorithms helps predict performance and numerical behavior; a guarded sketch follows this list.
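
A toy illustration of the dispatch machinery Yang describes: one Python call site, different kernels chosen from the tensors' device keys. The CUDA branch is guarded since a GPU may not be present:

    import torch

    a = torch.ones(3)
    print((a + a).device)        # the dispatcher chose the CPU kernel

    if torch.cuda.is_available():
        b = a.to("cuda")
        print((b + b).device)    # same call site, CUDA kernel this time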
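
In the spirit of complex-valued networks, a minimal sketch: a hand-rolled complex linear map trained with a stock optimizer, which works because PyTorch's complex gradients follow the conjugate Wirtinger convention. Shapes, learning rate, and step count here are illustrative:

    import torch

    W = torch.randn(4, 4, dtype=torch.cfloat, requires_grad=True)
    z = torch.randn(8, 4, dtype=torch.cfloat)
    target = torch.randn(8, 4, dtype=torch.cfloat)

    opt = torch.optim.SGD([W], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        loss = (z @ W.T - target).abs().pow(2).mean()  # real-valued loss
        loss.backward()    # conjugate-Wirtinger gradients land in W.grad
        opt.step()
    print(loss.item())     # decreases toward the least-squares optimum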
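
A toy instance of the differentiable-programming argument: gradients flow through an entire Euler integrator, so a physical parameter can be tuned by ordinary gradient descent. (Autograd treats the discrete stopping rule as constant; all numbers here are illustrative.)

    import torch

    angle = torch.tensor(0.7, requires_grad=True)

    def shot_range(angle, v0=10.0, dt=0.01, g=9.81):
        # Euler-integrate a projectile until it lands; every step is
        # an ordinary differentiable tensor operation.
        vx, vy = v0 * torch.cos(angle), v0 * torch.sin(angle)
        x, y = torch.tensor(0.0), torch.tensor(0.0)
        while y >= 0:
            x, y = x + vx * dt, y + vy * dt
            vy = vy - g * dt
        return x

    r = shot_range(angle)
    r.backward()           # d(range)/d(angle), through the whole loop
    print(r.item(), angle.grad.item())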
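
A guarded sketch of the cuSOLVER-backed path: on a CUDA tensor, torch.linalg.eigh typically routes to a cuSOLVER symmetric eigensolver, and the backend choice can be steered explicitly:

    import torch

    if torch.cuda.is_available():
        A = torch.randn(512, 512, device="cuda")
        A = A @ A.T + 512 * torch.eye(512, device="cuda")  # make A SPD

        w, V = torch.linalg.eigh(A)    # CUDA input -> cuSOLVER eigensolver

        # Steer between cuSOLVER and MAGMA for subsequent calls:
        torch.backends.cuda.preferred_linalg_library("cusolver")
        L = torch.linalg.cholesky(A)
        print(w.shape, L.shape)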