References & Further Reading
References
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, et al., *PyTorch: An Imperative Style, High-Performance Deep Learning Library*, NeurIPS, 2019.
The foundational paper describing PyTorch's design philosophy: define-by-run autograd, eager execution, and the tensor abstraction. Essential reading for understanding why PyTorch works the way it does.
- A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, *Automatic Differentiation in Machine Learning: A Survey*, Journal of Machine Learning Research, 2018.
Comprehensive survey of forward-mode and reverse-mode automatic differentiation. Explains why reverse-mode (backpropagation) is optimal for scalar-output functions with many parameters.
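The key asymmetry the survey explains can be seen directly in PyTorch: for a scalar loss, one reverse-mode sweep (`backward()`) yields the gradient with respect to every parameter at once, whereas forward mode would need one pass per parameter. A minimal sketch:

```python
import torch

# Scalar loss of many parameters: a single reverse-mode pass fills in
# the gradient for all 10,000 parameters simultaneously.
params = torch.randn(10_000, requires_grad=True)
x = torch.randn(10_000)

loss = torch.sum((params - x) ** 2)   # scalar output
loss.backward()                       # one reverse sweep

# Analytic gradient of sum((p - x)^2) is 2 * (p - x).
assert torch.allclose(params.grad, 2 * (params - x))
```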
- D. H. Brandwood, *A Complex Gradient Operator and Its Application in Adaptive Array Theory*, IEE Proceedings F, 1983.
The paper that brought Wirtinger calculus to engineering. Shows that the conjugate Wirtinger derivative is the correct gradient for optimizing real-valued functions of complex variables.
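PyTorch's complex autograd can be checked against Brandwood's result numerically. For the real-valued loss |z|², the conjugate Wirtinger derivative ∂L/∂z̄ is z, and the gradient PyTorch returns for descent on the real and imaginary parts works out to 2z; a small sketch:

```python
import torch

# Real-valued loss of a complex parameter.
z = torch.tensor(3.0 + 4.0j, requires_grad=True)
loss = (z * z.conj()).real   # |z|^2 = x^2 + y^2
loss.backward()

# dL/dx + i*dL/dy = 2x + 2iy = 2z, i.e. twice the conjugate
# Wirtinger derivative dL/dz̄ = z.
assert torch.allclose(z.grad, 2 * z)
```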
- PyTorch Contributors, *torch.linalg Documentation*, 2025. [Link]
Official reference for all torch.linalg functions including SVD, eigendecomposition, solve, Cholesky, and their batched variants.
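A minimal sketch of the batched variants mentioned above: `torch.linalg` routines broadcast over leading batch dimensions, so a stack of small systems is solved in one call.

```python
import torch

# Batch of 8 well-conditioned 3x3 systems, solved in a single call.
A = torch.randn(8, 3, 3) + 3 * torch.eye(3)
b = torch.randn(8, 3)
x = torch.linalg.solve(A, b)   # solves A[i] @ x[i] = b[i] for each i

assert x.shape == (8, 3)
# Residual check: A[i] @ x[i] should reproduce b[i].
assert torch.allclose(torch.einsum('bij,bj->bi', A, x), b, atol=1e-4)
```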
- DMLC Community, *DLPack: Open In Memory Tensor Structure*, 2023. [Link]
The specification for the DLPack tensor exchange protocol. Describes the memory layout contract that enables zero-copy sharing between NumPy, PyTorch, CuPy, JAX, and TensorFlow.
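The zero-copy contract is easy to verify on CPU: the consumer wraps the producer's memory rather than copying it, so writes are visible on both sides. A sketch using NumPy and PyTorch:

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32)
t = torch.from_dlpack(a)      # NumPy -> PyTorch, no copy on CPU

t[0] = 99.0                   # mutate through the torch view...
assert a[0] == 99.0           # ...and the NumPy array sees the change

back = np.from_dlpack(t)      # PyTorch -> NumPy, also zero-copy
assert back[0] == 99.0
```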
- Consortium for Python Data API Standards, *Python Array API Standard*, 2023. [Link]
The specification for a common array API across Python libraries. Enables backend-agnostic code that works with NumPy, PyTorch, CuPy, and JAX without modification.
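The idea can be sketched by passing the array namespace in explicitly, so the same function body runs against any library exposing the standard functions. (Production code would typically obtain the namespace via the standard's `__array_namespace__` protocol or the array-api-compat package rather than as a parameter; this is just an illustration.)

```python
import numpy as np

def standardize(x, xp):
    # xp is whichever array namespace the caller supplies
    # (numpy, torch, cupy, ...); only standard functions are used.
    return (x - xp.mean(x)) / xp.std(x)

data = np.array([1.0, 2.0, 3.0, 4.0])
out = standardize(data, np)
assert abs(float(np.mean(out))) < 1e-12   # standardized to zero mean
```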
Further Reading
PyTorch internals and the dispatcher
- E. Z. Yang, *PyTorch Internals* (blog series), 2019.
Deep dive into how PyTorch dispatches operations to different backends (CPU, CUDA, MPS) and how autograd integrates with the dispatcher. Invaluable for reasoning about performance.
Complex-valued neural networks
- C. Trabelsi et al., *Deep Complex Networks*, ICLR, 2018.
Extends deep learning to complex-valued parameters and activations. Uses the Wirtinger calculus framework that PyTorch implements in its complex autograd.
Differentiable programming for scientific computing
- M. Innes et al., *A Differentiable Programming System to Bridge Machine Learning and Scientific Computing*, arXiv:1907.07587, 2019.
Argues that automatic differentiation should be a first-class tool in scientific computing, not just deep learning. PyTorch's autograd is one implementation of this vision.
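A toy sketch of that vision: backpropagating through an explicit Euler integration of dx/dt = -k·x to obtain the sensitivity of the final state to the decay rate k, with no hand-derived adjoint.

```python
import math
import torch

k = torch.tensor(0.5, requires_grad=True)   # decay rate
x = torch.tensor(1.0)                       # initial state
dt, steps = 0.01, 100                       # integrate to T = 1

for _ in range(steps):
    x = x + dt * (-k * x)   # Euler step, recorded in the autograd graph

x.backward()                # d(x_final)/dk via reverse mode

# x_final = (1 - k*dt)^steps ~ exp(-k*T), so the sensitivity should be
# close to -T * exp(-k*T) = -exp(-0.5) at T = 1.
assert abs(float(k.grad) + math.exp(-0.5)) < 0.01
```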
GPU-accelerated linear algebra
- NVIDIA, *cuSOLVER Documentation*.
PyTorch's GPU linear algebra routines (SVD, eigh, solve) are built largely on cuSOLVER. Understanding its algorithms helps predict their performance and numerical behavior.
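A device-agnostic sketch: the same `torch.linalg.eigh` call is dispatched to a cuSOLVER-backed kernel on CUDA devices and to a LAPACK backend on CPU, so the snippet below runs either way (it falls back to CPU when no GPU is available).

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

A = torch.randn(4, 4, device=device)
A = A + A.T                     # eigh requires a symmetric matrix
w, V = torch.linalg.eigh(A)

# Eigenvalues come back in ascending order, and V diagonalizes A.
assert torch.all(w[:-1] <= w[1:])
assert torch.allclose(V @ torch.diag(w) @ V.T, A, atol=1e-4)
```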