References & Further Reading
References
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016
  The comprehensive deep learning textbook. Chapters 6-8 cover feed-forward networks, regularization, and optimization in depth.
- A. Paszke et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library, NeurIPS, 2019
  The original PyTorch paper describing the design philosophy, autograd engine, and performance characteristics.
- K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, ICCV, 2015
  Introduces Kaiming (He) initialization for ReLU networks, showing that proper initialization is critical for training very deep networks (see the code sketch after this list).
- D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ICLR, 2015
  The Adam optimizer paper, combining adaptive per-parameter learning rates with momentum-style moment estimates (the update rule is sketched after this list).
- I. Loshchilov and F. Hutter, Decoupled Weight Decay Regularization, ICLR, 2019
  Introduces AdamW, which fixes the interaction between weight decay and the adaptive learning rates in Adam (see the code sketch after this list).
- T. O'Shea and J. Hoydis, An Introduction to Deep Learning for the Physical Layer, IEEE Transactions on Cognitive Communications and Networking, 2017
  Pioneering work on end-to-end learning of communication systems using neural network autoencoders.
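For the Kingma and Ba entry, the update rule below is the standard form from the paper: g_t is the gradient at step t, and beta_1, beta_2, alpha, and epsilon are the usual hyperparameters.

```latex
% Adam update for parameters \theta with gradient g_t (Kingma & Ba, 2015)
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t     && \text{first moment (momentum)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2   && \text{second moment (per-parameter scale)} \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad
\hat{v}_t = v_t / (1 - \beta_2^t)                && \text{bias correction} \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr)
\end{aligned}
```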
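For the He et al. and Loshchilov and Hutter entries, here is a minimal PyTorch sketch; the two-layer ReLU model is a hypothetical stand-in, while `kaiming_normal_` and `AdamW` are the standard torch APIs.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer ReLU network, purely for illustration.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Kaiming (He) initialization: scale weight variance by fan-in so that
# activations keep a stable scale through ReLU layers.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

# AdamW: weight decay is applied directly to the parameters instead of
# being folded into the gradient, decoupling it from Adam's adaptive steps.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```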
Further Reading
- PyTorch official tutorials
  https://pytorch.org/tutorials/
  Hands-on tutorials spanning the framework, from tensor basics to distributed training.
- A. Karpathy, A Recipe for Training Neural Networks, 2019
  https://karpathy.github.io/2019/04/25/recipe/
  Practical debugging and training advice from an expert practitioner.
- torch.compile and PyTorch 2.0
  https://pytorch.org/get-started/pytorch-2.0/
  Graph-mode compilation that often yields up to 2x speedups from a one-line change (see the sketch below).
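A minimal sketch of that one-line change, assuming PyTorch 2.0 or later; the toy model and tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)       # any nn.Module works (toy example here)
compiled = torch.compile(model)  # JIT-compiles the model to optimized kernels

x = torch.randn(32, 128)
y = compiled(x)                  # first call triggers compilation;
                                 # later calls reuse the compiled graph
```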