References & Further Reading

References

  1. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016

    The comprehensive deep learning textbook. Chapters 6-8 cover feed-forward networks, regularization, and optimization in depth.

  2. A. Paszke et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library, NeurIPS, 2019

    The original PyTorch paper describing the design philosophy, autograd engine, and performance characteristics.

  3. K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, ICCV, 2015

    Introduces Kaiming initialization for ReLU networks, showing that proper initialization is critical for training very deep networks; a short PyTorch sketch after this list shows it in use.

  4. D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ICLR, 2015

    The Adam optimizer paper, which combines adaptive per-parameter learning rates with momentum.

  5. I. Loshchilov and F. Hutter, Decoupled Weight Decay Regularization, ICLR, 2019

    Introduces AdamW, which fixes the interaction between weight decay and adaptive learning rates in Adam by decoupling the decay from the gradient-based update; see the sketch after this list.

  6. T. O'Shea and J. Hoydis, An Introduction to Deep Learning for the Physical Layer, IEEE Trans. Cognitive Communications and Networking, 2017

    Pioneering work on end-to-end learning of communication systems using neural network autoencoders; a toy sketch of the idea also follows this list.
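
  As a brief illustration of how references 3 through 5 show up in everyday PyTorch code, the sketch below applies Kaiming (He) initialization to the linear layers of a small ReLU network and optimizes it with AdamW's decoupled weight decay. The layer sizes and hyperparameters are illustrative placeholders, not values taken from the papers.

      import torch
      import torch.nn as nn

      # A small ReLU MLP; the layer sizes are arbitrary placeholders.
      model = nn.Sequential(
          nn.Linear(128, 256), nn.ReLU(),
          nn.Linear(256, 256), nn.ReLU(),
          nn.Linear(256, 10),
      )

      # Kaiming (He) initialization for the ReLU layers (ref. 3).
      for module in model.modules():
          if isinstance(module, nn.Linear):
              nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
              nn.init.zeros_(module.bias)

      # AdamW: Adam-style adaptive steps with decoupled weight decay (refs. 4 and 5).
      optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)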
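
  Reference 6 treats the whole transmit-channel-receive chain as an autoencoder trained end to end. The toy sketch below is a minimal illustration of that idea, not the architecture from the paper: the message size, layer widths, and noise level are placeholders, and the channel is a simple AWGN model.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      K, N = 4, 8        # placeholder: 2**K messages sent over N real channel uses
      M = 2 ** K

      class CommAutoencoder(nn.Module):
          def __init__(self):
              super().__init__()
              self.encoder = nn.Sequential(nn.Linear(M, 64), nn.ReLU(), nn.Linear(64, N))
              self.decoder = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, M))

          def forward(self, msg_onehot, noise_std=0.1):
              x = self.encoder(msg_onehot)
              # Normalize to unit average power before "transmission".
              x = x / x.pow(2).mean(dim=1, keepdim=True).sqrt()
              y = x + noise_std * torch.randn_like(x)   # AWGN channel
              return self.decoder(y)                    # logits over the M messages

      model = CommAutoencoder()
      optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
      msgs = torch.randint(0, M, (256,))                # random message indices
      logits = model(F.one_hot(msgs, M).float())
      loss = F.cross_entropy(logits, msgs)              # learn to recover the sent message
      loss.backward()
      optimizer.step()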

Further Reading

  • PyTorch official tutorials

    https://pytorch.org/tutorials/

    Hands-on tutorials covering all aspects of PyTorch.

  • Andrej Karpathy — A Recipe for Training Neural Networks

    https://karpathy.github.io/2019/04/25/recipe/

    Practical debugging and training advice from an expert practitioner.

  • torch.compile and PyTorch 2.0

    https://pytorch.org/get-started/pytorch-2.0/

    Graph-mode compilation for 2x+ speedups with a one-line change; a minimal usage sketch follows.
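
  A minimal sketch of that one-line change, assuming PyTorch 2.x is installed; the tiny model and input shapes here are just placeholders.

      import torch

      model = torch.nn.Linear(128, 10)           # any nn.Module or plain function works
      compiled = torch.compile(model)            # compiles on first call (PyTorch 2.x)
      out = compiled(torch.randn(32, 128))       # later calls reuse the compiled graph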