References & Further Reading

References

  1. S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, 1997

    The original LSTM paper, introducing gated memory cells to mitigate the vanishing gradient problem.

  2. K. Cho et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP, 2014

    Introduces the GRU cell and the RNN encoder-decoder architecture underlying sequence-to-sequence (Seq2Seq) models.

  3. S. Bai, J. Z. Kolter, and V. Koltun, An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, arXiv:1803.01271, 2018

    Shows that temporal convolutional networks (TCNs) often outperform LSTMs and GRUs on standard sequence-modeling benchmarks.

  4. A. Graves, A. Mohamed, and G. Hinton, Speech Recognition with Deep Recurrent Neural Networks, ICASSP, 2013

    Demonstrates deep bidirectional LSTMs for speech recognition, achieving then-state-of-the-art results on the TIMIT benchmark.

Further Reading

  • Understanding LSTM Networks

    Christopher Olah's blog post (colah.github.io)

    A widely recommended visual explanation of LSTM gates and information flow.

  • TCN reference implementation

    https://github.com/locuslab/TCN

    The authors' reference implementation (in PyTorch) of the temporal convolutional networks evaluated in reference [3].