References & Further Reading
References
- S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, 1997
The original LSTM paper, addressing the vanishing gradient problem in recurrent networks (the cell equations are sketched after this list).
- K. Cho et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP, 2014
Introduces the GRU cell and the encoder-decoder (Seq2Seq) architecture.
- S. Bai, J. Z. Kolter, and V. Koltun, An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, arXiv:1803.01271, 2018
Shows that a simple convolutional architecture, the temporal convolutional network (TCN), often outperforms canonical recurrent networks such as LSTMs on standard sequence-modeling benchmarks.
- A. Graves, A. Mohamed, and G. Hinton, Speech Recognition with Deep Recurrent Neural Networks, ICASSP, 2013
Demonstrates deep LSTM architectures achieving state-of-the-art results in speech recognition.
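For quick reference, here is a sketch of the LSTM cell equations in their modern form, the version covered by the sources above and below (note the forget gate was a later addition to the 1997 design, and notation varies across sources):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive cell-state update (rather than a repeated matrix multiplication) is what lets gradients flow across many time steps, which is the sense in which the original paper addresses vanishing gradients.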
Further Reading
- Understanding LSTM Networks, Christopher Olah (colah.github.io)
A widely read visual explanation of LSTM gates and information flow.
- TCN reference implementation, locuslab (https://github.com/locuslab/TCN)
Reference implementation of temporal convolutional networks; a minimal sketch of its core causal convolution follows below.
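To make the TCN building block concrete, below is a minimal PyTorch sketch of the causal, dilated 1-D convolution that the Bai et al. paper and the locuslab/TCN repository are built around. This is a simplified illustration, not the repository's actual TemporalBlock (which additionally uses weight normalization, dropout, and residual connections); the class name CausalConv1d is ours.

```python
# Minimal sketch of a causal, dilated 1-D convolution, the core TCN operation.
# Simplified relative to locuslab/TCN: no weight norm, dropout, or residuals.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution whose output at time t depends only on inputs at times <= t."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Pad by (k-1)*d; Conv1d pads both sides, so we trim the right afterwards.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        out = self.conv(x)
        # Trim the right side so no output position sees future time steps.
        return out[:, :, :-self.pad] if self.pad else out

# Usage: stacking blocks with exponentially growing dilation (1, 2, 4, ...)
# gives a receptive field that grows exponentially with depth.
x = torch.randn(8, 16, 100)                      # batch 8, 16 channels, 100 steps
block = CausalConv1d(16, 32, kernel_size=3, dilation=2)
print(block(x).shape)                            # torch.Size([8, 32, 100])
```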