Chapter Summary
Key Points
1. LSTM is the workhorse recurrent cell. The cell state acts as a gradient highway, enabling learning of long-range dependencies. Use a GRU when you want fewer parameters with similar performance (parameter comparison sketched below).
2. Always pack variable-length sequences. Call pack_padded_sequence before the LSTM and pad_packed_sequence after it. This avoids wasting computation on padding and prevents padded positions from corrupting the hidden state (see the packing sketch below).
3. Seq2Seq encodes the input into a context vector, and the decoder generates the output autoregressively. Teacher forcing accelerates training but creates exposure bias; beam search improves inference quality (a teacher-forcing loop is sketched below).
4. TCNs offer parallel sequence processing. Causal dilated convolutions process all time steps simultaneously, enabling much faster training than LSTMs, and the receptive field grows exponentially with depth (see the causal convolution sketch below).
5. Gradient clipping is essential for RNN training, because recurrent connections can amplify gradient magnitudes across time steps. Always use clip_grad_norm_ with a max_norm of roughly 1-5 (clipping order shown below).
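To illustrate point 1, a minimal sketch comparing parameter counts of an LSTM and a GRU with the same dimensions. The sizes here are arbitrary, chosen only for the comparison:

```python
import torch.nn as nn

# Same dimensions for both cells; the sizes are arbitrary.
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2, batch_first=True)
gru = nn.GRU(input_size=128, hidden_size=256, num_layers=2, batch_first=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"LSTM parameters: {count(lstm):,}")  # 4 gate weight sets per layer
print(f"GRU parameters:  {count(gru):,}")   # 3 gate weight sets per layer, roughly 25% fewer
```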
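For point 2, a minimal packing sketch; the batch, lengths, and layer sizes are made up for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

# Padded batch of 3 sequences whose true lengths are 5, 3, and 2.
batch = torch.randn(3, 5, 16)
lengths = torch.tensor([5, 3, 2])

# Pack so the LSTM skips padded time steps entirely.
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor for downstream layers.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # torch.Size([3, 5, 32])
```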
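For point 3, a sketch of a teacher-forced decoding loop. The toy decoder (embed, decoder, proj), the vocabulary size, and the 0.5 ratio are illustrative assumptions, not the chapter's model:

```python
import torch
import torch.nn as nn

# Hypothetical toy decoder; sizes and names are illustrative only.
embed = nn.Embedding(1000, 64)       # vocab of 1000, embedding dim 64
decoder = nn.GRUCell(64, 128)
proj = nn.Linear(128, 1000)
criterion = nn.CrossEntropyLoss()

target = torch.randint(0, 1000, (8, 12))  # (batch, target_len); position 0 plays the role of <sos>
hidden = torch.zeros(8, 128)              # in a real model: the encoder's context vector
teacher_forcing_ratio = 0.5

inp, loss = target[:, 0], 0.0
for t in range(1, target.size(1)):
    hidden = decoder(embed(inp), hidden)
    logits = proj(hidden)
    loss = loss + criterion(logits, target[:, t])
    # Teacher forcing: feed the ground-truth token rather than the model's own prediction.
    if torch.rand(1).item() < teacher_forcing_ratio:
        inp = target[:, t]
    else:
        inp = logits.argmax(dim=-1)
```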
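For point 4, a minimal causal dilated convolution stack; the channel count, kernel size, and four dilation levels are arbitrary choices for the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """One causal dilated convolution: the output at time t sees only inputs up to t."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                   # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))    # pad only on the left, never into the future
        return self.conv(x)

# Dilations 1, 2, 4, 8 grow the receptive field exponentially with depth.
tcn = nn.Sequential(*[CausalConv1d(64, kernel_size=3, dilation=2 ** i) for i in range(4)])
x = torch.randn(8, 64, 100)   # the whole sequence is processed in parallel
print(tcn(x).shape)           # torch.Size([8, 64, 100])
```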
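For point 5, the clipping call goes between backward() and step(); the tiny model and random data here are placeholders:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

x, target = torch.randn(4, 10, 16), torch.randn(4, 10, 32)
output, _ = model(x)
loss = criterion(output, target)

optimizer.zero_grad()
loss.backward()
# Clip after backward() and before step(); a max_norm around 1-5 is typical.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```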
Looking Ahead
Chapter 30 introduces attention mechanisms that allow models to selectively focus on relevant parts of the input, overcoming the bottleneck of compressing the entire sequence into a fixed-size vector.