Chapter Summary

Key Points

  1. LSTM is the workhorse recurrent cell. The cell state acts as a gradient highway, enabling the network to learn long-range dependencies. Use a GRU when you want fewer parameters with similar performance; the first sketch after this list compares the parameter counts.

  2. Always pack variable-length sequences. Call pack_padded_sequence before the LSTM and pad_packed_sequence after it. Packing avoids wasted computation on padding and keeps the final hidden states from being corrupted by padded time steps, as shown in the packing sketch below.

  3. Seq2Seq encodes the input into a context vector, and the decoder generates the output autoregressively. Teacher forcing accelerates training but creates exposure bias; beam search improves inference quality. A teacher-forcing sketch follows the list.

  4. TCNs offer parallel sequence processing. Causal dilated convolutions process all time steps simultaneously, enabling much faster training than LSTMs, and the receptive field grows exponentially with depth when the dilation doubles at each layer (see the causal-convolution sketch below).

  5. Gradient clipping is essential for RNN training. Repeated multiplication through the recurrent connections can blow up gradient magnitudes, so always apply clip_grad_norm_ with max_norm around 1 to 5, as in the clipping sketch below.
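
The sketches below expand on the key points in order. First, the LSTM-versus-GRU comparison from point 1: the layer sizes are arbitrary, but the four-gate versus three-gate structure is what drives the parameter difference.

```python
import torch.nn as nn

# Same input and hidden sizes; only the cell type differs (sizes are illustrative).
lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(lstm))  # 395264 -- four gates
print(count(gru))   # 296448 -- three gates, roughly 25% fewer parameters
```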
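
For point 2, a minimal packing round trip around an LSTM. The batch shape and sequence lengths are made up for illustration; the utilities are the standard ones from torch.nn.utils.rnn.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size, max_len, input_size, hidden_size = 4, 10, 8, 16  # illustrative sizes
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

# Padded batch plus the true (unpadded) length of each sequence.
x = torch.randn(batch_size, max_len, input_size)
lengths = torch.tensor([10, 7, 5, 3])

# Pack so the LSTM skips padded time steps entirely.
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor for downstream layers.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # torch.Size([4, 10, 16])
print(h_n.shape)  # torch.Size([1, 4, 16]) -- last *real* hidden state per sequence
```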
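
For point 3, a sketch of a decoder loop trained with teacher forcing. The encoder is stubbed out with zero initial states and all sizes are placeholders; the important line is the one that feeds the gold token back in instead of the model's own prediction.

```python
import torch
import torch.nn as nn

vocab_size, emb_size, hidden_size, batch = 1000, 64, 128, 4  # placeholder sizes
embed = nn.Embedding(vocab_size, emb_size)
decoder = nn.LSTMCell(emb_size, hidden_size)
out_proj = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the encoder's context; a real Seq2Seq model would set (h, c)
# from the encoder's final states.
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
target = torch.randint(0, vocab_size, (batch, 12))  # gold output tokens

loss = 0.0
inp = target[:, 0]  # <sos> stand-in
for t in range(1, target.size(1)):
    h, c = decoder(embed(inp), (h, c))
    loss = loss + loss_fn(out_proj(h), target[:, t])
    inp = target[:, t]  # teacher forcing: feed the gold token, not the prediction
loss = loss / (target.size(1) - 1)
loss.backward()
```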
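
For point 4, a sketch of the causal dilated convolution that a TCN stacks. The CausalConv1d module and its sizes are hypothetical; left-padding by (kernel_size - 1) * dilation keeps the convolution causal, and doubling the dilation per layer makes the receptive field grow exponentially with depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution whose output at time t sees only inputs at times <= t."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

# Dilations 1, 2, 4, 8 with kernel size 3 give a receptive field of 31 steps,
# and every time step is processed in parallel.
tcn = nn.Sequential(*[CausalConv1d(16, kernel_size=3, dilation=2**i) for i in range(4)])
x = torch.randn(8, 16, 100)
print(tcn(x).shape)  # torch.Size([8, 16, 100])
```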
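
Finally, for point 5, gradient clipping applied to a single training step. The toy model, data, and learning rate are placeholders; the essential detail is that clip_grad_norm_ runs after backward() and before optimizer.step().

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

# Toy LSTM classifier; all sizes are illustrative.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20, 8)      # (batch, time, features)
y = torch.randint(0, 2, (32,))  # class labels

optimizer.zero_grad()
out, _ = lstm(x)
loss = loss_fn(head(out[:, -1]), y)  # classify from the last time step
loss.backward()

# Rescale gradients so their global norm is at most max_norm, then step.
clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```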

Looking Ahead

Chapter 30 introduces attention mechanisms that allow models to selectively focus on relevant parts of the input, overcoming the bottleneck of compressing the entire sequence into a fixed-size vector.