Sequence-to-Sequence and Beam Search
Definition: Sequence-to-Sequence Architecture
Seq2Seq uses an encoder LSTM to compress the input sequence into a fixed-size context vector (the encoder's final hidden and cell states); a decoder LSTM, initialised from that context, then generates the output sequence autoregressively, one token per step:
```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # Encoder reads the input and summarises it in its final states.
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Decoder consumes previous output tokens, seeded with encoder states.
        self.decoder = nn.LSTM(output_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)
```
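A minimal sketch of how this model can be run end-to-end with teacher forcing. The `forward` signature, the toy dimensions, and the use of pre-embedded target vectors as decoder input are illustrative assumptions, not part of the original definition:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(output_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, src, tgt):
        # Encode: keep only the final (hidden, cell) states as the context.
        _, (h, c) = self.encoder(src)
        # Decode with teacher forcing: feed the ground-truth target sequence.
        out, _ = self.decoder(tgt, (h, c))
        return self.fc(out)  # per-step logits, shape (batch, tgt_len, output_dim)

model = Seq2Seq(input_dim=8, hidden_dim=16, output_dim=5)
src = torch.randn(2, 10, 8)  # batch of 2, source length 10
tgt = torch.randn(2, 7, 5)   # target length 7, already embedded
logits = model(src, tgt)     # shape (2, 7, 5)
```

At inference time there is no ground-truth `tgt`, so the decoder would instead be unrolled one step at a time, feeding back its own predictions.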
Definition: Beam Search Decoding
Beam search maintains the k most promising candidate sequences (beams) at each step, expanding each with all possible next tokens, then keeping the top k by cumulative log-probability.
A beam width of k = 1 reduces to greedy search. Typical beam widths are small, e.g. k = 4-10.
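The procedure above can be sketched in plain Python. Here `beam_search`, `step_log_probs`, and the toy model are illustrative names, and the scoring function is a stand-in for a real decoder's next-token distribution:

```python
import math

def beam_search(step_log_probs, k, max_len, bos=0, eos=None):
    """step_log_probs(seq) -> {next_token: log_prob}, conditioned on the
    partial sequence `seq` (a tuple of token ids)."""
    beams = [((bos,), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if eos is not None and seq[-1] == eos:
                candidates.append((seq, score))  # finished beam: carry over
                continue
            # Expand this beam with every possible next token.
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Prune: keep only the top k by cumulative log-probability.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams

# Toy model: token 1 is always more likely than token 2.
toy = lambda seq: {1: math.log(0.7), 2: math.log(0.3)}
best_seq, best_score = beam_search(toy, k=2, max_len=3)[0]
```

Because scores are summed log-probabilities, longer sequences accumulate lower scores; practical implementations often add length normalisation.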
Example: Teacher Forcing vs Free Running
Compare training with teacher forcing (feeding ground truth) vs free running (feeding model predictions).
Solution
Trade-off
Teacher forcing converges faster but creates exposure bias: the model never sees its own errors during training, so mistakes compound at inference time. Scheduled sampling mitigates this by gradually transitioning from teacher forcing to free running over the course of training.
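A minimal sketch of scheduled sampling, assuming a per-step coin flip and an exponential decay schedule (both the function names and the decay rate are illustrative choices):

```python
import random

def decode_step_input(ground_truth_tok, model_pred_tok, teacher_prob):
    """With probability `teacher_prob`, feed the ground-truth token
    (teacher forcing); otherwise feed the model's own prediction."""
    return ground_truth_tok if random.random() < teacher_prob else model_pred_tok

def teacher_prob_schedule(step, decay=0.999):
    """Decays from 1.0 (pure teacher forcing) toward 0.0 (free running)."""
    return decay ** step
```

Early in training `teacher_prob` is near 1, so the decoder mostly sees ground truth; as it decays, the decoder increasingly trains on its own predictions, narrowing the train/inference gap.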