Sequence-to-Sequence and Beam Search

Definition:

Sequence-to-Sequence Architecture

Seq2Seq uses an encoder LSTM to compress the input sequence into a context vector $\mathbf{c} = \mathbf{h}_T^{\text{enc}}$, then a decoder LSTM generates the output autoregressively:

$$\mathbf{h}_t^{\text{dec}} = \text{LSTM}(\mathbf{y}_{t-1}, \mathbf{h}_{t-1}^{\text{dec}}), \qquad \mathbf{y}_t = \text{Linear}(\mathbf{h}_t^{\text{dec}})$$

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # Encoder reads the input sequence; its final (h, c) is the context
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Decoder consumes the previous output step, seeded with the context
        self.decoder = nn.LSTM(output_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)
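
The class above defines only the layers. A minimal inference sketch shows how the decoder recurrence is unrolled, assuming continuous outputs are fed straight back in; the greedy_decode name and the sos start frame are illustrative, not part of the original:

@torch.no_grad()
def greedy_decode(model, src, max_len, sos):
    # Encode once: the final (h, c) is the context vector c = h_T^enc
    _, (h, c) = model.encoder(src)
    y = sos                                         # (batch, 1, output_dim) start frame
    outputs = []
    for _ in range(max_len):
        dec_out, (h, c) = model.decoder(y, (h, c))  # h_t^dec
        y = model.fc(dec_out)                       # y_t = Linear(h_t^dec)
        outputs.append(y)
    return torch.cat(outputs, dim=1)                # (batch, max_len, output_dim)

For a discrete vocabulary you would instead argmax over y and re-embed the chosen token before feeding it back.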

Example: Teacher Forcing vs Free Running

Compare training with teacher forcing (feeding the ground-truth output at each decoder step) against free running (feeding the model's own previous prediction). Teacher forcing trains faster but introduces exposure bias, since the decoder never sees its own mistakes during training; both modes are sketched below.
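
A sketch of a single decoding pass that interpolates between the two regimes via a teacher_forcing_ratio; the function name, the ratio parameter, and using the first target frame as the start symbol are all assumptions for illustration:

import torch
import torch.nn as nn

def decode_sequence(model, src, tgt, teacher_forcing_ratio=1.0):
    # ratio 1.0 = pure teacher forcing, 0.0 = pure free running
    _, (h, c) = model.encoder(src)
    y = tgt[:, :1]                       # first target frame as start symbol
    preds = []
    for t in range(1, tgt.size(1)):
        dec_out, (h, c) = model.decoder(y, (h, c))
        pred = model.fc(dec_out)
        preds.append(pred)
        # Coin flip per step: feed the ground truth or the model's own output
        if torch.rand(1).item() < teacher_forcing_ratio:
            y = tgt[:, t:t+1]            # teacher forcing
        else:
            y = pred                     # free running
    return torch.cat(preds, dim=1)       # (batch, T-1, output_dim)

# Comparing the two regimes on the same batch (MSE loss assumed):
# loss_tf = nn.MSELoss()(decode_sequence(model, src, tgt, 1.0), tgt[:, 1:])
# loss_fr = nn.MSELoss()(decode_sequence(model, src, tgt, 0.0), tgt[:, 1:])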