Temporal Convolutional Networks (TCN)
Definition: Temporal Convolutional Network (TCN)
TCN uses causal 1D dilated convolutions for sequence modelling:

$$ y_t = \sum_{k=0}^{K-1} w_k \, x_{t - d \cdot k} $$

where $d$ is the dilation rate and $K$ the kernel size. Causal padding ensures $y_t$ depends only on $x_{t'}$ for $t' \le t$. Stacking $L$ layers with exponentially increasing dilation ($d = 2^l$ at layer $l$) makes the receptive field grow exponentially with depth, so the number of layers needed is only logarithmic in the sequence length.
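To make this concrete, the receptive field of such a stack follows directly from the per-layer formula above (a worked calculation, assuming one convolution per layer with kernel size $K$ and dilations $2^0, \dots, 2^{L-1}$):

$$ R = 1 + (K - 1)\sum_{l=0}^{L-1} 2^l = 1 + (K - 1)(2^L - 1) $$

For example, $K = 3$ and $L = 8$ layers give $R = 1 + 2 \cdot 255 = 511$ time steps, whereas the same stack without dilation would cover only $1 + 2 \cdot 8 = 17$.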
```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        # Pad by (K - 1) * d so the output keeps the input length once the
        # extra right-hand (future) samples are trimmed off.
        self.chomp = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.chomp, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv(x)
        if self.chomp > 0:
            out = out[:, :, :-self.chomp]  # remove future samples
        return self.relu(out) + x          # residual connection
```
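A minimal usage sketch, continuing from the block above (the dilation schedule and tensor sizes are illustrative): stack four blocks with dilations 1, 2, 4, 8, then check that the stack is actually causal, i.e. perturbing future inputs leaves earlier outputs untouched.

```python
blocks = nn.Sequential(*[TCNBlock(channels=16, kernel_size=3, dilation=2 ** l)
                         for l in range(4)])

x = torch.randn(1, 16, 100)   # (batch, channels, time)
y = blocks(x)
print(y.shape)                # torch.Size([1, 16, 100]): length preserved

x2 = x.clone()
x2[:, :, 50:] = 0.0           # zero out the "future" half of the input
y2 = blocks(x2)
print(torch.allclose(y[:, :, :50], y2[:, :, :50]))  # True: past outputs unchanged
```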
Example: TCN vs LSTM (When to Use Which)
Compare TCN and LSTM on a sequence prediction task.
Solution
Key differences
- Parallelism: a TCN processes all time steps simultaneously (fast on GPU), while an LSTM must process them sequentially (a sketch follows this list).
- Memory: a TCN's receptive field is fixed at architecture time; an LSTM can, in principle, remember indefinitely.
- Practice: TCNs often match or beat LSTMs on fixed-length sequences and are much faster to train.
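The parallelism difference is easy to see in code. A rough sketch (the layer sizes here are arbitrary, and `nn.LSTM` stands in for any recurrent baseline):

```python
import torch
import torch.nn as nn

seq = torch.randn(32, 64, 1000)   # (batch, channels, time)

# TCN layer: a single convolution pass covers all 1000 steps in parallel.
conv = nn.Conv1d(64, 64, kernel_size=3, padding=2, dilation=1)
out_tcn = conv(seq)[:, :, :-2]    # chomp the 2 future samples

# LSTM: the recurrence h_t = f(h_{t-1}, x_t) forces a sequential scan,
# so the 1000 time steps cannot be computed independently.
lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
out_lstm, _ = lstm(seq.transpose(1, 2))  # expects (batch, time, features)
```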
Sequence Model Comparison
| Model | Parallelisable | Memory | Training Speed | Best For |
|---|---|---|---|---|
| Vanilla RNN | No | Short-term | Slow | Simple short sequences |
| LSTM | No | Long-term | Slow | Variable-length, long dependencies |
| GRU | No | Long-term | Medium | Fewer params than LSTM |
| TCN | Yes | Fixed receptive field | Fast | Fixed-length, parallel training |
| Transformer | Yes | Attention-based | Fast | Long sequences (Ch30) |