Temporal Convolutional Networks (TCN)
Definition: Temporal Convolutional Network (TCN)
TCN uses causal 1D dilated convolutions for sequence modelling:

$$ y_t = \sum_{k=0}^{K-1} w_k \, x_{t - d \cdot k} $$

where $d$ is the dilation rate and $K$ the kernel size. Causal padding ensures $y_t$ depends only on $x_{t'}$ for $t' \le t$. Stacking $L$ layers with exponentially increasing dilation ($d = 2^l$ at layer $l$) makes the receptive field grow exponentially with depth, so the number of layers needed is only logarithmic in the sequence length.
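To make this concrete, the receptive field of such a stack follows directly from the per-layer formula above (a worked calculation, assuming one convolution per layer with kernel size $K$ and dilations $2^0, \dots, 2^{L-1}$):

$$ R = 1 + (K - 1)\sum_{l=0}^{L-1} 2^l = 1 + (K - 1)(2^L - 1) $$

For example, $K = 3$ and $L = 8$ layers give $R = 1 + 2 \cdot 255 = 511$ time steps, whereas the same stack without dilation would cover only $1 + 2 \cdot 8 = 17$.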
```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        # Pad by (K - 1) * d so the output keeps the input length once the
        # extra right-hand (future) samples are trimmed off.
        self.chomp = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.chomp, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv(x)
        if self.chomp > 0:
            out = out[:, :, :-self.chomp]  # remove future samples
        return self.relu(out) + x          # residual connection
```
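A minimal usage sketch, continuing from the block above (the dilation schedule and tensor sizes are illustrative): stack four blocks with dilations 1, 2, 4, 8, then check that the stack is actually causal, i.e. perturbing future inputs leaves earlier outputs untouched.

```python
blocks = nn.Sequential(*[TCNBlock(channels=16, kernel_size=3, dilation=2 ** l)
                         for l in range(4)])

x = torch.randn(1, 16, 100)   # (batch, channels, time)
y = blocks(x)
print(y.shape)                # torch.Size([1, 16, 100]): length preserved

x2 = x.clone()
x2[:, :, 50:] = 0.0           # zero out the "future" half of the input
y2 = blocks(x2)
print(torch.allclose(y[:, :, :50], y2[:, :, :50]))  # True: past outputs unchanged
```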
Example: TCN vs LSTM (When to Use Which)
Compare TCN and LSTM on a sequence prediction task.
Solution
Key differences
- Parallelism: a TCN processes all time steps simultaneously (fast on GPU), while an LSTM must process them sequentially (a sketch follows this list).
- Memory: a TCN's receptive field is fixed at architecture time; an LSTM can, in principle, remember indefinitely.
- Practice: TCNs often match or beat LSTMs on fixed-length sequences and are much faster to train.
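The parallelism difference is easy to see in code. A rough sketch (the layer sizes here are arbitrary, and `nn.LSTM` stands in for any recurrent baseline):

```python
import torch
import torch.nn as nn

seq = torch.randn(32, 64, 1000)   # (batch, channels, time)

# TCN layer: a single convolution pass covers all 1000 steps in parallel.
conv = nn.Conv1d(64, 64, kernel_size=3, padding=2, dilation=1)
out_tcn = conv(seq)[:, :, :-2]    # chomp the 2 future samples

# LSTM: the recurrence h_t = f(h_{t-1}, x_t) forces a sequential scan,
# so the 1000 time steps cannot be computed independently.
lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
out_lstm, _ = lstm(seq.transpose(1, 2))  # expects (batch, time, features)
```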
Sequence Model Comparison
| Model | Parallelisable | Memory | Training Speed | Best For |
|---|---|---|---|---|
| Vanilla RNN | No | Short-term | Slow | Simple short sequences |
| LSTM | No | Long-term | Slow | Variable-length, long dependencies |
| GRU | No | Long-term | Medium | Fewer params than LSTM |
| TCN | Yes | Fixed receptive field | Fast | Fixed-length, parallel training |
| Transformer | Yes | Attention-based | Fast | Long sequences (Ch30) |