Temporal Convolutional Networks (TCN)

Definition: Temporal Convolutional Network (TCN)

A TCN uses causal, dilated 1D convolutions for sequence modelling:

y[t] = \sum_{k=0}^{K-1} w[k] \cdot x[t - d \cdot k]

where d is the dilation rate and K is the kernel size. Causal padding ensures y[t] depends only on inputs x[t'] with t' \le t. Stacking layers with exponentially increasing dilation (d = 1, 2, 4, 8, \ldots) makes the receptive field grow exponentially with depth, so the number of layers needed is only logarithmic in the desired context length.
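
For concreteness (the kernel size and dilations here are illustrative, not values from the text): the receptive field of a stack is 1 + \sum_i (K-1) \cdot d_i, so four layers with K = 3 and d = 1, 2, 4, 8 cover

1 + (3-1)\cdot(1+2+4+8) = 31

time steps.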

import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        # Conv1d pads both sides by (K - 1) * d; the extra right-side
        # samples are trimmed ("chomped") in forward() to keep causality.
        self.chomp = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.chomp, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv(x)
        if self.chomp > 0:
            out = out[:, :, :-self.chomp]  # drop samples that peek into the future
        return self.relu(out) + x          # residual connection (same channel count)
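
A minimal usage sketch (the channel width, depth, and tensor sizes below are assumptions for illustration, not values from the text):

import torch

# Stack four blocks with dilations 1, 2, 4, 8 (receptive field 31 for K = 3)
tcn = nn.Sequential(*[TCNBlock(channels=32, kernel_size=3, dilation=2 ** i)
                      for i in range(4)])
x = torch.randn(8, 32, 100)   # (batch, channels, time)
y = tcn(x)
print(y.shape)                # torch.Size([8, 32, 100]) -- length preserved

Because every block preserves the sequence length and channel count, blocks stack freely and the residual connections keep deep stacks trainable.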

Example: TCN vs LSTM: When to Use Which

We compare TCN and LSTM (alongside related models) on a sequence prediction task; a runnable timing sketch follows the comparison table.

Sequence Model Comparison

| Model       | Parallelisable | Memory                | Training Speed | Best For                           |
|-------------|----------------|-----------------------|----------------|------------------------------------|
| Vanilla RNN | No             | Short-term            | Slow           | Simple short sequences             |
| LSTM        | No             | Long-term             | Slow           | Variable-length, long dependencies |
| GRU         | No             | Long-term             | Medium         | Fewer params than LSTM             |
| TCN         | Yes            | Fixed receptive field | Fast           | Fixed-length, parallel training    |
| Transformer | Yes            | Attention-based       | Fast           | Long sequences (Ch30)              |
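
A minimal timing sketch of the TCN-vs-LSTM trade-off, reusing the TCNBlock defined above; the widths, depth, and batch shape are assumptions, and absolute timings will vary by hardware:

import time
import torch
import torch.nn as nn

torch.manual_seed(0)

# Comparable capacity: 4 LSTM layers vs. 4 TCN blocks, 32 channels each
lstm = nn.LSTM(input_size=32, hidden_size=32, num_layers=4, batch_first=True)
tcn = nn.Sequential(*[TCNBlock(channels=32, kernel_size=3, dilation=2 ** i)
                      for i in range(4)])

x = torch.randn(8, 100, 32)          # (batch, time, features) for the LSTM
x_tcn = x.transpose(1, 2)            # Conv1d expects (batch, channels, time)

with torch.no_grad():
    for name, run in [("LSTM", lambda: lstm(x)[0]),
                      ("TCN", lambda: tcn(x_tcn))]:
        run()                        # warm-up
        t0 = time.perf_counter()
        for _ in range(50):
            run()
        ms = (time.perf_counter() - t0) / 50 * 1e3
        print(f"{name}: {ms:.2f} ms per forward pass")

The TCN processes all time steps in parallel within each convolution, while the LSTM must step through the sequence one step at a time; this is the trade-off the "Parallelisable" column summarises.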