Variational Autoencoder (VAE)

Definition:

Variational Autoencoder

A VAE consists of:

  • Encoder $q_\phi(\mathbf{z}|\mathbf{x})$: maps the input to a latent distribution
  • Decoder $p_\theta(\mathbf{x}|\mathbf{z})$: generates data from a latent sample
  • Objective (ELBO, maximised): $L = \mathbb{E}_{q}[\log p_\theta(\mathbf{x}|\mathbf{z})] - D_{\text{KL}}(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}))$

The reparameterisation trick: $\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\varepsilon}$, with $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, I)$.

Definition:

KL Divergence for Gaussians

For $q = \mathcal{N}(\boldsymbol{\mu}, \text{diag}(\boldsymbol{\sigma}^2))$ and $p = \mathcal{N}(0, I)$:

$$D_{\text{KL}} = -\frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$
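As an illustrative sketch (assuming PyTorch, with the encoder predicting `mu` and `logvar` = log(sigma^2)), the closed form translates directly into a few lines:

```python
import torch

def gaussian_kl(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL(q || p) for q = N(mu, diag(exp(logvar))) and p = N(0, I).

    Sums over latent dimensions, then averages over the batch.
    """
    # -1/2 * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2)
    kl_per_sample = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return kl_per_sample.mean()
```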

Definition:

Reparameterisation Trick

Instead of sampling $\mathbf{z} \sim q_\phi$ directly, sample $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, I)$ and compute $\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\varepsilon}$. This makes the sampling operation differentiable with respect to $\phi$.
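A minimal PyTorch sketch, assuming the encoder outputs `mu` and `logvar` = log(sigma^2):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Gradients flow into mu and logvar because eps carries all the randomness.
    """
    std = torch.exp(0.5 * logvar)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)     # eps ~ N(0, I), detached from the graph
    return mu + std * eps
```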

Definition:

Evidence Lower Bound (ELBO)

The ELBO is a lower bound on the log-likelihood:

$$\log p(\mathbf{x}) \ge \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}[\log p_\theta(\mathbf{x}|\mathbf{z})] - D_{\text{KL}}(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}))$$

The first term is reconstruction quality; the second is latent regularisation.

Definition:

Beta-VAE

Multiply the KL term by $\beta > 1$ to encourage more disentangled latent representations. Written as a loss to be minimised (the negative ELBO with the KL term reweighted):

$$L_{\beta} = \text{Recon} + \beta \cdot D_{\text{KL}}$$

Theorem: ELBO Derivation

Starting from Jensen's inequality applied to $\log p(\mathbf{x}) = \log \int p(\mathbf{x}|\mathbf{z})\,p(\mathbf{z})\,d\mathbf{z}$:

$$\log p(\mathbf{x}) = \log \int \frac{p(\mathbf{x}|\mathbf{z})\,p(\mathbf{z})}{q(\mathbf{z}|\mathbf{x})}\, q(\mathbf{z}|\mathbf{x})\, d\mathbf{z} \;\ge\; \int q(\mathbf{z}|\mathbf{x}) \log \frac{p(\mathbf{x}|\mathbf{z})\,p(\mathbf{z})}{q(\mathbf{z}|\mathbf{x})}\, d\mathbf{z}$$

The gap equals $D_{\text{KL}}(q \,\|\, p(\mathbf{z}|\mathbf{x})) \ge 0$.

Maximising the ELBO simultaneously improves reconstruction and makes the approximate posterior closer to the true posterior.
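Both statements follow from writing the bound as an exact decomposition; since $\log p(\mathbf{x})$ does not depend on $\phi$, any increase in the ELBO must shrink the posterior gap:

```latex
\log p(\mathbf{x})
  = \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\!\big[\log p_\theta(\mathbf{x}|\mathbf{z})\big]
      - D_{\text{KL}}\big(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\big)}_{\text{ELBO}}
  \;+\; \underbrace{D_{\text{KL}}\big(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x})\big)}_{\text{gap}\;\ge\;0}
```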

Example: Implementing a VAE

Build a VAE for 28x28 grayscale images.
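A minimal fully-connected sketch in PyTorch, assuming 28x28 inputs flattened to 784 pixels; the hidden width (400) and latent dimension (20) are illustrative choices, not requirements. The decoder returns raw logits so the loss can use BCEWithLogitsLoss (see the common mistakes below).

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal fully-connected VAE for 28x28 grayscale images (784 pixels)."""

    def __init__(self, latent_dim: int = 20, hidden_dim: int = 400):
        super().__init__()
        # Encoder: image -> hidden -> (mu, logvar)
        self.enc = nn.Sequential(nn.Linear(784, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: z -> hidden -> raw pixel logits (no sigmoid; see the loss below)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 784),
        )

    def encode(self, x: torch.Tensor):
        h = self.enc(x.view(-1, 784))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu: torch.Tensor, logvar: torch.Tensor):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def decode(self, z: torch.Tensor):
        return self.dec(z)  # logits, shape (batch, 784)

    def forward(self, x: torch.Tensor):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
```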

Example: VAE Loss Function

Implement the ELBO loss.
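One possible implementation of the (negative) ELBO in PyTorch, assuming the decoder-logits convention from the sketch above. With beta = 1 this is the standard VAE loss; beta > 1 gives the beta-VAE objective.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_logits, x, mu, logvar, beta: float = 1.0):
    """Negative ELBO: reconstruction term + beta * KL term, averaged over the batch."""
    # Bernoulli reconstruction term: BCE on logits, summed over all pixels
    recon = F.binary_cross_entropy_with_logits(
        recon_logits, x.view(-1, 784), reduction="sum"
    )
    # Closed-form KL between N(mu, diag(sigma^2)) and N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + beta * kl) / x.size(0)
```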

Example: Latent Space Interpolation

Interpolate between two images in the latent space.
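A sketch of latent-space interpolation, assuming a model with the encode/decode methods of the VAE sketch above; it interpolates between the two posterior means and decodes each intermediate code.

```python
import torch

@torch.no_grad()
def interpolate(model, x1, x2, steps: int = 10):
    """Linearly interpolate between two images in latent space and decode each point."""
    mu1, _ = model.encode(x1)   # use posterior means as latent codes
    mu2, _ = model.encode(x2)
    images = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1 - t) * mu1 + t * mu2                 # convex combination of codes
        images.append(torch.sigmoid(model.decode(z)).view(-1, 28, 28))
    return torch.stack(images)                      # (steps, 1, 28, 28)
```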

VAE Latent Space Explorer

Explore the 2D latent space of a trained VAE.

KL vs Reconstruction Trade-off

See how beta affects the KL-reconstruction balance.

Generative Model Taxonomy

VAE, GAN, Diffusion, and Flow models with their key differences.

VAE Architecture

Encoder maps to latent distribution, reparameterisation trick enables gradient flow, decoder generates.

Quick Check

Why is the reparameterisation trick needed in VAEs?

  • To make sampling faster
  • To make the sampling operation differentiable with respect to encoder parameters
  • To reduce the latent dimension

Quick Check

What does the KL term in the VAE loss encourage?

  • Better reconstruction
  • The approximate posterior to be close to the prior (regularisation)
  • Faster training

Quick Check

In beta-VAE with beta > 1, what happens?

  • Better reconstruction quality
  • More disentangled latent space but blurrier reconstructions
  • Faster convergence

Common Mistake: KL Vanishing (Posterior Collapse)

Mistake:

The KL term drops to zero and the decoder ignores the latent code.

Correction:

Use KL annealing (warm up beta from 0 to 1), free bits, or cyclic annealing.
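For instance, a linear warm-up of the KL weight might look like the sketch below; the warm-up length is an arbitrary illustrative choice.

```python
def kl_weight(step: int, warmup_steps: int = 10_000) -> float:
    """Linear KL annealing: ramp beta from 0 to 1 over the first warmup_steps updates."""
    return min(1.0, step / warmup_steps)

# In the training loop: loss = recon + kl_weight(step) * kl
```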

Common Mistake: BCE Loss Without Sigmoid

Mistake:

Using BCELoss on decoder output that is not constrained to [0, 1].

Correction:

Add sigmoid to the last decoder layer, or use BCEWithLogitsLoss.
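A sketch of the two options in PyTorch, with hypothetical logits and target tensors standing in for decoder outputs and pixel values:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 784)   # raw decoder outputs (hypothetical batch)
target = torch.rand(4, 784)    # pixel intensities in [0, 1]

# Option 1: squash outputs into [0, 1] with a sigmoid, then use BCE
recon_a = F.binary_cross_entropy(torch.sigmoid(logits), target, reduction="sum")

# Option 2 (more numerically stable): feed raw logits to the fused sigmoid + BCE
recon_b = F.binary_cross_entropy_with_logits(logits, target, reduction="sum")
```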

Common Mistake: Predicting sigma Instead of log(sigma^2)

Mistake:

Predicting sigma directly, which requires a softplus (or similar) constraint to ensure positivity.

Correction:

Predict log(sigma^2) instead and recover the standard deviation as exp(0.5 * logvar); this is more numerically stable and needs no positivity constraint.
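A minimal sketch of the log-variance parameterisation (shapes are illustrative):

```python
import torch

logvar = torch.zeros(4, 20, requires_grad=True)  # predicted log(sigma^2); any real value is valid
std = torch.exp(0.5 * logvar)                    # sigma > 0 guaranteed, no softplus needed
```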

Key Takeaway

VAEs provide a principled probabilistic framework for generation. The ELBO balances reconstruction and regularisation. The reparameterisation trick makes training end-to-end differentiable.

Key Takeaway

Generative models learn to sample from the data distribution. VAEs are simple but produce blurry samples. GANs are sharp but unstable. Diffusion models offer the best quality but are slow.

Why This Matters: VAEs for Channel Model Generation

VAEs can learn to generate realistic wireless channel realisations from measured data. The latent space captures channel parameters (delay spread, angular spread) in a continuous representation, enabling interpolation between channel conditions.

Historical Note: VAE: Probabilistic Deep Learning

2013

Kingma and Welling introduced the VAE in 2013, unifying variational inference with deep learning. The reparameterisation trick was the key insight enabling backpropagation through stochastic layers.

Historical Note: GANs: Adversarial Training

2014

Goodfellow et al. introduced GANs in 2014, training a generator against a discriminator. The resulting min-max game produces sharp samples but is notoriously difficult to train.

VAE

Variational Autoencoder: generative model that learns a latent space via variational inference.

ELBO

Evidence Lower Bound: the objective maximised in VAE training. Lower bound on log-likelihood.

KL Divergence

Kullback-Leibler divergence: measures how one distribution differs from another.

Diffusion Model

Generative model that learns to reverse a gradual noising process.

GAN

Generative Adversarial Network: generator and discriminator trained in a min-max game.

Generative Model Comparison

| Model | Training | Sample Quality | Diversity | Speed |
|---|---|---|---|---|
| VAE | Stable (ELBO) | Blurry | High | Fast |
| GAN | Unstable (adversarial) | Sharp | Mode collapse risk | Fast |
| Diffusion (DDPM) | Stable (denoising) | Best | High | Slow (iterative) |
| Flow Matching | Stable (ODE) | High | High | Medium |