Variance Reduction Techniques

Why Variance Reduction Can Save Days of Simulation Time

At $P_b = 10^{-6}$, a standard Monte Carlo simulation needs $\sim 10^8$ trials for 10% relative precision (the relative error scales as $1/\sqrt{N P_b}$, so $N \approx 100/P_b$). Variance reduction techniques can reduce this by factors of 10-1000, cutting simulation time from days to minutes. The price is additional mathematical complexity in designing the estimator, but the payoff is enormous for low-BER simulations.

Definition:

Importance Sampling

Instead of sampling from the original distribution $f(x)$, draw from a proposal distribution $g(x)$ that concentrates samples in the important region:

$$\theta = E_f[h(X)] = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = E_g\!\left[h(X)\,\frac{f(X)}{g(X)}\right]$$

The IS estimator is:

$$\hat{\theta}_{\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} h(X_i)\,w(X_i), \qquad w(X_i) = \frac{f(X_i)}{g(X_i)}$$

where $w(X_i)$ is the likelihood ratio (importance weight).

# IS for BPSK BER: shift noise mean toward decision boundary
import numpy as np
rng = np.random.default_rng(0)
N = 10_000
ebno = 10 ** (9 / 10)                 # Eb/N0 = 9 dB (linear), assumed for illustration
mu_shift = np.sqrt(2 * ebno)          # optimal shift = decision boundary
noise_is = mu_shift + rng.standard_normal(N)
weights = np.exp(-mu_shift * noise_is + mu_shift**2 / 2)
errors_is = noise_is > np.sqrt(2 * ebno)
ber_is = np.mean(errors_is * weights)

The key challenge is choosing $g(x)$. A poor choice can increase variance (even to infinity). For AWGN BER estimation, the optimal choice is to shift the noise mean to the decision boundary.

Definition:

Optimal Importance Sampling Distribution

The zero-variance IS estimator uses the proposal:

$$g^*(x) = \frac{|h(x)|\,f(x)}{\int |h(x)|\,f(x)\,dx} = \frac{|h(x)|\,f(x)}{\theta} \quad (\text{for } h \ge 0)$$

This is impractical (it requires knowing $\theta$), but it guides the design of good proposals. For BER estimation:

  • The error event is $h(x) = \mathbf{1}[x > d]$ where $d$ is the decision boundary
  • The optimal shift concentrates samples near $d$
  • A practical choice: shift the noise mean by $d$ (exponential tilting)
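
For the AWGN case, the tilted proposal and its weight can be written out explicitly. The following short derivation (standard exponential-tilting algebra, consistent with the definitions above) shows where the weight expression used in the earlier code snippet comes from:

```latex
% Tilted proposal: the N(0,1) density shifted to the decision boundary d
g(x) = \phi(x - d), \qquad \phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}

% Importance weight: ratio of original to tilted density
w(x) = \frac{\phi(x)}{\phi(x - d)}
     = \exp\!\left( \frac{(x - d)^2 - x^2}{2} \right)
     = e^{-d x + d^2/2}
```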

Definition:

Antithetic Variates

The antithetic variates technique uses negatively correlated pairs to reduce variance:

$$\hat{\theta}_{\mathrm{AV}} = \frac{1}{2N}\sum_{i=1}^{N} \left[g(U_i) + g(1-U_i)\right]$$

where $U_i \sim \mathrm{Uniform}(0,1)$ and $1-U_i$ is the antithetic sample. If $g$ is monotonic, $\mathrm{Cov}[g(U), g(1-U)] < 0$, reducing the variance.

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
N = 10_000
g = np.exp                # example monotone integrand, assumed for illustration
u = rng.uniform(size=N)
x1 = norm.ppf(u)          # original samples
x2 = norm.ppf(1 - u)      # antithetic samples
estimate = 0.5 * (g(x1) + g(x2))
theta_hat = np.mean(estimate)

Antithetic variates are essentially free in computational cost: you reuse the same random numbers twice. The variance reduction is typically 2-5x for smooth functions.

Definition:

Control Variates

If $Z$ is a random variable with known mean $E[Z] = \mu_Z$, the control variate estimator for $\theta = E[g(X)]$ is:

$$\hat{\theta}_{\mathrm{CV}} = \frac{1}{N}\sum_{i=1}^N g(X_i) - c\left(\frac{1}{N}\sum_{i=1}^N Z_i - \mu_Z\right)$$

The optimal coefficient is $c^* = \mathrm{Cov}[g(X), Z]/\mathrm{Var}[Z]$, giving variance reduction factor:

$$\frac{\mathrm{Var}[\hat{\theta}_{\mathrm{CV}}]}{\mathrm{Var}[\hat{\theta}]} = 1 - \rho_{gZ}^2$$

where $\rho_{gZ}$ is the correlation between $g(X)$ and $Z$.

# Control variate: use SNR as control
g_samples = compute_ber_per_block(...)
z_samples = compute_snr_per_block(...)
mu_z = theoretical_snr
c_star = np.cov(g_samples, z_samples)[0, 1] / np.var(z_samples, ddof=1)  # ddof=1 matches np.cov
theta_cv = np.mean(g_samples) - c_star * (np.mean(z_samples) - mu_z)

Definition:

Stratified Sampling

Divide the sample space into $K$ non-overlapping strata $S_1, \dots, S_K$ with probabilities $p_k = P(X \in S_k)$. Sample $n_k$ points from each stratum:

$$\hat{\theta}_{\mathrm{SS}} = \sum_{k=1}^{K} p_k \cdot \frac{1}{n_k}\sum_{i=1}^{n_k} g(X_{ki})$$

With proportional allocation ($n_k \propto p_k$), stratified sampling never increases variance relative to simple random sampling: it removes the between-strata variance component.

For BER simulation: stratify on the channel realization $|h|^2$ to ensure coverage of both deep fades and strong channels.
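
A concrete sketch of this idea (assumptions: BPSK over Rayleigh fading with average SNR $\bar\gamma = 10$ in linear scale, and equal-probability strata on $|h|^2 \sim \mathrm{Exp}(1)$; parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
gbar = 10.0           # average Eb/N0 (linear), assumed for illustration
K, n_k = 20, 500      # number of strata, samples per stratum

est = 0.0
for k in range(K):
    # Equal-probability strata: U restricted to (k/K, (k+1)/K), so p_k = 1/K
    u = rng.uniform(k / K, (k + 1) / K, size=n_k)
    h2 = -np.log1p(-u)                          # inverse CDF of Exp(1) gives |h|^2
    ber_cond = norm.sf(np.sqrt(2 * gbar * h2))  # instantaneous BPSK BER Q(sqrt(2*gamma))
    est += (1 / K) * ber_cond.mean()

ber_theory = 0.5 * (1 - np.sqrt(gbar / (1 + gbar)))  # exact Rayleigh-BPSK BER
```

Every stratum is guaranteed samples, so deep fades (strata near $u = 0$) and strong channels (near $u = 1$) are both covered by construction.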

Theorem: Importance Sampling Variance

The IS estimator with proposal gg has variance:

$$\mathrm{Var}_g[\hat{\theta}_{\mathrm{IS}}] = \frac{1}{N}\left(E_g\!\left[h^2(X)\,\frac{f^2(X)}{g^2(X)}\right] - \theta^2\right)$$

The variance is minimized when $g(x) \propto |h(x)|\,f(x)$. A poor choice of $g$ (thin tails relative to $f$) can make the IS variance larger than crude Monte Carlo.

If $g$ does not cover the tails of $f$ where $h$ is nonzero, some samples will have extremely large weights $f/g$, causing high variance. This "weight degeneracy" problem is the main practical challenge of IS.

Theorem: Antithetic Variates Variance Reduction

For antithetic variates with gg monotonic:

$$\mathrm{Var}[\hat{\theta}_{\mathrm{AV}}] = \frac{1}{N}\left(\frac{\sigma^2}{2} + \frac{\mathrm{Cov}[g(U), g(1-U)]}{2}\right)$$

Since $\mathrm{Cov}[g(U), g(1-U)] \le 0$ for monotonic $g$:

$$\mathrm{Var}[\hat{\theta}_{\mathrm{AV}}] \le \frac{\sigma^2}{2N} \le \mathrm{Var}[\hat{\theta}_{\mathrm{MC}}]$$

The factor-of-2 comes for free; additional reduction depends on the strength of the negative correlation.

When $U$ gives a high value of $g$, $1-U$ tends to give a low value (and vice versa). The pair average is less variable than two independent samples.

Theorem: Exponential Tilting for Rare-Event BER

For estimating $P_b = P(N > d)$ where $N \sim \mathcal{N}(0, 1)$ and $d = \sqrt{2 E_b/N_0}$, the optimal IS shift is $\mu^* = d$. The IS variance satisfies:

$$\mathrm{Var}[\hat{P}_{b,\mathrm{IS}}] \approx \frac{P_b^2}{N}\left(e^{-d^2}\,P_b^{-2}\cdot\frac{1}{\sqrt{8\pi d^2}} - 1\right) \approx \frac{P_b^2}{N}\left(d\sqrt{\pi/2} - 1\right)$$

This is exponentially smaller than the crude MC variance $P_b(1-P_b)/N \approx P_b/N$ for large $d$.

By shifting the noise distribution so errors become likely events, IS converts a rare-event problem into an ordinary estimation problem. The importance weights correct for the biased sampling.

Example: Importance Sampling for Low-BER BPSK

Estimate the BER of BPSK at $E_b/N_0 = 12$ dB using importance sampling with only 10000 samples, and compare with crude MC.
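
A minimal sketch of this comparison (assumptions: unit-variance noise, a fixed seed, and `scipy.stats.norm.sf` as the exact reference; these choices are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N = 10_000
ebno = 10 ** (12 / 10)        # 12 dB in linear scale
d = np.sqrt(2 * ebno)         # decision boundary, about 5.63

# Crude MC: the true BER is around 9e-9, so 10^4 samples almost surely see no errors
ber_mc = np.mean(rng.standard_normal(N) > d)

# Importance sampling: tilt the noise mean to d, reweight by the likelihood ratio
noise_is = d + rng.standard_normal(N)
w = np.exp(-d * noise_is + d**2 / 2)
ber_is = np.mean((noise_is > d) * w)

ber_true = norm.sf(d)         # exact tail probability Q(d)
```

With the same budget, the crude estimate is typically exactly zero while the IS estimate lands within a few percent of $Q(d)$.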

Example: Antithetic Variates for BER Estimation

Apply antithetic variates to reduce BER estimation variance.
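
A sketch of antithetic BER estimation. On the raw high-SNR error indicator the pair correlation is nearly zero, so this sketch instead averages the conditional BER over Rayleigh fading, where the integrand is smooth and monotone (assumed average SNR $\bar\gamma = 10$; all names are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
gbar, N = 10.0, 5_000              # average SNR (linear), number of antithetic pairs

def g(u):
    """Conditional BPSK BER given |h|^2 = -ln(1-u); monotone in u."""
    return norm.sf(np.sqrt(2 * gbar * -np.log1p(-u)))

u = rng.uniform(size=N)
ber_av = np.mean(0.5 * (g(u) + g(1 - u)))     # antithetic pair averages
rho = np.corrcoef(g(u), g(1 - u))[0, 1]       # negative, since g is monotone
ber_theory = 0.5 * (1 - np.sqrt(gbar / (1 + gbar)))
```

The negative correlation between pair members is what shrinks the variance relative to $2N$ independent draws.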

Example: Control Variates for Fading Channel BER

Use the channel power $|h|^2$ as a control variate when estimating BER over a Rayleigh fading channel.
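
A sketch of this control-variate setup (assumptions: BPSK, average SNR $\bar\gamma = 10$, the per-realization conditional BER as the estimand, and the exactly known mean $E[|h|^2] = 1$ as $\mu_Z$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
gbar, N = 10.0, 20_000                   # average SNR (linear), sample count

h2 = rng.exponential(size=N)             # Rayleigh channel power |h|^2 ~ Exp(1)
g_s = norm.sf(np.sqrt(2 * gbar * h2))    # conditional BPSK BER per realization
mu_z = 1.0                               # known control mean E[|h|^2]

c_star = np.cov(g_s, h2)[0, 1] / np.var(h2, ddof=1)   # negative: more power, fewer errors
ber_cv = g_s.mean() - c_star * (h2.mean() - mu_z)
ber_theory = 0.5 * (1 - np.sqrt(gbar / (1 + gbar)))   # exact reference
```

Since BER falls as channel power rises, $c^*$ comes out negative; the estimator corrects the plain mean by the observed deviation of the sample channel power from 1.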

Variance Reduction Comparison

Compare crude Monte Carlo, importance sampling, and antithetic variates for BPSK BER estimation. See how variance reduction improves estimation accuracy at high SNR.


Variance Reduction Techniques

python
Importance sampling, antithetic variates, control variates for BER.
# Code from: ch09/python/variance_reduction.py

Quick Check

In importance sampling for BER, what happens if the proposal distribution $g(x)$ has thinner tails than the original $f(x)$?

The estimator becomes biased

The variance may become infinite

The estimator converges faster

No effect β€” all proposals give the same variance

Common Mistake: Importance Sampling with Wrong Tail Behavior

Mistake:

Using a proposal distribution $g(x)$ that does not cover the tails of $f(x) \cdot h(x)$. For example, when estimating Gaussian tail probabilities, a narrow uniform proposal misses part of the tail entirely, and a thin-tailed Gaussian proposal makes the weights $f/g$ blow up in the tail.

Correction:

Ensure $g(x) > 0$ wherever $f(x)\,h(x) \neq 0$, and use exponential tilting (shift the mean) rather than arbitrary proposals. Monitor the effective sample size $N_{\mathrm{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$.
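
That monitoring takes only a few lines; a minimal sketch (the tilt $d$ and sample size are assumed values for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 10_000, 1.5                      # sample size and tilt, assumed for illustration

x = d + rng.standard_normal(N)          # draws from the tilted proposal g
w = np.exp(-d * x + d**2 / 2)           # importance weights f/g

n_eff = w.sum() ** 2 / np.sum(w**2)     # effective sample size
# n_eff / N near 1: weights are well behaved; near 0: weight degeneracy
```

For a Gaussian mean shift the expected ratio $N_{\mathrm{eff}}/N$ is roughly $e^{-d^2}$, so even a moderate tilt concentrates weight on relatively few samples; this is why the check belongs in every IS run.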

Key Takeaway

For BER below $10^{-4}$, always consider variance reduction. Importance sampling with exponential tilting is the most powerful technique (exponential speedup). Antithetic variates are free and give up to a 2x improvement. Control variates work well when a correlated quantity with known mean is available.

Why This Matters: Importance Sampling in 5G Simulation

5G NR targets BLER of $10^{-5}$ for URLLC (Ultra-Reliable Low Latency Communication). Simulating this with crude MC requires $\sim 10^7$ codewords, which is prohibitive for complex LDPC/Polar coded systems. Industry link-level simulators use importance sampling to speed up low-BLER verification by 100-1000x.

Variance Reduction Methods Comparison

| Method | Variance Reduction | Complexity | Best For |
| --- | --- | --- | --- |
| Crude MC | 1x (baseline) | None | Moderate BER ($>10^{-3}$) |
| Antithetic Variates | Up to 2x | Minimal | Any smooth estimator |
| Control Variates | $(1-\rho^2)^{-1}$x | Needs a known-mean correlate | When a good control is available |
| Importance Sampling | Up to $10^3$x+ | Proposal design | Rare events, low BER |
| Stratified Sampling | Moderate | Stratum design | Heterogeneous domains |

Historical Note: Importance Sampling Origins

1950s-1980s

Importance sampling was introduced by Herman Kahn and Andy Marshall at RAND Corporation in 1953 for neutron transport simulations. The technique was independently developed for telecommunications by Jeruchim, Balaban, and Shanmugan in the 1980s, who applied it to BER estimation of digital communication systems with rare error events.

Advanced Importance Sampling

python
Optimal IS for BPSK/QPSK, adaptive IS, effective sample size monitoring.
# Code from: ch09/python/importance_sampling.py

Importance Sampling

A variance reduction technique that samples from a proposal distribution $g(x)$ and corrects with likelihood ratios $f(x)/g(x)$.

Related: Likelihood Ratio

Likelihood Ratio

The importance weight $w(x) = f(x)/g(x)$, the ratio of the original density to the proposal density.

Related: Importance Sampling

Antithetic Variates

A variance reduction technique using negatively correlated sample pairs $(U, 1-U)$ to reduce estimator variance.

Control Variate

A random variable with known mean used to reduce the variance of a Monte Carlo estimator via correlation.