Variance Reduction Techniques
Why Variance Reduction Can Save Days of Simulation Time
At low BER $P_e$, a standard Monte Carlo simulation needs roughly $N \approx 100/P_e$ trials for 10% relative precision. Variance reduction techniques can reduce this by factors of 10-1000, cutting simulation time from days to minutes. The price is additional mathematical complexity in designing the estimator, but the payoff is enormous for low-BER simulations.
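A quick sketch of this rule of thumb (the BER targets are illustrative):

# N ~ 100 / P_e trials for ~10% relative precision
for p_e in [1e-3, 1e-6, 1e-9]:
    print(f"BER {p_e:.0e}: ~{100 / p_e:.0e} trials")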
Definition: Importance Sampling
To estimate $\theta = E_f[h(X)]$, instead of sampling from the original distribution $f(x)$, draw from a proposal distribution $g(x)$ that concentrates samples in the important region:
$$X_i \sim g(x), \quad i = 1, \dots, N$$
The IS estimator is:
$$\hat{\theta}_{IS} = \frac{1}{N} \sum_{i=1}^{N} h(X_i)\, w(X_i)$$
where $w(x) = f(x)/g(x)$ is the likelihood ratio (importance weight).
# IS for BPSK BER: shift noise mean toward decision boundary
import numpy as np
rng = np.random.default_rng(0)
N = 10_000
ebno = 10 ** (8.0 / 10)  # example Eb/N0 of 8 dB
mu_shift = np.sqrt(2 * ebno)  # optimal shift (exponential tilting)
noise_is = mu_shift + rng.standard_normal(N)
weights = np.exp(-mu_shift * noise_is + mu_shift**2 / 2)  # likelihood ratio f/g
errors_is = (noise_is > np.sqrt(2 * ebno))
ber_is = np.mean(errors_is * weights)
The key challenge is choosing $g$. A poor choice can increase variance (even to infinity). For AWGN BER estimation, the optimal choice is to shift the noise mean to the decision boundary.
Definition: Optimal Importance Sampling Distribution
The zero-variance IS estimator uses the proposal:
$$g^*(x) = \frac{|h(x)|\, f(x)}{\int |h(x')|\, f(x')\, dx'}$$
This is impractical (the normalizing constant is essentially $\theta$, the very quantity being estimated), but it guides the design of good proposals. For BER estimation:
- The error event is $\{x > d\}$, where $d$ is the decision boundary
- The optimal proposal concentrates samples near $x = d$
- A practical choice: shift the noise mean by $d$ (exponential tilting), as the sketch below illustrates
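A minimal sketch comparing candidate shifts for estimating $P(X > d)$ with $X \sim \mathcal{N}(0,1)$; the boundary $d = 4$ and sample count are illustrative. The shift $\mu = d$ should give the smallest spread of the weighted error indicator:

import numpy as np
rng = np.random.default_rng(0)
d = 4.0  # illustrative decision boundary
N = 100_000
for mu in [d / 2, d, 1.5 * d]:  # candidate tilting shifts
    x = mu + rng.standard_normal(N)  # samples from the proposal N(mu, 1)
    w = np.exp(-mu * x + mu**2 / 2)  # likelihood ratio f/g
    est = (x > d) * w  # weighted error indicator
    print(f"shift {mu:.1f}: estimate {est.mean():.3e}, std {est.std():.3e}")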
Definition: Antithetic Variates
The antithetic variates technique uses negatively correlated pairs to reduce variance:
$$\hat{\theta}_{AV} = \frac{1}{N} \sum_{i=1}^{N/2} \left[ g(U_i) + g(1 - U_i) \right]$$
where $U_i \sim \mathrm{Uniform}(0, 1)$ and $1 - U_i$ is the antithetic sample. If $g$ is monotonic, $\mathrm{Cov}(g(U), g(1-U)) \le 0$, reducing the variance.
import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
N = 10_000
g = lambda x: np.exp(x)  # example monotone integrand (illustrative)
u = rng.uniform(size=N)
x1 = norm.ppf(u)  # original samples
x2 = norm.ppf(1 - u)  # antithetic samples
estimate = 0.5 * (g(x1) + g(x2))
theta_hat = np.mean(estimate)
Antithetic variates are nearly free in computational cost: you reuse the same random numbers twice. The variance reduction is typically 2-5x for smooth functions.
Definition: Control Variates
If $Z$ is a random variable with known mean $\mu_Z$, the control variate estimator for $\theta = E[g(X)]$ is:
$$\hat{\theta}_{CV} = \bar{g} - c\,(\bar{Z} - \mu_Z)$$
The optimal coefficient is $c^* = \mathrm{Cov}(g, Z) / \mathrm{Var}(Z)$, giving variance reduction factor:
$$\frac{\mathrm{Var}(\hat{\theta}_{MC})}{\mathrm{Var}(\hat{\theta}_{CV})} = \frac{1}{1 - \rho^2}$$
where $\rho$ is the correlation between $g(X)$ and $Z$.
# Control variate: use SNR as control
# (compute_ber_per_block / compute_snr_per_block are placeholders for the
#  simulator's per-block statistics; theoretical_snr is the known mean)
g_samples = compute_ber_per_block(...)
z_samples = compute_snr_per_block(...)
mu_z = theoretical_snr
c_star = np.cov(g_samples, z_samples)[0, 1] / np.var(z_samples, ddof=1)
theta_cv = np.mean(g_samples) - c_star * (np.mean(z_samples) - mu_z)
Definition: Stratified Sampling
Divide the sample space into $K$ non-overlapping strata $S_1, \dots, S_K$ with probabilities $p_k = P(X \in S_k)$. Sample $n_k$ points from each stratum:
$$\hat{\theta}_{strat} = \sum_{k=1}^{K} p_k \cdot \frac{1}{n_k} \sum_{i=1}^{n_k} g(X_{k,i}), \qquad X_{k,i} \sim f \mid S_k$$
With proportional allocation ($n_k \propto p_k$), stratified sampling never increases variance compared to simple random sampling: it removes the between-strata variance component.
For BER simulation: stratify on the channel realization to ensure coverage of both deep fades and strong channels, as in the sketch below.
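A minimal sketch of stratified BER estimation over a Rayleigh channel, stratifying the channel power $|h|^2 \sim \mathrm{Exp}(1)$ into equal-probability strata (the stratum count, per-stratum sample size, and $E_b/N_0$ are illustrative):

import numpy as np
rng = np.random.default_rng(0)
ebno = 10 ** (15.0 / 10)  # illustrative Eb/N0 of 15 dB
K, n_k = 10, 10_000  # equal-probability strata, samples per stratum
ber_strata = []
for k in range(K):
    u = (k + rng.uniform(size=n_k)) / K  # uniforms restricted to stratum k
    gain = -np.log1p(-u)  # Exp(1) quantile: channel power |h|^2
    noise = rng.standard_normal(n_k)
    ber_strata.append(np.mean(noise > np.sqrt(2 * gain * ebno)))
ber_strat = np.sum(ber_strata) / K  # p_k = 1/K for every stratum
print(f"Stratified BER estimate: {ber_strat:.3e}")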
Theorem: Importance Sampling Variance
The IS estimator with proposal $g$ has variance:
$$\mathrm{Var}(\hat{\theta}_{IS}) = \frac{1}{N} \left( \int \frac{h(x)^2 f(x)^2}{g(x)}\, dx - \theta^2 \right)$$
The variance is minimized when $g = g^* \propto |h|\, f$. A poor choice of $g$ (thin tails relative to $f$) can make the IS variance larger than crude Monte Carlo.
If $g$ does not cover the tails of $f$ where $h$ is nonzero, some samples will have extremely large weights $f/g$, causing high variance. This "weight degeneracy" problem is the main practical challenge of IS; the sketch below shows it in action.
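A small sketch of weight degeneracy, with a target $f = \mathcal{N}(0, 1)$ and a deliberately thin-tailed proposal $g = \mathcal{N}(0, 0.5^2)$ (both illustrative):

import numpy as np
rng = np.random.default_rng(0)
N = 100_000
sigma_g = 0.5  # proposal std < target std: thin tails
x = sigma_g * rng.standard_normal(N)  # samples from g
w = sigma_g * np.exp(x**2 / (2 * sigma_g**2) - x**2 / 2)  # w = f(x)/g(x)
print(f"max/mean weight ratio: {w.max() / w.mean():.1e}")  # a few huge weights dominate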
Theorem: Antithetic Variates Variance Reduction
For antithetic variates with $g$ monotonic:
$$\mathrm{Var}\!\left( \frac{g(U) + g(1-U)}{2} \right) = \frac{\mathrm{Var}(g(U)) + \mathrm{Cov}(g(U), g(1-U))}{2}$$
Since $\mathrm{Cov}(g(U), g(1-U)) \le 0$ for monotonic $g$:
$$\mathrm{Var}\!\left( \frac{g(U) + g(1-U)}{2} \right) \le \frac{\mathrm{Var}(g(U))}{2}$$
The factor of 2 comes for free; additional reduction depends on the strength of the negative correlation.
When $U$ gives a high value of $g$, $1 - U$ tends to give a low value (and vice versa), so the pair average is less variable than two independent samples. The quick check below confirms the negative covariance numerically.
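A quick numerical check, using an illustrative monotone integrand $g(u) = e^{\Phi^{-1}(u)}$:

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
g = lambda v: np.exp(norm.ppf(v))  # illustrative monotone integrand
cov = np.cov(g(u), g(1 - u))[0, 1]
print(f"Cov(g(U), g(1-U)) = {cov:.3f}")  # negative for monotone g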
Theorem: Exponential Tilting for Rare-Event BER
For estimating $P_e = P(X > d)$ with $X \sim \mathcal{N}(0, 1)$ and $d \gg 1$, the optimal IS shift is $\mu = d$. The IS variance satisfies:
$$\mathrm{Var}(\hat{P}_{IS}) = O\!\left( \frac{e^{-d^2}}{N} \right)$$
This is exponentially smaller than the crude MC variance $\mathrm{Var}(\hat{P}_{MC}) \approx P_e / N = O(e^{-d^2/2} / N)$ for large $d$.
By shifting the noise distribution so errors become likely events, IS converts a rare-event problem into an ordinary estimation problem. The importance weights correct for the biased sampling.
Example: Importance Sampling for Low-BER BPSK
Estimate the BER of BPSK at $E_b/N_0 = 12$ dB using importance sampling with only 10000 samples, and compare with crude MC.
Setup
import numpy as np
from scipy.special import erfc
ebno_db = 12.0
ebno = 10 ** (ebno_db / 10)
d = np.sqrt(2 * ebno) # decision boundary distance
ber_theory = 0.5 * erfc(np.sqrt(ebno))
print(f"Theory: {ber_theory:.6e}")
Crude Monte Carlo
rng = np.random.default_rng(42)
N = 10000
noise = rng.standard_normal(N)
errors_mc = np.sum(noise > d)
ber_mc = errors_mc / N
print(f"Crude MC: {ber_mc:.6e} ({errors_mc} errors)")
Importance sampling with optimal shift
mu_is = d # shift to decision boundary
noise_is = mu_is + rng.standard_normal(N)
weights = np.exp(-mu_is * noise_is + mu_is**2 / 2)
errors_is = (noise_is > d).astype(float)
ber_is = np.mean(errors_is * weights)
print(f"IS estimate: {ber_is:.6e}")
print(f"Relative error: {abs(ber_is-ber_theory)/ber_theory:.2%}")
Example: Antithetic Variates for BER Estimation
Apply antithetic variates to reduce BER estimation variance.
Implementation
import numpy as np
from scipy.special import erfc
from scipy.stats import norm
rng = np.random.default_rng(42)
ebno_db = 8.0
ebno = 10 ** (ebno_db / 10)
d = np.sqrt(2 * ebno)
N = 50000
# Standard MC
u = rng.uniform(size=N)
noise = norm.ppf(u)
ber_mc = np.mean(noise > d)
# Antithetic: use (u, 1-u) pairs
u_half = rng.uniform(size=N//2)
noise1 = norm.ppf(u_half)
noise2 = norm.ppf(1 - u_half)
ber_av = np.mean(
0.5 * ((noise1 > d).astype(float) + (noise2 > d).astype(float))
)
print(f"MC: {ber_mc:.6e}")
print(f"AV: {ber_av:.6e}")
print(f"Theory: {0.5*erfc(np.sqrt(ebno)):.6e}")
Example: Control Variates for Fading Channel BER
Use the channel power as a control variate when estimating BER over a Rayleigh fading channel.
Implementation
import numpy as np
rng = np.random.default_rng(42)
N = 100000
ebno_db = 15.0
ebno = 10 ** (ebno_db / 10)
# Rayleigh channel + BPSK
h = (rng.standard_normal(N)
+ 1j * rng.standard_normal(N)) / np.sqrt(2)
gamma = np.abs(h)**2 * ebno # instantaneous SNR
noise = rng.standard_normal(N)
error_indicator = (noise > np.sqrt(2 * gamma)).astype(float)
# Control variate: |h|^2 with known mean E[|h|^2] = 1
z = np.abs(h)**2
mu_z = 1.0
c_star = np.cov(error_indicator, z)[0, 1] / np.var(z, ddof=1)  # ddof=1 matches np.cov
ber_cv = np.mean(error_indicator) \
- c_star * (np.mean(z) - mu_z)
print(f"MC: {np.mean(error_indicator):.6e}")
print(f"CV: {ber_cv:.6e}")
Variance Reduction Comparison
Compare crude Monte Carlo, importance sampling, and antithetic variates for BPSK BER estimation. See how variance reduction improves estimation accuracy at high SNR.
Code: ch09/python/variance_reduction.py
Quick Check
In importance sampling for BER, what happens if the proposal distribution $g$ has thinner tails than the original $f$?
- The estimator becomes biased
- The variance may become infinite
- The estimator converges faster
- No effect: all proposals give the same variance
Answer: the variance may become infinite. When $g$ has thinner tails, some likelihood ratios $f/g$ become extremely large, leading to high or infinite variance.
Common Mistake: Importance Sampling with Wrong Tail Behavior
Mistake:
Using a proposal distribution $g$ that does not cover the tails of $f$. If $g$ has thinner (but nonzero) tails, the weights $f/g$ explode on rare tail samples; if $g$ has bounded support (e.g., a uniform proposal for a Gaussian tail probability), the tail region is never sampled at all.
Correction:
Ensure $g(x) > 0$ wherever $h(x) f(x) \neq 0$, and use exponential tilting (shift the mean) rather than arbitrary proposals. Monitor the effective sample size $N_{\mathrm{eff}} = \left( \sum_i w_i \right)^2 / \sum_i w_i^2$, as in the sketch below.
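A minimal sketch of the ESS diagnostic; the weights come from a small illustrative IS run with shift $\mu = 4$:

import numpy as np

def effective_sample_size(weights):
    # N_eff = (sum w_i)^2 / sum w_i^2; values near N indicate healthy weights
    return weights.sum() ** 2 / np.sum(weights ** 2)

rng = np.random.default_rng(0)
mu, N = 4.0, 10_000  # illustrative tilt and sample count
x = mu + rng.standard_normal(N)
w = np.exp(-mu * x + mu**2 / 2)  # IS weights for N(0,1) target vs N(mu,1) proposal
print(f"ESS = {effective_sample_size(w):.0f} of {N}")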
Key Takeaway
For low-BER simulations, always consider variance reduction. Importance sampling with exponential tilting is the most powerful technique (exponential speedup). Antithetic variates are nearly free and give up to a 2x improvement. Control variates work well when a correlated quantity with known mean is available.
Why This Matters: Importance Sampling in 5G Simulation
5G NR targets a BLER of $10^{-5}$ for URLLC (Ultra-Reliable Low-Latency Communication). Simulating this with crude MC requires on the order of $10^7$ codewords, prohibitive for complex LDPC/Polar coded systems. Industry link-level simulators use importance sampling to speed up low-BLER verification by 100-1000x.
Variance Reduction Methods Comparison
| Method | Variance Reduction | Complexity | Best For |
|---|---|---|---|
| Crude MC | 1x (baseline) | None | Moderate BER |
| Antithetic Variates | Up to 2x | Minimal | Any smooth estimator |
| Control Variates | $1/(1-\rho^2)$x | Need known-mean correlate | When good control available |
| Importance Sampling | Up to 1000x+ | Proposal design | Rare events, low BER |
| Stratified Sampling | Moderate | Stratum design | Heterogeneous domains |
Historical Note: Importance Sampling Origins
1950s-1980s. Importance sampling was introduced by Herman Kahn and Andy Marshall at RAND Corporation in 1953 for neutron transport simulations. The technique was independently developed for telecommunications by Jeruchim, Balaban, and Shanmugan in the 1980s, who applied it to BER estimation of digital communication systems with rare error events.
Advanced Importance Sampling
Code: ch09/python/importance_sampling.py
Importance Sampling
A variance reduction technique that samples from a proposal distribution $g$ and corrects with likelihood ratios $f/g$.
Related: Likelihood Ratio
Likelihood Ratio
The importance weight $w(x) = f(x)/g(x)$, the ratio of the original density to the proposal density.
Related: Importance Sampling
Antithetic Variates
A variance reduction technique using negatively correlated sample pairs to reduce estimator variance.
Control Variate
A random variable with known mean used to reduce the variance of a Monte Carlo estimator via correlation.