Hypothesis Testing and Confidence Intervals

Why Hypothesis Testing Matters for Simulation

When you run a Monte Carlo BER simulation and get $\hat{P}_b = 1.2 \times 10^{-3}$, how confident are you in that number? Could the true BER be $10^{-4}$? Hypothesis testing and confidence intervals answer these questions rigorously. Without them, you cannot distinguish a real performance difference from statistical noise, a critical issue when comparing two receiver algorithms or validating against theory.

Definition:

Hypothesis Test

A hypothesis test consists of:

  • Null hypothesis $H_0$: the default assumption (e.g., "the data comes from distribution $F_0$")
  • Alternative hypothesis $H_1$: what we accept if $H_0$ is rejected
  • Test statistic $T$: a function of the data
  • p-value: $P(T \ge t_{\mathrm{obs}} \mid H_0)$, the probability of seeing a result at least as extreme under $H_0$
  • Significance level $\alpha$: reject $H_0$ if $p < \alpha$ (typically $\alpha = 0.05$)
from scipy.stats import ttest_1samp

# samples: array of observations; H0 is that their population mean is 0
stat, p_value = ttest_1samp(samples, popmean=0.0)
if p_value < 0.05:
    print("Reject H0 at 5% significance level")

Definition:

Type I and Type II Errors

| Decision | $H_0$ True | $H_0$ False |
| --- | --- | --- |
| Reject $H_0$ | Type I error ($\alpha$) | Correct (Power $1-\beta$) |
| Accept $H_0$ | Correct | Type II error ($\beta$) |

  • Type I error rate = $\alpha$ = probability of false alarm
  • Type II error rate = $\beta$ = probability of missed detection
  • Power = $1 - \beta$ = probability of correctly rejecting a false $H_0$

In simulation: Type I error means claiming an algorithm is better when it is not (false positive); Type II means missing a real improvement.
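
These rates can be checked empirically. Below is a minimal Monte Carlo sketch (the sample size, number of experiments, and the effect size of 0.5 are arbitrary illustrative choices) that estimates the Type I error rate and the power of a one-sample t-test:

import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
n_experiments, N, alpha = 2000, 30, 0.05

# Type I error rate: simulate under H0 (true mean really is 0)
false_alarms = sum(
    ttest_1samp(rng.normal(0.0, 1.0, N), popmean=0.0).pvalue < alpha
    for _ in range(n_experiments)
)
print(f"Empirical Type I error rate: {false_alarms / n_experiments:.3f}")  # ~0.05

# Power: simulate under H1 (true mean is actually 0.5)
detections = sum(
    ttest_1samp(rng.normal(0.5, 1.0, N), popmean=0.0).pvalue < alpha
    for _ in range(n_experiments)
)
print(f"Empirical power: {detections / n_experiments:.3f}")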

Definition:

Student's t-test

The one-sample t-test tests whether the population mean equals a hypothesized value $\mu_0$. The test statistic is:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{N}}$$

where $s$ is the sample standard deviation and $N$ is the sample size. Under $H_0$, $t \sim t_{N-1}$ (Student's $t$ distribution with $N-1$ degrees of freedom).

The two-sample t-test compares means of two groups:

from scipy.stats import ttest_ind

# Default assumes equal variances; pass equal_var=False for Welch's t-test
stat, p = ttest_ind(ber_algorithm_A, ber_algorithm_B)

Definition:

Confidence Interval

A $(1-\alpha)$ confidence interval for parameter $\theta$ is a random interval $[L, U]$ such that:

$$P(L \le \theta \le U) = 1 - \alpha$$

For the mean of a Gaussian with unknown variance, the CI is:

$$\bar{x} \pm t_{\alpha/2,\,N-1} \cdot \frac{s}{\sqrt{N}}$$

import numpy as np
from scipy.stats import t

# x_bar, s: sample mean and standard deviation of N observations
ci_half = t.ppf(1 - alpha/2, df=N-1) * s / np.sqrt(N)
ci = (x_bar - ci_half, x_bar + ci_half)

Definition:

Kolmogorov-Smirnov (KS) Test

The KS test is a nonparametric test for whether data follows a specified distribution. It compares the empirical CDF to the reference:

$$D_N = \sup_x |\hat{F}_N(x) - F_0(x)|$$

from scipy.stats import kstest, norm

# mu_hat, sigma_hat: parameters of the reference normal (see the caveat below)
D, p = kstest(data, 'norm', args=(mu_hat, sigma_hat))

Two-sample KS test compares two empirical distributions:

from scipy.stats import ks_2samp
D, p = ks_2samp(data_A, data_B)

The KS test is distribution-free under $H_0$: the critical values do not depend on the reference distribution. However, if you estimate the parameters from the same data, the p-values are conservative (too large). Use the Lilliefors test for this case, as sketched below.
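
For the estimated-parameter case, one option is the lilliefors function from statsmodels (assuming that package is available); a minimal sketch:

from statsmodels.stats.diagnostic import lilliefors

# KS-type normality test with mean and variance estimated from the data itself
D, p = lilliefors(data, dist='norm')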

Definition:

Bootstrap Confidence Intervals

The bootstrap estimates the sampling distribution of a statistic by resampling the data with replacement:

  1. From $N$ observations, draw $B$ bootstrap samples (each of size $N$, with replacement)
  2. Compute the statistic $\hat{\theta}^{(b)}$ for each bootstrap sample
  3. Use the $\alpha/2$ and $1-\alpha/2$ quantiles of $\{\hat{\theta}^{(b)}\}$ as confidence bounds
import numpy as np

rng = np.random.default_rng(42)
B = 10000
# Resample the data with replacement B times, recomputing the mean each time
boot_stats = np.array([
    np.mean(rng.choice(data, size=len(data), replace=True))
    for _ in range(B)
])
ci = np.percentile(boot_stats, [2.5, 97.5])

The bootstrap is especially useful for statistics without closed-form distributions, like the median BER across fading realizations.

Theorem: Confidence Interval Width Scales as $1/\sqrt{N}$

For a $(1-\alpha)$ confidence interval of the mean, the half-width is:

$$w = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{N}}$$

To halve the confidence interval width, you need $4\times$ as many samples. For a Monte Carlo BER estimate with target relative precision $\epsilon$, the required number of trials is:

$$N_s \ge \frac{z_{\alpha/2}^2 \,(1 - P_b)}{P_b\, \epsilon^2} \approx \frac{z_{\alpha/2}^2}{P_b\, \epsilon^2}$$

The $1/\sqrt{N}$ scaling is a fundamental law of statistics. To estimate a BER of $10^{-6}$ with 10% relative precision at 95% confidence, you need at least $N_s \approx 4 \times 10^8$ trials.
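
The required-trials bound translates directly into code; a small sketch (the function name is an illustrative choice):

import numpy as np
from scipy.stats import norm

def required_trials(p_b, rel_precision, confidence=0.95):
    """Trials needed so the normal-approximation CI half-width
    equals rel_precision * p_b at the given confidence level."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    return int(np.ceil(z**2 * (1 - p_b) / (p_b * rel_precision**2)))

print(f"{required_trials(1e-6, 0.10):.2e}")  # ~3.84e8 trials, matching the text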

Theorem: Bootstrap Consistency

For a "smooth" statistic TN=g(XˉN)T_N = g(\bar{X}_N) with gg differentiable:

sup⁑t∣Pβˆ—(TNβˆ—β‰€t)βˆ’P(TN≀t)βˆ£β†’a.s.0\sup_t \left| P^*(T_N^* \le t) - P(T_N \le t) \right| \xrightarrow{a.s.} 0

where Pβˆ—P^* denotes the bootstrap distribution. The bootstrap confidence interval has asymptotically correct coverage.

The empirical distribution converges to the true distribution (Glivenko-Cantelli), and resampling from it mimics the true sampling process.

Theorem: Exact Confidence Interval for BER

Let $k$ be the number of errors in $N$ trials. Since $k \sim \mathrm{Binomial}(N, P_b)$, the Clopper-Pearson exact confidence interval for $P_b$ is:

$$\left[ B^{-1}(\alpha/2;\, k,\, N-k+1),\; B^{-1}(1-\alpha/2;\, k+1,\, N-k) \right]$$

where $B^{-1}$ is the inverse beta CDF. For large $N$ and moderate $k$, the normal approximation gives:

$$\hat{P}_b \pm z_{\alpha/2} \sqrt{\frac{\hat{P}_b(1-\hat{P}_b)}{N}}$$

The BER is a proportion, so its confidence interval comes from the binomial distribution. The Clopper-Pearson interval is conservative (coverage $\ge 1-\alpha$); the normal approximation works well when $k \ge 30$.
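
Both intervals are a few lines with scipy; a sketch (function names are illustrative, and the $k = 0$ and $k = N$ edge cases use the standard one-sided conventions):

import numpy as np
from scipy.stats import beta, norm

def clopper_pearson(k, N, alpha=0.05):
    """Exact (conservative) CI for a binomial proportion via the inverse beta CDF."""
    lo = beta.ppf(alpha / 2, k, N - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, N - k) if k < N else 1.0
    return lo, hi

def normal_approx_ci(k, N, alpha=0.05):
    """Large-sample normal approximation for the same proportion."""
    p_hat = k / N
    half = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - half, p_hat + half

For a library route, scipy.stats.binomtest(k, N).proportion_ci(method='exact') (SciPy 1.7+) computes the same Clopper-Pearson bounds.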

Example: Comparing Two Receiver Algorithms via t-test

You run 50 independent BER trials for Algorithm A and Algorithm B. Determine whether there is a statistically significant difference in their average BER at the 5% level.
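
A sketch of how this comparison might go (the array names are placeholders; Welch's variant is used to avoid the equal-variance assumption):

from scipy.stats import ttest_ind

# ber_A, ber_B: arrays of 50 per-trial BER estimates for each algorithm
stat, p = ttest_ind(ber_A, ber_B, equal_var=False)  # Welch's t-test
if p < 0.05:
    print("Mean BERs differ significantly at the 5% level")
else:
    print("No significant difference detected")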

Example: Bootstrap Confidence Interval for Median BER

Compute a 95% bootstrap confidence interval for the median BER from 100 Monte Carlo trials.
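
A sketch following the percentile-bootstrap recipe above, with ber_trials as a placeholder for the 100 per-trial BER values:

import numpy as np

rng = np.random.default_rng(1)
B = 10000
# Resample with replacement, recomputing the median each time
boot_medians = np.array([
    np.median(rng.choice(ber_trials, size=len(ber_trials), replace=True))
    for _ in range(B)
])
ci = np.percentile(boot_medians, [2.5, 97.5])
print(f"95% bootstrap CI for median BER: [{ci[0]:.2e}, {ci[1]:.2e}]")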

Example: Verifying Rayleigh Fading with the KS Test

Generate fading samples and use the KS test to verify they follow a Rayleigh distribution.
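
One possible approach: generate unit-power complex Gaussian fading, whose envelope is Rayleigh with scale $1/\sqrt{2}$, then test the envelope (sample size and seed are arbitrary):

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
N = 5000
h = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)  # CN(0, 1)
envelope = np.abs(h)  # should be Rayleigh with scale 1/sqrt(2)

D, p = kstest(envelope, 'rayleigh', args=(0, 1 / np.sqrt(2)))  # args = (loc, scale)
print(f"D = {D:.4f}, p = {p:.3f}")  # large p: consistent with Rayleigh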

Hypothesis Test Visualizer

Visualize the null distribution, test statistic, critical region, and p-value for a one-sample t-test. Adjust the true mean and sample size to see how power changes.


Hypothesis Test Decision Regions

Anatomy of a two-sided hypothesis test: null distribution, critical values at $\pm z_{\alpha/2}$, rejection regions (shaded), and the relationship between significance level, p-value, and test statistic.

Python Code Supplement

Complete code for this section (t-test, KS test, bootstrap confidence intervals, BER CI): ch09/python/hypothesis_testing.py

Quick Check

A p-value of 0.03 means:

  • There is a 3% probability that $H_0$ is true
  • If $H_0$ is true, there is a 3% chance of getting a test statistic at least as extreme as observed
  • The experiment has a 3% error rate
  • We should always reject $H_0$

Quick Check

You need a confidence interval for BER that is half as wide. How many times more Monte Carlo trials do you need?

  • 2x
  • 4x
  • 8x
  • sqrt(2)x

Common Mistake: Multiple Testing Without Correction

Mistake:

Running 20 t-tests at $\alpha = 0.05$ to compare algorithm variants and reporting any significant result. By chance, you expect $20 \times 0.05 = 1$ false positive.

Correction:

Apply the Bonferroni correction ($\alpha' = \alpha / m$ for $m$ tests) or use scipy.stats.false_discovery_control() for the Benjamini-Hochberg procedure when performing multiple tests, as sketched below.
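
A sketch of both corrections applied to a list of p-values (p_values is a placeholder; false_discovery_control requires SciPy 1.11 or newer):

import numpy as np
from scipy.stats import false_discovery_control

# p_values: p-values from the m individual tests
m = len(p_values)
bonferroni_reject = np.asarray(p_values) < 0.05 / m   # Bonferroni at family level 0.05
p_adjusted = false_discovery_control(p_values)        # Benjamini-Hochberg adjustment
bh_reject = p_adjusted < 0.05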

Key Takeaway

Always report confidence intervals with your Monte Carlo results. A bare BER number without a CI is scientifically meaningless. Use the Clopper-Pearson exact interval for small error counts ($k < 30$) and the normal approximation for large counts.

Key Takeaway

The bootstrap is your universal confidence interval tool. It works for any statistic (median, percentile, ratio of BERs) without requiring closed-form distributions. Use $B \ge 10000$ bootstrap resamples for reliable intervals.

Why This Matters: Statistical Rigor in BER Simulation

In wireless research, the standard practice is to count at least 100 errors before declaring a BER measurement valid. This rule of thumb comes from the $1/\sqrt{N}$ scaling of the CI width: with $k = 100$ errors, the 95% CI is approximately $\hat{P}_b \pm 20\%$. The exact Clopper-Pearson interval from the theorem above makes this precision quantitative.
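
The 20% figure falls straight out of the normal approximation: the relative half-width is roughly $z_{\alpha/2}/\sqrt{k}$, which a two-line check confirms:

import numpy as np
from scipy.stats import norm

k = 100  # error count from the rule of thumb
print(f"Relative 95% CI half-width: {norm.ppf(0.975) / np.sqrt(k):.1%}")  # ~19.6%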

Statistical Tests for Simulation Validation

| Test | Null Hypothesis | Assumptions | scipy Function |
| --- | --- | --- | --- |
| One-sample t-test | $\mu = \mu_0$ | Normal data or large $N$ | ttest_1samp |
| Two-sample t-test | $\mu_A = \mu_B$ | Independent samples, normal or large $N$ | ttest_ind |
| Paired t-test | $\mu_{A-B} = 0$ | Paired observations | ttest_rel |
| KS test | $X \sim F_0$ | Continuous distribution | kstest |
| Two-sample KS | $F_A = F_B$ | Independent continuous samples | ks_2samp |
| Chi-squared | Observed = Expected | Categorical data, $n_i \ge 5$ | chisquare |

p-value

The probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis is true.

Confidence Interval

A random interval $[L, U]$ that contains the true parameter with probability $1-\alpha$ over repeated experiments.

Related: Bootstrap

Bootstrap

A resampling method that estimates the sampling distribution of a statistic by drawing samples with replacement from the observed data.

Related: Confidence Interval