The AWGN Channel
Why the Gaussian Channel?
Every wireless, wired, and optical communication system ultimately faces additive noise. The central limit theorem tells us that the aggregate of many small independent disturbances converges to a Gaussian distribution, and this is precisely what we observe in practice: thermal noise in receivers, shot noise in photodetectors, and aggregate interference in dense networks are all well-modeled as Gaussian.
The Gaussian channel is therefore not just a mathematical convenience; it is the canonical model for communication under noise. Its capacity formula, $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$, is arguably the single most important equation in all of communication theory. It tells every engineer, before writing a single line of code, the absolute limit of what is achievable.
Definition: The Scalar AWGN Channel
The additive white Gaussian noise (AWGN) channel is defined by
$$Y_i = X_i + Z_i, \qquad i = 1, \ldots, n,$$
where:
- $X_i$ is the channel input at time $i$,
- $Z_i \sim \mathcal{N}(0, N)$ are i.i.d. Gaussian noise samples, independent of the input,
- $Y_i$ is the channel output.
The encoder maps a message $W \in \{1, \ldots, 2^{nR}\}$ to a codeword $x^n(W)$ subject to the average power constraint:
$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P.$$
AWGN channel
Additive white Gaussian noise channel: $Y = X + Z$ with $Z \sim \mathcal{N}(0, N)$ i.i.d. and an average power constraint $\mathbb{E}[X^2] \le P$ on the input. The most fundamental continuous-alphabet channel model in information theory.
Related: Signal-to-noise ratio (SNR), Channel capacity
Signal-to-noise ratio (SNR)
The ratio of average signal power to noise power: $\mathrm{SNR} = P/N$. In decibels, $\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}(P/N)$.
Related: AWGN channel
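To make the definition concrete, here is a minimal Python sketch (the values of $n$, $P$, and $N$ are illustrative assumptions, not from the text) that pushes a Gaussian codeword through the channel and checks the power constraint and the empirical SNR:

```python
import numpy as np

# Minimal sketch of the scalar AWGN channel Y_i = X_i + Z_i.
# The values n, P, N are illustrative assumptions, not from the text.
rng = np.random.default_rng(0)
n, P, N = 100_000, 4.0, 1.0

x = rng.normal(0.0, np.sqrt(P), size=n)   # Gaussian codeword with E[X^2] = P
z = rng.normal(0.0, np.sqrt(N), size=n)   # i.i.d. noise, independent of x
y = x + z                                 # channel output

print(f"average power used: {np.mean(x**2):.3f}  (constraint P = {P})")
print(f"empirical SNR     : {np.mean(x**2) / np.mean(z**2):.3f}  (P/N = {P/N})")
```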
Why an Average Power Constraint?
In practice, transmitters have a finite energy budget (battery, power amplifier limits). The average power constraint $\frac{1}{n}\sum_{i} x_i^2 \le P$ caps the average energy per symbol. We could also impose a peak power constraint $x_i^2 \le P_{\text{peak}}$ for all $i$, but the average constraint is more natural for information-theoretic analysis and yields cleaner results. The peak constraint leads to a harder optimization: the capacity-achieving input distribution becomes discrete (Smith, 1971) rather than Gaussian.
Theorem: Capacity of the Scalar AWGN Channel
The capacity of the AWGN channel with $Z_i \sim \mathcal{N}(0, N)$ and average power constraint $P$ is
$$C = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N}\right) \ \text{bits per channel use},$$
where $\mathrm{SNR} = P/N$. The capacity-achieving input distribution is $X \sim \mathcal{N}(0, P)$.
The formula says that, at high SNR, each doubling of the SNR buys you roughly half an extra bit per real channel use (one extra bit per 6 dB). Intuitively, the Gaussian input spreads energy as "evenly" as possible across the signal space: any other distribution with the same power produces less entropy at the output, hence less mutual information.
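A quick numerical check of this rule of thumb, as a sketch with arbitrarily chosen SNR values:

```python
import math

def awgn_capacity_real(snr: float) -> float:
    """Capacity of the real scalar AWGN channel, in bits per channel use."""
    return 0.5 * math.log2(1 + snr)

for snr in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    gain = awgn_capacity_real(2 * snr) - awgn_capacity_real(snr)
    print(f"SNR = {snr:7.1f}: doubling the SNR adds {gain:.3f} bits")
# The added capacity approaches 0.5 bits per doubling as the SNR grows.
```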
Achievability: upper bound on $h(Y)$
We compute $I(X; Y) = h(Y) - h(Y \mid X)$ and maximize over all input distributions with $\mathbb{E}[X^2] \le P$.
Since $Z$ is independent of $X$, we have $h(Y \mid X) = h(X + Z \mid X) = h(Z) = \frac{1}{2}\log_2(2\pi e N)$.
For the output entropy, note that $\mathbb{E}[Y^2] = \mathbb{E}[X^2] + N \le P + N$. By the Gaussian entropy maximizer (Theorem 2.X), we have $h(Y) \le \frac{1}{2}\log_2\big(2\pi e (P + N)\big)$, with equality if and only if $Y$ is Gaussian, which happens when $X \sim \mathcal{N}(0, P)$.
Achievability: capacity expression
Combining the two terms:
$$I(X; Y) = h(Y) - h(Z) \le \frac{1}{2}\log_2\big(2\pi e (P + N)\big) - \frac{1}{2}\log_2(2\pi e N) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N}\right).$$
The maximum is achieved by $X \sim \mathcal{N}(0, P)$, confirming that the Gaussian input is optimal.
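The optimality of the Gaussian input can also be checked numerically. The following sketch (with $P = N = 1$, the quadrature grid, and the uniform comparison input all assumed choices of mine) computes $I(X;Y) = h(Y) - h(Z)$ for a Gaussian input and for a uniform input of the same power:

```python
import numpy as np
from scipy.stats import norm

# Sketch: compare I(X;Y) = h(Y) - h(Z) for a Gaussian input and a uniform
# input of the same power. P = N = 1 and the grid are assumed choices.
P, N = 1.0, 1.0
y = np.linspace(-12.0, 12.0, 20001)
dy = y[1] - y[0]

def h_bits(p):
    """Differential entropy in bits, by quadrature on the grid."""
    p = np.clip(p, 1e-300, None)          # avoid log(0); tail terms vanish
    return float(-(p * np.log2(p)).sum() * dy)

h_Z = 0.5 * np.log2(2 * np.pi * np.e * N)

# Gaussian input X ~ N(0, P): the output is exactly N(0, P + N).
I_gauss = h_bits(norm.pdf(y, scale=np.sqrt(P + N))) - h_Z

# Uniform input on [-a, a] with variance P (a = sqrt(3P)): the output
# density is the uniform density convolved with the noise density.
a = np.sqrt(3 * P)
p_unif = (norm.cdf((y + a) / np.sqrt(N)) - norm.cdf((y - a) / np.sqrt(N))) / (2 * a)
I_unif = h_bits(p_unif) - h_Z

print(f"capacity           : {0.5 * np.log2(1 + P / N):.4f} bits")
print(f"I(X;Y), Gaussian X : {I_gauss:.4f} bits")
print(f"I(X;Y), uniform X  : {I_unif:.4f} bits")
```

The uniform input falls measurably short of capacity, as the entropy-maximization argument predicts.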
Converse: Fano's inequality approach
For any code with vanishing error probability, Fano's inequality gives
$$nR \le I(X^n; Y^n) + n\epsilon_n,$$
where $\epsilon_n \to 0$ as $n \to \infty$. Since the channel is memoryless:
$$I(X^n; Y^n) = h(Y^n) - h(Y^n \mid X^n) = h(Y^n) - h(Z^n) = h(Y^n) - \frac{n}{2}\log_2(2\pi e N).$$
Converse: bounding $h(Y^n)$
By the independence bound and the Gaussian maximizer:
$$h(Y^n) \le \sum_{i=1}^{n} h(Y_i) \le \sum_{i=1}^{n} \frac{1}{2}\log_2\big(2\pi e (P_i + N)\big), \qquad P_i := \mathbb{E}[X_i^2].$$
By the concavity of $\log$ and the power constraint $\frac{1}{n}\sum_i P_i \le P$, Jensen's inequality yields
$$\frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\log_2\!\left(1 + \frac{P_i}{N}\right) \le \frac{1}{2}\log_2\!\left(1 + \frac{P}{N}\right).$$
Dividing by $n$ and letting $n \to \infty$: $R \le \frac{1}{2}\log_2(1 + P/N)$.
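The Jensen step can be illustrated numerically; in this sketch the per-symbol powers $P_i$ are drawn from an arbitrary assumed distribution:

```python
import numpy as np

# Sketch of the Jensen step: for nonnegative per-symbol powers P_i with
# average P, the mean of (1/2)log2(1 + P_i/N) cannot exceed the rate at
# the average power. The power draw below is an arbitrary assumption.
rng = np.random.default_rng(0)
N = 1.0
P_i = rng.exponential(scale=2.0, size=10_000)
P = P_i.mean()

avg_rate = np.mean(0.5 * np.log2(1 + P_i / N))
bound = 0.5 * np.log2(1 + P / N)
print(f"average of per-symbol rates: {avg_rate:.4f} bits")
print(f"rate at the average power  : {bound:.4f} bits (larger, by concavity)")
```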
Key Takeaway
The AWGN capacity $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ is the single most important formula in communication theory. The Gaussian input is optimal because it maximizes the output entropy under a power constraint: a direct consequence of the entropy maximization property of the Gaussian distribution.
Historical Note: Shannon's 1948 Paper and the Gaussian Channel
Shannon derived the Gaussian channel capacity in his landmark 1948 paper "A Mathematical Theory of Communication." What is remarkable is that Shannon not only gave the formula but also proved both achievability (via random coding with Gaussian codebooks) and the converse β all in the same paper that invented the field.
The result was initially met with skepticism: how could one transmit reliably at any positive rate over a noisy channel? The key insight was that coding over long blocks concentrates the noise around a thin shell, and the number of distinguishable signal spheres grows exponentially with the block length. It took nearly 50 years for practical codes (turbo codes, LDPC codes) to approach Shannon's limit within a fraction of a dB.
The Sphere-Packing Picture
There is a beautiful geometric interpretation of the AWGN capacity. Consider transmission of $n$ symbols:
- The codeword has energy $\sum_{i} x_i^2 \le nP$, so all codewords lie in a sphere of radius $\sqrt{nP}$ in $\mathbb{R}^n$.
- The noise vector $Z^n$ concentrates (with high probability) on a thin shell of radius $\sqrt{nN}$.
- The received vector $Y^n$ lies (with high probability) in a sphere of radius $\sqrt{n(P + N)}$.
For reliable decoding, the "noise spheres" centered at different codewords must not overlap. The number of non-overlapping noise spheres that fit is
$$M \approx \frac{\big(\sqrt{n(P+N)}\big)^{n}}{\big(\sqrt{nN}\big)^{n}} = \left(1 + \frac{P}{N}\right)^{n/2}.$$
Taking $\frac{1}{n}\log_2$ of this count gives exactly $C = \frac{1}{2}\log_2\left(1 + \frac{P}{N}\right)$.
Figure: Sphere-Packing Interpretation of AWGN Capacity
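A short sketch (parameters assumed) confirming the concentration facts behind this picture:

```python
import numpy as np

# Sketch (n, P, N assumed): the noise and received vectors concentrate on
# shells of radius sqrt(nN) and sqrt(n(P+N)), the geometry behind the count.
rng = np.random.default_rng(1)
n, P, N = 100_000, 4.0, 1.0

x = rng.normal(0.0, np.sqrt(P), n)
z = rng.normal(0.0, np.sqrt(N), n)
y = x + z

print(f"||Z||/sqrt(n) = {np.linalg.norm(z)/np.sqrt(n):.4f}  vs sqrt(N)   = {np.sqrt(N):.4f}")
print(f"||Y||/sqrt(n) = {np.linalg.norm(y)/np.sqrt(n):.4f}  vs sqrt(P+N) = {np.sqrt(P+N):.4f}")

# Exponent of the sphere count: (1/n) log2 (1 + P/N)^(n/2) = capacity.
print(f"(1/n) log2 M  = {0.5 * np.log2(1 + P / N):.4f} bits per channel use")
```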
Definition: The Complex AWGN Channel
The complex AWGN channel models passband communication via complex baseband:
$$Y = hX + Z, \qquad Z \sim \mathcal{CN}(0, N_0),$$
where $h \in \mathbb{C}$ is the (known, deterministic) channel gain and the power constraint is $\mathbb{E}[|X|^2] \le P$.
The capacity is
$$C = \log_2\!\left(1 + \frac{|h|^2 P}{N_0}\right) \ \text{bits per complex channel use},$$
where $\mathrm{SNR} = |h|^2 P / N_0$.
The factor-of-two difference from the real case ($\log_2$ vs. $\frac{1}{2}\log_2$) arises because each complex symbol carries two real dimensions.
In wireless communications, the standard convention uses the complex model with $h = 1$ (the gain absorbed into the SNR). The capacity in bits/s is $C = B \log_2(1 + \mathrm{SNR})$, where $B$ is the bandwidth in Hz.
Example: Computing AWGN Capacity
A wireless link operates at $\mathrm{SNR} = 20$ dB with bandwidth $B = 10$ MHz. What is the maximum achievable data rate?
Convert SNR to linear
$\mathrm{SNR} = 10^{20/10} = 10^{2} = 100$.
Compute capacity per symbol
$C = \log_2(1 + 100) = \log_2 101 \approx 6.66$ bits per complex channel use.
Compute capacity in bits/s
The symbol rate equals the bandwidth for Nyquist signaling, so
$$R = B \cdot C = 10^{7} \times 6.66 \approx 66.6 \ \text{Mbit/s}.$$
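The same computation as a small Python helper (the 20 dB and 10 MHz figures are the example's assumed values):

```python
import math

# The worked example as a helper function; 20 dB and 10 MHz are the
# example's assumed values.
def awgn_rate_bps(snr_db: float, bandwidth_hz: float) -> float:
    """Shannon limit of a complex AWGN channel, in bits per second."""
    snr_linear = 10 ** (snr_db / 10)      # convert dB to linear first
    return bandwidth_hz * math.log2(1 + snr_linear)

print(f"{awgn_rate_bps(20, 10e6) / 1e6:.1f} Mbit/s")   # about 66.6 Mbit/s
```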
Example: Required SNR for a Target Rate
What minimum $\mathrm{SNR}$ (in dB) is needed to achieve a spectral efficiency of 4 bits/s/Hz on a complex AWGN channel?
Set up the equation
We need $\log_2(1 + \mathrm{SNR}) = 4$, so $\mathrm{SNR} = 2^4 - 1 = 15$.
Convert to dB
$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} 15 \approx 11.76$ dB.
The point is that 4 bits/s/Hz requires roughly 12 dB of SNR, a useful rule of thumb for link budget calculations.
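The inverse calculation, as a small sketch:

```python
import math

def required_snr_db(spectral_efficiency: float) -> float:
    """Minimum SNR (dB) for a target rate in bits/s/Hz, complex AWGN."""
    return 10 * math.log10(2 ** spectral_efficiency - 1)

print(f"{required_snr_db(4):.2f} dB")    # about 11.76 dB
```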
AWGN Channel Capacity vs. SNR
Explore how the AWGN capacity grows logarithmically with $\mathrm{SNR}$. At low SNR, capacity grows approximately linearly; at high SNR, each 3 dB increase adds roughly 1 bit/s/Hz.
Quick Check
For an AWGN channel with $\mathrm{SNR} = 15$ (linear, not dB), what is the capacity in bits per real channel use?
$1.95$ bits
$2$ bits
$3.91$ bits
$4$ bits
The real AWGN capacity is $C = \frac{1}{2}\log_2(1 + 15) = \frac{1}{2}\log_2 16 = 2$ bits per real channel use. The factor $\frac{1}{2}$ distinguishes the real channel from the complex channel.
Common Mistake: Real vs. Complex AWGN: The Factor of Two
Mistake:
Using $C = \log_2(1 + \mathrm{SNR})$ for a real-valued AWGN channel (or $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ for a complex channel).
Correction:
Real AWGN: $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ bits per real symbol. Complex AWGN: $C = \log_2(1 + \mathrm{SNR})$ bits per complex symbol. The complex channel has two real dimensions, hence the factor of two. When computing bits/s, both give $B \log_2(1 + \mathrm{SNR})$ because the complex symbol rate is half the real sample rate.
Common Mistake: SNR in dB vs. Linear
Mistake:
Plugging the dB value directly into the capacity formula, e.g., computing $\frac{1}{2}\log_2(1 + 20)$ when $\mathrm{SNR} = 20$ dB.
Correction:
Always convert to linear scale first: $\mathrm{SNR} = 10^{\mathrm{SNR}_{\mathrm{dB}}/10} = 10^{20/10} = 100$. Then $C = \frac{1}{2}\log_2(1 + 100) \approx 3.33$ bits (real), very different from $\frac{1}{2}\log_2(1 + 20) \approx 2.20$ bits.
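A two-line demonstration of the pitfall (20 dB assumed, matching the correction above):

```python
import math

# The pitfall in two lines (20 dB assumed, matching the correction above).
snr_db = 20.0
snr_linear = 10 ** (snr_db / 10)                    # = 100

wrong = 0.5 * math.log2(1 + snr_db)                 # dB plugged in directly
right = 0.5 * math.log2(1 + snr_linear)
print(f"wrong: {wrong:.2f} bits, right: {right:.2f} bits")   # 2.20 vs 3.33
```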
Theorem: MMSE Lower Bound via Differential Entropy
For jointly distributed continuous random variables $X$ and $Y$, the minimum mean-square error (MMSE) satisfies
$$\mathbb{E}\big[(X - \hat{X}(Y))^2\big] \ge \frac{1}{2\pi e}\, e^{2 h(X \mid Y)},$$
where $\hat{X}(Y) = \mathbb{E}[X \mid Y]$ is the optimal estimator and entropy is measured in nats. Equality holds if and only if $X$ given $Y$ is Gaussian with a conditional variance that does not depend on $Y$.
This theorem connects two seemingly different worlds: estimation theory (MMSE) and information theory (entropy). It says that the conditional entropy places a fundamental lower bound on how well you can estimate $X$ from $Y$. The Gaussian case is the "hardest to estimate" for a given entropy: any other conditional distribution with the same entropy is easier to estimate.
Best estimator is conditional mean
By standard estimation theory, $\mathbb{E}\big[(X - g(Y))^2\big] \ge \mathbb{E}\big[(X - \mathbb{E}[X \mid Y])^2\big]$ for any estimator $g(Y)$.
Iterated expectation
The MMSE equals $\mathbb{E}_Y\big[\mathrm{Var}(X \mid Y)\big]$, by the law of iterated expectation.
Apply the Gaussian entropy maximizer conditionally
For each $y$, the Gaussian with variance $\mathrm{Var}(X \mid Y = y)$ maximizes entropy. Therefore $h(X \mid Y = y) \le \frac{1}{2}\ln\big(2\pi e \, \mathrm{Var}(X \mid Y = y)\big)$, which gives $\mathrm{Var}(X \mid Y = y) \ge \frac{1}{2\pi e}\, e^{2 h(X \mid Y = y)}$.
Average over $Y$ and apply Jensen's inequality
Taking expectations and using the convexity of $t \mapsto e^{t}$:
$$\mathbb{E}_Y\big[\mathrm{Var}(X \mid Y)\big] \ge \frac{1}{2\pi e}\, \mathbb{E}_Y\big[e^{2 h(X \mid Y = y)}\big] \ge \frac{1}{2\pi e}\, e^{2 \mathbb{E}_Y[h(X \mid Y = y)]} = \frac{1}{2\pi e}\, e^{2 h(X \mid Y)}.$$
Example: Verifying the MMSE Bound for Jointly Gaussian Variables
Let $(X, Y)$ be jointly Gaussian with zero mean, variances $\sigma_X^2$ and $\sigma_Y^2$, and correlation coefficient $\rho$. Verify that the MMSE lower bound holds with equality.
Compute the MMSE
For jointly Gaussian variables, the MMSE estimator is linear: $\hat{X}(Y) = \rho \frac{\sigma_X}{\sigma_Y} Y$. The MMSE is $\sigma_X^2 (1 - \rho^2)$.
Compute the conditional entropy
$X \mid Y = y \sim \mathcal{N}\big(\rho \frac{\sigma_X}{\sigma_Y} y,\ \sigma_X^2 (1 - \rho^2)\big)$, so $h(X \mid Y) = \frac{1}{2}\ln\big(2\pi e\, \sigma_X^2 (1 - \rho^2)\big)$ and $\frac{1}{2\pi e}\, e^{2 h(X \mid Y)} = \sigma_X^2 (1 - \rho^2)$.
Verify equality
Equality holds because $X$ given $Y$ is Gaussian with conditional variance $\sigma_X^2 (1 - \rho^2)$, which does not depend on $y$.
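A Monte Carlo check of this example; the specific values $\sigma_X = 1$, $\sigma_Y = 2$, $\rho = 0.8$ are illustrative assumptions:

```python
import numpy as np

# Monte Carlo check of the example; sigma_X = 1, sigma_Y = 2, rho = 0.8
# are illustrative assumptions.
rng = np.random.default_rng(42)
sx, sy, rho, n = 1.0, 2.0, 0.8, 1_000_000

y = rng.normal(0.0, sy, n)
x = rho * (sx / sy) * y + rng.normal(0.0, sx * np.sqrt(1 - rho**2), n)

x_hat = rho * (sx / sy) * y                         # MMSE estimator E[X|Y]
mmse_emp = np.mean((x - x_hat) ** 2)

h_cond = 0.5 * np.log(2 * np.pi * np.e * sx**2 * (1 - rho**2))   # in nats
bound = np.exp(2 * h_cond) / (2 * np.pi * np.e)

print(f"empirical MMSE: {mmse_emp:.4f}")            # ~0.36 = sx^2 (1 - rho^2)
print(f"entropy bound : {bound:.4f}")               # equal: the bound is tight
```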
Why This Matters: AWGN Capacity and Spectral Efficiency in 5G NR
The AWGN capacity formula is the benchmark against which every practical modulation and coding scheme is measured. In 5G NR, adaptive modulation and coding (AMC) selects the highest-rate modulation-coding scheme (MCS) that the current $\mathrm{SNR}$ can support. The Shannon limit tells us how close each MCS comes to the theoretical maximum. Modern LDPC and polar codes in 5G NR operate within 1 to 2 dB of the AWGN capacity for long block lengths.
See the telecom book, Ch. 14, for the detailed treatment of AMC and link adaptation, and Ch. 12 for the coding schemes that approach this limit.
Quick Check
If we double the transmit power (keeping noise fixed), by how much does the AWGN capacity increase?
It exactly doubles
It increases by exactly $\frac{1}{2}$ bit per real channel use
It increases by roughly $\frac{1}{2}$ bit per real channel use, but only when $\mathrm{SNR} \gg 1$
It does not change
At high SNR, where $\mathrm{SNR} \gg 1$, $C \approx \frac{1}{2}\log_2 \mathrm{SNR}$, so doubling $P$ adds $\frac{1}{2}\log_2 2 = \frac{1}{2}$ bit. At low SNR, $C \approx \mathrm{SNR}/(2 \ln 2)$ grows linearly, so doubling the power roughly doubles the capacity: the relative gain is larger. The logarithmic growth of capacity with power is a fundamental feature: power is an expensive resource for buying rate.