Exercises
ex-ch19-01
Easy: Compute the Maxwell-Boltzmann distribution for 16-QAM (4-PAM per axis with amplitudes $\{1, 3\}$) at shaping parameter $\nu$. What is the entropy of the marginal amplitude distribution?
$P(a) \propto e^{-\nu a^2}$ on $\{1, 3\}$.
Unnormalised probabilities
$e^{-\nu}$ (for $a = 1$); $e^{-9\nu}$ (for $a = 3$).
Normalisation
Sum $Z = e^{-\nu} + e^{-9\nu}$. Probabilities: $P(1) = e^{-\nu}/Z$; $P(3) = e^{-9\nu}/Z$.
Entropy
$H(A) = -P(1)\log_2 P(1) - P(3)\log_2 P(3)$ bits; with the uniform sign bit included, the per-axis symbol entropy $H(A) + 1$ is below 2 bits (vs 2 bits uniform).
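A minimal numerical check of these steps; the exercise's own $\nu$ did not survive extraction here, so $\nu = 0.1$ is an illustrative value:

```python
import math

# MB distribution over the 4-PAM amplitude alphabet {1, 3}
# (16-QAM = 4-PAM per axis). nu = 0.1 is illustrative only.
nu = 0.1
amps = [1, 3]
weights = [math.exp(-nu * a * a) for a in amps]
Z = sum(weights)
probs = [w / Z for w in weights]

# Marginal amplitude entropy; adding the uniform sign bit gives the
# per-axis symbol entropy H(A) + 1 < 2 bits.
H = -sum(p * math.log2(p) for p in probs)
print([round(p, 4) for p in probs], round(H, 4), round(H + 1, 4))
```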
ex-ch19-02
Easy: Using the PAS rate formula $R = 2H(A) + 2 - (1 - R_c)\,m$ bits per 2D symbol, compute the PAS rate for 64-QAM with LDPC rate $R_c = 0.75$ and per-axis amplitude entropy $H(A)$.
For 64-QAM, $m = 6$ bits per 2D symbol (8-PAM per axis: 2 amplitude bits and 1 sign bit per axis).
Plug in
$R = 2H(A) + 2 - (1 - 0.75) \times 6 = 2H(A) + 0.5$ bits/symbol.
Interpretation
Vs uniform 64-QAM at rate 0.75: $6 \times 0.75 = 4.5$ bits/symbol. The shaped rate is lower, but requires less SNR to achieve.
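The plug-in step as code. $R_c = 0.75$ comes from the comparison above; $H(A) = 1.8$ bits is an illustrative stand-in for the exercise's (unpreserved) entropy value:

```python
def pas_rate_2d(H_A, Rc, m):
    """PAS net rate in bits per 2D (QAM) symbol.

    H_A: per-axis amplitude entropy (bits), Rc: LDPC rate,
    m: bits per 2D symbol (6 for 64-QAM).
    """
    return 2 * H_A + 2 - (1 - Rc) * m

# Illustrative: H(A) = 1.8 bits (uniform 8-PAM amplitudes give 2 bits).
rate = pas_rate_2d(H_A=1.8, Rc=0.75, m=6)
print(round(rate, 4))  # 4.1 bits/symbol, below uniform 64-QAM's 4.5
```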
ex-ch19-03
Easy: Compute the CCDM rate loss at block length $n$ for an 8-ary amplitude alphabet with target entropy $H(A)$ bits.
Rate loss is $R_{\text{loss}} = H(A) - \frac{1}{n}\log_2\binom{n}{n_1,\ldots,n_8}$, the gap between the target entropy and the rate of the constant-composition code.
Loss formula
$R_{\text{loss}} \approx \frac{(|\mathcal{A}|-1)\log_2 n}{2n} = \frac{7\log_2 n}{2n}$ bits/symbol.
As fraction of entropy
As a fraction of the entropy, this is a modest loss at moderate $n$.
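The exact (non-asymptotic) rate loss can be computed from the multinomial coefficient. The composition below is illustrative, since the exercise's $n$ and target distribution were not preserved:

```python
import math
from math import lgamma

def ccdm_rate_loss(counts):
    """Exact CCDM rate loss (bits/symbol) for a constant-composition
    block with amplitude counts n_a summing to the block length n."""
    n = sum(counts)
    # log2 of the multinomial coefficient n! / prod(n_a!)
    log2_seqs = (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)) / math.log(2)
    # entropy of the empirical (target) distribution
    H = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return H - log2_seqs / n

# Illustrative: n = 200 over an 8-ary amplitude alphabet.
counts = [60, 45, 32, 24, 16, 11, 7, 5]
print(round(ccdm_rate_loss(counts), 4))
```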
ex-ch19-04
Medium: A 400ZR coherent optical link uses 16-QAM with LDPC rate $R_c$. Estimate the PAS rate adaptation range by varying the shaping parameter $\nu$ from 0 to very large.
At $\nu = 0$: uniform QAM. As $\nu \to \infty$: the distribution collapses toward the lowest-energy points.
Max rate
$\nu = 0$: $H(A) = 1$ bit (two equiprobable amplitudes). Rate $= 2H(A) + 2 - 4(1 - R_c) \approx 3.69$ bits/symbol/pol for the given code rate.
Min rate
$\nu \to \infty$: $H(A) \to 0$, and the rate falls to $\approx 1.81$ bits/symbol/pol.
Range
PAS sweeps the rate continuously from 1.81 to 3.69 bits/symbol/pol, about a 2x span on a single (M, R) pair.
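A sketch of the sweep. The 400ZR code rate was not preserved above, so $R_c = 0.92$ is an illustrative value chosen to land near the quoted 3.69-bit maximum:

```python
import math

def mb_amplitude_entropy(nu, amps):
    """Entropy (bits) of the MB amplitude distribution P(a) ~ exp(-nu a^2)."""
    w = [math.exp(-nu * a * a) for a in amps]
    Z = sum(w)
    return -sum((wi / Z) * math.log2(wi / Z) for wi in w)

def pas_rate_per_pol(nu, Rc, amps=(1, 3), m=4):
    """PAS net rate, bits per 2D 16-QAM symbol per polarisation."""
    return 2 * mb_amplitude_entropy(nu, amps) + 2 - (1 - Rc) * m

Rc = 0.92  # illustrative
for nu in [0.0, 0.02, 0.05, 0.1, 0.3, 1.0]:
    print(nu, round(pas_rate_per_pol(nu, Rc), 3))
```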
ex-ch19-05
Medium: Prove the maximum-entropy theorem: the Maxwell-Boltzmann distribution maximises $H(P)$ subject to the constraints $\sum_x P(x) = 1$ and $\sum_x |x|^2 P(x) = P_{\text{avg}}$.
Lagrangian with two constraints: normalisation and power.
Lagrangian
$\mathcal{L} = -\sum_x P(x)\ln P(x) + \lambda_1\left(\sum_x P(x) - 1\right) + \lambda_2\left(\sum_x |x|^2 P(x) - P_{\text{avg}}\right)$.
Derivative
$\partial\mathcal{L}/\partial P(x) = -\ln P(x) - 1 + \lambda_1 + \lambda_2 |x|^2 = 0$, giving $P(x) = e^{\lambda_1 - 1}\, e^{\lambda_2 |x|^2}$.
Identification
Writing $\nu = -\lambda_2$ (fixed by the power constraint) and normalising gives $P(x) \propto e^{-\nu |x|^2}$. This is the Maxwell-Boltzmann distribution.
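A numerical sanity check of the theorem on a toy 3-point alphabet (the alphabet and power constraint are illustrative): among all distributions with the same mean power, the entropy maximiser has the Gibbs/MB form, i.e. $\ln P(a)$ is affine in $a^2$:

```python
import math

A2 = [1, 4, 9]   # squared amplitudes of a = 1, 2, 3 (illustrative)

def family(t):
    """All distributions with sum p = 1 and sum p*a^2 = 4,
    parametrised by the one remaining free dimension t = p3."""
    return [5 * t / 3, 1 - 8 * t / 3, t]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Scan the feasible range (0 < t < 3/8) for the entropy maximiser.
best = max((i / 10000 for i in range(1, 3745)), key=lambda t: entropy(family(t)))
p = family(best)

# Gibbs form <=> equal slopes of ln p against a^2 between point pairs.
s12 = (math.log(p[0]) - math.log(p[1])) / (A2[0] - A2[1])
s23 = (math.log(p[1]) - math.log(p[2])) / (A2[1] - A2[2])
print(round(s12, 4), round(s23, 4))  # approximately equal at the maximiser
```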
ex-ch19-06
Medium: Describe why the SIGN bits in PAS do not need shaping, while the AMPLITUDE bits do.
MB distribution is symmetric: $P(x) = P(-x)$.
Symmetry of MB
The Maxwell-Boltzmann distribution $P(x) \propto e^{-\nu|x|^2}$ depends only on $|x|$, so it is symmetric around zero. The sign of each shaped symbol is therefore ALREADY uniform under MB.
Practical advantage
This means the sign bits can be PRODUCED directly by the LDPC parity output (which is approximately uniform) without any shaping. Only the amplitudes need the CCDM. This is the key PAS insight: shaping only the amplitudes is what makes PAS FEC-compatible.
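The symmetry can be verified directly; $\nu = 0.2$ is illustrative:

```python
import math

# MB distribution over the full 4-PAM alphabet {-3, -1, +1, +3}.
nu = 0.2  # illustrative shaping parameter
points = [-3, -1, 1, 3]
w = [math.exp(-nu * x * x) for x in points]
Z = sum(w)
p = {x: wi / Z for x, wi in zip(points, w)}

# P(+a) = P(-a), so the sign is exactly uniform and carries 1 full bit.
p_plus = p[1] + p[3]
print(p_plus)  # ~0.5: sign bits need no shaping
```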
ex-ch19-07
Medium: A system uses 256-QAM (16-PAM per axis) with MB shaping. Compute the per-axis amplitude entropy and the shaping gain.
Amplitudes: $\{1, 3, 5, \ldots, 15\}$ (8 positive odd values, suitably scaled).
Probabilities
$P(a) = e^{-\nu a^2}/Z$ with $Z = \sum_a e^{-\nu a^2}$. For the exercise's $\nu$, the eight unnormalised weights sum to $Z = 2.22$; dividing each weight by $Z$ gives the amplitude probabilities.
Entropy
Per-axis symbol entropy $H(X) = H(A) + 1 \approx 3.1$ bits (vs 4 bits uniform for 16-PAM).
Shaping gain
Uniform per-axis entropy = 4 bits; shaped $\approx$ 3.1 bits. The gap of 0.9 bits/axis is a rate reduction; the SNR saving (shaping gain) is much smaller, about 1.2 dB for this parameter choice, as measured numerically.
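A numerical sketch for 16-PAM; the exercise's $\nu$ was not preserved, so several values are scanned to show how the per-axis symbol entropy falls from 4 bits toward the ~3.1 bits quoted above:

```python
import math

amps = range(1, 16, 2)  # 16-PAM amplitude alphabet {1, 3, ..., 15}

def symbol_entropy(nu):
    """Per-axis symbol entropy H(A) + 1 (sign bit adds exactly 1 bit)."""
    w = [math.exp(-nu * a * a) for a in amps]
    Z = sum(w)
    H_A = -sum((wi / Z) * math.log2(wi / Z) for wi in w)
    return H_A + 1

for nu in [0.0, 0.005, 0.01, 0.02, 0.05]:
    print(nu, round(symbol_entropy(nu), 3))
```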
ex-ch19-08
Medium: Compare PAS and discrete MCS adaptation: at 18 dB SNR, a system can either use 64-QAM rate 3/4 (4.5 bits/symbol) or 16-QAM rate 5/6 (3.33 bits/symbol). What continuous PAS rate could achieve the same BER performance at 18 dB?
PAS enables continuous rates between MCS points.
Discrete gap
Between 3.33 and 4.5 bits/symbol, there is a 1.17 bit/symbol quantisation gap at the MCS boundary.
PAS continuous value
With 64-QAM, a fixed LDPC rate, and a PAS-tuned $\nu$, the system can achieve any rate between the two MCS points; in particular, a rate of 3.90 bits/symbol is achievable continuously.
System implication
The MCS gap (3.33 to 4.5) is wasteful: a link that could operate at 3.9 bits/symbol is forced down to 3.33. PAS recovers this efficiency.
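A sketch of how the operating point is found: bisect the shaping parameter $\nu$ until the PAS rate hits 3.90 bits/symbol. $R_c = 0.75$ (the 64-QAM MCS rate above) is used; the exercise's own FEC choice was not preserved:

```python
import math

amps = [1, 3, 5, 7]  # 8-PAM amplitudes (64-QAM per axis)

def pas_rate(nu, Rc=0.75, m=6):
    """PAS net rate (bits per 2D symbol) for MB-shaped 64-QAM."""
    w = [math.exp(-nu * a * a) for a in amps]
    Z = sum(w)
    H_A = -sum((wi / Z) * math.log2(wi / Z) for wi in w)
    return 2 * H_A + 2 - (1 - Rc) * m

# The rate decreases monotonically in nu, so bisection finds the
# nu that achieves the target 3.90 bits/symbol.
target = 3.90
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if pas_rate(mid) > target:
        lo = mid
    else:
        hi = mid
print(round(lo, 4), round(pas_rate(lo), 4))
```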
ex-ch19-09
Hard: Prove that geometric shaping and probabilistic shaping achieve the same mutual information for the SAME set of constellation points and marginal amplitude distribution; i.e., the two approaches are INFORMATION-EQUIVALENT.
MI depends on the JOINT distribution of (x, y), not on which side is modified.
Channel-MI formulation
$I(X;Y) = H(Y) - H(Y|X)$ for a channel $p(y|x)$. Shaping affects $I(X;Y)$ only through the input distribution $p(x)$ it induces on the channel.
Equivalence
If scheme A (PS) places a non-uniform distribution on a uniform grid and scheme B (GS) places a uniform distribution on non-uniformly spaced points, and BOTH induce the same marginal distribution of the channel input $X$, then $I_A(X;Y) = I_B(X;Y)$ exactly.
Practical caveat
In practice the two schemes LOOK different in the encoder but are information-theoretically identical. Hence the "asymptotic equivalence" in Thm. 1 of §4. Any BER difference between the two comes from RECEIVER processing, not from the encoder side.
ex-ch19-10
Hard: The CCDM rate loss formula $R_{\text{loss}} \approx \frac{(|\mathcal{A}|-1)\log_2 n}{2n}$ is for CONSTANT composition. Show that the rate loss of a HIERARCHICAL DM (multi-stage tree of constant-composition blocks) is smaller at moderate $n$.
Hierarchical DM splits the sequence into smaller sub-blocks.
Hierarchical structure
Split the $n$-symbol output into $k$ sub-blocks of length $n/k$. Apply CCDM separately to each, with the target distribution imposed at the sub-block level. The overall rate stays close to $H(A)$, with smaller finite-length losses.
Rate loss analysis
Per sub-block of length $n/k$, the CCDM rate loss is $\approx \frac{(|\mathcal{A}|-1)\log_2(n/k)}{2(n/k)}$; averaging over the $k$ sub-blocks leaves the per-symbol loss at the same order.
Trade-off
As $k$ grows, each sub-block is shorter and the local rate loss per symbol grows like $\log_2(n/k)/(n/k)$. The optimal $k$ balances this loss against latency and implementation complexity, and modern implementations pick the sub-block length accordingly.
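The finite-length trade-off can be seen by evaluating the exact CCDM rate loss at several sub-block lengths (the 4-ary target distribution is illustrative):

```python
import math
from math import lgamma

def rate_loss(n, p):
    """Exact CCDM rate loss (bits/symbol) at block length n for a
    target distribution p, using the composition n_a = round(n * p_a)."""
    counts = [round(n * pi) for pi in p]
    counts[0] += n - sum(counts)  # repair rounding so counts sum to n
    log2_seqs = (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)) / math.log(2)
    H = -sum(pi * math.log2(pi) for pi in p)
    return H - log2_seqs / n

# Shorter blocks pay a larger per-symbol loss, roughly log2(n)/n.
p = [0.45, 0.28, 0.17, 0.10]
for n in [100, 400, 1600, 6400]:
    print(n, round(rate_loss(n, p), 5))
```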
ex-ch19-11
Hard: An autoencoder trained on an AWGN channel at SNR 10 dB is likely to learn a constellation similar to probabilistic-shaped 16-QAM. Why? What does this suggest about the fundamental uniqueness of shaping solutions?
Autoencoder loss (cross-entropy) is equivalent to MI maximisation.
Loss-MI equivalence
The cross-entropy training loss upper-bounds $H(X|Y) = H(X) - I(X;Y)$; with $H(X)$ fixed, minimising it maximises $I(X;Y)$, the same objective as capacity.
Solution convergence
Since MI-maximising constellations on AWGN are MB-shaped QAM (up to unitary rotation), the autoencoder converges to the same solution as analytical PS.
Uniqueness up to rotation
The autoencoder may learn a ROTATED version of MB-shaped QAM because rotation preserves MI. But modulo rotation, the solution is unique. This is a useful sanity check: if the autoencoder doesn't converge to MB on AWGN, the training setup is broken.
ex-ch19-12
Hard: Derive the 1.53 dB asymptotic shaping ceiling using the normalised second moment of a sphere.
The normalised second moment of the $n$-sphere converges to $\frac{1}{2\pi e}$ as $n \to \infty$.
Normalised 2nd moment of n-sphere
$G(S_n) \to \frac{1}{2\pi e}$ as $n \to \infty$ (Zador 1996).
Normalised 2nd moment of n-cube
$G(\text{cube}) = \frac{1}{12}$ for all $n$ (uniform distribution on the cube).
Shaping gain
$\gamma_s = \frac{1/12}{1/(2\pi e)} = \frac{2\pi e}{12} = \frac{\pi e}{6} \approx 1.423$.
In dB
$10\log_{10}(\pi e/6) \approx 1.53$ dB.
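The two-line computation:

```python
import math

# Ratio of the cube's normalised second moment (1/12) to the
# sphere's asymptotic value (1/(2*pi*e)): the ultimate shaping gain.
gain = (1 / 12) / (1 / (2 * math.pi * math.e))  # = pi*e/6
gain_db = 10 * math.log10(gain)
print(round(gain, 4), round(gain_db, 4))  # 1.4233, 1.533 dB
```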
ex-ch19-13
Hard: A common criticism of PAS is that it adds encoder/decoder complexity. Quantify the complexity of CCDM encoding at block length $n = 1000$ with an 8-ary amplitude alphabet.
Arithmetic encoding: $O(n \cdot |\mathcal{A}|)$ operations.
Operation count
For each of the 1000 positions, the CCDM evaluates 8 candidate amplitudes: $1000 \times 8 = 8000$ operations per block.
Context
At 400 Msymbol/s: 400,000 blocks/second. Total: $\approx 3.2 \times 10^9$ operations/second, well within modern DSP chip capacity ($\sim 10^{12}$ ops/s).
Memory
A naive implementation would need a multinomial lookup table whose size grows combinatorially in $n$, far too large at $n = 1000$. Practical implementations use finite-precision arithmetic coding, which keeps only a small amount of state per block.
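The back-of-envelope arithmetic as code (throughput figures as in the text):

```python
# CCDM complexity estimate for n = 1000, 8-ary amplitudes.
n = 1000            # symbols per CCDM block
alphabet = 8        # candidate amplitudes evaluated per position
ops_per_block = n * alphabet

baud = 400e6        # 400 Msymbol/s -> 400,000 blocks of 1000 symbols/s
blocks_per_s = baud / n
total_ops = ops_per_block * blocks_per_s
print(ops_per_block, int(blocks_per_s), f"{total_ops:.1e}")  # 8000 400000 3.2e+09
```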
ex-ch19-14
Hard: The PAS architecture requires a SYSTEMATIC LDPC code. What goes wrong if you use a non-systematic turbo code instead?
Non-systematic codes XOR the information bits into the parity structure: they don't preserve the shape.
Non-systematic composition
A non-systematic encoder outputs a codeword with NO IDENTIFIABLE shaped and unshaped bit components. After the non-systematic encoder, the output bit distribution is approximately UNIFORM regardless of the input shape.
Consequence
The BICM mapper then sees uniform input bits and produces uniform QAM output: the shaping is LOST.
Fix
Use a SYSTEMATIC code: its output includes the original shaped bits unchanged, and parity bits are added alongside. The uniformly-distributed parity bits serve as SIGN bits in the PAS architecture.
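A small simulation of the failure mode. A random dense binary generator matrix stands in for a non-systematic encoder (an assumption for illustration, not an actual turbo code): XOR-ing many biased bits together drives the output ones-density to 1/2, destroying the shaping.

```python
import random

random.seed(1)
k, n_out, trials = 100, 200, 100
bias = 0.8  # shaped input bits: P(bit = 0) = 0.8 (ones-density 0.2)

# Dense random generator matrix = stand-in non-systematic encoder.
G = [[random.randint(0, 1) for _ in range(k)] for _ in range(n_out)]

def encode(u):
    return [sum(g * b for g, b in zip(row, u)) % 2 for row in G]

ones = 0
for _ in range(trials):
    u = [0 if random.random() < bias else 1 for _ in range(k)]
    ones += sum(encode(u))

ratio = ones / (trials * n_out)
print(round(ratio, 3))  # close to 0.5: the input shaping is destroyed
# A systematic encoder would emit u unchanged (ones-density 0.2) plus
# uniform parity, preserving the shaped amplitude bits.
```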
ex-ch19-15
Challenge: Open research: can PAS be combined with CDA codes (Ch 13) to achieve BOTH the DMT (rank + determinant) and shaping gain simultaneously? Sketch what would need to be proved.
CDA requires uniform input distribution. PAS uses shaped input.
Problem
CDA codes' non-vanishing-determinant theorem (Ch 13) is proved for UNIFORM inputs. If the input is MB-shaped, does the NVD property still hold?
Conjecture
Likely yes, with a smaller non-vanishing-determinant constant, because MB shaping compresses outer points toward the centre, possibly reducing codeword-pair determinants while preserving positivity.
Needs proving
(a) Show the minimum determinant is bounded below by a positive constant under MB input. (b) Characterise the DMT of PAS+CDA: likely the same DMT curve but with a modified coding gain. (c) Practical construction: how does the CCDM interact with the CDA codeword construction? This would be a solid PhD thesis topic.