Exercises
ex-ch22-01
(Easy) Compute the Marzetta-Hochwald non-coherent pre-log for $M = 2$ transmit antennas, $N = 2$ receive antennas, and coherence time $T = 6$. Compare to the coherent pre-log.
$M^* = \min(M, N, T/2)$; pre-log $= M^*(1 - M^*/T)$.
Effective stream count
$M^* = \min(M, N, T/2) = \min(2, 2, 3) = 2$.
Pre-log
Pre-log $= M^*(1 - M^*/T) = 2(1 - 2/6) = 4/3$.
Coherent comparison
Coherent pre-log $= \min(M, N) = 2$. Non-coherent achieves only $4/3$, losing 33% of the high-SNR degrees of freedom.
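A minimal numeric check, as a sketch assuming the $M = N = 2$, $T = 6$ values above (the helper name is ours):

```python
def noncoherent_prelog(M: int, N: int, T: int) -> float:
    """Marzetta-Hochwald/Zheng-Tse high-SNR pre-log of the
    non-coherent block-fading MIMO channel with coherence time T."""
    M_star = min(M, N, T // 2)           # effective number of streams
    return M_star * (1 - M_star / T)     # pre-log (spatial DoF per channel use)

print(noncoherent_prelog(2, 2, 6))       # 1.333... vs coherent min(M, N) = 2
```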
ex-ch22-02
(Easy) A URLLC system operates at 10 dB SNR, blocklength $n = 128$, target BLER $\epsilon = 10^{-5}$. Use the Polyanskiy normal approximation to estimate the maximum achievable rate.
$R \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ in nats; convert to bits at the end.
Capacity
$C = \log_2(1 + {\rm SNR}) = \log_2 11 \approx 3.46$ bits/use.
Dispersion
$V = \left(1 - (1 + {\rm SNR})^{-2}\right)(\log_2 e)^2 = (1 - 1/121) \times 2.08 \approx 2.06$ bits$^2$ (complex AWGN).
Polyanskiy rate
$R \approx 3.46 - \sqrt{2.06/128} \times Q^{-1}(10^{-5}) \approx 3.46 - 0.127 \times 4.26 \approx 2.92$ bits/use. 16% below Shannon.
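A quick sketch of the computation, using the complex-AWGN dispersion above (SciPy's `norm.isf` supplies $Q^{-1}$):

```python
import numpy as np
from scipy.stats import norm

def normal_approx_rate(snr_db: float, n: int, eps: float) -> float:
    """Polyanskiy normal approximation R ~ C - sqrt(V/n) Q^{-1}(eps)
    for the complex AWGN channel, in bits/channel use."""
    snr = 10 ** (snr_db / 10)
    C = np.log2(1 + snr)                               # Shannon capacity
    V = (1 - (1 + snr) ** -2) * np.log2(np.e) ** 2     # dispersion, bits^2
    return C - np.sqrt(V / n) * norm.isf(eps)

print(normal_approx_rate(10, 128, 1e-5))               # ~2.92 bits/use
```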
ex-ch22-03
(Easy) For the GN model with ASE noise power $\sigma^2_{\rm ASE}$ (mW) and nonlinear efficiency $\eta$ (mW$^{-2}$), compute the optimal launch power and peak SNR.
${\rm SNR}(P) = \dfrac{P}{\sigma^2_{\rm ASE} + \eta P^3}$.
Optimal power
Setting $d\,{\rm SNR}/dP = 0$ gives $\sigma^2_{\rm ASE} = 2\eta P^3$, so $P_{\rm opt} = \left(\sigma^2_{\rm ASE}/2\eta\right)^{1/3}$ mW (convert to dBm via $10\log_{10} P_{\rm opt}$).
Peak SNR
At $P_{\rm opt}$ the NLI term equals half the ASE term, so ${\rm SNR}_{\rm peak} = \dfrac{P_{\rm opt}}{(3/2)\,\sigma^2_{\rm ASE}} = \dfrac{2}{3}\dfrac{P_{\rm opt}}{\sigma^2_{\rm ASE}}$.
Rate per polarisation
$R = \log_2(1 + {\rm SNR}_{\rm peak})$ bits/symbol per polarisation.
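A short sketch of the optimisation; the numeric inputs below are illustrative assumptions, not the exercise's data:

```python
import numpy as np

def gn_optimum(sigma2_ase: float, eta: float):
    """GN-model launch-power optimum for SNR(P) = P / (sigma2 + eta * P**3).
    Units: powers in mW, eta in mW^-2."""
    p_opt = (sigma2_ase / (2 * eta)) ** (1 / 3)          # optimal launch power
    snr_peak = p_opt / (sigma2_ase + eta * p_opt ** 3)   # = (2/3) p_opt / sigma2
    return p_opt, snr_peak

# hypothetical example values, NOT from the exercise statement
p_opt, snr = gn_optimum(sigma2_ase=0.05, eta=0.01)
print(f"P_opt = {p_opt:.2f} mW ({10*np.log10(p_opt):.1f} dBm), "
      f"SNR_peak = {10*np.log10(snr):.1f} dB, "
      f"R = {np.log2(1 + snr):.2f} bits/symbol/pol")
```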
ex-ch22-04
(Medium) An autoencoder trained on a Rapp HPA with smoothness $p$ and IBO 3 dB delivers a 0.8 dB coding gain. When deployed on a real HPA with a different smoothness and IBO 2 dB, experimental measurements show the gain drops to 0.2 dB. Explain.
The training distribution and deployment distribution differ.
Distribution shift
The learned constellation geometry is optimised for the specific training HPA parameters. A different smoothness $p$ and IBO change the distortion function, so the learned constellation is no longer matched.
Why 0.2 dB survives
The autoencoder learned to REDUCE the power of outer constellation points; this general principle still helps on any nonlinear HPA, but with a smaller gain when the specific nonlinearity doesn't match.
Remediation
Train with domain randomisation (sample $p$ and the IBO in dB over the expected deployment ranges during training) to make the learned constellation robust across the deployment envelope.
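A minimal sketch of what domain randomisation looks like in the training channel model; the Rapp AM/AM formula is the standard one, while the sampling ranges are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rapp_amam(x: np.ndarray, p: float, ibo_db: float) -> np.ndarray:
    """Rapp AM/AM model (no AM/PM); saturation level set by the input
    back-off (IBO). x: complex baseband samples with unit average power."""
    v_sat = 10 ** (ibo_db / 20)                   # saturation amplitude from IBO
    r = np.abs(x)
    gain = 1 / (1 + (r / v_sat) ** (2 * p)) ** (1 / (2 * p))
    return x * gain

def randomized_hpa(x: np.ndarray) -> np.ndarray:
    """Domain randomisation: each training batch sees a different HPA.
    The ranges are illustrative assumptions, not values from the exercise."""
    p = rng.uniform(1.0, 4.0)        # smoothness
    ibo_db = rng.uniform(1.0, 4.0)   # input back-off in dB
    return rapp_amam(x, p, ibo_db)
```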
ex-ch22-05
(Medium) Derive the dispersion formula for the real AWGN channel.
Information density per use: $i(x;y) = \ln\dfrac{p(y|x)}{p_Y(y)}$. Compute ${\rm Var}[i(X;Y)]$ at the capacity-achieving Gaussian input.
Capacity-achieving input
$X \sim \mathcal{N}(0, P)$, $Z \sim \mathcal{N}(0, 1)$, $Y = X + Z$. $p(y|x) = \mathcal{N}(x, 1)$, $p_Y = \mathcal{N}(0, 1 + P)$.
Information density
$i(x;y) = \frac{1}{2}\ln(1+P) + \dfrac{y^2}{2(1+P)} - \dfrac{(y-x)^2}{2}$.
Variance
Direct computation of ${\rm Var}[i(X;Y)]$ using Gaussian moments gives $V = \dfrac{P(P+2)}{2(1+P)^2}$ (in nats$^2$). Multiply by $(\log_2 e)^2$ to convert to bits$^2$.
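A Monte-Carlo check of the derived variance; a sketch at an assumed $P = 10$:

```python
import numpy as np

rng = np.random.default_rng(0)
P, n_samples = 10.0, 2_000_000               # SNR (linear) and sample count

x = rng.normal(0.0, np.sqrt(P), n_samples)   # capacity-achieving Gaussian input
y = x + rng.normal(0.0, 1.0, n_samples)      # unit-variance real AWGN

# information density i(x;y) = ln p(y|x) - ln p(y), in nats
i = 0.5 * np.log(1 + P) + y ** 2 / (2 * (1 + P)) - (y - x) ** 2 / 2

print(i.mean(), 0.5 * np.log(1 + P))               # both ~ C = 1.199 nats
print(i.var(), P * (P + 2) / (2 * (1 + P) ** 2))   # both ~ V = 0.496 nats^2
```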
ex-ch22-06
(Medium) Compare the capacity of a $2 \times 2$ MIMO system: coherent (perfect CSI) vs non-coherent at coherence time $T = 4$.
Coherent pre-log = 2.
Non-coherent needs $T \ge 2M$ to support $M^* = M$ streams; here $T = 2M = 4$ exactly.
Effective streams
$M^* = \min(M, N, T/2) = \min(2, 2, 2) = 2$.
Pre-log
Non-coherent: $M^*(1 - M^*/T) = 2(1 - 2/4) = 1$. Coherent: $\min(M, N) = 2$.
Interpretation
Half the DoF are lost at $T = 4$. To recover the coherent rate, the system would need a longer coherence time ($T \gg M$).
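Using the sketch from ex-ch22-01: `noncoherent_prelog(2, 2, 4)` returns `1.0`, half the coherent pre-log of 2.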
ex-ch22-07
(Medium) Show that the Polyanskiy normal approximation is tight at large $n$ but loose at small $n$. Explicitly, give a non-trivial lower bound on $\log_2 M^*(n, \epsilon)$ valid at moderate $n$.
Look up the meta-converse bound (Polyanskiy 2010, Thm. 27).
Asymptotic tightness
The normal approximation matches the meta-converse upper bound up to a third-order $O(\log n)$ term: $\log_2 M^*(n,\epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + O(\log n)$, so the per-use gap vanishes as $n \to \infty$.
Moderate-$n$ looseness
At moderate $n$ (e.g. $n = 128$), the neglected $\frac{\log_2 n}{2n}$ term is $\approx 0.03$ bits/use, of the same order as the accuracy URLLC design requires. More accurate bounds (the $\kappa\beta$ achievability bound, Polyanskiy 2010) retain this correction.
Take-away
For rigorous URLLC design, use the explicit meta-converse bound or a tight saddle-point approximation. The normal approximation is quick but can be off by 0.1-0.2 bits/use.
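A short sketch comparing the two correction terms across blocklengths, under the same 10 dB SNR and $\epsilon = 10^{-5}$ assumptions as ex-ch22-02:

```python
import numpy as np
from scipy.stats import norm

snr, eps = 10.0, 1e-5                                  # 10 dB SNR, URLLC BLER
V = (1 - (1 + snr) ** -2) * np.log2(np.e) ** 2         # dispersion, bits^2

for n in (64, 128, 512, 4096):
    dispersion_term = np.sqrt(V / n) * norm.isf(eps)   # second-order term
    logn_term = np.log2(n) / (2 * n)                   # neglected O(log n / n) term
    print(f"n={n:5d}  sqrt(V/n)Q^-1={dispersion_term:.3f}  log2(n)/2n={logn_term:.4f}")
```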
ex-ch22-08
(Medium) Explain why the optical fibre channel has no classical Shannon-type capacity theorem, despite being a physical channel with noise and bandwidth.
The channel is nonlinear (Kerr effect) and has memory (chromatic dispersion).
Shannon's assumptions
Shannon's AWGN theorem assumes a LINEAR channel with MEMORYLESS Gaussian noise. Kerr nonlinearity violates linearity; dispersion violates memorylessness.
GN model as linearisation
The GN model (Essiambre 2010) approximates the nonlinearity as ADDITIVE Gaussian noise with power proportional to $P^3$. Under this approximation, Shannon's formula applies, but the effective SNR PEAKS and then decreases.
Beyond GN
At higher launch powers, the GN model fails: nonlinear interference is NOT Gaussian, and capacity is likely higher than GN predicts (digital back-propagation exploits this). The true capacity is an open research question.
ex-ch22-09
(Hard) An autoencoder is trained on AWGN with 16 messages, 2 channel uses, and SNR 5 dB. After training, it is deployed on the same channel at SNR 10 dB. Would you expect the BER to improve, stay the same, or worsen relative to a hand-designed 16-QAM?
The autoencoder learned a constellation specific to its training SNR.
What was learned
At 5 dB training SNR, the autoencoder may learn a constellation with SLIGHTLY reduced minimum distance (at low SNR, overall error probability is dominated by average pairwise behaviour rather than the worst-case pair).
At 10 dB deployment
Hand-designed 16-QAM benefits fully from the 5 dB SNR boost. The autoencoder's trained-at-low-SNR constellation may have suboptimal minimum distance, so its gain at 10 dB is smaller than at 5 dB.
Net effect
Typically, the autoencoder's AWGN gain DIMINISHES as the deployment SNR diverges from the training SNR. For best robustness, train over a range of SNRs: "SNR-robust autoencoder training" (Cammerer et al. 2019).
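A minimal sketch of SNR-randomised training noise; the 0-15 dB range is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def awgn_random_snr(x: np.ndarray, snr_db_range=(0.0, 15.0)) -> np.ndarray:
    """Add AWGN at a per-batch random SNR so the autoencoder cannot
    overfit a single operating point. x: unit-power complex symbols."""
    snr_db = rng.uniform(*snr_db_range)
    noise_std = np.sqrt(1 / (2 * 10 ** (snr_db / 10)))   # per real dimension
    return x + noise_std * (rng.standard_normal(x.shape)
                            + 1j * rng.standard_normal(x.shape))
```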
ex-ch22-10
(Hard) Prove that for the non-coherent block-fading MIMO channel with $T \gg M$, the Zheng-Tse (2002) non-coherent DMT equals the coherent DMT.
Zheng-Tse prove that the 'CSI penalty' vanishes as $T \to \infty$.
Coherent DMT (Ch 12)
$d^*(r) = (M - r)(N - r)$ for integer $0 \le r \le \min(M, N)$, linearly interpolated in between.
Non-coherent at large $T$
Zheng-Tse (2002) show that for $T \gg M$, the non-coherent DMT matches the coherent DMT: no penalty.
Intuition
When the coherence block is long enough, the receiver can "estimate" the channel essentially for free by sacrificing $M$ of the $T$ slots to pilots. The remaining $T - M$ slots achieve the coherent rate. If $T \gg M$, the pilot overhead becomes negligible in the high-SNR DMT exponent.
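A small sketch of the coherent DMT curve that the non-coherent scheme approaches; piecewise-linear interpolation between integer points is the standard Zheng-Tse construction:

```python
import numpy as np

def coherent_dmt(M: int, N: int, r: float) -> float:
    """Coherent DMT d*(r): piecewise-linear interpolation of
    (M - r)(N - r) between integer multiplexing gains r."""
    ks = np.arange(min(M, N) + 1)
    return float(np.interp(r, ks, (M - ks) * (N - ks)))

print(coherent_dmt(2, 2, 0.0))   # 4.0: full diversity
print(coherent_dmt(2, 2, 1.0))   # 1.0
print(coherent_dmt(2, 2, 1.5))   # 0.5 (on the linear segment)
```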
ex-ch22-11
(Hard) For 800G coherent optical links at 128 GBaud with 64-QAM PAS shaping, estimate the reach on SMF-28e fibre assuming per-span ASE power $\sigma^2_{\rm span}$ (mW) and per-span nonlinear efficiency $\eta_{\rm span}$ (mW$^{-2}$).
Scale $\sigma^2_{\rm ASE}$ and $\eta$ by the span count $N_{\rm span}$.
Per-span values
Assume 80 km/span. Full-link $\sigma^2_{\rm ASE} = N_{\rm span}\,\sigma^2_{\rm span}$, $\eta = N_{\rm span}\,\eta_{\rm span}$.
Target rate
PAS-shaped 64-QAM: ~5.5 bits/symbol/pol. Dual-pol: 11 bits/symbol. Need ${\rm SNR}_{\rm peak} \ge 2^{5.5} - 1 \approx 16.5$ dB per polarisation at the peak.
Solve for $N_{\rm span}$
$P_{\rm opt} = (\sigma^2_{\rm ASE}/2\eta)^{1/3}$ is INDEPENDENT of $N_{\rm span}$ (both $\sigma^2_{\rm ASE}$ and $\eta$ scale linearly). But ${\rm SNR}_{\rm peak} \propto 1/N_{\rm span}$.
Reach estimate
The target SNR requires $N_{\rm span} = 1$ under the assumed per-span parameters. Each span is 80 km, so reach $\approx 80$ km (1 span); 800G is metro-only under this crude model. (Real systems with DBP reach 200-400 km.)
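A sketch of the span-count search; the per-span parameters below are hypothetical, chosen only to reproduce the one-span conclusion:

```python
import numpy as np

def max_spans(sigma2_span: float, eta_span: float,
              target_snr_db: float, span_km: int = 80):
    """Largest N_span whose GN-model peak SNR still meets the target.
    Per-span parameters are hypothetical examples, not exercise data."""
    for n in range(1, 100):
        sigma2, eta = n * sigma2_span, n * eta_span    # incoherent accumulation
        p_opt = (sigma2 / (2 * eta)) ** (1 / 3)        # independent of n
        snr_peak_db = 10 * np.log10((2 / 3) * p_opt / sigma2)
        if snr_peak_db < target_snr_db:
            return n - 1, (n - 1) * span_km
    return 99, 99 * span_km

print(max_spans(sigma2_span=0.01, eta_span=0.015, target_snr_db=16.5))
# -> (1, 80): one 80 km span under these assumed values
```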
ex-ch22-12
(Hard) An autoencoder trained with MSE loss vs cross-entropy loss will learn different encoders. Explain the difference and which is preferred for communication.
MSE optimises the WAVEFORM; CE optimises the BIT PROBABILITIES.
MSE loss
Minimising the reconstruction error $\mathbb{E}\|\hat{x} - x\|^2$ over the encoder/decoder pair produces Gaussian-like signalling; close to Shannon's capacity-achieving input, but yielding SOFT estimates instead of hard detection.
Cross-entropy loss
Minimising the cross-entropy between one-hot message labels and the decoder's posterior over symbols gives a DISCRETE constellation (the one-hot labels drive the output toward argmax behaviour). The learned constellation resembles QAM in shape with Gray-like labelling.
Which is better?
For COMMUNICATION with a hard-decision output, cross-entropy gives the "right" geometry (discrete constellation with Gray-like labelling). For SOFT-decoded channel coding, MSE can be competitive. In practice, CE is preferred.
ex-ch22-13
(Hard) An old proverb in information theory says that "every joint input-output constraint either adds a penalty or cuts a constant off the pre-log." Apply this to the non-coherent model: compared to a fully-coherent CSI-known system, what is the COST of not knowing the channel?
Compare the non-coherent pre-log $M^*(1 - M^*/T)$ with the coherent pre-log $\min(M, N)$.
Pre-log gap
The pre-log gap is $\min(M,N) - M^*(1 - M^*/T)$; for $M = N = M^*$ this equals $M^2/T$, a loss of $M^2/T$ bits/channel use per 3 dB of SNR at high SNR.
Constant cost
In addition, non-coherent decoding incurs a CONSTANT (SNR-independent) offset in the log-rate, because the decoder does not know which Grassmannian direction the signal takes.
Total cost of non-coherence
Total cost $\approx \frac{M^2}{T}\log_2 {\rm SNR} + O(1)$. As $T \to \infty$, this vanishes per channel use; non-coherent converges to coherent. For small $T$, the penalty is significant.
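A short sketch tabulating the high-SNR cost as $T$ grows; the $O(1)$ offset is ignored:

```python
import numpy as np

def noncoherence_cost_bits(M: int, N: int, T: int, snr_db: float) -> float:
    """High-SNR pre-log gap between coherent and non-coherent MIMO,
    expressed in bits/channel use at the given SNR (O(1) terms ignored)."""
    M_star = min(M, N, T // 2)
    gap = min(M, N) - M_star * (1 - M_star / T)
    return gap * np.log2(10 ** (snr_db / 10))

for T in (4, 8, 16, 64, 256):
    print(T, round(noncoherence_cost_bits(2, 2, T, snr_db=20), 2))
# cost shrinks toward zero as T grows: 6.64, 3.32, 1.66, 0.42, 0.1
```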
ex-ch22-14
(Hard) The book's golden thread, that every chapter establishes a code design criterion and a construction, fails in Ch 22. Why? What structure would a future Ch 22 use to be a proper design chapter?
Open problems don't yet have design criteria.
Open problems lack definitive design criteria
For non-coherent STC, finite-blocklength URLLC, autoencoder codes, and optical fibre, the design criteria are still being formulated. We have BOUNDS (Polyanskiy, Marzetta-Hochwald, nonlinear Shannon peak) but not CONSTRUCTIVE theorems.
What a future 'design' chapter would look like
A mature design chapter for each area would include: (1) a formal design criterion (like rank+det for STC); (2) an explicit construction (like CDA for DMT); (3) a deployed example. Today we have (1) in some areas, (2) partially, and (3) only for PAS in optical.
Take-away
Ch 22 is a research survey BECAUSE the field is still being built. The book ends here precisely where the reader would need to contribute to close the loop.
ex-ch22-15
(Challenge) Open research: suggest a specific research direction that combines two of the book's landmark results (e.g., CDA codes of Ch 13 and PAS of Ch 19) and would be a natural PhD thesis topic.
Think about gaps in the current literature.
Example direction
PROBABILISTICALLY-SHAPED CDA CODES: extend Ch 13's CDA framework (DMT-optimal, non-vanishing determinant) to support probabilistic input distributions (Ch 19 PAS). The key question: does the NVD property extend when symbol probabilities are non-uniform?
Why it's PhD-scale
Requires: (a) algebraic analysis of CDA codeword determinants under non-uniform symbol distributions; (b) an analogue of the approximate-universality theorem for shaped inputs; (c) finite-SNR performance comparisons with hand-designed shaped constellations.
Practical impact
A shaped CDA code would unify the MIMO and single-carrier paradigms: DMT-optimal (like Golden code) + near-capacity (like PAS). This would redefine MCS design for high-SNR MIMO (Wi-Fi 7, 5G mmWave, optical PAS).
Alternative directions
Other natural combinations: (i) LAST codes for URLLC (short blocks + DMT-optimality); (ii) autoencoder-learned CDA initialisation (neural search over the CDA manifold); (iii) ARQ-LAST for non-terrestrial networks (long-RTT URLLC + MIMO).