Chapter Summary
Key Points
1. LoRA fine-tuning is the practical default. It updates as little as ~0.1% of parameters via low-rank (rank-r) updates while achieving near-full fine-tuning quality. QLoRA further reduces memory by quantizing the frozen base weights to 4 bits.
2. Data quality beats quantity for fine-tuning. 1000-5000 carefully curated instruction-response pairs suffice for most domain-adaptation tasks. Always validate on a held-out set.
3. Training from scratch teaches the full pipeline. nanoGPT-style implementations demystify LLMs: data preparation, model architecture, and a training loop with gradient clipping and a cosine learning-rate schedule.
4. Instruction tuning creates chat models. Compute the loss only on response tokens, use proper chat templates, and evaluate on diverse benchmarks.
5. Multimodal models combine vision and language. A vision encoder projects image features into the LLM's embedding space for joint reasoning.
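The low-rank update behind point 1 can be sketched in a few lines of PyTorch. This is a minimal illustration, not the chapter's implementation: the class name, rank, and init scale are assumptions, but the structure (frozen base weight plus a trainable rank-r product) is the core of LoRA.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA adapter around a frozen linear layer.
    The pretrained weight W stays frozen; only A (r x d_in) and
    B (d_out x r) train, so trainable params are r*(d_in + d_out)
    instead of d_in*d_out."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze pretrained weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T  (low-rank update on the frozen path)
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because B is zero-initialized, the adapted layer starts out identical to the frozen base model; training then moves only the A and B factors. For a 512x512 layer with r=8, the trainable fraction is about 3%, and it shrinks further as layer width grows.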
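The cosine learning-rate schedule mentioned in point 3 is simple enough to write out. A minimal sketch in the nanoGPT style, with hypothetical hyperparameter values (warmup steps, max/min LR are illustrative, not from the chapter):

```python
import math

def cosine_lr(step: int, max_lr: float = 6e-4, min_lr: float = 6e-5,
              warmup: int = 100, total: int = 1000) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr.
    All hyperparameter defaults here are illustrative."""
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear warmup
    progress = (step - warmup) / max(1, total - warmup)
    # cosine decay: progress 0 -> max_lr, progress 1 -> min_lr
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In the training loop this value is written into `optimizer.param_groups` each step, and gradient clipping (e.g. `torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)`) is applied between `loss.backward()` and `optimizer.step()`.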
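Point 4's rule of computing loss only on response tokens is usually implemented by masking prompt positions in the label tensor. A minimal sketch, assuming the prompt length is known from tokenizing the chat template (function names and shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips targets equal to ignore_index

def mask_prompt_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Copy input_ids as labels, but mask the prompt tokens so the
    loss is computed only on the response."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels

def response_only_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (seq_len, vocab), labels: (seq_len,) -- illustrative shapes
    return F.cross_entropy(logits, labels, ignore_index=IGNORE_INDEX)
```

With the default mean reduction, `cross_entropy` averages only over the non-ignored positions, so the gradient signal comes entirely from the response span.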
Looking Ahead
Chapter 38 applies these LLM capabilities to telecommunications and imaging research: code generation, literature review, system design, and semantic communication.