Chapter Summary

Key Points

  1. The API paradigm is message-based. Send structured messages (system, user, assistant); get text back. Token-based pricing means prompt design directly affects cost. Use streaming for interactive applications and structured output for reliable parsing.
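A minimal sketch of the message format and token-based pricing; the per-million-token prices below are illustrative placeholders, not any provider's actual rates:

```python
# Message-based request body, in the common chat-completions shape.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain cosine similarity in one sentence."},
]

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_in: float = 0.15, price_out: float = 0.60) -> float:
    """Token-based pricing: prices in dollars per million tokens (illustrative)."""
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1e6

# A 1,200-token prompt with a 300-token reply:
print(f"${estimate_cost(1200, 300):.5f}")
```

Because input and output tokens are priced separately, trimming a verbose system prompt pays off on every single call.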

  2. Prompt engineering is systematic. Use system messages for role and constraints, few-shot examples for task specification, and chain-of-thought prompting for complex reasoning. Most gains come from the first 3-5 examples.
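A sketch of how few-shot examples slot into the message list as alternating user/assistant turns; the labels and review texts are made up for illustration:

```python
# Few-shot task specification: each example is a user/assistant turn pair.
examples = [
    ("I love this!", "positive"),
    ("Total waste of money.", "negative"),
    ("It arrived on Tuesday.", "neutral"),
]

messages = [{"role": "system",
             "content": "Classify sentiment as positive, negative, or neutral."}]
for text, label in examples:
    messages.append({"role": "user", "content": text})
    messages.append({"role": "assistant", "content": label})

# The real query goes last; the model continues the established pattern.
messages.append({"role": "user", "content": "The battery died after a week."})
```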

  3. Tool use enables LLM agents. LLMs can call external functions for computation, data retrieval, and actions. Each tool call carries a reliability cost, so verify results at every step.
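One way to sketch the dispatch-and-verify step: the model emits a JSON tool call, and the application executes it against a registry and checks the result before feeding it back. The tool name, registry, and checks here are hypothetical:

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query a weather API."""
    return {"city": city, "temp_c": 18}

TOOLS = {"get_weather": get_weather}

def run_tool(call_json: str) -> dict:
    """Execute one model-issued tool call, verifying the result before it
    re-enters the conversation."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    result = fn(**call["arguments"])
    if not isinstance(result, dict):  # verify at every step
        raise TypeError("tool must return structured data")
    return result
```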

  4. RAG grounds LLMs in domain knowledge. Chunk documents into 200-500 tokens, embed with sentence transformers, and retrieve the top-k chunks by cosine similarity. RAG is simpler and more flexible than fine-tuning for adding factual knowledge.
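The chunk-embed-retrieve pipeline can be sketched in plain Python; here a whitespace word count stands in for a real tokenizer, and toy 2-d vectors stand in for sentence-transformer embeddings:

```python
import math

def chunk(text: str, max_tokens: int = 300) -> list[str]:
    """Split a document into chunks of roughly max_tokens words
    (a real pipeline would count model tokens, not words)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, k: int = 2) -> list[int]:
    """Indices of the k chunks most similar to the query."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

The retrieved chunks are then pasted into the prompt as context, which is why chunk size matters: too small and chunks lack context, too large and they crowd out the rest of the prompt.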

  5. Local models provide privacy and zero marginal cost. Quantized 7-8B models run on consumer GPUs. Use Ollama for quick setup, vLLM for production serving, and HuggingFace for full control.
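A back-of-the-envelope check on why quantized 7-8B models fit on consumer GPUs; the 20% overhead factor for KV cache and activations is a rough assumption, not a measured figure:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Weights take params * bits/8 bytes; add ~20% for KV cache and
    activations (rule of thumb, assumed here)."""
    return params_billions * bits_per_weight / 8 * overhead

# 8B parameters at 4-bit quantization: roughly 4.8 GB, comfortably inside
# a 12 GB consumer GPU. The same model at fp16 needs roughly 19 GB.
print(weight_memory_gb(8, 4), weight_memory_gb(8, 16))
```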

Looking Ahead

Chapter 37 covers fine-tuning and training LLMs when prompting alone is insufficient — LoRA, nanoGPT, instruction tuning, and multimodal models.