Fine-Tuning Pre-Trained Models

Definition:

Low-Rank Adaptation (LoRA)

LoRA freezes the pre-trained weights $\mathbf{W}_0$ and adds a trainable low-rank decomposition:

$$\mathbf{W} = \mathbf{W}_0 + \frac{\alpha}{r} \mathbf{B}\mathbf{A}$$

where $\mathbf{A} \in \mathbb{R}^{r \times d}$ and $\mathbf{B} \in \mathbb{R}^{d \times r}$ with rank $r \ll d$.

Trainable parameters: $2rd$ per layer (vs. $d^2$ for full fine-tuning).

With $r = 16$ and $d = 4096$, LoRA adapters across a 7B model's attention projections add only about 0.2% of its parameters while achieving performance comparable to full fine-tuning.
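
As a concrete sketch of the update, here is a minimal PyTorch module that wraps a frozen linear layer with a LoRA adapter (class and variable names are illustrative, not taken from the peft library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W0 x + (alpha/r) * B A x, with W0 frozen."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A: Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B: zeros, so W starts at W0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=16)
n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n_trainable)  # 2 * 16 * 4096 = 131072, vs. 4096^2 ≈ 16.8M for the full matrix
```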

Definition:

QLoRA: Quantized LoRA

QLoRA combines 4-bit quantization of base weights with LoRA adapters:

  1. Quantize base model to NF4 (4-bit NormalFloat)
  2. Add LoRA adapters in fp16/bf16
  3. Train only the LoRA parameters

This allows fine-tuning a 65B model on a single 48GB GPU.
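
A minimal loading sketch using the Hugging Face transformers, bitsandbytes, and peft libraries; the model id is a placeholder, and exact argument names can shift between library versions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Step 1: quantize the base model to 4-bit NF4 on load
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # placeholder model id
    quantization_config=bnb_config,
)

# Steps 2-3: add bf16 LoRA adapters and train only those
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```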

Definition:

Parameter-Efficient Fine-Tuning (PEFT)

PEFT methods modify only a small fraction of model parameters:

| Method | Trainable Params | Approach |
| --- | --- | --- |
| LoRA | 0.1-1% | Low-rank weight update |
| Prefix Tuning | ~0.1% | Learnable prefix tokens |
| Adapters | 1-5% | Small bottleneck layers |
| Full FT | 100% | Update all parameters |

Definition:

SFT Training Data Format

Supervised fine-tuning uses instruction-response pairs:

{
  "instruction": "Compute the SNR for received power -80 dBm and noise -100 dBm",
  "response": "SNR = P_rx - N = -80 - (-100) = 20 dB"
}

Quality > quantity: 1000 high-quality examples often outperform 100,000 noisy ones.
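
A minimal sketch of rendering such a pair into a single training string (the "### Instruction / ### Response" template is one common convention, not a fixed standard):

```python
import json

def format_example(pair: dict) -> str:
    """Render an instruction-response pair as a single training string."""
    return (
        f"### Instruction:\n{pair['instruction']}\n\n"
        f"### Response:\n{pair['response']}"
    )

pair = json.loads("""{
  "instruction": "Compute the SNR for received power -80 dBm and noise -100 dBm",
  "response": "SNR = P_rx - N = -80 - (-100) = 20 dB"
}""")
print(format_example(pair))
```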

Definition:

Fine-Tuning Learning Rate

Fine-tuning requires much lower learning rates than pre-training:

$$\eta_\text{FT} \approx 10^{-5} \text{ to } 10^{-4}$$

Use cosine decay with warmup: after the warmup steps, $\eta(t) = \eta_{\max} \cdot \frac{1 + \cos(\pi t / T)}{2}$, where $T$ is the total number of decay steps.
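
A sketch of the full schedule with a linear warmup phase prepended to the cosine decay (the warmup length and peak rate below are illustrative):

```python
import math

def lr_schedule(t: int, T: int, eta_max: float = 1e-4, warmup: int = 100) -> float:
    """Linear warmup to eta_max, then cosine decay to zero at step T."""
    if t < warmup:
        return eta_max * t / warmup
    progress = (t - warmup) / max(1, T - warmup)
    return eta_max * (1 + math.cos(math.pi * progress)) / 2

T = 1000
for t in (0, 100, 550, 1000):
    print(t, f"{lr_schedule(t, T):.2e}")  # 0 -> peak -> half -> 0
```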

Theorem: LoRA Rank Selection

For a weight matrix $\mathbf{W}_0 \in \mathbb{R}^{d \times d}$, the expressiveness of the LoRA update is bounded by:

$$\text{rank}(\Delta\mathbf{W}) \le r$$

Empirically, $r \in [4, 64]$ is sufficient for most tasks. The optimal $r$ depends on the task complexity and dataset size.

Fine-tuning for a specific domain shifts weights in a low-dimensional subspace. LoRA captures this subspace with rank $r$.
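
The rank bound is easy to verify numerically; a quick NumPy check (matrix sizes chosen small for speed):

```python
import numpy as np

d, r = 512, 16                      # smaller d than 4096, for a fast check
rng = np.random.default_rng(0)
A = rng.standard_normal((r, d))
B = rng.standard_normal((d, r))
delta_W = B @ A                     # the LoRA update, a d x d matrix

# Numerical rank: number of non-negligible singular values
print(np.linalg.matrix_rank(delta_W))   # 16, never more than r
```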

Theorem: Catastrophic Forgetting in Fine-Tuning

Full fine-tuning on domain data $\mathcal{D}_\text{new}$ can degrade performance on the original distribution $\mathcal{D}_\text{orig}$:

$$L(\theta_\text{FT}, \mathcal{D}_\text{orig}) > L(\theta_0, \mathcal{D}_\text{orig})$$

LoRA mitigates this because the original weights $\mathbf{W}_0$ are frozen.

When you overwrite weights with new task knowledge, old knowledge can be lost. LoRA adds rather than replaces.

Theorem: Fine-Tuning Data Scaling

Fine-tuning performance on downstream tasks follows:

$$\text{Perf}(n) \approx a - b \cdot n^{-\gamma}$$

where $n$ is the number of training examples and $\gamma \approx 0.3$ to $0.5$. Beyond ~10K examples, returns diminish significantly for most tasks.

The pre-trained model already has strong representations. Fine-tuning mainly teaches it the task format and domain conventions.
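
Plugging illustrative constants into the law makes the diminishing returns concrete (the values of $a$, $b$, and $\gamma$ below are invented for illustration):

```python
# Perf(n) = a - b * n^(-gamma), with illustrative constants
a, b, gamma = 0.90, 0.8, 0.4

for n in (100, 1_000, 10_000, 100_000):
    perf = a - b * n ** -gamma
    print(f"n = {n:>7,}: Perf = {perf:.3f}")
# Going from 100 to 10K examples gains ~0.11;
# going from 10K to 100K gains only ~0.01.
```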

Example: Fine-Tuning with LoRA using PEFT

Fine-tune a 7B model on wireless communication Q&A data using LoRA.
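
A condensed sketch of the setup with peft and the Hugging Face Trainer; the model id, data file, and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"           # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))

data = load_dataset("json", data_files="wireless_qa.json")["train"]  # placeholder file
data = data.map(
    lambda ex: tokenizer(
        f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}",
        truncation=True, max_length=512),
    remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments("lora-out", num_train_epochs=2,
                           learning_rate=1e-4, per_device_train_batch_size=4),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```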

Example: Preparing Fine-Tuning Data

Convert wireless paper abstracts into instruction-tuning format.
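
One possible conversion script, pairing each abstract with a templated question (the field names and template are assumptions about the data):

```python
import json

def abstract_to_pair(title: str, abstract: str) -> dict:
    """Turn a paper abstract into one instruction-response pair."""
    return {
        "instruction": f"Summarize the key contribution of the paper '{title}'.",
        "response": abstract.strip(),
    }

papers = [{"title": "Deep Learning for mmWave Beam Selection",
           "abstract": "We propose a CNN-based beam selection scheme ..."}]
pairs = [abstract_to_pair(p["title"], p["abstract"]) for p in papers]
with open("sft_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")   # one JSON pair per line
```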

Example: Evaluating Fine-Tuned Models

Evaluate a fine-tuned model on domain-specific benchmarks.
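
A bare-bones exact-match scorer as a starting point (real domain benchmarks use more robust matching than string equality):

```python
def exact_match_eval(generate, test_set):
    """Score answers against references by exact string match.

    `generate` is any callable mapping a prompt string to a model answer.
    """
    correct = 0
    for ex in test_set:
        answer = generate(ex["instruction"]).strip()
        correct += answer == ex["response"].strip()
    return correct / len(test_set)

# Usage with a stub "model" for illustration:
test_set = [{"instruction": "2 + 2?", "response": "4"}]
print(exact_match_eval(lambda prompt: "4", test_set))  # 1.0
```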

LoRA Rank Explorer

Explore how LoRA rank affects parameter count and performance

Fine-Tuning Loss Comparison

Compare training dynamics for different fine-tuning methods

GPU Memory Estimator

Estimate GPU memory for different fine-tuning configurations

LoRA Weight Update Animation

Visualize how LoRA adapters modify attention weights during training

Quantization Explorer

Compare model sizes and quality across quantization levels

LoRA Architecture

LoRA adds trainable low-rank matrices alongside frozen pre-trained weights.

Fine-Tuning Pipeline

From data preparation through training to evaluation and deployment.

Quick Check

Why does LoRA work with such low rank (r=16) for fine-tuning a 4096-dimensional model?

The model is not actually learning anything

Fine-tuning shifts weights in a low-dimensional subspace

Higher ranks would cause overfitting

Quick Check

What is the main advantage of QLoRA over standard LoRA?

Better model quality

4x reduction in base model memory through NF4 quantization

Faster training speed

Quick Check

For fine-tuning, which is more important: data quality or data quantity?

Data quantity — more data is always better

Data quality — 1000 excellent examples often beat 100K noisy ones

Neither — the model architecture matters most

Common Mistake: Overfitting Small Datasets

Mistake:

Training for too many epochs on small fine-tuning datasets.

Correction:

Use early stopping, monitor validation loss, and keep epochs low (1-3 for instruction tuning). The pre-trained model already has strong representations.
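
With the Hugging Face Trainer, for example, early stopping can be wired in roughly like this (argument names follow recent transformers versions):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="sft-out",
    num_train_epochs=3,               # keep epochs low for instruction tuning
    eval_strategy="steps",            # "evaluation_strategy" in older versions
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)
# Stop when validation loss fails to improve for 3 consecutive evaluations
callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
```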

Common Mistake: Applying LoRA to Wrong Modules

Mistake:

Only applying LoRA to query projections.

Correction:

Apply LoRA to all attention projections (Q, K, V, O) and optionally the MLP layers. This provides the best performance-efficiency trade-off.
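
In peft, the fix is a one-line change to `target_modules` (the module names below follow LLaMA-style naming; other architectures use different names):

```python
from peft import LoraConfig

# Too narrow: adapts only the query projection
narrow = LoraConfig(r=16, target_modules=["q_proj"])

# Better: all attention projections, optionally plus the MLP layers
broad = LoraConfig(
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```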

Common Mistake: Evaluation Data Contamination

Mistake:

Including evaluation data in the training set accidentally.

Correction:

Always split data before any processing. Use held-out test sets that are never seen during training or hyperparameter tuning.
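
A minimal sketch of a deterministic split applied to the raw examples, before any tokenization or filtering (the hash-on-instruction scheme and 90/10 ratio are illustrative):

```python
import hashlib

def split(examples, test_frac=0.1):
    """Deterministically split raw examples BEFORE any processing step."""
    train, test = [], []
    for ex in examples:
        h = int(hashlib.sha256(ex["instruction"].encode()).hexdigest(), 16)
        (test if (h % 100) < test_frac * 100 else train).append(ex)
    return train, test

train, test = split([{"instruction": f"Q{i}", "response": f"A{i}"}
                     for i in range(1000)])
print(len(train), len(test))  # roughly 900 / 100, and stable across reruns
```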

Key Takeaway

LoRA fine-tuning modifies only 0.1-1% of parameters while achieving comparable results to full fine-tuning. QLoRA further reduces memory requirements by 4x through base weight quantization.

Key Takeaway

Data quality dominates quantity for fine-tuning. A well-curated dataset of 1000-5000 examples is typically sufficient for domain adaptation. Focus on diverse, high-quality instruction-response pairs.

Why This Matters: Domain-Specific LLMs for Telecom

Fine-tuning on 3GPP specifications, IEEE papers, and simulation code creates LLMs that understand telecom terminology, can generate correct simulation code, and reason about system-level design trade-offs. This is a growing area of applied research.

See full treatment in Chapter 38

Historical Note: The LoRA Breakthrough

2021

Hu et al. (2021) at Microsoft introduced LoRA, showing that the weight updates during fine-tuning are inherently low-rank. This enabled fine-tuning GPT-3 (175B) with only ~18M trainable parameters, making large model adaptation accessible to researchers without massive GPU clusters.

Historical Note: QLoRA Democratization

2023

Dettmers et al. (2023) showed that combining 4-bit quantization with LoRA allows fine-tuning a 65B model on a single 48GB GPU. This democratized LLM fine-tuning, enabling graduate students to customize frontier-class models.

Fine-Tuning Method Comparison

| Method | Trainable % | Memory (7B) | Quality | Speed |
| --- | --- | --- | --- | --- |
| Full FT | 100% | ~60 GB | Best | Slowest |
| LoRA (r=16) | 0.08% | ~16 GB | Near-best | Fast |
| QLoRA (r=16) | 0.08% | ~6 GB | Good | Fast |
| Prefix Tuning | 0.1% | ~15 GB | Good | Fast |
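
Rough arithmetic behind the memory column, as a sketch that ignores activations and framework overhead (the ~17M LoRA parameter count assumes r=16 on all attention projections; exact figures will differ somewhat from the table):

```python
GB = 1e9

def estimate_gb(n_params, n_trainable, bytes_per_weight):
    """Base weights plus fp16 grads and fp32 Adam moments for trainables."""
    base = n_params * bytes_per_weight            # stored model weights
    overhead = n_trainable * (2 + 4 + 4)          # grads + Adam m and v
    return (base + overhead) / GB

n, n_lora = 7e9, 17e6
print(f"Full FT: ~{estimate_gb(n, n, 2):.0f} GB")        # fp16 base, all trainable
print(f"LoRA:    ~{estimate_gb(n, n_lora, 2):.0f} GB")   # frozen fp16 base
print(f"QLoRA:   ~{estimate_gb(n, n_lora, 0.5):.0f} GB") # 4-bit NF4 base
```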

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning method that adds trainable low-rank matrices to frozen pre-trained weights, requiring only 0.1-1% additional parameters.

QLoRA

Quantized LoRA — combines 4-bit NormalFloat quantization of base weights with LoRA adapters, enabling fine-tuning of large models on consumer GPUs.

PEFT

Parameter-Efficient Fine-Tuning — a family of methods that fine-tune only a small subset of model parameters, including LoRA, adapters, and prefix tuning.

SFT (Supervised Fine-Tuning)

Fine-tuning a pre-trained model on curated instruction-response pairs to improve instruction following.

Catastrophic Forgetting

The tendency of neural networks to lose previously learned knowledge when fine-tuned on new data, mitigated by LoRA and other regularization techniques.