The Three Stages
Modern LLM Training Pipeline
- Pre-training: Learn general language from vast text corpus
- Supervised Fine-tuning (SFT): Adapt to specific tasks/formats
- Alignment (RLHF): Make helpful, harmless, honest
Stage 1: Pre-training
- Data: Trillions of tokens from web, books, code
- Objective: Next token prediction (CLM) or masked prediction (MLM)
- Compute: Thousands of GPUs for weeks/months
- Cost: Millions of dollars for large models
Stage 2: Fine-tuning
- Data: High-quality task-specific datasets
- Objective: Same as pre-training, but focused
- Compute: Much less (hours to days)
Stage 3: Alignment
- Data: Human preferences and demonstrations
- Objective: RLHF or similar methods
- Goal: Helpful, harmless, honest responses
Knowledge Check
Quiz: The Training Pipeline
1. What is the primary objective during pre-training?
- a) Learning human preferences
- b) Next token prediction
- c) Code optimization
Answer: b) Next token prediction — the model learns to predict the next token in a sequence from vast text data.
2. Which stage uses RLHF (Reinforcement Learning from Human Feedback)?
- a) Pre-training
- b) Fine-tuning
- c) Alignment
Answer: c) Alignment — RLHF is used to align the model with human values and preferences.
3. What type of data is used during supervised fine-tuning?
- a) Raw web text
- b) High-quality task-specific datasets
- c) Random token sequences
Answer: b) High-quality task-specific datasets — curated data for specific tasks and formats.
4. What does "HHH" stand for in alignment?
- a) High, Higher, Highest
- b) Helpful, Harmless, Honest
- c) Human, Hybrid, Hardware
Answer: b) Helpful, Harmless, Honest — the three key goals of alignment training.
5. Which stage typically requires the most compute resources?
- a) Pre-training
- b) Fine-tuning
- c) Alignment
Answer: a) Pre-training — requires thousands of GPUs running for weeks or months on trillions of tokens.
Practice Exercise
Exercise 1: Design a Training Pipeline
Imagine you're training a customer service chatbot. Outline the three stages of your training pipeline:
- Pre-training: What data sources would you use? What base model would you start with?
- Fine-tuning: What specific datasets would you need? How would you structure the training examples?
- Alignment: What human feedback would you collect? How would you ensure the bot is helpful but safe?
Solution Approach:
Start with a general-purpose LLM (like Llama or GPT). Fine-tune on customer service transcripts and support tickets. Use RLHF to align responses with company tone and safety guidelines.
Exercise 2: Compute Estimation
Given the following scenario, estimate the relative compute costs:
- Pre-training: 1000 GPUs Ă— 30 days
- Fine-tuning: 8 GPUs Ă— 2 days
- Alignment: 100 GPUs Ă— 5 days
Question: What percentage of total compute does each stage represent?
Answer:
- Pre-training: ~99.7% (30,000 GPU-days)
- Fine-tuning: ~0.05% (16 GPU-days)
- Alignment: ~1.7% (500 GPU-days)
Quick Quiz: Training Pipeline Concepts
Test Your Understanding
1. Why is pre-training called "unsupervised" learning?
- a) No humans are involved
- b) No labeled examples are needed—the model learns from raw text patterns
- c) It runs without monitoring
Answer: b) No labeled examples are needed—the model learns from raw text patterns. The model predicts next tokens without explicit human annotations.
2. What is the main purpose of the alignment stage?
- a) To make the model faster
- b) To ensure the model produces helpful, harmless, and honest outputs
- c) To reduce model size
Answer: b) To ensure the model produces helpful, harmless, and honest outputs. Alignment shapes model behavior to match human values.
3. Which training stage typically uses the smallest amount of data?
- a) Pre-training (trillions of tokens)
- b) Fine-tuning (millions to billions of tokens)
- c) Alignment (thousands to millions of preference comparisons)
Answer: c) Alignment uses the least data—thousands to millions of human preference comparisons versus trillions of tokens in pre-training.
Coding Exercise: Training Pipeline Simulator
Exercise 3: Build a Mini Training Pipeline
Write a Python function that simulates the three stages of LLM training:
Expected Output:
Training costs: {'pretrain': 700.0, 'finetune': 7.0, 'align': 70.0}
Challenge: Modify the function to calculate total cost and identify which stage consumes the most resources.