Lesson 4: Instruction Tuning

Instruction Tuning

Train models to follow natural language instructions:

Input:  "Translate to French: The cat sat on the mat"
Output: "Le chat s'est assis sur le tapis"

Input:  "Summarize: {long article}"
Output: "{concise summary}"
      

Dataset Creation

High-quality instruction datasets include:

Alpaca (52k instructions)
ShareGPT (conversations)
Dolly (human-generated)
FLAN (task mixtures)

Training Tips

Use a small learning rate (1e-5 to 5e-5)
Train for 1-3 epochs to avoid overfitting
Balance different task types so no single task dominates
Include examples of the desired behavior
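
One way to balance task types is to cap how many examples each task contributes before training. The sketch below is a minimal illustration, not a prescribed method: the "task" field and the balance_by_task helper are hypothetical names introduced here, and a hard cap is only one of several possible balancing strategies.

```python
import random
from collections import Counter, defaultdict


def balance_by_task(examples, max_per_task, seed=0):
    """Downsample so no task type contributes more than max_per_task examples.

    Assumes each example is a dict with a hypothetical "task" label.
    """
    rng = random.Random(seed)

    # Group examples by their task label.
    by_task = defaultdict(list)
    for example in examples:
        by_task[example["task"]].append(example)

    # Keep a random subset of at most max_per_task examples per task.
    balanced = []
    for items in by_task.values():
        rng.shuffle(items)
        balanced.extend(items[:max_per_task])

    # Shuffle again so tasks are interleaved during training.
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = (
        [{"task": "translation", "instruction": f"Translate sentence {i}"} for i in range(10)]
        + [{"task": "summarization", "instruction": f"Summarize article {i}"} for i in range(3)]
    )
    balanced = balance_by_task(data, max_per_task=4)
    print(Counter(example["task"] for example in balanced))
```

Capping discards data from large tasks. If that is too costly, upsampling the smaller tasks is an alternative worth comparing.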

Practical Examples

Example 1: Creating Instruction Data

Here's how to structure instruction-following data for fine-tuning:

{
  "instruction": "Write a Python function to calculate factorial",
  "input": "",
  "output": "def factorial(n):\n    if n == 0 or n == 1:\n        return 1\n    return n * factorial(n - 1)"
}
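
Before training, each record is usually rendered into a single prompt string. The sketch below shows one way to do that, using templates in the style popularized by the Alpaca project. The template wording and the build_prompt helper are assumptions for illustration; use whatever template your training tool expects.

```python
# Prompt templates in the Alpaca style. The exact wording is an assumption.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record):
    """Render the prompt text the model sees for one instruction record."""
    # Use the longer template only when the record carries extra context.
    if record.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


if __name__ == "__main__":
    record = {
        "instruction": "Write a Python function to calculate factorial",
        "input": "",
        "output": "def factorial(n): ...",
    }
    # The model is trained to continue this prompt with record["output"].
    print(build_prompt(record))
```

The "output" field is left out of the prompt on purpose: it is the target the model learns to generate after "### Response:".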

Example 2: Using Open-Source Tools

Fine-tune on an existing instruction dataset using an open-source trainer such as axolotl:

# Using axolotl for instruction tuning
base_model: meta-llama/Llama-2-7b-hf
datasets:
  - path: yahma/alpaca-cleaned
    type: alpaca
num_epochs: 3
learning_rate: 2e-5

Example 3: Evaluating Instruction Following

Test if your model follows instructions correctly:

Test: "List three benefits of renewable energy"
✓ Good response: Numbered list with clear benefits
✗ Bad response: "Renewable energy is important" (too vague)

Test: "Answer with only YES or NO"
✓ Good response: "YES"
✗ Bad response: "Yes, I think that's correct"
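
Checks like the two above can be partly automated. The following is a rough sketch under simple assumptions: is_yes_or_no and numbered_item_count are hypothetical helper names, and they only test the format of a response, not whether its content is correct.

```python
import re


def is_yes_or_no(response):
    """True only if the response is exactly YES or NO (outer whitespace ignored)."""
    return response.strip() in {"YES", "NO"}


def numbered_item_count(response):
    """Count lines that begin with a list number such as '1.' or '2)'."""
    pattern = r"^[ \t]*\d+[.)][ \t]+\S"
    return len(re.findall(pattern, response, flags=re.MULTILINE))


if __name__ == "__main__":
    good_list = "1. Lower emissions\n2. Lower fuel costs\n3. Energy independence"
    print(numbered_item_count(good_list) == 3)                      # format check passes
    print(numbered_item_count("Renewable energy is important") == 3)  # format check fails
    print(is_yes_or_no("YES"))
    print(is_yes_or_no("Yes, I think that's correct"))
```

Format checks of this kind catch obvious failures cheaply. Judging whether the listed benefits are actually clear and correct still needs a human reviewer or a stronger evaluator.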

Knowledge Check

Question 1

What is the recommended learning rate range for instruction tuning?

Answer: 1e-5 to 5e-5. This is smaller than typical pre-training rates, which helps preserve the model's existing knowledge.

Question 2

Why should instruction tuning typically run for only 1-3 epochs?

Answer: To prevent overfitting. Too many epochs can cause the model to memorize training examples and lose generalization ability.

Question 3

Name two popular open-source instruction datasets.

Answer: Any two of Alpaca (52k instructions), ShareGPT (conversations), Dolly (human-generated), and FLAN (task mixtures).

Question 4

What is the key difference between fine-tuning and instruction tuning?

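Answer: Instruction tuning is itself a form of fine-tuning. Conventional fine-tuning adapts a model to one specific task or domain, while instruction tuning trains on many tasks phrased as natural language instructions so the model learns to follow instructions in general.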

Question 5

Why is it important to balance different task types in instruction datasets?

Answer: Balancing prevents the model from becoming biased toward certain instruction formats and ensures it generalizes across diverse tasks.

Practice Exercises

Exercise 1: Create Instruction Data

Write 3 instruction-following examples in JSON format. Each should include: instruction, input (can be empty), and expected output.

# Example structure:
{
  "instruction": "Your task description here",
  "input": "optional context",
  "output": "expected model response"
}

Tasks to cover: 1) Text summarization, 2) Code generation, 3) Sentiment analysis

Check your answers against the examples in the lesson above.
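
To check your records mechanically as well, a small validator along these lines may help. It is a sketch: validate_record is a hypothetical helper, and the rule that "instruction" and "output" must be non-empty strings is an assumption based on the structure shown above.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_record(text):
    """Return a list of problems in one JSON record; an empty list means it passed."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # "input" may be empty, but the other two fields must carry text.
    for key in ("instruction", "output"):
        if key in record:
            value = record[key]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{key} must be a non-empty string")
    return problems


if __name__ == "__main__":
    sample = '{"instruction": "Classify the sentiment", "input": "I loved it", "output": "positive"}'
    print(validate_record(sample))
    print(validate_record('{"instruction": "Summarize this"}'))
```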

Exercise 2: Design Evaluation Prompts

Create 2 test prompts to evaluate if a model follows instructions precisely:

One that tests format adherence (e.g., "Answer in JSON format")
One that tests constraint following (e.g., "Use exactly 50 words")

Bonus: Write what a "good" vs "bad" response would look like for each.
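
If you want to score responses to your two prompts automatically, the sketch below shows one possible pair of checks. The helper names are hypothetical, and "word" is assumed to mean a whitespace-separated token, which may differ from how you choose to count words.

```python
import json


def is_valid_json(response):
    """Format adherence: True if the whole response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def has_exact_word_count(response, n):
    """Constraint following: True if the response has exactly n whitespace-separated words."""
    return len(response.split()) == n


if __name__ == "__main__":
    print(is_valid_json('{"answer": "Paris"}'))
    print(is_valid_json('Sure! Here is the JSON: {"answer": "Paris"}'))
    print(has_exact_word_count("one two three", 3))
```

The second call illustrates a common failure: the model wraps valid JSON in extra prose, so the response as a whole no longer parses.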