Interactive Course

Understand LLMs From First Principles

Learn how Large Language Models work, how to build them, and the mathematics behind them. From tokens to transformers, from gradients to GPT. No prior ML experience required.

Start Learning

5 Levels

150 Lessons

∞ Math Explained

0 Cost (Free)

Course Curriculum

A structured journey from "What is a token?" to implementing your own transformer. Each level builds on the previous, with interactive examples and mathematical rigor.

Foundations

What are LLMs? How do they represent text? Learn about tokens, embeddings, vocabularies, sampling strategies, and the basic building blocks of language models.

Tokens, Embeddings, Vocabulary, Next Token Prediction, Sampling

embedding("king") - embedding("man") + embedding("woman") ≈ embedding("queen")

25 Lessons Beginner

Available Now
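The embedding analogy shown for this level can be tried with a small sketch. The vectors below are hand-made toy values, not learned embeddings, and the axis meanings (royalty, maleness, femaleness) and the helper names `cosine_similarity` and `analogy` are assumptions made purely for illustration. Real embeddings have hundreds of dimensions and only approximately satisfy the relation.

```python
import math

# Hand-made toy vectors (NOT learned embeddings). The three axes are a
# hypothetical choice for illustration: [royalty, maleness, femaleness].
toy_embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.7, 0.8, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, embeddings):
    """Return the word closest to embedding(a) - embedding(b) + embedding(c)."""
    target = [x - y + z
              for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three query words, as is common when evaluating analogies.
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

nearest = analogy("king", "man", "woman", toy_embeddings)
print(nearest)  # prints "queen" for these toy vectors
```

The result depends entirely on the toy values chosen here; the point is only to show the arithmetic and the nearest-neighbor lookup behind the "≈" sign.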

Neural Networks

From the perceptron to deep networks. Understand how neural networks learn patterns, how backpropagation and optimization work, and which architecture choices make deep learning effective.

Perceptron, Activation Functions, Backpropagation, CNNs, RNNs

y = σ(Wx + b) where σ is the activation function

30 Lessons Beginner-Intermediate

Available Now
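The layer formula y = σ(Wx + b) from this level can be sketched in plain Python. This is a minimal illustration that assumes σ is the logistic sigmoid (one of several possible activation functions); the function name `dense_layer` and the example weights are made up for the sketch.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as the activation function σ."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(W, x, b):
    """Compute y = sigmoid(Wx + b), where W is a list of weight rows."""
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]

# Illustrative values only: both pre-activations Wx + b come out to 0.
W = [[0.5, -0.5],
     [1.0,  1.0]]
x = [1.0, 1.0]
b = [0.0, -2.0]

y = dense_layer(W, x, b)
print(y)  # [0.5, 0.5], since sigmoid(0) = 0.5
```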

The Transformer

The architecture that changed everything. Master attention mechanisms, multi-head attention, positional encodings, and the full transformer block in detail.

Attention, Multi-Head, Positional Encoding, KV-Cache, GPT Architecture

Attention(Q,K,V) = softmax(QK^T/√d_k)V

35 Lessons Intermediate

Available Now
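The attention formula for this level, softmax(QK^T/√d_k)V, can be written out without any libraries. The following is a minimal sketch of single-head scaled dot-product attention with no masking and no batching; the helper names (`matmul`, `transpose`, `attention`) are assumptions for this example, and a real implementation would use a tensor library.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Matrix product of A (n x k) and B (k x m), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])                      # key dimension
    scores = matmul(Q, transpose(K))     # one row of scores per query
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    return matmul(weights, V), weights

# Two queries, two keys, two values (illustrative numbers only).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

output, weights = attention(Q, K, V)
print(weights)
print(output)
```

Each query puts more weight on the key it matches, so the first output row leans toward the first value row and the second toward the second.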

Training & Optimization

How LLMs are actually trained. Pre-training, fine-tuning, RLHF, distributed training, quantization, and the optimization algorithms that make it all work.

Pre-training, Fine-tuning, RLHF, Distributed Training, LoRA

θ_new = θ_old - α∇J(θ_old) where α is the learning rate

35 Lessons Advanced

Available Now
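The gradient descent update shown for this level can be illustrated on a one-parameter problem. The sketch below assumes a toy objective J(θ) = (θ - 3)², chosen only because its minimum (θ = 3) is known in advance; actual LLM training applies the same rule to billions of parameters, usually through optimizers such as Adam.

```python
def J(theta):
    """Toy objective with its minimum at theta = 3 (an assumption for the demo)."""
    return (theta - 3.0) ** 2

def grad_J(theta):
    """Derivative of J with respect to theta."""
    return 2.0 * (theta - 3.0)

def gradient_descent(theta, alpha, steps):
    """Repeatedly apply theta_new = theta_old - alpha * grad_J(theta_old)."""
    for _ in range(steps):
        theta = theta - alpha * grad_J(theta)
    return theta

theta_final = gradient_descent(theta=0.0, alpha=0.1, steps=100)
print(theta_final)  # very close to 3.0
```

With α = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step. For this particular objective, a learning rate above 1.0 would make the iterates diverge, which is one reason the choice of α matters.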

The Mathematics

Deep dive into the math. Linear algebra, calculus, probability, information theory, optimization theory, and statistical learning — all explained visually and rigorously.

Linear Algebra, Calculus, Probability, Information Theory, Optimization

∂L/∂W = ∂L/∂y · ∂y/∂z · ∂z/∂W (Chain Rule)

25 Lessons Advanced

Available Now
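The chain rule factorization shown for this level can be checked numerically. The sketch below assumes a single scalar neuron with z = Wx + b, y = sigmoid(z), and a squared-error loss L = (y - target)²; these choices, and all function names, are assumptions for illustration. It compares the three-factor product against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(W, x, b, target):
    """L = (y - target)^2 with y = sigmoid(W*x + b), all scalars."""
    y = sigmoid(W * x + b)
    return (y - target) ** 2

def grad_W_chain_rule(W, x, b, target):
    """dL/dW as the product dL/dy * dy/dz * dz/dW."""
    z = W * x + b
    y = sigmoid(z)
    dL_dy = 2.0 * (y - target)   # derivative of the squared error
    dy_dz = y * (1.0 - y)        # derivative of the sigmoid
    dz_dW = x                    # derivative of W*x + b with respect to W
    return dL_dy * dy_dz * dz_dW

def grad_W_numeric(W, x, b, target, h=1e-6):
    """Central finite-difference estimate of dL/dW."""
    return (loss(W + h, x, b, target) - loss(W - h, x, b, target)) / (2.0 * h)

# Illustrative values only.
W, x, b, target = 0.5, 2.0, -0.3, 1.0
analytic = grad_W_chain_rule(W, x, b, target)
numeric = grad_W_numeric(W, x, b, target)
print(analytic, numeric)  # the two values should agree closely
```

Backpropagation applies this same factorization layer by layer, reusing the intermediate factors instead of recomputing them.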

Why This Course?

Most LLM explanations either oversimplify or drown you in equations. We do neither: every concept is explained both intuitively and rigorously.

First Principles

Start from absolute zero. No assumed knowledge of ML, AI, or even programming. We build everything up from scratch.

The Actual Math

See the real equations behind LLMs — gradients, attention matrices, loss functions — explained step by step with visualizations.

Interactive Code

Run actual code in your browser. Build a tiny transformer, train it, and see it generate text. Learn by doing.

Two Modes

Switch between "Casual" (intuitive explanations) and "Formal" (technical depth) anytime. Learn at your level.

Visual Intuition

High-dimensional concepts rendered as interactive visualizations. See attention heads, embedding spaces, and gradients.

Completely Free

No paywalls, no subscriptions, no ads. Open source and free forever. Knowledge should be accessible to everyone.

Ready to Understand LLMs?

Join thousands of learners demystifying AI. Start with Level 1 and work your way up to understanding every component of modern language models.

Start Level 01: Foundations →