The Chain Rule
For composed functions:
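Written out, with u = g(x) as a label for the inner function (the symbol u is a notational choice made here), the rule reads:

```latex
\[
y = f(g(x)), \quad u = g(x)
\qquad\Longrightarrow\qquad
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} = f'(g(x))\, g'(x)
\]
```

For a longer chain of nested functions the same pattern repeats, contributing one factor per function.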
Backpropagation
Applying the chain rule to neural networks:
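One common way to write this down is sketched here for a fully connected network. The notation is an assumption introduced for illustration: layers are indexed l = 1, ..., K, a^(l) and z^(l) are the activations and pre-activations of layer l, σ is applied elementwise, the loss is written as a calligraphic L, and δ^(l) denotes the gradient of the loss with respect to z^(l).

```latex
\begin{align*}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\!\left(z^{(l)}\right), \qquad a^{(0)} = x \\
\delta^{(K)} &= \nabla_{a^{(K)}} \mathcal{L} \odot \sigma'\!\left(z^{(K)}\right) \\
\delta^{(l)} &= \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \odot \sigma'\!\left(z^{(l)}\right), \qquad l = K-1, \dots, 1 \\
\frac{\partial \mathcal{L}}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}
\end{align*}
```

Each δ^(l) reuses δ^(l+1), so the gradients are obtained in a single sweep from the output layer back to the input.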
Example: Simple Network
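A minimal sketch of such a network, assuming a single sigmoid neuron y = σ(wx + b) with a squared-error loss. The function name `forward_backward`, the loss, and the input values are illustrative choices, not taken from the original example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, t, w, b):
    """One forward and backward pass for y = sigmoid(w*x + b), loss = 0.5*(y - t)**2."""
    # Forward pass: keep z and y, because the backward pass reuses them.
    z = w * x + b
    y = sigmoid(z)
    loss = 0.5 * (y - t) ** 2

    # Backward pass: multiply local derivatives from the loss back to the parameters.
    dloss_dy = y - t
    dy_dz = y * (1.0 - y)        # derivative of the sigmoid, expressed through y
    dloss_dz = dloss_dy * dy_dz
    dloss_dw = dloss_dz * x      # dz/dw = x
    dloss_db = dloss_dz * 1.0    # dz/db = 1
    return loss, dloss_dw, dloss_db

# Hypothetical input, target, and parameters, chosen only for illustration.
loss, grad_w, grad_b = forward_backward(x=1.5, t=1.0, w=0.8, b=-0.2)
print("loss:", loss)
print("dloss/dw:", grad_w)
print("dloss/db:", grad_b)
```

Here the prediction is below the target, so both gradients come out negative: increasing w or b would reduce the loss.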
Quick Quiz
Q1: If y = f(g(x)), what is dy/dx according to the chain rule?
A) dy/dx = dy/dg + dg/dx
B) dy/dx = dy/dg × dg/dx ✓
C) dy/dx = dy/dg - dg/dx
D) dy/dx = (dy/dg) / (dg/dx)
Q2: For y = (3x + 2)³, what is dy/dx?
Let u = 3x + 2, then y = u³
dy/dx = 3u² × 3 = 9(3x + 2)² ✓
Q3: In backpropagation, why do we store activations during the forward pass?
A) To reduce memory usage
B) To compute gradients during the backward pass ✓
C) To initialize weights
D) To speed up inference
Q4: For y = σ(Wx + b), what is ∂y/∂b?
∂y/∂b = σ'(Wx + b) × 1 = σ'(Wx + b) ✓
Q5: What does the gradient tell us in neural network training?
A) The final prediction accuracy
B) The direction and magnitude of weight updates ✓
C) The number of layers needed
D) The input data distribution
Coding Exercises
Exercise 1: Manual Backpropagation
Implement a simple 2-layer neural network and compute gradients manually:
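One possible solution sketch, using only the standard library. It assumes a sigmoid hidden layer, a linear scalar output, and a squared-error loss; the architecture, the parameter values, and the helper names `forward`, `backward`, and `loss_fn` are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass; returns the prediction and the stored hidden activations."""
    # Hidden layer: h[j] = sigmoid(sum_i W1[j][i] * x[i] + b1[j])
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(W1, b1)]
    # Output layer (linear): y = sum_j W2[j] * h[j] + b2
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return y, h

def backward(x, t, y, h, W2):
    """Gradients of loss = 0.5 * (y - t)**2 with respect to every parameter."""
    dy = y - t                                    # dloss/dy
    dW2 = [dy * hj for hj in h]                   # dy/dW2[j] = h[j]
    db2 = dy                                      # dy/db2 = 1
    dh = [dy * w for w in W2]                     # dy/dh[j] = W2[j]
    # Through the sigmoid: dh[j]/dz1[j] = h[j] * (1 - h[j])
    dz1 = [dhj * hj * (1.0 - hj) for dhj, hj in zip(dh, h)]
    dW1 = [[dz * xi for xi in x] for dz in dz1]   # dz1[j]/dW1[j][i] = x[i]
    db1 = dz1[:]                                  # dz1[j]/db1[j] = 1
    return dW1, db1, dW2, db2

def loss_fn(x, t, W1, b1, W2, b2):
    y, _ = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - t) ** 2

# Hypothetical data and parameters: 2 inputs, 3 hidden units, 1 output.
x, t = [0.5, -1.0], 0.3
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [0.3, -0.6, 0.2]
b2 = 0.05

y, h = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, t, y, h, W2)
print("dW1:", dW1)
print("db1:", db1)
print("dW2:", dW2)
print("db2:", db2)
```

The hidden activations `h` returned by the forward pass are exactly what the backward pass needs, which is why they are kept rather than recomputed.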
Exercise 2: Chain Rule Verification
Verify the chain rule numerically using finite differences:
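A possible sketch of such a check, assuming the scalar function y = σ(wx + b) and central differences with step h = 1e-5. The test point and the helper names `y_fn`, `analytic_grads`, and `central_diff` are hypothetical; the size of the error you observe depends on the step size and the function being differentiated.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def y_fn(x, w, b):
    return sigmoid(w * x + b)

def analytic_grads(x, w, b):
    """Chain rule: outer derivative sigma'(z) times the inner derivative of z = w*x + b."""
    s = sigmoid(w * x + b)
    ds = s * (1.0 - s)                # sigma'(z)
    return ds * w, ds * x, ds * 1.0   # dy/dx, dy/dw, dy/db

def central_diff(f, v, h=1e-5):
    """Central finite-difference estimate of f'(v)."""
    return (f(v + h) - f(v - h)) / (2.0 * h)

# Hypothetical test point, chosen only for illustration.
x, w, b = 0.7, -1.3, 0.4
dy_dx, dy_dw, dy_db = analytic_grads(x, w, b)

errors = {
    "dy/dx": abs(dy_dx - central_diff(lambda v: y_fn(v, w, b), x)),
    "dy/dw": abs(dy_dw - central_diff(lambda v: y_fn(x, v, b), w)),
    "dy/db": abs(dy_db - central_diff(lambda v: y_fn(x, w, v), b)),
}
for name, err in errors.items():
    print(f"{name}: error = {err:.2e}")
```

Central differences are used because their truncation error shrinks with h squared, whereas a one-sided difference only improves linearly in h.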
Expected Output: Errors should be < 1e-10, confirming that the analytic chain-rule gradients match the finite-difference estimates.