Derivatives
The derivative measures the instantaneous rate of change of a function f at a point x: f′(x) = lim (h → 0) [f(x + h) − f(x)] / h.
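The rate of change can be approximated numerically by evaluating the function at two nearby points. The sketch below uses a central difference; the names `numerical_derivative` and `square` and the step size `h` are illustrative assumptions, not part of the original material.

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central difference (h is an assumed step size)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x * x


# The exact derivative of x**2 is 2x, so the slope at x = 3 should be close to 6.
approx = numerical_derivative(square, 3.0)
print(approx)
```

A smaller `h` reduces truncation error but amplifies floating-point round-off, so values around 1e-5 to 1e-7 are a common compromise for double precision.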
Partial Derivatives
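For a function of several variables, the partial derivative ∂f/∂x_i is the rate of change with respect to x_i while all other variables are held constant.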
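Written as a limit, a partial derivative perturbs a single coordinate and leaves the rest fixed. The gradient ∇f, which the quiz below refers to, collects all of the partials into one vector. The notation x_1, ..., x_n for the inputs is an assumption made here for illustration.

```latex
\frac{\partial f}{\partial x_i}(x_1, \dots, x_n)
  = \lim_{h \to 0}
    \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_n)}{h},
\qquad
\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)
```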
The Jacobian
For a vector-valued function f: ℝⁿ → ℝᵐ, the Jacobian is the m×n matrix of all first-order partial derivatives; entry (i, j) is ∂f_i/∂x_j.
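Laid out explicitly, each row of the Jacobian holds the partial derivatives of one output component and each column corresponds to one input variable. The symbols f_1, ..., f_m for the output components and x_1, ..., x_n for the inputs are assumed notation.

```latex
J_f(x) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{m \times n}
```

When m = 1 the Jacobian has a single row, which is the gradient written as a row vector.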
Practice Exercises
For the function f(x, y, z) = x²y + yz³, compute:
- ∂f/∂x
- ∂f/∂y
- ∂f/∂z
Answer: ∂f/∂x = 2xy, ∂f/∂y = x² + z³, ∂f/∂z = 3yz²
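One way to check these answers is to compare them against finite-difference estimates. The sketch below does this at an arbitrary test point; the helper `partial`, the point (1, 2, 3), and the step size are assumptions chosen only for illustration.

```python
def f(x, y, z):
    return x**2 * y + y * z**3


def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative along one coordinate."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (func(*forward) - func(*backward)) / (2 * h)


point = (1.0, 2.0, 3.0)  # arbitrary test point (assumed for illustration)
x, y, z = point

numeric = [partial(f, point, i) for i in range(3)]
analytic = [2 * x * y, x**2 + z**3, 3 * y * z**2]  # the answers given above

for name, n, a in zip(("df/dx", "df/dy", "df/dz"), numeric, analytic):
    print(f"{name}: numeric={n:.6f}, analytic={a:.6f}")
```

If the numeric and analytic columns agree to several decimal places, the hand-derived partials are very likely correct.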
Complete the Python function to perform one gradient descent update:
Solution: grad_x = 2*x, grad_y = 2*y (the gradient of f(x, y) = x² + y²). Starting at (3, 4) with learning rate 0.1: new_x = 3 - 0.1*6 = 2.4, new_y = 4 - 0.1*8 = 3.2
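A completed version of the update might look like the sketch below. The function name `gradient_descent_step` and its signature are hypothetical, and the objective f(x, y) = x² + y² is inferred from the gradients in the solution rather than stated in the exercise.

```python
def gradient_descent_step(x, y, learning_rate=0.1):
    """One gradient descent update for f(x, y) = x**2 + y**2 (assumed objective)."""
    grad_x = 2 * x  # df/dx
    grad_y = 2 * y  # df/dy
    new_x = x - learning_rate * grad_x
    new_y = y - learning_rate * grad_y
    return new_x, new_y


new_x, new_y = gradient_descent_step(3.0, 4.0)
print(new_x, new_y)  # approximately 2.4 and 3.2, up to floating-point rounding
```

Calling the function repeatedly on its own output moves the point steadily toward the minimum at the origin.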
Knowledge Check Quiz
What does the gradient vector ∇f point to?
- A) The direction of steepest descent
- B) The direction of steepest ascent
- C) A local minimum
- D) The origin
Answer: B — The gradient points in the direction of steepest ascent (maximum rate of increase).
For f(x,y) = x³y², what is ∂f/∂x?
- A) 3x²y²
- B) x³ · 2y
- C) 3x² + 2y
- D) 6xy
Answer: A — Treat y as constant: ∂f/∂x = 3x² · y² = 3x²y²
If f: ℝ⁵ → ℝ³, what are the dimensions of the Jacobian matrix?
- A) 5×3
- B) 3×5
- C) 5×5
- D) 3×3
Answer: B — The Jacobian is m×n where m=output dim (3) and n=input dim (5), so 3×5.
Why do we subtract the gradient (not add it) in gradient descent?
- A) To move toward lower loss
- B) To increase the learning rate
- C) To compute the Jacobian
- D) It's just a convention
Answer: A — We subtract because the gradient points toward steepest ascent; subtracting moves us toward steepest descent (lower loss).
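The effect of the sign can be seen directly by taking one step in each direction and comparing the loss. This is a minimal sketch that assumes the loss x² + y², the starting point (3, 4), and the learning rate 0.1; none of these are specified by the quiz question.

```python
def loss(x, y):
    return x**2 + y**2


def gradient(x, y):
    return 2 * x, 2 * y


x, y, learning_rate = 3.0, 4.0, 0.1
gx, gy = gradient(x, y)

# Subtracting the gradient steps downhill; adding it steps uphill.
descent_loss = loss(x - learning_rate * gx, y - learning_rate * gy)
ascent_loss = loss(x + learning_rate * gx, y + learning_rate * gy)

print(loss(x, y), descent_loss, ascent_loss)
```

The starting loss is 25; the step that subtracts the gradient lowers it to about 16, while the step that adds the gradient raises it to about 36.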
Additional Coding Exercises
Write a function to compute the Jacobian matrix for a vector-valued function:
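One possible approach, sketched here in pure Python, approximates each column of the Jacobian with central differences. The function name `jacobian`, the list-based interface, and the test function `example` are hypothetical choices; an automatic-differentiation library would be an alternative that this sketch does not cover.

```python
def jacobian(func, x, h=1e-6):
    """Approximate the m x n Jacobian of func at x with central differences.

    func maps a list of n floats to a list of m floats.
    Returns a list of m rows with n entries each: J[i][j] ~ d f_i / d x_j.
    """
    n = len(x)
    m = len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def example(v):
    # Hypothetical test function g: R^2 -> R^3, chosen only for illustration.
    x, y = v
    return [x * y, x + y, x**2]


J = jacobian(example, [2.0, 3.0])
for row in J:
    print([round(entry, 4) for entry in row])
```

For this test function the analytic Jacobian is [[y, x], [1, 1], [2x, 0]], so at (2, 3) the result should be close to [[3, 2], [1, 1], [4, 0]], a 3×2 matrix with one row per output and one column per input.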
Implement the chain rule for backpropagation through a simple neural network layer:
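A minimal sketch of one way to approach this is shown below. It assumes a single dense layer z = Wx + b with a sigmoid activation and a squared-error loss; the exercise does not specify the layer, so these choices, along with every function name and numeric value, are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, x):
    """Dense layer: z = W x + b, a = sigmoid(z)."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    a = [sigmoid(z_i) for z_i in z]
    return z, a


def loss_fn(a, target):
    """Squared-error loss: 0.5 * sum((a - target)**2)."""
    return 0.5 * sum((a_i - t_i) ** 2 for a_i, t_i in zip(a, target))


def backward(W, x, a, target):
    """Chain rule: dL/dz = dL/da * da/dz, then propagate to W, b, and x."""
    dL_da = [a_i - t_i for a_i, t_i in zip(a, target)]
    # sigmoid'(z) = a * (1 - a)
    dL_dz = [g * a_i * (1.0 - a_i) for g, a_i in zip(dL_da, a)]
    # dz_i/dW_ij = x_j
    dL_dW = [[dz_i * x_j for x_j in x] for dz_i in dL_dz]
    # dz_i/db_i = 1
    dL_db = list(dL_dz)
    # dz_i/dx_j = W_ij, summed over all outputs i
    dL_dx = [sum(W[i][j] * dL_dz[i] for i in range(len(W)))
             for j in range(len(x))]
    return dL_dW, dL_db, dL_dx


# Hypothetical values chosen only for illustration.
W = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b = [0.05, -0.05]
x = [1.0, 2.0, 3.0]
target = [1.0, 0.0]

z, a = forward(W, b, x)
dL_dW, dL_db, dL_dx = backward(W, x, a, target)
print("loss:", loss_fn(a, target))
print("dL/dW:", dL_dW)
print("dL/db:", dL_db)
print("dL/dx:", dL_dx)
```

The gradient with respect to x is what would be passed to the previous layer in a deeper network, while the gradients with respect to W and b are the ones used to update this layer's parameters.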