🚧 Lesson 4 of 25 in Level 05

Gradients & Derivatives

Computing gradients. Partial derivatives and the Jacobian.

Derivatives

The derivative measures the instantaneous rate of change of a function with respect to its input:

# f(x) = x^2
# f'(x) = 2x
# At x = 3, the slope is 6
# A small change Δx in x produces a change of about 6·Δx in f(x)
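A quick numerical check can make the slope concrete. The sketch below compares a finite-difference estimate with the analytic derivative 2x at x = 3. The helper name `numerical_derivative` and the step sizes `h` and `dx` are assumptions chosen only for illustration.

```python
def f(x):
    """f(x) = x^2"""
    return x ** 2

def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x); h is an illustrative step size."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 3.0
slope = numerical_derivative(f, x0)   # close to the analytic value 2 * x0 = 6

# A small step dx changes f by roughly slope * dx
dx = 0.01
actual_change = f(x0 + dx) - f(x0)    # exactly 6*dx + dx**2 for this f
linear_estimate = slope * dx
print(slope, actual_change, linear_estimate)
```

The gap between `actual_change` and `linear_estimate` is the dx² term, which shrinks faster than dx itself as the step gets smaller.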

Partial Derivatives

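For functions of multiple variables, a partial derivative measures the rate of change along one variable while the others are held fixed:

# f(x, y) = x^2 + y^2
# ∂f/∂x = 2x (treat y as constant)
# ∂f/∂y = 2y (treat x as constant)
# Gradient (vector of partial derivatives):
# ∇f = [∂f/∂x, ∂f/∂y] = [2x, 2y]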
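The gradient formulas above can be checked numerically by perturbing one variable at a time. This is a minimal sketch, not a production implementation. The helper names `analytic_gradient` and `numerical_gradient` and the test point (3, 4) are assumptions made for the example.

```python
import numpy as np

def f(v):
    """f(x, y) = x^2 + y^2, with v = [x, y]."""
    x, y = v
    return x ** 2 + y ** 2

def analytic_gradient(v):
    """Gradient from the formulas above: [2x, 2y]."""
    return 2 * v

def numerical_gradient(func, v, h=1e-6):
    """Central-difference estimate of each partial derivative (illustrative helper)."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        step = np.zeros_like(v, dtype=float)
        step[i] = h                      # perturb one variable, hold the others fixed
        grad[i] = (func(v + step) - func(v - step)) / (2 * h)
    return grad

point = np.array([3.0, 4.0])
grad_exact = analytic_gradient(point)        # [6., 8.]
grad_approx = numerical_gradient(f, point)
print(grad_exact, grad_approx)
```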

The Jacobian

The Jacobian is the matrix of all first-order partial derivatives of a vector-valued function:

# For f: R^n → R^m, the Jacobian is an m×n matrix
# J[i,j] = ∂f_i/∂x_j
# Used in backpropagation to chain gradients
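To illustrate how Jacobians chain gradients, the sketch below composes two small functions and multiplies their Jacobians. The functions `f` and `g`, and their hand-derived Jacobians `jac_f` and `jac_g`, are hypothetical examples that do not come from the lesson.

```python
import numpy as np

# Hypothetical example functions, chosen only for illustration:
# f: R^2 -> R^3 and g: R^3 -> R^2
def f(x):
    return np.array([x[0] * x[1], x[0] + x[1], x[0] ** 2])

def jac_f(x):
    """Analytic Jacobian of f, shape (3, 2): row i holds the partials of f_i."""
    return np.array([
        [x[1], x[0]],
        [1.0, 1.0],
        [2 * x[0], 0.0],
    ])

def g(u):
    return np.array([u[0] + u[1] * u[2], u[2] ** 2])

def jac_g(u):
    """Analytic Jacobian of g, shape (2, 3)."""
    return np.array([
        [1.0, u[2], u[1]],
        [0.0, 0.0, 2 * u[2]],
    ])

x = np.array([1.0, 2.0])
# Chain rule: the Jacobian of g(f(x)) is J_g evaluated at f(x), times J_f at x
J_composed = jac_g(f(x)) @ jac_f(x)   # shapes (2, 3) @ (3, 2) -> (2, 2)
print(J_composed)
```

The shapes line up because the output dimension of `f` matches the input dimension of `g`. Backpropagation applies the same product layer by layer.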

Practice Exercises

Exercise 1: Compute Gradients Manually

For the function f(x, y, z) = x²y + yz³, compute:

  • ∂f/∂x
  • ∂f/∂y
  • ∂f/∂z

Answer: ∂f/∂x = 2xy, ∂f/∂y = x² + z³, ∂f/∂z = 3yz²

Exercise 2: Implement Gradient Descent Step

Complete the Python function to perform one gradient descent update:

def gradient_descent_step(x, y, learning_rate=0.1):
    """
    For f(x,y) = x² + y², perform one gradient descent step.
    Returns: new_x, new_y
    """
    # Compute gradients
    grad_x = ____  # ∂f/∂x
    grad_y = ____  # ∂f/∂y

    # Update parameters
    new_x = x - learning_rate * grad_x
    new_y = y - learning_rate * grad_y
    return new_x, new_y

# Test: starting from (3, 4), after one step with lr=0.1:
# Expected: new_x = 2.4, new_y = 3.2

Solution: grad_x = 2*x, grad_y = 2*y. Starting at (3,4): new_x = 3 - 0.1*6 = 2.4, new_y = 4 - 0.1*8 = 3.2
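Repeating the step shows how the iterates approach the minimum at (0, 0). The sketch below fills in the solution gradients and runs 20 updates. The step count and the `history` list are illustrative choices that the exercise does not specify.

```python
def gradient_descent_step(x, y, learning_rate=0.1):
    """One update for f(x, y) = x^2 + y^2, using the gradients from the solution."""
    grad_x = 2 * x
    grad_y = 2 * y
    return x - learning_rate * grad_x, y - learning_rate * grad_y

x, y = 3.0, 4.0
history = [(x, y)]
for _ in range(20):                  # 20 steps is an arbitrary illustrative choice
    x, y = gradient_descent_step(x, y)
    history.append((x, y))

# With lr = 0.1 each step multiplies both coordinates by (1 - 0.2) = 0.8
print(history[1])    # first step, approximately (2.4, 3.2)
print(history[-1])   # after 20 steps, close to the minimum at (0, 0)
```

For this particular function, any learning rate between 0 and 1 shrinks the coordinates at every step. Larger values overshoot and diverge.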

Knowledge Check Quiz

Question 1: Gradient Direction

What does the gradient vector ∇f point toward?

  • A) The direction of steepest descent
  • B) The direction of steepest ascent
  • C) A local minimum
  • D) The origin

Answer: B — The gradient points in the direction of steepest ascent (maximum rate of increase).
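One way to see this numerically is to try many unit directions and compare how much f increases along each. The sketch below reuses f(x, y) = x² + y² at the point (3, 4). The 1-degree grid of directions and the step length `eps` are assumptions made for the demonstration.

```python
import numpy as np

def f(v):
    return v[0] ** 2 + v[1] ** 2

point = np.array([3.0, 4.0])
grad = 2 * point                               # analytic gradient [6, 8]
grad_dir = grad / np.linalg.norm(grad)         # unit vector [0.6, 0.8]

# Try many unit directions and record how much f increases for a tiny step
angles = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
eps = 1e-4                                     # illustrative step length
increases = np.array([f(point + eps * d) - f(point) for d in directions])

best_dir = directions[np.argmax(increases)]    # direction of largest increase
worst_dir = directions[np.argmin(increases)]   # direction of largest decrease
print(best_dir, grad_dir, worst_dir)
```

Up to the resolution of the grid, `best_dir` lines up with the normalized gradient and `worst_dir` with its negative.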

Question 2: Partial Derivative

For f(x,y) = x³y², what is ∂f/∂x?

  • A) 3x²y²
  • B) x³ · 2y
  • C) 3x² + 2y
  • D) 6xy

Answer: A — Treat y as constant: ∂f/∂x = 3x² · y² = 3x²y²

Question 3: Jacobian Dimensions

If f: ℝ⁵ → ℝ³, what are the dimensions of the Jacobian matrix?

  • A) 5×3
  • B) 3×5
  • C) 5×5
  • D) 3×3

Answer: B — The Jacobian is m×n where m=output dim (3) and n=input dim (5), so 3×5.
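The shape can be confirmed with a small experiment. The sketch below assumes a hypothetical linear map f(x) = A x from ℝ⁵ to ℝ³ with a random matrix `A`, and estimates its Jacobian column by column with central differences.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen only for reproducibility
A = rng.standard_normal((3, 5))       # hypothetical linear map from R^5 to R^3

def f(x):
    return A @ x

x = rng.standard_normal(5)
h = 1e-6
# Build the Jacobian one column at a time: column j holds the partials w.r.t. x_j
J = np.stack([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(5)], axis=1)
print(J.shape)   # (3, 5): one row per output, one column per input
```

For a linear map the Jacobian is the matrix `A` itself, which gives a convenient correctness check.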

Question 4: Gradient Descent Update

Why do we subtract the gradient (not add it) in gradient descent?

  • A) To move toward lower loss
  • B) To increase the learning rate
  • C) To compute the Jacobian
  • D) It's just a convention

Answer: A — We subtract because the gradient points toward steepest ascent; subtracting moves us toward steepest descent (lower loss).

Additional Coding Exercises

Exercise 3: Implement the Jacobian

Write a function to compute the Jacobian matrix for a vector-valued function:

import numpy as np

def jacobian(f, x, h=1e-5):
    """
    Compute the Jacobian of f at point x using forward finite differences.
    f: function that takes a vector x and returns a vector
    x: input vector (numpy array of floats)
    h: step size for the finite difference
    Returns: Jacobian matrix J where J[i,j] = ∂f_i/∂x_j
    """
    x = np.asarray(x, dtype=float)  # ensure floats so that adding h is not truncated
    n = len(x)
    fx = f(x)
    m = len(fx)
    J = np.zeros((m, n))
    for j in range(n):
        x_plus = x.copy()
        x_plus[j] += h  # perturb only the j-th input
        # Column j holds the partial derivatives of every output w.r.t. x_j
        J[:, j] = (f(x_plus) - fx) / h
    return J

# Test: f(x,y) = [x²+y, xy]
def test_func(v):
    x, y = v[0], v[1]
    return np.array([x**2 + y, x*y])

# At point (1, 2), expected Jacobian (up to finite-difference error):
# J = [[2x, 1], [y, x]] = [[2, 1], [2, 1]]
print(jacobian(test_func, np.array([1.0, 2.0])))

Exercise 4: Chain Rule Implementation

Implement the chain rule for backpropagation through a simple neural network layer:

import numpy as np

def linear_layer_backward(x, w, b, grad_output):
    """
    Backward pass through linear layer: y = x @ w + b

    Args:
        x: input (batch_size, in_features)
        w: weights (in_features, out_features)
        b: bias (out_features,)
        grad_output: gradient from next layer (batch_size, out_features)

    Returns:
        grad_x, grad_w, grad_b: gradients w.r.t. the input, weights, and bias
    """
    # Chain rule for y = x @ w + b:
    # grad_w = x.T @ grad_output
    # grad_b = sum(grad_output, axis=0)
    # grad_x = grad_output @ w.T
    grad_w = x.T @ grad_output
    grad_b = np.sum(grad_output, axis=0)  # bias is shared across the batch
    grad_x = grad_output @ w.T
    return grad_x, grad_w, grad_b

# Verify shapes
x = np.random.randn(4, 3)  # batch=4, in=3
w = np.random.randn(3, 2)  # in=3, out=2
b = np.random.randn(2)     # out=2
grad_out = np.random.randn(4, 2)

gx, gw, gb = linear_layer_backward(x, w, b, grad_out)
print(f"grad_x shape: {gx.shape}, expected: (4, 3)")
print(f"grad_w shape: {gw.shape}, expected: (3, 2)")
print(f"grad_b shape: {gb.shape}, expected: (2,)")
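Matching shapes do not prove the values are right. A gradient check against finite differences is a common sanity test. The sketch below assumes a surrogate scalar loss, sum((x @ w + b) * grad_out), whose gradient with respect to the layer output equals `grad_out`. That loss and the helper name `loss` are illustrative assumptions that the exercise does not define.

```python
import numpy as np

def linear_layer_backward(x, w, b, grad_output):
    """Same formulas as the exercise solution."""
    grad_w = x.T @ grad_output
    grad_b = np.sum(grad_output, axis=0)
    grad_x = grad_output @ w.T
    return grad_x, grad_w, grad_b

rng = np.random.default_rng(0)         # fixed seed for reproducibility
x = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 2))
b = rng.standard_normal(2)
grad_out = rng.standard_normal((4, 2))

def loss(w_candidate):
    """Surrogate scalar loss whose gradient w.r.t. y is grad_out (assumption for testing)."""
    return np.sum((x @ w_candidate + b) * grad_out)

_, grad_w, _ = linear_layer_backward(x, w, b, grad_out)

# Finite-difference estimate of d(loss)/dw, one weight at a time
h = 1e-6
grad_w_numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        step = np.zeros_like(w)
        step[i, j] = h
        grad_w_numeric[i, j] = (loss(w + step) - loss(w - step)) / (2 * h)

max_error = np.max(np.abs(grad_w - grad_w_numeric))
print(max_error)
```

The same loop, applied to `x` or `b` instead of `w`, would check `grad_x` and `grad_b`.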