🚧 Lesson 2 of 25 in Level 05

Matrix Operations

Multiplication, transpose, inverse. Geometric interpretation.

Matrix Multiplication

The fundamental operation in neural networks:

# If A is m×n and B is n×p, then AB is m×p
# (A @ B)[i,j] = sum over k of A[i,k] * B[k,j]

A = [[1, 2],
     [3, 4],
     [5, 6]]

B = [[5, 6],
     [7, 8]]

AB = [[19, 22],   # 1*5+2*7, 1*6+2*8
      [43, 50],   # 3*5+4*7, 3*6+4*8
      [67, 78]]   # 5*5+6*7, 5*6+6*8
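The worked product above can be reproduced with NumPy as a quick check. This is a minimal sketch that assumes NumPy is installed; the name `entry_00` is only illustrative.

```python
import numpy as np

# Same matrices as the worked example: A is 3×2, B is 2×2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[5, 6],
              [7, 8]])

AB = A @ B  # rows come from A, columns come from B

# Entry (0, 0) is the dot product of row 0 of A with column 0 of B
entry_00 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))

print(AB.shape)   # (3, 2)
print(AB)
print(entry_00)   # 19
```

The inner dimension (2 here) is the one that gets summed over, which is why it has to match between A and B.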

Transpose

# Flip rows and columns
A = [[1, 2, 3],
     [4, 5, 6]]

A^T = [[1, 4],
       [2, 5],
       [3, 6]]
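In NumPy the transpose is available as the `.T` attribute. The sketch below assumes NumPy is installed and uses the illustrative name `A_T`.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

A_T = A.T                    # shape (3, 2); A_T[j, i] equals A[i, j]

print(A_T.shape)             # (3, 2)
print(A_T)

# Transposing twice returns the original matrix
assert np.array_equal(A_T.T, A)
```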

Identity and Inverse

# Identity matrix (like 1 for matrices)
I = [[1, 0],
     [0, 1]]

# Inverse: A @ A^-1 = I
# Used in solving linear systems
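Both properties can be checked numerically. The matrix `A` below is an arbitrary invertible example chosen for illustration, not one taken from the lesson, and the sketch assumes NumPy is installed.

```python
import numpy as np

# Illustrative matrix: determinant is 4*6 - 7*2 = 10, so A is invertible
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

I = np.eye(2)                # 2×2 identity matrix
A_inv = np.linalg.inv(A)

# Multiplying by I leaves A unchanged
print(np.allclose(A @ I, A))        # True

# A times its inverse gives I, up to floating-point rounding
print(np.allclose(A @ A_inv, I))    # True
print(np.allclose(A_inv @ A, I))    # True
```

The comparisons use `np.allclose` rather than `==` because the computed inverse carries small rounding errors.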

Key Takeaways

  • Matrix Multiplication: Entry (i, j) of AB is the dot product of row i of A with column j of B. For A (m×n) and B (n×p), the result is m×p. Essential for neural network forward passes.
  • Transpose: Flips rows to columns (A^T). Useful for shape compatibility and attention mechanisms in transformers.
  • Identity Matrix: Acts as "1" for matrices — multiplying by I leaves the matrix unchanged.
  • Matrix Inverse: A^-1 satisfies A @ A^-1 = A^-1 @ A = I. It exists only for square matrices with a nonzero determinant. Used for solving linear systems, though computing it explicitly is expensive for large matrices.
  • Geometric View: Matrices represent linear transformations (scaling, rotation, shearing) — composition of transformations equals matrix multiplication.
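The geometric view can be made concrete with a small 2D example. This is a sketch under stated assumptions: the rotation angle, the scaling factors, and the vector `v` are arbitrary illustrative choices, and NumPy is assumed to be installed.

```python
import numpy as np

theta = np.pi / 2  # 90-degree counterclockwise rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

S = np.array([[2.0, 0.0],
              [0.0, 1.0]])   # stretch x by 2, leave y unchanged

v = np.array([1.0, 0.0])

# Apply the scaling first, then the rotation
step_by_step = R @ (S @ v)

# The single matrix R @ S performs both steps at once
combined = (R @ S) @ v

# Reversing the order gives a different transformation
other_order = (S @ R) @ v

print(np.round(step_by_step, 6))
print(np.round(combined, 6))
print(np.round(other_order, 6))
```

The matrix applied first sits closest to the vector, so `R @ S` means "scale, then rotate". Here that sends `v` to approximately (0, 2), while `S @ R` sends it to approximately (0, 1).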

Quick Quiz

1. Matrix Multiplication Dimensions: If matrix A is 3×4 and matrix B is 4×2, what are the dimensions of AB?

Answer: 3×2 (the inner dimensions must match, result has outer dimensions)
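The shape rule is easy to confirm with placeholder matrices. The sketch below uses zero-filled arrays purely to check shapes, and assumes NumPy is installed; `mismatch_raised` is an illustrative name.

```python
import numpy as np

A = np.zeros((3, 4))
B = np.zeros((4, 2))

result_shape = (A @ B).shape
print(result_shape)  # (3, 2)

# Swapping the order gives inner dimensions 2 and 3, which do not match
try:
    B @ A
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)  # True
```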

2. Transpose Property: What is (AB)^T equal to?

Answer: B^T × A^T (the order reverses when transposing a product)

3. Identity Matrix: If A is a 3×3 matrix, what is A × I?

Answer: A (the identity matrix acts like 1 for matrix multiplication)

4. Inverse Application: If Ax = b, how do you solve for x using the inverse?

Answer: x = A^(-1) × b, provided A is invertible (multiply both sides by A inverse on the left)
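The following sketch solves a small system both ways. The system itself (2x + y = 3, x + 3y = 5) is an illustrative assumption, and NumPy is assumed to be installed. In practice `np.linalg.solve` is usually preferred over forming the inverse explicitly, since it avoids the extra work and tends to be more accurate.

```python
import numpy as np

# Illustrative system: 2x + y = 3 and x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Textbook route: multiply b by the inverse of A
x_inv = np.linalg.inv(A) @ b

# Practical route: solve the system directly without forming the inverse
x_solve = np.linalg.solve(A, b)

print(x_inv)
print(x_solve)

# Substituting the solution back should reproduce b
print(np.allclose(A @ x_solve, b))  # True
```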

Practice Exercises

Exercise 1: Matrix Multiplication Implementation

Implement matrix multiplication from scratch without using numpy's @ operator:

def matmul(A, B):
    """Multiply two matrices A and B manually."""
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])

    # Check if multiplication is possible
    if cols_A != rows_B:
        raise ValueError("Matrix dimensions incompatible")

    # Initialize result matrix with zeros
    result = [[0 for _ in range(cols_B)] for _ in range(rows_A)]

    # Compute dot products
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                result[i][j] += A[i][k] * B[k][j]

    return result

# Test
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]

Exercise 2: Batch Matrix Operations for Neural Networks

In neural networks, we process multiple inputs at once (batch processing). Given a batch of 3 inputs (each with 4 features) and weights (4×2), compute the output:

import numpy as np

# Batch of 3 inputs, each with 4 features
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])  # Shape: (3, 4)

# Weights: 4 inputs -> 2 outputs
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])  # Shape: (4, 2)

# Bias for 2 outputs
b = np.array([0.1, 0.2])

# Forward pass: output = X @ W + b
output = X @ W + b
print(f"Output shape: {output.shape}")  # Should be (3, 2)
print(f"Output:\n{output}")

# Manual check for first sample:
# [1*0.1 + 2*0.3 + 3*0.5 + 4*0.7 + 0.1, 1*0.2 + 2*0.4 + 3*0.6 + 4*0.8 + 0.2]
# = [0.1 + 0.6 + 1.5 + 2.8 + 0.1, 0.2 + 0.8 + 1.8 + 3.2 + 0.2]
# = [5.1, 6.2]

Challenge: Why is the bias added after the matrix multiplication, not before? What would happen if you added it before?

Additional Exercises

Exercise 3: Matrix Transpose Implementation

Write a function to compute the transpose of a matrix without using numpy:

def transpose(matrix):
    """Return the transpose of a matrix."""
    rows = len(matrix)
    cols = len(matrix[0])

    # Result will be cols × rows
    result = [[0 for _ in range(rows)] for _ in range(cols)]

    for i in range(rows):
        for j in range(cols):
            result[j][i] = matrix[i][j]

    return result

# Test
A = [[1, 2, 3], [4, 5, 6]]
print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]

Exercise 4: Verify Transpose Property

Verify that (AB)^T = B^T × A^T with a concrete example (a single example checks the identity but does not prove it in general):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Compute (AB)^T
AB = A @ B
AB_T = AB.T
print("(AB)^T:")
print(AB_T)

# Compute B^T @ A^T
BT_AT = B.T @ A.T
print("\nB^T @ A^T:")
print(BT_AT)

# Verify they are equal
print(f"\nEqual: {np.allclose(AB_T, BT_AT)}")  # Should be True

Insight: This property is crucial in backpropagation where we need to compute gradients through matrix operations.
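To see where the transposes show up, consider a single linear layer Y = X @ W. The sketch below is illustrative only: the toy loss, the upstream gradient `G`, and the random data are all assumptions made for the example, and NumPy is assumed to be installed. It compares the analytic gradient with respect to W against a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # batch of 3 inputs, 4 features
W = rng.normal(size=(4, 2))   # weights: 4 inputs -> 2 outputs
G = rng.normal(size=(3, 2))   # assumed upstream gradient dL/dY

def loss(X, W):
    # Toy scalar loss chosen so that dL/dY equals G exactly
    return np.sum((X @ W) * G)

# Analytic gradients: note the transposes
grad_W = X.T @ G              # shape (4, 2), same as W
grad_X = G @ W.T              # shape (3, 4), same as X

# Finite-difference check of one entry of grad_W
eps = 1e-6
W_shift = W.copy()
W_shift[1, 0] += eps
numeric = (loss(X, W_shift) - loss(X, W)) / eps

print(grad_W[1, 0], numeric)  # the two values should agree closely
```

Each gradient has the same shape as the quantity it belongs to, and the transposes are what make those shapes line up.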

Knowledge Check Quiz

1. Matrix Multiplication Non-Commutativity: Is AB always equal to BA? Provide a counterexample or explain why.

Answer: No, matrix multiplication is not commutative. For A = [[1, 2], [3, 4]] and B = [[0, 1], [0, 0]], AB = [[0, 1], [0, 3]] but BA = [[3, 4], [0, 0]].
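The counterexample can be checked directly with the same A and B as in the answer. This is a minimal sketch assuming NumPy is installed.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [0, 0]])

AB = A @ B
BA = B @ A

print(AB)
print(BA)
print(np.array_equal(AB, BA))  # False
```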

2. Associative Property: For matrices A (2×3), B (3×4), C (4×5), is (AB)C equal to A(BC)?

Answer: Yes, matrix multiplication is associative: (AB)C = A(BC). The result will be 2×5 in both cases.
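Associativity can be checked numerically with random matrices of the stated shapes. The random data and the cost variables below are illustrative assumptions, and NumPy is assumed to be installed. The two groupings give the same result but not the same number of scalar multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

left = (A @ B) @ C
right = A @ (B @ C)

print(left.shape)                 # (2, 5)
print(np.allclose(left, right))   # True

# Scalar multiplications with the standard algorithm (m*n*p per product)
cost_left = 2 * 3 * 4 + 2 * 4 * 5    # (AB) first, then times C
cost_right = 3 * 4 * 5 + 2 * 3 * 5   # (BC) first, then A times it

print(cost_left, cost_right)      # 64 90
```

The comparison uses `np.allclose` because floating-point rounding can differ slightly between the two groupings.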

3. Singular Matrix: What does it mean if a matrix has no inverse? Give an example of a 2×2 singular matrix.

Answer: A square matrix with no inverse is called singular, and its determinant is 0. Example: [[1, 2], [2, 4]], whose determinant is 1×4 − 2×2 = 0 because the rows are linearly dependent (the second row is 2× the first).
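The example matrix can be inspected numerically. This is a sketch assuming NumPy is installed; the names `S` and `inverse_failed` are illustrative. For an exactly singular input like this one, `np.linalg.inv` is expected to raise `LinAlgError`, but matrices that are only nearly singular may instead return a numerically meaningless inverse, so the determinant and rank checks are shown as well.

```python
import numpy as np

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is 2× the first

det = np.linalg.det(S)
rank = np.linalg.matrix_rank(S)

print(det)    # 0 up to floating-point rounding
print(rank)   # 1, which is less than 2, so S is not full rank

# Attempting to invert a singular matrix
try:
    np.linalg.inv(S)
    inverse_failed = False
except np.linalg.LinAlgError:
    inverse_failed = True

print(inverse_failed)
```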

4. Neural Network Application: In a layer with 100 inputs and 50 outputs, what are the dimensions of the weight matrix W?

Answer: W is 100×50. When we multiply input (batch_size × 100) by W (100×50), we get output (batch_size × 50).

5. Computational Complexity: What is the time complexity of multiplying an m×n matrix by an n×p matrix?

Answer: O(m×n×p) for the standard algorithm. Each of the m×p output elements requires a dot product of length n.
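The count can be confirmed by instrumenting the triple loop. The helper `count_multiplications` below is a hypothetical function written for illustration; it counts the scalar multiplications the standard algorithm would perform without actually multiplying anything.

```python
def count_multiplications(m, n, p):
    """Count scalar multiplications for an (m×n) by (n×p) product."""
    count = 0
    for i in range(m):          # each row of the result
        for j in range(p):      # each column of the result
            for k in range(n):  # one multiplication per term of the dot product
                count += 1
    return count

print(count_multiplications(3, 4, 2))                # 24
print(count_multiplications(3, 4, 2) == 3 * 4 * 2)   # True
```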