Lesson 2: Matrix Operations

Matrix Multiplication

Matrix multiplication is the fundamental operation in neural networks:

# If A is m×n and B is n×p, then AB is m×p
# (A @ B)[i,j] = sum over k of A[i,k] * B[k,j]

A = [[1, 2],
     [3, 4],
     [5, 6]]    # 3×2

B = [[5, 6],
     [7, 8]]    # 2×2

AB = [[19, 22],   # 1*5+2*7, 1*6+2*8
      [43, 50],   # 3*5+4*7, 3*6+4*8
      [67, 78]]   # 5*5+6*7, 5*6+6*8
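
The worked example above can be checked with NumPy. This is a minimal sketch that reuses the same A and B; the name entry_00 is introduced here only for illustration.

```python
import numpy as np

# The 3×2 and 2×2 matrices from the worked example
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[5, 6],
              [7, 8]])

AB = A @ B
print(AB.shape)  # (3, 2)
print(AB)

# Entry (i, j) is the dot product of row i of A with column j of B
entry_00 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))
print(entry_00)  # 19
```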
      

Transpose

# Swap rows and columns: entry (i, j) of A becomes entry (j, i) of A^T
A = [[1, 2, 3],        A^T = [[1, 4],
     [4, 5, 6]]               [2, 5],
                              [3, 6]]
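
In NumPy the transpose is available as the .T attribute. A short sketch using the same 2×3 matrix; the variable name A_T is chosen here for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

A_T = A.T
print(A.shape, A_T.shape)  # (2, 3) (3, 2)
print(A_T)

# Entry (i, j) of A becomes entry (j, i) of A^T
assert A[0, 2] == A_T[2, 0]
# Transposing twice gives back the original matrix
assert np.array_equal(A_T.T, A)
```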
      

Identity and Inverse

# Identity matrix (like 1 for matrices)
I = [[1, 0],
     [0, 1]]

# Inverse: A @ A^-1 = A^-1 @ A = I
# Exists only for square matrices with nonzero determinant
# Used in solving linear systems
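
Both facts can be checked numerically. The sketch below uses a made-up 2×2 matrix (not from the lesson) and NumPy's np.eye and np.linalg.inv.

```python
import numpy as np

# Hypothetical example matrix; det = 4*6 - 7*2 = 10, so it is invertible
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
I = np.eye(2)

# Multiplying by the identity leaves A unchanged
print(np.array_equal(A @ I, A))  # True

A_inv = np.linalg.inv(A)
print(A_inv)

# A @ A^-1 equals I up to floating-point rounding, so compare with allclose
print(np.allclose(A @ A_inv, I))  # True
```

Comparing with np.allclose rather than exact equality is deliberate: the computed inverse is only accurate to floating-point precision.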
      

Key Takeaways

Matrix Multiplication: Each entry of AB is the dot product of a row of A with a column of B. For A (m×n) and B (n×p), the result is m×p. Essential for neural network forward passes.
Transpose: Swaps rows and columns (A^T). Useful for shape compatibility and attention mechanisms in transformers.
Identity Matrix: Acts as "1" for matrices: multiplying by I leaves the matrix unchanged.
Matrix Inverse: A^-1 satisfies A @ A^-1 = I and exists only for square, non-singular matrices. Used for solving linear systems, though computing it explicitly is expensive for large matrices.
Geometric View: Matrices represent linear transformations (scaling, rotation, shearing); composing transformations corresponds to multiplying their matrices.
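
The geometric view can be illustrated with two small transformations. This is a sketch with made-up matrices R (a 90-degree rotation) and S (a stretch along x); neither appears in the lesson itself.

```python
import numpy as np

# Hypothetical example transformations
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotate 90 degrees counterclockwise
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])    # stretch the x-axis by a factor of 2

v = np.array([1.0, 0.0])

# Rotate first, then stretch, one step at a time
step_by_step = S @ (R @ v)
# The same result from a single combined matrix
combined = (S @ R) @ v
print(step_by_step)  # [0. 1.]
print(np.allclose(step_by_step, combined))  # True

# Stretching first and rotating second gives a different vector
other_order = R @ (S @ v)
print(other_order)  # [0. 2.]
```

The matrix applied first sits on the right of the product, and swapping the order changes the result.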

      

Quick Quiz

1. Matrix Multiplication Dimensions: If matrix A is 3×4 and matrix B is 4×2, what are the dimensions of AB?

Answer: 3×2 (the inner dimensions must match; the result takes the outer dimensions)

2. Transpose Property: What is (AB)^T equal to?

Answer: B^T × A^T (the order reverses when transposing a product)

3. Identity Matrix: If A is a 3×3 matrix, what is A × I?

Answer: A (the identity matrix acts like 1 for matrix multiplication)

4. Inverse Application: If Ax = b, how do you solve for x using the inverse?

Answer: x = A^-1 × b, assuming A is invertible (multiply both sides by A^-1 on the left)
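
As a rough illustration of the last answer, the sketch below solves a small made-up system two ways: with the explicit inverse, and with np.linalg.solve, which solves Ax = b without forming A^-1 and is usually the preferred route in practice.

```python
import numpy as np

# Hypothetical system: 2x + y = 3, x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x_inv = np.linalg.inv(A) @ b      # x = A^-1 b, as in the quiz answer
x_solve = np.linalg.solve(A, b)   # solves Ax = b directly

print(np.allclose(x_inv, x_solve))   # True
print(np.allclose(A @ x_solve, b))   # True: the solution satisfies Ax = b
```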

Practice Exercises

Exercise 1: Matrix Multiplication Implementation

Implement matrix multiplication from scratch, without NumPy or the @ operator:

def matmul(A, B):
    """Multiply two matrices A and B manually."""
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])
    
    # Check if multiplication is possible
    if cols_A != rows_B:
        raise ValueError("Matrix dimensions incompatible")
    
    # Initialize result matrix with zeros
    result = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    
    # Compute dot products
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                result[i][j] += A[i][k] * B[k][j]
    
    return result

# Test
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
        

Exercise 2: Batch Matrix Operations for Neural Networks

In neural networks, we process multiple inputs at once (batch processing). Given a batch of 3 inputs (each with 4 features) and weights (4×2), compute the output:

import numpy as np

# Batch of 3 inputs, each with 4 features
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])  # Shape: (3, 4)

# Weights: 4 inputs -> 2 outputs
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])  # Shape: (4, 2)

# Bias for 2 outputs, shape (2,); NumPy broadcasts it across the 3 rows
b = np.array([0.1, 0.2])

# Forward pass: output = X @ W + b
output = X @ W + b
print(f"Output shape: {output.shape}")  # Should be (3, 2)
print(f"Output:\n{output}")

# Manual check for first sample:
# [1*0.1 + 2*0.3 + 3*0.5 + 4*0.7 + 0.1, 1*0.2 + 2*0.4 + 3*0.6 + 4*0.8 + 0.2]
# = [0.1 + 0.6 + 1.5 + 2.8 + 0.1, 0.2 + 0.8 + 1.8 + 3.2 + 0.2]
# = [5.1, 6.2]
        

Challenge: Why is the bias added after the matrix multiplication, not before? What would happen if you added it before?

Additional Exercises

Exercise 3: Matrix Transpose Implementation

Write a function to compute the transpose of a matrix without using numpy:

def transpose(matrix):
    """Return the transpose of a matrix."""
    rows = len(matrix)
    cols = len(matrix[0])
    # Result will be cols × rows
    result = [[0 for _ in range(rows)] for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            result[j][i] = matrix[i][j]
    return result

# Test
A = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
        

Exercise 4: Verify Transpose Property

Verify that (AB)^T = B^T × A^T on a concrete example (a numerical check, not a proof):

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Compute (AB)^T
AB = A @ B
AB_T = AB.T
print("(AB)^T:")
print(AB_T)

# Compute B^T @ A^T
BT_AT = B.T @ A.T
print("\nB^T @ A^T:")
print(BT_AT)

# Verify they are equal
print(f"\nEqual: {np.allclose(AB_T, BT_AT)}")  # Should be True
        

Insight: This property matters in backpropagation, where gradients flow backward through matrix products and transposes make the shapes line up.
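
To make the backpropagation connection concrete, here is a hedged sketch for a single linear layer Y = X @ W. The formulas dW = X^T @ dY and dX = dY @ W^T are the standard gradients for this layer; the random data, the upstream gradient dY, and the helper loss are assumptions made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))    # batch of 3 inputs, 4 features
W = rng.normal(size=(4, 2))    # weights: 4 inputs -> 2 outputs
dY = rng.normal(size=(3, 2))   # made-up upstream gradient dL/dY

def loss(W_):
    # A loss chosen so that dL/dY equals dY exactly
    return np.sum((X @ W_) * dY)

# Analytic gradients: the transposes make the shapes line up
dW = X.T @ dY    # (4, 3) @ (3, 2) -> (4, 2), same shape as W
dX = dY @ W.T    # (3, 2) @ (2, 4) -> (3, 4), same shape as X

# Finite-difference check of one entry of dW
eps = 1e-6
W_plus = W.copy()
W_plus[1, 0] += eps
numeric = (loss(W_plus) - loss(W)) / eps
print(np.isclose(numeric, dW[1, 0], atol=1e-4))  # True
```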

Knowledge Check Quiz

1. Matrix Multiplication Non-Commutativity: Is AB always equal to BA? Provide a counterexample or explain why.

Answer: No, matrix multiplication is not commutative. For A = [[1, 2], [3, 4]] and B = [[0, 1], [0, 0]], AB = [[0, 1], [0, 3]] but BA = [[3, 4], [0, 0]].
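
The counterexample can be reproduced directly, using the same A and B as in the answer:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [0, 0]])

AB = A @ B
BA = B @ A
print(AB)
print(BA)
print(np.array_equal(AB, BA))  # False
```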

2. Associative Property: For matrices A (2×3), B (3×4), C (4×5), is (AB)C equal to A(BC)?

Answer: Yes, matrix multiplication is associative: (AB)C = A(BC). The result will be 2×5 in both cases.
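
A quick numerical check of associativity, sketched with random integer matrices of the stated shapes (integers keep the comparison exact). The multiplication counts are computed from the m×n×p rule and show that the grouping changes the cost even though it does not change the result.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(4, 5))

left = (A @ B) @ C
right = A @ (B @ C)
print(left.shape, right.shape)       # (2, 5) (2, 5)
print(np.array_equal(left, right))   # True

# Scalar multiplications needed by each grouping
cost_left = 2 * 3 * 4 + 2 * 4 * 5    # (AB) first, then (AB)C
cost_right = 3 * 4 * 5 + 2 * 3 * 5   # (BC) first, then A(BC)
print(cost_left, cost_right)         # 64 90
```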

3. Singular Matrix: What does it mean if a matrix has no inverse? Give an example of a 2×2 singular matrix.

Answer: A square matrix with no inverse is called singular, and its determinant is 0. Example: [[1, 2], [2, 4]], whose rows are linearly dependent (the second row is 2× the first).
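
The example matrix can be inspected with NumPy. This is a sketch only: np.linalg.inv signals a singular matrix by raising np.linalg.LinAlgError, but for matrices that are merely close to singular it may instead return very large, unreliable values, so checking the rank or determinant first is a reasonable precaution.

```python
import numpy as np

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det = np.linalg.det(S)
rank = np.linalg.matrix_rank(S)
print(np.isclose(det, 0.0))  # True
print(rank)                  # 1, less than 2, so the rows are dependent

# Guard the inversion instead of assuming it succeeds
try:
    S_inv = np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    S_inv = None
    print(f"Cannot invert: {err}")
```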

4. Neural Network Application: In a layer with 100 inputs and 50 outputs, what are the dimensions of the weight matrix W?

Answer: W is 100×50 under the X @ W convention used in this lesson. When we multiply input (batch_size × 100) by W (100×50), we get output (batch_size × 50).
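
The shapes can be confirmed with placeholder arrays. In this sketch the batch size of 8 is arbitrary and the zeros carry no meaning. Some libraries (PyTorch's nn.Linear, for instance) store the weight as 50×100 and multiply by its transpose, which is shown as W_alt, a name made up here.

```python
import numpy as np

batch_size = 8                     # arbitrary, for illustration only
X = np.zeros((batch_size, 100))    # 100 input features per sample
W = np.zeros((100, 50))            # 100 inputs -> 50 outputs
b = np.zeros(50)

out = X @ W + b
print(out.shape)  # (8, 50)

# Alternative storage layout: (outputs, inputs), applied via the transpose
W_alt = np.zeros((50, 100))
out_alt = X @ W_alt.T + b
print(out_alt.shape)  # (8, 50)
```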

5. Computational Complexity: What is the time complexity of multiplying an m×n matrix by an n×p matrix?

Answer: O(m×n×p) for the standard triple-loop algorithm. Each of the m×p output elements requires computing a dot product of length n.
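
The count can be made concrete by tallying the scalar multiplications performed by the triple loop. The helper count_multiplications is a hypothetical name used only for this sketch.

```python
def count_multiplications(m, n, p):
    """Count scalar multiplications in the triple-loop algorithm."""
    count = 0
    for i in range(m):          # each row of the result
        for j in range(p):      # each column of the result
            for k in range(n):  # one multiplication per term of the dot product
                count += 1
    return count

print(count_multiplications(3, 4, 2))  # 24
```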