Lesson 1: Vectors & Matrices

Vectors

A vector is an ordered list of numbers:

v = [1, 2, 3]  # 3-dimensional vector

# In ML: word embeddings are vectors
# "cat" might be represented as [0.2, -0.5, 0.8, ...]
      

Vector Operations

# Addition: element-wise
[1, 2] + [3, 4] = [4, 6]

# Scalar multiplication
2 * [1, 2, 3] = [2, 4, 6]

# Dot product: multiply matching elements and sum (often used as a similarity score)
[1, 2] · [3, 4] = 1*3 + 2*4 = 11
      

Matrices

A matrix is a 2D array of numbers:

A = [[1, 2],
     [3, 4],
     [5, 6]]  # 3×2 matrix (3 rows, 2 columns)

# In ML: weight matrices transform vectors
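The row and column convention can be checked directly in Python. This is a minimal sketch that assumes the matrix is stored as a list of equal-length rows; the helper names `shape` and `transpose` are illustrative and not part of the lesson.

```python
# The 3×2 matrix from the text, stored as a list of rows
A = [[1, 2],
     [3, 4],
     [5, 6]]

def shape(matrix):
    """Return (rows, columns) for a rectangular nested list."""
    return (len(matrix), len(matrix[0]))

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

print(shape(A))       # (3, 2)
print(A[2][0])        # 5: row index 2, column index 0
print(transpose(A))   # [[1, 3, 5], [2, 4, 6]]
```

Indexing is zero-based here, so `A[2][0]` reads the third row, first column.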
      

Knowledge Check

Question 1: What is the result of the dot product [2, 3] · [4, 5]?

Answer: 2×4 + 3×5 = 8 + 15 = 23

Question 2: If matrix A is 3×2 and matrix B is 2×4, what are the dimensions of A × B?

Answer: The result is 3×4. The inner dimensions (2 and 2) must match; the result takes its row count from A and its column count from B.
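The shape rule can be written as a small check. This is a sketch only; `matmul_shape` is a hypothetical helper name that works on shape tuples rather than on actual matrices.

```python
def matmul_shape(shape_a, shape_b):
    """Return the shape of A × B, or raise if the inner dimensions differ."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply: A has {cols_a} columns but B has {rows_b} rows"
        )
    return (rows_a, cols_b)

print(matmul_shape((3, 2), (2, 4)))  # (3, 4)
```

Calling `matmul_shape((3, 2), (3, 4))` raises a `ValueError`, because 2 columns cannot be paired with 3 rows.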

Question 3: In ML, what do weight matrices do to input vectors?

Answer: They apply a linear transformation. Multiplying by a weight matrix maps an input vector to a new vector, possibly with a different number of dimensions, whose values act as learned features.

Question 4: What is the result of 3 * [1, 2, 3]?

Answer: [3, 6, 9], because each element is multiplied by the scalar

Practical Examples

Example 1: Word Embeddings in Practice

In LLMs, words (more precisely, tokens) are converted to vectors. Words with similar meanings tend to have vectors that point in similar directions:

# Word embedding vectors (simplified, 3D for visualization)
cat     = [0.8,  0.2,  0.5]
dog     = [0.7,  0.3,  0.6]   # Similar to "cat" (both animals)
king    = [0.2,  0.9,  0.1]   # Different direction (royalty)
queen   = [0.3,  0.85, 0.15]  # Similar to "king"

# Dot product as a similarity score
dot(cat, dog)     # 0.92: higher value, similar words
dot(cat, king)    # 0.39: lower value, different meaning
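The comparison can be run directly. The sketch below also computes cosine similarity, which is not in the example itself: it divides the dot product by both vector lengths, so it compares direction only, whereas a raw dot product also grows with vector length. The function names `dot` and `cosine_similarity` are illustrative.

```python
import math

# Simplified 3D embeddings from the example
cat   = [0.8, 0.2,  0.5]
dog   = [0.7, 0.3,  0.6]
king  = [0.2, 0.9,  0.1]
queen = [0.3, 0.85, 0.15]

def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by both vector lengths (range -1 to 1)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

print(round(dot(cat, dog), 2))    # 0.92
print(round(dot(cat, king), 2))   # 0.39
print(cosine_similarity(cat, dog) > cosine_similarity(cat, king))  # True
```

With these toy values, both measures rank "dog" as closer to "cat" than "king" is.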
      

Example 2: Neural Network Layer Computation

A simple neural network layer uses matrix multiplication:

# Input: 4 features (e.g., one 4-dimensional token embedding)
input_vector = [1.0, 0.5, -0.3, 0.8]  # Shape: (1, 4)

# Weight matrix: 4 inputs → 3 outputs (hidden layer)
# Each row holds the 4 weights of one output neuron
weights = [[ 0.2,  0.5, -0.1,  0.3],
           [-0.3,  0.1,  0.4, -0.2],
           [ 0.1, -0.4,  0.2,  0.5]]  # Shape: (3, 4)

# Matrix multiplication: output = input × weights^T
# Shapes: (1, 4) × (4, 3) = (1, 3), one value per hidden neuron
output = [0.72, -0.53, 0.24]

# This transforms 4D input into 3D representation
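The layer output can be computed by hand in a few lines. This sketch treats each row of `weights` as the four weights of one output neuron and takes its dot product with the input; `layer_forward` is a hypothetical name, and no bias or activation function is applied because the example describes only the multiplication.

```python
input_vector = [1.0, 0.5, -0.3, 0.8]

weights = [[ 0.2,  0.5, -0.1,  0.3],
           [-0.3,  0.1,  0.4, -0.2],
           [ 0.1, -0.4,  0.2,  0.5]]

def layer_forward(weights, x):
    """Dot each weight row with the input: one output value per row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

output = layer_forward(weights, input_vector)
print([round(v, 2) for v in output])  # [0.72, -0.53, 0.24]
```

For the first neuron: 0.2*1.0 + 0.5*0.5 + (-0.1)*(-0.3) + 0.3*0.8 = 0.72.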
      

Example 3: Attention Mechanism (Simplified)

Self-attention uses matrices to compute relationships between tokens:

# Three tokens: "The", "cat", "sat"
# Each represented as a 4-dimensional embedding
embeddings = [[0.5, 0.2, 0.1, 0.8],   # "The"
              [0.8, 0.3, 0.5, 0.2],   # "cat"
              [0.3, 0.9, 0.2, 0.1]]   # "sat"

# Learned weight matrices project embeddings into query and key spaces
Q = embeddings × W_q   # Shape: (3, 4) × (4, 4) = (3, 4)
K = embeddings × W_k   # Shape: (3, 4) × (4, 4) = (3, 4)

# Attention scores: how much each token attends to others
# scores = Q × K^T  (matrix multiply with key transpose)
# Result: 3×3 matrix showing token-to-token relationships

# Example attention scores (illustrative values):
attn_scores = [[2.1, 1.5, 0.8],
               [1.4, 2.3, 1.1],
               [0.9, 1.2, 2.0]]
# "cat" (row 2) has its highest score with itself (2.3), then "The" (1.4), then "sat" (1.1)
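Each entry of Q × K^T is the dot product of one query row with one key row, so the 3×3 score matrix can be built from pairwise dot products. The sketch below makes a loud simplifying assumption: it skips the learned projections and uses the embeddings themselves as both Q and K (as if W_q and W_k were identity matrices). Real models learn these weights, so the numbers differ from the illustrative scores in the example.

```python
tokens = ["The", "cat", "sat"]
embeddings = [[0.5, 0.2, 0.1, 0.8],   # "The"
              [0.8, 0.3, 0.5, 0.2],   # "cat"
              [0.3, 0.9, 0.2, 0.1]]   # "sat"

def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

# Assumption: identity projections, so Q and K equal the embeddings
Q = embeddings
K = embeddings

# scores[i][j] = Q[i] · K[j], which is entry (i, j) of Q × K^T
scores = [[dot(q, k) for k in K] for q in Q]

for token, row in zip(tokens, scores):
    print(token, [round(s, 2) for s in row])
```

Under this assumption the score matrix is symmetric, and each token scores highest against itself.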
      

Practice Exercises

Exercise 1: Vector Operations

Implement basic vector operations in Python:

# Task: Complete the following functions

def vector_add(v1, v2):
    """Add two vectors element-wise"""
    # Your code here
    pass

def dot_product(v1, v2):
    """Calculate dot product of two vectors"""
    # Your code here
    pass

def scalar_multiply(scalar, vector):
    """Multiply vector by scalar"""
    # Your code here
    pass

# Test cases
print(vector_add([1, 2, 3], [4, 5, 6]))        # Expected: [5, 7, 9]
print(dot_product([1, 2], [3, 4]))              # Expected: 11
print(scalar_multiply(3, [1, 2, 3]))            # Expected: [3, 6, 9]
      

Solution:

def vector_add(v1, v2):
    return [a + b for a, b in zip(v1, v2)]

def dot_product(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

def scalar_multiply(scalar, vector):
    return [scalar * x for x in vector]
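One caveat worth knowing: `zip` stops at the shorter input, so functions built on it silently ignore extra elements when the vectors differ in length. A possible variant with an explicit length check is sketched below; the `_checked` names and the `check_same_length` helper are illustrative, not part of the exercise.

```python
def check_same_length(v1, v2):
    """Raise if the two vectors differ in length."""
    if len(v1) != len(v2):
        raise ValueError(f"length mismatch: {len(v1)} vs {len(v2)}")

def vector_add_checked(v1, v2):
    check_same_length(v1, v2)
    return [a + b for a, b in zip(v1, v2)]

def dot_product_checked(v1, v2):
    check_same_length(v1, v2)
    return sum(a * b for a, b in zip(v1, v2))

print(dot_product_checked([1, 2], [3, 4]))             # 11
print(sum(a * b for a, b in zip([1, 2, 3], [4, 5])))   # 14: zip silently drops the 3
```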
        

Exercise 2: Matrix Multiplication

Implement matrix multiplication from scratch:

# Task: Implement matrix multiplication

def matrix_multiply(A, B):
    """
    Multiply matrix A (m×n) by matrix B (n×p)
    Returns matrix of shape (m×p)
    """
    # Your code here
    pass

# Test case
A = [[1, 2],
     [3, 4]]      # 2×2

B = [[5, 6],
     [7, 8]]      # 2×2

result = matrix_multiply(A, B)
print(result)  # Expected: [[19, 22], [43, 50]]
# Explanation: [1*5+2*7, 1*6+2*8] = [19, 22]
#             [3*5+4*7, 3*6+4*8] = [43, 50]
      

Solution:

def matrix_multiply(A, B):
    rows_A = len(A)
    cols_A = len(A[0])
    cols_B = len(B[0])
    
    # Initialize result matrix with zeros
    result = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    
    # Multiply
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                result[i][j] += A[i][k] * B[k][j]
    
    return result
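A more compact alternative, offered as a sketch rather than a replacement for the loop version: `zip(*B)` yields the columns of B, so each output entry becomes the dot product of a row of A with a column of B. The name `matrix_multiply_zip` and the dimension check are additions for illustration.

```python
def matrix_multiply_zip(A, B):
    """Multiply A (m×n) by B (n×p) using row-by-column dot products."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    columns_of_B = list(zip(*B))  # transpose B once, up front
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matrix_multiply_zip(A, B))  # [[19, 22], [43, 50]]

# A 3×2 matrix times a 2×4 matrix gives a 3×4 result
C = [[1, 2], [3, 4], [5, 6]]
D = [[1, 0, 2, 0],
     [0, 1, 0, 2]]
result = matrix_multiply_zip(C, D)
print(len(result), len(result[0]))  # 3 4
```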