The Artificial Neuron
Remember vectors from Level 1? Now we're going to use them to build something amazing: artificial neurons.
Think of a neuron as a tiny decision-maker. It takes some inputs, does some math, and decides whether to "fire" (activate) or not. The neurons in your brain do this with electrochemical signals; artificial neurons do the same thing with math!
How a Neuron Works
A simple artificial neuron does three things:
- Multiply each input by its weight
- Add them all up (plus a bias)
- Decide using an activation function
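The three steps above can be sketched in a few lines of Python. The input, weight, and bias values here are made up for illustration:

```python
def neuron(inputs, weights, bias):
    # 1. Multiply each input by its weight
    weighted = [x * w for x, w in zip(inputs, weights)]
    # 2. Add them all up, plus the bias
    total = sum(weighted) + bias
    # 3. Decide with an activation function (here: a simple step)
    return 1 if total > 0 else 0

print(neuron([1, 0, 1], [0.5, -0.2, 0.8], -1.0))  # 0.5 + 0.8 - 1.0 = 0.3 > 0 → fires (1)
```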
Try clicking the input nodes below to see the neuron in action!
The Perceptron Model
The perceptron, introduced by Rosenblatt in 1958, is the fundamental building block of neural networks. It computes a binary output from real-valued inputs using learnable weights.
Mathematical Formulation
Given input vector x = (x₁, x₂, ..., xₙ) and weights w = (w₁, w₂, ..., wₙ), the perceptron computes:

y = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)

Or in vector notation:

y = activation(w · x + b)

Where:
- w · x is the dot product of weights and inputs
- b is the bias term (threshold adjustment)
- activation is a non-linear function
Biological Inspiration
Biological neurons receive electrochemical signals through dendrites. When the sum of incoming signals exceeds a threshold, the neuron fires an action potential down its axon to connected neurons.
Artificial neurons abstract this process: inputs represent signals, weights represent synaptic strengths, and the activation function represents the thresholding behavior.
🧠 Interactive Neuron
Click the input nodes (x₁, x₂) to toggle them on/off. Watch how the neuron computes the output!
Weights and Biases: The Learning Parameters
Here's the secret to how neurons learn: weights and biases.
Weights: How Important Is Each Input?
Weights tell the neuron which inputs matter more. A big weight means "pay attention!" A small (or negative) weight means "ignore this" or "this counts against." For example, suppose a neuron decides whether to bring an umbrella:
- Weight = 10: Dark clouds (very important!)
- Weight = 2: Windy (somewhat important)
- Weight = -5: Sunny (counts against bringing umbrella)
Biases: The Threshold
The bias is like a starting point. It shifts the decision threshold up or down.
- Positive bias: Neuron fires easily (optimistic)
- Negative bias: Neuron needs strong inputs to fire (pessimistic)
During training, the AI adjusts all these weights and biases to make better predictions. That's the "learning" part!
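Putting the umbrella weights together with a bias gives a complete (if hypothetical) neuron. The bias of -3 is an invented value meaning "by default, lean toward leaving the umbrella at home":

```python
def bring_umbrella(dark_clouds, windy, sunny):
    # Inputs are 1 (yes) or 0 (no); weights from the example above
    total = 10 * dark_clouds + 2 * windy + (-5) * sunny + (-3)
    return total > 0  # "fire" (bring it) if the weighted sum beats zero

print(bring_umbrella(dark_clouds=1, windy=0, sunny=0))  # 10 - 3 = 7 → True
print(bring_umbrella(dark_clouds=0, windy=1, sunny=1))  # 2 - 5 - 3 = -6 → False
```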
Parameters and Decision Boundaries
Weight Parameters
Weights determine the relative importance of each input feature. In geometric terms, the weight vector w defines the orientation of the decision boundary:

w · x + b = 0
This equation describes a hyperplane that divides the input space into two regions. Points on one side produce positive outputs (class 1), points on the other side produce negative outputs (class 0).
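Classifying a point amounts to checking which side of the hyperplane it falls on, i.e. the sign of w · x + b. A minimal sketch in two dimensions, with hand-picked weights:

```python
def side(w, x, b):
    # Sign of w · x + b tells which side of the hyperplane x lies on
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0

w, b = [1.0, 1.0], -1.0          # boundary is the line x1 + x2 = 1
print(side(w, [2.0, 2.0], b))    # above the line → class 1
print(side(w, [0.0, 0.0], b))    # below the line → class 0
```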
Bias Parameter
The bias b translates the decision boundary away from the origin. Without bias, all hyperplanes would be constrained to pass through the origin, severely limiting the representational capacity.
Learning as Parameter Optimization
Training consists of finding optimal values for weights and biases that minimize prediction error on the training set. For a perceptron with n inputs, there are n + 1 parameters to learn (n weights + 1 bias).
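One concrete way to find these n + 1 parameters is Rosenblatt's original perceptron learning rule, sketched below on the AND function (the learning rate and epoch count here are arbitrary choices, not canonical values):

```python
def step(z):
    return 1 if z > 0 else 0

def train(data, n_inputs, lr=0.1, epochs=20):
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            y = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = target - y
            # Nudge each weight (and the bias) in the direction that reduces the error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND, 2)
print([step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in AND])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule is guaranteed to converge; for non-separable data a perceptron alone cannot succeed, which is part of why multi-layer networks matter.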
Activation Functions: Making Decisions
After adding up all the weighted inputs, the neuron needs to make a decision. That's where activation functions come in!
Think of activation functions as filters that decide how much the neuron should "fire."
Common Activation Functions
Non-linear Activation Functions
Activation functions introduce non-linearity, enabling neural networks to approximate complex functions. Without non-linear activations, deep networks would collapse to equivalent single-layer linear models.
Properties of Good Activation Functions
- Non-linear: Enable approximation of arbitrary functions
- Differentiable: Allow gradient-based optimization
- Computationally efficient: Fast to compute during training
- Well-behaved gradients: Avoid vanishing or exploding gradients
📊 Activation Function Explorer
Which Activation Should You Use?
- Step: Simple, but too harsh (used historically)
- Sigmoid: Smooth, outputs 0-1 (good for probabilities)
- ReLU: Fast, simple, works great (most common today!)
- Tanh: Similar to sigmoid, but outputs -1 to 1
Why ReLU is popular: It's fast to compute, doesn't have the "vanishing gradient" problem, and works surprisingly well in practice. Most modern neural networks use ReLU or variants of it.
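The four functions from the list can be written out directly; a minimal sketch using Python's math module:

```python
import math

def step(z):     return 1.0 if z > 0 else 0.0
def sigmoid(z):  return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)
def relu(z):     return max(0.0, z)                  # zero below 0, identity above
def tanh(z):     return math.tanh(z)                 # squashes to (-1, 1)

for f in (step, sigmoid, relu, tanh):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```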
Why ReLU Dominates
ReLU (Rectified Linear Unit) has become the default choice for hidden layers because:
- Computational efficiency: Simple max operation vs. expensive exponentials
- Gradient flow: For x > 0, gradient is 1 (no vanishing gradient)
- Sparsity: Outputs exact zero for half the inputs
- Biological plausibility: Similar to neural firing thresholds
However, ReLU has a "dying ReLU" problem—neurons can become permanently inactive. Variants like Leaky ReLU and ELU address this.
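Leaky ReLU's fix is a one-line change: keep a small slope on the negative side (α = 0.01 is a common default) so the gradient never goes completely to zero:

```python
def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # A small negative slope keeps a nonzero gradient for z < 0,
    # so the neuron can recover instead of "dying"
    return z if z > 0 else alpha * z

print(relu(-2.0), leaky_relu(-2.0))  # 0.0 -0.02
```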
From Neurons to Networks
One neuron can make simple decisions, but real intelligence requires networks of neurons. When we connect neurons together, magic happens!
Layers of Neurons
Neural networks are organized in layers:
- Input Layer: Receives the raw data (like pixel values)
- Hidden Layers: Process and transform the information
- Output Layer: Produces the final prediction
Information flows from input → hidden → output. Each layer learns to extract more complex features from the data.
Multi-Layer Perceptron Architecture
A Multi-Layer Perceptron (MLP) consists of:
- Input layer: L₀ neurons receiving feature vector x ∈ ℝⁿ
- Hidden layers: L₁, L₂, ..., Lₖ with non-linear activations
- Output layer: Lₒ producing prediction ŷ ∈ ℝᵐ
Forward Pass
For layer l with weight matrix W⁽ˡ⁾ and bias vector b⁽ˡ⁾:

z⁽ˡ⁾ = W⁽ˡ⁾a⁽ˡ⁻¹⁾ + b⁽ˡ⁾
a⁽ˡ⁾ = activation(z⁽ˡ⁾)
Where a⁽⁰⁾ = x (input) and a⁽ᴸ⁾ = ŷ (output).
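The forward pass can be sketched in plain Python. This toy version applies ReLU at every layer (a real output layer would often use a different activation) and the weights are invented for the example:

```python
def forward(x, layers):
    # layers: list of (W, b) pairs; W is a list of rows, one per output neuron
    a = x
    for W, b in layers:
        # z = W a + b, computed row by row
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = [max(0.0, zi) for zi in z]  # ReLU activation
    return a

# 2 inputs → 2 hidden neurons → 1 output, with made-up weights
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, 1.0]], [0.0]),                     # output layer
]
print(forward([1.0, 2.0], layers))  # ≈ [1.6]
```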
Universal Approximation
A feedforward network with at least one hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of ℝⁿ (Universal Approximation Theorem).
This explains why neural networks are so powerful—they can theoretically learn any pattern!
🏗️ Network Architecture Example
A simple network: 3 inputs → 4 hidden neurons → 2 outputs
Input→Hidden: 3×4 = 12 weights + 4 biases = 16 parameters
Hidden→Output: 4×2 = 8 weights + 2 biases = 10 parameters
Total: 16 + 10 = 26 learnable parameters
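The same counting rule works for any layer sizes; a tiny helper makes the arithmetic explicit:

```python
def count_params(layer_sizes):
    # Between consecutive layers: in×out weights plus out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_params([3, 4, 2]))  # 3×4+4 + 4×2+2 = 16 + 10 = 26
```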
🎮 Neural Network Playground
Now it's your turn! Below is a simple neural network that learns to classify points. The blue points are one class, orange are another. The network learns to draw a boundary between them.
Interactive Classifier
Click "Train" to watch the network learn! The background color shows what the network predicts.
What You Learned
🎓 Key Takeaways
- Artificial neurons mimic biological neurons using math
- Weights determine input importance; biases set the threshold
- Activation functions introduce non-linearity (ReLU is most popular)
- Networks connect neurons in layers to learn complex patterns
- Deep networks can theoretically learn any function!
In Level 3, we'll explore why deep networks were historically hard to train and how a breakthrough called ResNet changed everything!
Summary of Neural Network Fundamentals
- Perceptron: y = activation(w·x + b), the fundamental computation unit
- Parameters: Weights and biases learned via gradient descent
- Activation Functions: Non-linearities enabling universal approximation
- Architecture: Layer-wise composition of affine transformations and non-linearities
- Universal Approximation: Sufficiently wide networks can represent any continuous function
Next, we examine training challenges in deep architectures and the residual connection innovation that enabled modern deep learning.