The Artificial Neuron
Remember vectors from Level 1? Now we're going to use them to build something amazing: artificial neurons.
Think of a neuron as a tiny decision-maker. It takes some inputs, does some math, and decides whether to "fire" (activate) or not. The neurons in your brain do this with electrochemical signals; artificial neurons do the same thing with math!
How a Neuron Works
A simple artificial neuron does three things:
- Multiply each input by its weight
- Add them all up (plus a bias)
- Decide using an activation function
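The three steps above can be sketched in a few lines of Python. The input, weight, and bias values here are made up for illustration:

```python
def neuron(inputs, weights, bias):
    # 1. Multiply each input by its weight
    weighted = [x * w for x, w in zip(inputs, weights)]
    # 2. Add them all up, plus the bias
    total = sum(weighted) + bias
    # 3. Decide with an activation function (here: a simple step)
    return 1 if total > 0 else 0

print(neuron([1, 0, 1], [0.5, -0.2, 0.8], -1.0))  # 0.5 + 0.8 - 1.0 = 0.3 > 0 → fires (1)
```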
Try clicking the input nodes below to see the neuron in action!
The Perceptron Model
The perceptron, introduced by Rosenblatt in 1958, is the fundamental building block of neural networks. It computes a binary output from real-valued inputs using learnable weights.
Mathematical Formulation
Given input vector x = (x₁, x₂, ..., xₙ) and weights w = (w₁, w₂, ..., wₙ), the perceptron computes:

y = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)

Or in vector notation:

y = activation(w · x + b)

Where:
- w · x is the dot product of weights and inputs
- b is the bias term (threshold adjustment)
- activation is a non-linear function
Biological Inspiration
Biological neurons receive electrochemical signals through dendrites. When the sum of incoming signals exceeds a threshold, the neuron fires an action potential down its axon to connected neurons.
Artificial neurons abstract this process: inputs represent signals, weights represent synaptic strengths, and the activation function represents the thresholding behavior.
🧠 Interactive Neuron
Click the input nodes (x₁, x₂) to toggle them on/off. Watch how the neuron computes the output!
Weights and Biases: The Learning Parameters
Here's the secret to how neurons learn: weights and biases.
Weights: How Important Is Each Input?
Weights tell the neuron which inputs matter more. A big weight means "pay attention!" A small (or negative) weight means "ignore this" or "this counts against." For example, suppose a neuron decides whether to bring an umbrella:
- Weight = 10: Dark clouds (very important!)
- Weight = 2: Windy (somewhat important)
- Weight = -5: Sunny (counts against bringing umbrella)
Biases: The Threshold
The bias is like a starting point. It shifts the decision threshold up or down.
- Positive bias: Neuron fires easily (optimistic)
- Negative bias: Neuron needs strong inputs to fire (pessimistic)
During training, the AI adjusts all these weights and biases to make better predictions. That's the "learning" part!
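Putting the umbrella weights together with a bias gives a complete (if hypothetical) neuron. The bias of -3 is an invented value meaning "by default, lean toward leaving the umbrella at home":

```python
def bring_umbrella(dark_clouds, windy, sunny):
    # Inputs are 1 (yes) or 0 (no); weights from the example above
    total = 10 * dark_clouds + 2 * windy + (-5) * sunny + (-3)
    return total > 0  # "fire" (bring it) if the weighted sum beats zero

print(bring_umbrella(dark_clouds=1, windy=0, sunny=0))  # 10 - 3 = 7 → True
print(bring_umbrella(dark_clouds=0, windy=1, sunny=1))  # 2 - 5 - 3 = -6 → False
```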
Parameters and Decision Boundaries
Weight Parameters
Weights determine the relative importance of each input feature. In geometric terms, the weight vector w defines the orientation of the decision boundary:

w · x + b = 0
This equation describes a hyperplane that divides the input space into two regions. Points on one side produce positive outputs (class 1), points on the other side produce negative outputs (class 0).
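Classifying a point amounts to checking which side of the hyperplane it falls on, i.e. the sign of w · x + b. A minimal sketch in two dimensions, with hand-picked weights:

```python
def side(w, x, b):
    # Sign of w · x + b tells which side of the hyperplane x lies on
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0

w, b = [1.0, 1.0], -1.0          # boundary is the line x1 + x2 = 1
print(side(w, [2.0, 2.0], b))    # above the line → class 1
print(side(w, [0.0, 0.0], b))    # below the line → class 0
```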
Bias Parameter
The bias b translates the decision boundary away from the origin. Without bias, all hyperplanes would be constrained to pass through the origin, severely limiting the representational capacity.
Learning as Parameter Optimization
Training consists of finding optimal values for weights and biases that minimize prediction error on the training set. For a perceptron with n inputs, there are n + 1 parameters to learn (n weights + 1 bias).
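One concrete way to find these n + 1 parameters is Rosenblatt's original perceptron learning rule, sketched below on the AND function (the learning rate and epoch count here are arbitrary choices, not canonical values):

```python
def step(z):
    return 1 if z > 0 else 0

def train(data, n_inputs, lr=0.1, epochs=20):
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            y = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = target - y
            # Nudge each weight (and the bias) in the direction that reduces the error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND, 2)
print([step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in AND])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule is guaranteed to converge; for non-separable data a perceptron alone cannot succeed, which is part of why multi-layer networks matter.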
Activation Functions: Making Decisions
After adding up all the weighted inputs, the neuron needs to make a decision. That's where activation functions come in!
Think of activation functions as filters that decide how much the neuron should "fire."
Common Activation Functions
Non-linear Activation Functions
Activation functions introduce non-linearity, enabling neural networks to approximate complex functions. Without non-linear activations, deep networks would collapse to equivalent single-layer linear models.
Properties of Good Activation Functions
- Non-linear: Enable approximation of arbitrary functions
- Differentiable: Allow gradient-based optimization
- Computationally efficient: Fast to compute during training
- Well-behaved gradients: Avoid vanishing or exploding gradients
📊 Activation Function Explorer
Which Activation Should You Use?
- Step: Simple, but too harsh (used historically)
- Sigmoid: Smooth, outputs 0-1 (good for probabilities)
- ReLU: Fast, simple, works great (most common today!)
- Tanh: Similar to sigmoid, but outputs -1 to 1
Why ReLU is popular: It's fast to compute, doesn't have the "vanishing gradient" problem, and works surprisingly well in practice. Most modern neural networks use ReLU or variants of it.
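The four functions from the list can be written out directly; a minimal sketch using Python's math module:

```python
import math

def step(z):     return 1.0 if z > 0 else 0.0
def sigmoid(z):  return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)
def relu(z):     return max(0.0, z)                  # zero below 0, identity above
def tanh(z):     return math.tanh(z)                 # squashes to (-1, 1)

for f in (step, sigmoid, relu, tanh):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```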
Why ReLU Dominates
ReLU (Rectified Linear Unit) has become the default choice for hidden layers because:
- Computational efficiency: Simple max operation vs. expensive exponentials
- Gradient flow: For x > 0, gradient is 1 (no vanishing gradient)
- Sparsity: Outputs exact zero for half the inputs
- Biological plausibility: Similar to neural firing thresholds
However, ReLU has a "dying ReLU" problem—neurons can become permanently inactive. Variants like Leaky ReLU and ELU address this.
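Leaky ReLU's fix is a one-line change: keep a small slope on the negative side (α = 0.01 is a common default) so the gradient never goes completely to zero:

```python
def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # A small negative slope keeps a nonzero gradient for z < 0,
    # so the neuron can recover instead of "dying"
    return z if z > 0 else alpha * z

print(relu(-2.0), leaky_relu(-2.0))  # 0.0 -0.02
```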
From Neurons to Networks
One neuron can make simple decisions, but real intelligence requires networks of neurons. When we connect neurons together, magic happens!
Layers of Neurons
Neural networks are organized in layers:
- Input Layer: Receives the raw data (like pixel values)
- Hidden Layers: Process and transform the information
- Output Layer: Produces the final prediction
Information flows from input → hidden → output. Each layer learns to extract more complex features from the data.
Multi-Layer Perceptron Architecture
A Multi-Layer Perceptron (MLP) consists of:
- Input layer: L₀ neurons receiving feature vector x ∈ ℝⁿ
- Hidden layers: L₁, L₂, ..., Lₖ with non-linear activations
- Output layer: Lₒ producing prediction ŷ ∈ ℝᵐ
Forward Pass
For layer l with weight matrix W⁽ˡ⁾ and bias vector b⁽ˡ⁾:

z⁽ˡ⁾ = W⁽ˡ⁾a⁽ˡ⁻¹⁾ + b⁽ˡ⁾
a⁽ˡ⁾ = activation(z⁽ˡ⁾)
Where a⁽⁰⁾ = x (input) and a⁽ᴸ⁾ = ŷ (output).
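The forward pass can be sketched in plain Python. This toy version applies ReLU at every layer (a real output layer would often use a different activation) and the weights are invented for the example:

```python
def forward(x, layers):
    # layers: list of (W, b) pairs; W is a list of rows, one per output neuron
    a = x
    for W, b in layers:
        # z = W a + b, computed row by row
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = [max(0.0, zi) for zi in z]  # ReLU activation
    return a

# 2 inputs → 2 hidden neurons → 1 output, with made-up weights
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, 1.0]], [0.0]),                     # output layer
]
print(forward([1.0, 2.0], layers))  # ≈ [1.6]
```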
Universal Approximation
A feedforward network with at least one hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of ℝⁿ (Universal Approximation Theorem).
This explains why neural networks are so powerful—they can theoretically learn any pattern!
🏗️ Network Architecture Example
A simple network: 3 inputs → 4 hidden neurons → 2 outputs
Input→Hidden: 3×4 = 12 weights + 4 biases = 16 parameters
Hidden→Output: 4×2 = 8 weights + 2 biases = 10 parameters
Total: 16 + 10 = 26 learnable parameters
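The same counting rule works for any layer sizes; a tiny helper makes the arithmetic explicit:

```python
def count_params(layer_sizes):
    # Between consecutive layers: in×out weights plus out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_params([3, 4, 2]))  # 3×4+4 + 4×2+2 = 16 + 10 = 26
```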
🎮 Neural Network Playground
Now it's your turn! Below is a simple neural network that learns to classify points. The blue points are one class, orange are another. The network learns to draw a boundary between them.
Interactive Classifier
Click "Train" to watch the network learn! The background color shows what the network predicts.
What You Learned
🎓 Key Takeaways
- Artificial neurons mimic biological neurons using math
- Weights determine input importance; biases set the threshold
- Activation functions introduce non-linearity (ReLU is most popular)
- Networks connect neurons in layers to learn complex patterns
- Deep networks can theoretically learn any function!
In Level 3, we'll explore why deep networks were historically hard to train and how a breakthrough called ResNet changed everything!
Summary of Neural Network Fundamentals
- Perceptron: y = activation(w·x + b), the fundamental computation unit
- Parameters: Weights and biases learned via gradient descent
- Activation Functions: Non-linearities enabling universal approximation
- Architecture: Layer-wise composition of affine transformations and non-linearities
- Universal Approximation: Sufficiently wide networks can represent any continuous function
Next, we examine training challenges in deep architectures and the residual connection innovation that enabled modern deep learning.