The Foundations

Before you can love or hate AI, you have to know what it actually is. Here's the 15-minute version — neural networks, training, and the difference between AI, ML, and DL.

AI vs ML vs DL

These three terms get used interchangeably. They're not the same. Think of them as nested Russian dolls.

AI

Artificial Intelligence

The biggest box. Any technique that makes a machine seem intelligent — including hand-coded rules, search algorithms, and yes, machine learning.

ML

Machine Learning

A subset of AI. Instead of writing rules, you show the machine examples and it figures out the patterns. Everything from linear regression to random forests lives here.

DL

Deep Learning

A subset of ML using neural networks with many layers. This is what powers modern AI — image recognition, LLMs, all of it.

What's a Neural Network?

A neural network is a stack of "neurons" — simple math units — connected by weighted lines. Inputs go in on the left, predictions come out on the right.

[Figure: an animated 4-layer neural network with an input layer, two hidden layers, and an output layer]

How it actually works

  • Inputs become numbers (a pixel, a word, a stock price).
  • Each connection has a weight. Each neuron multiplies its inputs by weights, sums them, and squashes the result through an activation function.
  • The output is a prediction. At first, it's random garbage.
  • We compare the prediction to the right answer, calculate the error, and use a trick called backpropagation to nudge the weights in the direction that reduces the error.
  • Repeat ~a billion times. The network gets eerily good at the task.
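The whole loop above can be sketched for a single neuron with one weight — a toy illustration of forward pass, error, and weight nudging, not a real framework:

```python
# Minimal sketch: one neuron (pred = w * x), trained to learn y = 2x.
# Loss is squared error; "backpropagation" here collapses to the
# derivative of the loss with respect to the single weight w.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct answer)
w = 0.1      # start with a (near-)random weight: predictions are garbage
lr = 0.05    # learning rate: how big a nudge to take each step

for step in range(200):                 # "repeat many times"
    for x, target in data:
        pred = w * x                    # forward pass: make a prediction
        error = pred - target           # compare to the right answer
        grad = 2 * error * x            # d(loss)/d(w) for loss = error**2
        w -= lr * grad                  # nudge w in the error-reducing direction

print(round(w, 3))  # → 2.0 — the network "discovered" the pattern y = 2x
```

A real network does exactly this, just with billions of weights at once and the chain rule (backpropagation) routing each weight's share of the blame.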

Three Flavors of Learning

Supervised

You give the model labeled examples ("this image is a cat", "this email is spam"). It learns the mapping. The most common kind.

Unsupervised

No labels — the model finds structure on its own. Clustering customers, finding anomalies, learning embeddings.

Reinforcement

The model takes actions in an environment and gets rewards or penalties. How AlphaGo learned to play, and how LLMs get fine-tuned with RLHF.
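To make "learning a mapping from labeled examples" concrete, here is a caricature of supervised learning — a hypothetical word-counting classifier, far cruder than any real spam filter:

```python
# Toy supervised learning: tally which words appear under each label
# in the training data, then classify new text by comparing tallies.
# Purely illustrative — real spam filters use much richer features.

from collections import Counter

train = [                                   # the labeled examples
    ("win free money now", "spam"),
    ("claim your free prize", "spam"),
    ("lunch meeting at noon", "ham"),
    ("project update attached", "ham"),
]

counts = {"spam": Counter(), "ham": Counter()}
for text, label in train:
    counts[label].update(text.split())      # "learning" = counting

def classify(text):
    # Score each class by how often this text's words appeared under it.
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money prize"))   # → spam
print(classify("lunch meeting"))      # → ham
```

The key supervised ingredient is the label column: the model is never told *why* "free" is spammy, only shown examples where it was.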

Training vs Inference

Two completely different stages of an AI model's life.

Training

The slow, expensive part. You feed the model billions of examples and gradually tune billions of weights. Takes weeks, costs millions, melts GPUs.

Done once. (Or every few months.)

Inference

The fast, cheap part. The model is already trained — now you just feed in a new input and get a prediction. Milliseconds, not weeks.

Done every time you use ChatGPT.
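The asymmetry between the two stages shows up even in a one-weight toy model — a sketch under the same simplified setup as above, not real training code:

```python
# The two phases of a model's life, caricatured with one weight.
# Training: many slow passes to find w. Inference: a single multiply.

def train(data, steps=1000, lr=0.05):
    w = 0.0
    for _ in range(steps):            # the slow, expensive, repeated part
        for x, target in data:
            w -= lr * 2 * (w * x - target) * x
    return w

def infer(w, x):
    return w * x                      # the fast part: one forward pass

w = train([(1.0, 3.0), (2.0, 6.0)])  # done once (learns y = 3x)
print(round(infer(w, 10.0), 2))      # done per request → 30.0
```

At real scale the gap is the same shape, just wider: weeks of GPU time once, then milliseconds per query forever after.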

Key Terms You'll See

Parameters

The weights inside the network. GPT-4 is rumored to have on the order of a trillion (OpenAI hasn't published the number). More parameters ≈ more capacity to learn (up to a point).

Loss / Error

How wrong the model is. Training is just "minimize the loss" — repeated forever.

Gradient Descent

The optimization trick. The gradient tells you which direction reduces the loss; you take a tiny step that way. Then another. And another.
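The "tiny step downhill" idea fits in a few lines. Here it minimizes a simple made-up function, f(x) = (x − 3)², whose lowest point is at x = 3:

```python
# Gradient descent on f(x) = (x - 3)**2, minimum at x = 3.
# The gradient f'(x) = 2*(x - 3) points uphill; we step the other way.

x = 0.0      # arbitrary starting point
lr = 0.1     # step size ("learning rate")

for _ in range(100):
    grad = 2 * (x - 3)   # which direction makes f bigger?
    x -= lr * grad       # take a tiny step the opposite way

print(round(x, 4))  # → 3.0
```

Training a neural network is this exact loop, except "f" is the loss over the whole dataset and "x" is billions of weights at once.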

Overfitting

When a model memorizes its training data instead of learning real patterns. Tested in the wild, it fails. The classic ML villain.
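An extreme caricature of overfitting is a "model" that is nothing but a lookup table — perfect on its training data, helpless on anything new:

```python
# Overfitting, taken to its logical extreme: pure memorization.
# Zero training error, zero generalization.

train = {1: 2, 2: 4, 3: 6}       # memorized (input, answer) pairs

def memorizer(x):
    return train.get(x)          # no pattern learned, just recall

print(memorizer(2))   # → 4     (seen in training: "perfect")
print(memorizer(5))   # → None  (never seen: fails in the wild)
```

A model that had learned the real pattern (y = 2x) would answer 10 for the unseen input 5; the memorizer can't.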

Epochs & Batches

An epoch is one full pass over the training data. A batch is a small chunk processed at once. You train for many epochs, batch by batch.
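The epoch/batch bookkeeping is just two nested loops — a minimal sketch with a stand-in dataset:

```python
# Epochs and batches, concretely: 10 examples with batch size 4
# gives 3 batches per epoch (the last one is smaller).

data = list(range(10))     # stand-in for 10 training examples
batch_size = 4
epochs = 2

batches_seen = []
for epoch in range(epochs):               # one epoch = one full pass
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]    # one chunk processed at once
        batches_seen.append((epoch, batch))

print(len(batches_seen))  # → 6 (3 batches per epoch × 2 epochs)
```

In real training you'd also shuffle the data each epoch and run a weight update per batch; the loop structure stays the same.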

Tensor

A multi-dimensional array of numbers. The fundamental data structure of deep learning. (PyTorch and TensorFlow are named after them.)
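You can see what "multi-dimensional" means with plain nested lists — the `shape` helper below is a hypothetical stand-in for what frameworks like PyTorch provide built-in:

```python
# A tensor is a nested array; its "shape" is the length along each axis.
# Hypothetical shape() helper for nested Python lists (assumes
# rectangular nesting, as real tensors require).

def shape(t):
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

vector = [1, 2, 3]                  # 1-D tensor, shape (3,)
matrix = [[1, 2], [3, 4]]           # 2-D tensor, shape (2, 2)
cube = [[[0] * 4] * 3] * 2          # 3-D tensor, shape (2, 3, 4)

print(shape(vector), shape(matrix), shape(cube))
```

A color image is a 3-D tensor (height × width × channels); a batch of them is 4-D. Deep learning is mostly multiplying these things together very fast.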
