Before you can love or hate AI, you have to know what it actually is. Here's the 15-minute version — neural networks, training, and the difference between AI, ML, and DL.
These three terms get used interchangeably. They're not the same. Think of them as nested Russian dolls.
AI is the outermost doll: any technique that makes a machine seem intelligent, including hand-coded rules, search algorithms, and yes, machine learning.
ML is a subset of AI. Instead of writing rules, you show the machine examples and it figures out the patterns. Everything from linear regression to random forests lives here.
DL is a subset of ML that uses neural networks with many layers. This is what powers modern AI: image recognition, LLMs, all of it.
A neural network is a stack of "neurons" (simple math units) connected by weighted lines. Inputs go in on the left; predictions come out on the right.
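Under the hood, each neuron is tiny arithmetic: multiply inputs by weights, add a bias, squash the result. A minimal sketch in plain Python, with made-up weights (a real network learns these during training):

```python
import math

def neuron(inputs, weights, bias):
    """One 'neuron': weighted sum of inputs plus bias, squashed by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid keeps the output between 0 and 1

# Hypothetical weights and inputs, purely for illustration.
print(neuron([0.5, 0.8], [0.4, -0.2], 0.1))  # a number between 0 and 1
```

Stack a few layers of these, and the weighted lines between them are exactly the "weights" the rest of this piece keeps mentioning.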
Supervised learning: you give the model labeled examples ("this image is a cat", "this email is spam") and it learns the mapping from input to label. The most common kind.
Unsupervised learning: no labels, so the model finds structure on its own. Clustering customers, finding anomalies, learning embeddings.
Reinforcement learning: the model takes actions in an environment and gets rewards or penalties. It's how AlphaGo learned to play Go, and how LLMs get fine-tuned with RLHF.
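Supervised learning in miniature: a 1-nearest-neighbor "spam filter" that labels a new email by copying the label of the most similar training example. The features and numbers here are invented for illustration:

```python
def nearest_neighbor(labeled, point):
    """Predict a label by finding the closest labeled example (squared distance)."""
    closest = min(labeled, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], point)))
    return closest[1]

# Toy labeled data: (features, label), features = [caps_ratio, link_count].
examples = [([0.9, 5], "spam"), ([0.1, 0], "ham"),
            ([0.8, 3], "spam"), ([0.05, 1], "ham")]

print(nearest_neighbor(examples, [0.7, 4]))  # lands nearest the spam examples
```

Real classifiers are fancier, but the shape is the same: labeled examples in, a predicted label out.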
Training and inference are two completely different stages of an AI model's life.
Training is the slow, expensive part. You feed the model billions of examples and gradually tune billions of weights. Takes weeks, costs millions, melts GPUs.
Done once. (Or every few months.)
Inference is the fast, cheap part. The model is already trained; now you just feed in a new input and get a prediction. Milliseconds, not weeks.
Done every time you use ChatGPT.
Parameters are the weights inside the network. GPT-4 reportedly has on the order of a trillion. More parameters ≈ more capacity to learn (up to a point).
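Where do those counts come from? One fully connected layer mapping n_in inputs to n_out outputs has a weight per connection plus a bias per output. The layer sizes below are hypothetical:

```python
def layer_params(n_in, n_out):
    """Parameter count of one dense layer: weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

# e.g. a small image layer: 784 pixel inputs -> 128 hidden units.
print(layer_params(784, 128))  # 100480 parameters in this one layer
```

Stack a few hundred layers like this, widen them, and the totals climb into the billions fast.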
The loss measures how wrong the model is. Training is just "minimize the loss", repeated forever.
Gradient descent is the optimization trick. The gradient tells you which direction reduces the loss; you take a tiny step that way. Then another. And another.
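Both ideas fit in a dozen lines. A toy sketch: fit y = w * x to made-up data by repeatedly stepping w against the gradient of the squared loss.

```python
# Invented toy data following y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0     # start with a bad guess for the single weight
lr = 0.01   # learning rate: the size of each tiny step
for step in range(500):
    # Loss is the mean of (w*x - y)^2; its gradient w.r.t. w is the
    # mean of 2 * (w*x - y) * x.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step in the direction that reduces the loss

print(round(w, 3))  # converges to 2.0
```

Real training does exactly this, just with billions of weights instead of one, and a GPU doing the arithmetic.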
Overfitting is when a model memorizes its training data instead of learning real patterns. Tested in the wild, it fails. The classic ML villain.
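An exaggerated sketch of the failure mode: a "model" that just memorizes its training pairs is perfect on data it has seen and useless on everything else. (The data here is invented; the underlying pattern is y = 2x.)

```python
# "Training data" the model will memorize rather than learn from.
train = {1: 2, 2: 4, 3: 6}  # hidden pattern: y = 2x

def memorizer(x):
    return train.get(x)  # pure lookup: no pattern was ever learned

print(memorizer(2))   # 4 -- flawless on the training data
print(memorizer(10))  # None -- fails on unseen input (true answer: 20)
```

Real overfitting is subtler, but this is the core: zero training error can coexist with total failure on new data.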
An epoch is one full pass over the training data. A batch is a small chunk processed at once. You train for many epochs, batch by batch.
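The loop structure, sketched with placeholder data standing in for real examples:

```python
data = list(range(10))   # pretend these are 10 training examples
batch_size = 4

for epoch in range(3):   # 3 epochs = 3 full passes over the data
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]  # one chunk processed together
        # ...forward pass, loss, gradient step would go here...
        print(epoch, batch)
```

With 10 examples and a batch size of 4, each epoch is 3 batches (the last one short), and the whole run is epochs × batches gradient steps.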
A tensor is a multi-dimensional array of numbers, the fundamental data structure of deep learning. (PyTorch and TensorFlow are named after them.)
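With plain nested lists (numbers invented), the idea looks like this; libraries like PyTorch store the same thing far more efficiently:

```python
scalar = 3.0                       # 0-D tensor, shape ()
vector = [1.0, 2.0, 3.0]           # 1-D tensor, shape (3,)
matrix = [[1, 2], [3, 4], [5, 6]]  # 2-D tensor, shape (3, 2)

def shape(t):
    """Infer the shape of a nested-list 'tensor' (assumes it's rectangular)."""
    s = []
    while isinstance(t, list):
        s.append(len(t))  # record the length of this dimension
        t = t[0]          # descend into the next dimension
    return tuple(s)

print(shape(matrix))  # (3, 2)
```

An image batch in a real pipeline might be a 4-D tensor, e.g. shape (batch, height, width, channels); same concept, more dimensions.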