Lessons
The lessons split into Fundamentals—Training and Generation, which everyone should do—and Extensions you can mix and match based on interests and available time. New to LLMs Unplugged? Do Training, then Generation, then pick whichever extensions catch your eye.
The Fundamentals (and a couple of extensions) come in two flavours with a toggle at the top of the page; both teach the same core ideas, so pick whichever suits your group.
The grid flavour uses pen, paper and dice. Students fill in a tally-mark table during training, then roll a d10 to sample from the counts during generation (a 60%/40% split between two candidate next words becomes “roll a d10: 1-6 picks the first, 7-10 picks the second”). It plugs straight into probability in the maths curriculum—if your group isn't comfortable with that style of sampling yet, run Weighted Randomness first as a warm-up.
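The d10 rule above can be sketched in a few lines of Python. This is only an illustration, not part of the lesson materials: the function names `d10_ranges` and `pick` and the words "cat" and "dog" are made up for the example, and it assumes the tallies for a word add up to exactly 10.

```python
import random

def d10_ranges(counts):
    """Give each candidate word an inclusive range of d10 faces.

    Assumption: the counts sum to 10, so every face maps to one word.
    """
    if sum(counts.values()) != 10:
        raise ValueError("counts must sum to 10 for a d10")
    ranges = {}
    low = 1
    for word, count in counts.items():
        ranges[word] = (low, low + count - 1)
        low += count
    return ranges

def pick(counts, roll):
    """Return the word whose range contains the rolled face."""
    for word, (low, high) in d10_ranges(counts).items():
        if low <= roll <= high:
            return word
    raise ValueError(f"roll {roll} is outside 1-10")

# Hypothetical 60%/40% split between two candidate next words.
counts = {"cat": 6, "dog": 4}
roll = random.randint(1, 10)  # simulate rolling the d10
print(roll, pick(counts, roll))
```

With these counts, faces 1-6 land on the first word and 7-10 on the second, matching the rule students follow by hand.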
The cutouts flavour swaps dice for a spread of printed tokens. Students cut them out, lay them on a table, and play a colour-matching game to generate text—the maths happens automatically (more cutouts means more likely), so it works for any age. The cost is prep: someone has to generate, print and cut out the tokens beforehand (the Tools page has ready-to-print packs).
Fundamentals
Training
Build a bigram language model that tracks which words follow which other words in text.
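For readers who think in code, the tally table can be pictured as a nested dictionary of counts. The sketch below is an assumption about how that might look, not the lesson's own procedure: `train_bigram` is a hypothetical name, the sample sentence is invented, and splitting on whitespace after lowercasing is a simplification of whatever text preparation the lesson uses.

```python
from collections import defaultdict

def train_bigram(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(lambda: defaultdict(int))
    # Pair every word with the word that comes right after it.
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

model = train_bigram("the cat sat on the mat")
for word, followers in model.items():
    print(word, dict(followers))
```

Each outer key plays the role of a row in the grid, and each inner count is one cell of tally marks.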
Generation
Use your hand-built bigram model to generate new text through weighted random sampling.
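Weighted random sampling from a bigram table might look like the following sketch. The `generate` function and the tiny hand-written model are hypothetical, and the choice to stop when a word has no recorded followers is an assumption for the example rather than a rule from the lesson.

```python
import random

def generate(model, start, length, rng=random):
    """Build a word sequence by repeatedly sampling a follower.

    Followers with higher counts are proportionally more likely.
    """
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:
            break  # assumption: stop if this word was never followed by anything
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return words

# Hypothetical counts: "the" was followed by "cat" 3 times and "mat" once.
model = {"the": {"cat": 3, "mat": 1}, "cat": {"sat": 1}, "sat": {"on": 1}, "on": {"the": 1}}
print(" ".join(generate(model, "the", 8)))
```

Running it several times gives different sentences from the same counts, which is the point of the Generation lesson.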
Extensions
These can be done in any order after the Fundamentals. Each one explores a different aspect of how modern language models work.
Scaling up
Pre-trained Model Generation
Use a provided pre-trained booklet to generate text without training your own model.
Trigram
Extend the bigram model to use two words of context for better predictions.
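As a rough picture of what "two words of context" means, the sketch below keys the counts on a pair of words instead of a single word. `train_trigram` and the sample sentence are illustrative assumptions, and the whitespace tokenisation is a simplification.

```python
from collections import defaultdict

def train_trigram(text):
    """Count which word follows each pair of consecutive words."""
    words = text.lower().split()
    counts = defaultdict(lambda: defaultdict(int))
    # Slide a three-word window across the text.
    for first, second, third in zip(words, words[1:], words[2:]):
        counts[(first, second)][third] += 1
    return counts

model = train_trigram("the cat sat on the mat")
for context, followers in model.items():
    print(context, dict(followers))
```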
Controlling output
Sampling
Same model, different output: experiment with temperature and truncation strategies to shape how your model picks the next word.
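One common way to express these two ideas in code is sketched below. It is an assumption about the mechanics, not the lesson's exact procedure: temperature is applied by raising each count to the power 1/T (equivalent to dividing log-counts by T), truncation is shown as a simple top-k cut, and the names `apply_temperature` and `top_k` are hypothetical.

```python
def apply_temperature(counts, temperature):
    """Turn counts into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {w: c ** (1.0 / temperature) for w, c in counts.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

def top_k(counts, k):
    """Keep only the k most frequent candidates."""
    keep = sorted(counts, key=counts.get, reverse=True)[:k]
    return {w: counts[w] for w in keep}

# Hypothetical tallies for one row of the grid.
counts = {"cat": 6, "dog": 3, "eel": 1}
print(apply_temperature(counts, 1.0))  # unchanged proportions
print(apply_temperature(counts, 0.5))  # favours the most common word more strongly
print(top_k(counts, 2))                # rare options removed before sampling
```

Lower temperatures make the output more predictable, higher ones make it more varied, and truncation removes unlikely options altogether.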
Beam Search
Explore multiple generation paths simultaneously and pick the best overall sequence.
Agentic Tool Use
Turn your model into an agent by teaching it to call external tools when it needs information it doesn't have.
Context and meaning
These two work well together: Context Columns extends the grid model, and Word Embeddings uses those extended grids to explore semantic similarity.
Context Columns
Add context columns to your bigram model to capture grammatical patterns, then use them during generation.
Word Embeddings
Turn each word's row into a vector and measure similarities between words in your model.
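To make "row as a vector" concrete, the sketch below compares two rows of counts with cosine similarity. Cosine similarity is an assumption here, since this page does not say which measure the lesson uses, and the function name and sample rows are invented for illustration.

```python
import math

def cosine_similarity(row_a, row_b):
    """Compare two rows of counts; 1.0 means same direction, 0.0 means no overlap."""
    columns = set(row_a) | set(row_b)
    dot = sum(row_a.get(c, 0) * row_b.get(c, 0) for c in columns)
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty row has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical rows: which words followed "cat", "dog" and "the".
cat = {"sat": 2, "ran": 1}
dog = {"sat": 1, "ran": 2}
the = {"cat": 3, "dog": 2}
print(cosine_similarity(cat, dog))  # high: similar followers
print(cosine_similarity(cat, the))  # 0.0: no followers in common
```

Words that are followed by similar words end up with similar rows, which is why the comparison hints at meaning.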
Model tuning
LoRA
Add a lightweight adaptation layer to retarget a trained model without retraining everything.
Synthetic Data
Generate synthetic text with your model, retrain on it, and see how patterns drift or collapse.
RLHF
Use human preferences to adjust your model's weights, making it generate text people prefer.