For educators

INFO

These resources are under active development. If there’s something you’d like to see, please get in touch.

TIP

These lesson plans have been tested with high-school students and up (including tertiary students). They’re also suitable across all subject areas (not just Computer Science/Digital Technologies).

We’re working on some slightly modified versions which work for younger learners as well—with the right support they can absolutely grasp the concepts involved. We’ll update the lesson plans as we road-test them. If you’ve got ideas or feedback, we’d love to hear them.

Lesson plan 1: LLMs Unplugged Fundamentals

  • time: 90mins
  • for ages: 12+

This core workshop covers the essential training-to-generation pipeline. Start with a brief introduction to set the scene, then move through training a model, generating text from it, and finally exploring what happens when you use a larger pre-trained model. Each step builds on the last, giving students a complete picture of how language models work.
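
If you’d like to show the same pipeline running on a computer afterwards (or hand it to coding-inclined students as an extension), a minimal Python sketch of train-then-generate might look like the following. The toy corpus and function names are ours, purely illustrative, not part of the lesson materials:

```python
import random
from collections import Counter, defaultdict

# Toy training text, standing in for whatever the class uses.
corpus = "the cat sat on the mat and the cat saw the dog on the mat".split()

# Training: tally which word follows which (a bigram table).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

# Generation: repeatedly pick a next word in proportion to its count.
def generate(start, length=10):
    word, output = start, [start]
    for _ in range(length):
        followers = counts[word]
        if not followers:
            break  # dead end: no word ever followed this one
        word = random.choices(list(followers), weights=followers.values())[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))
```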

Suggested timing

Notes for Fundamentals

  • this outline doesn’t include the Weighted randomness lesson, but if your students aren’t yet comfortable with weighted randomness then you could add it in before the Training lesson (add another 30mins)

  • once you get to the Generation lesson and beyond, get students to do “dramatic readings” as they share back the text their new language models have generated

  • if you have a bit longer, then adding the Sampling lesson at the end is a fun option—it builds on either the Generation or the Pre-trained model generation work and shows how different parts of the “LLM process” can have different effects on the output

Lesson plan 2: going deeper

  • time: 2–3 hours (or split across sessions)
  • for ages: senior high school or particularly engaged groups

For students ready to go further, this extended trajectory adds the “how models understand” topic. After covering the fundamentals, you explore how models can track grammatical context and how words get represented as numerical vectors. This path suits later-year high school students, computing electives, or keen beans who want to understand what “attention” and “embeddings” actually mean.

What these additions cover

  • Context columns extends the basic bigram model with extra columns that track grammatical categories (is the previous word a verb? a pronoun? a preposition?). This is a hand-crafted version of what transformer “attention” learns automatically—the idea that the type of context matters, not just which specific word came before. (See the first sketch below.)

  • Word embeddings turns each word’s row in the model into a numerical vector and measures similarities between words. Words that behave similarly in the training text end up close together. This is the foundation of how modern LLMs represent meaning—and students can calculate it by hand. (See the second sketch below.)
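
First sketch: one way to mirror the Context columns idea in code is to keep the ordinary bigram counts, add a second table keyed by grammatical category, and fall back to the category when the specific word hasn’t been seen. The corpus, the hand-labelled CATEGORY lookup, and the back-off rule here are our own illustrative choices, not the lesson’s exact table layout:

```python
from collections import Counter, defaultdict

# Hand-labelled categories, standing in for the lesson's extra columns.
CATEGORY = {"she": "pronoun", "he": "pronoun", "they": "pronoun",
            "runs": "verb", "sees": "verb", "to": "preposition",
            "the": "article", "dog": "noun", "park": "noun"}

corpus = "she runs to the park he sees the dog she sees the park".split()

by_word = defaultdict(Counter)      # ordinary bigram rows: specific previous word
by_category = defaultdict(Counter)  # extra rows: what *kind* of word came before
for prev, nxt in zip(corpus, corpus[1:]):
    by_word[prev][nxt] += 1
    by_category[CATEGORY.get(prev, "other")][nxt] += 1

def predict(prev):
    # Use the specific word's row if we have one; otherwise fall back to its
    # category, so the type of context still helps with unseen words.
    row = by_word[prev] or by_category[CATEGORY.get(prev, "other")]
    return row.most_common(1)

print(predict("she"))   # seen in training: uses its own row
print(predict("they"))  # never seen, but other pronouns were: -> [("sees", 2)]
```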
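
Second sketch: the Word embeddings activity in the same style. Treat each word’s row of next-word counts as a vector and compare vectors. We use cosine similarity here because it’s the standard measure; the lesson’s by-hand version may use a simpler one:

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat and the dog sat the cat ran and the dog ran".split()

# Each word's "row" = how often every other word follows it.
rows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    rows[prev][nxt] += 1

vocab = sorted(set(corpus))

def vector(word):
    return [rows[word][w] for w in vocab]  # the row as a plain list of numbers

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# "cat" and "dog" are followed by the same words, so their vectors align.
print(cosine(vector("cat"), vector("dog")))
```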

Why split the trajectory?

The fundamentals work for any audience and require only 90 minutes. The “understanding” lessons require more time and comfort with abstraction, but they connect directly to concepts students will encounter in any deeper study of AI: attention mechanisms, embeddings, vector similarity. Running them as a second session (or a follow-up for interested students) keeps the core workshop accessible while offering a clear path forward.

Lesson plan 3: controlling output

  • time: 30mins (as an add-on)
  • for ages: 14+

Once students can generate text, a natural question is: “How do you make it more or less creative?” The sampling lesson shows how temperature and truncation strategies change the character of output without changing the model itself. This is a quick add-on to either the fundamentals or the deeper trajectory.

This lesson explains:

  • temperature: how dividing counts by a temperature value flattens or sharpens the distribution, making surprising words more or less likely
  • truncation: strategies like greedy selection, no-repeat, or even haiku constraints that narrow which words are eligible before sampling

Students discover that “creativity” in AI comes from two controls: adjusting probability distributions and filtering which tokens to consider. The same model can produce cautious prose or wild poetry just by tweaking these parameters.
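
For teachers who want to see both controls in one place, here is a small Python sketch. One hedge: the paper version in the lesson may realise temperature differently (for example by scaling and rounding whole counts); below is the standard formulation, where raising each count to the power 1/T is equivalent to dividing log-counts by T. The example counts and the no-repeat rule are illustrative:

```python
import random

# One row of a model: how often each word followed "the" (illustrative numbers).
counts = {"mat": 6, "dog": 3, "moon": 1}

def sample(counts, temperature=1.0, banned=()):
    # Truncation: filter which words are even eligible before sampling.
    eligible = {w: c for w, c in counts.items() if w not in banned}
    # Temperature: c ** (1 / T) flattens the distribution when T > 1
    # (surprising words more likely) and sharpens it when T < 1
    # (the favourite dominates; T -> 0 approaches greedy selection).
    weights = [c ** (1 / temperature) for c in eligible.values()]
    return random.choices(list(eligible), weights=weights)[0]

print(sample(counts, temperature=0.2))   # cautious: almost always "mat"
print(sample(counts, temperature=3.0))   # adventurous: "moon" shows up more
print(sample(counts, banned=("mat",)))   # no-repeat style truncation
```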

Adaptation and data

For classes focused on data science, ethics, or media literacy, the “Adaptation and data” topic explores what happens when models train on their own output.

The Synthetic data lesson is particularly effective for discussions about:

  • AI-generated content flooding the internet
  • model collapse and why training data quality matters
  • the difference between human-written and AI-generated text

This works well as a standalone activity after students have done basic training and generation, or as part of a broader unit on AI ethics and media literacy.
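
If you want a concrete demonstration of the feedback loop (train, generate, retrain on the generated text), a small Python simulation like the one below shows the vocabulary shrinking across generations as rare words stop being sampled and drop out of later training sets: a miniature version of model collapse. The corpus and the vocabulary-size metric are our own illustrative choices, not part of the lesson materials:

```python
import random
from collections import Counter, defaultdict

def train(words):
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=200):
    word, output = start, [start]
    for _ in range(length):
        followers = counts[word]
        if not followers:
            break  # dead end: no recorded follower
        word = random.choices(list(followers), weights=followers.values())[0]
        output.append(word)
    return output

# Generation 0 trains on "human" text; each later one trains on model output.
text = ("the cat sat on the mat and the dog ran to the park "
        "and the cat saw the dog chase a bird near the pond").split()
for generation in range(6):
    model = train(text)
    text = generate(model, "the")
    print(f"after generation {generation}: vocabulary = {len(set(text))} words")
```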