#Generation

Choose your method: This lesson can be done with either a grid (paper and dice) or buckets (physical tokens). Choose which suits your materials.

Use a pre-trained (hand-built) bigram model to generate new text through weighted random sampling.

A bigram model predicts the next word based on one previous word. This is what you build in the fundamental lessons: each row of your grid represents what can follow a single word. Weighted random sampling means choosing the next token with probability proportional to its frequency; your dice rolls implement this, since words with higher counts are more likely to be selected.

Hero image: Grid Generation

Hero image: Bucket Generation

#You will need

  • your completed bigram model from Training (the grid or the buckets, depending on your method)
  • a d10 (or similar) for weighted sampling, if using the grid method
  • pen and paper for writing down the generated text

#Your goal

Generate new text from your bigram language model. Stretch goal: keep going and write a whole story.

#Key idea

A language model proposes several possible next words along with how likely each is. Dice rolls pick among those options, and repeating the process word by word yields fluent text.

A language model proposes several possible next words. In the bucket model, each bucket contains all the tokens that could come next—and some tokens appear multiple times, making them more likely to be picked. Choosing randomly from a bucket and repeating the process word by word creates new text.

#Algorithm

Grid method:

  1. Choose a starting word from the first column of your grid.
  2. Look at that word’s row to find all possible next words and their counts.
  3. Roll dice weighted by the counts (see Weighted Randomness).
  4. Write down the chosen word and make it your new starting word.
  5. Repeat from step 2 until you hit a natural stopping point (e.g. a full stop) or your desired length.

Bucket method:

  1. Choose a starting bucket and write down its label—this is the first word of your generated text.
  2. Close your eyes and pick a random token from inside that bucket.
  3. Write down the token you picked.
  4. Put the token back in the bucket (so you can use it again later).
  5. Find the bucket whose label matches the token you just picked.
  6. Repeat from step 2 until you reach a stopping point (e.g. an empty bucket or your desired length).
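
The grid steps above can be sketched in a few lines of Python. The counts in `model` are illustrative (they follow the worked example in the next section, and the rows for some words are omitted); `random.choices` plays the role of the weighted dice roll:

```python
import random

# Each row of the grid becomes a dict of next-word tally counts
# (illustrative counts; words without a row end the generation).
model = {
    "see":  {"spot": 2},
    "spot": {"run": 1, "jump": 1, ",": 2},
    "run":  {".": 2, ",": 1},
    ".":    {"see": 1, "run": 1, "jump": 1},
}

def generate(model, start, length=10):
    word, output = start, [start]
    for _ in range(length - 1):
        row = model.get(word)
        if not row:  # the word has no row: a natural stopping point
            break
        # weighted random sampling: probability proportional to tally count
        word = random.choices(list(row), weights=list(row.values()))[0]
        output.append(word)
    return " ".join(output)

print(generate(model, "see"))
```

Running it several times gives different text, for the same reason repeated dice rolls do.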

#Example

Before you try generating text yourself, work through this example to see the algorithm in action.

Using the same bigram model from the example in Training:

|      | see | spot | run | . | jump | , |
|------|-----|------|-----|---|------|---|
| see  |     | \|\| |     |      |     |      |
| spot |     |      | \|  |      | \|  | \|\| |
| run  |     |      |     | \|\| |     | \|   |
| .    | \|  |      | \|  |      | \|  |      |
| jump |     |      |     | \|\| |     | \|   |
| ,    |     | \|\|\| | \| |     |     |      |
  • choose (for example) see as your starting word
  • see (row) → spot (column); it’s the only option, so write down spot as the next word
  • spot → run (25%), jump (25%) or , (50%); roll dice to choose
  • let’s say the dice pick run; write it down
  • run → . (67%) or , (33%); roll dice to choose
  • let’s say the dice pick .; write it down
  • . → see (33%), run (33%) or jump (33%); roll dice to choose
  • let’s say the dice pick see; write it down
  • see → spot; it’s the only option, so write down spot… and so on

After the above steps, the generated text is “see spot run. see spot”
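
The percentages in this walkthrough come straight from the tally counts: each option’s probability is its count divided by the row total. A quick sketch, using the counts from spot’s row in the example:

```python
def row_probabilities(row):
    """Convert a row of tally counts into probabilities (count / row total)."""
    total = sum(row.values())
    return {word: count / total for word, count in row.items()}

# spot's row from the example: run |, jump |, "," ||
spot_row = {"run": 1, "jump": 1, ",": 2}
print(row_probabilities(spot_row))  # {'run': 0.25, 'jump': 0.25, ',': 0.5}
```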

Using the bucket model from the example in Training:

| Bucket label | Tokens inside |
|--------------|---------------|
| see  | spot, spot |
| spot | run, jump |
| run  | . |
| .    | see |
| jump | . |

Generation sequence:

  1. Choose see as starting bucket—write down “see”
  2. Pick randomly from the “see” bucket—both tokens are spot, so we get spot—write it down
  3. Move to the “spot” bucket—pick randomly between run and jump
  4. Let’s say we pick run—write it down
  5. Move to the “run” bucket—only . is inside, so we get .—write it down
  6. Move to the “.” bucket—only see is inside, so we get see—write it down
  7. Move to the “see” bucket—pick spot again—write it down
  8. Move to the “spot” bucket—this time let’s say we pick jump—write it down
  9. Move to the “jump” bucket—only . is inside—write it down
  10. The “.” bucket still has a token, so we could continue, or stop here

Generated text: “see spot run. see spot jump.”

Notice how the randomness comes from physically picking tokens without looking. Buckets with more tokens of the same type are more likely to produce that token—the “see” bucket always produces spot because that’s all it contains.
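
The bucket steps can be sketched the same way: each bucket is just a list, duplicates and all, so a plain unweighted pick from the list gives the weighted behaviour automatically. A minimal sketch using the example buckets above:

```python
import random

# Each bucket is a list of tokens; duplicates make a token more likely.
buckets = {
    "see":  ["spot", "spot"],
    "spot": ["run", "jump"],
    "run":  ["."],
    ".":    ["see"],
    "jump": ["."],
}

def generate(buckets, start, length=8):
    word, output = start, [start]
    for _ in range(length - 1):
        bucket = buckets.get(word)
        if not bucket:  # no bucket with this label: stop
            break
        # random.choice doesn't remove the token, so it goes "back
        # in the bucket" automatically
        word = random.choice(bucket)
        output.append(word)
    return " ".join(output)

print(generate(buckets, "see"))
```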

#Instructor notes

#Discussion questions

Grid method:

  • how does the starting word affect your generated text?
  • why does the text sometimes get stuck in loops?
  • if this is a bigram (i.e. 2-gram) model, how would a unigram (1-gram) model work?
  • how could you make generation less repetitive?
  • does the generated text capture the style of your training text?

Bucket method:

  • how does the starting bucket affect your generated text?
  • why might the text sometimes get stuck repeating the same pattern?
  • what happens when a bucket only has one token inside?
  • why do we put the token back after picking it?
  • does the generated text sound like the original training text?

#Troubleshooting

  • “Every row only has one tally mark—there’s nothing to roll for.” If the group didn’t get very far in Training and no row has more than one tally mark, the generation algorithm won’t be very interesting—there will only ever be one option for the next word and they’ll be stuck on rails. In this case, either encourage them to go back and do a bit more training, or just have them add some extra tally marks to the grid wherever they like. This isn’t as much like cheating as it might seem—it’s really just an example of using Synthetic Data.
  • “I landed on a word that doesn’t have its own row.” This can happen if a word only ever appeared as the last word in the training text—it has a column (other words lead to it) but no row (it never leads to anything). Just pick any other word that does have a row and continue from there.
  • “We’re stuck in a loop.” With small models it’s common to bounce between two words that only point at each other (e.g. , → spot → , → spot → …). Try picking a different starting word, or just choose any other valid next word to break out of the cycle.
  • “Every bucket only has one token—there’s nothing random about this.” If the group didn’t get very far in Training and each bucket only has one token, the generation process won’t feel random at all—there will only ever be one option. Either encourage them to go back and do a bit more training, or have them add extra tokens to buckets wherever they like. This isn’t as much like cheating as it might seem—it’s really just an example of using Synthetic Data.
  • “I picked a word that doesn’t have its own bucket.” This can happen if a word only ever appeared as the last word in the training text—it ended up inside another bucket but never got a bucket of its own. Just pick any bucket that does exist and continue from there.
  • “We keep going back and forth between the same two words.” With small models it’s common to get stuck in a loop where two buckets keep sending you to each other. Try starting from a different bucket, or just pick any other token to break the cycle.

#Connection to current LLMs

This generation process is identical to how current LLMs produce text:

  • sequential generation: both generate one word at a time
  • probabilistic sampling: both use weighted random selection (exactly like your dice or tokens)
  • probability distribution: neural network outputs probabilities for all 50,000+ possible next tokens
  • no planning: neither looks ahead—just picks the next word
  • variability: same prompt can produce different outputs due to randomness

The remarkable fact: sophisticated AI responses emerge from this simple process repeated thousands of times. Your paper model demonstrates that language generation is fundamentally about sampling from learned probability distributions. The randomness is why LLMs give different responses to the same prompt and why language models can be creative rather than repetitive. These physical sampling methods demonstrate the exact mathematical operation happening billions of times per second inside modern language models.

Note: in AI/ML more broadly, this process of using a trained model to produce outputs is commonly called “inference”—you may encounter this term in other contexts. In these teaching resources we use “generation” specifically because it more clearly describes what language models do: they generate text.

#Comparison to dice method

This bucket method and the dice method produce equivalent results:

  • dice rolls with weighted probabilities select from options based on counts
  • bucket picking selects from options where counts are represented by multiple physical tokens
  • buckets make the probability tangible—a bucket with three spot tokens and one run token gives spot a 75% chance, just like weighted dice would

The bucket method avoids the need to calculate percentages or understand dice mechanics, making it more accessible for younger learners.
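
The equivalence is easy to check empirically: drawing many samples with each method from the bucket described above (three spot tokens, one run token) gives roughly the same 75/25 split. A small sketch:

```python
import random
from collections import Counter

random.seed(0)
N = 10_000

bucket = ["spot", "spot", "spot", "run"]   # bucket method: counts as duplicate tokens
counts = {"spot": 3, "run": 1}             # dice method: counts as explicit weights

bucket_picks = Counter(random.choice(bucket) for _ in range(N))
dice_picks = Counter(random.choices(list(counts), weights=list(counts.values()), k=N))

# Both should land near spot 0.75 / run 0.25.
print({w: c / N for w, c in bucket_picks.items()})
print({w: c / N for w, c in dice_picks.items()})
```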

#Interactive widget

Step through the generation process at your own pace. Click on a row to select a starting word, then press Play or Step to watch the dice roll and text being generated. You can also edit the training text to create your own model.


Step through the generation process at your own pace. Click on a bucket to select a starting word, then press Play or Step to watch tokens being picked randomly and text being generated. You can also edit the training text to create your own model.
