#Training

Choose your method: This lesson can be done with either a grid (paper and dice) or buckets (physical tokens). Choose which suits your materials.

Build a bigram model (a model that predicts the next word based on one previous word) that tracks which words follow which other words in text.

Hero image: Grid Training

Hero image: Bucket Training

#You will need

For the grid method:

  • some text (e.g. a few pages from a kids' book, but it can be anything)
  • pen, pencil, and grid paper

For the bucket method:

  • some text (e.g. a few pages from a kids' book, but it can be anything)
  • a printed or handwritten copy of that text
  • scissors
  • small containers for buckets (cups, bowls, envelopes, or just labelled areas on a table)
  • pen and sticky notes or paper for bucket labels

#Your goal

Grid method: produce a grid that captures the patterns in your input text. This grid is your bigram language model. Stretch goal: keep training your model on more input text.

Bucket method: build a collection of labelled buckets containing tokens from your text. Each bucket holds the words that can follow its label. This collection of buckets is your bigram language model.

#Key idea

Language models learn by counting patterns in text. Training means building a model that tracks which words follow other words.

  • Grid method: you capture the “following” relationship by filling out the grid, one tally mark per observed word pair.
  • Bucket method: the “following” relationship is captured physically; each bucket contains the tokens that appeared after its label in the original text.

#Algorithm

Grid method:

  1. Preprocess your text
    • convert everything to lowercase
    • treat words, commas, and full stops as separate “words” (ignore other punctuation and whitespace)
  2. Set up your grid
    • take the first word from your text
    • write it in both the first row header and first column header of your grid
  3. Fill in the grid one word pair at a time
    • find the row for the first word (in your training text) and the column for the second word
    • add a tally mark in that cell (if the word isn’t in the grid yet, add a new row and column for it)
    • shift along by one word (so the second word becomes your “first” word) and repeat until you’ve gone through the entire text
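For instructors who want a computational analogue, the grid steps above can be sketched in a few lines of Python, with the grid as a dictionary of tally counts. The function and variable names here are illustrative, not part of the lesson:

```python
import re
from collections import defaultdict

def tokenise(text):
    # lowercase everything; words, commas, and full stops become separate "words"
    return re.findall(r"[a-z']+|[,.]", text.lower())

def train_grid(text):
    # grid[row][column] = number of tally marks in that cell
    grid = defaultdict(lambda: defaultdict(int))
    tokens = tokenise(text)
    for first, second in zip(tokens, tokens[1:]):
        grid[first][second] += 1  # add one tally mark, then shift along by one word
    return grid

grid = train_grid("See Spot run. See Spot jump.")
# grid["see"]["spot"] == 2, because "spot" followed "see" twice
```

Using a `defaultdict` plays the role of "add a new row and column the first time you see a word": rows and columns spring into existence automatically.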
Bucket method:

  1. Prepare your tokens
    • print or write out your training text
    • convert everything to lowercase
    • treat words, commas, and full stops as separate tokens (ignore other punctuation and whitespace)
    • cut the text into individual tokens with scissors, keeping them in order
  2. Build the model one token at a time, starting with the first
    • if this token doesn’t have a bucket yet, create one and label it with this word
    • take the next token from your pile and put it into the current token’s bucket
    • now apply the same process to that next token (create its bucket if needed)
    • repeat until all tokens are in buckets
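The bucket steps have an equally short Python sketch: a dictionary mapping each bucket label to the list of tokens inside it (again, the names are illustrative, not part of the lesson):

```python
import re

def tokenise(text):
    # lowercase; words, commas, and full stops become separate tokens
    return re.findall(r"[a-z']+|[,.]", text.lower())

def train_buckets(text):
    buckets = {}  # bucket label -> list of tokens inside that bucket
    tokens = tokenise(text)
    for current, nxt in zip(tokens, tokens[1:]):
        # create the bucket the first time we see this word,
        # then drop the next token into it
        buckets.setdefault(current, []).append(nxt)
    return buckets

buckets = train_buckets("See Spot run. See Spot jump.")
# buckets["see"] == ["spot", "spot"]
```

Note that duplicates are kept: a token that follows the same word twice goes into the bucket twice, just like two physical slips of paper.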

#Example

Before you try training a model yourself, work through this example to see the algorithm in action.

Grid method:

Original text: “See Spot run. See Spot jump. Run, Spot, run. Jump, Spot, jump.”

Preprocessed text: see spot run . see spot jump . run , spot , run . jump , spot , jump .

After the first two words (see spot) the model looks like:

      see   spot  run   .     jump  ,
see         |
spot
run
.
jump
,

After the full text the model looks like:

      see   spot  run   .     jump  ,
see         ||
spot              |           |     ||
run                     ||          |
.     |           |           |
jump                    ||          |
,           ||    |           |
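If you want to double-check your tallies, a short script can count the same word pairs. This is a sketch for instructors, not part of the paper activity:

```python
import re
from collections import Counter

text = "See Spot run. See Spot jump. Run, Spot, run. Jump, Spot, jump."
# lowercase; words, commas, and full stops become separate tokens
tokens = re.findall(r"[a-z']+|[,.]", text.lower())

# count each (first word, second word) pair -- one count per tally mark
counts = Counter(zip(tokens, tokens[1:]))
print(counts[("see", "spot")])  # prints 2: row "see", column "spot"
```

The 20 tokens produce 19 pairs, which matches the 19 tally marks in the finished grid.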

Bucket method:

Original text: “See Spot run. See Spot jump.”

After preparing tokens, you have these pieces of paper in order: see spot run . see spot jump .

Step by step:

  1. First token is see—create a bucket labelled “see”
  2. Next token is spot—put it in the “see” bucket
  3. Current token is now spot—create a bucket labelled “spot”
  4. Next token is run—put it in the “spot” bucket
  5. Current token is now run—create a bucket labelled “run”
  6. Next token is .—put it in the “run” bucket
  7. Current token is now .—create a bucket labelled “.”
  8. Next token is see—put it in the “.” bucket
  9. Current token is now see—bucket already exists
  10. Next token is spot—put it in the “see” bucket
  11. Current token is now spot—bucket already exists
  12. Next token is jump—put it in the “spot” bucket
  13. Current token is now jump—create a bucket labelled “jump”
  14. Next token is .—put it in the “jump” bucket
  15. No more tokens—training complete!

Final model (bucket contents):

Bucket label   Tokens inside
see            spot, spot
spot           run, jump
run            .
.              see
jump           .

Notice that the “see” bucket contains two spot tokens because “spot” followed “see” twice in the original text. This captures the same information as a grid with tally marks, but in a physical form you can touch and manipulate.

#Instructor notes

#Discussion questions

Grid method:

  • what can you tell about the input text by looking at the filled-out bigram model grid?
  • how does including punctuation as “words” help with sentence structure?
  • are there any other ways you could have written down this exact same model?
  • how could you use this model to generate new text in the style of your input/training data?

Bucket method:

  • what can you tell about the input text by looking at what’s in each bucket?
  • why does the “see” bucket have two tokens while “run” only has one?
  • how does including punctuation as separate tokens help capture sentence structure?
  • what would happen if you trained on more text—how would the buckets change?
  • how could you use these buckets to generate new text in the style of your training data?
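For the generation questions, one possible answer in code: generating text is just repeated weighted sampling from the buckets. A minimal sketch, assuming a buckets dictionary like the one from the worked example (the `generate` helper is hypothetical, not part of the lesson):

```python
import random

def generate(buckets, start, length=10):
    # repeatedly draw a random token from the current word's bucket;
    # duplicate tokens make common followers proportionally more likely
    word, output = start, [start]
    for _ in range(length):
        if not buckets.get(word):
            break  # dead end: this word never had a follower in training
        word = random.choice(buckets[word])
        output.append(word)
    return " ".join(output)

# bucket model from the worked example
buckets = {"see": ["spot", "spot"], "spot": ["run", "jump"],
           "run": ["."], ".": ["see"], "jump": ["."]}
print(generate(buckets, "see"))
```

With paper buckets, the same thing is done by closing your eyes and drawing one token at random from the current word's bucket.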

#Troubleshooting

  • “Do I add a new row/column for every word?” No—each new word only gets a new row and column the first time you see it. After that, just find the existing row and column and add a tally mark.
  • “Do I make a new bucket every time?” No—each word only gets a new bucket the first time you see it. After that, just find the existing bucket and put the next token into it.

#Connection to current LLMs

This counting process is exactly what happens during the “training” phase of language models:

  • training data: your few pages of text vs trillions of words from the internet
  • learning/training process: hand counting vs automated counting by computers
  • storage: your paper model vs billions of parameters in memory

The key insight: “training” a language model means counting patterns in text. Your hand-built model contains the same type of information that current LLMs store—at a vastly smaller scale.

#Comparison to grid method

This bucket method and the grid method produce equivalent models:

  • a tally mark in row X, column Y of the grid corresponds to one token Y inside bucket X
  • both capture the same “what follows what” relationships
  • buckets make the weighting more tangible—you can see and feel that some outcomes are more likely because there are literally more tokens to pick from
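This equivalence is easy to verify in code: counting each bucket's contents reproduces the grid's rows of tally counts (a sketch with illustrative names):

```python
from collections import Counter

# the bucket model from the worked example
buckets = {"see": ["spot", "spot"], "spot": ["run", "jump"],
           "run": ["."], ".": ["see"], "jump": ["."]}

# counting each bucket's contents gives exactly one grid row per bucket
grid = {label: Counter(contents) for label, contents in buckets.items()}
print(grid["see"]["spot"])  # prints 2: two "spot" tokens = two tally marks
```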

#Interactive widget

Grid method: step through the training process at your own pace. Enter your own text or use the example, then press Play or Step to watch the model being built.


Bucket method: step through the training process at your own pace. Enter your own text or use the example, then press Play or Step to watch the buckets being filled.
