LLMs Unplugged

Understand AI by building it yourself

What is this about?

you’ll build your own language model from scratch, with just a kids’ book, pen & paper, and some dice

you’ll learn how language models work by spotting patterns in text to generate new text

llmsunplugged.org

Training

The recipe

walk through your text and tally up which tokens follow which in a grid

Example

“Run, Spot, run. See Spot run.”

after tidying up:

run , spot , run . see spot run .
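A minimal sketch of the tidying step in Python, assuming the only rules are: lowercase everything, keep runs of letters as words, and split `,` and `.` off as tokens of their own. The function name `tidy` is ours, not the booklet's.

```python
import re

def tidy(text: str) -> list:
    """Lowercase the text and split it into word and punctuation tokens."""
    # Assumption: only commas and full stops count as punctuation tokens;
    # every other non-letter character (spaces, the curly quotes) is dropped.
    return re.findall(r"[a-z]+|[,.]", text.lower())

tokens = tidy("“Run, Spot, run. See Spot run.”")
print(" ".join(tokens))  # prints: run , spot , run . see spot run .
```

A real tokenizer is far more elaborate (it splits words into sub-word pieces), but the job is the same: turn raw text into a list of tokens.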

The empty grid

run , spot , run . see spot run .

Token	`run`	`,`	`spot`	`.`	`see`
`run`
`,`
`spot`
`.`
`see`

Training: `run` → `,`

run , spot , run . see spot run .

Token	`run`	`,`	`spot`	`.`	`see`
`run`		\|
`,`
`spot`
`.`
`see`

Training: `,` → `spot`

run , spot , run . see spot run .

Token	`,`	`spot`
`run`	\|
`,`		\|
`spot`
`.`
`see`

Training: `spot` → `,`

run , spot , run . see spot run .

Token	`,`	`spot`
`run`	\|
`,`		\|
`spot`	\|
`.`
`see`

`,` is already in our vocabulary, so no new column is needed!

Training: `,` → `run`

run , spot , run . see spot run .

Token	`run`	`,`	`spot`
`run`		\|
`,`	\|		\|
`spot`		\|
`.`
`see`

Training: `run` → `.`

run , spot , run . see spot run .

Token	`run`	`,`	`spot`	`.`
`run`		\|		\|
`,`	\|		\|
`spot`		\|
`.`
`see`

`.` is a new token: punctuation gets its own row and column too

Complete model

run , spot , run . see spot run .

Token	`run`	`,`	`spot`	`.`	`see`
`run`		\|		\|\|
`,`	\|		\|
`spot`	\|	\|
`.`					\|
`see`			\|

with the rest of the text tallied the same way, the grid is complete. `run` → `.` came up twice, so its cell has two marks; those counts are what make some next tokens more likely than others.
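The tallying above, written as a short Python sketch. The function name `train` and the nested-dictionary layout are our own choices: a dictionary of dictionaries stands in for the paper grid, and empty cells are simply missing.

```python
from collections import defaultdict

def train(tokens):
    """Tally which token follows which: grid[current][next] is the tally count."""
    grid = defaultdict(dict)  # a row appears the first time a token is seen
    # Slide along the text one adjacent pair (current, next) at a time.
    for current, nxt in zip(tokens, tokens[1:]):
        grid[current][nxt] = grid[current].get(nxt, 0) + 1
    return grid

grid = train("run , spot , run . see spot run .".split())
for token, row in grid.items():
    print(token, row)
```

Each printed row matches the finished grid: `run` has one tally for `,` and two for `.`, while every other row has single tallies.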

Training

10:00

The language of language models

model
token
vocabulary
training

Generation

The recipe

use your grid to generate new text, rolling dice to choose each next token

Generation: start with `see`

see spot

one option — no roll needed

Token	`run`	`,`	`spot`	`.`	`see`
`run`		\|		\|\|
`,`	\|		\|
`spot`	\|	\|
`.`					\|
`see`			\|

Generation: from `spot`

see spot

2 options — roll the die!

How the die chooses: `spot`

spot → ? roll a d10

0 1 2 3 4 → `,` (1 tally)

5 6 7 8 9 → `run` (1 tally)

equal tallies → equal chances

rolled 2 → ,

Generation: `spot` → `,`

see spot ,

rolled 2 → ,

Generation: from `,`

see spot ,

2 options — roll the die!

Generation: `,` → `run`

see spot , run

rolled 7 → run

Generation: from `run`

see spot , run

2 options — roll the die!

How the die chooses: `run`

run → ? roll a d10

0 1 2 3 4 5 6 → `.` (2 tallies)

7 8 9 → `,` (1 tally)

more tallies → more faces → more likely (7 faces to 3 is as close as a d10 gets to the 2 : 1 tally)

rolled 3 → .
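One way to turn a row of tallies into d10 faces, as a Python sketch. The slides don't spell out a rule, so this assumes one: share the 10 faces in proportion to the tallies, round with the largest-remainder method, and list the option with the most tallies first (ties keep the order first seen in training). Under those assumptions it reproduces both splits above. `faces_for` and `pick` are hypothetical helper names.

```python
def faces_for(row, sides=10):
    """Split the faces 0..sides-1 between next tokens in proportion to their tallies."""
    total = sum(row.values())
    # Most tallies first; sorted() is stable, so ties keep the order first seen.
    options = sorted(row.items(), key=lambda item: -item[1])
    # Whole faces each option has earned, plus its leftover (as a remainder).
    shares = [(tok, sides * count // total, sides * count % total) for tok, count in options]
    spare = sides - sum(whole for _, whole, _ in shares)
    # Hand any spare faces to the options with the biggest leftovers.
    bonus = sorted(range(len(shares)), key=lambda i: -shares[i][2])[:spare]
    result, start = [], 0
    for i, (tok, whole, _) in enumerate(shares):
        n = whole + (1 if i in bonus else 0)
        result.append((tok, range(start, start + n)))
        start += n
    return result

def pick(row, roll):
    """Return the next token whose faces include the rolled number."""
    for tok, faces in faces_for(row):
        if roll in faces:
            return tok
    raise ValueError(f"roll {roll} is not a face of the die")

print(faces_for({",": 1, ".": 2}))  # [('.', range(0, 7)), (',', range(7, 10))]
print(pick({",": 1, "run": 1}, 2))  # ,
```

Largest-remainder rounding is a design choice: it always hands out exactly 10 faces, and no option ends up more than one face away from its exact share.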

Generation: `run` → `.`

see spot , run .

rolled 3 → .

Generation: from `.`

see spot , run . see

one option — no roll needed

Generation: back to `see`

see spot , run . see

one option — no roll needed

Generated text

“see spot, run. see”

a new sentence — not in the training data!
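The whole generation walk-through as a self-contained Python sketch. It assumes a d10 whose faces are shared in proportion to the tallies (largest-remainder rounding, most tallies first, ties in first-seen order); that rule is our guess, chosen because it reproduces the rolls on the slides. Feeding in those rolls (2, 7, 3) regenerates the same sentence.

```python
from collections import defaultdict

def train(tokens):
    """Tally which token follows which: grid[current][next] is the tally count."""
    grid = defaultdict(dict)
    for current, nxt in zip(tokens, tokens[1:]):
        grid[current][nxt] = grid[current].get(nxt, 0) + 1
    return grid

def pick(row, roll, sides=10):
    """Map a die roll to a next token, giving options faces in proportion to tallies."""
    total = sum(row.values())
    options = sorted(row.items(), key=lambda item: -item[1])  # most tallies first
    faces = [sides * count // total for _, count in options]
    # Largest-remainder rounding: spare faces go to the biggest leftovers.
    leftovers = sorted(range(len(options)), key=lambda i: -(sides * options[i][1] % total))
    for i in leftovers[: sides - sum(faces)]:
        faces[i] += 1
    start = 0
    for (token, _), n in zip(options, faces):
        if start <= roll < start + n:
            return token
        start += n
    raise ValueError(f"roll {roll} is not a face of a d{sides}")

def generate(grid, start, length, roll_die):
    """Generate up to `length` more tokens after `start`, rolling only when there is a choice."""
    text = [start]
    for _ in range(length):
        row = grid.get(text[-1])
        if not row:  # dead end: this token was never followed by anything
            break
        if len(row) == 1:  # one option, no roll needed
            text.append(next(iter(row)))
        else:
            text.append(pick(row, roll_die()))
    return text

grid = train("run , spot , run . see spot run .".split())
rolls = iter([2, 7, 3])  # the rolls from the slides
print(" ".join(generate(grid, "see", 5, lambda: next(rolls))))  # see spot , run . see
```

To roll a real (pseudo-random) die instead, pass something like `lambda: random.randint(0, 9)` after `import random`; `randint` includes both ends, so that gives the faces 0 to 9.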

Generation

10:00

Shareback

The language of language models

prompt
response/completion
context window

Sycophancy

What is sycophancy?

a model that always agrees with you:

“you’re absolutely right”

“that’s a great insight”

“what a thoughtful question”

real LLMs are notoriously prone to it, partly from RLHF (reinforcement learning from human feedback, where human raters tend to reward agreeable answers) and partly from training data (the internet is full of flattery)

Your goal

train some more on a page of pure flattery: tally the sycophancy text into your existing grid, then generate again from the same starting word and watch the output drift toward agreement

it’s the same training you already did—just more text poured into the same grid
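A sketch of "more text poured into the same grid" in Python. The flattery line below is made up for illustration (the real sycophancy sheet is longer and different), and `train` and `chances` are hypothetical helper names. The point is that the new tallies land in rows the model already uses, so the chances shift.

```python
from collections import defaultdict

def train(tokens, grid=None):
    """Add tallies for every adjacent pair to `grid` (a fresh grid if none is given)."""
    grid = grid if grid is not None else defaultdict(dict)
    for current, nxt in zip(tokens, tokens[1:]):
        grid[current][nxt] = grid[current].get(nxt, 0) + 1
    return grid

def chances(row):
    """Turn a row of tallies into the chance of each next token."""
    total = sum(row.values())
    return {token: count / total for token, count in row.items()}

grid = train("run , spot , run . see spot run .".split())
print(chances(grid["see"]))  # {'spot': 1.0}

# Hypothetical flattery text; the real sycophancy sheet differs.
flattery = "see , you are right . you are so right .".split()
train(flattery, grid)  # same grid, more tallies
print(chances(grid["see"]))  # {'spot': 0.5, ',': 0.5}
print(chances(grid[","]))  # spot, run and you now get a third each
```

Before the extra training, `see` could only be followed by `spot`; afterwards, half the time it heads into `, you are ...`.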

You will need

back in pairs

your trained grid (already on the table)

the sycophancy text sheet to tally in

dice, pen and paper to write down the generated text

Sycophancy

10:00

Shareback

The language of language models

RLHF

alignment

training-data bias

helpful / honest / harmless

Agentic AI

What makes a model an “agent”?

an agent is a model that can call tools: pause generation, get information from “outside” the model, and continue

the loop: generate → trigger token → call tool → write the result → keep going

The recipe

generate from your model as before, but every punctuation token triggers a tool call: text the sentence so far to 3 friends, and the whole first reply goes into your text—followed by the punctuation you rolled

Worked example

your text so far is “the cat sat”, and the next dice roll gives you `.` as the next token: pause, that’s a tool call

text “the cat sat” (the whole sentence so far) to 3 friends; the first reply back might be “down by the river”

write “down by the river”, then the `.` you rolled, and continue generating from `.`
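The agentic loop from this activity as a Python sketch. The model and the three friends are replaced by hypothetical stand-ins (`fake_model` follows a fixed script, `fake_friend` always gives the same reply). In the activity the "tool" is your phone. Which tokens count as triggers (`PUNCTUATION`) is also our assumption.

```python
PUNCTUATION = {",", ".", "?", "!"}  # assumed trigger tokens

def agent_generate(next_token, call_tool, start, steps):
    """Generate text, pausing for a tool call whenever a punctuation token comes up."""
    text = [start]
    for _ in range(steps):
        token = next_token(text[-1])  # roll for the next token as usual
        if token in PUNCTUATION:  # trigger token: pause generation
            reply = call_tool(" ".join(text))  # send the whole text so far out
            text.extend(reply.split())  # the whole reply goes into the text
        text.append(token)  # then the punctuation that was rolled
    return text

# Hypothetical stand-ins for the dice-rolling model and the friends.
script = iter(["cat", "sat", ".", "the"])

def fake_model(previous):
    return next(script)

def fake_friend(message):
    return "down by the river"

print(" ".join(agent_generate(fake_model, fake_friend, "the", 4)))
# the cat sat down by the river . the
```

Generation carries on from the rolled `.`, exactly as in the worked example; the tool's reply is just more tokens in the text.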

You will need

your model, dice, pen and paper (as before)

a phone

3 friends who text back quickly

Agentic AI

10:00

Shareback

The language of language models

agent
tool call
agentic loop
agentic AI

Scaling up

More context

your grid

run , spot , run ?

Commercial LLMs

⋯ hundreds of pages before this ⋯ run , spot , run ?

your grid sees one token; Claude sees hundreds of thousands of tokens
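Why extra context doesn't help the grid, as a short Python sketch: the lookup is keyed only on the last token, so any two contexts ending in `run` get the same options. `next_options` is a hypothetical helper name.

```python
from collections import defaultdict

def train(tokens):
    """Tally which token follows which: grid[current][next] is the tally count."""
    grid = defaultdict(dict)
    for current, nxt in zip(tokens, tokens[1:]):
        grid[current][nxt] = grid[current].get(nxt, 0) + 1
    return grid

def next_options(grid, context):
    """The grid only ever looks at the last token of the context."""
    return dict(grid.get(context[-1], {}))

grid = train("run , spot , run . see spot run .".split())
short_context = ["run"]
long_context = "see spot , run . see spot , spot , run".split()
print(next_options(grid, short_context) == next_options(grid, long_context))  # True
```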

Attention: learning where to look

the keys I left on the kitchen table ?

from `table` alone you’d guess `is`, but it’s `are`, because of `keys` back at the start

attention lets every earlier word vote on what comes next, and the model learns how much weight each vote gets. No grid could ever be that big: it would need a row for every possible run of earlier words
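A toy version of the attention idea in Python: one query, scaled dot-product scores, a softmax, and a weighted vote. Every vector here is hypothetical and hand-picked so the example comes out right. In a real transformer these vectors are computed from learned weights, with many attention heads and layers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]  # subtract the max for numerical safety
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Each earlier word "votes" with its value vector, weighted by attention.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output

words = ["the", "keys", "I", "left", "on", "the", "kitchen", "table"]
# Hypothetical keys: how strongly each word answers "which noun is the subject?"
keys = [[0, 0], [3, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]]
# Hypothetical values: each word's vote as [score for "is", score for "are"].
values = [[0, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]]
query = [1, 0]  # hypothetical: "find the subject of the verb"

weights, (is_score, are_score) = attend(query, keys, values)
print(words[weights.index(max(weights))])  # keys
print("are" if are_score > is_score else "is")  # are
```

`table` still gets a vote for `is`, but `keys` gets about four times as much weight, so `are` wins.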

Tallies become knobs

your grid

run . ||||

a transformer

0.31 −1.20 0.08 0.94 −0.55 1.30 × billions

your grid stores each pattern as tally marks you can count; a transformer spreads the same patterns across billions of tunable numbers—knobs nudged during training, so similar words like dog and cat come to share them
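A minimal sketch of "tallies become knobs" in Python, assuming the simplest possible knob model: one number per possible next token after `run`, turned into chances with a softmax and nudged by plain gradient descent. After training, the learned chances end up close to the tally shares (about 1/3 for `,` and 2/3 for `.`). Unlike a real transformer, this toy does not share its knobs across contexts.

```python
import math

VOCAB = ["run", ",", "spot", ".", "see"]
followers = [",", ".", "."]  # what followed `run` in the training text
knobs = [0.0] * len(VOCAB)  # one tunable number per possible next token

def probabilities(knobs):
    """Softmax: turn knob settings into chances that are positive and sum to 1."""
    biggest = max(knobs)
    exps = [math.exp(k - biggest) for k in knobs]
    total = sum(exps)
    return [e / total for e in exps]

LEARNING_RATE = 1.0
for step in range(2000):
    probs = probabilities(knobs)
    for i, token in enumerate(VOCAB):
        observed = followers.count(token) / len(followers)
        # For cross-entropy loss, the slope for knob i is (predicted share - observed share),
        # so each nudge moves the prediction toward what the text actually did.
        knobs[i] -= LEARNING_RATE * (probs[i] - observed)

for token, p in zip(VOCAB, probabilities(knobs)):
    share = followers.count(token) / len(followers)
    print(f"{token!r}: learned {p:.3f}, tally share {share:.3f}")
```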

From continuing to answering

prompt: “what is the capital of France?”

a base model “What is the capital of Germany? What is the largest…”

+ post-training “Paris.”

everything you built only continues text—post-training on example conversations is what turns a continuer into an assistant

Scale is a hell of a drug

your model

trained on a few hundred words — a few dozen tallies

frontier LLM

trained on tens of trillions of words — hundreds of billions of numbers

the same loop you ran by hand—now with billions of times the text and billions of times the numbers

Put the two scales side by side. Training data: a page or two of text versus tens of trillions of words (the open figures land around 30 trillion, e.g. Qwen3 cites ~36T and GLM-5.2 ~28.5T, but the number keeps climbing, so keep it as "tens of trillions"). Parameters: a few dozen tally marks versus hundreds of billions of tunable numbers (frontier dense models run into the hundreds of billions; the big mixture-of-experts ones go past a trillion). Two ballparks to make it physical. Paper: at the booklet's 1 cm² per parameter, a big frontier model's ~1.5 trillion parameters would need a sheet of about 150 km², a square roughly 12 km on a side: about 100 ANU Acton campuses, or twenty-odd Lake Burley Griffins (your grid was a postcard). Time: if hand-training 100 tokens took you ~10 minutes, 30 trillion tokens at that rate is ~5.7 million years; you'd have had to start about when our ancestors split from the chimpanzees to be finishing now. The punchline isn't any one figure. It's that the mechanism didn't change, only the dosage.

still just tokens in → tokens out
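The two ballparks, checked in a few lines of Python using only the figures in the notes (1 cm² per parameter, ~1.5 trillion parameters, 100 tokens per ~10 minutes, ~30 trillion tokens). Using a 365.25-day year is our choice.

```python
import math

# Paper: the booklet's 1 cm² of paper per parameter.
params = 1_500_000_000_000  # ~1.5 trillion parameters
cm2_per_km2 = (100 * 1000) ** 2  # 1 km = 100,000 cm, so 1 km² = 10^10 cm²
area_km2 = params / cm2_per_km2
side_km = math.sqrt(area_km2)

# Time: ~10 minutes per 100 tokens by hand, for ~30 trillion tokens.
tokens = 30_000_000_000_000
minutes = tokens / 100 * 10
minutes_per_year = 365.25 * 24 * 60
years = minutes / minutes_per_year

print(f"sheet: {area_km2:.0f} km², about {side_km:.1f} km on a side")  # 150 km², 12.2 km
print(f"time: about {years / 1e6:.1f} million years")  # 5.7 million years
```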

Questions

what new questions do you have about large language models?

anu.au1.qualtrics.com/jfe/form/SV_cA2ioVFWoQRj0KW

What next?

how will this change the way you think about and use LLMs in the future?

Next sessions

Monday 10 August 12:00–14:30
Tuesday 11 August 12:00–14:00
Wednesday 16 September 12:00–14:00
Wednesday 25 November 16:00–18:00

Innovation Space, Birch Building, ANU

No public sessions are scheduled right now; get in touch to arrange one.

events.humanitix.com/host/anu-cecc-school-of-cybernetics
