Agentic Tool Use
Key idea: An agent is a model that runs tools in a loop: it recognises when it needs external help, delegates to a tool, and continues generating with the result.
Turn your language model into an agent by giving it access to external tools (people, objects, or resources in the room) that it can call when it needs information beyond what’s in the grid.

You will need
- a completed model from Training
- pen, paper, and dice as per Generation
- people or things to serve as “tools” (see examples below)
Your goal
Generate text where the model acts as an agent. Stretch goal: design your own tool and integrate it into your model.
Key idea
What makes a language model an “agent”? In practice, it comes down to agentic tool use: a model that can recognise special tokens that trigger external actions, pause generation, call a tool, and incorporate the result before continuing. That loop of generate → call tool → incorporate result → keep generating is the core of agentic AI.
Setting up tools
Before generation, choose a person or object to role-play each “tool”. Each tool has:
- a trigger word that appears in your model’s vocabulary
- a capability (what it can do)
- a return format (what it gives back)
Example tools
| Trigger word | Tool | Capability | Returns |
|---|---|---|---|
| ACTION | a chosen person | perform a small physical action | a word for what they did |
| GOOGLE | someone with a phone | search the web for a continuation | one word from the top result |
| FRIEND | everyone texts a friend | ask for a single-word continuation | first reply wins |
These three cover the main flavours of real-world tool use: ACTION reaches
into the room (physical, immediate), GOOGLE reaches out to the web
(informational, distant), and FRIEND reaches a specific human (escalation).
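If it helps to see the three parts of a tool written down, here is a minimal Python sketch of the table above. The `Tool` class, the `call_tool` helper, and the canned return values (including "pizza" for FRIEND) are all hypothetical stand-ins for the human operators in the room; they are not part of the activity itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    trigger: str                 # word that appears in the model's vocabulary
    capability: str              # plain-language description of what the tool can do
    run: Callable[[str], str]    # takes the text so far, gives back a result


# Canned results stand in for real people; in class the operator decides.
TOOLS = {
    tool.trigger: tool
    for tool in [
        Tool("ACTION", "perform a small physical action", lambda context: "clapped"),
        Tool("GOOGLE", "search the web for a continuation", lambda context: "exercise"),
        Tool("FRIEND", "ask for a single-word continuation", lambda context: "pizza"),
    ]
}


def call_tool(trigger: str, context: str) -> str:
    """Look up a tool by its trigger word and return whatever it gives back."""
    return TOOLS[trigger].run(context)
```

Keying the dictionary on the trigger word mirrors the classroom rule: the only thing the model ever produces is the trigger, and everything else happens outside it.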
Algorithm
- Add trigger words to your model’s vocabulary (new rows and columns).
- Train or manually add counts so trigger words can appear in generation. Each tool-trigger cutout ends with a closing token (e.g., “.”) on the same cutout, so the chain resumes cleanly once the tool returns.
- During generation, when you sample a trigger word:
  - pause generation
  - formulate a question or request based on context
  - the tool “executes” and returns a result
  - write the result down in the output text
  - write the closing token from the same cutout and continue generation from there
This is the agentic loop: your model generates until it hits a trigger, hands off to a tool, gets a result, and keeps going. Real AI agents follow the same loop, just faster and with more tools.
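The loop can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bigram counts in `COUNTS` are invented, the tools are canned functions rather than people, and the names `sample_next` and `generate` are made up for this sketch.

```python
import random

CLOSING = "."  # closing token printed on every trigger cutout

# Invented bigram counts: COUNTS[current][next] = how often `next` followed `current`.
COUNTS = {
    "we": {"should": 2, "GOOGLE": 1},
    "should": {"GOOGLE": 1},
    ".": {"we": 1, "i": 1},
    "i": {"ACTION": 1},
}

# Canned tools standing in for the human operators.
TOOLS = {
    "GOOGLE": lambda request: "exercise",
    "ACTION": lambda request: "clapped",
}


def sample_next(word, rng):
    """Roll the dice: pick the next word in proportion to its count."""
    options = COUNTS[word]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def generate(start, steps, rng):
    output = [start]
    current = start
    for _ in range(steps):
        nxt = sample_next(current, rng)
        if nxt in TOOLS:
            request = " ".join(output)        # formulate a request from context
            result = TOOLS[nxt](request)      # the tool "executes"
            output.append(result)             # write the result down
            output.append(CLOSING)            # write the closing token
            current = CLOSING                 # continue from the closing token
        else:
            output.append(nxt)
            current = nxt
    return output


print(" ".join(generate("we", 8, random.Random(0))))
```

Note that the trigger word itself never appears in the output and is never used as the next lookup key; only the result and the closing token are written down.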
Example session
Suppose your model has ACTION and GOOGLE added as trigger tokens, each followed by a closing “.” on the same cutout.
- Start with “We”
- Sample → “should”
- Sample → GOOGLE (trigger!)
  - Pause. Operator googles “what should we do today”
  - Picks a word from the top result: “exercise”
  - Write down “exercise”
  - Write down “.” (the closing token on the GOOGLE cutout)
- Sample → “I”
- Sample → ACTION (trigger!)
  - Pause. Operator picks a student, who claps
  - Write down “clapped”
  - Write down “.” (the closing token on the ACTION cutout)
Generated text: “We should exercise. I clapped.”
The operator’s words (“exercise”, “clapped”) get written into the output, but the next cutout picks up from the closing token on the trigger’s cutout, not from the result. This means tools can return literally anything—out-of-vocab words, phrases, even sounds—without breaking the generation loop.
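To see why the closing token matters, here is a small Python replay of the session above. It is a sketch, not the activity's official procedure: each cutout is simplified to a single possible next word so the run is deterministic, and the `continue_from_result` flag is a hypothetical switch added purely to show what would go wrong without the closing token.

```python
CLOSING = "."

# One possible next word per cutout, so the replay always follows the session above.
NEXT = {"We": "should", "should": "GOOGLE", ".": "I", "I": "ACTION"}

# What the operators returned in the example session.
TOOL_RESULTS = {"GOOGLE": "exercise", "ACTION": "clapped"}


def replay(start, continue_from_result=False):
    output = [start]
    current = start
    tool_calls = 0
    while tool_calls < 2:
        nxt = NEXT[current]  # raises KeyError if `current` has no cutout
        if nxt in TOOL_RESULTS:
            result = TOOL_RESULTS[nxt]
            output += [result, CLOSING]
            tool_calls += 1
            # Normal rule: carry on from the closing token, not from the result.
            current = result if continue_from_result else CLOSING
        else:
            output.append(nxt)
            current = nxt
    return output


def render(tokens):
    """Join tokens with spaces, but attach the closing token to the previous word."""
    text = ""
    for tok in tokens:
        text += tok if tok == CLOSING or not text else " " + tok
    return text


print(render(replay("We")))
```

Calling `replay("We", continue_from_result=True)` fails with a `KeyError`, because “exercise” has no cutout of its own. That is the situation the closing token is there to avoid.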
Instructor notes
Designing good tool triggers
For the activity to work well:
- add trigger words to cutouts where they fit contextually (e.g., GOOGLE after “a” or “the”, ACTION after “we” or “I”)
- every trigger cutout closes with a common token like “.” on the same cutout; this silently bridges the tool’s result back into the model’s vocabulary
- the operator’s return can be anything: an out-of-vocab word, a phrase, even a noise. Continuation is keyed on the closing token, not the result
- have tool operators ready (and phones charged) before you start generating
Other tools to try
If you want to extend the activity, design more tools using the same pattern:
- LOOK: a physical object in the room returns a word for what it is
- NAME: the room points at someone, returns their name
- TIME: a clock or watch returns the current time
- ASK: a designated expert returns a short phrase
The shape is always the same: something outside the model returns a word, and the closing token on the trigger’s cutout brings you back.
Discussion questions
- when should an agent use a tool vs try to answer itself?
- what happens if a tool returns something unexpected?
- how does the model “know” to call a tool? (it doesn’t—it just samples the trigger word)
- what tools would be most useful for different kinds of text?
- could a tool’s response change what the agent generates next?
Classroom variations
Simple version: use just one tool (FRIEND) and have the whole class participate. The model generates until it hits FRIEND, then everyone texts a friend asking for a single-word continuation—the first reply wins.
Advanced version: set up multiple tools around the room. Different students operate different tools. The agent doesn’t know which tool will be called next.
Multi-step version: chain tool calls together—the result of one tool becomes the context for calling another. This is closer to how real AI agents plan and execute multi-step tasks.
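For the multi-step version, one way to picture the chaining is the following Python sketch, where each tool’s result becomes the request handed to the next tool. The lookup tables, the “yoga” reply, and the `run_chain` helper are invented for illustration; in the classroom the operators supply the answers.

```python
# Canned answers standing in for the operators (all invented for this sketch).
GOOGLE_ANSWERS = {"what should we do today": "exercise"}
FRIEND_ANSWERS = {"exercise": "yoga"}


def google(request):
    return GOOGLE_ANSWERS[request]


def friend(request):
    return FRIEND_ANSWERS[request]


def run_chain(steps, request):
    """Call each tool in turn, feeding the previous result in as the next request."""
    trace = []
    for name, tool in steps:
        result = tool(request)
        trace.append((name, request, result))
        request = result  # the result becomes the context for the next call
    return trace


for name, asked, got in run_chain(
    [("GOOGLE", google), ("FRIEND", friend)], "what should we do today"
):
    print(f"{name}: asked {asked!r}, got {got!r}")
```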
Connection to current LLMs
“Agentic AI” has become a buzzword, but in practice it really just means tool use in a loop. As Simon Willison puts it, an LLM (large language model) agent is something that “runs tools in a loop to achieve a goal”, and that’s exactly what your model is doing.
Tool use (also called “function calling”) is how modern AI assistants perform actions in the world:
- the agentic loop: generate → detect tool call → execute tool → feed result back → continue generating, exactly like your trigger-word cycle
- examples: web search, code execution, API calls, database queries, image generation
- structured calls: modern models typically output structured tool calls (often JSON with a function name and parameters) rather than bare trigger words, but the mechanism is the same
- chaining: real agents chain multiple tool calls to complete complex tasks—planning, executing, observing results, and adjusting
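As a rough illustration of a structured call, the Python sketch below parses a JSON tool call and dispatches it to a matching function. The JSON shape (`name` and `arguments` keys), the `REGISTRY` dictionary, and the two stub tools are assumptions made for this example; real systems each define their own format, so treat this as the general idea rather than any particular vendor’s API.

```python
import json


def web_search(query):
    # Stub: a real tool would contact a search engine here.
    return f"top result for {query!r}"


def add(a, b):
    return a + b


# Hypothetical registry mapping tool names to the functions that run them.
REGISTRY = {"web_search": web_search, "add": add}


def execute_tool_call(raw):
    """Parse a JSON tool call and run the named tool with its arguments."""
    call = json.loads(raw)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])


# The model's only contribution is this text; the tool does the actual work.
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(execute_tool_call(model_output))  # prints 5
```

The function name plays the role of the trigger word, and the arguments play the role of the question the operator formulates from context.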
The key insight: the model doesn’t “know” anything the tool returns—it just learns when to ask. Your classroom tools demonstrate this perfectly: the model samples GOOGLE not because it knows the answer, but because the training data included GOOGLE in that context. The actual knowledge comes from outside the model.
This is why tool-using AI agents can do things like search the web for current information, run calculations they couldn’t do in their head, or control robots and software. The model’s job is to know when to call a tool and how to use the result—not to contain all knowledge itself.