#Beam Search
Generate text by exploring multiple paths at once, keeping the most promising candidates at each step.

#You will need
- a completed model from an earlier lesson
- pen and paper for each “beam” (or a whiteboard divided into columns)
- dice for sampling (or use greedy selection)
- multiple people to track different beams (works best as a group activity)
#Your goal
Generate text using beam search: a generation strategy that tracks multiple candidate sequences simultaneously, choosing the best overall path rather than committing to one word at a time. Use a beam width (the number of candidate paths tracked) of 2 or 3. Compare the result to standard single-path generation. Stretch goal: try different beam widths and see how results change.
#Key idea
Standard generation picks one word at a time and commits to it. Beam search keeps multiple candidates alive, only choosing the final output after seeing how each path develops. This often produces more coherent text.
#Algorithm
#Setup
- Choose a beam width (start with 2 or 3—this is how many paths you track).
- Assign one person (or paper column) to each beam.
- Agree on a scoring method: sum of counts along the path, or product of probabilities.
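The two scoring methods above can be sketched in a few lines (hypothetical function and variable names; `counts` is assumed to map each word to its possible next words and their counts):

```python
import math

def score_by_counts(path, counts):
    # sum of bigram counts along the path
    return sum(counts[a][b] for a, b in zip(path, path[1:]))

def score_by_logprob(path, counts):
    # product of probabilities, computed as a sum of logs to avoid underflow
    total = 0.0
    for a, b in zip(path, path[1:]):
        row_total = sum(counts[a].values())
        total += math.log(counts[a][b] / row_total)
    return total
```

Summing counts is easiest by hand; the log-probability version is how computers typically score, since multiplying many small probabilities quickly underflows.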
#Generation
- Start: all beams begin with the same starting word.
- Expand: for each beam, find all possible next words from the current word’s row.
- Score: calculate the score for each possible continuation (current path score + new word’s count, or current score × new word’s probability).
- Prune: across all beams and all their expansions, keep only the top k paths (where k is your beam width). Some beams may die; others may branch.
- Repeat: continue until all surviving beams reach a stopping point (“.”) or target length.
- Select: choose the highest-scoring complete path as your output.
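The steps above can be sketched as a short program. This is a minimal version, assuming a bigram count model stored as a dictionary of dictionaries (the function and variable names here are illustrative, not from the lesson):

```python
def beam_search(bigram_counts, start, beam_width=2, max_len=10, stop="."):
    """Beam search over a bigram count model.

    bigram_counts maps a word to {next_word: count}. Paths are scored
    by summing counts, matching the activity's by-hand scoring method.
    """
    beams = [([start], 0)]  # each beam is a (path, score) pair
    finished = []
    for _ in range(max_len):
        # Expand: every beam tries every possible next word
        candidates = []
        for path, score in beams:
            for word, cnt in bigram_counts.get(path[-1], {}).items():
                candidates.append((path + [word], score + cnt))
        if not candidates:
            break
        # Set aside paths that hit the stopping point
        still_open = []
        for path, score in candidates:
            (finished if path[-1] == stop else still_open).append((path, score))
        # Prune: keep only the top k open paths
        beams = sorted(still_open, key=lambda b: b[1], reverse=True)[:beam_width]
        if not beams:
            break
    # Select: highest-scoring complete path (fall back to open beams)
    return max(finished or beams, key=lambda b: b[1])

# Example model trained on "See Spot run. See Spot jump."
counts = {"see": {"spot": 2}, "spot": {"run": 1, "jump": 1},
          "run": {".": 1}, "jump": {".": 1}}
path, score = beam_search(counts, "see", beam_width=2)
print(" ".join(path), score)  # the two endings tie; either may win
```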
#Example
Model: trained on “See Spot run. See Spot jump.”
Beam width: 2
Step 1 (start): both beams have “see” (score: 0)
Step 2 (expand from “see”):
- “see → spot” is the only option (count: 2)
- Both beams: “see spot” (score: 2)
Step 3 (expand from “spot”):
- “spot → run” (count: 1)
- “spot → jump” (count: 1)
- Beam 1: “see spot run” (score: 3)
- Beam 2: “see spot jump” (score: 3)
Step 4 (expand from “run” and “jump”):
- “run → .” (count: 1) → Beam 1: “see spot run.” (score: 4)
- “jump → .” (count: 1) → Beam 2: “see spot jump.” (score: 4)
Final: tied scores, so either output works. In practice, you’d pick one arbitrarily or continue generating.
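The tied scores in the walkthrough can be verified with a quick by-hand calculation (the `counts` dictionary below is just the example model written out):

```python
counts = {"see": {"spot": 2}, "spot": {"run": 1, "jump": 1},
          "run": {".": 1}, "jump": {".": 1}}

# sum the counts along each beam's path, as in steps 2-4
run_score = counts["see"]["spot"] + counts["spot"]["run"] + counts["run"]["."]
jump_score = counts["see"]["spot"] + counts["spot"]["jump"] + counts["jump"]["."]
print(run_score, jump_score)  # both paths reach a score of 4
```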
#Tracking beams in practice
With people: each person holds a piece of paper showing their current path and score. After each expansion step, everyone announces their candidates. The group collectively picks which paths survive.
With a whiteboard: draw columns for each beam. Write paths vertically, crossing out eliminated paths and continuing survivors.
With cards: write each candidate path on an index card. Spread them out, score them, and physically remove the losers.
#Instructor notes
#Discussion questions
- how does beam width affect output quality?
- what happens with beam width 1? (it’s just greedy sampling: always choosing the most likely next word, which produces predictable but often repetitive text)
- what happens with very large beam width? (approaches exhaustive search)
- when would beam search produce different results than random sampling?
- can a path that looks bad early become the best path later?
#Comparing methods
Run the same starting word through three methods:
- Random sampling (dice rolls)
- Greedy (always pick highest count)
- Beam search (width 2–3)
Compare the outputs. Beam search often finds a middle ground—more coherent than random, less repetitive than greedy.
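For comparison, the other two methods can be sketched over the same bigram count model (hypothetical names; a beam-search function would take the same `counts` structure):

```python
import random

def greedy(counts, start, max_len=10, stop="."):
    # always take the highest-count next word; ties go to the first listed
    path = [start]
    while path[-1] != stop and len(path) < max_len:
        nexts = counts.get(path[-1])
        if not nexts:
            break
        path.append(max(nexts, key=nexts.get))
    return path

def sample(counts, start, max_len=10, stop=".", rng=random):
    # draw the next word proportionally to its count (the "dice roll")
    path = [start]
    while path[-1] != stop and len(path) < max_len:
        nexts = counts.get(path[-1])
        if not nexts:
            break
        words = list(nexts)
        path.append(rng.choices(words, weights=[nexts[w] for w in words])[0])
    return path

counts = {"see": {"spot": 2}, "spot": {"run": 1, "jump": 1},
          "run": {".": 1}, "jump": {".": 1}}
print(" ".join(greedy(counts, "see")))
print(" ".join(sample(counts, "see")))
```

Running `sample` several times with the same start word makes the contrast visible: greedy always returns the same sentence, while sampling varies.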
#Computational trade-offs
Discuss with students:
- beam width 1 = 1 path to track (fast, simple)
- beam width 10 = 10 paths to track (slower, but explores more options)
- beam width 1000 = impractical by hand, but computers can do it
This illustrates why AI researchers tune beam width: it’s a trade-off between computation time and output quality.
#Connection to current LLMs
Beam search is a classic decoding strategy used in many language tasks:
- machine translation: beam search finds fluent translations by exploring multiple word orders
- speech recognition: keeps multiple interpretations alive until context disambiguates
- code generation: explores different syntactic structures before committing
- trade-offs: beam search guarantees finding high-probability sequences but can produce “boring” text; sampling adds creativity but risks incoherence
The key insight: generation is a search problem. Standard sampling explores one random path; greedy search explores one “obvious” path; beam search explores multiple promising paths. Your classroom beams demonstrate this directly—by tracking several candidates, you can compare outcomes that single-path methods would never see.
Modern LLMs often combine strategies: use beam search for factual tasks where accuracy matters, sampling for creative tasks where variety matters, and hybrid approaches that sample within beams for balanced results.