LoRA
Efficiently adapt a trained language model to a new domain or style without retraining the whole thing.

You will need
- a completed bigram model from an earlier lesson (your base model)
- pen, pencil, and grid paper
- new domain- or style-specific text
Your goal
Create a lightweight adaptation layer that shifts your base model toward a new domain. Stretch goal: experiment with mixing ratios between base and LoRA layers.
Key idea
Low-Rank Adaptation (LoRA) stores only the changes from the base model, so it can be much smaller. During generation you add LoRA counts to the base counts (optionally scaled) and sample as normal.
Algorithm
- Choose an existing bigram grid as your base model.
- Train a LoRA grid:
  - Start with a new grid using the same columns as the base.
  - Run Training (grid method) on your new-domain text, but only keep rows for words that appear in that text.
- Apply the adaptation:
  - When sampling, add the LoRA counts to the base counts for the current word (if that row exists).
  - Optionally scale the LoRA counts up or down to control how strongly the adaptation influences the output.
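The steps above can be sketched in Python, with each paper grid modelled as a dictionary of rows. The counts and the `combine_row`/`sample_next` names are illustrative, not part of the lesson:

```python
import random

# Each grid maps current word -> {next word: count}.
base = {                                   # base bigram grid (general text)
    "saw": {"they": 2, "the": 3, "a": 3, "red": 1},
    "a":   {"big": 2, "red": 1},
}
lora = {                                   # LoRA grid: only rows seen in the new text
    "saw": {"a": 1, "the": 1},
    "a":   {"red": 1},
}

def combine_row(word, base, lora, scale=1.0):
    """Base counts plus (optionally scaled) LoRA counts for one row."""
    counts = dict(base.get(word, {}))
    for nxt, c in lora.get(word, {}).items():   # add the LoRA row if it exists
        counts[nxt] = counts.get(nxt, 0) + scale * c
    return counts

def sample_next(word, base, lora, scale=1.0):
    """Sample the next word from the combined row, as in normal generation."""
    counts = combine_row(word, base, lora, scale)
    words = list(counts)
    return random.choices(words, weights=[counts[w] for w in words])[0]

print(sample_next("saw", base, lora, scale=1.0))
```

Note that words missing from the LoRA grid fall through to the base row untouched, which is exactly the "only store the changes" property.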
Example
- Base model (general text) has a “saw” row with counts toward they, the, a, and red.
- LoRA trained on “I saw a red cat. I saw the red dog.” adds only rows for words in that text: a “saw” row with counts toward a and the, plus “a” and “the” rows with counts toward red.
- Combined sampling uses base + LoRA counts, making red more likely after a and the, while rows absent from the LoRA grid are unchanged.
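Building the LoRA grid from the example sentences can be sketched as follows (whole-word bigram counts; punctuation stripped and sentence boundaries ignored for simplicity):

```python
# Count bigrams from the example text into a fresh LoRA grid.
text = "I saw a red cat. I saw the red dog."
words = text.lower().replace(".", "").split()

lora = {}
for cur, nxt in zip(words, words[1:]):
    row = lora.setdefault(cur, {})        # create the row on first sight
    row[nxt] = row.get(nxt, 0) + 1

print(lora["saw"])   # counts after "saw"
print(lora["the"])   # counts after "the"
```

Only rows for words that appear in the new text exist in the result, matching the Algorithm section.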
Instructor notes
Discussion questions
- how much training data do you need for the LoRA layer compared to training from scratch?
- what happens if you scale the LoRA values by 2 or 0.5 before adding them?
- can you create multiple LoRA layers for different domains?
- which words change most between base and adapted models?
- when would you want a separate LoRA layer vs retraining the whole model?
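For the scaling question, a quick numeric check can seed the discussion. The row counts here are made up for illustration:

```python
# How the scale factor changes the combined probability of "red" after "a"
# (hypothetical counts).
base_row = {"red": 1, "big": 4}   # base "a" row
lora_row = {"red": 2}             # LoRA "a" row

def p_red(scale):
    counts = dict(base_row)
    for w, c in lora_row.items():
        counts[w] = counts.get(w, 0) + scale * c
    return counts["red"] / sum(counts.values())

for s in (0.0, 0.5, 1.0, 2.0):
    print(f"scale {s}: P(red | a) = {p_red(s):.2f}")
```

Scaling by 0 recovers the base model, and larger factors push the output further toward the new domain.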
Connection to current LLMs
Low-Rank Adaptation revolutionised how modern LLMs are customised:
- efficiency: training a LoRA layer requires 100-1000x less computation than full fine-tuning
- modularity: you can have one base model plus many LoRA layers for different tasks (medical, legal, creative writing)
- preservation: the base model stays unchanged, so it retains its general capabilities
- combination: multiple LoRA layers can be combined or switched on-the-fly
- distribution: LoRA layers are small (megabytes vs gigabytes), making them easy to share
The key insight: most model adaptation happens in a small subspace of all possible changes. Instead of adjusting billions of parameters, LoRA identifies and modifies only the dimensions that matter for the new domain. Your paper implementation makes this concrete: rather than recreating the entire grid, you only track the changes needed for the new text style. When you add the base and LoRA counts together, you’re doing exactly what neural networks do when they apply LoRA layers during inference. This is why organisations can maintain one large foundation model and create thousands of specialised versions through lightweight LoRA layers.
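The neural-network version of the same addition can be sketched with NumPy; the dimensions, matrix names, and `alpha` scaling below are illustrative, not taken from any particular model:

```python
import numpy as np

# The adapted weight is the frozen base matrix W plus a low-rank update B @ A.
rng = np.random.default_rng(0)
d, r = 8, 2                       # full dimension vs small LoRA rank
W = rng.normal(size=(d, d))       # frozen base weights
A = rng.normal(size=(r, d))       # trainable LoRA factor
B = np.zeros((d, r))              # B starts at zero, so the update starts at zero

alpha = 1.0                       # scaling factor, like scaling paper counts
W_adapted = W + alpha * (B @ A)

# Parameters to store: d*d for a full update vs d*r + r*d for LoRA.
print(d * d, 2 * d * r)
```

With d = 8 and r = 2 the saving is 64 vs 32 stored numbers; at the billions-of-parameters scale of real models the gap is what makes LoRA layers megabytes instead of gigabytes.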