LLMs Unplugged Library

The LLMs Unplugged Library collection — hardbound volumes in green, purple, brown, and blue lined up on a shelf

The LLMs Unplugged Library is a collection of pre-trained language models in printed, hardbound book form. Each volume contains n-gram frequency tables typeset from a classic work of literature—the same statistical patterns that underpin modern LLMs, but at human scale. You can hold the entire model in your hands and use it to generate new text with pen, paper, and dice.

Title page of a Hemingway bigram volume, published by Cybernetic Studio Press

A conceptual art piece

The Library reframes a language model as a physical object: something bound, shelved, and read rather than queried through a chat window. The technology underneath is the same as in the activities elsewhere on this site. The source text is tokenised, the number of times each word follows each context is counted, and those counts are typeset into lookup tables that anyone with a pair of dice can sample from. What changes is the framing. A workshop booklet is a consumable, printed for an afternoon and recycled afterwards. A Library volume is meant to sit on a shelf, be picked up, and reward closer attention.
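The counting and sampling described above can be sketched in a few lines of Python. This is an illustration only, not the project's own code: the whitespace tokeniser, the function names `build_bigram_table` and `sample_next`, and the use of a random integer in place of physical dice are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def build_bigram_table(text):
    """Count how often each word follows each one-word context."""
    tokens = text.lower().split()  # naive whitespace tokeniser (an assumption)
    table = defaultdict(Counter)
    for context, following in zip(tokens, tokens[1:]):
        table[context][following] += 1
    return table


def sample_next(table, context, rng):
    """Pick a follower with probability proportional to its count."""
    followers = table[context]
    total = sum(followers.values())
    roll = rng.randint(1, total)  # stands in for a dice roll from 1 to total
    running = 0
    for word, count in followers.items():
        running += count
        if roll <= running:
            return word


text = "the cat sat on the mat and the cat ran"
table = build_bigram_table(text)

rng = random.Random(0)
word = "the"
generated = [word]
for _ in range(5):
    if word not in table:  # the final word of the text has no followers
        break
    word = sample_next(table, word, rng)
    generated.append(word)

print(" ".join(generated))
```

Each row of `table` plays the role of one lookup entry in a printed volume: the context is the heading, and the follower counts are the ranges a roll is checked against.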

The distinctive feature is the curation. Each volume is built from a specific literary work (or the collected works of an author), then typeset in bigram, trigram, and 4-gram variants. The progression across volumes makes the fundamental trade-off between model size and output quality legible without a single chart: a trigram model of Frankenstein produces noticeably more coherent text than the bigram version, but it has more distinct contexts to tabulate, so the book is considerably thicker. Pick up a Hemingway bigram and you get terse, punchy fragments; the Cloudstreet model wanders into something more sprawling. The model is the text it was trained on, in a way you can see on the page.
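One way to see why higher-order volumes run thicker is to count how many distinct contexts and entries a table needs as n grows. The sketch below does this for a toy sentence; `ngram_counts` and `table_size` are hypothetical helper names, and a real novel would show much larger growth than this tiny sample.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def table_size(tokens, n):
    """Return (distinct contexts, distinct entries) for an n-gram table."""
    counts = ngram_counts(tokens, n)
    contexts = {gram[:-1] for gram in counts}  # everything but the final word
    return len(contexts), len(counts)


text = "the cat saw the dog and the dog saw the cat and the cat saw the cat"
tokens = text.split()

for n in (2, 3, 4):
    contexts, entries = table_size(tokens, n)
    print(f"n={n}: {contexts} contexts, {entries} entries")
```

More contexts means more headings to print, and more entries means more rows under them, which is roughly what drives the page count.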

Current volumes include Mary Shelley’s Frankenstein, Tim Winton’s Cloudstreet, the collected works of Ernest Hemingway, and a synthetic dataset (TinyStories) for comparison. All are published by Cybernetic Studio Press under CC BY-NC-SA 4.0.

How it’s made

The volumes are typeset using the same Rust-to-JSON-to-Typst pipeline that powers the booklets and cutouts on the Tools page. A Rust CLI tokenises the source text and computes n-gram frequency tables, which are exported as JSON and then laid out by a Typst template into A4 pages with four columns of frequency data, binding margins for hardcover production, and front matter including a copyright page and usage instructions. The Library volumes are specifically curated, printed, and hardbound as standalone artefacts rather than disposable workshop materials.
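The middle step of that pipeline, turning frequency tables into JSON for a layout template to consume, might look something like the following. The real tool is a Rust CLI and its JSON schema is not described here, so this Python sketch uses an invented layout: the field names `title`, `n`, `contexts`, `followers`, `word`, and `count` are all assumptions.

```python
import json
from collections import Counter, defaultdict


def frequency_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = " ".join(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table


def to_json(table, n, title):
    """Serialise a table using a hypothetical schema (not the project's own)."""
    payload = {
        "title": title,
        "n": n,
        "contexts": [
            {
                "context": context,
                "followers": [
                    {"word": word, "count": count}
                    # most frequent first, ties broken alphabetically
                    for word, count in sorted(
                        followers.items(), key=lambda kv: (-kv[1], kv[0])
                    )
                ],
            }
            for context, followers in sorted(table.items())
        ],
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)


tokens = "it was the best of times it was the worst of times".split()
document = to_json(frequency_table(tokens, 2), 2, "Toy example")
print(document)
```

Sorting contexts alphabetically and followers by count is a design choice for the sketch: it gives a stable order that a template could lay out into columns without further processing.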

To generate your own booklet from any text—in paperback PDF rather than hardbound form—head to the Tools page. To see the same models used in a teaching context, the Pre-trained Model Generation lesson walks through how to roll dice against a frequency table to produce new text.