The Spec For This Article Is Longer Than The Article

Before I wrote the Karpathy LLM Wiki piece, I wrote a 231-line markdown spec describing what it would be. The spec named a goal, four explicit non-goals, a target audience, a thesis in two sentences, a section-by-section outline with word budgets, and a nine-point validation checklist. It took about thirty-five minutes to write. Then I handed it to Claude alongside a slash command that says “draft section by section, stay inside the word budgets, apply the voice rules at the bottom.”

The agent-harness article has a 148-line spec sitting next to it. The /lab playground feature I’m still building: 170 lines. The article you are reading now has a spec too, and the spec is longer than the article. That is the whole point.

What A Spec Actually Is

A spec in my working directory is a markdown file with eight sections. I keep a template at docs/specs/_article-template.md — ninety-nine lines — that seeds every new one. The sections that do the real work:

Goal. One paragraph. What does this produce, and why would anyone read it? If I cannot write the goal cleanly, I do not yet understand the piece, and the spec gets closed.

Context. What prompted this right now? A recent event, a personal incident, a landscape shift. Context is where the piece sits in time — the fact that makes the article inevitable rather than arbitrary.

Differentiation. Two or three specific gaps the article fills. Not “covers topic X” but “existing coverage misses Y and Z.” This is the section that kills most of my draft specs before I ever write the first paragraph. If I cannot name what is missing from the discourse, I have nothing to add to it.

Non-Goals. The section that saves the most time. Explicitly what this article is NOT.

Thesis. Two or three sentences. The core argument that every later section has to serve. When this goes fuzzy, everything downstream rots.

Structure. Section-by-section outline with a word budget on each. Prevents the thirty-paragraph drift that happens when you let an assistant “expand on that a bit.”

Voice guidelines. A do-list and a don’t-list. Mine catalogues the concrete AI-smell patterns my earlier drafts accumulated, so the next draft stops producing them.

Validation criteria. A checklist the finished draft has to pass before it ships. Mine includes “at least two internal links”, “no section where a developer would stop reading”, and “under 2,500 words.”

Non-Goals Is The Most Valuable Section

The LLM Wiki spec declared four non-goals on line 29:

  • Not a tutorial (“how to build an LLM Wiki”)
  • Not a product comparison (“LLM Wiki vs NotebookLM vs RAG”)
  • Not a Claude Code feature showcase
  • Not a Karpathy biography or gist paraphrase

Those four lines saved me something like four hours. The default gravitational pull of any AI-assisted draft about a new Karpathy gist is toward at least three of them. Without the non-goals pinned in writing, I would have watched Claude produce a competent tutorial, noticed it was wrong, asked for a rewrite, watched it produce a competent product comparison, noticed that was wrong too, and so on. With the non-goals on the page, the first draft steered around all four traps in advance.

Writing non-goals also flushes out the thesis. Every non-goal is a negative-space claim about what the piece actually is. Once I had named four things the LLM Wiki article would not be, the thesis — maintenance is the problem every prior knowledge system failed at, and the LLM Wiki is the first plausible answer — wrote itself.

With AI Assistants

A one-line prompt to Claude produces whatever Claude feels like producing. A hundred-line spec produces something a lot closer to what I wanted.

The difference is not the volume of tokens. It is the direction of compression. Without a spec, the assistant expands every prompt outward into the most plausible completion, which is usually a slightly-more-generic version of what you asked for. With a spec, the assistant has a shape to fit: a goal, a non-goal list, a word budget, a list of forbidden phrasings. It still writes, and it still surprises me with phrasings and examples I would not have chosen. It writes within walls.

The spec is also what survives model changes. Claude model versions changed underneath several of these articles during drafting. The prompts I was issuing in the middle of a session would not have transferred cleanly to the next model version — they assumed context I had built up by hand. The spec did. A new session with a fresh model could re-read the spec and pick up the draft where the previous session left off, because the contract was declarative and on disk.

Where Specs Fail

They rot. A spec written in January reflects January’s reality. By April, three of the constraints may be obsolete and the context paragraph may be describing a landscape that has moved. I treat any spec in docs/specs/ older than sixty days as suspect until I re-read it.

They over-constrain. I have a shelf of draft specs for articles I never wrote because the spec was tight enough that there was no room left to think. The thesis came out in the spec itself, the supporting evidence was already enumerated, and by the time I sat down to draft, there was nothing left to discover. A good spec leaves gaps the draft fills; an over-specified spec leaves none, and the writing has nowhere to find anything new.

They are sometimes wrong. The MCP servers article had a confident spec that aged badly. I revised the conclusion within a month of shipping, because the landscape moved faster than the document. That is what editorial updates are for. The spec was well-thought-through on the day it was written; six weeks was enough for the conclusion to flip underneath it.

When To Skip

Not everything needs a spec. One-line typo fixes, dependency bumps, a CSS tweak — these go straight to a commit. The heuristic I use: if writing the spec will take longer than making the change, it is not a spec candidate. The second heuristic: if the work is pure pattern-matching (five more routes like the three existing ones), a spec adds friction without adding clarity.

The test is whether an honest answer to “what does success look like, and what is this explicitly NOT?” is non-obvious. If both parts are obvious, skip the spec. If either part requires actual thinking, write the spec. Most non-trivial AI-assisted work fails the obvious-test, which is why I keep writing them.

The Artifacts Are On Disk

The specs for this blog live at docs/specs/ — six files totalling roughly eight hundred lines of markdown. The articles that came out of them are on fpaul.dev. The delta between each spec and its published article is where I learned what the spec did not yet know. Those deltas are why I keep writing specs, and why I update the template every time one of them surprises me.