The File That Grew
My user-global CLAUDE.md is fifty lines. It started as three.
The first version said: “Be concise. Use TypeScript. Run tests before committing.” That was February. By March I’d added skill routing rules, spec-first enforcement, a hook that blocked commits without conventional messages, and a memory file that tracked every failed deployment pattern I’d hit. A session last week fired a PreToolUse hook that validated a file path, which triggered a skill that referenced a memory file about directory conventions, which shaped the output of a component I never explicitly instructed it to build.
I didn’t design that pipeline. It emerged from accumulated configuration. That’s when I realized I wasn’t configuring an editor. I was building a system.
The Config That Became Architecture
The 2025 conversation was about model quality — GPT-4.5, Claude 3.5, Gemini Ultra, bigger context windows, better reasoning, more parameters. By 2026 the models are good enough. The bottleneck moved to infrastructure: how you scaffold, constrain, and direct them.
Anthropic drew a useful line in their agent design guide. Workflows are predefined code paths — if X then Y, deterministic, authored by humans. Agents are LLM-controlled — the model decides what to do next, which tool to call, when to stop. The distinction matters because most real systems are both. Your CI pipeline is a workflow. The thing that decides whether to refactor a function or add a test is an agent.
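The distinction is easiest to see as code. A minimal sketch, where `call_model` is a stub standing in for a real LLM call and the function names are purely illustrative:

```python
def workflow(changed_files: list[str]) -> list[str]:
    """Workflow: a predefined code path. Humans authored every branch."""
    steps = ["lint"]
    if any(f.endswith(".ts") for f in changed_files):
        steps.append("typecheck")
    steps.append("test")
    return steps


def agent(task: str, tools: dict, call_model) -> list[str]:
    """Agent: the model owns the control flow. It picks the next tool
    and decides when to stop; the harness just executes its choices."""
    trace = []
    while True:
        choice = call_model(task, trace, list(tools))
        if choice == "stop":
            return trace
        trace.append(choice)
        tools[choice]()  # execute whatever the model reached for
```

The workflow's behavior is fully determined by its inputs; the agent's behavior is determined by whatever `call_model` returns. Most real systems wire the two together.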
A harness is the runtime layer that makes both modes possible. Not the model. Not the prompt. The structure around them.
Here’s the definition I keep coming back to: a harness is four layers.
Schema — the declarative rules. What the agent is, what it must do, what it must not do.
Tools — the capabilities. What the agent can reach for when it needs to act.
Events — the lifecycle hooks. What happens automatically at specific moments, regardless of what the model decides.
Memory — the persistent knowledge. What survives between sessions and compounds over time.
Each layer has different persistence. Schema is versioned in git. Tools are loaded per session. Events fire on specific triggers. Memory accumulates across sessions. The interplay between these four layers is what separates a configured tool from a designed system.
What You Already Built Without Naming It
If you’ve been using Claude Code for more than a month, you’ve built some version of these four layers. You just didn’t call it a harness.
Schema: Declarative Governance
Your CLAUDE.md is a schema. It declares identity, constraints, and behavioral rules. “Use conventional commits.” “Run the linter before completing.” “Specs go in docs/specs/.” These aren’t suggestions to the model — they’re governance. The model reads them as hard constraints.
Good schemas and good specs share the same structure: declarative, decision-rationale attached, scope pinned down. A CLAUDE.md that says “be helpful and concise” is documentation. A CLAUDE.md that says “never create files unless explicitly requested, prefer editing existing files, run npm run lint before any commit” is governance. One describes vibes; the other describes behavior you can test for.
Project-level CLAUDE.md overrides global. That’s the cascade. Global sets identity, project sets constraints. The composition model is CSS-like: specificity wins.
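A sketch of the cascade (file paths per my setup; verify against your Claude Code version):

```markdown
<!-- ~/.claude/CLAUDE.md — global: identity -->
Be concise. Use TypeScript. Run tests before committing.

<!-- ./CLAUDE.md — project root: constraints, wins on conflict -->
Use JavaScript here, not TypeScript. This repo predates the migration.
Specs go in docs/specs/. Run `npm run lint` before any commit.
```

The global file says who the agent is everywhere; the project file narrows that identity for one repo.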
Tools: The Simplicity Gradient
Skills are Claude Code’s tool layer. They’re callable, auto-discovered, and dispatched by description matching — the model reads the description field and decides whether a skill is relevant. No regex, no intent classification. Pure LLM reasoning as a router.
What makes this interesting is the simplicity gradient. You don’t need skills for everything. A curl command works. A shell script works. A git hook works. An MCP server works. A skill works. Each step up the gradient adds capability and complexity. The question isn’t “what’s most powerful?” It’s “what’s simplest that solves this?”
I wrote about this gradient in CLI Beats MCP — most of the time, a shell command is the right tool. Skills sit at the top of the gradient: maximum capability, maximum overhead. They make sense when you need LLM-aware dispatch, hot-reloading, and description-based routing. For everything else, pick the simplest option.
The dotfiles parallel holds here too. Your skills encode your methodology. Which tools you reach for, how you sequence them, what you consider done. That’s not configuration. That’s identity.
Events: The Nervous System
Hooks are the layer most people underestimate. Claude Code exposes a dozen or so lifecycle events — PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, SessionEnd, Stop, SubagentStop, PreCompact, and more. Most of mine hook into four: PreToolUse, PostToolUse, SessionStart, Stop. Those four change everything once you learn what they guarantee.
The key property: hooks execute with guaranteed reliability. They’re not prompts the model might follow. They’re code that runs. A PreToolUse hook that blocks rm -rf / will block it every time, regardless of how creative the model gets with its reasoning. A prompt that says “never delete the root directory” is probabilistic. A hook is deterministic.
This is the nervous system analogy. Your brain (the model) decides what to do. Your reflexes (the hooks) override the brain when safety matters. You don’t think about pulling your hand from a hot stove. The reflex fires before conscious thought arrives.
I covered the full hook system in the hooks guide. The short version: if you’re enforcing constraints through CLAUDE.md rules alone, you’re relying on the model’s compliance. If you’re enforcing them through hooks, you’re relying on code execution. One of these is reliable.
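Here is a minimal PreToolUse guard as a sketch. My understanding is that Claude Code pipes a JSON payload describing the pending tool call to the hook's stdin and treats exit code 2 as "block"; the `tool_name`/`tool_input` field names and the blocked patterns are assumptions to verify against the hooks documentation. The force-push rule is an example team policy, not anything built in.

```python
#!/usr/bin/env python3
"""PreToolUse guard (sketch): reject destructive Bash commands."""
import re
import sys

BLOCKED = [
    re.compile(r"\brm\s+-rf\s+/\s*$"),        # delete the filesystem root
    re.compile(r"\bgit\s+push\s+--force\b"),  # force-push (example policy)
]


def verdict(payload: dict) -> int:
    """Return 2 (block) or 0 (allow) for a PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED:
        if pattern.search(command):
            print(f"Blocked by hook: {command!r}", file=sys.stderr)
            return 2
    return 0

# In the real hook you would end with:
#   sys.exit(verdict(json.load(sys.stdin)))
```

Nothing about the model's reasoning can route around this. The verdict is computed by code, every time.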
Memory: Persistent Knowledge
CLAUDE.md persists identity. Memory files persist facts. The difference matters.
Your CLAUDE.md says “use conventional commits.” A memory file says “the last three deployments failed because of CSS import ordering in production builds.” One is a rule. The other is learned experience. Both survive between sessions, but they serve different functions.
This connects directly to the LLM Wiki problem. Karpathy argued that we need compiled, persistent knowledge bases — structured repositories that LLMs can read and update. Memory files are a primitive version of exactly this. Anyone maintaining a CLAUDE.md is hand-writing a proto-wiki page. Anyone maintaining memory files is building a knowledge base one session at a time.
The failure mode is also the same: staleness. Knowledge that was true last month isn’t necessarily true today. Memory without maintenance becomes misinformation.
Orchestration: The Emergent Layer
When the four layers interact, orchestration emerges. You don’t build orchestration directly. It falls out of the other layers working together.
Hydra is the clearest example. Six advisors, three peer reviewers, one chairman — an orchestrator-workers pattern where each participant brings different model characteristics. The schema defines roles. The tools define capabilities. The events coordinate handoffs. The memory accumulates review patterns. No single layer is the orchestrator. The orchestration is the interaction.
The Codex plugin demonstrates the same principle from a different angle. Different model, different blind spots, same harness. You can swap the model and keep the infrastructure. That’s the proof that the harness is real — it’s not model-dependent.
CLAUDE.md Is the New Terraform
Here’s the provocation: CLAUDE.md is to AI agents what Terraform is to infrastructure.
Declarative. Versioned. Reviewable. Composable. It describes a desired state, and the runtime figures out how to achieve it. You don’t imperatively tell Claude “first read the file, then check the linter, then run the tests.” You declare “always lint before committing, always run tests before completing” and the model figures out the execution order.
The parallel extends to lifecycle. Terraform has plan, apply, destroy. A harness has schema (plan), tools (apply), hooks (validate), memory (state). Terraform state files track what exists. Memory files track what happened. Both are persistence layers that make the declarative layer work.
But most CLAUDE.md files aren’t treated this way. They’re brain dumps. Random rules accumulated over weeks. “Be concise.” “Use tabs.” “Don’t create unnecessary files.” “Actually, spaces are fine.” “Always run tests.” “Sometimes skip tests for documentation changes.” Contradictions pile up. The model does its best.
The evolution has breakpoints. An empty CLAUDE.md is fine — the model uses defaults. Five lines (“be concise, use TypeScript”) is fine — clear constraints, no conflicts. Thirty lines is the first breakpoint. Rules start contradicting. You need sections, or the model will interpret ambiguities differently between sessions. Eighty lines is the second breakpoint. You need a separation of concerns — identity vs. constraints vs. workflows vs. tool configuration.
A harness-aware CLAUDE.md has structure. Section one: identity and voice. Section two: boundaries and constraints. Section three: tool preferences and workflows. Section four: memory strategy and file conventions. Each section maps to a harness layer. The schema describes the schema.
The moment it stops being config and becomes architecture: when you version it, review changes in PRs, and test the agent’s behavior against it. When a teammate opens a diff on your CLAUDE.md and asks “why did you remove the spec-first rule?” — that’s infrastructure review. That’s IaC.
Where the Abstraction Leaks
Is this overkill? Obviously. For most tasks, a simple CLAUDE.md and a few skills are all you need. The harness framing becomes useful when things break, and things break in predictable ways.
Schema conflicts. Ambiguous rules get interpreted differently between sessions. “Prefer simplicity” means one thing when generating a React component and something else when writing a database migration. Two sessions, same CLAUDE.md, different behavior. The fix is specificity — but over-specified schemas become brittle.
Tool explosion. I had 245 skills and Claude was drowning. Discovery becomes a bottleneck when every task matches fifteen skill descriptions. The wrong skill fires. The right skill doesn’t fire because a more generic one matched first. The fix is curation, not accumulation — but knowing what to cut requires understanding what the model actually uses.
Memory rot. Persistent files go stale. A memory entry from January says “the API returns XML.” The API switched to JSON in March. The model reads the memory file, generates XML parsing code, and you spend twenty minutes debugging. The fix is maintenance — but who maintains memory files? The same problem Karpathy identified for the LLM Wiki: knowledge without curation decays into noise.
The over-designed harness. Forty hooks, two hundred skills, memory files for everything. The system constrains more than it enables. The model spends more tokens navigating the infrastructure than doing the work. Configuration becomes a second codebase.
No standard. Every developer’s harness is incompatible. My CLAUDE.md conventions don’t transfer to yours. My skill naming conflicts with your skill naming. There’s no package.json for harnesses, no shared schema, no interop layer.
These aren’t hypothetical problems. I’ve hit all five. The fact that we have these problems means the architecture is real. You don’t get schema conflicts in a system that doesn’t have a schema. You don’t get memory rot in a system that doesn’t have memory. The failure modes prove the abstraction.
What Those Lines Do
Three lines to fifty, with more pending. Three rules to four layers with orchestration emergent on top.
The CLAUDE.md started as a note to the model. It became a schema. The skills started as shortcuts. They became a tool layer. The hooks started as safety checks. They became a nervous system. The memory files started as reminders. They became a knowledge base.
None of this was planned. All of it was designed — just retroactively, by recognizing what the pieces were becoming and giving them the structure they needed.
The shift is from telling the model what to do to building the system that shapes how it reasons. Last week a PreToolUse hook fired in a fresh clone I had never opened; the harness reassembled itself and the session was productive within a minute. That is what those fifty lines of CLAUDE.md actually do — they hand the next session a working environment before it asks a single question.