A closed loop

Earlier today I published a self-audit of a CLAUDE.md that merged into Anthropic’s MCP servers repo. Schliff, my own scorer, returned 59.2/100. In that post I flagged a scorer blindspot: two visibly imperative lines — Package manager: uv (not pip) and Not accepted: new server implementations — were silently uncounted as actionable.

Writing that paragraph honestly is one thing. Fixing it is the other.

The pattern was missing list markers

Schliff’s _RE_ACTIONABLE_LINES matched imperatives only at the very start of a line, or behind a numbered prefix (1. Run X). Markdown bullets (- Run X, * Use Y, + Install Z) fell through. Three sibling patterns, used by the clarity, diff, and coherence scorers, had the same bug. It had been copy-pasted, so it lived in four places.

Before:

^(?:\d+\.\s*)?(?:Read|Run|Check|…)\b

After:

^(?:\d+\.\s*|[-*+]\s+)?(?:Read|Run|Check|…)\b

One alternation added, extracted into a shared _LIST_MARKER constant, and applied to all four affected regexes. The full PR is linked at the end. Test suite after: 1017 passed, up from 1007.

The impact, re-measured

Re-running schliff on the exact same CLAUDE.md that was merged to modelcontextprotocol/servers:

Dimension | Before | After
efficiency | 57 | 64 (+7)
composite | 59.2 | 61.0 (+1.8)

The first post was imprecise

The two lines the fix actually caught were not the ones I named earlier today. The lines I named — Package manager: uv (not pip) and Not accepted: new server implementations — are still uncounted, because neither starts with an imperative verb.

What the fix caught were two different lines in the same file: - Build: tsc (target ES2022, module Node16, strict mode) and - Build system: hatchling (uv build). Both genuinely start with “Build” behind a markdown bullet. Both were silently zero under the old pattern.
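To check this line by line, here is a small Python sketch that runs an old-style and a new-style pattern over the four lines discussed in this section. The verb list is a hypothetical subset that includes Build. The real list is elided in the post, so this only approximates schliff's behavior.

```python
import re

# Hypothetical verb subset; "Build" must be present for the two caught lines to match.
VERBS = r"(?:Read|Run|Check|Build)\b"

OLD = re.compile(r"^(?:\d+\.\s*)?" + VERBS)          # numbered prefix only
NEW = re.compile(r"^(?:\d+\.\s*|[-*+]\s+)?" + VERBS)  # numbered prefix or bullet

lines = [
    "- Build: tsc (target ES2022, module Node16, strict mode)",
    "- Build system: hatchling (uv build)",
    "Package manager: uv (not pip)",
    "Not accepted: new server implementations",
]

# re.match anchors at the start of each string, so every line is tested on its own.
results = {line: (bool(OLD.match(line)), bool(NEW.match(line))) for line in lines}

for line, (old_hit, new_hit) in results.items():
    print(f"old={old_hit} new={new_hit}  {line}")
```

The two bulleted Build lines flip from False to True. The two declarative lines stay False under both patterns, because neither starts with a listed verb.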

So: the blindspot I wrote about was narrower than I realized. There was a second blindspot, wider, adjacent to the first, and the fix addressed that one. The original blindspot — declarative prescriptions like Package manager: uv (not pip) — remains. That is a different regex. Different PR.

What a feedback loop buys you

Identifying a bug and leaving it in the issue tracker is half a feedback loop. Fixing it the same day is the whole one.

59.2 → 61.0 does not change anyone’s opinion of the merged file. It changes my opinion of the scorer.

The first post ended on one rule: if a scorer you wrote returns a kind number on work you shipped, you built the wrong scorer. The continuation follows directly. If it returns the same number after you find a real gap, you still built the wrong scorer. The score is supposed to change when the tool changes. That is the loop.


Related: the original self-audit, schliff PR #29, schliff on GitHub.