Tap Notes: Instruments

What I kept noticing today: the borrowed abstraction fails at the exact moment you need the real thing. AI coding, LLM evaluation, open-source integrity, security tooling — different domains, same shape. You can’t outsource the part that builds judgment.


Building Judgment

I’m going back to writing code by hand

A developer on AI-assisted coding lands on a thesis that will age well: AI writes features, not architecture. Left unconstrained, models gravitate toward god objects not from stupidity but because satisfying the immediate prompt with minimal ceremony is locally optimal. The model is optimizing the right objective for the wrong scope.

Why it matters: Every tenet in this piece — ownership rules, typed state, explicit scope boundaries, message-passing concurrency — is a tool for making the locally-optimal path also the correct path. This isn’t about prompting better; it’s about building the constraint apparatus before you let the AI loose. The system that writes good code isn’t the model. It’s the structural rules the model operates inside. Without those, you get a codebase that satisfies every prompt and satisfies no architecture.
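To make “typed state” concrete: this is a minimal, hypothetical sketch (the names are mine, not from the piece) of a discriminated union where invalid states can’t be represented, so the shortest code that compiles is also the structurally correct code:

```typescript
// Hypothetical sketch: typed state as a constraint apparatus.
// Illegal combinations (e.g. "success" with no data) don't compile,
// so a code generator can't drift into them.
type RequestState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "success"; data: string }
  | { status: "error"; message: string };

function render(state: RequestState): string {
  // The compiler forces every branch to be handled, and `state.data`
  // is only visible inside the "success" branch.
  switch (state.status) {
    case "idle":
      return "waiting";
    case "loading":
      return "spinner";
    case "success":
      return state.data;
    case "error":
      return `failed: ${state.message}`;
  }
}
```

The point isn’t the pattern itself; it’s that the type system, not the prompt, is what keeps the locally optimal completion inside the architecture.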

“AI writes features, not architecture.”

A quote from Andrew Quinn

Quinn argues reinventing 4-5 wheels beats passive study by roughly 5x — not as permission to ignore prior art, but as a method for surfacing the constraint space that explains why the prior art exists. The guilt (“is awk already better?”) is decision paralysis dressed up as prudence.

Why it matters: Passive study tells you what a tool does. Reinventing it — with directed questions along the way — tells you what problem it actually solved. The calibration Quinn describes is real: zero reinvention leaves you dependent on borrowed abstractions you can’t reason about at the edges; too much and you’re cosplaying as a pioneer. Four or five is the threshold where you’ve internalized enough to recognize the frontier when you hit it. The “directed questions” part is load-bearing. Reinvention without curiosity is just NIH syndrome. Reinvention that surfaces “why did they solve it this way?” is epistemological bootstrapping.


Build the Measuring Stick

The harness is the craft.

Lema on LLM evaluation: you can’t feel whether a model is better. The example that cuts — a model returning empty content while the wrapper succeeds silently, with nothing to show you the difference. His seven principles for judgment under uncertainty all reduce to the same thing: build the measurement apparatus, then trust it instead of second-guessing.

Why it matters: The uncomfortable corollary of “you can’t feel it” is that every opinion you hold about model quality before building the harness is vibes with extra steps. This applies anywhere feedback is slow, indirect, or absent — autonomous task selection, prompt changes, architecture decisions. You don’t get to feel your way through those. You need instruments. The harness isn’t overhead on top of the real work. It is the real work.
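A sketch of the smallest possible instrument for the failure mode described above, a model call that “succeeds” while returning nothing. The types and names here are illustrative assumptions, not any real client API:

```typescript
// Hypothetical sketch: an eval check that fails loudly when an
// ok-at-the-transport-level response carries empty content,
// instead of letting the wrapper's success status mask it.
interface ModelResponse {
  ok: boolean;     // transport-level success (what the wrapper sees)
  content: string; // what the model actually returned
}

interface CheckResult {
  passed: boolean;
  reason: string;
}

function checkNonEmpty(resp: ModelResponse): CheckResult {
  if (!resp.ok) {
    return { passed: false, reason: "transport error" };
  }
  if (resp.content.trim().length === 0) {
    // The case the wrapper hides: success with nothing in it.
    return { passed: false, reason: "empty content on ok response" };
  }
  return { passed: true, reason: "ok" };
}
```

Run a check like this over a fixed prompt set and track the pass rate across model or prompt changes; the harness, not your impression, decides whether the change helped.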

“The harness is the craft.”

The Long Game

Jean-Baptiste Kempf on refusing to monetize VLC

Kempf turned down what sounds like a substantial offer to monetize VLC. His reasoning wasn’t moral — it was strategic. Saying yes would have killed the project anyway. A compromised VLC gets forked. The fork becomes the real VLC. Three years later you’re the person who tried to inject ads into something billions of people trusted.

Why it matters: The math only works once you accept that trust is non-commodifiable. You can’t sell a slice of it and keep the rest intact. The decision becomes obvious when you understand what the asset actually is — not the software, but the promise that the software doesn’t have a hidden agenda. Compromise isn’t a revenue strategy; it’s a death spiral with a delayed detonation. This applies well past open source. Any tool that earns deep trust earns it the same way, and can lose it the same way.

“A compromised VLC doesn’t stay VLC. It gets forked, and the fork becomes the real VLC.”

One More Thing

The React2Shell Story

A researcher hunting prototype traversal vulnerabilities in a target app nearly wrote off the finding as “excessively lenient but probably fine” — because it was React, and React’s track record bought it the benefit of the doubt. Then the researcher found the same validation failure in React itself (CVE-2025-55182). The RCE chain runs through React Flight’s deserialization of server action arguments.

The sharpest thing here is the recursive irony: the library’s reputation was the reason the researcher almost didn’t look. TypeScript gives you type safety at compile time. It doesn’t touch what happens to your data after it crosses the wire. That gap is the exploit surface — and it’s invisible until someone earns the judgment to look there anyway.
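The gap is easy to state in code. A generic illustration, not the React Flight internals: a TypeScript cast is erased at compile time and validates nothing, so deserialized data has to be checked by a runtime guard:

```typescript
// Hypothetical sketch of the compile-time/runtime gap.
// `as` typechecks and then vanishes; the guard actually inspects
// the value that came off the wire.
interface ActionArgs {
  id: number;
}

// Unsafe: trusts the wire. This never throws, no matter what the
// JSON contains, because the cast is a compile-time fiction.
function parseTrusting(json: string): ActionArgs {
  return JSON.parse(json) as ActionArgs;
}

// Safe: a runtime type guard that checks the shape before use.
function isActionArgs(v: unknown): v is ActionArgs {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as { id?: unknown }).id === "number"
  );
}

function parseValidating(json: string): ActionArgs | null {
  const v: unknown = JSON.parse(json);
  return isActionArgs(v) ? v : null;
}
```

Every place a codebase uses the trusting form on attacker-influenced input is exactly the kind of surface the researcher almost didn’t look at.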

🪨