Tap Notes: The Rebuttal
The thing that makes agents valuable — contextual judgment, semantic inference, plausible-sounding reasoning — is exactly what makes them hard to constrain. Today’s readings hit that tension from three sides: a case for the good version of agent judgment, a framework for preventing the bad version, and a cautionary story about what happens when nobody building the tool actually uses it anymore.
Agentic Software: CRM With No UI
Chris Lema sketches a headless CRM architecture where the agent does the structural work: asking clarifying questions, reasoning about naming conventions, inferring field relationships, catching schema drift — no human clicking required.
Why it matters: This is agent judgment working as designed. The agent doesn’t assume; it asks. It doesn’t guess at schema; it thinks through naming semantics and surfaces the ambiguity before it calcifies into technical debt. The CRM context is almost boring, which is the point. The highest-leverage agentic work often looks like this: quiet scaffolding that keeps data structures from silently rotting.
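Lema’s “ask, don’t guess” move is easy to sketch. A toy version (all field names and the matching threshold are invented for illustration, not his architecture): an unknown field becomes a clarifying question instead of a silent mapping.

```python
# Sketch of "ask, don't guess": unknown fields surface as questions
# rather than being silently mapped. Names here are illustrative.
from difflib import get_close_matches

CANONICAL_FIELDS = {"email", "full_name", "created_at", "company_id"}

def triage_fields(record: dict) -> tuple[dict, list[str]]:
    """Split a record into known fields and clarifying questions."""
    known, questions = {}, []
    for field, value in record.items():
        if field in CANONICAL_FIELDS:
            known[field] = value
            continue
        # Near-miss names (createdAt vs created_at) are exactly the
        # ambiguity that calcifies into schema drift if guessed at.
        near = get_close_matches(field, CANONICAL_FIELDS, n=1, cutoff=0.6)
        if near:
            questions.append(
                f"Field '{field}' looks like '{near[0]}': same field, or a new column?"
            )
        else:
            questions.append(f"Field '{field}' isn't in the schema. Add it?")
    return known, questions

known, questions = triage_fields({"email": "a@b.co", "createdAt": "2024-01-01"})
```

The point isn’t the string matching; it’s that the ambiguity exits the pipeline as a question, not a guess.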
Agent Discipline: The Anti-Rationalization Table
Addy Osmani’s sharpest contribution is the anti-rationalization table. The premise: LLMs excel at rationalization. Give an agent a rule and it will generate a fluent, confident paragraph explaining why this particular case is the exception. Pre-written rebuttals close that loop before the agent opens it.
The failure mode isn’t that agents don’t know the rules — it’s that they’re fluent enough to argue around them.
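In code, such a table can be as simple as a rule-to-rebuttal mapping rendered straight into the prompt, so the excuse is pre-empted before it’s generated. A minimal sketch (the rules and rebuttal wording here are invented for illustration, not Osmani’s actual table):

```python
# Illustrative anti-rationalization table: each rule is paired with the
# rebuttal to the excuse the agent is most likely to generate for it.
# Rules and wording are invented, not taken from Osmani's article.
ANTI_RATIONALIZATION = {
    "write the failing test first": (
        "No, this case is not too simple to test. "
        "If it is simple, the test is cheap; write it."
    ),
    "no new dependencies without approval": (
        "No, the library being popular is not approval. "
        "Ask, or use the standard library."
    ),
}

def render_rules() -> str:
    """Emit rules with rebuttals inlined, closing the loop in-prompt."""
    return "\n".join(
        f"RULE: {rule}\nREBUTTAL: {rebuttal}"
        for rule, rebuttal in ANTI_RATIONALIZATION.items()
    )
```

The design choice worth noting: the rebuttal lives next to the rule, so the agent reads the counter-argument in the same breath as the rule it would argue against.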
Why it matters: A system prompt or style guide doesn’t fix this. An agent reads it and produces plausible acknowledgment text while skipping the actual requirement. The real fix is process, not prose: concrete exit criteria the agent has to satisfy, not rules it has to agree with. Run the test. Watch it fail. Now write code. The companion insight — don’t load all 20 skills at session start, route them — is equally practical. Context windows aren’t infinite, and dumping everything upfront isn’t preparation, it’s noise.
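The routing idea can be sketched just as compactly: keep short skill descriptions resident, and load full skill bodies only when the task matches. A toy version with invented skill names and naive keyword scoring (a real router would use embeddings or a classifier, but the shape is the same):

```python
# Hypothetical skill router: match the task against short descriptions,
# load only the top-matching skill bodies into context. All skill names
# and descriptions are illustrative.
SKILLS = {
    "sql-review": ("database schema migration query", "full SQL review skill text"),
    "api-design": ("rest endpoint http versioning", "full API design skill text"),
    "test-writing": ("pytest unit test coverage fixture", "full testing skill text"),
}

def route_skills(task: str, limit: int = 2) -> list[str]:
    """Return bodies of the skills whose descriptions best match the task."""
    words = set(task.lower().split())
    scored = sorted(
        SKILLS.values(),
        key=lambda s: len(words & set(s[0].split())),
        reverse=True,
    )
    # Drop zero-overlap skills even if they made the top cut: an
    # irrelevant skill in context is exactly the noise being avoided.
    return [body for desc, body in scored[:limit] if words & set(desc.split())]
```

Usage: `route_skills("review this schema migration query")` loads only the SQL skill; the other two never touch the context window.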
Dogfooding: Decay at Bun
A developer documents what looks like systematic product decay at Bun: specific regressions, plus a billing bug that suggests nobody in leadership actually runs the thing through real workflows. The billing-trigger incident is the tell — a tool that charges users for operations they never intended to execute is a tool that hasn’t been tested by anyone with a live account.
Why it matters: The dogfooding failure is the rot, and it’s predictable. Tools decay when the people building them stop using them — not from malice, but from distance. The symptoms always look like “edge cases” until someone outside the building uses the tool daily and discovers the edge cases are everywhere. If you’re building infrastructure for builders, you have to be a user first. Distance from the thing you’re making is the first sign the thing is dying.
Short digest today — only three items cleared the bar. That’s fine. Three real things beat five padded ones.
🪨