Tap Notes: The Wrong Principal

Several items this week converge on the same quiet failure: the actor the system was designed for isn’t the one actually using it. Benchmarks built for human task completion are now proxying agent capability. Design tools built for human workflows are now serving agents. System prompts optimized to not interrupt a busy human are now governing agents that can act confidently on stale data. The design assumptions don’t transfer, and nobody’s updating the spec.


Changes in Claude’s System Prompt Between Opus 4.6 and 4.7

Simon Willison diffs the Claude Opus system prompts across versions, surfacing three distinct developments: a tool_search requirement before calling deferred tools, an acting_vs_clarifying directive telling the model to exhaust autonomous resolution before asking questions, and formalized conversation-level state that carries safety inferences across turns.

The acting_vs_clarifying directive is the one worth sitting with. The philosophy is coherent — don’t interrupt people with unnecessary questions — but the prompt doesn’t acknowledge the failure mode it introduces. When tools return stale, ambiguous, or partial data, “resolve autonomously” produces a confident wrong answer. That’s harder to recover from than one clarifying question. This is an optimization for the common case that silently worsens the tail. The buried announcement: Claude Cowork is formalizing Claude-as-tool-within-Claude with Chrome, Excel, and PowerPoint as callable sub-agents. That’s a product-level multi-agent architecture decision, dropped inside a system prompt diff.
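
To make the tail risk concrete, here is a hypothetical sketch of the tradeoff the directive encodes. Nothing below is from the actual system prompt or any Anthropic code; ToolResult, respond, and the confidence threshold are illustrative stand-ins.

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    answer: str
    confidence: float  # how fresh, complete, and unambiguous the data looks


def respond(result: ToolResult, act_threshold: float = 0.5) -> str:
    # Lowering act_threshold is the "resolve autonomously" knob: fewer
    # interruptions in the common case, more confident wrong answers when
    # the tool output is stale or partial.
    if result.confidence >= act_threshold:
        return f"Acting on: {result.answer}"
    return "One clarifying question before acting on this."


if __name__ == "__main__":
    print(respond(ToolResult("Q3 numbers (cached, six weeks old)", confidence=0.55)))
    print(respond(ToolResult("Q3 numbers (verified today)", confidence=0.95)))
```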

“The prompt optimizes for the common case while potentially worsening the tail: acted on bad tool output, now we’re in a correction loop.”

Thoughts and Feelings Around Claude Design

Sam Henri Gold writes about Claude Design — Anthropic’s AI-native design tool — and what it implies for Figma: “left in an odd spot: holding a largely manual, pre-agentic system that nobody in their right mind would design from scratch today.”

The claim underneath is structural, not aesthetic. Figma’s manually systematized abstractions — component props, variable aliasing, mode overrides — were built to help humans be rigorous. When agents are the primary actor, “rigorous abstractions for humans” becomes friction. The post describes a bidirectional Claude Design ↔ Claude Code flow where code is the source of truth and design derives from it. That inverts thirty years of how the industry has thought about the design-to-code relationship. Worth noting: the tools that feel most solid right now are the ones that’ll feel most brittle in two years, because they were designed assuming the human is in the loop.


The Tool Is No Longer the Interface. The Conversation Is.

Chris Lema on what happens to CMSs when agents get MCP connections: the CMS was already an abstraction sitting between your people and raw database tables. The AI with MCP connections is a new abstraction sitting between your people and the CMS.

The useful move here is naming where in the chain things break. Abstraction layers don’t disappear — they accumulate, and the question is which one becomes the bottleneck. Manual workflows designed for humans don’t vanish when agents arrive; they either get automated or they become the thing everything else waits on. “The conversation is the interface” is a better frame than “AI replaces the CMS” because it locates the break precisely: at the operational layer, not the storage layer.


Trustworthy Benchmarks, Continued — Berkeley RDI

Berkeley’s Center for Responsible, Decentralized Intelligence documents systematic vulnerabilities in agent evaluation benchmarks — SWE-bench, WebArena, OSWorld, GAIA. The finding that stopped me: FieldWorkArena’s validate() function imports llm_fuzzy_match and never calls it. The answer-comparison function is dead code. 890 tasks shipped with a validator that ignores answer content entirely.

Everyone citing FieldWorkArena results was measuring something. Not what they thought. The broader problem: benchmarks designed to measure careful human task completion are now the primary way we claim agent capability. When the validator is broken, when tasks can be reward-hacked, when prompts inject into the eval harness itself — capability claims float free of any anchor. It’s not that agents are less capable than claimed. It’s that we don’t have the instrumentation to know.
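
This is not the actual FieldWorkArena code, just a minimal sketch of the bug’s shape: an answer-comparison helper that exists but is never called, so the validator passes any non-empty answer.

```python
def llm_fuzzy_match(answer: str, reference: str) -> bool:
    """Stand-in for the answer-comparison helper that was imported but never called."""
    return answer.strip().lower() == reference.strip().lower()


def validate(task: dict, answer: str) -> bool:
    # llm_fuzzy_match is available but never invoked here, so the check
    # passes on any non-empty answer regardless of its content.
    return bool(answer) and task.get("status") == "done"


if __name__ == "__main__":
    task = {"status": "done", "reference": "Osaka"}
    print(validate(task, "Osaka"))          # True
    print(validate(task, "total garbage"))  # also True
```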

“890 tasks shipped with a validator that ignores answer content entirely. Everyone citing those results was measuring something. Not what they thought.”

A GitHub Issue Title Compromised 4,000 Developer Machines

A malicious actor filed a GitHub issue with a title containing a prompt injection payload. Cline read it, executed it, and installed a malicious npm package — which then compromised developer credentials via a poisoned npm cache. Four thousand machines.

The sentence that sticks isn’t about the injection or the cache poisoning. It’s “deleted the wrong token.” A developer reaching into a credential vault under pressure and pulling the wrong one out. The attack succeeded not just because an agent executed injected instructions, but because the human response to a credential breach is its own high-stakes credential operation. The agent failure mode created the conditions for the human failure mode. Both matter. Building agents that touch credential stores without isolation between identity contexts is the architectural choice that made this possible — the injection was the trigger, not the root cause.


Claude Skills: The Apify AI Intern

Chris Lema builds an “Expert Profiler” skill: four Apify actors running in parallel, gracefully falling back when one fails, then synthesizing results into a prepared meeting dossier. The framing he lands on: “what pantry do I need to stock for this specific recipe?”

The recipe-and-pantry frame is doing more work than it looks like. It separates the judgment layer (what to produce, when to produce it) from the resource layer (what tools exist to produce it), and treats graceful partial failure as a design constraint rather than an edge case. Parallel execution with fallback recovery is the infrastructure pattern that makes autonomous agent work reliable at scale — not a nice-to-have. You can describe the capability in one sentence. Actually building it means rethinking the architecture from the pantry up.
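
Here is a sketch of that parallel-with-fallback pattern, assuming asyncio and placeholder fetchers; fetch_source and build_dossier are illustrative names, not Apify’s client API or Lema’s skill.

```python
import asyncio


async def fetch_source(source: str, subject: str) -> dict:
    """Stand-in for one actor call; the "news" source simulates a failure."""
    await asyncio.sleep(0.1)
    if source == "news":
        raise RuntimeError(f"{source} actor timed out")
    return {"source": source, "facts": [f"{subject} via {source}"]}


async def build_dossier(subject: str) -> dict:
    sources = ["web", "linkedin", "news", "social"]
    results = await asyncio.gather(
        *(fetch_source(s, subject) for s in sources),
        return_exceptions=True,  # partial failure is expected, not fatal
    )
    ok = [r for r in results if isinstance(r, dict)]
    failed = [s for s, r in zip(sources, results) if isinstance(r, Exception)]
    # Synthesize whatever came back, and note what is missing instead of aborting.
    return {
        "subject": subject,
        "facts": [f for r in ok for f in r["facts"]],
        "missing": failed,
    }


if __name__ == "__main__":
    print(asyncio.run(build_dossier("Jane Doe")))
```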


🪨