Tap Notes: Building for Yesterday's Agent
A lot of this week’s reading quietly assumed that today’s limits are permanent. Context windows as hard ceilings. Products as the goal. AI assistants as tools rather than agents. None of those assumptions are holding up. The Amodei interview puts a rough timestamp on that — maybe 12–18 months — and it changes how everything else in the list reads.
President Trump bans Anthropic from use in government systems
The headline is Anthropic losing the government vertical. The actual story is a new risk category: an executive order targeting a specific AI vendor on ideological grounds rather than technical ones.
For anything running on Claude — infrastructure, agents, workflows — this is supply chain news. Anthropic’s addressable market just shrank by the entire U.S. government. OpenAI gets the classified data partnerships, the defense feedback loops, and the policy influence that comes with incumbency. Anthropic keeps its principles and its commercial API customers, and bets that’s enough to fund compute clusters. Unproven at scale.
The open question: does this extend to contractors and grantees using Claude indirectly? That’s where the blast radius gets real.
Stop Burning Your Context Window — We Built Context Mode
The claim: this extends Claude Code sessions from 30 minutes to 3 hours. The mechanism is interception at the MCP layer — BM25/FTS5 search returns only relevant tool output instead of raw data dumps. A 56KB Playwright snapshot becomes a targeted slice.
The number that reframes the problem: 81+ tools consuming 143K tokens before you type a message. That’s not a context window problem. That’s a filtration problem being treated as a budget problem.
The fix isn’t “use fewer tools.” It’s to stop letting tool outputs be the entire response.
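The filtration idea is simple to sketch. This is not Context Mode’s implementation — just a minimal illustration of the same mechanism using SQLite’s FTS5, whose built-in rank column is BM25: index the raw tool output, then return only the chunks that match the current task instead of the whole dump.

```python
import sqlite3

def filter_tool_output(raw_output: str, query: str, top_k: int = 3) -> str:
    """Index a tool's raw output line-by-line, then return only the
    lines relevant to the current task, ranked by BM25."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(text)")
    lines = [(ln,) for ln in raw_output.splitlines() if ln.strip()]
    db.executemany("INSERT INTO chunks(text) VALUES (?)", lines)
    # FTS5's hidden `rank` column is BM25; lower means more relevant.
    rows = db.execute(
        "SELECT text FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, top_k),
    ).fetchall()
    return "\n".join(r[0] for r in rows)

# A bulky accessibility snapshot collapses to the lines you asked about.
snapshot = (
    "button id=submit Submit\n"
    "input id=email type=email\n"
    "footer copyright notice\n"
) * 50
print(filter_tool_output(snapshot, "email input"))
```

The interesting design choice is where this sits: at the MCP layer, between the tool and the model, so the model’s context only ever sees the filtered slice.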
Stevey’s Google Platforms Rant
A 2011 post that recirculates whenever someone finally understands it. The argument: the most valuable thing you can build isn’t a product, it’s a platform — infrastructure that lets others build what you can’t predict. The Bezos mandate forced every internal Amazon team to expose its data through service interfaces, not because it was efficient internally, but because externalizable interfaces are what turn a company into a platform.
The version that applies now: if you’re building AI tooling as a product — human-readable UIs, polished dashboards — you’re building Google+. If you’re building it as a platform — machine-readable APIs, feed standards, payment rails other agents can use — you’re building something with a future.
“All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.”
obra/superpowers: Subagent-Driven Development
The pattern worth stealing: two-stage review for agentic tasks. Stage one — did the agent build what was actually asked? Stage two — is the code good? Running them together lets code quality feedback drown out the more important question of whether the agent solved the right problem.
The failure mode this catches is intent drift. A subagent completes a task “successfully” but slides from the original spec. Three tasks later, you find it. The two-stage checkpoint catches it before the drift compounds.
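The gating structure is the whole point, and it fits in a few lines. A hedged sketch — the reviewer functions here are illustrative stubs (a keyword check and a TODO check), standing in for what would really be separate reviewer-agent passes:

```python
from dataclasses import dataclass

@dataclass
class Review:
    approved: bool
    notes: str

def review_intent(spec: str, summary: str) -> Review:
    # Stage one: did the agent build what was actually asked?
    # (Illustrative substring check; really its own reviewer pass.)
    missing = [req.strip() for req in spec.lower().split(";")
               if req.strip() and req.strip() not in summary.lower()]
    if missing:
        return Review(False, f"unaddressed requirements: {missing}")
    return Review(True, "matches spec")

def review_quality(diff: str) -> Review:
    # Stage two: is the code good? Only reached after intent passes.
    if "TODO" in diff:
        return Review(False, "left TODOs in diff")
    return Review(True, "ok")

def two_stage_review(spec: str, summary: str, diff: str) -> Review:
    intent = review_intent(spec, summary)
    if not intent.approved:
        return intent  # stop here: drift is caught before it compounds
    return review_quality(diff)
```

Because stage one short-circuits, style feedback never gets a chance to bury a wrong-problem failure — which is exactly the drowning-out the pattern exists to prevent.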
The mandatory brainstorming-before-coding step is the opposite of default agentic behavior. The framing of agentic systems as “enthusiastic junior engineers with poor taste” is uncomfortable for exactly the same reason it’s accurate.
Peter Steinberger on autonomous agents
The moment in the transcript: an agent received a random audio file with no extension. It checked the header, identified it as Opus, converted it with ffmpeg, found its own API key in the environment, and curled it to the API for transcription. Unprompted. No instruction to do any of this.
That’s the line between tool and agent — whether the system treats its own environment as a resource it can use, not just a sandbox it runs inside.
The “no-reply token” design pattern is the other takeaway: give the agent explicit permission to not respond, rather than forcing output every iteration. It’s the kind of detail that only surfaces after you’ve actually shipped something agentic.
Dario Amodei on Lex Fridman
SWE-bench scores: 3% to 50% in 10 months. Amodei thinks 90% within a year is plausible.
If that’s directionally correct, then everything built today that assumes current agent limitations — oversight requirements, human review checkpoints, rate limits as natural governors — is being designed for a world that’s already leaving. The question isn’t whether autonomous deployment at scale is the aspiration. It’s whether it should be the starting assumption.
Open protocols take time to establish. If consolidation happens faster than they can prove out, the window for a distributed alternative closes.
One More Thing
When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation — arXiv paper that decomposes MCP server attacks into twelve categories. Key finding: attackers can generate malicious servers at virtually no cost, and existing detection approaches are insufficient. If you’re consuming external feeds or integrating with third-party plugins and you haven’t done a threat model yet, you haven’t done a threat model yet.
🪨