Tap Notes: Building the Infrastructure Behind the Agents
The articles that caught my attention this week weren’t the flashy product launches or bold predictions—they were the infrastructure pieces. Security audits of MCP servers. Cost optimization layers for model routing. Browser-based inference that actually works. These are the unglamorous components that determine whether agent systems move from demos to production.
MCP Security Research: Tool Poisoning and Permission Boundaries
A security audit of the Model Context Protocol uncovered tool poisoning attacks, cross-server exploitation vectors, and permission management gaps. The research demonstrates how an attacker could manipulate tool definitions to extract sensitive data or execute unauthorized actions across MCP servers.
Why it matters: If you’re building with MCP (and many of us are), these aren’t theoretical risks. The recommended patterns around permission scoping, tool validation, and cross-server isolation are architectural decisions you need to make now, not after an incident. The paper also highlights a broader challenge: as agent tooling matures, the attack surface expands in ways traditional application security doesn’t cover.
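One concrete mitigation implied by the tool-validation advice is pinning tool definitions at approval time, so a server can't silently rewrite a tool's description or schema after you've trusted it (the "rug pull" variant of tool poisoning). A minimal sketch, not the paper's code; the `PinnedToolRegistry` name and the `inputSchema` field shape are assumptions:

```python
import hashlib
import json

def tool_fingerprint(tool: dict) -> str:
    """Stable hash over the fields an attacker would mutate in a
    tool-poisoning attack: name, description, and input schema."""
    canonical = json.dumps(
        {k: tool.get(k) for k in ("name", "description", "inputSchema")},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

class PinnedToolRegistry:
    """Pin tool definitions at approval time; reject any call whose
    definition no longer matches what the user originally approved."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def approve(self, tool: dict) -> None:
        self._pins[tool["name"]] = tool_fingerprint(tool)

    def check(self, tool: dict) -> bool:
        pinned = self._pins.get(tool["name"])
        return pinned is not None and pinned == tool_fingerprint(tool)
```

A registry like this catches definition drift, but it doesn't address cross-server exploitation on its own; that still needs per-server isolation and permission scoping.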
WebLLM: 7B Parameter Models Running Entirely in Your Browser
A demo built on Google Health Connect data showed a 7B parameter LLM running directly in the browser using WebGPU acceleration, with no server-side inference. The initial download is ~4GB, but subsequent runs load from cache almost instantly. The use case: analyzing personal health data without sending it to a remote API.
Why it matters: This shifts the privacy calculus for agent systems. Local inference has always been possible, but the performance gap made it impractical for most use cases. WebGPU changes that: if you can tolerate the initial download and model size constraints, you get inference with no network round-trips and no data exfiltration risk. Particularly relevant for tools handling sensitive user data or operating in regulated environments.
Komilion: Intelligent Model Routing Based on Task Complexity
A new service that routes API calls to the cheapest model capable of handling each task. Uses a hybrid approach: fast-path rules for obvious cases (formatting, simple queries → Haiku) and an LLM classifier for ambiguous requests. Transparent pricing, flat-rate billing options.
Why it matters: Model cost optimization is usually manual work: developers pick a model tier and hope it generalizes. Intelligent routing automates that decision and exposes it as a single API endpoint. The economic impact compounds quickly: if 60% of your traffic can run on Haiku instead of Opus, those routed calls cost over 90% less per token, cutting total spend by well over half. This is infrastructure that pays for itself.
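The hybrid approach described above, fast-path rules first and a classifier only for ambiguous requests, can be sketched in a few lines. The regexes, model names, and the classifier stub are illustrative assumptions, not Komilion's actual routing logic:

```python
import re

# Fast-path rules: obvious patterns map straight to a model tier.
FAST_PATH = [
    (re.compile(r"^(format|extract|summarize in one line)\b", re.I), "haiku"),
    (re.compile(r"\b(prove|architect|multi-step plan)\b", re.I), "opus"),
]

def route(prompt: str, classify=None) -> str:
    """Route a request to the cheapest capable model.

    Cheap regex rules handle the unambiguous cases with zero added
    latency; anything else falls through to an LLM classifier
    (stubbed here as an optional callable)."""
    for pattern, model in FAST_PATH:
        if pattern.search(prompt):
            return model
    # Ambiguous: defer to a small classifier call, or a safe default.
    return classify(prompt) if classify else "sonnet"
```

The design point is that the fast path costs nothing, so the classifier (itself a cheap model call) only runs on the residual traffic where the extra hop is worth it.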
Cloudflare’s Markdown for Agents: Direct Content Negotiation
Cloudflare now serves clean markdown versions of web pages when the Accept: text/markdown header is present. No scraping API required—just standard HTTP content negotiation.
Why it matters: This eliminates an entire category of infrastructure. Instead of running Firecrawl or a custom scraper to convert HTML to markdown, you just ask for markdown. The cost and complexity reduction is immediate: fewer tokens sent to the LLM, no parsing errors, no rate limits on a third-party scraping service. It also sets a precedent—if content negotiation becomes the standard way for agents to consume web content, it changes how we think about publishing for machine readers.
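The entire integration reduces to one request header. A minimal sketch using Python's standard library (the URL is a placeholder, and whether a given origin honors the header depends on the site being behind this Cloudflare feature):

```python
import urllib.request

def markdown_request(url: str) -> urllib.request.Request:
    """Standard HTTP content negotiation: ask the origin for
    markdown directly instead of scraping and converting HTML."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})

# Usage (network call, shown for illustration):
# with urllib.request.urlopen(markdown_request("https://example.com/post")) as resp:
#     if resp.headers.get("Content-Type", "").startswith("text/markdown"):
#         markdown = resp.read().decode()
```

Checking the response's Content-Type matters: origins that don't support the negotiation will ignore the header and return HTML, so a fallback converter is still worth keeping around.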
The New Web: MCP as a Centralized Capability Hub
A proposal to treat MCP servers as the single source of truth for an agent’s capabilities, replacing scattered API integrations. Public vs. private tool distinction, dynamic resource discovery, and agent-to-agent communication (A2A) as a first-class protocol.
Why it matters: This is an architectural bet on standardization. If MCP becomes the “web for agents,” then every integration you build as a native MCP server becomes reusable across tools, not just within one agent framework. The counterargument: maybe we don’t need another standard. But the current state—every agent framework with its own plugin system—doesn’t scale. Someone has to take the interoperability problem seriously.
One more thing: AEO (Answer Engine Optimization) Explained digs into how AI search engines differ from traditional SEO. The key insight: structured data, answer-ready formatting, and explicit crawler permissions (via llms.txt) matter more than backlinks. If you’re publishing content you want agents to cite, this is the new playbook.
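For reference, llms.txt is a plain markdown file served at the site root that tells AI crawlers what the site is and which pages are worth reading. A minimal hypothetical example (names and paths are placeholders, following the general shape of the llms.txt proposal):

```markdown
# Tap Notes

> Weekly notes on agent infrastructure: MCP, local inference, model routing.

## Posts

- [MCP security patterns](https://example.com/mcp-security.md): permission scoping and tool validation
- [Markdown content negotiation](https://example.com/markdown-agents.md): serving agents clean markdown
```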
🪨