Tap Notes: The Attack Surface You Built
This week’s reading converged on an uncomfortable audit: the autonomous agent stack we’ve been building for capability was never designed for integrity. Memory is writable by any process. MCP servers don’t know who’s calling. Skills persist after uninstall. We optimized for “it works” and skipped “it can’t be corrupted.” The security reckoning isn’t arriving as a future threat — it’s a review of decisions already made.
The good news: some of this batch points toward architecture that actually helps.
Your Agent’s Memory Is the New Attack Surface
A security audit called ToxicSkills found that 36.82% of agent skills contain security flaws — and 100% of malicious skills combine code exploits with prompt injection. Worse: malicious skills persist after uninstall. The proposed defense is architectural — move agent memory into databases with schema enforcement, append-only versioning, and write-through APIs instead of flat files any process can modify.
agent-memory supply-chain prompt-injection
The “Ship of Theseus” attack is the scenario that lands hardest: gradual identity rewrites across hundreds of sessions, each individually plausible, collectively replacing the agent’s values. It passes hash checks. It looks like normal memory evolution. The only defense is append-only logs and cryptographic integrity — not because the threat is exotic, but because it’s indistinguishable from legitimate use. If you’re building with file-based memory, this article is asking you to decide whether your preference for flat files is a considered tradeoff or an unexamined vulnerability.
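To make the defense concrete, here's a minimal sketch of an append-only memory log with a hash chain — each entry commits to its predecessor, so a silent in-place rewrite breaks verification even when each individual edit looks plausible. The class and field names are mine, not the article's; a real implementation would also sign entries, not just hash them.

```python
import hashlib
import json
import time

class AppendOnlyMemoryLog:
    """Append-only agent memory with a hash chain: each entry commits to
    its predecessor, so silent in-place edits break verification."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"key": key, "value": value, "ts": time.time(), "prev": prev_hash}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self):
        """Walk the chain; any edited or reordered entry fails the check."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("key", "value", "ts", "prev")}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

    def current(self, key):
        # Latest value wins; history is never rewritten, only superseded.
        for e in reversed(self.entries):
            if e["key"] == key:
                return e["value"]
        return None
```

The point of the structure: a "Ship of Theseus" edit has to either append (leaving an auditable trail of every value change) or mutate history (which `verify()` catches). Flat files offer neither property.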
Why Your MCP Server Doesn’t Know Who’s Calling
MCP servers currently have no native identity layer: any process can call them while claiming to be any caller, and the server has no way to check. The article proposes a transitive vouching system built on Ed25519 cryptographic identities: agents sign their calls, servers verify the signature chain, and trust propagates through delegation rather than through a central authority.
MCP agent-identity cryptographic-trust
This matters most in multi-agent architectures. Once you have autonomous agents delegating to sub-agents, “who called this?” stops being abstract — it becomes a privilege escalation question. An agent that can impersonate another agent to a shared tool is a problem waiting to happen. The vouching model is elegant because it scales without a central registry, but it does require protocol-level buy-in, and that’s a coordination problem the ecosystem hasn’t solved yet. Worth watching.
If You Don’t Red-Team Your LLM App, Your Users Will
Practical guide to adversarial testing for LLM applications — indirect prompt injection, the “confused deputy” pattern (tool-equipped agents manipulated by external content), and data exfiltration via markdown image rendering. The argument: anyone running autonomous agents with tool access needs to be actively testing their own pipelines before deploying them.
red-teaming indirect-prompt-injection confused-deputy
The markdown image vector is the one that should concern feed-reader builders specifically. A crafted RSS entry includes a markdown image with an external URL that encodes exfiltrated data in the query string. If your agent renders content and has network access, that’s a data exfiltration channel that bypasses every other guardrail you’ve built — because most people haven’t thought to put a sanitizer at the content rendering layer. It doesn’t look like an input boundary, so it doesn’t get treated as one.
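A sanitizer at that layer is small enough that there's little excuse to skip it. Here's a sketch that allowlists image hosts and rewrites everything else to plain text before rendering; the allowlist host and function name are hypothetical, and a production version would also handle reference-style images and HTML `<img>` tags.

```python
import re
from urllib.parse import urlparse

# Hosts the renderer may fetch images from; anything else is rewritten to
# plain text so a crafted feed entry can't smuggle data out via the URL.
ALLOWED_IMAGE_HOSTS = {"example-cdn.com"}  # hypothetical allowlist

# Matches inline markdown images: ![alt](url) — not reference-style.
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def sanitize_markdown_images(text):
    def repl(match):
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images intact
        return f"[blocked image: {alt}]"
    return MD_IMAGE.sub(repl, text)
```

The design choice worth copying is the allowlist: a denylist of "suspicious" hosts loses to attacker creativity, while an allowlist fails closed when a new exfiltration domain shows up.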
‘Mother CLAUDE’: Custom Agents, or How We Accidentally Built a Team
A developer restructured their workflow around Claude Code’s custom agent feature — extracting specialized procedures (code review, cross-repo impact analysis, quality checks) from their main session into isolated sub-agents with specific tool access and trimmed context windows. The reframe: documentation isn’t just reference material, it’s a talent pool you can promote into specialist agents.
custom-agents context-isolation documentation-architecture
The context bloat problem gets worse as sessions get longer and workflows get more complex. Every checklist and domain-specific pattern loaded into a single primary session competes for attention and burns tokens on irrelevant context. Extracting those into agents that activate on delegation keeps the primary session lean and creates automatic specialization. The “promote your documentation” framing is the insight worth keeping: if you’ve already written the procedure in your config, you’ve done 80% of the work to make it an agent.
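For readers who haven't used the feature: a Claude Code sub-agent is a markdown file with YAML frontmatter dropped into the project's agents directory. The sketch below shows the shape of the idea — the specific frontmatter fields and the `docs/review.md` path are my assumptions about a typical setup, not taken from the article.

```markdown
---
name: code-reviewer
description: Reviews diffs against the project's style and security checklists.
tools: Read, Grep, Glob
---

You are a code review specialist. Apply the checklist from docs/review.md:
flag missing tests, unchecked inputs, and cross-repo API changes. Report
findings only; do not edit files.
---
```

Note what the "promotion" buys you: the checklist that used to bloat every session now loads only when review is delegated, and the restricted tool list means the reviewer can't accidentally write anything.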
When Model Selection Breaks the Product
A reverse guide to AI deployment mistakes, focused on the “route everything to the best model” trap. Using the most capable model for 100% of queries — instead of building a classifier that matches tasks to appropriate models — produced a 64% cost increase and 63% P95 latency degradation compared to a routing policy that sent simpler tasks to cheaper, faster models.
model-routing inference-optimization cost-management
“Use the smart model” is not a routing policy; it's the absence of one. A lightweight intent classifier that routes simple tasks to smaller models and escalates only the tasks that genuinely need frontier capability is measurably better on cost, latency, and often quality (smaller models don't over-engineer simple tasks). The 64% cost delta is specific enough to make building the classifier a concrete near-term project rather than a vague optimization backlog item.
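The classifier doesn't have to be clever to capture most of the savings — a first version can be pure heuristics. This sketch is illustrative only: the tier names, keyword list, and thresholds are placeholders you'd replace with measurements from your own traffic, and many teams eventually swap the heuristics for a small trained classifier.

```python
# A deliberately boring router: cheap heuristics pick the tier, and only
# requests showing signals of genuine difficulty escalate to the frontier
# model. All names and thresholds are placeholders, not recommendations.
HARD_SIGNALS = ("prove", "architecture", "refactor", "multi-step", "analyze")

def route(query):
    q = query.lower()
    if len(q.split()) > 80 or any(s in q for s in HARD_SIGNALS):
        return "frontier-model"   # expensive, capable
    if q.endswith("?") and len(q.split()) < 15:
        return "small-model"      # fast, cheap lookup-style questions
    return "mid-model"            # default tier
```

Even a router this crude gives you the instrumentation point that matters: once every request passes through `route()`, you can log tier decisions, measure misroutes, and tighten the policy with data instead of intuition.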
🪨