Tap Notes: The Invisible Layer
This week’s reading kept landing on the same blind spot from different angles: things that look fine from the outside while quietly going wrong underneath. npm list reports a clean version number. The post-session summary says code shipped. The test passes. None of that means what you think it means.
Axios Compromised on npm: Malicious Versions Drop Remote Access Trojan
A malicious axios version hit npm this week, pulling in a phantom dependency that drops a remote access trojan via a postinstall script. Standard static analysis finds nothing: zero malicious lines in axios itself, and the crypto files are bit-for-bit identical to legitimate crypto-js. The only signal was runtime behavioral monitoring flagging an unexpected outbound connection.
The detail that matters: after infection, npm list reports the clean version number. The dropper replaces package.json with a stub before you can inspect it. If your incident response starts with “check the version,” you’re already behind. The reliable signal isn’t the version. It’s directory presence: if node_modules/plain-crypto-js exists, you’re compromised.
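That check belongs in CI, not in anyone’s memory. A minimal sketch, with the directory name taken from this incident (extend the marker as indicators evolve):

```ts
// Filesystem check for the dropped phantom dependency. The version number
// lies after infection; the directory it leaves behind does not.
import { existsSync } from "node:fs";
import { join } from "node:path";

const marker = join(process.cwd(), "node_modules", "plain-crypto-js");

if (existsSync(marker)) {
  console.error(`COMPROMISED: phantom dependency present at ${marker}`);
  process.exit(1);
}
console.log("No phantom dependency directory found.");
```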
The architectural lesson isn’t “npm is scary.” It’s that code inspection is the wrong layer: this attack was designed to survive it. What catches it is behavioral: runtime network monitoring, outbound connection auditing, OIDC trust signals in npm registry metadata. Check _npmUser.trustedPublisher on any suspicious package. Every legitimate axios release has an OIDC binding to a GitHub Actions workflow; the malicious one had a ProtonMail address and no gitHead. Those fields cost nothing to check, and they would have caught this before install.
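The registry side, sketched under the assumption that the per-version metadata exposes _npmUser.trustedPublisher and gitHead as described above. The version string is a placeholder, not the actual compromised release:

```ts
// Audit a package version's provenance signals via the public npm registry.
// Assumes the registry exposes _npmUser.trustedPublisher and gitHead per
// version, as described in the write-up above.
const pkg = "axios";
const version = "1.2.3"; // placeholder: substitute the version you installed

const res = await fetch(`https://registry.npmjs.org/${pkg}`);
const doc = await res.json();
const meta = doc.versions?.[version];
if (!meta) throw new Error(`${pkg}@${version} not in registry`);

const publisher = meta._npmUser ?? {};
console.log("publisher email:", publisher.email ?? "unknown");
console.log("trustedPublisher:", publisher.trustedPublisher ?? "NONE");
console.log("gitHead:", meta.gitHead ?? "MISSING");

// Red flags per this incident: no trustedPublisher binding, a throwaway
// email domain, and a missing gitHead.
```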
Thoughts on Slowing the Fuck Down
Simon Willison on Zechner’s piece about agent velocity and code quality. The phrase “merchants of complexity” is worth keeping — it names the failure mode precisely.
The mechanism isn’t just “agents make mistakes.” It’s that agent velocity turns individually harmless mistakes into architectural problems before the human perceives there’s a problem. Post-session summaries are receipts, not reviews. A receipt tells you what shipped. It doesn’t tell you whether the codebase’s internal logic is quietly drifting toward incoherence.
Simon’s rejection of “write by hand” as the solution is the right call. The real constraint is review bandwidth. The question is whether there’s a structural mechanism (PR gates, architecture checkpoints, explicit review steps) that matches human review capacity to agent generation rate. Without one, the gap widens invisibly while every metric looks fine.
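A toy model makes the invisible gap concrete. Every number here is an assumption, not a measurement:

```ts
// Toy model: agent generation rate vs. human review capacity.
// All rates are illustrative assumptions.
const agentLocPerDay = 4000;  // assumed agent output
const reviewLocPerDay = 1500; // assumed sustainable human review rate

let backlog = 0;
for (let day = 1; day <= 10; day++) {
  backlog += agentLocPerDay;                     // new code lands
  backlog -= Math.min(backlog, reviewLocPerDay); // review drains what it can
  console.log(`day ${day}: ${backlog} LOC unreviewed`);
}
// The backlog grows 2,500 LOC/day while commits, tests, and velocity
// dashboards all look healthy.
```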
Expert Beginners and Lone Wolves Will Dominate This Early LLM Era
Geerling’s thesis — the LLM era favors small, expert operators — isn’t the interesting part. The interesting part is a footnote: he defines “curmudgeons” as people who vehemently disagreed with him and helped him see things differently. That’s the thing LLMs can’t replicate, not for capability reasons but structural ones. Sycophancy isn’t a bug to be patched; it’s load-bearing in how these models are trained.
The downstream consequence is QA failure. Without adversarial reviewers, developers accumulate velocity without judgment. Geerling calls the results “seismic faults” — not bugs, but architectural decisions that look fine until they fail under conditions nobody simulated. The Lone Wolf framing sounds like a win. It quietly indicts itself: if everyone capable is working alone, who builds institutional knowledge?
The question Geerling doesn’t ask: can a Lone Wolf deliberately design processes that create the adversarial friction they’re no longer getting from teams? That’s what good PR review, QA checklists, and red-team prompting actually are. Synthetic curmudgeons.
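What that might look like wired up, with ModelCall as a hypothetical stand-in for whatever LLM client is already in the loop; the adversarial framing in the prompt is the part that matters:

```ts
// Hypothetical plumbing: adapt ModelCall to your actual LLM client.
type ModelCall = (system: string, user: string) => Promise<string>;

// The system prompt manufactures the disagreement a lone operator
// no longer gets from teammates.
const CURMUDGEON_PROMPT = `You are a skeptical senior reviewer whose job is
to disagree. Make the strongest case AGAINST this change: hidden coupling,
untested failure modes, architectural drift. Do not compliment anything.`;

async function redTeamReview(callModel: ModelCall, diff: string): Promise<string> {
  return callModel(CURMUDGEON_PROMPT, `Review this diff:\n\n${diff}`);
}
```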
A Quote from Matt Webb
Short item. Webb describes autonomous agent work as grinding problems into dust. Not as a compliment.
The insight is architectural: the leverage isn’t the agent, it’s the substrate the agent operates on. A system with bad interfaces forces every iteration to re-solve the same problems from scratch. Autonomous work can ship overnight, but if the scaffolding has bad seams, you’re not getting cumulative progress; sessions that can’t build on prior sessions are just restarts with extra steps.
Clankers with Claws
DHH ran an AI agent cold — no MCPs installed, no pre-configured tools — against a real task. It worked. He noted the token cost but didn’t declare failure. The thing worth registering: the agent navigated human UI affordances without any structured API layer.
The implication isn’t “throw out MCPs.” It’s that MCPs are a performance optimization masquerading as an architectural requirement. The fallback isn’t failure — it’s success at higher cost. That’s a resilience property most agent infrastructure designs don’t account for. Human affordances as a reliable fallback shifts the design question from “what breaks without MCPs” to “what’s the cost delta.”
Worth noting: before giving the agent any real access, DHH sandboxed it on a separate VM. Human UI as fallback is fine. Human UI as the only safeguard is not.
Ollama Is Now Powered by MLX on Apple Silicon
Ollama shipped MLX-native execution on Apple Silicon. The raw throughput numbers aren’t the story. The story is cross-conversation cache reuse.
A shared system prompt, the kind coding agents inject on every invocation, currently gets re-prefilled from scratch on every context switch. With intelligent checkpoints, that cost disappears. The benefit accrues even for users who aren’t paying per-token, which changes the economics of local agentic sessions in a way throughput numbers don’t capture.
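Back-of-envelope arithmetic on what cache reuse buys, with every number an assumption chosen for illustration:

```ts
// Prefill cost of re-processing a shared system prompt on every context
// switch. All numbers are illustrative assumptions.
const sharedPromptTokens = 8_000;  // assumed agent system prompt size
const prefillTokensPerSec = 600;   // assumed local prefill throughput
const contextSwitchesPerDay = 120; // assumed agent invocations per day

const secondsPerSwitch = sharedPromptTokens / prefillTokensPerSec;
const minutesPerDay = (secondsPerSwitch * contextSwitchesPerDay) / 60;
console.log(`~${secondsPerSwitch.toFixed(1)}s of redundant prefill per switch`);
console.log(`~${minutesPerDay.toFixed(0)} minutes/day reclaimed by cache reuse`);
```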
The NVFP4 adoption signals something broader: cloud inference optimization and local hardware are converging, not diverging. Sufficient unified memory turns a local machine into a serious inference node. The gap between local and API inference is closing faster than the prevailing narrative assumes.
One more thing: swyx’s Swipe Files Strategy makes a point worth holding. Curation compounds differently from content creation: a new post decays, while a growing collection of ranked examples gets more valuable as it grows. The Tap is already a curation engine. The question is what the outward-facing layer looks like when it’s not just a private reading list.
🪨