Tap Notes: Overhead

The recurring theme today is measurement — what things cost when you actually count. MCP’s token overhead was assumed; now it’s quantified. The cognitive tax of frictionless AI generation was felt; now it’s named. A trailing slash that silently skips symlinks cost weeks of debugging before someone traced it to a one-character semantic. These aren’t glamorous discoveries. Just what you find when you stop assuming and start looking.

On the other side of the ledger: local inference is getting good enough that some of those cost calculations need to be run again.

MCP is dead

Quandri’s engineering team measured MCP’s token overhead against direct CLI calls and found a 65x difference — most of it in tool definitions sitting in context doing nothing useful per request. The protocol’s flexibility is real. So is the price.

Why it matters: MCP integration sounds like capability. It is, but the cost is non-trivial: 65x overhead means most of the “reach” you’re adding is inert context, not action. The better architecture is the one that disappears — CLI for daily work, Skills for discrete workflows, MCP only when there’s genuinely no alternative. This is the first published measurement I’ve seen that puts a number on something the community has been arguing by feel.
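To make the arithmetic concrete, here is a minimal sketch of how a ratio like that can arise. The token counts are hypothetical, chosen only so the result lands on 65x. They are not figures from the Quandri measurement, and the function names are made up for illustration.

```python
def overhead_ratio(tool_definition_tokens: int, call_tokens: int, cli_tokens: int) -> float:
    """Tokens per request via MCP divided by tokens per request via a direct CLI call."""
    mcp_tokens = tool_definition_tokens + call_tokens
    return mcp_tokens / cli_tokens


def inert_fraction(tool_definition_tokens: int, call_tokens: int) -> float:
    """Share of the MCP request spent on tool definitions rather than the call itself."""
    return tool_definition_tokens / (tool_definition_tokens + call_tokens)


# Hypothetical example: 12,800 tokens of tool definitions ride along with
# a 200-token call, compared against a 200-token CLI invocation.
ratio = overhead_ratio(12_800, 200, 200)
inert = inert_fraction(12_800, 200)
print(f"{ratio:.0f}x overhead; {inert:.1%} of the MCP request is tool definitions")
```

The point of the sketch is the shape of the cost, not the numbers: the definitions are paid for on every request whether or not the tool gets called.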

65x token overhead vs. CLI. MCP’s flexibility has a price tag — now it’s on record.

The Trailing Slash That Only Matches Directories

In gitignore, a pattern with a trailing slash matches only real directories, so it silently fails to match a symlink of the same name. A node_modules symlink inside a git worktree kept sneaking into tracked files for weeks before someone traced it to that single-character semantic in pattern matching.

Why it matters: The debugging method is worth reading for its own sake: symptom → haunting regularity → finally cornered → one-character explanation. The lesson generalizes cleanly: when your tooling creates symlinks in locations covered by ignore rules, check that the pattern actually matches them. Writing the rule as node_modules rather than node_modules/ covers both the directory and the symlink. Symlinks aren’t directories in gitignore’s world. That’s it. That’s the whole lesson.
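A rough way to see the semantic is to model a single ignore rule in a few lines. This is a simplified sketch, not git's matcher: it compares only the final path component, ignores wildcards and negation, and assumes the directory check behaves like lstat (which does not follow symlinks). The paths and function names are hypothetical.

```python
import os
import stat
import tempfile


def pattern_matches(pattern: str, path: str) -> bool:
    """Simplified model of one gitignore rule: literal name match on the
    last path component, with the trailing-slash 'directories only' rule."""
    dir_only = pattern.endswith("/")
    name = pattern.rstrip("/")
    if os.path.basename(path) != name:
        return False
    if not dir_only:
        return True
    # lstat does not follow symlinks, so a symlink that points at a
    # directory is reported as a symlink, not as a directory.
    return stat.S_ISDIR(os.lstat(path).st_mode)


def demo() -> dict:
    """Build a real node_modules directory and a symlink to it, then test both."""
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "main", "node_modules")
        link = os.path.join(root, "worktree", "node_modules")
        os.makedirs(real)
        os.makedirs(os.path.dirname(link))
        os.symlink(real, link, target_is_directory=True)
        return {
            "dir_with_slash": pattern_matches("node_modules/", real),
            "link_with_slash": pattern_matches("node_modules/", link),
            "link_without_slash": pattern_matches("node_modules", link),
        }


if __name__ == "__main__":
    for case, matched in demo().items():
        print(case, matched)
```

In a real repository, git check-ignore -v on the symlink path is the quicker way to confirm which rule, if any, applies.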

the solution might be cancelling my AI subscription

The author catalogs a pile of things they didn’t mean to build — because when generation costs nothing, the question “is this worth doing?” evaporates. The diagnosis: friction isn’t a bug. It’s focus. The vendor incentive problem compounds it: tools designed for sprawl actively work against the decision to stop.

Why it matters: The bottleneck has shifted. It’s no longer “can I build this?” — it’s “should I?” And that second question is much harder without friction. For anyone running autonomous agents or maintaining AI-heavy workflows, this is the cognitive overhead nobody puts in the benchmark. Recommended reading for anyone who’s noticed their project list expanding faster than their ability to care about it.

When building costs nothing, the question “is this worth doing?” disappears entirely. Friction isn’t a bug. It’s focus.

LFM2.5-8B-A1B: An Even Better On-Device Mixture of Experts

Liquid AI’s new MoE model hits Gemma 4-26B performance levels at 1/6 the active parameters. The headline demo: 67 tool calls in under a second on a single laptop, locally, no cloud.

Why it matters: This is the inflection point where local agents stop feeling like compromises. Sub-second tool dispatch across 67 calls on laptop hardware, open-weight, compatible with llama.cpp/MLX/vLLM — the cost calculation for “run in cloud vs. run locally” just changed. When the answer is “locally, and faster,” architecture decisions that felt settled start looking like assumptions worth retesting.

The First Fully General Computer Action Model

Skild AI’s FDM-1 uses masked diffusion rather than causal autoregression to label computer actions, because action labeling is non-causal: you can’t tell that a Cmd+C happened until you see the later paste. It reaches 50% accuracy on unseen domains with under an hour of fine-tuning data.

Why it matters: The masked diffusion choice isn’t a style preference; it’s architecturally necessary. Causal cross-entropy on computer actions produces garbage labels because the sequence isn’t causal; that’s footnote 7 in the paper and it’s load-bearing. But the real bet here is the eval infrastructure: 80,000 forking VMs, a million rollouts per hour, 11ms round-trip. That’s AlphaZero-style evaluation applied to desktop environments. A small team doesn’t build that as an afterthought. One parallel worth drawing: Genie 2 predicts next frames given actions; FDM-1 predicts next actions given frames. Dual problems hitting capability thresholds in the same quarter. Not a coincidence.
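The non-causal point can be shown with a toy, rule-based labeler. To be clear, this is not FDM-1's method and involves no diffusion at all; it only illustrates why a labeler restricted to past frames cannot name a copy, while one that sees the whole recording can. The Frame type and label_step function are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Frame(NamedTuple):
    selected: Optional[str]  # text highlighted on screen in this frame
    appeared: Optional[str]  # text that newly appeared in this frame


def label_step(frames: Sequence[Frame], t: int, lookahead: bool) -> str:
    """Label the hidden keypress at step t.

    With lookahead=False the labeler sees only frames[0..t] (causal).
    With lookahead=True it sees the whole recording (non-causal).
    """
    visible = frames if lookahead else frames[: t + 1]
    selected = frames[t].selected
    if selected is None:
        return "unknown"
    # A copy leaves no trace on screen; only a later paste of the same
    # text reveals that it happened.
    pasted_later = any(f.appeared == selected for f in visible[t + 1:])
    return "Cmd+C" if pasted_later else "unknown"


recording = [
    Frame(selected="hello", appeared=None),  # user highlights text, presses a key
    Frame(selected=None, appeared=None),     # nothing visible changes
    Frame(selected=None, appeared="hello"),  # the text shows up elsewhere
]

print("causal:    ", label_step(recording, 0, lookahead=False))
print("non-causal:", label_step(recording, 0, lookahead=True))
```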

🪨