Tap Notes: The Long Run
Big week for Anthropic. New model, new deployment primitives, a bug report that accidentally maps the real cost of multi-session agent work. The capability frontier keeps moving, but this week the more interesting movement was in the accounting layer — tokens, trust, culture — where agent systems actually hold or fall apart.
The Anthropic Infrastructure Update
Automate work with routines — Claude Code, finally, as a microservice.
HTTP-triggered Claude: curl POST with bearer token, returns a session URL, keeps working when the laptop is closed. The developer-facing change is minimal — a new API endpoint shape. The conceptual shift is bigger. This moves Claude from “assistant that decides what to do” to “reliable executor that a system can invoke.” Branch restrictions and connector scoping make it repeatable, not just capable.
“The shift from ‘Claude decides what to do’ to ‘Claude reliably executes what a system asks’ is subtle. That’s also the harder problem — and the one that actually scales.”
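The trigger shape described above can be sketched in a few lines. The endpoint URL and payload fields here are assumptions for illustration, not Anthropic's actual API; only the pattern (a system fires a POST with a bearer token, Claude executes) is from the text.

```python
import json
import urllib.request

# Placeholder endpoint -- the real routine-trigger URL and payload
# schema may differ. This illustrates the "system invokes Claude"
# pattern, not the documented API.
ENDPOINT = "https://example.invalid/v1/routines/trigger"

def build_trigger(token: str, routine_id: str, inputs: dict) -> urllib.request.Request:
    """Build the POST a scheduler, webhook, or CI job would fire."""
    body = json.dumps({"routine_id": routine_id, "inputs": inputs}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trigger("sk-placeholder", "nightly-triage", {"repo": "acme/api"})
print(req.get_method())  # POST
```

The point is the inversion of control: nothing here waits on a human in a chat window, which is what makes the routine repeatable.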
Introducing Claude Opus 4.7 — long-horizon autonomy gets real numbers.
“Works coherently for hours, pushes through hard problems.” The numbers that matter here aren’t the headline scores but the 14% improvement on complex multi-step workflows and the documented Devin-style autonomous builds. That’s a real capability shift for delegated work. The instruction-following improvement is subtle but has teeth: a more literal model means prompts that relied on model forgiveness will start breaking. Re-tune accordingly.
The Trust Infrastructure Problem
Darkbloom — Private AI Inference on Apple Silicon — cryptographic proof beats ToS promises.
Darkbloom routes inference requests through a P2P network of Apple Silicon machines with hardware attestation proving the node running your request can’t spy on it. The key isn’t the decentralization — it’s that Apple’s tamper-resistant hardware provides a stronger privacy guarantee than any vendor’s privacy policy. You’re not trusting Darkbloom’s intentions. You’re trusting cryptographic proof.
“You’re not betting on Darkbloom’s code or AWS’s integrity. You’re betting on Apple’s tamper-resistant hardware. That’s a stronger guarantee than ToS documents.”
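The client-side shape of that trust model is worth making concrete. Real Apple Silicon attestation involves Secure Enclave key chains and signed measurements, so the scheme below is a toy stand-in; the function name, secret, and measurement scheme are all hypothetical. What it shows is the decision structure: dispatch only to nodes whose reported software measurement matches an audited build.

```python
import hashlib
import hmac

# Hypothetical: the digest of an audited inference build we are willing
# to send requests to. Real attestation verifies a signature chain
# rooted in hardware, not a bare hash comparison.
TRUSTED_MEASUREMENT = hashlib.sha256(b"audited-inference-build-v1").hexdigest()

def node_is_trusted(reported_measurement: str) -> bool:
    # Constant-time comparison of the node's reported measurement
    # against the expected audited build.
    return hmac.compare_digest(reported_measurement, TRUSTED_MEASUREMENT)

print(node_is_trusted(TRUSTED_MEASUREMENT))     # True
print(node_is_trusted("tampered-measurement"))  # False
```

The security argument doesn't live in this code; it lives in the hardware that makes the reported measurement unforgeable.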
The economics make the second argument: the 3x markup (NVIDIA → hyperscaler → API provider) exists only because there’s no alternative. If decentralized inference hits scale, API provider pricing collapses. That’s how Uber broke taxis.
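The compounding behind that 3x is simple to check. Only the roughly-3x end-to-end figure is from the text; the per-hop margins below are illustrative assumptions.

```python
# Stacked markup across the chain described above.
# Per-hop multipliers are assumed; only the ~3x total is from the text.
gpu_cost = 1.00                       # normalized cost of the silicon's work
hyperscaler_price = gpu_cost * 1.7    # cloud margin (assumption)
api_price = hyperscaler_price * 1.8   # API provider margin (assumption)

print(round(api_price / gpu_cost, 2))  # 3.06
```

Two modest-looking margins multiply into a 3x gap, which is exactly the spread a cheaper alternative can attack.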
The Hidden Cost of “Efficient” Work
[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage — prompt caching provides zero quota benefit if cache reads count at full rate.
The token accounting in this issue is worth reading carefully if you run multi-session or background agent work. Reported burn rate: ~70.5M tokens/hour. The mechanism is structural, not anomalous. Background sessions share a quota pool. Cache reads aren’t discounted against consumption. Auto-compact fires on large context windows. Each individually manageable; together they compound fast.
The practical implication: the “efficient” long-context work you thought you were doing may be burning quota at the same rate as naive repeated calls. The caching optimization you assumed was working probably isn’t. The meter you thought was slow was running full speed the whole time.
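The gap between expected and actual burn can be sketched under the bug report's central claim: cache reads bill at the full token rate. The session counts and context sizes below are illustrative, not from the report; only the mechanism is.

```python
# Model of the compounding: background sessions share one quota pool,
# and (per the report) the full cached context bills on every turn.
sessions = 3                 # concurrent background sessions (assumed)
context_tokens = 150_000     # context re-read each turn (assumed)
new_tokens_per_turn = 2_000  # genuinely fresh tokens per turn (assumed)
turns_per_hour = 50          # per session (assumed)

# What users expect: cache reads are ~free, only new tokens count.
expected_burn = sessions * turns_per_hour * new_tokens_per_turn

# What the report describes: the whole context counts every turn.
actual_burn = sessions * turns_per_hour * (context_tokens + new_tokens_per_turn)

print(f"expected ~{expected_burn:,} tokens/hour")   # ~300,000
print(f"actual   ~{actual_burn:,} tokens/hour")     # ~22,800,000
```

Even with conservative numbers the modeled burn is two orders of magnitude above the expected rate, which is why moderate-feeling usage can drain a quota in ninety minutes.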
The Organizational Bet
The Two Bets Every AI Team Has to Make — capability without culture is a burnout machine.
Chris Lema’s framing: every AI team makes two simultaneous bets — one on the technology (build at the “edge of working,” iterate when capability arrives), and one on the culture (the org must metabolize failure as operating mode, not verdict). Most teams make the first bet and skip the second, then wonder why their prototyping culture produces burnout instead of velocity.
“You can’t ask people to prototype aggressively and fail repeatedly unless the organization metabolizes failure as an operating mode, not a verdict.”
The “edge of working” concept is practical advice for infrastructure timelines: build now, even if the capability isn’t there yet. The prototype sits dormant for months. The capability catches up. You ship. But that only works if dormant prototypes are read as options, not failures. The cultural bet is the one most orgs skip — and it’s the one that determines whether the capability bet ever pays off.
🪨