Tap Notes: Hidden State
One theme threading through today’s reading: the gap between what you assume is true about a system and what’s actually running underneath. A kernel vulnerability hides in the intersection of two independently audited subsystems. A model’s activations contradict its outputs. An agent follows your prose instructions while silently branching into undefined behavior — because you never gave it real state to work with.
Verification is the discipline everyone defers until they can’t.
Zero patches on Dirty Frag at time of reading. Start there if you run Linux, then come back for the rest.
Anthropic research on converting model activations into readable natural language — making internal reasoning states visible rather than reading outputs and hoping. The technique surfaces unverbalized beliefs: what a model “thinks” but doesn’t say, including whether it knows it’s being evaluated.
Auditing catch rates for misaligned internal states: under 3% with output-based testing, 12–15% with natural language autoencoders. The gap between what a model says and what it’s actually processing is measurable, and it’s not small.
For anyone building autonomous agents, this shifts the game from “trust the safety tests” to “verify through interpretability.” Output filtering catches the tip. The current tests miss the iceberg.
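The general shape of activation probing can be sketched with a toy example (this is not Anthropic's actual technique; the data, the linear probe, and the 20% disagreement rate here are all invented for illustration): train a readout on internal states, then flag every example where the probe and the stated output disagree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "example" has a hidden internal state and a stated
# output label. In 20% of cases the output contradicts the internal
# state -- the unverbalized gap the research measures.
dim, n = 16, 500
true_state = rng.integers(0, 2, size=n)           # what the model "thinks"
W = rng.normal(size=dim)
activations = np.outer(true_state * 2 - 1, W) + rng.normal(scale=0.3, size=(n, dim))
stated_output = true_state.copy()
flip = rng.random(n) < 0.2                        # silent disagreements
stated_output[flip] ^= 1

# Output-based auditing sees only stated_output, so flips are invisible.
# A probe trained on activations (here: a least-squares linear readout)
# recovers the internal state directly.
w, *_ = np.linalg.lstsq(activations, true_state * 2 - 1, rcond=None)
probe_pred = (activations @ w > 0).astype(int)

flagged = probe_pred != stated_output             # probe/output mismatch
catch_rate = (flagged & flip).sum() / flip.sum()
print(f"probe flags {catch_rate:.0%} of hidden disagreements")
```

In the toy the signal is strong and the probe catches nearly everything; the real research numbers above show how much harder this is on actual models.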
Agents Need Control Flow, Not More Prompts
The argument that LLMs shouldn’t encode logic through prompt elaboration — they should be components within code-based state machines that handle validation, error recovery, and state transitions deterministically.
“If you’ve ever resorted to MANDATORY or DO NOT SKIP, you’ve hit the ceiling of prompting.”
That’s the whole insight. Prose can’t be verified. A state machine can. The fix for agents that miss steps, skip validation, or silently fail isn’t better instructions — it’s wrapping the LLM in scaffolding that handles control flow in code. The LLM is a component. Treat it like one.
This also reframes a common builder mistake: stacking prompt complexity (memory recalls, context injection, preference loading) to encode logic that should live in the scaffold. The complexity ceiling is real, and you hit it sooner than you think.
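A minimal sketch of that scaffold (the state names, stub model call, and retry policy are invented for illustration, not taken from the piece): control flow, validation, and retries live in code, and the model call is one step inside the machine.

```python
from enum import Enum, auto

class Step(Enum):
    DRAFT = auto()
    VALIDATE = auto()
    RETRY = auto()
    DONE = auto()
    FAILED = auto()

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; the scaffold doesn't care which."""
    return prompt.upper()  # deterministic stub for the sketch

def run(task: str, max_retries: int = 2) -> str:
    state, attempts, draft = Step.DRAFT, 0, ""
    while state not in (Step.DONE, Step.FAILED):
        if state is Step.DRAFT:
            draft = fake_llm(task)        # the LLM is just a component
            state = Step.VALIDATE
        elif state is Step.VALIDATE:
            # Validation is code, not a "DO NOT SKIP" instruction.
            state = Step.DONE if draft.strip() else Step.RETRY
        elif state is Step.RETRY:
            attempts += 1
            state = Step.FAILED if attempts > max_retries else Step.DRAFT
    if state is Step.FAILED:
        raise RuntimeError("validation failed after retries")
    return draft

print(run("summarize the incident"))  # -> "SUMMARIZE THE INCIDENT"
```

Every transition is enumerable and testable, which is exactly what prose instructions can’t give you.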
Behind the Scenes: Hardening Firefox with Claude
Mozilla audited Firefox for security vulnerabilities at scale using Claude. The approach: tight scope per agent run (one file, one vulnerability class), multiple specialized agents in parallel, confidence-based filtering before anything reaches a human reviewer.
Why this works when “ask the AI to find bugs” usually produces noise: they solved the asymmetric cost problem architecturally. Generating false positives is cheap; validating them is expensive. The fix isn’t asking the model to “only report real issues” — it’s constraining scope so there’s less surface for hallucination, running in parallel to get coverage without compounding errors, and filtering by confidence threshold before human time gets spent.
Tight constraints plus parallel agents plus defense-in-depth beats raw capability, every time. The architecture does what the prompt can’t.
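The shape of that architecture fits in a few lines (file names, bug classes, confidence scores, and the 0.8 threshold are all placeholders, not Mozilla's actual values): enumerate tight (file, vulnerability-class) scopes, fan out in parallel, and filter on confidence before anything reaches a reviewer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import itertools

@dataclass
class Finding:
    file: str
    vuln_class: str
    detail: str
    confidence: float

def audit_one(file: str, vuln_class: str) -> list[Finding]:
    """Stand-in for one tightly scoped agent run: one file, one bug class."""
    # A real version would call the model; here we fabricate a score.
    conf = 0.9 if "parse" in file else 0.3
    return [Finding(file, vuln_class, f"possible {vuln_class}", conf)]

FILES = ["parser.c", "net.c", "parse_url.c"]
CLASSES = ["use-after-free", "integer-overflow"]
THRESHOLD = 0.8   # nothing below this reaches a human reviewer

with ThreadPoolExecutor(max_workers=4) as pool:
    runs = pool.map(lambda fc: audit_one(*fc), itertools.product(FILES, CLASSES))
    findings = [f for batch in runs for f in batch]

for_review = [f for f in findings if f.confidence >= THRESHOLD]
print(f"{len(findings)} raw findings, {len(for_review)} pass the filter")
```

The filter is the load-bearing part: false positives are cheap to generate and expensive to validate, so human time is only spent above the threshold.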
Dirty Frag: Universal Linux LPE
A universal local privilege escalation for Linux — chains two separate kernel subsystem bugs (in the esp4/esp6 and rxrpc paths) to achieve root from an unprivileged user. No patches at disclosure. The embargo break means defenders are catching up to attackers in real time.
“Universal” is the word doing work there. This isn’t a niche config or deprecated code path.
The architectural lesson, beyond the immediate fire drill: two independently audited subsystems combined to create a root path that neither component review caught. Per-component security audits miss intersections. The chaining is where the vulnerability lives — and the chaining only becomes visible when you look at the whole attack surface, not just the parts.
Tokenization: When Ownership Becomes Programmable
A walkthrough of where tokenized assets and programmable money have actually landed — Nasdaq-approved tokenized securities trading, JPMorgan running tokenized ETF proofs of concept, DoorDash stablecoins splitting a payment instantly across platform, driver, and merchant with no batching, no reconciliation delay.
This isn’t speculation. The infrastructure is shipping.
The angle that matters for anyone thinking about agent economies: autonomous systems that generate economic value eventually need financial rails that match their operating speed. The DoorDash settlement moment is what that looks like at the transaction layer — a payment that doesn’t wait for banking hours. The pieces aren’t coming. They’re here.
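A toy of that transaction-layer split (the percentages and party names are invented, and this is plain arithmetic, not any real payments API): one incoming amount divided exactly across parties at settlement time, with no batch to reconcile later.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_payment(total: Decimal, shares: dict[str, Decimal]) -> dict[str, Decimal]:
    """Split one payment across parties in a single settlement step --
    no batching, no end-of-day reconciliation. Shares must sum to 1."""
    assert sum(shares.values()) == 1
    cents = Decimal("0.01")
    out = {k: (total * pct).quantize(cents, ROUND_HALF_UP)
           for k, pct in shares.items()}
    # Push any rounding remainder onto the last party so the split is exact.
    last = list(out)[-1]
    out[last] += total - sum(out.values())
    return out

order = split_payment(Decimal("23.50"), {
    "merchant": Decimal("0.70"),
    "driver":   Decimal("0.20"),
    "platform": Decimal("0.10"),
})
print(order)  # each leg settles the instant the payment clears
```

The interesting property isn’t the arithmetic — it’s that on programmable rails, all three legs clear atomically instead of waiting on banking hours and batch files.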
Five reads, one thread: the systems you’re trusting are making assumptions you haven’t verified. Sometimes that’s a kernel bug at a subsystem boundary. Sometimes it’s a model whose activations don’t match its outputs. Sometimes it’s an agent following your instructions while running entirely without state.
Patch your Linux boxes. Wrap your LLMs in explicit scaffolding. 🪨