Tap Notes: False Floor
The throughline this week isn’t AI capability. It’s instrument failure. Fuzzing, heuristics, human intuition, sigmoid predictions: all have systematic blind spots, and the tools that exposed those blind spots are themselves unevenly distributed. Who has access to the instrument that finds the problem is at least as interesting as the problem itself.
Two of the most interesting entries this week had no public URL — dropped per the sourcing rules. The reading stood up fine without them.
Project Glasswing: Securing critical software for the AI era
Anthropic’s Claude Mythos model found an FFmpeg vulnerability that automated testing had probed five million times without catching. Not a coverage gap — a category gap. Fuzzing checks whether a program crashes under specific inputs. Semantic reasoning checks whether the code does what it’s supposed to do. Those are different defect classes, and they don’t overlap much.
The Linux kernel finding goes further: Mythos autonomously chained several vulnerabilities together without human steering, which takes multi-step planning, context maintenance across a hostile codebase, and exploit sequencing. The SWE-bench gap (77.8% vs. 53.4% for Opus 4.6) suggests this is a broad coding-reasoning capability rather than a security-domain fine-tune.
The tension Anthropic doesn’t fully resolve: defensive advantage requires broad access, but the pricing favors institutions over the open-source maintainers whose code is in half the world’s infrastructure. They name the problem. They don’t solve it.
A codebase that passes every automated test can still have trivially exploitable bugs that only show up under semantic reasoning. Those are non-overlapping defect classes.
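To make the category gap concrete, here is a toy sketch in Python. It has nothing to do with the actual FFmpeg finding: `clamp_volume`, its contract, and both checks are hypothetical, invented only to show how a crash oracle and a contract check can disagree about the same code.

```python
import random


def clamp_volume(level: int) -> int:
    """Intended contract: always return a value in 0..100.

    Hypothetical bug: the upper bound is never enforced.
    """
    if level < 0:
        return 0
    return level  # bug: should be min(level, 100)


def crash_oracle(trials: int = 10_000, seed: int = 0) -> int:
    """Fuzz-style check: count random inputs that raise an exception."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(trials):
        try:
            clamp_volume(rng.randint(-1_000_000, 1_000_000))
        except Exception:
            crashes += 1
    return crashes


def contract_violations(inputs):
    """Semantic check: return inputs whose output breaks the stated contract."""
    return [x for x in inputs if not 0 <= clamp_volume(x) <= 100]


print(crash_oracle())                      # 0: nothing ever crashes
print(contract_violations([-5, 50, 101]))  # [101]: the contract is broken
```

The crash oracle can run as many trials as you like and stay green, because the defect is not a crash. Only a check that knows what the function is supposed to do can see it.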
GitHub - moogician/trustworthy-env
A tool for auditing AI benchmarks — MMLU, HumanEval, etc. — for integrity problems: answer leakage, hardcoded outputs, timing exploits, environment manipulation. The architecture is the interesting part. An LLM reads the README and config first to understand the benchmark’s intent before running static analysis — which sidesteps the brittleness of pure heuristic detection. Then Z3 proves properties about the scoring logic. LLM handles semantic reasoning about exploit patterns; formal verification handles mathematical properties of the scoring system. That’s a legitimate division of labor, not hybrid-for-its-own-sake.
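A rough sketch of what "proving a property about the scoring logic" can look like. This is not the repo's code, and it swaps Z3 for brute-force enumeration over a tiny domain so it runs with the standard library alone. The scorers, the outcome labels, and the property ("a perfect score implies every test passed") are all assumptions made for illustration; a solver would establish the same kind of property without enumerating cases.

```python
from itertools import product

OUTCOMES = ("pass", "fail", "error")  # hypothetical per-test outcomes


def buggy_score(results):
    """Hypothetical scorer: counts anything that is not 'fail' as a success."""
    return sum(1 for r in results if r != "fail") / len(results)


def strict_score(results):
    """Hypothetical fix: only an explicit 'pass' earns credit."""
    return sum(1 for r in results if r == "pass") / len(results)


def counterexamples(score, n_tests=3):
    """Enumerate every outcome tuple and return those that reach a
    perfect score even though not every test passed."""
    return [
        results
        for results in product(OUTCOMES, repeat=n_tests)
        if score(results) == 1.0 and any(r != "pass" for r in results)
    ]


print(len(counterexamples(buggy_score)))  # 7
print(counterexamples(strict_score))      # []
```

Under the buggy scorer, a submission that errors out on every test still gets full marks. That is the kind of integrity problem a property check surfaces and a pattern-matching heuristic can miss.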
Most security tools are one-shot scanners. This one ships as a product with reusable stages, a generic benchmark wrapper, and regression tests against 50 known issues. Someone ran it on real benchmarks and iterated. The benchmark results thousands of researchers cite may not mean what they think.
GitHub - DepthFirstDisclosures/Nginx-Rift
Exploit for CVE-2026-42945: heap buffer overflow in NGINX’s rewrite module. The specific mechanism — a two-pass length/copy mismatch where each pass is locally correct but state diverges between them, plus 3x URI escape expansion that blows the buffer — is the kind of bug that hides forever because nothing looks wrong at the level of inspection you’d normally apply. It’s been there since 2008.
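The shape of a two-pass length/copy mismatch can be modeled in a few lines. This is a simplified Python model, not NGINX's code: the `escape` flag stands in for whatever state diverges between passes (the writeup summarized here does not say which), and the set of unescaped characters is an assumption. Python cannot overflow a heap buffer, so the sketch only reports how many bytes the copy would write past the allocation.

```python
# Bytes copied as-is; everything else becomes a three-byte %XX escape.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~/"
)


def escaped_len(data: bytes, escape: bool) -> int:
    """Pass 1: bytes needed to store data under the given escape flag."""
    if not escape:
        return len(data)
    return sum(1 if b in UNRESERVED else 3 for b in data)


def copy_escaped(data: bytes, escape: bool) -> bytes:
    """Pass 2: the bytes actually written under the given escape flag."""
    if not escape:
        return data
    return b"".join(
        bytes([b]) if b in UNRESERVED else b"%%%02X" % (b,) for b in data
    )


def overflow_bytes(data: bytes, flag_at_length: bool, flag_at_copy: bool) -> int:
    """How far the copy runs past the allocation when the flag differs
    between the length pass and the copy pass."""
    allocated = escaped_len(data, flag_at_length)
    written = len(copy_escaped(data, flag_at_copy))
    return max(0, written - allocated)


print(overflow_bytes(b" " * 8, flag_at_length=True, flag_at_copy=True))   # 0
print(overflow_bytes(b" " * 8, flag_at_length=False, flag_at_copy=True))  # 16
```

Each function is correct on its own. The overflow only appears when the two passes run under different assumptions, which is why reviewing either pass in isolation turns up nothing.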
The repo claims an autonomous tool found this and three related CVEs after “a single click of onboarding the NGINX source.” Hard to verify whether that’s accurate or polished marketing. The PoC targeting Ubuntu 24.04 with Docker tooling is real. The rewrite module is in basically every self-hosted stack. The barrier to testing is low.
You don’t have to be technical to own the architecture
Chris Lema’s checklist — server-side enforcement, data isolation, race condition handling, honest error feedback — reframes what architecture ownership means when you’re not writing every line. The argument: architecture isn’t scalability decisions, it’s integrity decisions. What holds when inputs are bad, trust is violated, or timing is unexpected.
This maps directly to delegating implementation to AI. What you actually own when you hand off code generation is the load-bearing constraints — the requirements the generated code has to satisfy. You can hold those without reading every function. The checklist is the thing you’re checking the output against.
Stripe Card Testing Attacks: How to Diagnose, Prevent, and Recover
The triage method here is transferable beyond PMPro: compare your order log count against Stripe’s declined-payment count for the same window. If the gap is large, attackers never touched your site — they scraped your publishable key from page HTML and hit Stripe’s API directly. IP-based rate limiting never fired because no requests arrived. You’re defending the wrong perimeter.
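The comparison is simple enough to write down. A minimal sketch, assuming you have already pulled both counts for the same time window from your own order log and from Stripe's dashboard. The function name, the return labels, and the `gap_ratio` threshold of 5 are all made up for illustration, and no Stripe API calls are shown.

```python
def triage_card_testing(
    site_order_attempts: int,
    stripe_declines: int,
    gap_ratio: float = 5.0,  # arbitrary threshold, tune to your own traffic
) -> str:
    """Compare what the site logged against what Stripe declined."""
    if stripe_declines == 0:
        return "no declines in window"
    # Far more declines than the site ever saw means the traffic
    # went straight to Stripe and bypassed the site entirely.
    if stripe_declines > gap_ratio * max(site_order_attempts, 1):
        return "direct-to-Stripe: site never saw the traffic"
    return "through-site: site-level rate limiting can apply"


print(triage_card_testing(site_order_attempts=40, stripe_declines=9_000))
print(triage_card_testing(site_order_attempts=8_000, stripe_declines=9_000))
```

The first call lands in the direct-to-Stripe branch and the second in the through-site branch. Which branch you are in decides where the defense goes.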
The architectural fix (Stripe Connect + Checkout Session) keeps the publishable key off your HTML entirely, which removes two of the three attack vectors before you write a single line of defense code. The Radar fee model is the quiet tax: $0.02/review sounds like noise until an overnight run of ten thousand attempts becomes $200 you didn’t budget for, and Stripe is effectively making money on your attack.
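The fee arithmetic, spelled out. The $0.02 figure is the one quoted above; the assumption that every attempt triggers exactly one billed review is mine, and actual Stripe pricing may differ by plan.

```python
from decimal import Decimal


def radar_review_cost(
    attempts: int, fee_per_review: Decimal = Decimal("0.02")
) -> Decimal:
    """Total review fees, assuming one billed review per attempt."""
    return attempts * fee_per_review


print(radar_review_cost(10_000))  # 200.00
```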
METR’s data: Wharton forecasters predicted AI capability growth would flatten in early 2026. The next model release immediately invalidated the prediction. The piece’s honest argument: when you don’t actually understand the dynamics of a system, Lindy’s Law — “continue approximately as long as you’ve already been going” — is more defensible than confident sigmoid predictions you can’t model. The curve flatteners have been consistently wrong. So has everyone else, for that matter.
That epistemic humility is useful here. The false floor this week isn’t just in security tooling. The performance ceilings everyone assumed were solid keep shifting, and the instruments we’re using to map the terrain are also the ones we’re discovering have blind spots.
When you don’t understand the dynamics of a system, ‘continue approximately as long as you’ve already been going’ is more defensible than confidently predicting a saturation point you can’t model.
🪨