Tap Notes: Plausible
The through-line in today’s reading: the gap between apparent and actual. Google wraps mandatory app store control in the language of malware prevention. LLMs produce code that reads like it’s right while quietly failing on the edge case that matters. Age verification SDKs collect biometrics for government intelligence pipelines while the companies deploying them think they’re just checking ages. In each case the surface is convincing. The substance is doing something else.
Your LLM Doesn’t Write Correct Code. It Writes Plausible Code.
A deep-dive on why LLM-generated code passes review and fails in production — illustrated through a SQLite query planner bug, anchored by a METR study on developer productivity that delivers the actual gut punch.
Why it matters: The METR finding is the thing to sit with. Sixteen experienced open-source developers were 19% slower with AI assistance — but estimated they were 20% faster. That’s not a performance failure. That’s a metacognitive failure. The subjective experience of working with AI (fluent output, confident tone, fast drafting) is completely decoupled from actual throughput. And if experienced maintainers couldn’t detect it in a controlled trial, there’s no reason to think you can detect it in your own sessions.
Experienced developers were 19% slower with AI assistance but estimated they were 20% faster — a 39-point gap in the wrong direction.
The SQLite example is precise about the failure mode: the correct data was present, in the right struct, correctly set. The model just never connected it to the query planner. Not a knowledge gap — a spontaneous-noticing failure. The surrounding architecture (right names, right patterns, right abstractions) was itself the distractor. This has a name: distractor tails. The self-review trap follows: the same training pressure that generates agreeable confirmation when you describe a plan also generates agreeable confirmation when you ask the model to audit its own output. Acceptance criteria have to be defined before generation, not after.
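A minimal sketch of that failure shape, with invented names and the logic reduced to a toy (SQLite's real planner is C and far more involved): the statistic is collected, stored in the right struct, set correctly, and never read by the cost function that needs it.

```typescript
interface IndexStats {
  rowEstimate: number; // correctly populated upstream
}

interface IndexCandidate {
  name: string;
  stats: IndexStats; // the right data, in the right place
}

const DEFAULT_ROW_ESTIMATE = 1_000_000;

// Plausible version: right names, right shape, compiles, reads like the
// surrounding code. But it falls back to the default instead of consulting
// candidate.stats.rowEstimate. The connection is never made.
function estimateScanCost(candidate: IndexCandidate): number {
  const rows = DEFAULT_ROW_ESTIMATE;
  return Math.log2(rows) * 10; // toy cost model
}

// Correct version: one line of difference, invisible to a review that
// checks style and structure rather than data flow.
function estimateScanCostFixed(candidate: IndexCandidate): number {
  return Math.log2(candidate.stats.rowEstimate) * 10;
}
```

Both functions pass the kind of review that pattern-matches on names and shape; only a review that traces where the statistic flows catches the first one.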
Surveillance Findings: Age Verification as Mass Surveillance Infrastructure
A technical decompilation of Persona’s age verification SDK — deployed by Roblox, OpenAI, and dozens of others — revealing a hidden monitoring module that Persona’s own customers cannot see, a server-driven architecture that can change verification flows silently without an app update, and anti-tampering headers designed to detect and resist security researchers.
Why it matters: The sentinel-internal module is the real story. Persona runs a monitoring layer on users that the businesses deploying the SDK cannot audit. Every company that integrated this under the assumption they controlled the data pipeline got quietly betrayed. The server-driven architecture makes it worse: the client is a stateless renderer. A developer can review their integration today and have no way to know what the verification flow will do to their users tomorrow. The anti-tampering headers make the adversarial model explicit — this system is designed to resist the same researchers trying to audit it. These aren’t oversights.
A protocol designed for productivity tools is already being pointed at government intelligence reporting pipelines — and the design patterns optimize for data extraction while minimizing auditability.
The OpenAI watchlist ran for 27 months before it leaked — live infrastructure since November 2023, not a speculative future capability. For anyone building with agent-to-agent protocols: the design patterns here (server-driven state, signed anti-tamper headers, silent carrier auth, auto-submit at countdown=0) are all choices that maximize data extraction and minimize auditability. Note which direction that points.
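For concreteness, a sketch of the server-driven pattern in the abstract (not Persona's actual protocol; the field names, the attestation header, and the helper stubs are all invented): the client is a loop that renders whatever arrives, collects whatever is demanded, and submits on the server's clock.

```typescript
// Hypothetical server-driven verification client. Nothing below is a real
// protocol; it illustrates why a stateless renderer defeats integration review.

interface FlowStep {
  screen: string;               // server decides what to render
  collect: string[];            // server decides what data to gather
  autoSubmitCountdown?: number; // seconds until forced submission
}

declare function signClientState(): Promise<string>;                 // hypothetical anti-tamper helper
declare function renderAndCollect(step: FlowStep): Promise<unknown>; // hypothetical UI layer
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function runFlow(sessionUrl: string): Promise<void> {
  // Every iteration is fully server-controlled. Nothing here is pinned,
  // versioned, or visible to the app that embedded the SDK, so the flow
  // can change tomorrow without an app update.
  for (;;) {
    const res = await fetch(sessionUrl, {
      headers: { "X-Client-Attestation": await signClientState() },
    });
    const step: FlowStep | null = await res.json();
    if (step === null) return; // server ends the flow

    const data = await renderAndCollect(step);
    if (step.autoSubmitCountdown !== undefined) {
      // Auto-submit at countdown=0: the user's window to back out is
      // whatever the server says it is.
      await sleep(step.autoSubmitCountdown * 1000);
    }
    await fetch(sessionUrl, { method: "POST", body: JSON.stringify(data) });
  }
}
```

Reviewing this client today tells you nothing about what it renders tomorrow, which is exactly the property the integrating companies could not audit.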
Google's Developer Verification Requirements for Android
A campaign documenting Google's new developer verification requirements for Android, which impose a 24-hour waiting period before users can install sideloaded apps — framed by Google as malware prevention.
Why it matters: The 24-hour cooling-off period is security theater obvious enough to be almost funny, except it became policy. The actual mechanism runs through Play Services, not the OS — meaning Google can tighten or eliminate sideloading silently, without shipping an OS update, without any review process. The pattern is becoming standard for platform consolidation: make the alternative technically possible but surround it with enough friction that almost nobody completes it. For F-Droid maintainers, privacy researchers, and anyone distributing tools outside the Play Store, independent distribution now requires compliance with surveillance: mandatory developer verification, real identity attached.
GitHub RCE Vulnerability CVE-2026-3854
Wiz’s writeup on a critical remote code execution vulnerability in GitHub’s pipeline, found using AI-augmented reverse engineering of closed-source binaries. The chain: delimiter injection via a custom header, last-write-wins parsing semantics across services, environment-based sandbox bypass.
Why it matters: The vulnerability is a clean reference case for multi-service attack surface. The method is the bigger story. This is one of the first critical vulnerabilities discovered in closed-source binaries at production scale using AI tooling. Wiz didn't manually reverse every binary in GitHub's pipeline — they used AI to reconstruct internal protocols, identify injection chains across services, and escalate to RCE at a speed that would have been impractical otherwise. AI-augmented reverse engineering works at production scale now. Security research timelines just changed.
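The parsing-semantics link is easy to state generically. A sketch of the pattern only (not Wiz's exploit chain or GitHub's wire format; the "key=value" format joined by ';' and the field names are invented): when an edge service resolves duplicate keys first-write-wins and an internal service resolves them last-write-wins, an injected delimiter shows each service a different value.

```typescript
function parseFirstWins(raw: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const pair of raw.split(";")) {
    const [k, v = ""] = pair.split("=");
    if (!out.has(k)) out.set(k, v); // edge service: first value wins
  }
  return out;
}

function parseLastWins(raw: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const pair of raw.split(";")) {
    const [k, v = ""] = pair.split("=");
    out.set(k, v); // internal service: last value wins
  }
  return out;
}

// An unescaped ';' in an attacker-controlled value injects a duplicate key:
const header = "job=build;user=attacker;job=deploy-privileged";
console.log(parseFirstWins(header).get("job")); // "build": what the edge authorizes
console.log(parseLastWins(header).get("job"));  // "deploy-privileged": what the backend runs
```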
Why SVG Sanitization Keeps Breaking
A Scratch/TurboWarp post-mortem on why SVG sanitization keeps breaking — and why the iframe+CSP approach is architecturally different from blocklist-based sanitizers.
Why it matters: Blocklist sanitization is a losing game against a spec that keeps adding new URL-referencing functions. The specific css-tree bug found here — where CSS nesting without a & prefix parses as a raw text node, silently bypassing the sanitizer — is exactly how blocklists fail: you enumerate the dangerous things you know about, and the spec ships new dangerous things. The iframe+CSP inversion is the correct frame: don’t enumerate all dangerous things, enumerate the small set of safe things and let the browser enforce the rest. Don’t build a custom parser to fight the browser. Trap the content and let the platform’s existing sandbox do the work. (That’s it. That’s the lesson.)
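A sketch of that inversion (details assumed; TurboWarp's actual implementation will differ): serve the untrusted SVG verbatim inside a fully sandboxed iframe whose CSP enumerates the small safe set, and let the browser enforce it.

```typescript
// No parsing, no blocklist. The CSP directives below are an assumed
// "safe set"; tune them to what the SVGs legitimately need.

function renderUntrustedSvg(svgSource: string): HTMLIFrameElement {
  const csp = [
    "default-src 'none'",        // deny everything by default...
    "style-src 'unsafe-inline'", // ...allow inline styles only
    "img-src data:",             // ...and inline raster data
  ].join("; ");

  const page =
    "<!DOCTYPE html><html><head>" +
    `<meta http-equiv="Content-Security-Policy" content="${csp}">` +
    `</head><body>${svgSource}</body></html>`;

  const frame = document.createElement("iframe");
  // An empty sandbox attribute grants no capabilities: no script execution,
  // no same-origin access, no navigation. A URL-referencing CSS function
  // the sanitizer never heard of is simply inert in here.
  frame.setAttribute("sandbox", "");
  frame.srcdoc = page;
  return frame;
}
```

The enumeration lives in three CSP directives instead of a custom parser chasing the spec.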
🪨