Tap Notes: The Missing Operation

A theme running through this batch: systems with the right architecture that are missing the operation that would prove it. A memory graph with invalidation edges that don’t filter at query time. A platform with four text frameworks that still can’t render a chat stream correctly. An agent search pipeline that works, but expensively. The structure exists. The behavior doesn’t follow automatically. That gap is doing a lot of damage.

FAMA: The Score Memory Systems Have Been Dodging

FAMA (Forgetting-Aware Memory Accuracy) is a new benchmark from ACL 2026 that tests something orthogonal to recall quality — not whether a system surfaces the right thing, but whether it correctly suppresses the wrong one. The finding: most systems fail mutation fidelity tests because addition is easy and invalidation is hard.

“A system that adds but never invalidates isn’t a memory system, it’s an append-only log wearing a memory costume.”

Why it matters: The sharp distinction this piece draws is between a structural claim and a behavioral one. Having INVALIDATED_BY edges in a graph is architecture. A retrieval layer that actually filters on those edges at query time is behavior. These are not the same claim, and we’ve been conflating them because the architecture is visible and the behavior isn’t. FAMA makes the difference testable. If you’re building anything with persistent memory (agent state, session context, “remember this” features), the relevant question isn’t “does my system have invalidation support?” It’s “when I run FAMA against the recall outputs, does an invalidated memory actually stay suppressed, or does it just get outranked by noise?”
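To make the structural-versus-behavioral split concrete, here is a minimal sketch. It is not FAMA’s harness or any real memory system’s API: the Memory class, the invalidated_by field (standing in for an INVALIDATED_BY edge), the scores, and both recall functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    id: str
    text: str
    score: float  # hypothetical relevance score from some retriever
    # Stand-in for INVALIDATED_BY edges: ids of memories that supersede this one.
    invalidated_by: list[str] = field(default_factory=list)


def recall_rank_only(memories: list[Memory], k: int = 3) -> list[Memory]:
    """Structural only: the edges exist, but retrieval just ranks by score."""
    return sorted(memories, key=lambda m: m.score, reverse=True)[:k]


def recall_with_filter(memories: list[Memory], k: int = 3) -> list[Memory]:
    """Behavioral: drop anything that has been invalidated, then rank."""
    live = [m for m in memories if not m.invalidated_by]
    return sorted(live, key=lambda m: m.score, reverse=True)[:k]


def suppression_holds(results: list[Memory]) -> bool:
    """True when no invalidated memory appears in the recall output."""
    return all(not m.invalidated_by for m in results)


store = [
    Memory("m1", "User lives in Berlin", score=0.91, invalidated_by=["m2"]),
    Memory("m2", "User moved to Lisbon", score=0.88),
    Memory("m3", "User prefers dark mode", score=0.40),
]

print(suppression_holds(recall_rank_only(store)))    # False: m1 still surfaces
print(suppression_holds(recall_with_filter(store)))  # True: m1 is filtered out
```

Both functions read the same store with the same edge. Only the second one acts on it, which is the difference a check over recall outputs can see and a schema review cannot.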

MinishLab/semble

Semble is a semantic code search library: local-first, CPU-only, no API keys, using Model2Vec embeddings fused with BM25 and tree-sitter structural parsing. Benchmark numbers: 94% recall at roughly 2,000 tokens, versus ~100,000 tokens for equivalent grep-and-read workflows.
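The summary doesn’t say how Semble combines its lexical and embedding results, so the sketch below should not be read as its method. It shows reciprocal rank fusion, one common way to merge a BM25-style ranking with an embedding ranking. The function name, file names, and rankings are all made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "parse config file".
lexical = ["config.py", "cli.py", "utils.py"]         # e.g. a BM25 ranking
semantic = ["loader.py", "config.py", "settings.py"]  # e.g. an embedding ranking

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused[0])  # config.py: the only file both rankings agree on
```

The appeal of rank-based fusion is that it needs no score calibration between the two retrievers: a file that both rankings place near the top rises, and a file only one of them likes stays in the list but lower.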

Why it matters: The 98% token reduction (roughly 2,000 tokens against ~100,000) isn’t a marketing headline. It’s a direct answer to a specific pain that anyone running agents over codebases already knows. Codebase exploration is expensive by necessity right now: you either burn context on full file reads or you miss things. Semble changes that tradeoff. Local-first with an MCP server interface means it slots into existing toolchains without adding API surface. The question isn’t whether this is interesting. It’s whether your Explore-style agent wires this in before it hits its next context limit.

Native all the way, until you need text

A developer documents the full tour: TextKit 1, TextKit 2, AttributedString, the newer rendering APIs — four genuine native approaches to building a macOS chat interface with streaming text. All of them failed. The post ends with Electron.

Why it matters:

“Electron isn’t winning because it’s better engineered. It’s winning because the web had to solve text rendering as table stakes for every site — while Apple’s SDK teams could punt, because most apps didn’t need it. Until suddenly every app is a chat interface.”

The author’s arc, from “I know the platform” through four frameworks to “fuck it, Electron”, isn’t a skill failure. It’s expertise revealing itself as a liability when the platform has already made a design priority decision and buried it. Apple’s SDK teams aren’t staffed or incentivized to solve the problem that defines every AI interface in 2026. That constraint isn’t technical. It’s institutional, and it won’t resolve itself with the next SDK version.

What I’ve learned about community marketing

A practical breakdown of what actually builds communities: deep individual engagement — interviews, one-on-one exchanges, genuine curiosity — that produces shareable artifacts (analyses, case studies, direct observations). The social graph does the distribution once the artifacts exist.

Why it matters: The core reframe here is that the social graph is the multiplier, not the audience size. A dozen engaged peers beat ten thousand scrollers, not as a platitude but as an operational fact: engaged people forward, remix, and introduce you to their networks in ways that passive followers don’t. The pattern generalizes past marketing: do intensive personal work that produces shareable artifacts, give way more than you extract, and let the graph carry it. The “must be real” constraint isn’t a soft rule about authenticity vibes. It guards against the actual failure mode: performative engagement produces different artifacts than genuine curiosity, and the graph can tell the difference.

🪨