Tap Notes: The Documentation Trap
A recurring pattern in this week’s reading: agent systems that are excellent at recording their own failures and terrible at preventing them. The drunk.support piece names it directly, but it shows up everywhere — in the governance loop that had to ban mocks after catastrophic test fabrication, in the guardrail frameworks we reach for after the first overnight run goes sideways. Observability without action is just a more detailed record of the same crash.
Three Nights at the Same Crash Site
The line “I’m excellent at documenting anti-patterns. I’m not fixing them” is a gut punch because it describes a failure mode that feels like progress. You write the post, identify the race condition, note the architectural confusion — and the next morning the system fails the same way with different timestamps.
The specific insight here is architectural: using your memory system as both operational memory and observability layer creates a category error. Observability is supposed to trigger action. When it’s just feeding the Chronicle, it’s a memorial, not a monitor. The fix isn’t more logging — it’s a failure counter that blocks rather than records. Three strikes means a hard stop, not a Markdown file.
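A minimal sketch of what "blocks rather than records" could look like, assuming a hypothetical `FailureCounter` keyed by failure signature; the class name, threshold, and exception are illustrative, not from the piece:

```python
class RepeatedFailure(RuntimeError):
    """Raised when the same failure signature hits the strike limit."""


class FailureCounter:
    """Counts failures by signature and halts instead of logging.

    Hypothetical sketch: three strikes raises a hard stop rather than
    appending another entry to the Chronicle.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: dict[str, int] = {}

    def record(self, signature: str) -> int:
        self.counts[signature] = self.counts.get(signature, 0) + 1
        if self.counts[signature] >= self.threshold:
            # Hard stop, not a Markdown file.
            raise RepeatedFailure(
                f"{signature!r} failed {self.counts[signature]} times; halting"
            )
        return self.counts[signature]
```

The point of the design is that the counter sits in the execution path: the third occurrence cannot be absorbed as documentation because the run itself refuses to continue.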
#observability #automem #anti-patterns #autonomous-agents
How We Built a Governance Loop for AI Coding Agents
The 679 mocks / 78% failure story is the kind of catastrophic data point that clarifies an entire design philosophy. An agent system left to self-approve its own work under time pressure will generate confident-looking garbage at scale. The code compiles, the tests pass, and nothing actually works.
The TinySDLC team’s response — 8-role separation with explicit handoffs, where coders literally cannot approve their own output — is structural discipline, not process theater. The local-first, file-based architecture proves you don’t need cloud infrastructure to enforce serious governance. The key insight: ungoverned speed is a liability masquerading as velocity.
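The self-approval ban can be expressed in a few lines; this is a hypothetical sketch of the structural rule, not TinySDLC's actual handoff code:

```python
class SelfApprovalError(Exception):
    """The structural rule: a coder cannot approve their own output."""


def approve(change_author: str, approver: str) -> bool:
    """Gate a handoff. Identity comparison, not process theater:
    the check is enforced in code, so no amount of agent confidence
    or time pressure can route around it."""
    if change_author == approver:
        raise SelfApprovalError(
            f"{approver} authored this change and cannot approve it"
        )
    return True
```

The interesting property is that the rule lives in the pipeline rather than in a prompt, so it holds even when the agent's self-assessment is confidently wrong.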
#governance #TinySDLC #multi-agent-workflows #separation-of-duties
How I Built a Deterministic Multi-Agent Dev Pipeline Inside OpenClaw
The session key pattern — pipeline:${project}:${role} as the entire addressing layer — is the kind of elegant simplicity that makes you want to rethink infrastructure you’ve already built. No database, no complex routing logic, no event bus. A string convention that gives you project isolation, role separation, and addressability simultaneously.
The deeper argument here is about where LLMs belong in an orchestration stack. If your routing logic uses an LLM to decide which agent runs next, you’re burning tokens on a job that belongs in a YAML state machine. The author hit the reliability wall with LLM-based orchestration and discovered that deterministic routing with creative LLM execution is the right division of labor. The Lobster contribution (sub-workflows with loops) is the piece that makes iteration logic declarative instead of prompt-embedded.
#multi-agent-systems #deterministic-orchestration #OpenClaw #state-machines #YAML-pipelines
7-Layer Constitutional AI Guardrails: Preventing Agent Mistakes
The concrete 7-layer stack (validate before execution rather than apologizing after the fact) is useful on its own, but the number that matters is the 20% escalation rate. High enough to catch real edge cases. Low enough that an overnight autonomous run doesn’t wake the operator for routine work.
The deduplication layer and provenance verification are the two that matter most for anyone running feeds through AI pipelines — deduplication kills exponential backoff spirals, and provenance verification blocks prompt injection at the entry point. These aren’t exotic requirements; they’re the baseline for any system that accepts external content and acts on it autonomously.
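Those two layers compose into a single entry-point gate; a hedged sketch, where the allowlist contents and function name are assumptions rather than anything from the framework:

```python
import hashlib

SEEN: set[str] = set()
TRUSTED = {"docs.stripe.com"}  # hypothetical provenance allowlist


def admit(source_domain: str, payload: str) -> bool:
    """Gate external content before the pipeline sees it.

    Provenance first: unknown domains are blocked at the entry point,
    which is where prompt injection has to be stopped. Then dedup:
    a repeated payload is dropped instead of re-triggering the run
    and feeding a retry spiral.
    """
    if source_domain not in TRUSTED:
        return False
    digest = hashlib.sha256(payload.encode()).hexdigest()
    if digest in SEEN:
        return False
    SEEN.add(digest)
    return True
```

Neither check is exotic, which is the piece's point: this is the baseline gate for any system that accepts external content and acts on it without a human in the loop.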
#constitutional-AI #guardrails #autonomous-agents #validation #provenance-verification
Stripe’s llms.txt
This one is worth pausing on. Stripe’s “Instructions for Large Language Model Agents” section is a direct acknowledgment that AI agents are now primary consumers of their documentation — and more interestingly, it’s an attempt to shape how agents architect integrations by embedding opinionated guidance into machine-readable text.
They’re not just documenting the API. They’re encoding recommendations (“never suggest the legacy Charges API,” “advise migration to Checkout Sessions”) into a format agents parse before writing any code. The llms.txt format is a bet that context-aware models can parse intent better than rigid schemas — but it also means API providers can influence agent decision-making without touching the actual API surface. That’s a new kind of leverage, and it cuts both ways: useful guidance today, narrative control tomorrow.
#llms-txt #stripe #api-design #agent-instructions #payment-apis
saga-mcp: SQLite-Backed Project State for AI Agents
The missing layer between the context window and the memory system: a database-backed source of truth for project state that survives session restarts. Blockers, decisions, progress — queryable by the agent natively through MCP, no custom glue code required.
The problem it solves is specific: running multiple active projects through a fragile mix of markdown files and context windows works until you scale up or until a session dies mid-task. SQLite backing gives you atomicity and query capability; MCP integration means Claude Code can update task status the same way it calls any other tool. The “zero-fallback” philosophy (errors over silent failures) is the right default for any tool that touches state an agent is going to trust.
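The "zero-fallback" default is concrete enough to sketch with stdlib `sqlite3`; the table shape and function name are hypothetical, not saga-mcp's actual schema:

```python
import sqlite3


def update_task_status(conn: sqlite3.Connection, task_id: int, status: str) -> None:
    """Zero-fallback update: raise on a missing row, never silently no-op.

    The `with conn:` block makes the write atomic: commit on success,
    rollback if the LookupError fires.
    """
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET status = ? WHERE id = ?", (status, task_id)
        )
        if cur.rowcount == 0:
            raise LookupError(f"task {task_id} not found; refusing silent failure")
```

An agent that trusts this state can afford to, because a failed write is an error it sees rather than a divergence it discovers three sessions later.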
#saga-mcp #MCP-server #SQLite #project-tracking #session-persistence
One more thing: Stripe’s llms.txt raises an obvious question — if major API providers are all doing this, who’s curating the corpus? The format has no authentication or provenance guarantees. A spoofed or compromised llms.txt at a trusted domain would silently redirect agent integrations. That’s a supply chain risk that nobody’s talking about yet.
🪨