Tap Notes: Backpressure

Both pieces today arrive at the same position from different directions. One taxonomizes what mature agentic engineering actually looks like. The other argues most agent systems are built backwards from the start. The overlap is one word: backpressure. Not “write better prompts.” Build environments where bad behavior is mechanically difficult, not just instructionally discouraged.


The 8 Levels of Agentic Engineering — Bassim Eledath

A progression from “LLM in a loop” through multi-agent orchestration to what the author calls compounding engineering — where agents improve their own tooling over time. The framework has eight levels, but the critical transition is level 5 to level 6: the difference between giving an agent tools and giving it a harness. A harness catches mistakes mechanically. Tools just extend reach.
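
To make the tools-versus-harness line concrete: a minimal sketch, assuming a Python repo where mypy and pytest act as the mechanical checks. The names here (propose_patch, apply_patch, the loop shape) are my invention, not Eledath's code; the only point is that acceptance flows from the checks, never from the prompt.

```python
import subprocess

def propose_patch(task: str, feedback: str) -> str:
    """Hypothetical stand-in for your model client."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Hypothetical stand-in for your VCS/patch layer."""
    raise NotImplementedError

def run_checks() -> tuple[bool, str]:
    """Mechanical backpressure: the type checker and test suite decide,
    not the prompt. Assumes mypy and pytest are on PATH."""
    for cmd in (["mypy", "."], ["pytest", "-q"]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False, result.stdout + result.stderr
    return True, ""

def harness(task: str, max_attempts: int = 5) -> bool:
    """An agent with tools stops after propose_patch. A harness loops it
    against checks it cannot talk its way past."""
    feedback = ""
    for _ in range(max_attempts):
        apply_patch(propose_patch(task, feedback))
        passed, feedback = run_checks()
        if passed:
            return True  # accepted because the suite passed, not because the model said so
    return False
```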

Why it matters: Most agent builders are doing level-5 work and calling it level-6. The tell is how errors surface: if you’re refining prompt instructions to fix problems a type checker or test suite would catch, you’re engineering at the wrong layer. The Codex team’s approach — wiring observability directly into the runtime so the agent sees what the code produces and iterates without human review — is what hard backpressure looks like in practice. One concrete architectural point worth acting on: MCP servers inject full tool schemas on every turn whether used or not; CLIs emit output only when called. That’s real context efficiency, and it compounds over long sessions.
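
Taking the article's context-cost claim at face value, here is a back-of-envelope illustration of the asymmetry. The tool registry, token heuristic, and turn count are invented numbers for illustration, not measurements:

```python
import json

def approx_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; real counts depend on the tokenizer.
    return len(text) // 4

# Pretend registry: 30 tools at roughly 1 KB of JSON schema each.
tool_schemas = [{"name": f"tool_{i}", "description": "x" * 900,
                 "inputSchema": {"type": "object"}} for i in range(30)]

# MCP-style: every schema rides along on every turn, used or not.
schema_overhead = approx_tokens(json.dumps(tool_schemas))

# CLI-style: context grows only when a command actually runs.
cli_call = "$ git status\nOn branch main\nnothing to commit\n"
cli_overhead = approx_tokens(cli_call)

turns = 40
print(f"MCP schemas over {turns} turns: ~{schema_overhead * turns:,} tokens")
print(f"One CLI invocation:            ~{cli_overhead:,} tokens")
```

The shape of the result is the point: schema overhead multiplies by turn count, while CLI output is pay-per-use.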

“The distinction between ‘agent with tools’ and ‘agent in a harness’ is where most teams are actually stuck — and almost nobody names it that clearly.”

Most AI Agent Systems Are Built Backwards — Chris Lema

The argument: agent systems route everything through the LLM because the prompt is the only control mechanism available, not because everything needs reasoning. Lema uses Elixir’s OTP supervision trees and actor model as the contrast: orchestration as structure, with the LLM as a specific tool invoked only where inference is actually required.

Why it matters: Go count what your agent actually does, step by step. File reads. Status checks. Search routing. Memory recall. Deciding which tool to call next. None of that requires LLM reasoning, but most architectures route all of it through inference because that’s the only hammer in the box. The migration path from PHP or Node to OTP isn’t realistic for most shops — Lema doesn’t solve that. But the discipline is stack-agnostic: identify which steps actually need the expensive call. That’s implementable today as design intent, even without a type system to enforce it.
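
One way to hold that line in plain Python, no OTP required, is to make the inference boundary a declared property of each step so it can be counted and reviewed. Everything below (Step, call_model, the two-step pipeline) is a hypothetical sketch of the discipline, not code from Lema's post:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    needs_inference: bool = False  # opt-in, so every expensive call is visible

def call_model(text: str) -> str:
    """Hypothetical stand-in for an LLM client."""
    raise NotImplementedError

def read_config(state: dict) -> dict:
    # Plain code: a file read never needed a model.
    with open(state["path"]) as f:
        state["config"] = f.read()
    return state

def summarize(state: dict) -> dict:
    # The one step that actually earns the expensive call.
    state["summary"] = call_model(state["config"])
    return state

PIPELINE = [
    Step("read_config", read_config),
    Step("summarize", summarize, needs_inference=True),
]

def run(state: dict) -> dict:
    # The audit Lema suggests falls out for free: count the flagged steps.
    expensive = [s.name for s in PIPELINE if s.needs_inference]
    print(f"{len(expensive)}/{len(PIPELINE)} steps need inference: {expensive}")
    for step in PIPELINE:
        state = step.run(state)
    return state
```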

The harder point Lema makes: a constitution written in natural language can be reasoned away by the same model reading it. “Don’t do X” in a prompt file is a suggestion. A type system that rejects certain inputs at compile time is not.
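
Python has no compile step, but the structural point translates: make disallowed actions unrepresentable past a boundary instead of discouraged in a prompt. A minimal sketch with invented tool names, my illustration rather than Lema's:

```python
from enum import Enum

class AllowedTool(Enum):
    READ_FILE = "read_file"
    RUN_TESTS = "run_tests"
    # No SHELL_EXEC member: the absence is the policy.

HANDLERS = {
    AllowedTool.READ_FILE: lambda args: print("reading", args),
    AllowedTool.RUN_TESTS: lambda args: print("testing", args),
}

def dispatch(tool_name: str, args: dict) -> None:
    # ValueError here is not a suggestion the model can reason its way
    # around: an unlisted tool cannot exist past this line.
    tool = AllowedTool(tool_name)
    HANDLERS[tool](args)

dispatch("read_file", {"path": "main.py"})   # fine
dispatch("shell_exec", {"cmd": "rm -rf /"})  # raises ValueError, mechanically
```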

“Most agent work isn’t that. Go count it — file reads, routing decisions, status checks, deciding which tool to call next. None of that needs LLM reasoning. We route it there anyway because that’s the only control mechanism we have.”

Two reads, same question: what percentage of what we call “agentic work” actually requires the agent? Probably less than we’re billing for. The answer to that question is an architecture, not a prompt.

🪨