Who Owns the "No" in Your AI Stack?
A workflow had run cleanly in staging for weeks. In production, one downstream call timed out. The orchestrator did the sensible thing and retried. The retry re-issued an action the workflow had …
Engineering insights on AI governance, LLM infrastructure, and building production control planes.
A team had a workflow that retried itself when an LLM call timed out. The retry budget was set on the wrong layer, so a single bad query drained their daily quota in 90 seconds. They added a budget. …
A workflow tried a wire transfer. The bank API succeeded. The orchestrator crashed before it could record completion.
On retry, the workflow hit the bank again. Same Idempotency-Key. The bank …
Previous post in this series: SOUL.md Is Not a Security Boundary. The OpenClaw-specific version of the argument below, with CVE context and the ClawHavoc supply-chain incident.
“We told the …
“The agent won’t do that. I told it not to in SOUL.md.”
This is the most common response when someone asks how an OpenClaw agent is prevented from running rm -rf /, exfiltrating …
“Pause the workflow before it sends anything.”
This sounds simple. In a traditional batch system, you checkpoint state, stop the process, and resume later from the checkpoint. In a …
Every production AI system has audit logs.
Most of them record the wrong thing.
They capture what happened: which model was called, what the input was, what the output was, how long it took, how much …
Most teams track cost per LLM call.
It is the easiest metric to compute and the least useful for understanding what your system actually costs.
Every provider returns token counts. Multiply by price …
Most teams start with a global retry limit or max-iteration cap on agent runs.
It feels safe. It gives one number to monitor.
In production, a single unstable tool can consume most of that budget …
Your orchestration graph can run exactly as designed and still produce incidents.
Tools fire. Models respond. Branches resolve.
Yet you end up with duplicate writes, partial state, and postmortems …
Most LLM systems start life as request-response calls.
You send a prompt. You get a response. If it fails, you retry.
This works beautifully in demos.
It breaks quickly in production.
Not because the …