Why Retry Budgets Should Be Per Tool, Not Per Run
Most teams start with a global retry limit or max-iteration cap on agent runs.
It feels safe. It gives one number to monitor.
In production, a single unstable tool can consume most of that budget …
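To make the contrast with a single global cap concrete, here is a minimal sketch of a per-tool budget. It is an illustration under stated assumptions, not the post's implementation: `PerToolRetryBudget`, `call_with_budget`, the tool names, and the limits are all hypothetical.

```python
from dataclasses import dataclass, field


class RetryBudgetExceeded(Exception):
    """Raised when one tool has used up its own retry allowance."""


@dataclass
class PerToolRetryBudget:
    # Hypothetical limits: tool name -> retries allowed within one run.
    limits: dict[str, int]
    # Assumed fallback for tools that have no explicit limit.
    default_limit: int = 2
    used: dict[str, int] = field(default_factory=dict)

    def remaining(self, tool: str) -> int:
        limit = self.limits.get(tool, self.default_limit)
        return limit - self.used.get(tool, 0)

    def consume(self, tool: str) -> None:
        # Charge one retry to this tool only; other tools are unaffected.
        if self.remaining(tool) <= 0:
            raise RetryBudgetExceeded(tool)
        self.used[tool] = self.used.get(tool, 0) + 1


def call_with_budget(budget: PerToolRetryBudget, tool: str, fn):
    """Call fn, retrying on failure until this tool's budget runs out."""
    while True:
        try:
            return fn()
        except Exception:
            # Catching every exception keeps the sketch short; a real
            # system would retry only errors it knows to be transient.
            budget.consume(tool)  # raises RetryBudgetExceeded when spent
```

With limits such as `{"search": 3, "db_write": 1}`, a `search` tool that keeps failing stops after its third retry, while `db_write` still has its full allowance for the rest of the run.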
Engineering insights on AI governance, LLM infrastructure, and building production control planes.
Your orchestration graph can run exactly as designed and still produce incidents.
Tools fire. Models respond. Branches resolve.
Yet you end up with duplicate writes, partial state, and postmortems …
Most LLM systems start life as request-response calls.
You send a prompt. You get a response. If it fails, you retry.
This works beautifully in demos.
It breaks quickly in production.
Not because the …