Agents fail in the seams. Instrument every tool call, constrain outputs with JSON schemas, sandbox side‑effects, and implement compensating actions for partial failures.
Adopt trace‑first debugging: each run produces a timeline with inputs, outputs and costs. This reduces MTTR dramatically.
Production checklist
- Deterministic tools: typed inputs/outputs, idempotent actions, timeouts.
- Guardrails: JSON schema validation, allowed tools list, safe fallbacks.
- Retries: classify transient vs fatal; exponential backoff; DLQ.
- Observability: traces with spans per tool, prompt versions, costs.
- Rollbacks: compensating actions and saga‑like orchestration.
Debugging playbook
- Capture failing run with complete timeline and environment.
- Reproduce with fixed seed and frozen tools.
- Add rule or test to prevent regression; ship canary; observe.
Security considerations
- Restrict secrets to scoped tokens; never expose env in traces.
- Rate‑limit tools and enforce allowlist for destinations.
- Run untrusted code in sandboxes; log all side‑effects.