Use schemas and self‑check prompts, limit context windows deterministically, and snapshot few‑shot examples. Track regressions with a small, realistic eval set.
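To make "deterministic context limiting" concrete, here is a minimal sketch that trims context to a fixed token budget with a rule-based policy, so the same inputs always yield the same prompt. The use of `tiktoken` and the keep-newest policy are assumptions for illustration, not part of the technique itself.

```python
import tiktoken  # assumed tokenizer; any deterministic token counter works

def build_context(chunks: list[str], budget_tokens: int = 2000,
                  encoding_name: str = "cl100k_base") -> str:
    """Keep the newest chunks that fit a fixed token budget.

    Deterministic: the same chunks and budget always produce the same
    context; nothing is decided ad hoc at call time.
    """
    enc = tiktoken.get_encoding(encoding_name)
    kept, used = [], 0
    for chunk in reversed(chunks):  # assumes newest chunk is last
        n = len(enc.encode(chunk))
        if used + n > budget_tokens:
            break
        kept.append(chunk)
        used += n
    return "\n\n".join(reversed(kept))
```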
Common pitfalls
- Implicit state: hidden variables (sampling settings, retrieved context, timestamps) make outputs non‑reproducible; see the run‑config sketch after this list.
- Over‑long context: the model drifts off‑task while cost and latency grow.
- Ambiguous tasks: no output schema or rubric, so correctness is undefined.
- Unversioned prompts: results cannot be compared or rolled back.
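One way to kill implicit state (and unversioned prompts with it) is to pin everything that can change an output in a single frozen record and log its hash next to each run. A minimal sketch, with illustrative field names:

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RunConfig:
    model: str
    temperature: float
    prompt_version: str     # e.g. a git tag for the prompt file
    few_shot_version: str   # version of the snapshotted examples
    seed: int | None = None

    def fingerprint(self) -> str:
        # Stable hash over every variable that can change the output.
        blob = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()[:12]

cfg = RunConfig(model="example-model", temperature=0.0,
                prompt_version="v3", few_shot_version="v3", seed=7)
print(cfg.fingerprint())  # log this alongside every output
```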
Hardening techniques
- Define a JSON schema for outputs; validate strictly (first sketch after this list).
- Add self‑checks and a tool‑assisted re‑ask on failure; the same sketch feeds the validation error back into the retry.
- Use few‑shot libraries with explicit versions and tests (second sketch after this list).
- Keep prompts short; move rules into structured instructions rather than prose.
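A minimal sketch of the first two techniques together: strict schema validation plus a re‑ask that feeds the validation error back. It uses the `jsonschema` package, and `call_model` is a hypothetical stand‑in for whatever client you use; the schema itself is only an example.

```python
import json
from jsonschema import ValidationError, validate

SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["label", "confidence"],
    "additionalProperties": False,  # strict: reject extra keys
}

def ask_with_validation(prompt: str, call_model, max_retries: int = 2) -> dict:
    """Call the model, validate its JSON output, and re-ask on failure."""
    feedback = ""
    for _ in range(max_retries + 1):
        raw = call_model(prompt + feedback)
        try:
            obj = json.loads(raw)
            validate(instance=obj, schema=SCHEMA)
            return obj
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the concrete error back so the re-ask is targeted.
            feedback = (f"\n\nYour last reply was invalid: {err}. "
                        "Reply with JSON matching the schema and nothing else.")
    raise ValueError("model output failed validation after retries")
```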
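And a sketch of a versioned few‑shot snapshot with a guard test: the examples live in a checked‑in file, pinned by hash, and loading fails if they drift without a version bump. The file name and pinned hash are placeholders.

```python
import hashlib
import json
from pathlib import Path

FEW_SHOT_FILE = Path("few_shot_v3.json")      # snapshot checked into git
PINNED_SHA256 = "replace-with-recorded-hash"  # recorded when v3 was frozen

def load_few_shot() -> list[dict]:
    data = FEW_SHOT_FILE.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    # Fail loudly if the examples changed without a new version.
    if actual != PINNED_SHA256:
        raise RuntimeError(f"few-shot drift: {actual} != {PINNED_SHA256}")
    return json.loads(data)
```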
Testing prompts quickly
- Create a tiny eval set (20–50 examples) that includes the edge cases you actually see.
- Track win/loss reasons; iterate on instructions, not just wording.
- Freeze a baseline and compare diffs per version (see the harness sketch after this list).
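A tiny harness in this spirit: run the eval set, record win/loss with a reason, and diff against a frozen baseline file. `call_model` and `grade` are hypothetical hooks, and the baseline path is an assumption.

```python
import json
from pathlib import Path

def run_eval(cases: list[dict], call_model, grade) -> dict:
    """cases: [{"id": ..., "input": ..., "expected": ...}, ...]
    grade(output, expected) -> (passed: bool, reason: str)."""
    results = {}
    for case in cases:
        output = call_model(case["input"])
        passed, reason = grade(output, case["expected"])
        results[case["id"]] = {"pass": passed, "reason": reason}
    return results

def diff_vs_baseline(results: dict, baseline_path: str = "baseline_v3.json"):
    """Print per-case regressions and fixes against the frozen baseline."""
    baseline = json.loads(Path(baseline_path).read_text())
    for cid, r in results.items():
        old = baseline.get(cid, {}).get("pass")
        if old is not None and old != r["pass"]:
            word = "fixed" if r["pass"] else "regressed"
            print(f"{cid}: {word} ({r['reason']})")
```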