📚 RAG

RAG that Actually Works

Chunking, cache, evals

Favor semantic search combined with metadata field filters, index small chunks and expand them with a window of neighboring context, rerank the candidates, and cache both retrieval results and final responses under normalized keys.

Data preparation

  • Split documents at semantic boundaries (headings, paragraphs) and keep a reference from each chunk back to its source.
  • Store metadata fields (type, author, date, locale) for filters.
  • Deduplicate and normalize whitespace; extract titles and summaries.
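
A minimal sketch of these preparation steps, under a few assumptions: blank lines stand in for semantic boundaries, duplicates are detected by hashing the normalized, lowercased text, and the `Chunk` type and `split_into_chunks` helper are hypothetical names used only for illustration.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str    # reference back to the source document
    position: int  # order of the chunk within its document
    text: str
    metadata: dict = field(default_factory=dict)  # type, author, date, locale


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(doc_id: str, raw: str, metadata: dict) -> list[Chunk]:
    """Split on blank lines (an assumed stand-in for semantic boundaries)."""
    chunks = []
    seen = set()  # hashes of chunk texts already emitted for this document
    for paragraph in re.split(r"\n\s*\n", raw):
        text = normalize_whitespace(paragraph)
        if not text:
            continue
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # skip duplicates (identical after normalization)
        seen.add(digest)
        chunks.append(Chunk(doc_id, len(chunks), text, dict(metadata)))
    return chunks
```

Keeping `doc_id` and `position` on every chunk is what later makes citations and windowed context possible; the metadata copy lets each chunk be filtered on its own.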

Retrieval strategies

  1. Hybrid search: combine BM25 and embedding retrieval, then rerank the merged candidates.
  2. Use windowed context around top chunks for coherence.
  3. Cache results by normalized key to reduce latency.
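
The three strategies above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: reciprocal rank fusion (with the conventional constant `k = 60`) is one common way to merge BM25 and embedding rankings and is my choice here, not something the list prescribes; `bm25_search` and `vector_search` are hypothetical callables that return ranked chunk IDs; the reranking model itself is left out.

```python
import re
import unicodedata


def normalize_key(query: str) -> str:
    """Build a cache key: Unicode-normalize, casefold, collapse whitespace."""
    text = unicodedata.normalize("NFKC", query).casefold()
    return re.sub(r"\s+", " ", text).strip()


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def window(position: int, total: int, radius: int = 1) -> range:
    """Positions of a chunk plus `radius` neighbors on each side, clamped."""
    return range(max(0, position - radius), min(total, position + radius + 1))


_cache: dict[str, list[str]] = {}  # in-memory stand-in for a real cache


def retrieve(query: str, bm25_search, vector_search, top_k: int = 3) -> list[str]:
    """Return fused top-k chunk IDs, served from the cache when possible."""
    key = normalize_key(query)
    if key not in _cache:
        fused = reciprocal_rank_fusion([bm25_search(query), vector_search(query)])
        _cache[key] = fused[:top_k]
    return _cache[key]
```

Because the key is normalized, "Reset  password" and "reset password " hit the same cache entry. A production cache would also need expiry and invalidation when the index changes, which this sketch omits.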

Answering safely

  • Cite sources; decline to answer when confidence in the retrieved evidence is low.
  • Constrain to retrieved facts; prefer extractive summaries.
  • Log failures and add missing chunks back to the index.
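
One way to wire these rules together is a small gate in front of the answer step. The sketch below assumes each retrieved hit carries a relevance score in [0, 1]; the `Hit` type, the `MIN_SCORE` threshold, and `failure_log` are hypothetical names, and the threshold value is a placeholder to tune against your own evals.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # citation target, e.g. a document ID or URL
    text: str
    score: float  # relevance score, assumed to lie in [0, 1]


MIN_SCORE = 0.5  # hypothetical threshold, not a recommended value

failure_log: list[str] = []  # queries that retrieval could not support


def build_answer(query: str, hits: list[Hit]) -> dict:
    """Return an extractive, cited answer, or abstain when support is weak."""
    supported = [h for h in hits if h.score >= MIN_SCORE]
    if not supported:
        failure_log.append(query)  # review later and index the missing content
        return {"answer": None, "citations": [], "abstained": True}
    supported.sort(key=lambda h: h.score, reverse=True)
    return {
        "answer": " ".join(h.text for h in supported),  # quoted, not generated
        "citations": [h.source for h in supported],
        "abstained": False,
    }
```

The logged queries double as a backlog: each one points at content that is missing from the index or chunked in a way retrieval cannot find.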

Operational tips

  • Warm caches for popular queries; compress embeddings.
  • Monitor recall@k, click‑through and answer satisfaction.
  • Continuously enrich the KB from unresolved questions.
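
For the recall@k metric mentioned above, a minimal sketch follows. It assumes a labeled evaluation set in which each query comes with the set of chunk IDs considered relevant; the function names are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(eval_set: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs from a labeled set."""
    return sum(recall_at_k(r, rel, k) for r, rel in eval_set) / len(eval_set)
```

Tracking this number per release makes it visible when a chunking or index change quietly drops relevant material out of the top k.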
