RAG System
A retrieval-augmented generation system pairs an LLM with a vector database. The model only knows what the retriever surfaces. When retrieval breaks, generation breaks — but the output looks like a hallucination.
System architecture
- LLM — generates responses from retrieved context
- Vector database — stores and retrieves document embeddings
- Embeddings pipeline — converts documents and queries to vectors
- Prompt — structures the retrieved context for the model
Each layer is a failure point. Reliai traces through all of them.
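The four layers can be sketched end to end. This is a minimal toy pipeline, assuming an in-memory store and bag-of-words embeddings in place of a real embedding model and LLM; all names here are illustrative, not part of any real API:

```python
import math
import re
from collections import Counter

def embed(text):
    # Embeddings layer (toy): a bag-of-words count vector.
    # Real systems use a learned embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Vector database layer (toy): stores (embedding, text) pairs in memory."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append((embed(text), text))

    def retrieve(self, query, k=2):
        ranked = sorted(self.docs, key=lambda d: cosine(embed(query), d[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

def build_prompt(query, chunks):
    # Prompt layer: frames retrieved context for the model.
    context = "\n".join(f"- {c}" for c in chunks)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

store = VectorStore()
store.add("To reset the device, hold the power button for ten seconds.")
store.add("Billing questions go to the accounts team.")
print(build_prompt("How do I reset the device?",
                   store.retrieve("How do I reset the device?", k=1)))
```

Every step that can fail in production — embedding, retrieval, prompt assembly — is an explicit function call here, which is exactly what makes each one traceable.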
What can go wrong
- Incorrect documents retrieved (wrong namespace, stale embeddings)
- Empty retrieval results (query vector mismatch)
- Context window overflow (too many chunks, model ignores most)
- Hallucinated answers when retrieval returns nothing relevant
- A prompt change that alters how retrieved context is consumed
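Several of these failure modes can be caught with guard checks between retrieval and generation. A sketch, assuming the application records an embedding-model version for both the query path and the index (the function, field names, and thresholds are illustrative):

```python
def check_retrieval(chunks, query_embedding_version, index_embedding_version,
                    max_context_tokens=4000, est_tokens_per_chunk=300):
    """Guard checks run between retrieval and generation (illustrative)."""
    issues = []
    if not chunks:
        # Nothing retrieved: likely a query-vector mismatch or wrong namespace.
        issues.append("empty_retrieval")
    if query_embedding_version != index_embedding_version:
        # Index built with an older embedding model than the query path uses.
        issues.append("stale_embeddings")
    if len(chunks) * est_tokens_per_chunk > max_context_tokens:
        # Too many chunks: the model will likely ignore most of them.
        issues.append("context_overflow")
    return issues

print(check_retrieval([], "v2", "v1"))  # → ['empty_retrieval', 'stale_embeddings']
```

Checks like these turn silent retrieval failures into explicit signals before the model ever generates a confident-sounding answer.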
Detect
Reliai identifies:
- Spikes in failure rate or refusal rate
- Divergence in trace patterns between baseline and failing requests
- Drops in retrieval hit rate (when tracked as a span)
- Increased output length without a proportional increase in input context
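A minimal sketch of the first signal, assuming failures are logged per request as booleans; the thresholds are illustrative, not Reliai's defaults:

```python
def failure_rate(outcomes):
    # outcomes: list of booleans, True = request failed
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def spike_detected(baseline, current, ratio=3.0, min_rate=0.05):
    """Flag when the current window's failure rate is several times baseline."""
    base = failure_rate(baseline)
    cur = failure_rate(current)
    # Require both an absolute floor and a relative jump over baseline,
    # so quiet endpoints with near-zero baselines don't trigger on noise.
    return cur >= min_rate and cur > ratio * max(base, 0.01)

baseline = [False] * 96 + [True] * 4   # 4% failure rate before the deploy
current = [False] * 81 + [True] * 19   # 19% after the deploy
print(spike_detected(baseline, current))  # → True
```

The sample windows mirror the incident below: a 4% baseline jumping to 19% clears both thresholds.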
Understand
Incident example
A production support assistant begins returning confidently incorrect answers.
- Failure rate: 4% → 19% over 40 minutes
- Trigger: prompt version v12 deployed
- Impact: hallucinated product documentation
Root cause
Prompt v12 restructured the system message. The new format placed retrieved context after the instruction block, which caused the model to weight its parametric memory over the retrieved text.
The retriever was functioning normally. The failure was in how the model consumed its output.
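The layout difference can be illustrated with two templates. These are reconstructions for illustration, not the actual v11 and v12 prompts:

```python
# Illustrative reconstruction: v11 placed retrieved context before the
# instruction block, keeping the grounding instruction adjacent to the context.
PROMPT_V11 = """\
Context:
{context}

Instructions: Answer the user's question using only the context above.

Question: {query}"""

# Illustrative reconstruction: v12 led with instructions and pushed the
# retrieved context to the end, after the question.
PROMPT_V12 = """\
Instructions: Answer the user's question.

Question: {query}

Context:
{context}"""

print(PROMPT_V12.format(context="- (retrieved docs)",
                        query="How do I reset the device?"))
```

Same retrieved chunks, same model — but in the second layout the model reads the question before any context, making it more likely to answer from parametric memory.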
Reliai identified this via:
- trace comparison (v11 vs v12 requests)
- prompt diff between versions
- clustering of failures by query type
AI never decides root cause. It only explains what the system already determined.
Fix
- Revert prompt to v11
- Adjust context placement in the prompt structure
- Add retrieval grounding instruction
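The second and third fixes combine into a repaired prompt structure. A sketch, with illustrative wording for the grounding instruction:

```python
def build_grounded_prompt(query, chunks):
    """Repaired structure: context precedes the question, with an explicit
    grounding instruction. The wording here is illustrative."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Use only the numbered context passages below. "
        "If the context does not contain the answer, say so instead of guessing.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

print(build_grounded_prompt("How do I reset the device?",
                            ["Hold the power button for ten seconds."]))
```

Numbering the passages also makes it easy to ask the model to cite which chunk it used, which gives tracing a per-answer grounding signal.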
Prove
Verify the fix with the same signals that detected the incident: the failure rate should return to its pre-incident baseline, and the failing query clusters should resolve correctly.
Key takeaway
RAG failures are often retrieval-consumption failures, not retrieval failures.
The retriever returned documents. The model ignored them. Tracing through both layers is the only way to distinguish the two.