RAG System

A retrieval-augmented generation system pairs an LLM with a vector database. The model only knows what the retriever surfaces. When retrieval breaks, generation breaks — but the output looks like a hallucination.
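The retrieve-then-generate loop can be sketched in a few lines. This is a toy: retrieval is naive keyword overlap standing in for a vector database, and `generate` is a string template standing in for the LLM call. The point is the dependency the paragraph describes: the answer can only draw on what `retrieve` returns.

```python
# Toy RAG loop. `retrieve` and `generate` are hypothetical stand-ins:
# a real system would use embeddings for retrieval and an LLM for generation.

DOCS = [
    "Resets require the admin console, not the CLI.",
    "Billing exports run nightly at 02:00 UTC.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by naive term overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    """LLM stand-in: the answer can only use retrieved text."""
    if not context:
        return "No supporting context retrieved."
    return f"Based on: {context[0]}"

query = "How do password resets work?"
answer = generate(query, retrieve(query, DOCS))
```

If `retrieve` returns the wrong documents, `generate` still produces a fluent answer, which is exactly why a retrieval failure surfaces as an apparent hallucination.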


System architecture

Each layer is a failure point. Reliai traces through all of them.
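One way to make each layer a visible failure point is to record a trace event per stage. A minimal sketch, assuming illustrative layer names (`embed_query`, `retrieve`, `assemble_prompt`, `generate`) rather than Reliai's actual instrumentation:

```python
# Sketch of per-layer tracing. Layer names and the failing stage are
# illustrative assumptions, not Reliai's real schema.
from dataclasses import dataclass, field

@dataclass
class Trace:
    events: list = field(default_factory=list)

    def record(self, layer: str, ok: bool, detail: str = "") -> None:
        self.events.append({"layer": layer, "ok": ok, "detail": detail})

def run_pipeline(query: str, trace: Trace) -> None:
    trace.record("embed_query", True)
    trace.record("retrieve", True, "3 docs returned")
    trace.record("assemble_prompt", False, "context placed after instruction")
    trace.record("generate", True)

trace = Trace()
run_pipeline("how do resets work?", trace)
suspect_layers = [e["layer"] for e in trace.events if not e["ok"]]
```

With every layer emitting an event, attribution becomes a filter over the trace rather than guesswork over the final answer.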


Detect

Reliai identifies:

- Sampling active — retrieval traces may be incomplete
- Partial trace coverage can mask whether retrieval or generation caused the failure
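The coverage concern can be made concrete with a guard: before attributing a failure, check what fraction of requests carried a full trace. A minimal sketch, where the 0.5 threshold is an assumption for illustration, not a Reliai default:

```python
# Sketch: refuse to attribute a failure when trace sampling is too sparse.
# The 0.5 threshold is an assumed value, not a Reliai default.

def coverage(traces: list[dict]) -> float:
    """Fraction of requests that carried both a retrieval and a generation trace."""
    if not traces:
        return 0.0
    full = sum(1 for t in traces if t.get("retrieval") and t.get("generation"))
    return full / len(traces)

def can_attribute(traces: list[dict], threshold: float = 0.5) -> bool:
    return coverage(traces) >= threshold

traces = [
    {"retrieval": True, "generation": True},  # fully traced request
    {"generation": True},                     # sampled: retrieval span dropped
]
```

Below the threshold, the honest output is "insufficient trace coverage", not a root cause.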

Understand

Incident example

A production support assistant begins returning confidently incorrect answers.

Root cause

Prompt v12 restructured the system message. The new format placed retrieved context after the instruction block, which caused the model to weight its parametric memory over the retrieved text.

The retriever was functioning normally. The failure was in how the model consumed its output.
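The ordering regression described above can be sketched as two prompt templates that differ only in where the retrieved context lands. Version labels and instruction strings are illustrative, not the incident's actual prompt files:

```python
# Sketch of the v12 regression: identical content, different ordering.
# Template labels ("v11"/"v12") and wording are illustrative assumptions.

def build_prompt(instruction: str, context: list[str], context_first: bool) -> str:
    ctx_block = "\n".join(f"[doc] {c}" for c in context)
    parts = [ctx_block, instruction] if context_first else [instruction, ctx_block]
    return "\n\n".join(parts)

ctx = ["Resets require the admin console."]
v11 = build_prompt("Answer using only the documents above.", ctx, context_first=True)
v12 = build_prompt("Answer using only the documents below.", ctx, context_first=False)
```

Both prompts contain the same retrieved text, so a diff of retrieval output shows nothing; only a diff of the assembled prompts reveals the change the model actually saw.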

Reliai identified this via:

AI vs system signals
- Deterministic — root cause, metrics, traces, patterns
- AI-assisted — summaries, explanations, ticket drafts

AI never decides root cause. It only explains what the system already determined.
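The split can be sketched as two functions with a one-way dependency: a deterministic rule over trace data decides the root cause, and the "AI" step only renders an explanation of that decision. Field names, the 0.2 threshold, and the template text are assumptions for illustration:

```python
# Sketch of the deterministic/AI split. Trace fields and the 0.2 overlap
# threshold are assumed; the "AI" step is a template standing in for an
# LLM that explains but never decides.

def deterministic_root_cause(trace: dict) -> str:
    """Rule-based decision over trace metrics. No model involved."""
    if trace["retrieved_docs"] == 0:
        return "retrieval_empty"
    if trace["context_overlap"] < 0.2:
        return "context_ignored"
    return "unknown"

def ai_explain(root_cause: str) -> str:
    """Explanation layer: takes the decision as input, cannot change it."""
    templates = {
        "retrieval_empty": "No documents were retrieved for this query.",
        "context_ignored": "The retriever returned documents, but the answer does not use them.",
    }
    return templates.get(root_cause, "Cause undetermined by deterministic checks.")

cause = deterministic_root_cause({"retrieved_docs": 3, "context_overlap": 0.05})
summary = ai_explain(cause)
```

Because `ai_explain` receives only the already-decided label, a bad explanation can mislead a reader but can never change which root cause was recorded.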


Fix


Prove

INC-0891 — RAG hallucination spike
Hallucination rate: 19% → 4%
Resolved in 18 minutes · Measured across 1,200 production traces

Key takeaway

RAG failures are often retrieval-consumption failures, not retrieval failures.

The retriever returned documents. The model ignored them. Tracing through both layers is the only way to distinguish the two.
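A crude version of that distinction can be computed from the trace itself: if nothing came back, it is a retrieval failure; if documents came back but the answer shares almost no vocabulary with them, it is a consumption failure. Token overlap and the 0.2 threshold are simplifying assumptions; a production system would use a stronger grounding signal:

```python
# Sketch: classify a failed trace as retrieval vs retrieval-consumption
# failure. Token overlap and the 0.2 threshold are assumed simplifications.

def token_overlap(answer: str, docs: list[str]) -> float:
    """Fraction of answer tokens that appear anywhere in the retrieved docs."""
    a = set(answer.lower().split())
    d = {w for doc in docs for w in doc.lower().split()}
    return len(a & d) / len(a) if a else 0.0

def classify_failure(answer: str, docs: list[str]) -> str:
    if not docs:
        return "retrieval failure"        # nothing came back
    if token_overlap(answer, docs) < 0.2:
        return "consumption failure"      # docs returned, answer ignores them
    return "grounded"

label = classify_failure(
    "Passwords reset automatically each night.",
    ["Resets require the admin console, not the CLI."],
)
```

In the incident above, this check would have pointed away from the retriever on the first pass: documents were present, overlap was near zero.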