Reliai Demo

Explore a realistic AI reliability workflow in under two minutes.

Simulated failure · INC-1423

Hallucination spike detected

AI Support Copilot · Production · Mar 11, 10:22 AM

Failure rate hit 19% — vs 4% baseline. Reliai detected the regression, opened the incident, and identified the fix before users noticed.

Demo scenario

This is INC-1423 — the same incident from the homepage, live in the product.

AI Support Copilot · Production · Failure rate 19% vs 4% baseline.

Follow the loop: Detect → Understand → Compare → Root Cause → Fix → Prove.

01

Understand — System health

Start at the control panel.

Reliai system status page

AI reliability control panel

AI Support Copilot

Default status page for this AI system. It answers what is happening, whether it is safe, and where an operator should click next.

Is this system safe right now?

Answer: MAYBE

This AI system needs review before the next change.

The system is stable enough to operate, but current signals show elevated reliability risk.

System Health

Reliability score

92

Active incidents

1

Guardrails protecting

17

Traffic

Traces analyzed (24h)

2.3M

Throughput

27

traces/sec · 1m avg

Active services

6

System status

What needs attention next

Latest deployment

Today

Risk score 0.24

Incident pressure

1 incident / 24h

Latest: Hallucination spike after retriever prompt rollout

Guardrail pressure

17 triggers / 24h

Top policy: structured_output

Deployment risk

Safety before the next rollout

Risk level: low
Risk score: 0.24
Simulation risk: medium

Guardrail activity

Runtime protection coverage

structured output: 11
latency retry: 4
cost budget: 2

Policy compliance

Organization guardrail coverage

structured output

Mode: enforce

98.0%

Violations last 24h: 3

cost budget

Mode: warn

96.0%

Violations last 24h: 5

latency retry

Mode: enforce

94.0%

Violations last 24h: 2

Recommended next step

Operator guidance

Add hallucination guardrail to retriever prompt

structured_output -> Add hallucination guardrail to retriever prompt for gpt-4.1

Add retry policy for retrieval failures

latency_retry -> Retry retrieval failures once before fallback
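The guidance above pairs each policy with a concrete remediation. As a hypothetical sketch of how such policies could be represented (the class, field names, and values here are illustrative, not Reliai's actual configuration format):

```python
# Hypothetical guardrail policy records -- names and fields are
# illustrative, not Reliai's actual API.
from dataclasses import dataclass

@dataclass
class GuardrailPolicy:
    name: str    # policy identifier, e.g. "structured_output"
    mode: str    # "enforce" blocks violations, "warn" only reports them
    target: str  # model or service the policy applies to
    action: str  # operator-facing remediation hint

POLICIES = [
    GuardrailPolicy("structured_output", "enforce", "gpt-4.1",
                    "Add hallucination guardrail to retriever prompt"),
    GuardrailPolicy("latency_retry", "enforce", "retrieval",
                    "Retry retrieval failures once before fallback"),
]

# A dashboard could surface enforced policies ahead of warn-only ones.
enforced = [p.name for p in POLICIES if p.mode == "enforce"]
```

Keeping mode separate from the action text lets the same policy run in warn mode during rollout and switch to enforce once violations stabilize.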

02

Incident

19% failure rate — hallucinated responses

Reliai incident command center

Hallucination spike detected · Today · high
Retrieval latency: 980 ms
Baseline: 410 ms
Delta: +139.0%

Root cause

Prompt rollout changed retrieval behavior

Confidence: 82%

Enable latency retry on retrieval and compare v42 against the previous prompt before expanding rollout.

Root cause confidence 82% based on trace deltas — retrieval latency climbed 139% within 82 min of prompt v42 rollout.

  • prompt version: support/refund-v42 (was v41)
  • retrieval p95 ms: 980
  • chunk count: 6 chunks (was 4)
  • elapsed since deploy min: 82
  • affected trace sample: 128

Trace evidence

failing trace

2310ms end-to-end latency

980ms retrieval latency

3840 tokens

support/refund-v42

baseline trace

1480ms end-to-end latency

410ms retrieval latency

3400 tokens

support/refund-v41

Impact

  • current value: 980 ms
  • baseline value: 410 ms
  • delta: +139.0%
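The delta on this impact card is the relative increase of the current value over baseline. A minimal check, using the retrieval latency numbers above:

```python
def relative_delta(current: float, baseline: float) -> float:
    """Percentage increase of current over baseline."""
    return (current - baseline) / baseline * 100

# Retrieval latency: 980 ms current vs 410 ms baseline.
delta = relative_delta(980, 410)
print(f"delta: +{delta:.1f}%")  # -> delta: +139.0%
```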

Action

confidence: high · source: root-cause engine

Enable latency retry on retrieval and compare v42 against the previous prompt before expanding rollout.


Mitigations

Enable structured output validation on the full response path

Add retry policy for retrieval failures before fallback

Pause rollout of prompt version support/refund-v42

Deployment context

gpt-4.1-mini

support/refund-v42

82 min before incident

03

Compare — Trace graph

Slowest span: retrieval · Token heavy span: llm_call

Reliai trace debugger

Execution graph

trac...94f3

Span relationships, retry chains, and failure points in one view.

Spans

8

Edges

7

Environment

production

Execution breakdown

Span tree

ai_request

request

gpt-4.1-mini

span span...root · root span

ai_request: 2310 ms · Success

retrieve_context

retrieval

pgvector

span span...eval · parent span...root

retrieve_context: 980 ms · Success

build_prompt

prompt build

prompt-assembler

span span...ompt · parent span...root

build_prompt: 180 ms · Success · 3100 tokens

answer_customer

llm call

gpt-4.1-mini

span span_llm · parent span...root

answer_customer: 760 ms · Success · 3840 tokens

postprocess_answer

postprocess

response-normalizer

span span...post · parent span...root

postprocess_answer: 150 ms · Success

cache_lookup

retrieval

redis-cache

span span...ache · parent span...eval

cache_lookup: 18 ms · Failure

lookup_order_status

tool call

order-service

span span...tool · parent span_llm

lookup_order_status: 240 ms · Success

validate_output

guardrail

structured-output-guard

span span...rail · parent span_llm

validate_output: 95 ms · Success
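The headline stats for this view ("Slowest span: retrieval · Token heavy span: llm_call") can be derived from the span records alone. A minimal sketch using the durations listed above; the field names are illustrative, not Reliai's actual trace schema, and the root span is excluded because its 2310 ms already includes all children:

```python
# Child spans from the execution breakdown; field names are illustrative.
spans = [
    {"name": "retrieve_context",    "type": "retrieval",   "ms": 980, "tokens": 0},
    {"name": "build_prompt",        "type": "prompt build","ms": 180, "tokens": 3100},
    {"name": "answer_customer",     "type": "llm call",    "ms": 760, "tokens": 3840},
    {"name": "postprocess_answer",  "type": "postprocess", "ms": 150, "tokens": 0},
    {"name": "cache_lookup",        "type": "retrieval",   "ms": 18,  "tokens": 0},
    {"name": "lookup_order_status", "type": "tool call",   "ms": 240, "tokens": 0},
    {"name": "validate_output",     "type": "guardrail",   "ms": 95,  "tokens": 0},
]

slowest = max(spans, key=lambda s: s["ms"])          # retrieve_context, 980 ms
token_heavy = max(spans, key=lambda s: s["tokens"])  # answer_customer, 3840 tokens
print(slowest["type"], "·", token_heavy["type"])     # -> retrieval · llm call
```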

04

Root cause explanation

Prompt v42 deployed 82 minutes before incident

Likely cause

Prompt v42 deployed 82 minutes before incident

The failing trace shows retrieval pressure first, then a model response that required guardrail retry. The deployment window lines up with the incident start.

Failure surface

19% failure rate — hallucinated responses

Linked signal

Prompt update deployed 82 minutes before incident start.

05

Fix

Revert v42 → enable protections → failure rate returns to baseline.

enforce

Structured output policy

Schema validation catches malformed tool responses before they reach users.
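A structured output guard of this kind typically parses the model's response as JSON and checks it against an expected shape before it reaches the user. A minimal stdlib-only sketch; the required fields and function name are hypothetical, not Reliai's actual policy implementation:

```python
import json

# Hypothetical expected shape for a support-copilot response.
REQUIRED_FIELDS = {"answer": str, "order_id": str, "confidence": float}

def validate_structured_output(raw: str) -> dict:
    """Reject malformed or incomplete responses before they reach users."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return payload

ok = validate_structured_output(
    '{"answer": "Refund issued", "order_id": "A-1423", "confidence": 0.93}'
)
```

In enforce mode a raised ValueError would block the response; in warn mode it would only be logged.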

warn

Latency retry policy

Retrieval retry cushions transient upstream failures during traffic spikes.
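A latency retry policy of this shape wraps the retrieval call, retries once on timeout, and only then serves a fallback. A minimal sketch; the `fetch` and `fallback` callables and the parameter values are hypothetical:

```python
import time

def with_retry(fetch, fallback, timeout_s=1.0, retries=1, backoff_s=0.05):
    """Retry transient retrieval failures before falling back."""
    for attempt in range(retries + 1):
        try:
            return fetch(timeout=timeout_s)
        except TimeoutError:
            if attempt < retries:
                time.sleep(backoff_s)  # brief pause before the retry
    return fallback()

# Simulated flaky retriever: times out once, then succeeds.
calls = {"n": 0}
def flaky_fetch(timeout):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("retrieval timed out")
    return ["chunk-1", "chunk-2"]

result = with_retry(flaky_fetch, fallback=lambda: [])  # succeeds on retry
```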

06

Prove

Fix verified — the loop is complete.

Fix verified · INC-1423

Failure rate reduced from 19% → 5% ✓

After reverting prompt v42 · Resolved in 6 minutes

Before

19%

failure rate

Baseline

4%

healthy baseline

After Fix

5% ✓

near baseline
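The "Prove" step reduces to a simple comparison: did the failure rate drop, and did it land near the healthy baseline? A sketch with the numbers above; the 1.5x tolerance is an assumed threshold, not a Reliai default:

```python
def fix_verified(before_pct, after_pct, baseline_pct, tolerance=1.5):
    """Verified when the rate dropped and sits within tolerance of baseline."""
    return after_pct < before_pct and after_pct <= baseline_pct * tolerance

# 19% before the fix, 5% after, against a 4% healthy baseline.
print(fix_verified(before_pct=19, after_pct=5, baseline_pct=4))  # -> True
```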

Based on live production traces · Root cause confidence 82% · Prompt v41 restored