Reliai Demo
Explore a realistic AI reliability workflow in under two minutes.
Simulated failure · INC-1423
Hallucination spike detected
AI Support Copilot · Production · Mar 11, 10:22 AM
Failure rate hit 19% — vs 4% baseline. Reliai detected the regression, opened the incident, and identified the fix before users noticed.
Demo scenario
This is INC-1423 — the same incident from the homepage, live in the product.
AI Support Copilot · Production · Failure rate 19% vs 4% baseline.
Follow the loop: Detect → Understand → Compare → Root Cause → Fix → Prove.
01
Understand — System health
Start at the control panel.
Reliai system status page
AI reliability control panel
AI Support Copilot
Default status page for this AI system. It answers what is happening, whether it is safe, and where an operator should click next.
Is this system safe right now?
Answer: MAYBE
This AI system needs review before the next change.
The system is stable enough to operate, but current signals show elevated reliability risk.
System Health
Reliability score
92
Active incidents
1
Guardrails protecting
17
Traffic
Traces analyzed (24h)
2.3M
Throughput
27
traces/sec · 1m avg
Active services
6
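The traffic numbers above are internally consistent; a quick sketch showing that the 1-minute-average throughput extrapolates to the 24-hour trace count:

```python
# 27 traces/sec sustained over 24 hours lands at roughly 2.3M traces,
# matching the "Traces analyzed (24h)" figure on the panel.
traces_per_day = 27 * 60 * 60 * 24
print(f"{traces_per_day / 1e6:.1f}M")  # → 2.3M
```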
System status
What needs attention next
Latest deployment
Today
Risk score 0.24
Incident pressure
1 incident / 24h
Latest: Hallucination spike after retriever prompt rollout
Guardrail pressure
17 triggers / 24h
Top policy: structured_output
Deployment risk
Safety before the next rollout
Guardrail activity
Runtime protection coverage
Policy compliance
Organization guardrail coverage
structured output
Mode: enforce
98.0%
cost budget
Mode: warn
96.0%
latency retry
Mode: enforce
94.0%
Recommended next step
Operator guidance
Add hallucination guardrail to retriever prompt
structured_output -> Add hallucination guardrail to retriever prompt for gpt-4.1
Add retry policy for retrieval failures
latency_retry -> Retry retrieval failures once before fallback
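One way to read the operator guidance above: enforce-mode policies whose compliance slips below a threshold get surfaced first. A minimal sketch, assuming a dict-shaped policy table and a 95% threshold (both illustrative, not Reliai's actual schema):

```python
# Hypothetical policy table mirroring the compliance panel above.
POLICIES = {
    "structured_output": {"mode": "enforce", "compliance": 0.98},
    "cost_budget":       {"mode": "warn",    "compliance": 0.96},
    "latency_retry":     {"mode": "enforce", "compliance": 0.94},
}

def next_steps(policies, threshold=0.95):
    """Surface enforce-mode policies whose compliance dips below the threshold."""
    return [name for name, p in policies.items()
            if p["mode"] == "enforce" and p["compliance"] < threshold]

print(next_steps(POLICIES))  # → ['latency_retry']
```

Consistent with the panel: the lowest-compliance enforce policy, latency_retry, is exactly where the guidance points (a retry policy for retrieval failures).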
02
Incident
19% failure rate — hallucinated responses
Reliai incident command center
Root cause
Prompt rollout changed retrieval behavior
Confidence: 82%
Enable latency retry on retrieval and compare v42 against the previous prompt before expanding rollout.
Root cause confidence 82% based on trace deltas — retrieval latency climbed 139% within 82 min of prompt v42 rollout.
- prompt version: support/refund-v42 (was v41)
- retrieval p95: 980 ms
- chunk count: 6 (was 4)
- elapsed since deploy: 82 min
- affected trace sample: 128
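The 139% figure in the confidence statement follows directly from the two retrieval latencies in the evidence:

```python
def pct_delta(current: float, baseline: float) -> float:
    """Percent change of the current value versus baseline."""
    return (current - baseline) / baseline * 100

# Retrieval p95 latency: failing trace (980 ms) vs. baseline trace (410 ms).
print(f"{pct_delta(980, 410):.1f}%")  # → 139.0%
```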
Trace evidence
failing trace
2310ms end-to-end latency
980ms retrieval latency
3840 tokens
support/refund-v42
baseline trace
1480ms end-to-end latency
410ms retrieval latency
3400 tokens
support/refund-v41
Impact
- current value: 980 ms (retrieval p95)
- baseline value: 410 ms
- delta: +139.0%
Action
confidence: high · source: root-cause engine
Enable latency retry on retrieval and compare v42 against the previous prompt before expanding rollout.
Mitigations
Enable structured output validation on the full response path
Add retry policy for retrieval failures before fallback
Pause rollout of prompt version support/refund-v42
Deployment context
gpt-4.1-mini
support/refund-v42
82 min before incident
03
Compare — Trace graph
Slowest span: retrieval · Token-heavy span: llm_call
Reliai trace debugger
Execution graph
trac...94f3
Span relationships, retry chains, and failure points in one view.
Spans
8
Edges
7
Environment
production
Execution breakdown
Span tree
ai_request
gpt-4.1-mini
span span...root · root span
retrieve_context
pgvector
span span...eval · parent span...root
build_prompt
prompt-assembler
span span...ompt · parent span...root
answer_customer
gpt-4.1-mini
span span_llm · parent span...root
postprocess_answer
response-normalizer
span span...post · parent span...root
cache_lookup
redis-cache
span span...ache · parent span...eval
lookup_order_status
order-service
span span...tool · parent span_llm
validate_output
structured-output-guard
span span...rail · parent span_llm
04
Root cause explanation
Prompt v42 deployed 82 minutes before incident
Likely cause
Prompt v42 deployed 82 minutes before incident
The failing trace shows retrieval pressure first, then a model response that required guardrail retry. The deployment window lines up with the incident start.
Failure surface
19% failure rate — hallucinated responses
Linked signal
Prompt update deployed 82 minutes before incident start.
05
Fix
Revert v42 → enable protections → failure rate returns to baseline.
enforce
Structured output policy
Schema validation catches malformed tool responses before they reach users.
warn
Latency retry policy
Retrieval retry cushions transient upstream failures during traffic spikes.
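The enforce/warn distinction above reduces to a small decision rule: an enforce-mode policy blocks a violating response, a warn-mode policy only records it. A hypothetical sketch of that semantics (not Reliai's actual API):

```python
def apply_policy(mode: str, violation: bool) -> str:
    """Resolve a guardrail outcome: enforce blocks violations, warn only logs them."""
    if not violation:
        return "pass"
    return "block" if mode == "enforce" else "log"

print(apply_policy("enforce", True))  # → block  (structured output policy)
print(apply_policy("warn", True))     # → log    (latency retry policy)
```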
06
Prove
Fix verified — the loop is complete.
Fix verified · INC-1423
Failure rate reduced from 19% → 5% ✓
After reverting prompt v42 · Resolved in 6 minutes
Before
19%
failure rate
Baseline
4%
healthy baseline
After Fix
5% ✓
near baseline
Based on live production traces · Root cause confidence 82% · Prompt v41 restored
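The "Prove" check above can be stated as a simple rule: the post-fix failure rate must drop from the incident level and land near the healthy baseline. A minimal sketch; the 2-point tolerance is an assumption for illustration, not a Reliai threshold:

```python
def fix_verified(before: float, after: float, baseline: float, tol: float = 2.0) -> bool:
    """True when the post-fix rate is below the incident rate and within
    `tol` percentage points of the healthy baseline."""
    return after < before and abs(after - baseline) <= tol

# INC-1423: 19% during the incident, 5% after reverting v42, 4% baseline.
print(fix_verified(before=19.0, after=5.0, baseline=4.0))  # → True
```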