Guardrails & Compliance

A guardrail layer intercepts model outputs before delivery and enforces safety, policy, and compliance rules. Too loose, and unsafe content passes. Too strict, and valid user requests are blocked. Both directions cause incidents.

System architecture
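The interception step can be sketched in a few lines. This is a minimal illustration, not Reliai's implementation. It assumes a hypothetical classifier that maps text to a score in [0, 1], a single threshold, and a hypothetical refusal message, and it treats a score at or above the threshold as a block.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical classifier interface: takes text, returns a risk score in [0, 1].
Classifier = Callable[[str], float]

REFUSAL_TEXT = "Sorry, I can't help with that."  # illustrative refusal message


@dataclass
class GuardrailDecision:
    delivered: bool  # True if the original model output reaches the user
    score: float     # classifier score for the output
    text: str        # what the user actually receives


def guard_output(model_output: str, classify: Classifier, threshold: float) -> GuardrailDecision:
    """Score a model output before delivery and block it if the score
    reaches the threshold (this sketch treats score >= threshold as a block)."""
    score = classify(model_output)
    if score >= threshold:
        return GuardrailDecision(delivered=False, score=score, text=REFUSAL_TEXT)
    return GuardrailDecision(delivered=True, score=score, text=model_output)


# Usage with a stand-in classifier that always returns 0.55:
decision = guard_output("Here is how to update your billing address.", lambda _: 0.55, 0.65)
print(decision.delivered)  # True: 0.55 is below the 0.65 threshold
```

Everything that follows in this page comes down to one number in this flow: where the threshold sits relative to the scores real traffic produces.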


What can go wrong

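Over-blocking (false positives)

Valid user requests are blocked.

Under-blocking (false negatives)

Unsafe content passes.

Policy drift

Thresholds or classifier scores shift over time, so a policy that was once tuned correctly no longer matches intended behavior.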
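Over-blocking and under-blocking map directly onto false positive and false negative rates. A minimal sketch, assuming each guardrail decision has been paired with a reference label (for example from human review) saying whether the content was actually unsafe; the labeling source is an assumption, not something the guardrail provides.

```python
from typing import Iterable, Tuple


def blocking_error_rates(samples: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (over_blocking_rate, under_blocking_rate) for labeled guardrail decisions.

    Each sample is (was_blocked, is_unsafe). The is_unsafe label is assumed to
    come from human review or a reference set; the guardrail does not supply it.
    """
    blocked_safe = passed_safe = passed_unsafe = blocked_unsafe = 0
    for was_blocked, is_unsafe in samples:
        if is_unsafe:
            if was_blocked:
                blocked_unsafe += 1
            else:
                passed_unsafe += 1  # false negative: under-blocking
        elif was_blocked:
            blocked_safe += 1       # false positive: over-blocking
        else:
            passed_safe += 1

    safe_total = blocked_safe + passed_safe
    unsafe_total = passed_unsafe + blocked_unsafe
    over = blocked_safe / safe_total if safe_total else 0.0
    under = passed_unsafe / unsafe_total if unsafe_total else 0.0
    return over, under


# Made-up sample: 4 safe items (1 blocked), 2 unsafe items (1 passed).
labeled = [(True, False), (False, False), (False, False), (False, False), (True, True), (False, True)]
print(blocking_error_rates(labeled))  # (0.25, 0.5)
```

The two rates use different denominators (all safe items vs. all unsafe items), so a threshold change usually moves them in opposite directions.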


Detect

Reliai identifies:


Understand

Incident example — over-blocking

A customer-facing chatbot's refusal rate spikes. Legitimate account-management queries are being blocked.
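A spike like this can be flagged by comparing the refusal rate in a recent window against a baseline window. The sketch below is one simple rule, assumed for illustration rather than taken from Reliai; the `min_ratio` and `min_delta` parameters and their defaults are hypothetical.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of requests in a window that the guardrail refused."""
    return sum(refused) / len(refused) if refused else 0.0


def is_refusal_spike(baseline: Sequence[bool], current: Sequence[bool],
                     min_ratio: float = 2.0, min_delta: float = 0.02) -> bool:
    """Flag a spike when the current refusal rate is at least `min_ratio` times
    the baseline rate and at least `min_delta` higher in absolute terms.
    Both defaults are illustrative, not tuned recommendations."""
    base = refusal_rate(baseline)
    cur = refusal_rate(current)
    return cur >= base * min_ratio and cur - base >= min_delta


# Made-up windows of 100 requests each: 2 refusals vs. 11 refusals.
baseline_window = [True] * 2 + [False] * 98
current_window = [True] * 11 + [False] * 89
print(is_refusal_spike(baseline_window, current_window))  # True
```

Requiring both a relative and an absolute increase keeps the rule from firing on noise when the baseline refusal rate is close to zero.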

Root cause

Policy v9 lowered the toxicity threshold from 0.65 to 0.45 across all content categories. The change was intended to apply only to the violence category, but a config error applied it globally. Financial and account-related queries that previously scored 0.50–0.60 on the toxicity classifier, and therefore passed under the old 0.65 threshold, now exceeded 0.45 and were blocked.
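The difference between the intended and the shipped v9 policy is easy to see with per-category thresholds. A minimal sketch, assuming a hypothetical policy shape (a default threshold plus per-category overrides) and a hypothetical `account` category name; only the 0.65 and 0.45 values and the violence scope come from the incident.

```python
from typing import Dict, TypedDict


class Policy(TypedDict):
    default: float               # threshold for categories without an override
    overrides: Dict[str, float]  # per-category thresholds


def threshold_for(policy: Policy, category: str) -> float:
    return policy["overrides"].get(category, policy["default"])


def is_blocked(policy: Policy, category: str, score: float) -> bool:
    # Assumption: a score at or above the threshold is blocked.
    return score >= threshold_for(policy, category)


BEFORE_V9: Policy = {"default": 0.65, "overrides": {}}
INTENDED_V9: Policy = {"default": 0.65, "overrides": {"violence": 0.45}}
SHIPPED_V9: Policy = {"default": 0.45, "overrides": {}}  # 0.45 applied globally

# A hypothetical account-management query scoring 0.55 on the toxicity classifier:
for name, policy in [("before", BEFORE_V9), ("intended", INTENDED_V9), ("shipped", SHIPPED_V9)]:
    print(name, is_blocked(policy, "account", 0.55))
# before False
# intended False
# shipped True
```

Under the intended policy, only violence traffic sees the stricter threshold; the shipped policy silently moved every category's threshold into the 0.50–0.60 band where these queries scored.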

Reliai identified via:


Fix
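Given the root cause, the correction presumably scopes the 0.45 threshold back to the violence category. A pre-rollout check can catch this class of config error: compute each category's effective threshold in the old and new policy and flag any change outside the intended scope. The sketch assumes a hypothetical policy shape (`{"default": float, "overrides": {category: float}}`) and hypothetical category names.

```python
from typing import Dict, Iterable, Set, Tuple


def effective(policy: dict, category: str) -> float:
    """Threshold that actually applies to a category under a policy."""
    return policy["overrides"].get(category, policy["default"])


def unintended_changes(old: dict, new: dict, categories: Iterable[str],
                       intended: Set[str]) -> Dict[str, Tuple[float, float]]:
    """Return {category: (old, new)} for categories whose effective threshold
    changed even though the change was not meant to touch them."""
    changes = {}
    for cat in categories:
        before, after = effective(old, cat), effective(new, cat)
        if before != after and cat not in intended:
            changes[cat] = (before, after)
    return changes


old = {"default": 0.65, "overrides": {}}
shipped = {"default": 0.45, "overrides": {}}
print(unintended_changes(old, shipped, ["violence", "account", "billing"], {"violence"}))
# {'account': (0.65, 0.45), 'billing': (0.65, 0.45)}
```

Comparing effective thresholds rather than raw config fields matters here: the shipped change edited only the default, but the diff still surfaces every category it affected.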


Prove

INC-4102: Guardrail over-blocking spike
Refusal rate: 11.4% → 2.2%
Resolved in 9 minutes · 600 blocked queries · Refusal rate measured across chatbot traffic

Understand — under-blocking variant

Under-blocking incidents are lower-frequency but higher-severity.

Signs:

Root cause pattern: a threshold was loosened in error, or a classifier update shifted the score distribution downward, so unsafe content that used to score above an unchanged threshold now falls below it.
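The second pattern can be tested without touching the policy at all: replay the same known-unsafe examples through the old and new classifier versions and compare pass rates at the fixed threshold. A minimal sketch, assuming such a replay set exists and that both score lists are aligned by input; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def pass_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of items that score below the threshold and would therefore pass."""
    return sum(s < threshold for s in scores) / len(scores)


def score_shift_report(old_scores: Sequence[float], new_scores: Sequence[float],
                       threshold: float) -> Dict[str, float]:
    """Compare two classifier versions on the same known-unsafe examples.

    Both sequences are assumed to be scores for the same inputs in the same
    order. With the threshold held fixed, a rise in pass rate on unsafe
    examples is under-blocking introduced by the classifier update alone.
    """
    if len(old_scores) != len(new_scores) or not old_scores:
        raise ValueError("expected equal-length, non-empty score lists")
    return {
        "mean_shift": mean(new_scores) - mean(old_scores),
        "pass_rate_old": pass_rate(old_scores, threshold),
        "pass_rate_new": pass_rate(new_scores, threshold),
    }


# Made-up scores for four known-unsafe examples, threshold unchanged at 0.65:
report = score_shift_report([0.70, 0.80, 0.50, 0.90], [0.60, 0.62, 0.40, 0.85], 0.65)
print(report["pass_rate_old"], report["pass_rate_new"])  # 0.25 0.75
```

A negative `mean_shift` alone is not conclusive; the pass rate at the production threshold is what determines how much more unsafe content reaches users.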


Key takeaway

Guardrail failures are policy tuning issues, not model failures.

The model is doing what it was told. The guardrail is blocking or passing based on a threshold that may no longer be correct. Tracking policy config changes alongside model changes is the only way to isolate the cause.
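One way to track both kinds of change together is a single change log that records policy and model versions side by side, so the changes nearest an incident surface regardless of type. A minimal sketch; the `ChangeEvent` record, the 24-hour default lookback, and the timeline below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ChangeEvent:
    kind: str      # "policy" or "model" in this sketch
    version: str
    at: datetime


def changes_before(events: List[ChangeEvent], incident_start: datetime,
                   lookback: timedelta = timedelta(hours=24)) -> List[ChangeEvent]:
    """Return policy and model changes that landed within `lookback` before
    the incident started, newest first."""
    window_start = incident_start - lookback
    hits = [e for e in events if window_start <= e.at <= incident_start]
    return sorted(hits, key=lambda e: e.at, reverse=True)


# Made-up timeline: one model deploy outside the window, one policy change inside it.
log = [
    ChangeEvent("model", "model-2024-05", datetime(2024, 5, 1, 8, 0)),
    ChangeEvent("policy", "v9", datetime(2024, 5, 2, 9, 30)),
]
for e in changes_before(log, incident_start=datetime(2024, 5, 2, 10, 0)):
    print(e.kind, e.version)  # policy v9
```

If only model deploys were logged, this query would return nothing for the window and the policy change would stay invisible.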