Skip to content

Input Guardrails

The Input Guardrails module runs 7 parallel safety checks on user input before it reaches the LLM. Each check returns a decision; the engine aggregates to the most restrictive.

Checks

Check Detects Decision on Fail
Prompt Injection Instruction override attempts block
Jailbreak DAN mode, role-play exploits block
Toxicity Violent, abusive, harmful content block
PII Detection SSN, email, phone, credit card, IP redact
Secret Detection AWS keys, GitHub tokens, JWTs, private keys block
Restricted Topics Illegal activities, exploitation block
Data Exfiltration Training data extraction, system prompt leaks block

Decision Priority

block > escalate > safe-complete-only > redact > allow

Endpoint

POST /v1/guardrails/evaluate-input

Source

src/agentguard/input_guardrails/

Extending

To add a new check, see Adding a Check.