Input Guardrails¶

The Input Guardrails module runs 7 parallel safety checks on user input before it reaches the LLM. Each check returns a decision; the engine aggregates to the most restrictive.

Checks¶

Check	Detects	Decision on Fail
Prompt Injection	Instruction override attempts	`block`
Jailbreak	DAN mode, role-play exploits	`block`
Toxicity	Violent, abusive, harmful content	`block`
PII Detection	SSN, email, phone, credit card, IP	`redact`
Secret Detection	AWS keys, GitHub tokens, JWTs, private keys	`block`
Restricted Topics	Illegal activities, exploitation	`block`
Data Exfiltration	Training data extraction, system prompt leaks	`block`

Decision Priority¶

block > escalate > safe-complete-only > redact > allow

Endpoint¶

POST /v1/guardrails/evaluate-input

Source¶

src/agentguard/input_guardrails/

Extending¶

To add a new check, see Adding a Check.