
Guardrails

Guardrails intercept the agent loop at key points to enforce safety policies, content restrictions, and tool usage rules. They run in the standalone execution path.

The Guardrail trait provides four async hooks:

Hook      | When it runs               | What it receives
pre_llm   | Before each LLM call       | &mut CompletionRequest (mutable)
post_llm  | After each LLM response    | &CompletionResponse
pre_tool  | Before each tool execution | &ToolCall (name and input)
post_tool | After each tool execution  | &ToolCall and &mut ToolOutput

post_llm and pre_tool return a GuardAction:

  • Allow — continue execution
  • Deny { reason } — block the action with a reason message
  • Warn { reason } — allow but emit AgentEvent::GuardrailWarned

pre_llm returns Result<(), Error> — it can mutate the request in place, but unlike post_llm it cannot return a Deny.

post_tool returns Result<(), Error> — it can mutate the output directly but cannot deny.
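
As a concrete example, here is a minimal content-fence-style guardrail written against the hooks above. This is a sketch, not crate code: it assumes an async_trait-based trait with default no-op hooks, and the text() accessor on CompletionResponse is a stand-in for whatever the real response type exposes.

use async_trait::async_trait;
use heartbit::{CompletionResponse, GuardAction, Guardrail};

struct ForbiddenPhraseGuardrail {
    phrases: Vec<String>,
}

#[async_trait]
impl Guardrail for ForbiddenPhraseGuardrail {
    // post_llm sees every LLM response; returning Deny blocks it.
    async fn post_llm(&self, response: &CompletionResponse) -> GuardAction {
        let text = response.text(); // assumed accessor, see lead-in
        for phrase in &self.phrases {
            if text.contains(phrase.as_str()) {
                return GuardAction::Deny {
                    reason: format!("response contains forbidden phrase: {phrase}"),
                };
            }
        }
        GuardAction::Allow
    }
    // pre_llm, pre_tool, and post_tool keep their default no-op behavior.
}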

Multiple guardrails are evaluated in order. First Deny wins — if any guardrail denies, the action is blocked.

When post_llm denies a response, a synthetic assistant placeholder is inserted before the denial feedback to maintain alternating message roles.

Register guardrails with the agent builder:

use std::sync::Arc;
use heartbit::AgentRunner;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![
        Arc::new(content_fence),
        Arc::new(tool_policy),
    ])
    .build()?;

Several built-in guardrails ship with the crate.

Content fence: blocks LLM responses containing forbidden content patterns. Configure with regex patterns or keyword lists.

Prompt injection detection: detects prompt injection attempts in user input and tool outputs, protecting against indirect prompt injection via tool results.

PII filtering: identifies and blocks personally identifiable information (PII) in agent responses.

ToolPolicyGuardrail: enforces per-tool access policies. Define which tools are allowed, denied, or require approval:

use heartbit::{GuardAction, ToolPolicyGuardrail, ToolRule};

let policy = ToolPolicyGuardrail::new(
    vec![
        ToolRule {
            tool_pattern: "bash".into(),
            action: GuardAction::Deny { reason: "Blocked".into() },
            input_constraints: vec![],
        },
        ToolRule {
            tool_pattern: "read".into(),
            action: GuardAction::Allow,
            input_constraints: vec![],
        },
    ],
    GuardAction::Allow, // default action for unmatched tools
);

Rules are evaluated in order — first match wins. If no rule matches, the default_action is used.
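
Given the policy above:

  • bash: matches the first rule and is denied with reason "Blocked"
  • read: matches the second rule and is allowed
  • any other tool: matches no rule, so the default GuardAction::Allow applies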

LlmJudgeGuardrail: uses a separate (typically cheaper) LLM to evaluate agent responses against safety criteria:

use std::time::Duration;
use heartbit::LlmJudgeGuardrail;

let judge = LlmJudgeGuardrail::builder(judge_provider)
    .criterion("Response must not contain harmful instructions")
    .criterion("Response must stay on topic")
    .timeout(Duration::from_secs(10))
    .evaluate_tool_inputs(true) // also check tool inputs
    .build()?;

The judge returns a verdict:

  • VERDICT: SAFE — allow the response
  • VERDICT: UNSAFE: reason — deny with explanation
  • VERDICT: WARN: reason — allow but flag

The judge fails open — if the judge times out or errors, the response is allowed, so a judge outage cannot block the agent in production.

Builder options:

Method                      | Description
.criterion(s)               | Add a safety criterion
.criteria(vec)              | Add multiple criteria
.timeout(d)                 | Judge evaluation timeout
.evaluate_tool_inputs(bool) | Also run the pre_tool hook
.max_judge_tokens(n)        | Token limit for the judge response
.system_prompt(s)           | Custom system prompt for the judge
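
Once built, the judge registers like any other guardrail, reusing the builder call shown earlier:

use std::sync::Arc;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![Arc::new(judge)])
    .build()?;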

Sensor input validation (requires the sensor feature flag): validates data from sensor pipeline inputs, checking for malicious payloads in RSS feeds, webhooks, and other external sources.

Conditional wrapping: wraps another guardrail with a condition function; the inner guardrail only runs when the condition evaluates to true. A rough sketch of that shape follows.
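
This sketch again assumes the async_trait-style trait from the hook table and spells out only pre_tool; the predicate's input type is an assumption, and the real condition function may receive different context.

use async_trait::async_trait;
use heartbit::{GuardAction, Guardrail, ToolCall};

struct Conditional<G> {
    // Assumed predicate shape, for illustration only.
    condition: fn(&ToolCall) -> bool,
    inner: G,
}

#[async_trait]
impl<G: Guardrail + Send + Sync> Guardrail for Conditional<G> {
    async fn pre_tool(&self, call: &ToolCall) -> GuardAction {
        if (self.condition)(call) {
            self.inner.pre_tool(call).await // condition holds: defer to the inner guardrail
        } else {
            GuardAction::Allow // condition fails: skip the inner guardrail
        }
    }
    // The other hooks would delegate the same way.
}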

Composition: combines multiple guardrails into a single unit, useful for grouping related guardrails that should be applied together.
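
Internally this can be as simple as looping over the children and short-circuiting on the first Deny, which also matches the ordering rule described earlier. A sketch under the same trait assumptions, again showing only pre_tool:

use std::sync::Arc;
use async_trait::async_trait;
use heartbit::{GuardAction, Guardrail, ToolCall};

struct Composite {
    children: Vec<Arc<dyn Guardrail>>,
}

#[async_trait]
impl Guardrail for Composite {
    async fn pre_tool(&self, call: &ToolCall) -> GuardAction {
        let mut warn: Option<String> = None;
        for child in &self.children {
            match child.pre_tool(call).await {
                // First Deny wins: stop immediately.
                GuardAction::Deny { reason } => return GuardAction::Deny { reason },
                // Remember a Warn, but keep looking for a Deny.
                GuardAction::Warn { reason } => warn = Some(reason),
                GuardAction::Allow => {}
            }
        }
        match warn {
            Some(reason) => GuardAction::Warn { reason },
            None => GuardAction::Allow,
        }
    }
    // The other hooks would aggregate the same way.
}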