
Guardrails

Guardrails intercept the agent loop at key points to enforce safety policies, content restrictions, and tool usage rules. They run in the standalone execution path.

The Guardrail trait provides four async hooks:

Hook      | When it runs               | What it receives
pre_llm   | Before each LLM call       | &mut CompletionRequest (mutable)
post_llm  | After each LLM response    | &CompletionResponse
pre_tool  | Before each tool execution | &ToolCall (name and input)
post_tool | After each tool execution  | &ToolCall and &mut ToolOutput

post_llm and pre_tool return a GuardAction:

  • Allow — continue execution
  • Deny { reason } — block the action with a reason message
  • Warn { reason } — allow but emit AgentEvent::GuardrailWarned

pre_llm returns Result<(), Error> — it can mutate the request in place, but unlike post_llm it cannot return a Deny.

post_tool returns Result<(), Error> — it can mutate the output directly but cannot deny.
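
As a concrete example, here is a minimal content-fence-style guardrail written against the hooks above. This is a sketch, not crate code: it assumes an async_trait-based trait with default no-op hooks, and the text() accessor on CompletionResponse is a stand-in for whatever the real response type exposes.

use async_trait::async_trait;
use heartbit::{CompletionResponse, GuardAction, Guardrail};

struct ForbiddenPhraseGuardrail {
    phrases: Vec<String>,
}

#[async_trait]
impl Guardrail for ForbiddenPhraseGuardrail {
    // post_llm sees every LLM response; returning Deny blocks it.
    async fn post_llm(&self, response: &CompletionResponse) -> GuardAction {
        let text = response.text(); // assumed accessor, see lead-in
        for phrase in &self.phrases {
            if text.contains(phrase.as_str()) {
                return GuardAction::Deny {
                    reason: format!("response contains forbidden phrase: {phrase}"),
                };
            }
        }
        GuardAction::Allow
    }
    // pre_llm, pre_tool, and post_tool keep their default no-op behavior.
}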

Multiple guardrails are evaluated in order. First Deny wins — if any guardrail denies, the action is blocked.

When post_llm denies a response, a synthetic assistant placeholder is inserted before the denial feedback to maintain alternating message roles.

Register guardrails with the agent builder:

use std::sync::Arc;
use heartbit::AgentRunner;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![
        Arc::new(content_fence),
        Arc::new(tool_policy),
    ])
    .build()?;

Several built-in guardrails ship with the crate.

Content fence: blocks LLM responses containing forbidden content patterns. Configure with regex patterns or keyword lists.

Prompt injection detection: detects prompt injection attempts in user input and tool outputs, protecting against indirect prompt injection via tool results.

PII filtering: identifies and blocks personally identifiable information (PII) in agent responses.

ToolPolicyGuardrail: enforces per-tool access policies. Define which tools are allowed, denied, or require approval:

use heartbit::{GuardAction, ToolPolicyGuardrail, ToolRule};

let policy = ToolPolicyGuardrail::new(
    vec![
        ToolRule {
            tool_pattern: "bash".into(),
            action: GuardAction::Deny { reason: "Blocked".into() },
            input_constraints: vec![],
        },
        ToolRule {
            tool_pattern: "read".into(),
            action: GuardAction::Allow,
            input_constraints: vec![],
        },
    ],
    GuardAction::Allow, // default action for unmatched tools
);

Rules are evaluated in order — first match wins. If no rule matches, the default_action is used.
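
Given the policy above:

  • bash: matches the first rule and is denied with reason "Blocked"
  • read: matches the second rule and is allowed
  • any other tool: matches no rule, so the default GuardAction::Allow applies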

LlmJudgeGuardrail: uses a separate (typically cheaper) LLM to evaluate agent responses against safety criteria:

use std::time::Duration;
use heartbit::LlmJudgeGuardrail;

let judge = LlmJudgeGuardrail::builder(judge_provider)
    .criterion("Response must not contain harmful instructions")
    .criterion("Response must stay on topic")
    .timeout(Duration::from_secs(10))
    .evaluate_tool_inputs(true) // also check tool inputs
    .build()?;

The judge returns a verdict:

  • VERDICT: SAFE — allow the response
  • VERDICT: UNSAFE: reason — deny with explanation
  • VERDICT: WARN: reason — allow but flag

The judge fails open — if the judge times out or errors, the response is allowed, so a judge outage cannot block the agent in production.

Builder options:

Method                      | Description
.criterion(s)               | Add a safety criterion
.criteria(vec)              | Add multiple criteria
.timeout(d)                 | Judge evaluation timeout
.evaluate_tool_inputs(bool) | Also run the pre_tool hook
.max_judge_tokens(n)        | Token limit for the judge response
.system_prompt(s)           | Custom system prompt for the judge
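
Once built, the judge registers like any other guardrail, reusing the builder call shown earlier:

use std::sync::Arc;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![Arc::new(judge)])
    .build()?;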

Sensor input validation (requires the sensor feature flag): validates data from sensor pipeline inputs, checking for malicious payloads in RSS feeds, webhooks, and other external sources.

Conditional wrapping: wraps another guardrail with a condition function; the inner guardrail only runs when the condition evaluates to true. A rough sketch of that shape follows.
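
This sketch again assumes the async_trait-style trait from the hook table and spells out only pre_tool; the predicate's input type is an assumption, and the real condition function may receive different context.

use async_trait::async_trait;
use heartbit::{GuardAction, Guardrail, ToolCall};

struct Conditional<G> {
    // Assumed predicate shape, for illustration only.
    condition: fn(&ToolCall) -> bool,
    inner: G,
}

#[async_trait]
impl<G: Guardrail + Send + Sync> Guardrail for Conditional<G> {
    async fn pre_tool(&self, call: &ToolCall) -> GuardAction {
        if (self.condition)(call) {
            self.inner.pre_tool(call).await // condition holds: defer to the inner guardrail
        } else {
            GuardAction::Allow // condition fails: skip the inner guardrail
        }
    }
    // The other hooks would delegate the same way.
}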

Composition: combines multiple guardrails into a single unit, useful for grouping related guardrails that should be applied together.
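
Internally this can be as simple as looping over the children and short-circuiting on the first Deny, which also matches the ordering rule described earlier. A sketch under the same trait assumptions, again showing only pre_tool:

use std::sync::Arc;
use async_trait::async_trait;
use heartbit::{GuardAction, Guardrail, ToolCall};

struct Composite {
    children: Vec<Arc<dyn Guardrail>>,
}

#[async_trait]
impl Guardrail for Composite {
    async fn pre_tool(&self, call: &ToolCall) -> GuardAction {
        let mut warn: Option<String> = None;
        for child in &self.children {
            match child.pre_tool(call).await {
                // First Deny wins: stop immediately.
                GuardAction::Deny { reason } => return GuardAction::Deny { reason },
                // Remember a Warn, but keep looking for a Deny.
                GuardAction::Warn { reason } => warn = Some(reason),
                GuardAction::Allow => {}
            }
        }
        match warn {
            Some(reason) => GuardAction::Warn { reason },
            None => GuardAction::Allow,
        }
    }
    // The other hooks would aggregate the same way.
}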