Custom Guardrails

Guardrails intercept agent execution at four points, enabling safety checks, content filtering, PII redaction, and policy enforcement.

use heartbit::{
    CompletionRequest, CompletionResponse, Error, GuardAction, ToolCall, ToolOutput,
};
use std::future::Future;
use std::pin::Pin;

pub trait Guardrail: Send + Sync {
    /// Called before each LLM call. Can mutate the request.
    fn pre_llm(
        &self,
        _request: &mut CompletionRequest,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }

    /// Called after each LLM response. Can deny the response.
    fn post_llm(
        &self,
        _response: &CompletionResponse,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called before each tool execution. Can deny individual tool calls.
    fn pre_tool(
        &self,
        _call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called after each tool execution. Can mutate the output.
    fn post_tool(
        &self,
        _call: &ToolCall,
        _output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }
}

All four hooks have default no-op implementations. Override only the hooks you need.
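The override-only-what-you-need pattern is ordinary Rust default trait methods. Here is a minimal, self-contained sketch of that pattern; the trait and types below are simplified synchronous stand-ins for heartbit's (the real hooks return boxed futures), not the actual API:

```rust
// Simplified stand-ins: heartbit's real hooks are async and take
// richer request/call types.
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
}

trait Guardrail {
    // Default no-op implementations, as in the real trait.
    fn pre_llm(&self, _prompt: &mut String) {}
    fn pre_tool(&self, _tool: &str) -> GuardAction {
        GuardAction::Allow
    }
}

// Overrides only pre_llm; pre_tool keeps its default.
struct SafetyPrefix;

impl Guardrail for SafetyPrefix {
    fn pre_llm(&self, prompt: &mut String) {
        prompt.insert_str(0, "[policy: no destructive actions]\n");
    }
}

fn main() {
    let g = SafetyPrefix;
    let mut prompt = String::from("delete old logs");
    g.pre_llm(&mut prompt);
    assert!(prompt.starts_with("[policy:"));
    // The un-overridden hook falls back to its default.
    assert_eq!(g.pre_tool("bash"), GuardAction::Allow);
}
```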

The post_llm and pre_tool hooks return a GuardAction:

| Variant | Effect |
| --- | --- |
| GuardAction::Allow | Operation proceeds normally. |
| GuardAction::Deny { reason } | Operation is blocked. For post_llm, the response is discarded and the denial reason is injected as a user message (consumes a turn). For pre_tool, the tool receives an error result. |
| GuardAction::Warn { reason } | Operation proceeds but emits AgentEvent::GuardrailWarned and an audit record. Useful for shadow enforcement / monitoring mode. |

Returning Err from any hook aborts the entire agent run.

The four hooks fire at these points during a run:

  1. pre_llm — before sending messages to the LLM. Use for injecting safety instructions or redacting sensitive content from the request.
  2. post_llm — after receiving the LLM response. Use for content filtering, toxicity detection, or policy checks.
  3. pre_tool — before executing each tool call. Use for blocking dangerous operations (e.g., destructive bash commands).
  4. post_tool — after tool execution. Use for redacting PII from tool outputs before they enter the conversation.

Here is a guardrail that blocks bash commands containing rm -rf and redacts email addresses from tool outputs:

use heartbit::{Error, GuardAction, Guardrail, GuardrailMeta, ToolCall, ToolOutput};
use std::future::Future;
use std::pin::Pin;

pub struct SafetyGuardrail;

impl GuardrailMeta for SafetyGuardrail {
    fn name(&self) -> &str {
        "safety"
    }
}

impl Guardrail for SafetyGuardrail {
    fn pre_tool(
        &self,
        call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        let name = call.name.clone();
        let input = call.input.clone();
        Box::pin(async move {
            if name == "bash" {
                if let Some(cmd) = input["command"].as_str() {
                    if cmd.contains("rm -rf") {
                        return Ok(GuardAction::Deny {
                            reason: "Destructive rm -rf commands are not allowed".into(),
                        });
                    }
                }
            }
            Ok(GuardAction::Allow)
        })
    }

    fn post_tool(
        &self,
        _call: &ToolCall,
        output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        // Redact email addresses from tool outputs. The mutation happens
        // synchronously here; the returned future is already resolved.
        let email_re = regex::Regex::new(r"[\w.+-]+@[\w-]+\.[\w.]+").unwrap();
        output.content = email_re
            .replace_all(&output.content, "[REDACTED]")
            .into_owned();
        Box::pin(async { Ok(()) })
    }
}

Pass guardrails as Vec<Arc<dyn Guardrail>> to an agent builder:

use heartbit::AgentRunner;
use std::sync::Arc;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![Arc::new(SafetyGuardrail)])
    .build()?;

Multiple guardrails run in order. The first Deny wins — subsequent guardrails are not checked for that hook invocation.
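The chaining rule above can be sketched as a short, self-contained program. GuardAction is a local stand-in for heartbit's enum, and run_pre_tool is hypothetical driver logic, not the library's actual runner:

```rust
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Warn { reason: String },
    Deny { reason: String },
}

// Consult guardrails in order; the first Deny short-circuits the rest.
fn run_pre_tool(guards: &[Box<dyn Fn(&str) -> GuardAction>], tool: &str) -> GuardAction {
    for g in guards {
        match g(tool) {
            GuardAction::Deny { reason } => return GuardAction::Deny { reason },
            GuardAction::Warn { reason } => {
                // Warn lets execution continue; a real runner would emit
                // an event and audit record here.
                eprintln!("guardrail warning: {reason}");
            }
            GuardAction::Allow => {}
        }
    }
    GuardAction::Allow
}

fn main() {
    let guards: Vec<Box<dyn Fn(&str) -> GuardAction>> = vec![
        Box::new(|t| {
            if t == "bash" {
                GuardAction::Deny { reason: "no shell".into() }
            } else {
                GuardAction::Allow
            }
        }),
        // Never consulted for "bash": the first guard already denied.
        Box::new(|_| GuardAction::Warn { reason: "shadow mode".into() }),
    ];
    assert_eq!(
        run_pre_tool(&guards, "bash"),
        GuardAction::Deny { reason: "no shell".into() }
    );
    assert_eq!(run_pre_tool(&guards, "read_file"), GuardAction::Allow);
}
```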

Heartbit includes a built-in LlmJudgeGuardrail that uses a separate (typically cheaper) LLM to evaluate agent outputs against configurable criteria:

use heartbit::LlmJudgeGuardrail;

let judge = LlmJudgeGuardrail::builder(judge_provider)
    .criterion("Response must not contain harmful content")
    .criterion("Response must be factually grounded")
    .timeout(std::time::Duration::from_secs(5))
    .build()?;

The judge guardrail fails open: on timeout or judge error it allows the operation rather than aborting the run, so an unavailable judge cannot take the agent down in production.
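Fail-open semantics reduce to a simple mapping from the judge's result to a GuardAction. A minimal sketch, assuming illustrative local types (JudgeError and fail_open are hypothetical names, not heartbit API):

```rust
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
}

#[derive(Debug)]
enum JudgeError {
    Timeout,
}

// A successful verdict passes through; any failure maps to Allow,
// so a judge outage never blocks the agent.
fn fail_open(verdict: Result<GuardAction, JudgeError>) -> GuardAction {
    match verdict {
        Ok(action) => action,
        Err(e) => {
            eprintln!("judge unavailable ({e:?}); allowing");
            GuardAction::Allow
        }
    }
}

fn main() {
    assert_eq!(fail_open(Err(JudgeError::Timeout)), GuardAction::Allow);
    assert_eq!(
        fail_open(Ok(GuardAction::Deny { reason: "harmful".into() })),
        GuardAction::Deny { reason: "harmful".into() }
    );
}
```

The opposite policy (fail closed) would map Err to Deny; which one is right depends on whether availability or strict enforcement matters more for the deployment.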