# Custom Guardrails
Guardrails intercept agent execution at four points, enabling safety checks, content filtering, PII redaction, and policy enforcement.
## The Guardrail Trait

```rust
use heartbit::{
    CompletionRequest, CompletionResponse, Error, GuardAction, ToolCall, ToolOutput,
};
use std::future::Future;
use std::pin::Pin;

pub trait Guardrail: Send + Sync {
    /// Called before each LLM call. Can mutate the request.
    fn pre_llm(
        &self,
        _request: &mut CompletionRequest,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }

    /// Called after each LLM response. Can deny the response.
    fn post_llm(
        &self,
        _response: &CompletionResponse,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called before each tool execution. Can deny individual tool calls.
    fn pre_tool(
        &self,
        _call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called after each tool execution. Can mutate the output.
    fn post_tool(
        &self,
        _call: &ToolCall,
        _output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }
}
```

All four hooks have default no-op implementations. Override only the hooks you need.
## GuardAction

The `post_llm` and `pre_tool` hooks return a `GuardAction`:

| Variant | Effect |
|---|---|
| `GuardAction::Allow` | Operation proceeds normally. |
| `GuardAction::Deny { reason }` | Operation is blocked. For `post_llm`, the response is discarded and the denial reason is injected as a user message (consumes a turn). For `pre_tool`, the tool receives an error result. |
| `GuardAction::Warn { reason }` | Operation proceeds but emits `AgentEvent::GuardrailWarned` and an audit record. Useful for shadow enforcement / monitoring mode. |
Returning `Err` from any hook aborts the entire agent run.
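To make the variant semantics concrete, here is a minimal, self-contained sketch of how a runner might interpret a `pre_tool` verdict. The `GuardAction` enum below is a local stand-in mirroring the table above, not the real heartbit type, and `tool_may_run` is a hypothetical helper for illustration:

```rust
// Local stand-in mirroring the GuardAction variants described above
// (not imported from heartbit).
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
    Warn { reason: String },
}

/// Hypothetical dispatch: Allow and Warn let the tool run, Deny blocks it.
fn tool_may_run(action: &GuardAction) -> bool {
    match action {
        GuardAction::Allow => true,
        // Warn proceeds; the real runner would also emit
        // AgentEvent::GuardrailWarned and an audit record here.
        GuardAction::Warn { reason } => {
            eprintln!("guardrail warning: {reason}");
            true
        }
        // Deny blocks: the tool receives an error result instead of running.
        GuardAction::Deny { reason } => {
            eprintln!("tool call denied: {reason}");
            false
        }
    }
}

fn main() {
    assert!(tool_may_run(&GuardAction::Allow));
    assert!(tool_may_run(&GuardAction::Warn { reason: "shadow mode".into() }));
    assert!(!tool_may_run(&GuardAction::Deny { reason: "blocked".into() }));
}
```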
## When Each Hook Fires

- `pre_llm`: before sending messages to the LLM. Use for injecting safety instructions or redacting sensitive content from the request.
- `post_llm`: after receiving the LLM response. Use for content filtering, toxicity detection, or policy checks.
- `pre_tool`: before executing each tool call. Use for blocking dangerous operations (e.g., destructive bash commands).
- `post_tool`: after tool execution. Use for redacting PII from tool outputs before they enter the conversation.
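As a sketch of the `pre_llm` use case, the following self-contained example injects a safety instruction into the system prompt before the request is sent. The `CompletionRequest` struct and `inject_safety_instructions` helper are hypothetical stand-ins defined locally; the real heartbit request type will differ:

```rust
// Hypothetical stand-in for the request type; the real CompletionRequest differs.
struct CompletionRequest {
    system: String,
    messages: Vec<String>,
}

/// pre_llm-style mutation: prepend a safety instruction to the system prompt,
/// skipping the injection if it is already present.
fn inject_safety_instructions(request: &mut CompletionRequest) {
    let instruction = "Never reveal credentials or API keys.";
    if !request.system.contains(instruction) {
        request.system = format!("{instruction}\n{}", request.system);
    }
}

fn main() {
    let mut req = CompletionRequest {
        system: "You are a helpful assistant.".into(),
        messages: vec!["Hello".into()],
    };
    inject_safety_instructions(&mut req);
    assert!(req.system.starts_with("Never reveal credentials"));

    // Idempotent: a second call does not duplicate the instruction.
    inject_safety_instructions(&mut req);
    assert_eq!(req.system.matches("Never reveal").count(), 1);
}
```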
## Complete Example

Here is a guardrail that blocks `bash` commands containing `rm -rf` and redacts email addresses from tool outputs:
```rust
use heartbit::{Error, GuardAction, Guardrail, GuardrailMeta, ToolCall, ToolOutput};
use std::future::Future;
use std::pin::Pin;

pub struct SafetyGuardrail;

impl GuardrailMeta for SafetyGuardrail {
    fn name(&self) -> &str {
        "safety"
    }
}

impl Guardrail for SafetyGuardrail {
    fn pre_tool(
        &self,
        call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        let name = call.name.clone();
        let input = call.input.clone();
        Box::pin(async move {
            if name == "bash" {
                if let Some(cmd) = input["command"].as_str() {
                    if cmd.contains("rm -rf") {
                        return Ok(GuardAction::Deny {
                            reason: "Destructive rm -rf commands are not allowed".into(),
                        });
                    }
                }
            }
            Ok(GuardAction::Allow)
        })
    }

    fn post_tool(
        &self,
        _call: &ToolCall,
        output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        // Redact email addresses from tool outputs before they enter the conversation.
        let email_re = regex::Regex::new(r"[\w.+-]+@[\w-]+\.[\w.]+").unwrap();
        output.content = email_re.replace_all(&output.content, "[REDACTED]").into_owned();
        Box::pin(async { Ok(()) })
    }
}
```

## Registering Guardrails
Pass guardrails as `Vec<Arc<dyn Guardrail>>` to an agent builder:

```rust
use std::sync::Arc;
use heartbit::AgentRunner;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![Arc::new(SafetyGuardrail)])
    .build()?;
```

Multiple guardrails run in order. The first `Deny` wins: subsequent guardrails are not checked for that hook invocation.
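The first-`Deny`-wins rule can be sketched with a short-circuiting chain. Everything below is a local, simplified model (a trimmed `GuardAction` and a hypothetical `run_chain` helper), not heartbit's actual dispatch code:

```rust
use std::cell::Cell;

// Trimmed local stand-in for illustration; not the real heartbit type.
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
}

/// Run each guardrail's hook in registration order; return on the first Deny
/// without consulting the remaining guardrails.
fn run_chain(hooks: &[&dyn Fn() -> GuardAction]) -> GuardAction {
    for hook in hooks {
        if let deny @ GuardAction::Deny { .. } = hook() {
            return deny;
        }
    }
    GuardAction::Allow
}

fn main() {
    // Count how many guardrails were actually consulted.
    let calls = Cell::new(0);
    let allow = || { calls.set(calls.get() + 1); GuardAction::Allow };
    let deny = || { calls.set(calls.get() + 1); GuardAction::Deny { reason: "blocked".into() } };
    let never_reached = || { calls.set(calls.get() + 1); GuardAction::Allow };

    let verdict = run_chain(&[&allow, &deny, &never_reached]);
    assert_eq!(verdict, GuardAction::Deny { reason: "blocked".into() });
    // Only two guardrails ran: the first Deny short-circuited the third.
    assert_eq!(calls.get(), 2);
}
```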
## LLM-as-Judge Guardrail

Heartbit includes a built-in `LlmJudgeGuardrail` that uses a separate (typically cheaper) LLM to evaluate agent outputs against configurable criteria:

```rust
use heartbit::LlmJudgeGuardrail;

let judge = LlmJudgeGuardrail::builder(judge_provider)
    .criterion("Response must not contain harmful content")
    .criterion("Response must be factually grounded")
    .timeout(std::time::Duration::from_secs(5))
    .build()?;
```

The judge guardrail fails open on timeout or judge errors, making it production-safe.