Custom Guardrails

Guardrails intercept agent execution at four points, enabling safety checks, content filtering, PII redaction, and policy enforcement.

use heartbit::{
    CompletionRequest, CompletionResponse, Error, GuardAction, ToolCall, ToolOutput,
};
use std::future::Future;
use std::pin::Pin;

pub trait Guardrail: Send + Sync {
    /// Called before each LLM call. Can mutate the request.
    fn pre_llm(
        &self,
        _request: &mut CompletionRequest,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }

    /// Called after each LLM response. Can deny the response.
    fn post_llm(
        &self,
        _response: &CompletionResponse,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called before each tool execution. Can deny individual tool calls.
    fn pre_tool(
        &self,
        _call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        Box::pin(async { Ok(GuardAction::Allow) })
    }

    /// Called after each tool execution. Can mutate the output.
    fn post_tool(
        &self,
        _call: &ToolCall,
        _output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        Box::pin(async { Ok(()) })
    }
}

All four hooks have default no-op implementations. Override only the hooks you need.
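The override-only-what-you-need pattern is ordinary Rust default trait methods. Here is a minimal, self-contained sketch of that pattern; the trait and types below are simplified synchronous stand-ins for heartbit's (the real hooks return boxed futures), not the actual API:

```rust
// Simplified stand-ins: heartbit's real hooks are async and take
// richer request/call types.
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
}

trait Guardrail {
    // Default no-op implementations, as in the real trait.
    fn pre_llm(&self, _prompt: &mut String) {}
    fn pre_tool(&self, _tool: &str) -> GuardAction {
        GuardAction::Allow
    }
}

// Overrides only pre_llm; pre_tool keeps its default.
struct SafetyPrefix;

impl Guardrail for SafetyPrefix {
    fn pre_llm(&self, prompt: &mut String) {
        prompt.insert_str(0, "[policy: no destructive actions]\n");
    }
}

fn main() {
    let g = SafetyPrefix;
    let mut prompt = String::from("delete old logs");
    g.pre_llm(&mut prompt);
    assert!(prompt.starts_with("[policy:"));
    // The un-overridden hook falls back to its default.
    assert_eq!(g.pre_tool("bash"), GuardAction::Allow);
}
```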

The post_llm and pre_tool hooks return a GuardAction:

| Variant | Effect |
| --- | --- |
| GuardAction::Allow | Operation proceeds normally. |
| GuardAction::Deny { reason } | Operation is blocked. For post_llm, the response is discarded and the denial reason is injected as a user message (consumes a turn). For pre_tool, the tool receives an error result. |
| GuardAction::Warn { reason } | Operation proceeds but emits AgentEvent::GuardrailWarned and an audit record. Useful for shadow enforcement / monitoring mode. |

Returning Err from any hook aborts the entire agent run.

The four hooks fire at these points during a run:

  1. pre_llm — before sending messages to the LLM. Use for injecting safety instructions or redacting sensitive content from the request.
  2. post_llm — after receiving the LLM response. Use for content filtering, toxicity detection, or policy checks.
  3. pre_tool — before executing each tool call. Use for blocking dangerous operations (e.g., destructive bash commands).
  4. post_tool — after tool execution. Use for redacting PII from tool outputs before they enter the conversation.

Here is a guardrail that blocks bash commands containing rm -rf and redacts email addresses from tool outputs:

use heartbit::{Error, GuardAction, Guardrail, GuardrailMeta, ToolCall, ToolOutput};
use std::future::Future;
use std::pin::Pin;

pub struct SafetyGuardrail;

impl GuardrailMeta for SafetyGuardrail {
    fn name(&self) -> &str {
        "safety"
    }
}

impl Guardrail for SafetyGuardrail {
    fn pre_tool(
        &self,
        call: &ToolCall,
    ) -> Pin<Box<dyn Future<Output = Result<GuardAction, Error>> + Send + '_>> {
        let name = call.name.clone();
        let input = call.input.clone();
        Box::pin(async move {
            if name == "bash" {
                if let Some(cmd) = input["command"].as_str() {
                    if cmd.contains("rm -rf") {
                        return Ok(GuardAction::Deny {
                            reason: "Destructive rm -rf commands are not allowed".into(),
                        });
                    }
                }
            }
            Ok(GuardAction::Allow)
        })
    }

    fn post_tool(
        &self,
        _call: &ToolCall,
        output: &mut ToolOutput,
    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + '_>> {
        // Redact email addresses from tool outputs. The mutation happens
        // synchronously here; the returned future is already resolved.
        let email_re = regex::Regex::new(r"[\w.+-]+@[\w-]+\.[\w.]+").unwrap();
        output.content = email_re
            .replace_all(&output.content, "[REDACTED]")
            .into_owned();
        Box::pin(async { Ok(()) })
    }
}

Pass guardrails as Vec<Arc<dyn Guardrail>> to an agent builder:

use heartbit::AgentRunner;
use std::sync::Arc;

let agent = AgentRunner::builder(provider)
    .guardrails(vec![Arc::new(SafetyGuardrail)])
    .build()?;

Multiple guardrails run in order. The first Deny wins — subsequent guardrails are not checked for that hook invocation.
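The chaining rule above can be sketched as a short, self-contained program. GuardAction is a local stand-in for heartbit's enum, and run_pre_tool is hypothetical driver logic, not the library's actual runner:

```rust
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Warn { reason: String },
    Deny { reason: String },
}

// Consult guardrails in order; the first Deny short-circuits the rest.
fn run_pre_tool(guards: &[Box<dyn Fn(&str) -> GuardAction>], tool: &str) -> GuardAction {
    for g in guards {
        match g(tool) {
            GuardAction::Deny { reason } => return GuardAction::Deny { reason },
            GuardAction::Warn { reason } => {
                // Warn lets execution continue; a real runner would emit
                // an event and audit record here.
                eprintln!("guardrail warning: {reason}");
            }
            GuardAction::Allow => {}
        }
    }
    GuardAction::Allow
}

fn main() {
    let guards: Vec<Box<dyn Fn(&str) -> GuardAction>> = vec![
        Box::new(|t| {
            if t == "bash" {
                GuardAction::Deny { reason: "no shell".into() }
            } else {
                GuardAction::Allow
            }
        }),
        // Never consulted for "bash": the first guard already denied.
        Box::new(|_| GuardAction::Warn { reason: "shadow mode".into() }),
    ];
    assert_eq!(
        run_pre_tool(&guards, "bash"),
        GuardAction::Deny { reason: "no shell".into() }
    );
    assert_eq!(run_pre_tool(&guards, "read_file"), GuardAction::Allow);
}
```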

Heartbit includes a built-in LlmJudgeGuardrail that uses a separate (typically cheaper) LLM to evaluate agent outputs against configurable criteria:

use heartbit::LlmJudgeGuardrail;

let judge = LlmJudgeGuardrail::builder(judge_provider)
    .criterion("Response must not contain harmful content")
    .criterion("Response must be factually grounded")
    .timeout(std::time::Duration::from_secs(5))
    .build()?;

The judge guardrail fails open: on timeout or judge error it allows the operation rather than aborting the run, so an unavailable judge cannot take the agent down in production.
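Fail-open semantics reduce to a simple mapping from the judge's result to a GuardAction. A minimal sketch, assuming illustrative local types (JudgeError and fail_open are hypothetical names, not heartbit API):

```rust
#[derive(Debug, PartialEq)]
enum GuardAction {
    Allow,
    Deny { reason: String },
}

#[derive(Debug)]
enum JudgeError {
    Timeout,
}

// A successful verdict passes through; any failure maps to Allow,
// so a judge outage never blocks the agent.
fn fail_open(verdict: Result<GuardAction, JudgeError>) -> GuardAction {
    match verdict {
        Ok(action) => action,
        Err(e) => {
            eprintln!("judge unavailable ({e:?}); allowing");
            GuardAction::Allow
        }
    }
}

fn main() {
    assert_eq!(fail_open(Err(JudgeError::Timeout)), GuardAction::Allow);
    assert_eq!(
        fail_open(Ok(GuardAction::Deny { reason: "harmful".into() })),
        GuardAction::Deny { reason: "harmful".into() }
    );
}
```

The opposite policy (fail closed) would map Err to Deny; which one is right depends on whether availability or strict enforcement matters more for the deployment.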