Guardrails
This example demonstrates Heartbit’s LLM-as-Judge guardrail. A cheap judge model evaluates every agent response against custom safety criteria before it is returned to the user. If the response violates a criterion, the agent is asked to regenerate.
Prerequisites
- ANTHROPIC_API_KEY environment variable set with a valid API key
Running
```sh
export ANTHROPIC_API_KEY="sk-..."
cargo run -p heartbit --example guardrails
```
Source
```rust
use std::sync::Arc;

use heartbit::{AgentRunner, AnthropicProvider, BoxedProvider, LlmJudgeGuardrail};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")
        .expect("set ANTHROPIC_API_KEY environment variable");
    let provider = Arc::new(AnthropicProvider::new(&api_key, "claude-sonnet-4-20250514"));

    // Use a cheap model as the safety judge.
    let judge = Arc::new(BoxedProvider::new(AnthropicProvider::new(
        &api_key,
        "claude-haiku-4-5-20251001",
    )));

    let guardrail = LlmJudgeGuardrail::builder(judge)
        .criterion("Response must not contain personal insults")
        .criterion("Response must not include made-up statistics")
        .build()?;

    let agent = AgentRunner::builder(provider)
        .name("safe-agent")
        .system_prompt("You are a helpful assistant. Be concise and factual.")
        .guardrail(Arc::new(guardrail))
        .max_turns(3)
        .max_tokens(2048)
        .build()?;

    let output = agent.execute("Tell me about climate change.").await?;
    println!("{}", output.result);

    Ok(())
}
```
Walkthrough
Separate judge model — The guardrail uses its own LLM provider, typically a cheaper/faster model like Haiku. This keeps guardrail costs low while the main agent uses a more capable model. BoxedProvider wraps the provider into an object-safe type required by the guardrail builder.
Defining criteria — Each .criterion() call adds a natural-language rule that the judge checks the response against. You can add as many criteria as needed:
- “Response must not contain personal insults”
- “Response must not include made-up statistics”
The judge receives these criteria and the agent’s response, then returns a verdict: SAFE, UNSAFE, or WARN.
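The verdict handling can be pictured as a small enum plus a lenient parser over the judge's reply. This is an illustrative sketch only — the `Verdict` type and `parse_verdict` helper are hypothetical names, not heartbit's actual API:

```rust
// Illustrative sketch; heartbit's real verdict type and parsing may differ.
#[derive(Debug, PartialEq)]
enum Verdict {
    Safe,
    Unsafe,
    Warn,
}

/// Leniently map the judge model's free-text reply to a verdict.
/// Unrecognized replies default to Safe, mirroring fail-open behavior.
fn parse_verdict(judge_reply: &str) -> Verdict {
    let upper = judge_reply.to_uppercase();
    if upper.contains("UNSAFE") {
        Verdict::Unsafe
    } else if upper.contains("WARN") {
        Verdict::Warn
    } else {
        Verdict::Safe
    }
}

fn main() {
    assert_eq!(parse_verdict("UNSAFE: contains an insult"), Verdict::Unsafe);
    assert_eq!(parse_verdict("SAFE"), Verdict::Safe);
    assert_eq!(parse_verdict("warn: borderline tone"), Verdict::Warn);
    // Garbled judge output falls through to Safe (fail-open).
    assert_eq!(parse_verdict("???"), Verdict::Safe);
    println!("ok");
}
```

Checking UNSAFE before SAFE matters here, since the substring "SAFE" appears inside "UNSAFE".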
Wiring the guardrail — .guardrail(Arc::new(guardrail)) attaches the guardrail to the agent. Multiple guardrails can be chained; the first Deny verdict wins.
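The "first Deny wins" chaining can be sketched as a short-circuiting loop over checks. The `Decision` enum, `check_all` function, and the toy checks below are hypothetical, standing in for heartbit's internal guardrail chain:

```rust
// Illustrative sketch of "first Deny wins"; not heartbit's real types.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(&'static str),
}

fn no_insults(response: &str) -> Decision {
    if response.contains("idiot") { Decision::Deny("personal insult") } else { Decision::Allow }
}

fn no_fake_stats(response: &str) -> Decision {
    if response.contains("97.3%") { Decision::Deny("made-up statistic") } else { Decision::Allow }
}

/// Run each guardrail in order; the first Deny short-circuits the chain.
fn check_all(guardrails: &[fn(&str) -> Decision], response: &str) -> Decision {
    for check in guardrails {
        let decision = check(response);
        if decision != Decision::Allow {
            return decision;
        }
    }
    Decision::Allow
}

fn main() {
    let chain: Vec<fn(&str) -> Decision> = vec![no_insults, no_fake_stats];
    assert_eq!(check_all(&chain, "Climate change is real."), Decision::Allow);
    // Both checks would fire, but the first Deny in chain order wins.
    assert_eq!(
        check_all(&chain, "You idiot, 97.3% agree."),
        Decision::Deny("personal insult")
    );
    println!("ok");
}
```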
How it works at runtime:
- The agent generates a response
- The post_llm hook sends the response to the judge model
- If the judge returns UNSAFE, the response is blocked and the agent receives feedback to try again
- If the judge returns SAFE or WARN, the response passes through
- The guardrail is fail-open: if the judge errors or times out, the response is allowed through (production-safe behavior)
Turn budget — max_turns(3) gives the agent room to regenerate if a response is denied. A denied response consumes a turn, so set this higher than 1 when using guardrails.
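The interplay between the turn budget and denial feedback can be sketched as a retry loop. Everything here (`run_with_guardrail`, the feedback string, the fail-open arm) is an assumed simplification of heartbit's internals, not its actual implementation:

```rust
// Illustrative sketch of the regenerate-on-deny loop; heartbit's internals may differ.
fn run_with_guardrail(
    max_turns: u32,
    mut generate: impl FnMut(Option<&str>) -> String,
    judge: impl Fn(&str) -> Result<bool, ()>, // Ok(true) = safe; Err = judge failure
) -> Option<String> {
    let mut feedback: Option<String> = None;
    for _ in 0..max_turns {
        let response = generate(feedback.as_deref());
        match judge(&response) {
            // Approved, or judge errored: fail-open and return the response.
            Ok(true) | Err(_) => return Some(response),
            // Denied: consume a turn and feed the denial back to the agent.
            Ok(false) => feedback = Some("response violated a safety criterion".into()),
        }
    }
    None // turn budget exhausted without an approved response
}

fn main() {
    let mut attempt = 0;
    let generate = |_feedback: Option<&str>| {
        attempt += 1;
        if attempt == 1 {
            "97.3% of experts agree".to_string() // first try: made-up statistic
        } else {
            "A factual answer".to_string() // regenerated after denial
        }
    };
    let judge = |r: &str| -> Result<bool, ()> { Ok(!r.contains("97.3%")) };
    assert_eq!(
        run_with_guardrail(3, generate, judge),
        Some("A factual answer".to_string())
    );
    println!("ok");
}
```

With max_turns(1) the same denial would exhaust the budget immediately, which is why the example sets it to 3.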
What to expect
The agent provides a factual response about climate change. If its first attempt contained made-up statistics, the guardrail would deny it and the agent would regenerate. The final output is a guardrail-approved response:
Climate change refers to long-term shifts in global temperatures and weather patterns...