
Guardrails

This example demonstrates Heartbit’s LLM-as-Judge guardrail. A cheap judge model evaluates every agent response against custom safety criteria before it is returned to the user. If the response violates a criterion, the agent is asked to regenerate.

Prerequisites:

  • ANTHROPIC_API_KEY environment variable set with a valid API key

Run the example:

```sh
export ANTHROPIC_API_KEY="sk-..."
cargo run -p heartbit --example guardrails
```
```rust
use std::sync::Arc;

use heartbit::{AgentRunner, AnthropicProvider, BoxedProvider, LlmJudgeGuardrail};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key =
        std::env::var("ANTHROPIC_API_KEY").expect("set ANTHROPIC_API_KEY environment variable");

    let provider = Arc::new(AnthropicProvider::new(&api_key, "claude-sonnet-4-20250514"));

    // Use a cheap model as the safety judge.
    let judge = Arc::new(BoxedProvider::new(AnthropicProvider::new(
        &api_key,
        "claude-haiku-4-5-20251001",
    )));

    let guardrail = LlmJudgeGuardrail::builder(judge)
        .criterion("Response must not contain personal insults")
        .criterion("Response must not include made-up statistics")
        .build()?;

    let agent = AgentRunner::builder(provider)
        .name("safe-agent")
        .system_prompt("You are a helpful assistant. Be concise and factual.")
        .guardrail(Arc::new(guardrail))
        .max_turns(3)
        .max_tokens(2048)
        .build()?;

    let output = agent.execute("Tell me about climate change.").await?;
    println!("{}", output.result);

    Ok(())
}
```

Separate judge model — The guardrail uses its own LLM provider, typically a cheaper/faster model like Haiku. This keeps guardrail costs low while the main agent uses a more capable model. BoxedProvider wraps the provider into an object-safe type required by the guardrail builder.

Defining criteria — Each .criterion() call adds a natural-language rule that the judge evaluates the response against. You can add as many criteria as needed:

  • “Response must not contain personal insults”
  • “Response must not include made-up statistics”

The judge receives these criteria and the agent’s response, then returns a verdict: SAFE, UNSAFE, or WARN.
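
The verdict handling can be sketched as follows. This is an illustrative sketch, not the heartbit API: the Verdict enum and parse_verdict function are hypothetical names, and the sketch assumes the judge replies with one of the literal tokens SAFE, UNSAFE, or WARN.

```rust
// Hypothetical sketch: map a judge model's textual reply onto a
// three-way verdict. Unrecognized replies default to Safe, mirroring
// the guardrail's fail-open posture.
#[derive(Debug, PartialEq)]
enum Verdict {
    Safe,
    Unsafe,
    Warn,
}

fn parse_verdict(judge_reply: &str) -> Verdict {
    match judge_reply.trim().to_ascii_uppercase().as_str() {
        "UNSAFE" => Verdict::Unsafe,
        "WARN" => Verdict::Warn,
        _ => Verdict::Safe,
    }
}

fn main() {
    // Case-insensitive and whitespace-tolerant parsing.
    println!("{:?}", parse_verdict(" unsafe "));
}
```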

Wiring the guardrail — .guardrail(Arc::new(guardrail)) attaches the guardrail to the agent. Multiple guardrails can be chained; the first Deny verdict wins.
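
The "first Deny wins" chaining can be sketched like this. The Decision enum and the closure-based guardrails are illustrative stand-ins, not heartbit types:

```rust
// Hypothetical sketch of short-circuiting guardrail chaining:
// evaluate guardrails in order and stop at the first Deny.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
}

type Guardrail = Box<dyn Fn(&str) -> Decision>;

fn evaluate_chain(response: &str, guardrails: &[Guardrail]) -> Decision {
    for g in guardrails {
        if let Decision::Deny(reason) = g(response) {
            // First Deny short-circuits the chain.
            return Decision::Deny(reason);
        }
    }
    Decision::Allow
}

fn main() {
    let chain: Vec<Guardrail> = vec![
        Box::new(|r: &str| {
            if r.contains("idiot") {
                Decision::Deny("personal insult".to_string())
            } else {
                Decision::Allow
            }
        }),
        Box::new(|r: &str| {
            if r.contains("97.3%") {
                Decision::Deny("made-up statistic".to_string())
            } else {
                Decision::Allow
            }
        }),
    ];
    println!("{:?}", evaluate_chain("hello there", &chain));
}
```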

How it works at runtime:

  1. The agent generates a response
  2. The post_llm hook sends the response to the judge model
  3. If the judge returns UNSAFE, the response is blocked and the agent receives feedback to try again
  4. If the judge returns SAFE or WARN, the response passes through
  5. The guardrail is fail-open: if the judge errors or times out, the response is allowed through (production-safe behavior)
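
Steps 3–5 amount to a block/pass decision. The sketch below is illustrative, not the actual post_llm hook; Verdict, JudgeError, and should_block are hypothetical names, assuming the judge call yields a Result:

```rust
// Hypothetical sketch of the block/pass decision after the judge runs.
#[derive(Debug, PartialEq)]
enum Verdict {
    Safe,
    Unsafe,
    Warn,
}

#[derive(Debug)]
struct JudgeError;

/// Returns true when the response should be blocked and regenerated.
fn should_block(judge_result: Result<Verdict, JudgeError>) -> bool {
    match judge_result {
        Ok(Verdict::Unsafe) => true, // blocked: agent gets feedback and retries
        Ok(Verdict::Safe) | Ok(Verdict::Warn) => false, // passes through
        Err(_) => false, // fail-open: judge error or timeout allows the response
    }
}

fn main() {
    println!("{}", should_block(Ok(Verdict::Unsafe)));
}
```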

Turn budget — max_turns(3) gives the agent room to regenerate if a response is denied. A denied response consumes a turn, so set max_turns higher than 1 when using guardrails.
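
The interaction between denials and the turn budget can be sketched as a bounded retry loop. This is a simplified stand-in, not heartbit's agent loop; generate and approve are hypothetical closures:

```rust
// Hypothetical sketch: each denied response consumes one turn;
// the loop gives up once the budget is exhausted.
fn run_with_budget(
    max_turns: u32,
    mut generate: impl FnMut(u32) -> String,
    approve: impl Fn(&str) -> bool,
) -> Option<String> {
    for turn in 0..max_turns {
        let response = generate(turn);
        if approve(&response) {
            return Some(response); // guardrail-approved response
        }
        // Denied: the turn is spent; regenerate on the next iteration.
    }
    None // budget exhausted without an approved response
}

fn main() {
    // First attempt is denied, second is approved: 2 of 3 turns used.
    let result = run_with_budget(
        3,
        |turn| if turn == 0 { "draft".to_string() } else { "final".to_string() },
        |r| r == "final",
    );
    println!("{:?}", result);
}
```

With max_turns(1), the same denied first attempt would exhaust the budget and yield no approved response, which is why the text recommends a budget above 1.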

The agent provides a factual response about climate change. If its first attempt contained made-up statistics, the guardrail would deny it and the agent would regenerate. The final output is a guardrail-approved response:

Climate change refers to long-term shifts in global temperatures and weather patterns...