Agent Loop

Every Heartbit agent runs a ReAct (Reason + Act) loop: the LLM generates a response, optionally calls tools, receives the results, and repeats until the task is complete or a limit is reached.

```
User message
     |
     v
+--------+     +------------+     +--------------+
|  LLM   | --> | Tool Calls | --> | Tool Results |
+--------+     +------------+     +--------------+
     ^                                    |
     +------------------------------------+
            (repeat until done)
```

Each iteration is a turn. On each turn:

  1. The full conversation history (system prompt + messages) is sent to the LLM.
  2. The LLM responds with text, tool calls, or both.
  3. If tool calls are present, they execute in parallel via tokio::JoinSet.
  4. Tool results are appended to the conversation as a new message.
  5. The loop continues with the next LLM call.

The loop ends when the LLM produces a response with no tool calls (stop reason: EndTurn), the turn limit is reached (MaxTurns), or the token budget is exhausted (MaxTokens).

AgentRunner<P> is the core type that implements the agent loop. It is generic over a provider P: LlmProvider.

```rust
use std::sync::Arc;
use heartbit::{
    AgentRunner, AnthropicProvider, BoxedProvider, RetryingProvider,
};

let provider = Arc::new(BoxedProvider::new(
    RetryingProvider::with_defaults(
        AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
    ),
));

let mut agent = AgentRunner::builder(provider)
    .system_prompt("You are a helpful assistant.")
    .on_text(Arc::new(|text| print!("{text}")))
    .build()?;

let output = agent.execute("Analyze the Rust ecosystem").await?;

println!(
    "\nTokens: {} in / {} out",
    output.tokens_used.input_tokens,
    output.tokens_used.output_tokens
);
```

AgentRunner::builder(provider) returns an AgentRunnerBuilder with these options:

| Method | Description |
| --- | --- |
| `.system_prompt(s)` | Set the system prompt |
| `.tools(vec)` | Register `Vec<Arc<dyn Tool>>` |
| `.max_turns(n)` | Maximum turns before stopping (default: 10) |
| `.max_tokens(n)` | Max tokens per LLM response |
| `.max_total_tokens(n)` | Total token budget across all turns |
| `.guardrails(vec)` | Attach guardrails to the loop |
| `.memory(m)` | Enable the memory system |
| `.context_strategy(s)` | Set the context management strategy |
| `.on_text(cb)` | Streaming text callback |
| `.on_event(cb)` | Structured event callback |
| `.on_approval(cb)` | Human-in-the-loop approval callback |
| `.on_input(cb)` | Multi-turn input callback (for chat mode) |
| `.structured_schema(s)` | Force structured JSON output |

Every execute() call returns an AgentOutput:

```rust
pub struct AgentOutput {
    pub result: String,                  // Final text response
    pub tool_calls_made: usize,          // Total tool invocations
    pub tokens_used: TokenUsage,         // Input/output/cache token counts
    pub structured: Option<Value>,       // Structured output (if schema set)
    pub estimated_cost_usd: Option<f64>, // Estimated USD cost (known models)
}
```

The loop ends when the LLM returns StopReason::EndTurn (natural completion), StopReason::MaxTokens (token limit), or when max_turns is exceeded (returns Error::MaxTurnsExceeded).

TokenUsage accumulates across all turns in a run:

```rust
pub struct TokenUsage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_creation_input_tokens: u32,
    pub cache_read_input_tokens: u32,
    pub reasoning_tokens: u32,
}
```

Cost estimation is available via estimate_cost(model, usage) which returns USD cost for known Claude models, accounting for cache read/write rates.

The on_text callback receives text deltas as they arrive from the LLM’s SSE stream. This provides real-time output without waiting for the full response:

```rust
let mut agent = AgentRunner::builder(provider)
    .on_text(Arc::new(|text| print!("{text}")))
    .build()?;
```

The on_event callback receives structured AgentEvent variants at key points in the loop:

  • RunStarted / RunCompleted / RunFailed — lifecycle boundaries
  • TurnStarted / LlmResponse — per-turn progress
  • ToolCallStarted / ToolCallCompleted — tool execution tracking
  • ApprovalRequested / ApprovalDecision — HITL interactions
  • ContextSummarized — context management actions
  • GuardrailDenied — guardrail interventions

Use --verbose in the CLI to emit events as JSON to stderr.