Context Management

As agents run multi-turn conversations, the context window fills up. Heartbit provides strategies to manage context size, detect stuck loops, and recover from overflows.

Set a strategy on the agent builder or via config:

No trimming: the full conversation history is sent to the LLM every turn. This is simple and predictable, but long conversations will eventually hit the context limit.

The sliding_window strategy keeps the system prompt plus the most recent messages within a token budget; older messages are dropped:

[[agents]]
name = "assistant"
context_strategy = { type = "sliding_window", max_tokens = 100000 }
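The trimming logic can be sketched in a few lines. This is illustrative only, not Heartbit's actual implementation: `Msg` and its pre-computed token counts stand in for the real message type and tokenizer.

```rust
// Illustrative sketch of sliding-window trimming (not Heartbit's real code).
#[derive(Clone)]
struct Msg {
    role: &'static str,
    tokens: usize, // pre-computed token count for this message
}

/// Keep the system prompt plus the most recent messages whose combined
/// token count fits within `max_tokens`. Older messages are dropped.
fn sliding_window(history: &[Msg], max_tokens: usize) -> Vec<Msg> {
    let (system, rest): (Vec<Msg>, Vec<Msg>) =
        history.iter().cloned().partition(|m| m.role == "system");
    let system_cost: usize = system.iter().map(|m| m.tokens).sum();
    let mut budget = max_tokens.saturating_sub(system_cost);

    // Walk backwards from the newest message; stop at the first message
    // that no longer fits, so the kept window stays contiguous.
    let mut kept = Vec::new();
    for m in rest.iter().rev() {
        if m.tokens > budget {
            break;
        }
        budget -= m.tokens;
        kept.push(m.clone());
    }
    kept.reverse();

    let mut out = system;
    out.extend(kept);
    out
}
```

Dropping everything older than the first message that doesn't fit (rather than skipping just that one) keeps the remaining conversation contiguous.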

With the summarize strategy, when the context exceeds a threshold the LLM generates a summary of the conversation so far; the summary replaces the older messages:

[[agents]]
name = "assistant"
context_strategy = { type = "summarize", max_tokens = 100000 }
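The replacement step can be sketched as follows. The `summarize` closure stands in for the LLM call, and the summary prefix text is made up for illustration:

```rust
// Illustrative sketch of summarizing compaction (not Heartbit's real code).
// `summarize` stands in for an LLM call that condenses the older messages.
fn compact_with_summary(
    history: Vec<String>,
    keep_recent: usize,
    summarize: impl Fn(&[String]) -> String,
) -> Vec<String> {
    if history.len() <= keep_recent {
        return history; // nothing old enough to summarize
    }
    let split = history.len() - keep_recent;
    let summary = summarize(&history[..split]);

    // The summary replaces the older messages; recent ones stay verbatim.
    let mut out = vec![format!("Summary of earlier conversation: {summary}")];
    out.extend_from_slice(&history[split..]);
    out
}
```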

When a ContextOverflow error occurs (the LLM rejects the request because the context is too large), Heartbit automatically:

  1. Summarizes the conversation using the LLM
  2. Injects the summary at position 4 in the message history (preserving the most recent context)
  3. Retries the LLM call

At most one compaction is performed per consecutive turn pair, which prevents infinite compaction loops.

Before compaction, a pre-compaction flush extracts tool results to episodic memory (when memory is enabled) so information isn’t permanently lost.

The DoomLoopTracker detects when an agent is stuck repeating the same actions. It hashes the entire tool-call batch on each turn and tracks consecutive identical batches.

When the count exceeds max_identical_tool_calls, the agent is stopped with an error.

[[agents]]
name = "assistant"
max_identical_tool_calls = 3 # Stop after 3 identical consecutive tool batches

Or via the builder:

let agent = AgentRunner::builder(provider)
    .max_identical_tool_calls(3)
    .build()?;

Setting this to 0 is rejected at build time. Leave unset (default: None) to disable doom loop detection.
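The tracker's core idea, hash each turn's tool-call batch and count consecutive repeats, can be sketched with the standard library's hasher. This is a sketch, not Heartbit's implementation; whether the stop fires at exactly `max` repeats or one past it is a boundary detail, and here it fires once the same batch has been seen `max` times in a row.

```rust
// Illustrative sketch of doom-loop detection (not Heartbit's real code).
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct DoomLoopTracker {
    last: Option<u64>, // fingerprint of the previous turn's batch
    count: u32,        // consecutive identical batches seen so far
    max: u32,          // from max_identical_tool_calls
}

impl DoomLoopTracker {
    /// Record this turn's tool-call batch; returns true if the agent
    /// should be stopped because the batch has repeated `max` times.
    fn observe(&mut self, batch: &[&str]) -> bool {
        let mut hasher = DefaultHasher::new();
        batch.hash(&mut hasher);
        let fingerprint = hasher.finish();

        if self.last == Some(fingerprint) {
            self.count += 1;
        } else {
            self.last = Some(fingerprint);
            self.count = 1; // different batch resets the streak
        }
        self.count >= self.max
    }
}
```

Hashing the entire batch means the streak only counts turns where every tool call in the batch is identical; changing even one argument resets it.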

SessionPruneConfig automatically trims old tool results from the conversation before each LLM call. This reduces input tokens without losing the conversation flow:

[[agents]]
name = "assistant"
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
  • keep_recent_n (default: 2) — number of recent message pairs kept at full fidelity
  • pruned_tool_result_max_bytes (default: 200) — tool results exceeding this are replaced with head + tail + [pruned: N bytes]
  • preserve_task (default: true) — keeps the first user message (the original task) from pruning
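The head + tail replacement can be sketched as below. Assumptions are labeled in the comments: the exact marker format and whether N counts removed or total bytes are guesses, and the sketch slices bytes directly, which only works for ASCII content.

```rust
// Illustrative sketch of tool-result pruning (not Heartbit's real code).
// Assumption: N in "[pruned: N bytes]" is the number of bytes removed.
fn prune_tool_result(result: &str, max_bytes: usize) -> String {
    if result.len() <= max_bytes {
        return result.to_string(); // small enough, keep as-is
    }
    let half = max_bytes / 2;
    // Assumes ASCII; real code must respect UTF-8 character boundaries.
    let head = &result[..half];
    let tail = &result[result.len() - half..];
    let removed = result.len() - head.len() - tail.len();
    format!("{head}{tail} [pruned: {removed} bytes]")
}
```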

When the LLM generates a tool call with a misspelled name, Heartbit attempts automatic repair using Levenshtein distance. If a registered tool name is within edit distance 2 of the requested name, the call is redirected to the correct tool with a warning.

This handles common LLM mistakes like bash_command instead of bash without failing the turn.
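The matching step can be sketched with a textbook Levenshtein implementation; the `repair_tool_name` helper and its signature are illustrative, not Heartbit's API.

```rust
// Textbook Levenshtein edit distance (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost) // substitution
                .min(prev[j + 1] + 1) // deletion
                .min(cur[j] + 1));    // insertion
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Illustrative repair: redirect to the closest registered tool name
/// within edit distance 2, if any.
fn repair_tool_name<'a>(requested: &str, registered: &[&'a str]) -> Option<&'a str> {
    registered
        .iter()
        .copied()
        .map(|t| (levenshtein(requested, t), t))
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, t)| t)
}
```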

Control token usage at multiple levels:

Setting            Scope           Description
max_tokens         Per LLM call    Maximum tokens in each LLM response
max_turns          Per agent run   Maximum reasoning turns before stopping
Context strategy   Per agent       Overall context window management

The agent loop tracks cumulative TokenUsage (input, output, cache creation, cache read) across all turns, available in AgentOutput::tokens_used.
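The accumulation amounts to summing the four counters turn by turn. A minimal sketch, with field and method names assumed rather than taken from Heartbit's API:

```rust
// Illustrative sketch of cumulative token accounting (field and method
// names are assumptions, not Heartbit's actual API).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_creation: u64,
    cache_read: u64,
}

impl TokenUsage {
    /// Fold one turn's usage into the running total.
    fn accumulate(&mut self, turn: TokenUsage) {
        self.input += turn.input;
        self.output += turn.output;
        self.cache_creation += turn.cache_creation;
        self.cache_read += turn.cache_read;
    }
}
```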