# Context Management
As agents run multi-turn conversations, the context window fills up. Heartbit provides strategies to manage context size, detect stuck loops, and recover from overflows.
## Context Strategies

Set a strategy on the agent builder or via config:
### Unlimited (Default)

No trimming. The full conversation history is sent to the LLM every turn. Simple and predictable, but will eventually hit context limits on long conversations.
### SlidingWindow

Keeps the system prompt plus the most recent messages within a token budget. Older messages are dropped:

```toml
[[agents]]
name = "assistant"
context_strategy = { type = "sliding_window", max_tokens = 100000 }
```
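To make the budget arithmetic concrete, here is a minimal self-contained sketch of the sliding-window idea. The `Message` type, the 4-characters-per-token estimate, and `sliding_window` itself are illustrative assumptions, not Heartbit's internals:

```rust
// Sketch: keep the system prompt plus the most recent messages whose
// estimated token counts fit within `max_tokens`. Older messages drop off.
#[derive(Clone, Debug)]
struct Message {
    role: &'static str,
    content: String,
}

/// Rough token estimate: ~4 characters per token (a common heuristic).
fn estimate_tokens(m: &Message) -> usize {
    m.content.len() / 4 + 1
}

fn sliding_window(system: &Message, history: &[Message], max_tokens: usize) -> Vec<Message> {
    let mut budget = max_tokens.saturating_sub(estimate_tokens(system));
    let mut kept: Vec<Message> = Vec::new();
    // Walk newest-to-oldest, keeping messages while the budget allows.
    for m in history.iter().rev() {
        let cost = estimate_tokens(m);
        if cost > budget {
            break; // everything older is dropped
        }
        budget -= cost;
        kept.push(m.clone());
    }
    kept.reverse(); // restore chronological order
    let mut out = vec![system.clone()];
    out.extend(kept);
    out
}
```

The system prompt is always retained; trimming only ever removes from the old end of the history.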
### Summarize

When context exceeds a threshold, the LLM generates a summary of the conversation so far. The summary replaces the older messages:

```toml
[[agents]]
name = "assistant"
context_strategy = { type = "summarize", max_tokens = 100000 }
```

### Auto-Compaction

When a `ContextOverflow` error occurs (the LLM rejects the request because the context is too large), Heartbit automatically:
- Summarizes the conversation using the LLM
- Injects the summary at position 4 in the message history (preserving the most recent context)
- Retries the LLM call
At most one compaction is performed per consecutive turn pair, which prevents infinite compaction loops.
Before compaction, a pre-compaction flush extracts tool results to episodic memory (when memory is enabled) so information isn’t permanently lost.
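The injection step can be sketched with plain `Vec` operations. The `compact` function, its `keep_recent` parameter, and the string-based message representation below are hypothetical, shown only to illustrate splicing a summary into position 4 while preserving the most recent context:

```rust
// Sketch: replace the middle of the history with a single summary message,
// keeping the first few messages and the most recent `keep_recent` intact.
fn compact(messages: &mut Vec<String>, summary: String, keep_recent: usize) {
    const INJECT_AT: usize = 4; // position where the summary is injected
    if messages.len() <= INJECT_AT + keep_recent {
        return; // history too short, nothing to compact
    }
    let tail_start = messages.len() - keep_recent;
    // Drop messages[INJECT_AT..tail_start] and splice in the summary.
    messages.splice(INJECT_AT..tail_start, [format!("[summary] {summary}")]);
}
```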
## Doom Loop Detection

The `DoomLoopTracker` detects when an agent is stuck repeating the same actions. It hashes the entire tool-call batch on each turn and tracks consecutive identical batches.
When the count exceeds `max_identical_tool_calls`, the agent is stopped with an error.
```toml
[[agents]]
name = "assistant"
max_identical_tool_calls = 3  # Stop after 3 identical consecutive tool batches
```

Or via the builder:

```rust
let agent = AgentRunner::builder(provider)
    .max_identical_tool_calls(3)
    .build()?;
```

Setting this to `0` is rejected at build time. Leave it unset (default: `None`) to disable doom loop detection.
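The batch-hashing idea can be sketched in a few lines. This is a simplification, not the real `DoomLoopTracker`; its struct layout and `record` method are assumptions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: hash each turn's (tool name, arguments) batch and count how many
// consecutive turns produced an identical hash.
struct DoomLoopTracker {
    last_hash: Option<u64>,
    identical_count: u32,
    max_identical: u32,
}

impl DoomLoopTracker {
    fn new(max_identical: u32) -> Self {
        Self { last_hash: None, identical_count: 0, max_identical }
    }

    /// Record a turn's tool-call batch; returns true if the agent should stop.
    fn record(&mut self, batch: &[(&str, &str)]) -> bool {
        let mut h = DefaultHasher::new();
        batch.hash(&mut h); // hashes tool names and arguments together
        let hash = h.finish();
        if self.last_hash == Some(hash) {
            self.identical_count += 1;
        } else {
            self.last_hash = Some(hash);
            self.identical_count = 1;
        }
        self.identical_count >= self.max_identical
    }
}
```

Any change in the batch, including different arguments to the same tool, resets the counter.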
## Session Pruning

`SessionPruneConfig` automatically trims old tool results from the conversation before each LLM call. This reduces input tokens without losing the conversation flow:

```toml
[[agents]]
name = "assistant"
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
```

- `keep_recent_n` (default: 2) — number of recent message pairs kept at full fidelity
- `pruned_tool_result_max_bytes` (default: 200) — tool results exceeding this are replaced with head + tail + `[pruned: N bytes]`
- `preserve_task` (default: true) — exempts the first user message (the original task) from pruning
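The head + tail + `[pruned: N bytes]` replacement might look roughly like the sketch below; the exact marker format and the half-and-half split are assumptions, not Heartbit's actual output:

```rust
// Sketch: results over the byte limit keep a head slice and a tail slice,
// with a marker noting how many bytes were cut from the middle.
fn prune_tool_result(result: &str, max_bytes: usize) -> String {
    if result.len() <= max_bytes {
        return result.to_string();
    }
    let keep = max_bytes / 2;
    // NOTE: byte slicing panics on non-UTF-8 boundaries; real code would
    // split on a char boundary instead.
    let head = &result[..keep];
    let tail = &result[result.len() - keep..];
    let cut = result.len() - 2 * keep;
    format!("{head}...[pruned: {cut} bytes]...{tail}")
}
```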
## Tool Name Repair

When the LLM generates a tool call with a misspelled name, Heartbit attempts automatic repair using Levenshtein distance. If a registered tool name is within edit distance 2 of the requested name, the call is redirected to the correct tool with a warning.
This handles common LLM mistakes like `bash_command` instead of `bash` without failing the turn.
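The matching step can be sketched with a textbook dynamic-programming Levenshtein distance; `repair_tool_name` is a hypothetical helper, not Heartbit's API:

```rust
// Classic two-row Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let sub = prev[j] + usize::from(ca != cb); // substitution (or match)
            cur.push(sub.min(prev[j + 1] + 1).min(cur[j] + 1)); // vs delete/insert
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Sketch: pick the closest registered tool within edit distance 2, if any.
fn repair_tool_name<'a>(requested: &str, registered: &[&'a str]) -> Option<&'a str> {
    registered
        .iter()
        .map(|name| (*name, levenshtein(requested, name)))
        .filter(|&(_, d)| d <= 2)
        .min_by_key(|&(_, d)| d)
        .map(|(name, _)| name)
}
```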
## Token Budgets

Control token usage at multiple levels:

| Setting | Scope | Description |
|---|---|---|
| `max_tokens` | Per LLM call | Maximum tokens in each LLM response |
| `max_turns` | Per agent run | Maximum reasoning turns before stopping |
| Context strategy | Per agent | Overall context window management |
The agent loop tracks cumulative `TokenUsage` (input, output, cache creation, cache read) across all turns, available in `AgentOutput::tokens_used`.
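Per-turn accumulation amounts to summing the four counters; the field names below are assumptions based on the categories listed above, not Heartbit's exact struct:

```rust
// Sketch: cumulative usage across turns, one counter per category.
#[derive(Default, Debug, Clone, Copy, PartialEq)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_creation: u64,
    cache_read: u64,
}

impl TokenUsage {
    /// Fold one turn's usage into the running total.
    fn add(&mut self, other: TokenUsage) {
        self.input += other.input;
        self.output += other.output;
        self.cache_creation += other.cache_creation;
        self.cache_read += other.cache_read;
    }
}
```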