
Context Management

As agents run multi-turn conversations, the context window fills up. Heartbit provides strategies to manage context size, detect stuck loops, and recover from overflows.

Set a strategy on the agent builder or via config:

No trimming: the full conversation history is sent to the LLM on every turn. Simple and predictable, but long conversations will eventually hit the context limit.

Sliding window: keeps the system prompt plus the most recent messages within a token budget. Older messages are dropped:

[[agents]]
name = "assistant"
context_strategy = { type = "sliding_window", max_tokens = 100000 }
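
The trimming logic can be illustrated with a small sketch; the token counter and message representation here are simplified stand-ins, not Heartbit's internals:

```rust
// Sketch of sliding-window trimming: keep the system prompt, then walk the
// history from newest to oldest until the token budget is exhausted.
// `tokens` is a stand-in for a real tokenizer (here: whitespace word count).
fn tokens(msg: &str) -> usize {
    msg.split_whitespace().count()
}

fn sliding_window(system: &str, history: &[String], max_tokens: usize) -> Vec<String> {
    let mut budget = max_tokens.saturating_sub(tokens(system));
    let mut kept: Vec<String> = Vec::new();
    for msg in history.iter().rev() {
        let cost = tokens(msg);
        if cost > budget {
            break; // this message and everything older is dropped
        }
        budget -= cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    // The system prompt always survives trimming.
    let mut out = vec![system.to_string()];
    out.extend(kept);
    out
}
```

Walking from the newest message backward guarantees the most recent context is what survives when the budget runs out.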

Summarize: when the context exceeds a threshold, the LLM generates a summary of the conversation so far. The summary replaces the older messages:

[[agents]]
name = "assistant"
context_strategy = { type = "summarize", max_tokens = 100000 }

When a ContextOverflow error occurs (the LLM rejects the request because the context is too large), Heartbit automatically:

  1. Summarizes the conversation using the LLM
  2. Injects the summary at position 4 in the message history (preserving the most recent context)
  3. Retries the LLM call

At most one compaction is performed per consecutive turn pair, which prevents infinite compaction loops.

Before compaction, a pre-compaction flush extracts tool results to episodic memory (when memory is enabled) so information isn’t permanently lost.
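
The recovery splice can be sketched roughly as follows; the plain-string message type, the fixed head of 4, and the summarizer closure are illustrative assumptions, not Heartbit's actual types:

```rust
// Rough sketch of overflow compaction: summarize the middle of the history,
// splice the summary in at position 4, and keep the most recent messages.
fn compact(
    messages: &mut Vec<String>,
    keep_recent: usize,
    summarize: impl Fn(&[String]) -> String,
) {
    let keep_head = 4; // the summary is injected at position 4
    if messages.len() <= keep_head + keep_recent {
        return; // nothing worth compacting
    }
    let tail_start = messages.len() - keep_recent;
    let summary = summarize(&messages[keep_head..tail_start]);
    // Replace the middle span with the single summary message.
    messages.splice(keep_head..tail_start, std::iter::once(summary));
}
```

After compaction the history is short enough that the LLM call can be retried.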

The DoomLoopTracker detects when an agent is stuck repeating the same actions. It hashes the entire tool-call batch on each turn and tracks consecutive identical batches.

When the count exceeds max_identical_tool_calls, the agent is stopped with an error.
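
The mechanism can be sketched with a simplified tracker, assuming tool-call batches are represented as string lists (this is an illustration, not the actual DoomLoopTracker type):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified doom-loop tracker: hash each turn's tool-call batch and count
// how many consecutive turns produced an identical hash.
struct LoopTracker {
    last_hash: Option<u64>,
    consecutive: u32,
    max_identical: u32,
}

impl LoopTracker {
    fn new(max_identical: u32) -> Self {
        Self { last_hash: None, consecutive: 0, max_identical }
    }

    /// Returns true once the run of identical batches exceeds the limit,
    /// meaning the agent should be stopped.
    fn observe(&mut self, batch: &[&str]) -> bool {
        let mut hasher = DefaultHasher::new();
        batch.hash(&mut hasher);
        let h = hasher.finish();
        if self.last_hash == Some(h) {
            self.consecutive += 1;
        } else {
            self.consecutive = 1;
            self.last_hash = Some(h);
        }
        self.consecutive > self.max_identical
    }
}
```

Any change in the batch, even a different argument to the same tool, resets the counter, so only exact repetition trips the limit.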

[[agents]]
name = "assistant"
max_identical_tool_calls = 3 # Stop after 3 identical consecutive tool batches

Or via the builder:

let agent = AgentRunner::builder(provider)
    .max_identical_tool_calls(3)
    .build()?;

Setting this to 0 is rejected at build time. Leave unset (default: None) to disable doom loop detection.

SessionPruneConfig automatically trims old tool results from the conversation before each LLM call. This reduces input tokens without losing the conversation flow:

[[agents]]
name = "assistant"
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
  • keep_recent_n (default: 2) — number of recent message pairs kept at full fidelity
  • pruned_tool_result_max_bytes (default: 200) — tool results exceeding this are replaced with head + tail + [pruned: N bytes]
  • preserve_task (default: true) — keeps the first user message (the original task) from pruning
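
The head + tail replacement can be sketched as follows. This is byte-oriented for simplicity and the helper name is illustrative; real code must also respect UTF-8 character boundaries when slicing:

```rust
// Sketch of pruning one tool result: keep a head and tail slice and note
// how many bytes were dropped in between.
fn prune_tool_result(result: &str, max_bytes: usize) -> String {
    if result.len() <= max_bytes {
        return result.to_string();
    }
    let keep = max_bytes / 2;
    let pruned = result.len() - 2 * keep;
    // Assumes ASCII content for simplicity.
    format!(
        "{} [pruned: {} bytes] {}",
        &result[..keep],
        pruned,
        &result[result.len() - keep..]
    )
}
```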

When the LLM generates a tool call with a misspelled name, Heartbit attempts automatic repair using Levenshtein distance. If a registered tool name is within edit distance 2 of the requested name, the call is redirected to the correct tool with a warning.

This handles common LLM mistakes like bash_command instead of bash without failing the turn.
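
A minimal edit-distance check captures the idea; this is the classic dynamic-programming Levenshtein computation, and Heartbit's internals may differ:

```rust
// Classic Levenshtein distance using one rolling row of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b_chars.iter().enumerate() {
            let sub = prev[j] + if ca == cb { 0 } else { 1 };
            cur.push(sub.min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

// Redirect a misspelled tool name to the closest registered tool
// within edit distance 2, if any such tool exists.
fn repair_tool_name<'a>(requested: &str, registered: &[&'a str]) -> Option<&'a str> {
    registered
        .iter()
        .map(|&name| (levenshtein(requested, name), name))
        .filter(|&(d, _)| d <= 2)
        .min_by_key(|&(d, _)| d)
        .map(|(_, name)| name)
}
```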

Heartbit pre-classifies queries into tool profiles to reduce input tokens:

Profile          Tools included                    ~Token cost
Conversational   Memory tools + question only      ~500 tokens
Standard         All builtins + memory (no MCP)    ~2,000 tokens
Full             Everything including MCP tools    ~4,500 tokens

Classification uses keyword heuristics:

  • Greetings/simple chat -> Conversational
  • File/code/system keywords -> Standard
  • MCP tool mentions or integration keywords -> Full
  • Long/complex queries -> Standard (safety escalation)

Essential tools (memory_recall, memory_store, question, __respond__) are always included regardless of profile.
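
The heuristic can be sketched as a simple keyword match. The keyword lists below are illustrative placeholders, not Heartbit's actual ones:

```rust
#[derive(Debug, PartialEq)]
enum ToolProfile {
    Conversational,
    Standard,
    Full,
}

// Illustrative keyword heuristic mirroring the classification rules above.
fn classify(query: &str) -> ToolProfile {
    let q = query.to_lowercase();
    let mcp_hints = ["mcp", "integration", "jira", "slack"];
    let work_hints = ["file", "code", "run", "system", "directory"];
    let greetings = ["hi", "hello", "hey", "thanks"];

    if mcp_hints.iter().any(|k| q.contains(k)) {
        ToolProfile::Full
    } else if work_hints.iter().any(|k| q.contains(k)) {
        ToolProfile::Standard
    } else if q.split_whitespace().count() <= 4
        && greetings.iter().any(|k| q.contains(k))
    {
        ToolProfile::Conversational
    } else {
        // Long or ambiguous queries escalate to Standard for safety.
        ToolProfile::Standard
    }
}
```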

Configure via config:

[[agents]]
tool_profile = "standard" # or "conversational", "full"

Or environment variable: HEARTBIT_TOOL_PROFILE=standard

Or builder: AgentRunnerBuilder::tool_profile(ToolProfile::Standard)

For very long conversations, simple summarization may lose information. Recursive summarization uses a cluster-then-summarize approach:

  1. Groups related messages into clusters
  2. Summarizes each cluster independently
  3. Combines cluster summaries into a final summary
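
The steps above can be sketched with a placeholder summarizer; fixed-size chunks and a string-joining closure stand in for the real clustering and LLM call:

```rust
// Sketch of recursive summarization: partition messages into clusters,
// summarize each cluster, then summarize the summaries into one result.
fn recursive_summarize(
    messages: &[String],
    cluster_size: usize,
    summarize: &dyn Fn(&[String]) -> String,
) -> String {
    let cluster_summaries: Vec<String> = messages
        .chunks(cluster_size) // stand-in for real message clustering
        .map(|cluster| summarize(cluster))
        .collect();
    // Combine the per-cluster summaries into a final summary.
    summarize(&cluster_summaries)
}
```

Because each cluster is summarized independently, detail from early parts of the conversation is condensed locally before the final pass, rather than competing with the whole history at once.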

Enable via config:

[[agents]]
recursive_summarization = true

Or environment variable: HEARTBIT_RECURSIVE_SUMMARIZATION=true

Control token usage at multiple levels:

Setting            Scope           Description
max_tokens         Per LLM call    Maximum tokens in each LLM response
max_turns          Per agent run   Maximum reasoning turns before stopping
Context strategy   Per agent       Overall context window management

The agent loop tracks cumulative TokenUsage (input, output, cache creation, cache read) across all turns, available in AgentOutput::tokens_used.
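
Cumulative tracking amounts to summing per-turn usage across the run. A sketch, where the field names follow the categories listed above but the struct itself is illustrative:

```rust
// Illustrative cumulative token accounting across agent turns.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_creation: u64,
    cache_read: u64,
}

impl TokenUsage {
    // Fold one turn's usage into the running total.
    fn add(&mut self, turn: TokenUsage) {
        self.input += turn.input;
        self.output += turn.output;
        self.cache_creation += turn.cache_creation;
        self.cache_read += turn.cache_read;
    }
}
```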