
Context Management

As agents run multi-turn conversations, the context window fills up. Heartbit provides strategies to manage context size, detect stuck loops, and recover from overflows.

Set a strategy on the agent builder or via config:

No trimming: the full conversation history is sent to the LLM on every turn. Simple and predictable, but long conversations will eventually hit the context limit.

Sliding window: keeps the system prompt plus the most recent messages within a token budget. Older messages are dropped:

[[agents]]
name = "assistant"
context_strategy = { type = "sliding_window", max_tokens = 100000 }
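
The trimming logic can be illustrated with a small sketch; the token counter and message representation here are simplified stand-ins, not Heartbit's internals:

```rust
// Sketch of sliding-window trimming: keep the system prompt, then walk the
// history from newest to oldest until the token budget is exhausted.
// `tokens` is a stand-in for a real tokenizer (here: whitespace word count).
fn tokens(msg: &str) -> usize {
    msg.split_whitespace().count()
}

fn sliding_window(system: &str, history: &[String], max_tokens: usize) -> Vec<String> {
    let mut budget = max_tokens.saturating_sub(tokens(system));
    let mut kept: Vec<String> = Vec::new();
    for msg in history.iter().rev() {
        let cost = tokens(msg);
        if cost > budget {
            break; // this message and everything older is dropped
        }
        budget -= cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    // The system prompt always survives trimming.
    let mut out = vec![system.to_string()];
    out.extend(kept);
    out
}
```

Walking from the newest message backward guarantees the most recent context is what survives when the budget runs out.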

Summarize: when the context exceeds a threshold, the LLM generates a summary of the conversation so far. The summary replaces the older messages:

[[agents]]
name = "assistant"
context_strategy = { type = "summarize", max_tokens = 100000 }

When a ContextOverflow error occurs (the LLM rejects the request because the context is too large), Heartbit automatically:

  1. Summarizes the conversation using the LLM
  2. Injects the summary at position 4 in the message history (preserving the most recent context)
  3. Retries the LLM call

At most one compaction is performed per consecutive turn pair, which prevents infinite compaction loops.

Before compaction, a pre-compaction flush extracts tool results to episodic memory (when memory is enabled) so information isn’t permanently lost.
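
The recovery splice can be sketched roughly as follows; the plain-string message type, the fixed head of 4, and the summarizer closure are illustrative assumptions, not Heartbit's actual types:

```rust
// Rough sketch of overflow compaction: summarize the middle of the history,
// splice the summary in at position 4, and keep the most recent messages.
fn compact(
    messages: &mut Vec<String>,
    keep_recent: usize,
    summarize: impl Fn(&[String]) -> String,
) {
    let keep_head = 4; // the summary is injected at position 4
    if messages.len() <= keep_head + keep_recent {
        return; // nothing worth compacting
    }
    let tail_start = messages.len() - keep_recent;
    let summary = summarize(&messages[keep_head..tail_start]);
    // Replace the middle span with the single summary message.
    messages.splice(keep_head..tail_start, std::iter::once(summary));
}
```

After compaction the history is short enough that the LLM call can be retried.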

The DoomLoopTracker detects when an agent is stuck repeating the same actions. It hashes the entire tool-call batch on each turn and tracks consecutive identical batches.

When the count exceeds max_identical_tool_calls, the agent is stopped with an error.
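
The mechanism can be sketched with a simplified tracker, assuming tool-call batches are represented as string lists (this is an illustration, not the actual DoomLoopTracker type):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified doom-loop tracker: hash each turn's tool-call batch and count
// how many consecutive turns produced an identical hash.
struct LoopTracker {
    last_hash: Option<u64>,
    consecutive: u32,
    max_identical: u32,
}

impl LoopTracker {
    fn new(max_identical: u32) -> Self {
        Self { last_hash: None, consecutive: 0, max_identical }
    }

    /// Returns true once the run of identical batches exceeds the limit,
    /// meaning the agent should be stopped.
    fn observe(&mut self, batch: &[&str]) -> bool {
        let mut hasher = DefaultHasher::new();
        batch.hash(&mut hasher);
        let h = hasher.finish();
        if self.last_hash == Some(h) {
            self.consecutive += 1;
        } else {
            self.consecutive = 1;
            self.last_hash = Some(h);
        }
        self.consecutive > self.max_identical
    }
}
```

Any change in the batch, even a different argument to the same tool, resets the counter, so only exact repetition trips the limit.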

[[agents]]
name = "assistant"
max_identical_tool_calls = 3 # Stop after 3 identical consecutive tool batches

Or via the builder:

let agent = AgentRunner::builder(provider)
    .max_identical_tool_calls(3)
    .build()?;

Setting this to 0 is rejected at build time. Leave unset (default: None) to disable doom loop detection.

SessionPruneConfig automatically trims old tool results from the conversation before each LLM call. This reduces input tokens without losing the conversation flow:

[[agents]]
name = "assistant"
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
  • keep_recent_n (default: 2) — number of recent message pairs kept at full fidelity
  • pruned_tool_result_max_bytes (default: 200) — tool results exceeding this are replaced with head + tail + [pruned: N bytes]
  • preserve_task (default: true) — keeps the first user message (the original task) from pruning
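
The head + tail replacement can be sketched as follows. This is byte-oriented for simplicity and the helper name is illustrative; real code must also respect UTF-8 character boundaries when slicing:

```rust
// Sketch of pruning one tool result: keep a head and tail slice and note
// how many bytes were dropped in between.
fn prune_tool_result(result: &str, max_bytes: usize) -> String {
    if result.len() <= max_bytes {
        return result.to_string();
    }
    let keep = max_bytes / 2;
    let pruned = result.len() - 2 * keep;
    // Assumes ASCII content for simplicity.
    format!(
        "{} [pruned: {} bytes] {}",
        &result[..keep],
        pruned,
        &result[result.len() - keep..]
    )
}
```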

When the LLM generates a tool call with a misspelled name, Heartbit attempts automatic repair using Levenshtein distance. If a registered tool name is within edit distance 2 of the requested name, the call is redirected to the correct tool with a warning.

This handles common LLM mistakes like bash_command instead of bash without failing the turn.
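
A minimal edit-distance check captures the idea; this is the classic dynamic-programming Levenshtein computation, and Heartbit's internals may differ:

```rust
// Classic Levenshtein distance using one rolling row of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b_chars.iter().enumerate() {
            let sub = prev[j] + if ca == cb { 0 } else { 1 };
            cur.push(sub.min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

// Redirect a misspelled tool name to the closest registered tool
// within edit distance 2, if any such tool exists.
fn repair_tool_name<'a>(requested: &str, registered: &[&'a str]) -> Option<&'a str> {
    registered
        .iter()
        .map(|&name| (levenshtein(requested, name), name))
        .filter(|&(d, _)| d <= 2)
        .min_by_key(|&(d, _)| d)
        .map(|(_, name)| name)
}
```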

Heartbit pre-classifies queries into tool profiles to reduce input tokens:

Profile          Tools included                    ~Token cost
Conversational   Memory tools + question only      ~500 tokens
Standard         All builtins + memory (no MCP)    ~2,000 tokens
Full             Everything including MCP tools    ~4,500 tokens

Classification uses keyword heuristics:

  • Greetings/simple chat -> Conversational
  • File/code/system keywords -> Standard
  • MCP tool mentions or integration keywords -> Full
  • Long/complex queries -> Standard (safety escalation)

Essential tools (memory_recall, memory_store, question, __respond__) are always included regardless of profile.
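
The heuristic can be sketched as a simple keyword match. The keyword lists below are illustrative placeholders, not Heartbit's actual ones:

```rust
#[derive(Debug, PartialEq)]
enum ToolProfile {
    Conversational,
    Standard,
    Full,
}

// Illustrative keyword heuristic mirroring the classification rules above.
fn classify(query: &str) -> ToolProfile {
    let q = query.to_lowercase();
    let mcp_hints = ["mcp", "integration", "jira", "slack"];
    let work_hints = ["file", "code", "run", "system", "directory"];
    let greetings = ["hi", "hello", "hey", "thanks"];

    if mcp_hints.iter().any(|k| q.contains(k)) {
        ToolProfile::Full
    } else if work_hints.iter().any(|k| q.contains(k)) {
        ToolProfile::Standard
    } else if q.split_whitespace().count() <= 4
        && greetings.iter().any(|k| q.contains(k))
    {
        ToolProfile::Conversational
    } else {
        // Long or ambiguous queries escalate to Standard for safety.
        ToolProfile::Standard
    }
}
```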

Configure via config:

[[agents]]
tool_profile = "standard" # or "conversational", "full"

Or environment variable: HEARTBIT_TOOL_PROFILE=standard

Or builder: AgentRunnerBuilder::tool_profile(ToolProfile::Standard)

For very long conversations, simple summarization may lose information. Recursive summarization uses a cluster-then-summarize approach:

  1. Groups related messages into clusters
  2. Summarizes each cluster independently
  3. Combines cluster summaries into a final summary
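
The steps above can be sketched with a placeholder summarizer; fixed-size chunks and a string-joining closure stand in for the real clustering and LLM call:

```rust
// Sketch of recursive summarization: partition messages into clusters,
// summarize each cluster, then summarize the summaries into one result.
fn recursive_summarize(
    messages: &[String],
    cluster_size: usize,
    summarize: &dyn Fn(&[String]) -> String,
) -> String {
    let cluster_summaries: Vec<String> = messages
        .chunks(cluster_size) // stand-in for real message clustering
        .map(|cluster| summarize(cluster))
        .collect();
    // Combine the per-cluster summaries into a final summary.
    summarize(&cluster_summaries)
}
```

Because each cluster is summarized independently, detail from early parts of the conversation is condensed locally before the final pass, rather than competing with the whole history at once.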

Enable via config:

[[agents]]
recursive_summarization = true

Or environment variable: HEARTBIT_RECURSIVE_SUMMARIZATION=true

Control token usage at multiple levels:

Setting            Scope           Description
max_tokens         Per LLM call    Maximum tokens in each LLM response
max_turns          Per agent run   Maximum reasoning turns before stopping
Context strategy   Per agent       Overall context window management

The agent loop tracks cumulative TokenUsage (input, output, cache creation, cache read) across all turns, available in AgentOutput::tokens_used.
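
Cumulative tracking amounts to summing per-turn usage across the run. A sketch, where the field names follow the categories listed above but the struct itself is illustrative:

```rust
// Illustrative cumulative token accounting across agent turns.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_creation: u64,
    cache_read: u64,
}

impl TokenUsage {
    // Fold one turn's usage into the running total.
    fn add(&mut self, turn: TokenUsage) {
        self.input += turn.input;
        self.output += turn.output;
        self.cache_creation += turn.cache_creation;
        self.cache_read += turn.cache_read;
    }
}
```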