# Configuration Reference
Heartbit uses TOML configuration files. Pass one via `--config heartbit.toml` or place it in the working directory.
## Provider

```toml
[provider]
name = "anthropic"      # or "openrouter"
model = "claude-sonnet-4-20250514"
prompt_caching = true   # Anthropic only; default false

[provider.retry]        # optional: retry transient failures
max_retries = 3
base_delay_ms = 500
max_delay_ms = 30000
```

Retries on HTTP status codes 429, 500, 502, 503, 529, and on network errors, with exponential backoff.
### Cascade

```toml
[provider.cascade]   # optional: try cheaper models first
enabled = true

[[provider.cascade.tiers]]
model = "anthropic/claude-3.5-haiku"   # cheapest tier tried first

[provider.cascade.gate]
type = "heuristic"               # escalate if response is low-quality
min_output_tokens = 10           # escalate on very short responses
accept_tool_calls = false        # escalate if cheap model wants to use tools
escalate_on_max_tokens = false   # escalate on max_tokens stop reason
```

The cascading provider tries the cheapest model first and escalates to more expensive tiers when the heuristic gate rejects the response.
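Multiple `[[provider.cascade.tiers]]` entries can be listed in cost order. A two-tier sketch follows; the second tier's model name is illustrative, not taken from this reference:

```toml
[provider.cascade]
enabled = true

[[provider.cascade.tiers]]
model = "anthropic/claude-3.5-haiku"   # tried first (cheapest)

[[provider.cascade.tiers]]
model = "anthropic/claude-sonnet-4"    # illustrative escalation target

[provider.cascade.gate]
type = "heuristic"
min_output_tokens = 10
```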
## Orchestrator

```toml
[orchestrator]
max_turns = 10
max_tokens = 4096
run_timeout_seconds = 300    # wall-clock deadline for the entire run
routing = "auto"             # "auto", "always_orchestrate", or "single_agent"
dispatch_mode = "parallel"   # "parallel" or "sequential" (sub-agent dispatch)
reasoning_effort = "high"    # "high", "medium", "low", or "none"
tool_profile = "standard"    # "conversational", "standard", or "full"
```

| Field | Default | Description |
|---|---|---|
| `max_turns` | `10` | Maximum reasoning turns for the orchestrator |
| `max_tokens` | `4096` | Maximum tokens per LLM response |
| `run_timeout_seconds` | — | Wall-clock deadline for the entire run |
| `routing` | `"auto"` | `auto` selects single-agent when only one agent is defined; `always_orchestrate` forces the orchestrator; `single_agent` skips orchestration |
| `dispatch_mode` | `"parallel"` | How sub-agents are dispatched: `parallel` (concurrent via `tokio::JoinSet`) or `sequential` |
| `reasoning_effort` | — | Controls extended thinking: `high`, `medium`, `low`, or `none` |
| `tool_profile` | — | Pre-filters tools before each turn: `conversational` (minimal), `standard` (default set), `full` (all tools) |
## Agents

```toml
[[agents]]
name = "researcher"
description = "Research specialist"
system_prompt = "You are a research specialist."
mcp_servers = ["http://localhost:8000/mcp"]

# All optional:
max_turns = 20             # override orchestrator default
max_tokens = 16384
tool_timeout_seconds = 60
max_tool_output_bytes = 16384
run_timeout_seconds = 120  # per-agent wall-clock deadline
summarize_threshold = 80000
reasoning_effort = "medium"  # per-agent override
tool_profile = "full"        # per-agent override
context_strategy = { type = "sliding_window", max_tokens = 100000 }
# context_strategy = { type = "summarize", threshold = 80000 }
# context_strategy = { type = "unlimited" }
```

### Session Pruning
```toml
[agents.session_prune]   # optional: trim old tool results before LLM calls
keep_recent_n = 2        # keep N most recent message pairs at full fidelity
pruned_tool_result_max_bytes = 200   # truncate older tool results to this size
preserve_task = true     # keep the first user message (task) intact
```

### MCP Server Authentication
```toml
# Simple URL
mcp_servers = ["http://localhost:8000/mcp"]

# With authentication header
mcp_servers = [{ url = "http://localhost:8000/mcp", auth_header = "Bearer tok_xxx" }]
```

### Per-Agent Provider Override
```toml
[agents.provider]
name = "anthropic"
model = "claude-opus-4-20250514"
prompt_caching = true
```

Each sub-agent can use a different LLM model by specifying its own `[agents.provider]` section.
### Structured Output (Response Schema)

```toml
[agents.response_schema]
type = "object"

[agents.response_schema.properties.score]
type = "number"

[agents.response_schema.properties.summary]
type = "string"
```

When set, a synthetic `__respond__` tool is injected. The agent produces structured JSON via the tool call.
## Context Strategies

| Strategy | Description |
|---|---|
| `unlimited` | No trimming (default) |
| `sliding_window` | Keep system prompt + recent messages within the `max_tokens` budget |
| `summarize` | LLM-generated summary when context exceeds `threshold` |
## Memory

```toml
[memory]
type = "in_memory"   # or: type = "postgres", database_url = "..."

[memory.embedding]           # optional: enables hybrid retrieval (BM25 + vector)
provider = "local"           # "openai", "local", or "none" (default)
model = "all-MiniLM-L6-v2"   # model name (provider-specific)
cache_dir = "/tmp/fastembed" # local provider only: model cache directory
# api_key_env = "OPENAI_API_KEY"   # openai provider only
```

Memory backends:
| Backend | Description |
|---|---|
| `in_memory` | In-process memory store (lost on restart) |
| `postgres` | PostgreSQL with pgvector for persistent memory |
Embedding providers enable hybrid retrieval (BM25 keyword scoring + vector cosine similarity fused via RRF):
| Provider | Description |
|---|---|
| `none` | BM25 only (default) |
| `local` | Offline ONNX embeddings via fastembed (requires the `local-embedding` feature) |
| `openai` | OpenAI embeddings API |
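Putting the pieces together, a persistent setup might pair the `postgres` backend with OpenAI embeddings. The following is a sketch only; the connection string and embedding model name are illustrative placeholders, not values from this reference:

```toml
# Hypothetical example: persistent Postgres memory with OpenAI embeddings.
[memory]
type = "postgres"
database_url = "postgres://user:pass@localhost:5432/heartbit"   # placeholder

[memory.embedding]
provider = "openai"
model = "text-embedding-3-small"   # illustrative OpenAI embedding model name
api_key_env = "OPENAI_API_KEY"
```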
## Knowledge Base

```toml
[knowledge]
chunk_size = 1000     # max bytes per chunk (default: 1000)
chunk_overlap = 200   # overlap bytes between chunks (default: 200)

[[knowledge.sources]]
type = "file"
path = "README.md"

[[knowledge.sources]]
type = "glob"
pattern = "docs/**/*.md"

[[knowledge.sources]]
type = "url"
url = "https://docs.example.com/api"
```

Knowledge base sources are loaded at startup. Files are split into overlapping chunks using paragraph-aware splitting.
## Restate

```toml
[restate]
endpoint = "http://localhost:9070"
```

Restate endpoint for durable execution. Requires the `restate` feature flag.
## Daemon

```toml
[daemon]
bind = "127.0.0.1:3000"    # HTTP API bind address
max_concurrent_tasks = 4   # bounded concurrency
```

### Authentication
```toml
[daemon.auth]
bearer_tokens = ["$YOUR_API_KEY"]   # static API keys (multiple for rotation)
jwks_url = "https://idp.example.com/.well-known/jwks.json"   # JWT/JWKS auth
issuer = "https://idp.example.com"  # optional: validate iss claim
audience = "heartbit-daemon"        # optional: validate aud claim
# user_id_claim = "sub"     # JWT claim for user ID (default: "sub")
# tenant_id_claim = "tid"   # JWT claim for tenant ID (default: "tid")
# roles_claim = "roles"     # JWT claim for roles (default: "roles")
```

Supports both static bearer tokens and JWT/JWKS authentication. Multiple bearer tokens can be configured for key rotation.
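Since static tokens and JWT/JWKS are independent mechanisms, a local development setup can presumably configure static tokens alone and omit the JWKS keys. A minimal sketch (the token value is a placeholder):

```toml
# Hypothetical minimal auth: static bearer tokens only, no JWT/JWKS.
[daemon.auth]
bearer_tokens = ["tok_local_dev"]   # placeholder token
```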
### Kafka

```toml
[daemon.kafka]
brokers = "localhost:9092"
consumer_group = "heartbit-daemon"   # default
commands_topic = "heartbit.commands"
events_topic = "heartbit.events"
```

### Cron Schedules
```toml
[[daemon.schedules]]
name = "daily-review"
cron = "0 0 9 * * *"   # 6-field cron (sec min hr dom mon dow)
task = "Review yesterday's work"
```

Uses 6-field cron expressions: second minute hour day-of-month month day-of-week.
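To illustrate the 6-field format, here are two hypothetical schedules; the names and tasks are placeholders, and named day-of-week ranges like `Mon-Fri` depend on what the underlying cron parser accepts:

```toml
[[daemon.schedules]]
name = "hourly-sync"           # placeholder
cron = "0 0 * * * *"           # at the top of every hour
task = "Sync inboxes"

[[daemon.schedules]]
name = "weekday-report"        # placeholder
cron = "0 30 17 * * Mon-Fri"   # 17:30, Monday through Friday
task = "Compile the daily report"
```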
## Telemetry

```toml
[telemetry]
otlp_endpoint = "http://localhost:4317"
service_name = "heartbit"
```

OpenTelemetry tracing via OTLP exporter. Traces are exported to the configured endpoint.
## Minimal Example

```toml
[provider]
name = "anthropic"
model = "claude-sonnet-4-20250514"

[[agents]]
name = "researcher"
description = "Research specialist"
system_prompt = "You are a research specialist."

[[agents]]
name = "writer"
description = "Writing specialist"
system_prompt = "You are a writing specialist."
```