# Configuration Reference
Heartbit uses TOML configuration files. Pass one via `--config heartbit.toml` or place it in the working directory.
## Provider

```toml
[provider]
name = "anthropic"      # or "openrouter"
model = "claude-sonnet-4-20250514"
prompt_caching = true   # Anthropic only; default false

[provider.retry]        # optional: retry transient failures
max_retries = 3
base_delay_ms = 500
max_delay_ms = 30000
```

Retries on HTTP status codes 429, 500, 502, 503, 529, and on network errors, with exponential backoff.
### Cascade

```toml
[provider.cascade]   # optional: try cheaper models first
enabled = true

[[provider.cascade.tiers]]
model = "anthropic/claude-3.5-haiku"   # cheapest tier tried first

[provider.cascade.gate]
type = "heuristic"               # escalate if response is low-quality
min_output_tokens = 10           # escalate on very short responses
accept_tool_calls = false        # escalate if cheap model wants to use tools
escalate_on_max_tokens = false   # escalate on max_tokens stop reason
```

The cascading provider tries the cheapest model first and escalates to more expensive tiers when the heuristic gate rejects the response.
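Multiple `[[provider.cascade.tiers]]` entries can be listed in cost order. A two-tier sketch follows; the second tier's model name is illustrative, not taken from this reference:

```toml
[provider.cascade]
enabled = true

[[provider.cascade.tiers]]
model = "anthropic/claude-3.5-haiku"   # tried first (cheapest)

[[provider.cascade.tiers]]
model = "anthropic/claude-sonnet-4"    # illustrative escalation target

[provider.cascade.gate]
type = "heuristic"
min_output_tokens = 10
```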
## Orchestrator

```toml
[orchestrator]
max_turns = 10
max_tokens = 4096
run_timeout_seconds = 300    # wall-clock deadline for the entire run
routing = "auto"             # "auto", "always_orchestrate", or "single_agent"
dispatch_mode = "parallel"   # "parallel" or "sequential" (sub-agent dispatch)
reasoning_effort = "high"    # "high", "medium", "low", or "none"
tool_profile = "standard"    # "conversational", "standard", or "full"
```

| Field | Default | Description |
|---|---|---|
| `max_turns` | `10` | Maximum reasoning turns for the orchestrator |
| `max_tokens` | `4096` | Maximum tokens per LLM response |
| `run_timeout_seconds` | — | Wall-clock deadline for the entire run |
| `routing` | `"auto"` | `auto` selects single-agent when only one agent is defined; `always_orchestrate` forces the orchestrator; `single_agent` skips orchestration |
| `dispatch_mode` | `"parallel"` | How sub-agents are dispatched: `parallel` (concurrent via `tokio::JoinSet`) or `sequential` |
| `reasoning_effort` | — | Controls extended thinking: `high`, `medium`, `low`, or `none` |
| `tool_profile` | — | Pre-filters tools before each turn: `conversational` (minimal), `standard` (default set), `full` (all tools) |
## Agents

```toml
[[agents]]
name = "researcher"
description = "Research specialist"
system_prompt = "You are a research specialist."
mcp_servers = ["http://localhost:8000/mcp"]

# All optional:
max_turns = 20             # override orchestrator default
max_tokens = 16384
tool_timeout_seconds = 60
max_tool_output_bytes = 16384
run_timeout_seconds = 120  # per-agent wall-clock deadline
summarize_threshold = 80000
reasoning_effort = "medium"  # per-agent override
tool_profile = "full"        # per-agent override
context_strategy = { type = "sliding_window", max_tokens = 100000 }
# context_strategy = { type = "summarize", threshold = 80000 }
# context_strategy = { type = "unlimited" }
```

### Session Pruning
```toml
[agents.session_prune]   # optional: trim old tool results before LLM calls
keep_recent_n = 2        # keep N most recent message pairs at full fidelity
pruned_tool_result_max_bytes = 200   # truncate older tool results to this size
preserve_task = true     # keep the first user message (task) intact
```

### MCP Server Authentication
```toml
# Simple URL
mcp_servers = ["http://localhost:8000/mcp"]

# With authentication header
mcp_servers = [{ url = "http://localhost:8000/mcp", auth_header = "Bearer tok_xxx" }]
```

### Per-Agent Provider Override
```toml
[agents.provider]
name = "anthropic"
model = "claude-opus-4-20250514"
prompt_caching = true
```

Each sub-agent can use a different LLM model by specifying its own `[agents.provider]` section.
### Structured Output (Response Schema)

```toml
[agents.response_schema]
type = "object"

[agents.response_schema.properties.score]
type = "number"

[agents.response_schema.properties.summary]
type = "string"
```

When set, a synthetic `__respond__` tool is injected. The agent produces structured JSON via the tool call.
## Context Strategies

| Strategy | Description |
|---|---|
| `unlimited` | No trimming (default) |
| `sliding_window` | Keep system prompt + recent messages within the `max_tokens` budget |
| `summarize` | LLM-generated summary when context exceeds `threshold` |
## Memory

```toml
[memory]
type = "in_memory"   # or: type = "postgres", database_url = "..."

[memory.embedding]           # optional: enables hybrid retrieval (BM25 + vector)
provider = "local"           # "openai", "local", or "none" (default)
model = "all-MiniLM-L6-v2"   # model name (provider-specific)
cache_dir = "/tmp/fastembed" # local provider only: model cache directory
# api_key_env = "OPENAI_API_KEY"   # openai provider only
```

Memory backends:
| Backend | Description |
|---|---|
| `in_memory` | In-process memory store (lost on restart) |
| `postgres` | PostgreSQL with pgvector for persistent memory |
Embedding providers enable hybrid retrieval (BM25 keyword scoring + vector cosine similarity fused via RRF):
| Provider | Description |
|---|---|
| `none` | BM25 only (default) |
| `local` | Offline ONNX embeddings via fastembed (requires the `local-embedding` feature) |
| `openai` | OpenAI embeddings API |
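Putting the pieces together, a persistent setup might pair the `postgres` backend with OpenAI embeddings. The following is a sketch only; the connection string and embedding model name are illustrative placeholders, not values from this reference:

```toml
# Hypothetical example: persistent Postgres memory with OpenAI embeddings.
[memory]
type = "postgres"
database_url = "postgres://user:pass@localhost:5432/heartbit"   # placeholder

[memory.embedding]
provider = "openai"
model = "text-embedding-3-small"   # illustrative OpenAI embedding model name
api_key_env = "OPENAI_API_KEY"
```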
## Knowledge Base

```toml
[knowledge]
chunk_size = 1000     # max bytes per chunk (default: 1000)
chunk_overlap = 200   # overlap bytes between chunks (default: 200)

[[knowledge.sources]]
type = "file"
path = "README.md"

[[knowledge.sources]]
type = "glob"
pattern = "docs/**/*.md"

[[knowledge.sources]]
type = "url"
url = "https://docs.example.com/api"
```

Knowledge base sources are loaded at startup. Files are split into overlapping chunks using paragraph-aware splitting.
## Restate

```toml
[restate]
endpoint = "http://localhost:9070"
```

Restate endpoint for durable execution. Requires the `restate` feature flag.
## Daemon

```toml
[daemon]
bind = "127.0.0.1:3000"    # HTTP API bind address
max_concurrent_tasks = 4   # bounded concurrency
```

### Authentication
```toml
[daemon.auth]
bearer_tokens = ["$YOUR_API_KEY"]   # static API keys (multiple for rotation)
jwks_url = "https://idp.example.com/.well-known/jwks.json"   # JWT/JWKS auth
issuer = "https://idp.example.com"  # optional: validate iss claim
audience = "heartbit-daemon"        # optional: validate aud claim
# user_id_claim = "sub"     # JWT claim for user ID (default: "sub")
# tenant_id_claim = "tid"   # JWT claim for tenant ID (default: "tid")
# roles_claim = "roles"     # JWT claim for roles (default: "roles")
```

Supports both static bearer tokens and JWT/JWKS authentication. Multiple bearer tokens can be configured for key rotation.
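Since static tokens and JWT/JWKS are independent mechanisms, a local development setup can presumably configure static tokens alone and omit the JWKS keys. A minimal sketch (the token value is a placeholder):

```toml
# Hypothetical minimal auth: static bearer tokens only, no JWT/JWKS.
[daemon.auth]
bearer_tokens = ["tok_local_dev"]   # placeholder token
```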
### Kafka

```toml
[daemon.kafka]
brokers = "localhost:9092"
consumer_group = "heartbit-daemon"   # default
commands_topic = "heartbit.commands"
events_topic = "heartbit.events"
```

### Cron Schedules
```toml
[[daemon.schedules]]
name = "daily-review"
cron = "0 0 9 * * *"   # 6-field cron (sec min hr dom mon dow)
task = "Review yesterday's work"
```

Uses 6-field cron expressions: second minute hour day-of-month month day-of-week.
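To illustrate the 6-field format, here are two hypothetical schedules; the names and tasks are placeholders, and named day-of-week ranges like `Mon-Fri` depend on what the underlying cron parser accepts:

```toml
[[daemon.schedules]]
name = "hourly-sync"           # placeholder
cron = "0 0 * * * *"           # at the top of every hour
task = "Sync inboxes"

[[daemon.schedules]]
name = "weekday-report"        # placeholder
cron = "0 30 17 * * Mon-Fri"   # 17:30, Monday through Friday
task = "Compile the daily report"
```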
## Telemetry

```toml
[telemetry]
otlp_endpoint = "http://localhost:4317"
service_name = "heartbit"
```

OpenTelemetry tracing via OTLP exporter. Traces are exported to the configured endpoint.
## Minimal Example

```toml
[provider]
name = "anthropic"
model = "claude-sonnet-4-20250514"

[[agents]]
name = "researcher"
description = "Research specialist"
system_prompt = "You are a research specialist."

[[agents]]
name = "writer"
description = "Writing specialist"
system_prompt = "You are a writing specialist."
```