Providers
Heartbit communicates with LLMs through the LlmProvider trait. Providers are composable — you wrap a base provider with retry logic, cascading, or other behaviors.
LlmProvider Trait
The LlmProvider trait defines two methods:
- complete() — send a completion request and receive the full response
- stream_complete() — send a completion request and receive an SSE stream of deltas
Both accept messages, tools, system prompt, and configuration (max tokens, temperature, tool choice).
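The shape of the trait can be sketched as follows. This is a hedged, simplified stand-in, not the crate's actual definition: the real trait is async (RPITIT) and stream_complete yields SSE deltas, whereas here a synchronous return and an Iterator stand in for both, and the CompletionRequest fields are assumptions.

```rust
// Hedged sketch of the LlmProvider shape; all field names are illustrative.
struct CompletionRequest {
    system: String,
    messages: Vec<String>,
    max_tokens: u32,
}

trait LlmProvider {
    /// Full response in one call.
    fn complete(&self, req: &CompletionRequest) -> String;
    /// Incremental deltas (stands in for the SSE stream).
    fn stream_complete(&self, req: &CompletionRequest) -> Box<dyn Iterator<Item = String>>;
}

struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn complete(&self, req: &CompletionRequest) -> String {
        // a full completion is just the concatenated delta stream
        self.stream_complete(req).collect()
    }
    fn stream_complete(&self, req: &CompletionRequest) -> Box<dyn Iterator<Item = String>> {
        let text = req.messages.last().cloned().unwrap_or_default();
        // emit the reply one word at a time, like a token stream
        Box::new(
            text.split_whitespace()
                .map(|w| format!("{w} "))
                .collect::<Vec<_>>()
                .into_iter(),
        )
    }
}

fn main() {
    let req = CompletionRequest {
        system: "You are terse.".to_string(),
        messages: vec!["hello world".to_string()],
        max_tokens: 16,
    };
    let provider = EchoProvider;
    assert_eq!(provider.complete(&req), "hello world ");
    assert_eq!(provider.stream_complete(&req).count(), 2);
    println!("ok");
}
```

The key property the sketch preserves is that both methods take the same request and differ only in delivery: one value versus a sequence of deltas.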
Built-in Providers
AnthropicProvider
Direct integration with the Anthropic API. Supports Claude model families with native SSE streaming.
```rust
use heartbit::AnthropicProvider;

let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514");
```

Enable prompt caching to reduce costs on repeated conversations:

```rust
let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514")
    .with_prompt_caching();
```

Prompt caching places 3 cache breakpoints: the system prompt, the last tool definition, and the second-to-last user message.
OpenRouterProvider
Routes requests through OpenRouter, supporting a wide range of models. Translates between OpenAI-format SSE and Heartbit’s internal format.
```rust
use heartbit::OpenRouterProvider;

let provider = OpenRouterProvider::new(api_key, "anthropic/claude-sonnet-4-20250514");
```

Provider Composition
Providers compose by wrapping. The typical stack looks like:
```
CascadingProvider (optional) -> RetryingProvider -> AnthropicProvider / OpenRouterProvider
```

RetryingProvider
Wraps any provider with exponential backoff retry on transient failures:
```rust
use heartbit::RetryingProvider;

let provider = RetryingProvider::with_defaults(
    AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
);
```

Retries on HTTP status codes 429 (rate limit), 500, 502, 503, and 529, as well as on network errors.
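The retry predicate and backoff schedule can be sketched as below. This is a hedged illustration under the configuration described here; the function names, and whether the crate adds jitter, are assumptions rather than Heartbit's actual API.

```rust
use std::time::Duration;

// Hedged sketch: which failures are worth retrying (per the list above).
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 529)
}

// Exponential backoff: the delay doubles on each attempt, starting from
// base_delay_ms (attempt 0 waits base_delay_ms, attempt 1 twice that, ...).
fn backoff_delay(base_delay_ms: u64, attempt: u32) -> Duration {
    Duration::from_millis(base_delay_ms.saturating_mul(2u64.saturating_pow(attempt)))
}

fn main() {
    assert!(is_retryable(429));
    assert!(is_retryable(529));
    assert!(!is_retryable(404)); // client errors other than 429 are not transient
    // base 1000 ms: attempts 0, 1, 2 wait 1 s, 2 s, 4 s
    assert_eq!(backoff_delay(1000, 2), Duration::from_secs(4));
    println!("ok");
}
```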
CascadingProvider
Tries cheaper models first and escalates to more expensive ones when a confidence gate rejects the response:
```rust
use heartbit::CascadingProvider;

// Configured via TOML:
// enabled = true
// tiers = [
//   { model = "claude-haiku-4-5-20251001" },
//   { model = "claude-sonnet-4-20250514" },
// ]
```

The ConfidenceGate trait evaluates responses. The built-in HeuristicGate checks for:
- Refusal patterns in the response
- Minimum token thresholds
- Tool call acceptance
- MaxTokens stop reason (response may be truncated)
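The checks above can be sketched as a single predicate. This is a hedged illustration only: the refusal patterns, the token threshold, and the function name are invented for the sketch, not the HeuristicGate's actual values.

```rust
// Hedged sketch of the HeuristicGate checks; thresholds and patterns are
// illustrative, not the crate's actual values.
const MIN_TOKENS: u32 = 8; // assumed minimum-token threshold

fn gate_accepts(text: &str, output_tokens: u32, has_tool_call: bool, stop_reason: &str) -> bool {
    // refusal patterns in the response text
    let refusal = ["I can't", "I cannot", "I'm unable to"]
        .iter()
        .any(|p| text.contains(p));
    // MaxTokens stop reason: the response may be truncated
    let truncated = stop_reason == "max_tokens";
    // below the minimum token threshold -- but tool calls are accepted as-is
    let too_short = output_tokens < MIN_TOKENS && !has_tool_call;
    !refusal && !truncated && !too_short
}

fn main() {
    assert!(gate_accepts("The answer is 4 because 2+2=4.", 12, false, "end_turn"));
    assert!(!gate_accepts("I can't help with that.", 12, false, "end_turn"));
    assert!(!gate_accepts("A long reply that hit the cap", 500, false, "max_tokens"));
    assert!(gate_accepts("", 0, true, "tool_use")); // tool call accepted despite low token count
    println!("ok");
}
```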
Non-final tiers use complete() even for streaming requests to avoid streaming tokens that get discarded on escalation.
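The cascade control flow described above can be sketched like this. It is a hedged sketch, not the crate's code: closures stand in for providers and the gate, and the escalation rule is simply "a rejected non-final tier falls through to the next, more expensive one; the final tier's response is returned regardless."

```rust
// Hedged sketch of CascadingProvider's escalation loop.
fn cascade<P, G>(tiers: &[P], gate: G, prompt: &str) -> String
where
    P: Fn(&str) -> String, // stand-in for a provider's complete()
    G: Fn(&str) -> bool,   // stand-in for the ConfidenceGate
{
    for (i, tier) in tiers.iter().enumerate() {
        let response = tier(prompt); // non-final tiers use complete(), never streaming
        let is_last = i == tiers.len() - 1;
        if is_last || gate(&response) {
            return response; // accepted, or final tier (returned unconditionally)
        }
        // gate rejected: escalate to the next (more expensive) tier
    }
    unreachable!("tiers must be non-empty")
}

fn main() {
    // cheap tier refuses; the gate rejects it and the cascade escalates
    let haiku = |_: &str| String::from("I can't answer that.");
    let sonnet = |_: &str| String::from("The answer is 4.");
    let gate = |r: &str| !r.contains("I can't");
    let tiers: Vec<Box<dyn Fn(&str) -> String>> = vec![Box::new(haiku), Box::new(sonnet)];
    assert_eq!(cascade(&tiers, gate, "What is 2+2?"), "The answer is 4.");
    println!("ok");
}
```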
BoxedProvider
BoxedProvider provides object-safe wrapping for providers. Because LlmProvider uses async methods (RPITIT), it can’t be used as a trait object directly; BoxedProvider bridges this gap:
```rust
use std::sync::Arc;

use heartbit::BoxedProvider;

let provider = Arc::new(BoxedProvider::new(
    RetryingProvider::with_defaults(
        AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
    ),
));
```

This is the standard way to pass providers to AgentRunner and Orchestrator.
ToolChoice
Control how the LLM selects tools:
| Variant | Behavior |
|---|---|
| Auto | LLM decides whether to use tools (default) |
| Any | LLM must use at least one tool |
| Tool { name } | LLM must use a specific tool |
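For reference, these three variants map naturally onto the tool_choice wire format the Anthropic Messages API uses. The sketch below is hedged: Heartbit's own enum and serialization may differ, and the to_wire helper is invented for illustration.

```rust
// Hedged sketch: ToolChoice variants rendered in the shape the Anthropic
// Messages API uses for tool_choice. Heartbit's serialization may differ.
enum ToolChoice {
    Auto,                   // model decides (default)
    Any,                    // must call at least one tool
    Tool { name: String },  // must call this specific tool
}

fn to_wire(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => r#"{"type":"auto"}"#.to_string(),
        ToolChoice::Any => r#"{"type":"any"}"#.to_string(),
        ToolChoice::Tool { name } => format!(r#"{{"type":"tool","name":"{name}"}}"#),
    }
}

fn main() {
    assert_eq!(to_wire(&ToolChoice::Auto), r#"{"type":"auto"}"#);
    let forced = ToolChoice::Tool { name: "get_weather".into() };
    assert_eq!(to_wire(&forced), r#"{"type":"tool","name":"get_weather"}"#);
    println!("ok");
}
```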
Reasoning / Extended Thinking
Enable extended thinking for complex reasoning tasks. The LLM generates internal reasoning tokens before responding:
```toml
[provider]
reasoning_effort = "high" # "high", "medium", "low", or "none"
```

Or via env var: HEARTBIT_REASONING_EFFORT=high
Or via the builder, which sets the ReasoningEffort enum on the completion request.
Reasoning tokens are tracked separately in TokenUsage::reasoning_tokens and contribute to the total context budget.
Reflection
Enable reflective reasoning where the agent reviews its own output before finalizing:
```toml
[provider]
enable_reflection = true
```

Or via env var: HEARTBIT_ENABLE_REFLECTION=true
The agent generates an initial response, then evaluates it and produces a refined version. This doubles LLM calls but can significantly improve quality on complex tasks.
Structured Output
Constrain agent output to a JSON schema using the __respond__ tool pattern:
```rust
let schema = serde_json::json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    },
    "required": ["answer", "confidence"]
});

let agent = AgentRunner::builder(provider)
    .structured_schema(schema)
    .build()?;

let output = agent.execute("What is 2+2?").await?;
let structured = output.structured.unwrap();
// {"answer": "4", "confidence": 1.0}
```

When structured_schema is set:
- A synthetic __respond__ tool is injected with the given schema
- The agent calls __respond__ to produce structured JSON
- The output is validated against the schema
- The result is available in AgentOutput::structured
Works in both standalone and Restate execution paths. Note: structured_schema and on_input are mutually exclusive.
Configuration
```toml
[provider]
name = "anthropic" # or "openrouter"
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY" # env var for API key
max_tokens = 16384
prompt_caching = true

[provider.retry]
max_retries = 3
base_delay_ms = 1000

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]
```