Providers

Heartbit communicates with LLMs through the LlmProvider trait. Providers are composable — you wrap a base provider with retry logic, cascading, or other behaviors.

The LlmProvider trait defines two methods:

  • complete() — send a completion request and receive the full response
  • stream_complete() — send a completion request and receive an SSE stream of deltas

Both accept messages, tools, system prompt, and configuration (max tokens, temperature, tool choice).
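To make the request surface concrete, here is an illustrative sketch of the parameters both methods accept. These type and field names are assumptions for illustration, not Heartbit's actual definitions:

```rust
/// Illustrative sketch of a completion request's shape.
/// All names here are assumed, not Heartbit's real types.
#[derive(Debug, Clone)]
pub struct CompletionConfig {
    pub max_tokens: u32,
    pub temperature: f32,
}

#[derive(Debug, Clone)]
pub struct CompletionRequest {
    pub system: String,
    pub messages: Vec<String>,
    pub tools: Vec<String>,
    pub config: CompletionConfig,
}

impl CompletionRequest {
    /// Minimal constructor with conservative defaults.
    pub fn new(system: &str) -> Self {
        Self {
            system: system.to_string(),
            messages: Vec::new(),
            tools: Vec::new(),
            config: CompletionConfig { max_tokens: 16384, temperature: 1.0 },
        }
    }
}
```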

AnthropicProvider

Direct integration with the Anthropic API. Supports Claude model families with native SSE streaming.

use heartbit::AnthropicProvider;
let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514");

Enable prompt caching to reduce costs on repeated conversations:

let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514")
    .with_prompt_caching();

Prompt caching places 3 cache breakpoints: the system prompt, the last tool definition, and the second-to-last user message.

OpenRouterProvider

Routes requests through OpenRouter, supporting a wide range of models. Translates between OpenAI-format SSE and Heartbit's internal format.

use heartbit::OpenRouterProvider;
let provider = OpenRouterProvider::new(api_key, "anthropic/claude-sonnet-4-20250514");

Composition

Providers compose by wrapping. The typical stack looks like:

CascadingProvider (optional)
-> RetryingProvider
-> AnthropicProvider / OpenRouterProvider
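The wrapping pattern itself can be sketched as follows. The trait and types here are deliberately simplified stand-ins (synchronous, string-based), not Heartbit's real API; they only show how each layer holds an inner provider and delegates:

```rust
/// Simplified stand-in for a provider trait, to illustrate
/// composition by wrapping. Not Heartbit's actual trait.
trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A base provider (stands in for AnthropicProvider / OpenRouterProvider).
struct Base;
impl Provider for Base {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// A wrapper layer (stands in for retry or cascade layers):
/// it owns an inner provider and delegates, adding behavior around the call.
struct Wrapper<P: Provider>(P);
impl<P: Provider> Provider for Wrapper<P> {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // behavior before/after would go here (retries, gating, ...)
        self.0.complete(prompt)
    }
}
```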

RetryingProvider

Wraps any provider with exponential backoff retry on transient failures:

use heartbit::RetryingProvider;
let provider = RetryingProvider::with_defaults(
    AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
);

Retries on HTTP 429 (rate limit), 500, 502, 503, and 529, as well as network errors.
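A retry policy of this shape can be sketched as follows. The base delay, cap, and lack of jitter are illustrative choices, not Heartbit's actual defaults:

```rust
use std::time::Duration;

/// Whether an HTTP status is worth retrying, per the list above.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 529)
}

/// Exponential backoff: base * 2^attempt, capped.
/// Jitter is omitted for clarity; real policies usually add some.
fn backoff_delay(base: Duration, attempt: u32, cap: Duration) -> Duration {
    base.checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(cap)
        .min(cap)
}
```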

CascadingProvider

Tries cheaper models first and escalates to more expensive ones when a confidence gate rejects the response:

use heartbit::CascadingProvider;

Cascading is configured via TOML:

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]

The ConfidenceGate trait evaluates responses. The built-in HeuristicGate checks for:

  • Refusal patterns in the response
  • Minimum token thresholds
  • Tool call acceptance
  • MaxTokens stop reason (response may be truncated)

Non-final tiers use complete() even for streaming requests to avoid streaming tokens that get discarded on escalation.
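The gate checks above can be pictured roughly like this. The refusal patterns and token threshold are illustrative assumptions, not HeuristicGate's actual values:

```rust
/// Simplified confidence check in the spirit of HeuristicGate.
/// The patterns and threshold here are illustrative only.
fn accept_response(text: &str, output_tokens: u32, hit_max_tokens: bool) -> bool {
    const MIN_TOKENS: u32 = 8; // assumed minimum-token threshold
    let refusals = ["i cannot", "i can't", "i'm unable to"];
    let lower = text.to_lowercase();
    if refusals.iter().any(|r| lower.contains(r)) {
        return false; // refusal pattern: escalate to the next tier
    }
    if output_tokens < MIN_TOKENS {
        return false; // too short to be a confident answer
    }
    if hit_max_tokens {
        return false; // MaxTokens stop reason: response may be truncated
    }
    true
}
```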

BoxedProvider

BoxedProvider provides object-safe wrapping for providers. Since LlmProvider uses async methods (RPITIT), it can't be used as a trait object directly. BoxedProvider bridges this gap:

use std::sync::Arc;
use heartbit::BoxedProvider;

let provider = Arc::new(BoxedProvider::new(
    RetryingProvider::with_defaults(
        AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
    ),
));

This is the standard way to pass providers to AgentRunner and Orchestrator.

Tool choice

Control how the LLM selects tools:

Variant          Behavior
Auto             LLM decides whether to use tools (default)
Any              LLM must use at least one tool
Tool { name }    LLM must use a specific tool
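The three variants map naturally onto a Rust enum. This sketch mirrors the table above; it is not necessarily Heartbit's exact definition:

```rust
/// Tool-selection modes, mirroring the table above.
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    /// LLM decides whether to use tools.
    Auto,
    /// LLM must use at least one tool.
    Any,
    /// LLM must use the named tool.
    Tool { name: String },
}

impl Default for ToolChoice {
    fn default() -> Self {
        ToolChoice::Auto // the default per the table above
    }
}
```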

Reasoning effort

Enable extended thinking for complex reasoning tasks. The LLM generates internal reasoning tokens before responding:

[provider]
reasoning_effort = "high" # "high", "medium", "low", or "none"

Or via env var: HEARTBIT_REASONING_EFFORT=high

Or via the builder, using the ReasoningEffort enum on the completion request.

Reasoning tokens are tracked separately in TokenUsage::reasoning_tokens and contribute to the total context budget.
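The bookkeeping can be sketched as a small struct. Aside from reasoning_tokens, which the docs name, the field names here are assumptions:

```rust
/// Token accounting with reasoning tokens tracked separately,
/// as described above. Field names other than `reasoning_tokens`
/// are illustrative.
#[derive(Debug, Default)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: u64,
}

impl TokenUsage {
    /// Reasoning tokens contribute to the total context budget.
    fn total(&self) -> u64 {
        self.input_tokens + self.output_tokens + self.reasoning_tokens
    }
}
```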

Reflection

Enable reflective reasoning where the agent reviews its own output before finalizing:

[provider]
enable_reflection = true

Or via env var: HEARTBIT_ENABLE_REFLECTION=true

The agent generates an initial response, then evaluates it and produces a refined version. This doubles LLM calls but can significantly improve quality on complex tasks.
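The draft-then-refine flow, and why it doubles LLM calls, can be sketched like this. The closure stands in for a provider call, and the review prompt wording is invented for illustration:

```rust
/// Reflection in miniature: one call for the draft, one to review
/// and refine it. `call_llm` is a mock stand-in for a provider call.
fn reflect<F>(prompt: &str, mut call_llm: F) -> (String, u32)
where
    F: FnMut(&str) -> String,
{
    let draft = call_llm(prompt); // call 1: initial response
    let review_prompt = format!("Review and improve this answer:\n{draft}");
    let refined = call_llm(&review_prompt); // call 2: refined version
    (refined, 2)
}
```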

Structured output

Constrain agent output to a JSON schema using the __respond__ tool pattern:

let schema = serde_json::json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    },
    "required": ["answer", "confidence"]
});

let agent = AgentRunner::builder(provider)
    .structured_schema(schema)
    .build()?;

let output = agent.execute("What is 2+2?").await?;
let structured = output.structured.unwrap();
// {"answer": "4", "confidence": 1.0}

When structured_schema is set:

  1. A synthetic __respond__ tool is injected with the given schema
  2. The agent calls __respond__ to produce structured JSON
  3. The output is validated against the schema
  4. The validated value is exposed as AgentOutput::structured

Works in both standalone and Restate execution paths. Note: structured_schema and on_input are mutually exclusive.

Configuration

[provider]
name = "anthropic"                # or "openrouter"
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY" # env var holding the API key
max_tokens = 16384
prompt_caching = true

[provider.retry]
max_retries = 3
base_delay_ms = 1000

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]