Providers

Heartbit communicates with LLMs through the LlmProvider trait. Providers are composable — you wrap a base provider with retry logic, cascading, or other behaviors.

The LlmProvider trait defines two methods:

  • complete() — send a completion request and receive the full response
  • stream_complete() — send a completion request and receive an SSE stream of deltas

Both accept messages, tools, system prompt, and configuration (max tokens, temperature, tool choice).

Anthropic

Direct integration with the Anthropic API. Supports Claude model families with native SSE streaming.

use heartbit::AnthropicProvider;
let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514");

Enable prompt caching to reduce costs on repeated conversations:

let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514")
    .with_prompt_caching();

Prompt caching places three cache breakpoints: on the system prompt, on the last tool definition, and on the second-to-last user message.
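To make the third breakpoint concrete, here is an illustrative helper that locates the second-to-last user message in a list of message roles. The function name and approach are hypothetical, not Heartbit's actual implementation:

```rust
/// Illustrative sketch: find the index of the second-to-last user message,
/// one of the three cache breakpoints described above.
fn second_to_last_user(roles: &[&str]) -> Option<usize> {
    roles
        .iter()
        .enumerate()
        .rev() // walk from the newest message backwards
        .filter(|(_, role)| **role == "user")
        .nth(1) // skip the last user message, take the one before it
        .map(|(index, _)| index)
}
```

With fewer than two user messages in the conversation, no such breakpoint exists and the helper returns `None`.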

OpenRouter

Routes requests through OpenRouter, supporting a wide range of models. Translates between OpenAI-format SSE and Heartbit's internal format.

use heartbit::OpenRouterProvider;
let provider = OpenRouterProvider::new(api_key, "anthropic/claude-sonnet-4-20250514");
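The SSE side of that translation comes down to pulling JSON payloads out of `data:` lines and recognizing the `[DONE]` sentinel. A minimal sketch of such parsing (not the actual OpenRouterProvider code):

```rust
/// Extract the JSON payload from an OpenAI-style SSE line.
/// Returns None for non-data lines (e.g. comments) and for the
/// "[DONE]" end-of-stream sentinel. Illustrative only.
fn sse_payload(line: &str) -> Option<&str> {
    let payload = line.strip_prefix("data: ")?;
    if payload == "[DONE]" {
        None
    } else {
        Some(payload)
    }
}
```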

Composition

Providers compose by wrapping. The typical stack looks like:

CascadingProvider (optional)
-> RetryingProvider
-> AnthropicProvider / OpenRouterProvider

RetryingProvider

Wraps any provider with exponential backoff retry on transient failures:

use heartbit::RetryingProvider;
let provider = RetryingProvider::with_defaults(
    AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
);

Retries on HTTP 429 (rate limit), 500, 502, 503, and 529, as well as on network errors.
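As a sketch of that policy (illustrative; the real RetryingProvider's internals may differ, and jitter is omitted for brevity):

```rust
/// Whether an HTTP status is worth retrying, per the list above.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 529)
}

/// Exponential backoff: base delay doubled on each attempt,
/// with the shift capped to avoid overflow on large attempt counts.
fn backoff_ms(base_delay_ms: u64, attempt: u32) -> u64 {
    base_delay_ms.saturating_mul(1u64 << attempt.min(16))
}
```

With the default `base_delay_ms = 1000` from the configuration below, attempts wait roughly 1 s, 2 s, 4 s, and so on.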

CascadingProvider

Tries cheaper models first and escalates to more expensive ones when a confidence gate rejects the response:

use heartbit::CascadingProvider;

Configured via TOML:

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]

The ConfidenceGate trait evaluates responses. The built-in HeuristicGate checks for:

  • Refusal patterns in the response
  • Minimum token thresholds
  • Tool call acceptance
  • MaxTokens stop reason (response may be truncated)

Non-final tiers use complete() even for streaming requests to avoid streaming tokens that get discarded on escalation.
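A minimal sketch of a heuristic check along the lines listed above (the function name, refusal patterns, and threshold are hypothetical, not HeuristicGate's actual rules; tool-call acceptance is omitted for brevity):

```rust
/// Hypothetical confidence check mirroring the criteria above:
/// reject refusals, too-short answers, and truncated (max-tokens) stops.
fn accept(text: &str, token_count: usize, hit_max_tokens: bool) -> bool {
    const MIN_TOKENS: usize = 8; // hypothetical threshold
    let refused = ["I can't", "I cannot", "I'm unable to"]
        .iter()
        .any(|pattern| text.contains(pattern));
    !refused && token_count >= MIN_TOKENS && !hit_max_tokens
}
```

When the gate returns false, the cascade escalates the same request to the next, more capable tier.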

BoxedProvider

BoxedProvider provides object-safe wrapping for providers. Since LlmProvider uses async methods (RPITIT), it can't be used as a trait object directly. BoxedProvider bridges this gap:

use std::sync::Arc;
use heartbit::BoxedProvider;

let provider = Arc::new(BoxedProvider::new(
    RetryingProvider::with_defaults(
        AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
    ),
));

This is the standard way to pass providers to AgentRunner and Orchestrator.

Tool choice

Control how the LLM selects tools:

Variant        Behavior
Auto           LLM decides whether to use tools (default)
Any            LLM must use at least one tool
Tool { name }  LLM must use a specific tool
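These variants could map onto a Rust enum along the following lines (a sketch; Heartbit's exact type definition and wire format may differ):

```rust
/// Sketch of a tool-choice enum matching the variants above.
#[derive(Debug, PartialEq)]
enum ToolChoice {
    Auto,
    Any,
    Tool { name: String },
}

/// How each variant might serialize for the provider API
/// (the "tool:" prefix here is purely illustrative).
fn wire_name(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => "auto".to_string(),
        ToolChoice::Any => "any".to_string(),
        ToolChoice::Tool { name } => format!("tool:{name}"),
    }
}
```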
Configuration

[provider]
name = "anthropic" # or "openrouter"
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY" # env var for API key
max_tokens = 16384
prompt_caching = true

[provider.retry]
max_retries = 3
base_delay_ms = 1000

[provider.cascade]
enabled = true
tiers = [
{ model = "claude-haiku-4-5-20251001" },
{ model = "claude-sonnet-4-20250514" },
]