Providers

Heartbit communicates with LLMs through the LlmProvider trait. Providers are composable — you wrap a base provider with retry logic, cascading, or other behaviors.

The LlmProvider trait defines two methods:

  • complete() — send a completion request and receive the full response
  • stream_complete() — send a completion request and receive an SSE stream of deltas

Both accept messages, tools, system prompt, and configuration (max tokens, temperature, tool choice).
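To make the request surface concrete, here is an illustrative sketch of the parameters both methods accept. These type and field names are assumptions for illustration, not Heartbit's actual definitions:

```rust
/// Illustrative sketch of a completion request's shape.
/// All names here are assumed, not Heartbit's real types.
#[derive(Debug, Clone)]
pub struct CompletionConfig {
    pub max_tokens: u32,
    pub temperature: f32,
}

#[derive(Debug, Clone)]
pub struct CompletionRequest {
    pub system: String,
    pub messages: Vec<String>,
    pub tools: Vec<String>,
    pub config: CompletionConfig,
}

impl CompletionRequest {
    /// Minimal constructor with conservative defaults.
    pub fn new(system: &str) -> Self {
        Self {
            system: system.to_string(),
            messages: Vec::new(),
            tools: Vec::new(),
            config: CompletionConfig { max_tokens: 16384, temperature: 1.0 },
        }
    }
}
```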

AnthropicProvider

Direct integration with the Anthropic API. Supports Claude model families with native SSE streaming.

use heartbit::AnthropicProvider;
let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514");

Enable prompt caching to reduce costs on repeated conversations:

let provider = AnthropicProvider::new(api_key, "claude-sonnet-4-20250514")
    .with_prompt_caching();

Prompt caching places 3 cache breakpoints: the system prompt, the last tool definition, and the second-to-last user message.

OpenRouterProvider

Routes requests through OpenRouter, supporting a wide range of models. Translates between OpenAI-format SSE and Heartbit's internal format.

use heartbit::OpenRouterProvider;
let provider = OpenRouterProvider::new(api_key, "anthropic/claude-sonnet-4-20250514");

Composition

Providers compose by wrapping. The typical stack looks like:

CascadingProvider (optional)
-> RetryingProvider
-> AnthropicProvider / OpenRouterProvider
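The wrapping pattern itself can be sketched as follows. The trait and types here are deliberately simplified stand-ins (synchronous, string-based), not Heartbit's real API; they only show how each layer holds an inner provider and delegates:

```rust
/// Simplified stand-in for a provider trait, to illustrate
/// composition by wrapping. Not Heartbit's actual trait.
trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// A base provider (stands in for AnthropicProvider / OpenRouterProvider).
struct Base;
impl Provider for Base {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// A wrapper layer (stands in for retry or cascade layers):
/// it owns an inner provider and delegates, adding behavior around the call.
struct Wrapper<P: Provider>(P);
impl<P: Provider> Provider for Wrapper<P> {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // behavior before/after would go here (retries, gating, ...)
        self.0.complete(prompt)
    }
}
```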

RetryingProvider

Wraps any provider with exponential backoff retry on transient failures:

use heartbit::RetryingProvider;
let provider = RetryingProvider::with_defaults(
    AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
);

Retries on HTTP 429 (rate limit), 500, 502, 503, and 529, as well as network errors.
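A retry policy of this shape can be sketched as follows. The base delay, cap, and lack of jitter are illustrative choices, not Heartbit's actual defaults:

```rust
use std::time::Duration;

/// Whether an HTTP status is worth retrying, per the list above.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 529)
}

/// Exponential backoff: base * 2^attempt, capped.
/// Jitter is omitted for clarity; real policies usually add some.
fn backoff_delay(base: Duration, attempt: u32, cap: Duration) -> Duration {
    base.checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(cap)
        .min(cap)
}
```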

CascadingProvider

Tries cheaper models first and escalates to more expensive ones when a confidence gate rejects the response:

use heartbit::CascadingProvider;

Cascading is configured via TOML:

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]

The ConfidenceGate trait evaluates responses. The built-in HeuristicGate checks for:

  • Refusal patterns in the response
  • Minimum token thresholds
  • Tool call acceptance
  • MaxTokens stop reason (response may be truncated)

Non-final tiers use complete() even for streaming requests to avoid streaming tokens that get discarded on escalation.
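The gate checks above can be pictured roughly like this. The refusal patterns and token threshold are illustrative assumptions, not HeuristicGate's actual values:

```rust
/// Simplified confidence check in the spirit of HeuristicGate.
/// The patterns and threshold here are illustrative only.
fn accept_response(text: &str, output_tokens: u32, hit_max_tokens: bool) -> bool {
    const MIN_TOKENS: u32 = 8; // assumed minimum-token threshold
    let refusals = ["i cannot", "i can't", "i'm unable to"];
    let lower = text.to_lowercase();
    if refusals.iter().any(|r| lower.contains(r)) {
        return false; // refusal pattern: escalate to the next tier
    }
    if output_tokens < MIN_TOKENS {
        return false; // too short to be a confident answer
    }
    if hit_max_tokens {
        return false; // MaxTokens stop reason: response may be truncated
    }
    true
}
```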

BoxedProvider

BoxedProvider provides object-safe wrapping for providers. Since LlmProvider uses async methods (RPITIT), it can't be used as a trait object directly. BoxedProvider bridges this gap:

use std::sync::Arc;
use heartbit::BoxedProvider;

let provider = Arc::new(BoxedProvider::new(
    RetryingProvider::with_defaults(
        AnthropicProvider::new(api_key, "claude-sonnet-4-20250514"),
    ),
));

This is the standard way to pass providers to AgentRunner and Orchestrator.

Tool choice

Control how the LLM selects tools:

Variant          Behavior
Auto             LLM decides whether to use tools (default)
Any              LLM must use at least one tool
Tool { name }    LLM must use a specific tool
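The three variants map naturally onto a Rust enum. This sketch mirrors the table above; it is not necessarily Heartbit's exact definition:

```rust
/// Tool-selection modes, mirroring the table above.
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    /// LLM decides whether to use tools.
    Auto,
    /// LLM must use at least one tool.
    Any,
    /// LLM must use the named tool.
    Tool { name: String },
}

impl Default for ToolChoice {
    fn default() -> Self {
        ToolChoice::Auto // the default per the table above
    }
}
```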

Reasoning effort

Enable extended thinking for complex reasoning tasks. The LLM generates internal reasoning tokens before responding:

[provider]
reasoning_effort = "high" # "high", "medium", "low", or "none"

Or via env var: HEARTBIT_REASONING_EFFORT=high

Or via the builder, using the ReasoningEffort enum on the completion request.

Reasoning tokens are tracked separately in TokenUsage::reasoning_tokens and contribute to the total context budget.
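The bookkeeping can be sketched as a small struct. Aside from reasoning_tokens, which the docs name, the field names here are assumptions:

```rust
/// Token accounting with reasoning tokens tracked separately,
/// as described above. Field names other than `reasoning_tokens`
/// are illustrative.
#[derive(Debug, Default)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: u64,
}

impl TokenUsage {
    /// Reasoning tokens contribute to the total context budget.
    fn total(&self) -> u64 {
        self.input_tokens + self.output_tokens + self.reasoning_tokens
    }
}
```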

Reflection

Enable reflective reasoning where the agent reviews its own output before finalizing:

[provider]
enable_reflection = true

Or via env var: HEARTBIT_ENABLE_REFLECTION=true

The agent generates an initial response, then evaluates it and produces a refined version. This doubles LLM calls but can significantly improve quality on complex tasks.
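The draft-then-refine flow, and why it doubles LLM calls, can be sketched like this. The closure stands in for a provider call, and the review prompt wording is invented for illustration:

```rust
/// Reflection in miniature: one call for the draft, one to review
/// and refine it. `call_llm` is a mock stand-in for a provider call.
fn reflect<F>(prompt: &str, mut call_llm: F) -> (String, u32)
where
    F: FnMut(&str) -> String,
{
    let draft = call_llm(prompt); // call 1: initial response
    let review_prompt = format!("Review and improve this answer:\n{draft}");
    let refined = call_llm(&review_prompt); // call 2: refined version
    (refined, 2)
}
```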

Structured output

Constrain agent output to a JSON schema using the __respond__ tool pattern:

let schema = serde_json::json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    },
    "required": ["answer", "confidence"]
});

let agent = AgentRunner::builder(provider)
    .structured_schema(schema)
    .build()?;

let output = agent.execute("What is 2+2?").await?;
let structured = output.structured.unwrap();
// {"answer": "4", "confidence": 1.0}

When structured_schema is set:

  1. A synthetic __respond__ tool is injected with the given schema
  2. The agent calls __respond__ to produce structured JSON
  3. The output is validated against the schema
  4. The validated value is exposed as AgentOutput::structured

Works in both standalone and Restate execution paths. Note: structured_schema and on_input are mutually exclusive.

Configuration

[provider]
name = "anthropic"                # or "openrouter"
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY" # env var holding the API key
max_tokens = 16384
prompt_caching = true

[provider.retry]
max_retries = 3
base_delay_ms = 1000

[provider.cascade]
enabled = true
tiers = [
  { model = "claude-haiku-4-5-20251001" },
  { model = "claude-sonnet-4-20250514" },
]