Response Caching
Heartbit includes a built-in LRU cache for LLM completion responses. When a request repeats the same system prompt, message history, and tool configuration, the cached response is returned instantly instead of making an API call.
How It Works
ResponseCache is a thread-safe LRU cache backed by a Vec, with move-to-front on hit and eviction from the back. It uses std::sync::Mutex (never held across .await) for safe concurrent access.
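The move-to-front behaviour is easiest to see in miniature. The sketch below is illustrative only and does not mirror Heartbit's internal types; it just shows the Vec-plus-Mutex shape described above.

```rust
use std::sync::Mutex;

// Illustrative sketch only: a move-to-front LRU over a Vec, guarded by a Mutex.
// Field and type names do not mirror Heartbit's internals.
struct LruSketch<V> {
    entries: Mutex<Vec<(u64, V)>>, // most recently used entry sits at index 0
    capacity: usize,
}

impl<V: Clone> LruSketch<V> {
    fn new(capacity: usize) -> Self {
        Self { entries: Mutex::new(Vec::new()), capacity }
    }

    fn get(&self, key: u64) -> Option<V> {
        let mut entries = self.entries.lock().unwrap();
        let pos = entries.iter().position(|(k, _)| *k == key)?;
        let entry = entries.remove(pos);
        let value = entry.1.clone();
        entries.insert(0, entry); // move-to-front on hit
        Some(value)
    }

    fn put(&self, key: u64, value: V) {
        let mut entries = self.entries.lock().unwrap();
        entries.retain(|(k, _)| *k != key);
        entries.insert(0, (key, value));
        entries.truncate(self.capacity); // evict from the back when over capacity
    }
}
```

Because both methods are synchronous, the lock guard is dropped before control returns to async code, which is why the mutex is never held across an .await.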
Cache keys are computed from three components via FNV-1a hashing:
- System prompt — the full system prompt text
- Messages — the serialized conversation history
- Tool names — sorted before hashing (order-independent)
Two requests with the same system prompt, message history, and available tools will produce the same cache key regardless of tool ordering.
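As a rough illustration of that key derivation (not the library's actual compute_key implementation), an FNV-1a hash over the three components with the tool names sorted first looks like this:

```rust
// Conceptual sketch of the key derivation (the real logic lives in
// ResponseCache::compute_key): FNV-1a over prompt, messages, and sorted tool names.
fn cache_key_sketch(system_prompt: &str, serialized_messages: &str, tool_names: &[&str]) -> u64 {
    const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    // Sorting first makes the key independent of tool registration order.
    let mut sorted_tools: Vec<&str> = tool_names.to_vec();
    sorted_tools.sort_unstable();

    let mut hash = FNV_OFFSET_BASIS;
    for part in std::iter::once(system_prompt)
        .chain(std::iter::once(serialized_messages))
        .chain(sorted_tools)
    {
        for byte in part.bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
    }
    hash
}
```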
Wire the cache via AgentRunnerBuilder::response_cache(Arc<ResponseCache>):

```rust
use std::sync::Arc;

use heartbit::AgentRunner;
use heartbit::agent::ResponseCache;

let cache = Arc::new(ResponseCache::new(50)); // 50-entry LRU

let runner = AgentRunner::builder(provider)
    .name("researcher")
    .system_prompt("You are a researcher.")
    .response_cache(cache.clone())
    .build()?;
```
Capacity Guidelines
| Use Case | Capacity | Notes |
|---|---|---|
| Development/testing | 10-50 | Fast iteration, low memory |
| Deterministic pipelines | 50-100 | Repeated queries with same inputs |
| Interactive agents | 0 (disabled) | Conversations rarely repeat exactly |
Each access is O(n) over the entries, which is efficient at typical capacities (10-100). At very large capacities (1000+) the linear scan becomes a bottleneck, so consider an alternative approach.
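As a hypothetical way to apply the table above, capacity could be chosen per environment; the HEARTBIT_ENV variable name and the tiers below are illustrative only.

```rust
use std::sync::Arc;
use heartbit::agent::ResponseCache;

// Hypothetical capacity selection based on deployment context.
let capacity = match std::env::var("HEARTBIT_ENV").as_deref() {
    Ok("dev") | Ok("test") => 25, // fast iteration, low memory
    Ok("pipeline") => 100,        // deterministic, repeated queries
    _ => 50,
};
let cache = Arc::new(ResponseCache::new(capacity));
```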
When to Use
Response caching is most effective for:
- Deterministic tasks where the same input reliably produces the same useful output
- Cost reduction on repeated queries during development or batch processing
- Testing scenarios where you want consistent LLM responses without API calls
Response caching is not recommended for:
- Interactive conversations (each turn produces unique message history)
- Tasks where freshness matters (e.g., web search followed by analysis)
- Agents with non-deterministic tool results that feed back into prompts
API Reference
```rust
// Create with max entries
let cache = ResponseCache::new(100);

// Manual key computation
let key = ResponseCache::compute_key(
    "system prompt",
    &messages,
    &["tool_a", "tool_b"],
);

// Manual get/put
if let Some(response) = cache.get(key) {
    // Cache hit
} else {
    let response = provider.complete(request).await?;
    cache.put(key, response.clone());
}

// Utilities
cache.len();      // Current entry count
cache.is_empty(); // True if empty
cache.clear();    // Remove all entries
```

When wired via AgentRunnerBuilder, the cache is checked automatically before each LLM call and populated after each successful response.
Sharing Across Agents
The cache is wrapped in Arc, so you can share a single cache instance across multiple agents:
```rust
let shared_cache = Arc::new(ResponseCache::new(100));

let agent_a = AgentRunner::builder(provider.clone())
    .name("agent-a")
    .system_prompt("You analyze code.")
    .response_cache(shared_cache.clone())
    .build()?;

let agent_b = AgentRunner::builder(provider.clone())
    .name("agent-b")
    .system_prompt("You analyze code.") // Same prompt = cache hits
    .response_cache(shared_cache.clone())
    .build()?;
```

Agents with identical system prompts and tool sets will share cache hits. Agents with different prompts naturally produce different cache keys.