
Memory

Heartbit’s memory system gives agents persistent knowledge across turns and sessions. Inspired by MemGPT, it provides agent-facing tools for storing, recalling, and managing memories with sophisticated scoring and decay.

Memory tools are available in the standalone execution path only. They are not supported in the Restate (durable) path.

The Memory trait defines 6 core operations:

| Method | Description |
| --- | --- |
| `store` | Store a new memory entry |
| `recall` | Search memories by query |
| `update` | Update an existing entry |
| `forget` | Remove a memory entry |
| `add_link` | Create bidirectional links between entries |
| `prune` | Remove weak/stale entries |

Three storage backends are available:

| Backend | Use case |
| --- | --- |
| `InMemoryStore` | Development, testing, short-lived agents |
| `PostgresMemoryStore` | Production persistence with pgvector for vector search |
| `NamespacedMemory` | Multi-tenant isolation with a 3-tier namespace (user/agent/session) |
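To make the shape of these operations concrete, here is a minimal toy backend covering `store`, `recall`, and `forget`. This is an illustrative sketch only — `ToyMemoryStore`, its substring-based recall, and the field names are assumptions, not Heartbit's actual trait signatures:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct MemoryEntry {
    id: u64,
    content: String,
    importance: f32,
}

#[derive(Default)]
struct ToyMemoryStore {
    next_id: u64,
    entries: HashMap<u64, MemoryEntry>,
}

impl ToyMemoryStore {
    fn store(&mut self, content: &str, importance: f32) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.entries.insert(
            id,
            MemoryEntry { id, content: content.to_string(), importance },
        );
        id
    }

    /// Naive recall: substring match, ranked by importance.
    /// (The real system uses BM25 and vector search instead.)
    fn recall(&self, query: &str) -> Vec<&MemoryEntry> {
        let mut hits: Vec<&MemoryEntry> = self
            .entries
            .values()
            .filter(|e| e.content.contains(query))
            .collect();
        hits.sort_by(|a, b| b.importance.partial_cmp(&a.importance).unwrap());
        hits
    }

    fn forget(&mut self, id: u64) -> bool {
        self.entries.remove(&id).is_some()
    }
}

fn main() {
    let mut store = ToyMemoryStore::default();
    let id = store.store("user prefers dark mode", 0.9);
    store.store("meeting at 3pm", 0.4);
    let hits = store.recall("dark mode");
    assert_eq!(hits.len(), 1);
    assert_eq!(hits[0].id, id);
    assert!(store.forget(id));
}
```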

Each memory entry has a type that determines how it’s treated during recall and consolidation:

| Type | Description |
| --- | --- |
| `Episodic` | Event-based memories (default). What happened, when, in what context. |
| `Semantic` | Factual knowledge. Consolidated from episodic memories. |
| `Reflection` | Meta-observations about patterns. Generated by the reflection system. |

When an agent searches memory, results are ranked using a multi-signal scoring system:

Keyword matching uses standard BM25 scoring with a 2x boost for keyword matches, so exact keyword hits rank highly.

Four weighted signals are combined:

  • Recency — more recent memories score higher
  • Importance — higher-importance memories score higher
  • Relevance — semantic relevance to the query
  • Strength — current memory strength after decay
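The combination can be sketched as a weighted sum. The equal weights below are an assumption for illustration — the actual weighting in Heartbit is not documented here:

```rust
/// Combine the four recall signals into one score.
/// All inputs are assumed normalized to [0, 1].
fn composite_score(recency: f32, importance: f32, relevance: f32, strength: f32) -> f32 {
    // Equal weights are an illustrative assumption, not Heartbit's values.
    const W: [f32; 4] = [0.25, 0.25, 0.25, 0.25];
    W[0] * recency + W[1] * importance + W[2] * relevance + W[3] * strength
}

fn main() {
    // A memory maxing out every signal gets the maximum score.
    assert!((composite_score(1.0, 1.0, 1.0, 1.0) - 1.0).abs() < 1e-6);
    // A stale, weak memory scores below a fresh, strong one at equal relevance.
    assert!(
        composite_score(0.1, 0.5, 0.5, 0.1) < composite_score(0.9, 0.5, 0.5, 0.9)
    );
}
```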

When embeddings are available, BM25 and vector cosine similarity scores are fused via Reciprocal Rank Fusion (RRF) for the best of both keyword and semantic search.
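RRF itself is simple: each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below uses the conventional `k = 60`; Heartbit's actual constant is an assumption here:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = Σ over lists of 1 / (k + rank(d)),
/// with 1-based ranks. Documents high in either list fuse to the top.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_default() += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let bm25 = vec!["a", "b", "c"];   // keyword ranking
    let vector = vec!["b", "a", "d"]; // semantic ranking
    let fused = rrf(&[bm25, vector], 60.0);
    // "a" and "b" each appear near the top of both lists, so one of them wins.
    assert!(fused[0].0 == "a" || fused[0].0 == "b");
    assert_eq!(fused.len(), 4);
}
```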

Memory strength decays over time following an exponential curve:

  • Decay rate: 0.005/hr (~6-day half-life)
  • Strength reinforced by +0.2 on each access, capped at 1.0
  • Memories that are accessed frequently stay strong; unused memories fade

This models natural forgetting — important memories that are revisited stay accessible, while irrelevant ones gradually disappear.
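The decay and reinforcement rules translate directly into code. This sketch uses only the numbers stated above (0.005/hr decay, +0.2 reinforcement, 1.0 cap); everything else is illustrative:

```rust
/// Exponential decay at 0.005/hour: strength(t) = s0 * exp(-0.005 * hours).
fn decayed_strength(initial: f64, hours_since_access: f64) -> f64 {
    initial * (-0.005 * hours_since_access).exp()
}

/// Each access reinforces strength by +0.2, capped at 1.0.
fn reinforce(strength: f64) -> f64 {
    (strength + 0.2).min(1.0)
}

fn main() {
    // Half-life check: ln(2) / 0.005 ≈ 138.6 hours ≈ 6 days.
    let half = decayed_strength(1.0, 138.63);
    assert!((half - 0.5).abs() < 0.01);
    // Reinforcement never exceeds the 1.0 cap.
    assert!((reinforce(0.95) - 1.0).abs() < 1e-9);
    assert!((reinforce(0.3) - 0.5).abs() < 1e-9);
}
```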

The ReflectionTracker monitors the cumulative importance of stored memories. When a threshold is exceeded, it triggers a reflection prompt that asks the LLM to identify patterns and generate Reflection-type memories.

Reflections are meta-cognitive — they help the agent recognize recurring themes, user preferences, and behavioral patterns.

Configure via reflection_threshold on AgentConfig or the HEARTBIT_REFLECTION_THRESHOLD env var.
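The threshold logic can be sketched as a simple accumulator. This is an illustrative model of the behavior described above, not the real `ReflectionTracker` API:

```rust
// Illustrative sketch; the actual ReflectionTracker API may differ.
struct ReflectionTracker {
    threshold: f32,
    cumulative_importance: f32,
}

impl ReflectionTracker {
    fn new(threshold: f32) -> Self {
        Self { threshold, cumulative_importance: 0.0 }
    }

    /// Record a stored memory's importance. Returns true when cumulative
    /// importance crosses the threshold (triggering a reflection prompt),
    /// then resets the accumulator.
    fn record(&mut self, importance: f32) -> bool {
        self.cumulative_importance += importance;
        if self.cumulative_importance >= self.threshold {
            self.cumulative_importance = 0.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut tracker = ReflectionTracker::new(50.0);
    let mut triggered = 0;
    for _ in 0..120 {
        if tracker.record(0.5) {
            triggered += 1;
        }
    }
    // 120 stores × 0.5 importance = 60 total, crossing the 50 threshold once.
    assert_eq!(triggered, 1);
}
```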

The ConsolidationPipeline reduces memory bloat by merging related entries:

  1. Clusters entries by Jaccard keyword similarity
  2. Merges each cluster into a single Semantic entry
  3. Replaces the original episodic entries with the consolidated semantic entry
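Step 1's similarity measure is standard Jaccard over keyword sets. The clustering threshold of 0.3 mentioned in the comment below is a hypothetical example, not a documented default:

```rust
use std::collections::HashSet;

/// Jaccard similarity over keyword sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &[&str], b: &[&str]) -> f64 {
    let sa: HashSet<&str> = a.iter().copied().collect();
    let sb: HashSet<&str> = b.iter().copied().collect();
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

fn main() {
    let e1 = ["rust", "memory", "agent"];
    let e2 = ["rust", "memory", "decay"];
    // 2 shared keywords out of 4 distinct → 0.5. Above a hypothetical
    // clustering threshold of 0.3, these two entries would merge.
    assert!((jaccard(&e1, &e2) - 0.5).abs() < 1e-9);
    assert!((jaccard(&e1, &e1) - 1.0).abs() < 1e-9);
}
```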

Trigger manually via the memory_consolidate tool or automatically at session end with consolidate_on_exit.

Weak memories are automatically cleaned up:

  • Session-end pruning — always runs when memory is present; removes entries below strength threshold with minimum age
  • Session pruning — `SessionPruneConfig` trims old tool results from the conversation before LLM calls, reducing input tokens
  • Pre-compaction flush — before context summarization, tool results are extracted to episodic memory so they aren’t lost
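The session-end rule — remove weak entries, but only once they have passed a minimum age — can be sketched as a filter. The thresholds below are illustrative, not Heartbit's defaults:

```rust
struct Entry {
    strength: f32,
    age_hours: f32,
}

/// Drop entries weaker than `min_strength`, but only if at least
/// `min_age_hours` old — young memories get a grace period.
fn prune(entries: Vec<Entry>, min_strength: f32, min_age_hours: f32) -> Vec<Entry> {
    entries
        .into_iter()
        .filter(|e| e.strength >= min_strength || e.age_hours < min_age_hours)
        .collect()
}

fn main() {
    let entries = vec![
        Entry { strength: 0.05, age_hours: 200.0 }, // weak and old  → pruned
        Entry { strength: 0.05, age_hours: 1.0 },   // weak but young → kept
        Entry { strength: 0.90, age_hours: 500.0 }, // strong         → kept
    ];
    let kept = prune(entries, 0.1, 24.0);
    assert_eq!(kept.len(), 2);
}
```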

Agents interact with memory through 5 tools:

| Tool | Description |
| --- | --- |
| `memory_store` | Store a new memory with content, type, importance, and keywords |
| `memory_recall` | Search memories by natural language query |
| `memory_update` | Update the content of an existing memory |
| `memory_forget` | Remove a memory by ID |
| `memory_consolidate` | Merge multiple memories into one (provide source IDs and new content) |

Embeddings enable hybrid retrieval (BM25 + vector cosine) for improved recall quality:

| Provider | Requirements | Dimension |
| --- | --- | --- |
| `NoopEmbedding` | None | — (BM25-only fallback) |
| `OpenAiEmbedding` | `OPENAI_API_KEY` | 1536 or 3072 |
| `LocalEmbeddingProvider` | `local-embedding` feature | 384 (MiniLM default) |

Local embeddings run entirely offline via ONNX Runtime (fastembed). Models are downloaded once on first use (~30MB). No API keys required.
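The vector side of hybrid retrieval compares embeddings by cosine similarity. A self-contained sketch (dimension-agnostic; real vectors would be 384-dimensional for the MiniLM default):

```rust
/// Cosine similarity between two embedding vectors:
/// dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Identical directions → 1.0; orthogonal directions → 0.0.
    assert!((cosine_similarity(&[1.0, 0.0], &[2.0, 0.0]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6);
}
```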

Example configuration:

```toml
[memory]
type = "in_memory" # or "postgres"

[memory.embedding]
provider = "local" # "openai", "local", or "none"
model = "all-MiniLM-L6-v2"
cache_dir = "/tmp/fastembed"

# Agent-level memory settings
[[agents]]
name = "assistant"
reflection_threshold = 50
consolidate_on_exit = true
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
```