
Memory

Heartbit’s memory system gives agents persistent knowledge across turns and sessions. Inspired by MemGPT, it provides agent-facing tools for storing, recalling, and managing memories with sophisticated scoring and decay.

Memory tools are available in the standalone execution path only. They are not supported in the Restate (durable) path.

The Memory trait defines 6 core operations:

| Method | Description |
| --- | --- |
| `store` | Store a new memory entry |
| `recall` | Search memories by query |
| `update` | Update an existing entry |
| `forget` | Remove a memory entry |
| `add_link` | Create bidirectional links between entries |
| `prune` | Remove weak/stale entries |

Three storage backends are available:

| Backend | Use case |
| --- | --- |
| `InMemoryStore` | Development, testing, short-lived agents |
| `PostgresMemoryStore` | Production persistence with pgvector for vector search |
| `NamespacedMemory` | Multi-tenant isolation with a 3-tier namespace (user/agent/session) |
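To make the shape of these operations concrete, here is a minimal toy backend covering `store`, `recall`, and `forget`. This is an illustrative sketch only — `ToyMemoryStore`, its substring-based recall, and the field names are assumptions, not Heartbit's actual trait signatures:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct MemoryEntry {
    id: u64,
    content: String,
    importance: f32,
}

#[derive(Default)]
struct ToyMemoryStore {
    next_id: u64,
    entries: HashMap<u64, MemoryEntry>,
}

impl ToyMemoryStore {
    fn store(&mut self, content: &str, importance: f32) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.entries.insert(
            id,
            MemoryEntry { id, content: content.to_string(), importance },
        );
        id
    }

    /// Naive recall: substring match, ranked by importance.
    /// (The real system uses BM25 and vector search instead.)
    fn recall(&self, query: &str) -> Vec<&MemoryEntry> {
        let mut hits: Vec<&MemoryEntry> = self
            .entries
            .values()
            .filter(|e| e.content.contains(query))
            .collect();
        hits.sort_by(|a, b| b.importance.partial_cmp(&a.importance).unwrap());
        hits
    }

    fn forget(&mut self, id: u64) -> bool {
        self.entries.remove(&id).is_some()
    }
}

fn main() {
    let mut store = ToyMemoryStore::default();
    let id = store.store("user prefers dark mode", 0.9);
    store.store("meeting at 3pm", 0.4);
    let hits = store.recall("dark mode");
    assert_eq!(hits.len(), 1);
    assert_eq!(hits[0].id, id);
    assert!(store.forget(id));
}
```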

Each memory entry has a type that determines how it’s treated during recall and consolidation:

| Type | Description |
| --- | --- |
| `Episodic` | Event-based memories (default). What happened, when, in what context. |
| `Semantic` | Factual knowledge. Consolidated from episodic memories. |
| `Reflection` | Meta-observations about patterns. Generated by the reflection system. |

When an agent searches memory, results are ranked using a multi-signal scoring system:

Keyword matching uses standard BM25 scoring with a 2x boost for keyword matches, so exact keyword hits rank highly.

Four weighted signals are combined:

  • Recency — more recent memories score higher
  • Importance — higher-importance memories score higher
  • Relevance — semantic relevance to the query
  • Strength — current memory strength after decay
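The combination can be sketched as a weighted sum. The equal weights below are an assumption for illustration — the actual weighting in Heartbit is not documented here:

```rust
/// Combine the four recall signals into one score.
/// All inputs are assumed normalized to [0, 1].
fn composite_score(recency: f32, importance: f32, relevance: f32, strength: f32) -> f32 {
    // Equal weights are an illustrative assumption, not Heartbit's values.
    const W: [f32; 4] = [0.25, 0.25, 0.25, 0.25];
    W[0] * recency + W[1] * importance + W[2] * relevance + W[3] * strength
}

fn main() {
    // A memory maxing out every signal gets the maximum score.
    assert!((composite_score(1.0, 1.0, 1.0, 1.0) - 1.0).abs() < 1e-6);
    // A stale, weak memory scores below a fresh, strong one at equal relevance.
    assert!(
        composite_score(0.1, 0.5, 0.5, 0.1) < composite_score(0.9, 0.5, 0.5, 0.9)
    );
}
```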

When embeddings are available, BM25 and vector cosine similarity scores are fused via Reciprocal Rank Fusion (RRF) for the best of both keyword and semantic search.
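RRF itself is simple: each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below uses the conventional `k = 60`; Heartbit's actual constant is an assumption here:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = Σ over lists of 1 / (k + rank(d)),
/// with 1-based ranks. Documents high in either list fuse to the top.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_default() += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let bm25 = vec!["a", "b", "c"];   // keyword ranking
    let vector = vec!["b", "a", "d"]; // semantic ranking
    let fused = rrf(&[bm25, vector], 60.0);
    // "a" and "b" each appear near the top of both lists, so one of them wins.
    assert!(fused[0].0 == "a" || fused[0].0 == "b");
    assert_eq!(fused.len(), 4);
}
```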

Memory strength decays over time following an exponential curve:

  • Decay rate: 0.005/hr (~6-day half-life)
  • Strength reinforced by +0.2 on each access, capped at 1.0
  • Memories that are accessed frequently stay strong; unused memories fade

This models natural forgetting — important memories that are revisited stay accessible, while irrelevant ones gradually disappear.
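The decay and reinforcement rules translate directly into code. This sketch uses only the numbers stated above (0.005/hr decay, +0.2 reinforcement, 1.0 cap); everything else is illustrative:

```rust
/// Exponential decay at 0.005/hour: strength(t) = s0 * exp(-0.005 * hours).
fn decayed_strength(initial: f64, hours_since_access: f64) -> f64 {
    initial * (-0.005 * hours_since_access).exp()
}

/// Each access reinforces strength by +0.2, capped at 1.0.
fn reinforce(strength: f64) -> f64 {
    (strength + 0.2).min(1.0)
}

fn main() {
    // Half-life check: ln(2) / 0.005 ≈ 138.6 hours ≈ 6 days.
    let half = decayed_strength(1.0, 138.63);
    assert!((half - 0.5).abs() < 0.01);
    // Reinforcement never exceeds the 1.0 cap.
    assert!((reinforce(0.95) - 1.0).abs() < 1e-9);
    assert!((reinforce(0.3) - 0.5).abs() < 1e-9);
}
```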

The ReflectionTracker monitors the cumulative importance of stored memories. When a threshold is exceeded, it triggers a reflection prompt that asks the LLM to identify patterns and generate Reflection-type memories.

Reflections are meta-cognitive — they help the agent recognize recurring themes, user preferences, and behavioral patterns.

Configure via reflection_threshold on AgentConfig or the HEARTBIT_REFLECTION_THRESHOLD env var.
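The threshold logic can be sketched as a simple accumulator. This is an illustrative model of the behavior described above, not the real `ReflectionTracker` API:

```rust
// Illustrative sketch; the actual ReflectionTracker API may differ.
struct ReflectionTracker {
    threshold: f32,
    cumulative_importance: f32,
}

impl ReflectionTracker {
    fn new(threshold: f32) -> Self {
        Self { threshold, cumulative_importance: 0.0 }
    }

    /// Record a stored memory's importance. Returns true when cumulative
    /// importance crosses the threshold (triggering a reflection prompt),
    /// then resets the accumulator.
    fn record(&mut self, importance: f32) -> bool {
        self.cumulative_importance += importance;
        if self.cumulative_importance >= self.threshold {
            self.cumulative_importance = 0.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut tracker = ReflectionTracker::new(50.0);
    let mut triggered = 0;
    for _ in 0..120 {
        if tracker.record(0.5) {
            triggered += 1;
        }
    }
    // 120 stores × 0.5 importance = 60 total, crossing the 50 threshold once.
    assert_eq!(triggered, 1);
}
```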

The ConsolidationPipeline reduces memory bloat by merging related entries:

  1. Clusters entries by Jaccard keyword similarity
  2. Merges each cluster into a single Semantic entry
  3. Replaces the original episodic entries with the consolidated semantic entry
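Step 1's similarity measure is standard Jaccard over keyword sets. The clustering threshold of 0.3 mentioned in the comment below is a hypothetical example, not a documented default:

```rust
use std::collections::HashSet;

/// Jaccard similarity over keyword sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &[&str], b: &[&str]) -> f64 {
    let sa: HashSet<&str> = a.iter().copied().collect();
    let sb: HashSet<&str> = b.iter().copied().collect();
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

fn main() {
    let e1 = ["rust", "memory", "agent"];
    let e2 = ["rust", "memory", "decay"];
    // 2 shared keywords out of 4 distinct → 0.5. Above a hypothetical
    // clustering threshold of 0.3, these two entries would merge.
    assert!((jaccard(&e1, &e2) - 0.5).abs() < 1e-9);
    assert!((jaccard(&e1, &e1) - 1.0).abs() < 1e-9);
}
```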

Trigger manually via the memory_consolidate tool or automatically at session end with consolidate_on_exit.

Weak memories are automatically cleaned up:

  • Session-end pruning — always runs when memory is present; removes entries below strength threshold with minimum age
  • Session pruning — `SessionPruneConfig` trims old tool results from the conversation before LLM calls, reducing input tokens
  • Pre-compaction flush — before context summarization, tool results are extracted to episodic memory so they aren’t lost
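The session-end rule — remove weak entries, but only once they have passed a minimum age — can be sketched as a filter. The thresholds below are illustrative, not Heartbit's defaults:

```rust
struct Entry {
    strength: f32,
    age_hours: f32,
}

/// Drop entries weaker than `min_strength`, but only if at least
/// `min_age_hours` old — young memories get a grace period.
fn prune(entries: Vec<Entry>, min_strength: f32, min_age_hours: f32) -> Vec<Entry> {
    entries
        .into_iter()
        .filter(|e| e.strength >= min_strength || e.age_hours < min_age_hours)
        .collect()
}

fn main() {
    let entries = vec![
        Entry { strength: 0.05, age_hours: 200.0 }, // weak and old  → pruned
        Entry { strength: 0.05, age_hours: 1.0 },   // weak but young → kept
        Entry { strength: 0.90, age_hours: 500.0 }, // strong         → kept
    ];
    let kept = prune(entries, 0.1, 24.0);
    assert_eq!(kept.len(), 2);
}
```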

Agents interact with memory through 5 tools:

| Tool | Description |
| --- | --- |
| `memory_store` | Store a new memory with content, type, importance, and keywords |
| `memory_recall` | Search memories by natural language query |
| `memory_update` | Update the content of an existing memory |
| `memory_forget` | Remove a memory by ID |
| `memory_consolidate` | Merge multiple memories into one (provide source IDs and new content) |

Embeddings enable hybrid retrieval (BM25 + vector cosine) for improved recall quality:

| Provider | Requirements | Dimension |
| --- | --- | --- |
| `NoopEmbedding` | None | — (BM25-only fallback) |
| `OpenAiEmbedding` | `OPENAI_API_KEY` | 1536 or 3072 |
| `LocalEmbeddingProvider` | `local-embedding` feature | 384 (MiniLM default) |

Local embeddings run entirely offline via ONNX Runtime (fastembed). Models are downloaded once on first use (~30MB). No API keys required.
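The vector side of hybrid retrieval compares embeddings by cosine similarity. A self-contained sketch (dimension-agnostic; real vectors would be 384-dimensional for the MiniLM default):

```rust
/// Cosine similarity between two embedding vectors:
/// dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Identical directions → 1.0; orthogonal directions → 0.0.
    assert!((cosine_similarity(&[1.0, 0.0], &[2.0, 0.0]) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6);
}
```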

Example configuration:

```toml
[memory]
type = "in_memory" # or "postgres"

[memory.embedding]
provider = "local" # "openai", "local", or "none"
model = "all-MiniLM-L6-v2"
cache_dir = "/tmp/fastembed"

# Agent-level memory settings
[[agents]]
name = "assistant"
reflection_threshold = 50
consolidate_on_exit = true
session_prune = { keep_recent_n = 2, pruned_tool_result_max_bytes = 200, preserve_task = true }
```