Semvec vs. Letta (MemGPT)¶
Letta (originally MemGPT) is an OSS framework that implements OS-style memory paging for LLM agents. This page compares the two on architecture only — we have not run a head-to-head benchmark against Letta. The differences below are structural, derived from each project's public documentation.
Architectural differences¶
| Property | Semvec | Letta |
|---|---|---|
| Memory model | Fixed-size 384-d state vector + 3 bounded memory tiers + verbatim literal cache | OS-style paging: in-context "core memory" blocks + external "archival memory" |
| Per-turn input footprint | Constant — compressed state (~150–350 tokens) | Sized by the in-context blocks Letta keeps loaded |
| What decides what's in context | Closed-form retrieval (cosine + tier weights + anchor / trigger boosts) | An LLM that calls memory-management tools to swap blocks in / out |
| Determinism on replay | Bit-exact across replays | Probabilistic (depends on the memory-management LLM) |
| Numeric / exact-value safety | Verbatim cache with Decimal precision (IBANs, amounts, IDs, dates) | Stored as text in core / archival blocks; no native exact-value layer |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Multi-agent supported; coordination is per-agent |
| Licensing | Proprietary (commercial) | Apache-2.0 |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS) |
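To make the exact-value row concrete, here is a minimal sketch of what a verbatim literal cache buys you. It illustrates the technique only; the helper name and key scheme are invented for the example and are not Semvec's actual API:
from decimal import Decimal

# Hypothetical literal cache (not Semvec's API): exact values are stored
# verbatim and never regenerated by an LLM, so they cannot drift.
literal_cache: dict[str, object] = {}

def remember_literal(key: str, raw: str) -> None:
    # Numeric literals are parsed with Decimal to avoid binary-float rounding;
    # everything else (IBANs, IDs, dates) is kept as the original string.
    is_number = raw.replace(".", "", 1).replace("-", "", 1).isdigit()
    literal_cache[key] = Decimal(raw) if is_number else raw

remember_literal("invoice_total", "10497.23")
remember_literal("iban", "DE89370400440532013000")
assert str(literal_cache["invoice_total"]) == "10497.23"  # round-trips exactly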
The fundamental split: Semvec is math-driven; Letta is LLM-driven. Semvec compresses every
turn through a closed-form EMA update and retrieves through deterministic cosine similarity.
Letta delegates memory management to an LLM that explicitly calls tools like core_memory_append
or archival_memory_search.
That choice has consequences:
- Reproducibility. Semvec replays bit-for-bit across runs (the closed-form sketch below shows why); Letta's behaviour depends on what the memory-management LLM decides on a given run.
- Cost predictability. Semvec ingest is free of LLM tokens; Letta's per-turn cost includes whatever memory-management calls the LLM emits.
- Audit trail. Semvec's append-only event store deterministically reconstructs the memory state at any point in time. Letta's archival operations are recorded but the in-context selection remains an LLM choice.
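To ground the "math-driven" claim, here is a minimal sketch of the kind of closed-form update and retrieval described above. The smoothing factor and function names are illustrative, not Semvec's actual internals:
import numpy as np

ALPHA = 0.15  # illustrative smoothing factor, not Semvec's real constant

def ema_update(state: np.ndarray, embedding: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    # Closed-form exponential moving average: the new state is a fixed blend of
    # the old state and the incoming turn embedding. No LLM call is involved,
    # and the same inputs always yield the same state, which is what makes
    # replay bit-exact.
    return (1.0 - alpha) * state + alpha * embedding

def rank_memories(query: np.ndarray, memories: list[np.ndarray]) -> list[int]:
    # Deterministic retrieval: score each stored embedding by cosine similarity
    # to the query and sort. No sampling, no model in the loop.
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return sorted(range(len(memories)), key=lambda i: cosine(query, memories[i]), reverse=True)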
Code shape comparison¶
Semvec — closed-form ingest, deterministic retrieval:
from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))  # the fixed-size 384-d state vector from the table above

for text, embedding in conversation:
    state.update(embedding, text)  # closed-form update, zero LLM calls

# embed() is your own embedding function; retrieval stays deterministic
top = state.memory.get_relevant_memories(
    embed("what did we decide about auth?"),
    top_k=3,
)
Letta — LLM-managed memory tools, swap in / out of context:
# pseudo-code shape; see Letta docs for the live API
from letta_client import Letta
client = Letta(token="...")
agent = client.agents.create(memory_blocks=[...])
# Letta's LLM decides to call core_memory_append, archival_memory_insert, ... per turn
client.agents.messages.create(agent_id=agent.id, messages=[...])
The user-facing surfaces differ in kind: Semvec hands you a compressed context as a string that you paste into any LLM call (sketched below), whereas Letta hosts the agent loop and runs the LLM on your behalf.
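A minimal sketch of that composition, continuing the Semvec snippet above. How the retrieved memories stringify and the OpenAI client and model used here are assumptions for illustration, not part of either project's documentation:
from openai import OpenAI

# Render the retrieved memories (the `top` result from the Semvec snippet)
# into a plain string; how those objects stringify is an assumption here.
context = "\n".join(str(m) for m in top)

llm = OpenAI()  # any provider's client works; Semvec only hands you a string
reply = llm.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": f"Relevant memory:\n{context}"},
        {"role": "user", "content": "What did we decide about auth?"},
    ],
)
print(reply.choices[0].message.content)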
When to pick which¶
Pick Semvec when:
- per-turn input cost must be O(1) — fixed system-prompt budget,
- ingest must be free of LLM cost and deterministic across replays,
- exact numeric / fact preservation matters (regulated workloads, audit, financial values),
- you want a memory layer you can compose with any LLM client, not a hosted agent runtime,
- you need an append-only event store with deterministic replay and signed deletion certificates.
Pick Letta when:
- you want an OSS-licensed, batteries-included agent runtime with memory built in,
- LLM-driven adaptive memory paging is the right shape for your problem,
- you're happy to run the LLM through the Letta server and don't need closed-form recall.
Sources¶
- Letta (formerly MemGPT): https://github.com/letta-ai/letta
- MemGPT paper (Packer et al., 2023): https://arxiv.org/abs/2310.08560