Semvec vs. Letta (MemGPT)

Letta (originally MemGPT) is an OSS framework that implements OS-style memory paging for LLM agents. This page compares the two on architecture only — we have not run a head-to-head benchmark against Letta. The differences below are structural, derived from each project's public documentation.

Architectural differences

| Property | Semvec | Letta |
| --- | --- | --- |
| Memory model | Fixed-size 384-d state vector + 3 bounded memory tiers + verbatim literal cache | OS-style paging: in-context "core memory" blocks + external "archival memory" |
| Per-turn input footprint | Constant — compressed state (~150–350 tokens) | Sized by the in-context blocks Letta keeps loaded |
| What decides what's in context | Closed-form retrieval (cosine + tier weights + anchor / trigger boosts) | An LLM that calls memory-management tools to swap blocks in / out |
| Determinism on replay | Bit-exact across replays | Probabilistic (depends on the memory-management LLM) |
| Numeric / exact-value safety | Verbatim cache with Decimal precision (IBANs, amounts, IDs, dates) | Stored as text in core / archival blocks; no native exact-value layer |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Multi-agent supported; coordination is per-agent |
| Licensing | Proprietary (commercial) | Apache-2.0 |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS) |
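The "verbatim cache" row deserves a concrete illustration. The sketch below is hypothetical (Semvec's internal structure is not public); it only shows the idea that exact values are stored as `Decimal` or raw strings, never re-encoded through an embedding or an LLM paraphrase:

```python
from decimal import Decimal

# Hypothetical sketch of a verbatim literal cache. Amounts become Decimal,
# identifiers stay byte-for-byte strings, so nothing passes through lossy
# float arithmetic or model generation.
cache = {}

def remember_literal(key: str, raw: str) -> None:
    """Store an exact value verbatim; numeric strings use Decimal."""
    cache[key] = Decimal(raw) if raw.replace(".", "", 1).isdigit() else raw

remember_literal("invoice_amount", "1049.95")
remember_literal("iban", "DE89370400440532013000")

assert cache["invoice_amount"] == Decimal("1049.95")   # no float drift
assert cache["iban"] == "DE89370400440532013000"       # byte-for-byte
```

The contrast with Letta in this row is that Letta stores such values as free text inside memory blocks, so exactness depends on the LLM reproducing them faithfully.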

The fundamental split: Semvec is math-driven; Letta is LLM-driven. Semvec compresses every turn through a closed-form EMA update and retrieves through deterministic cosine similarity. Letta delegates memory management to an LLM that explicitly calls tools like core_memory_append or archival_memory_search.
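The "math-driven" half of that split can be sketched in a few lines. This is an illustrative pattern, not Semvec's actual formula: the state is an exponential moving average of turn embeddings, and retrieval is plain cosine similarity, with no LLM call in either path (`ALPHA` is an assumed smoothing factor):

```python
import math

ALPHA = 0.2  # assumed smoothing factor, for illustration only

def ema_update(state, embedding):
    """Closed-form state update: a pure function of (state, embedding)."""
    return [(1 - ALPHA) * s + ALPHA * e for s, e in zip(state, embedding)]

def cosine(a, b):
    """Deterministic retrieval score: plain cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

state = [0.0, 0.0, 0.0]
for emb in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    state = ema_update(state, emb)

# Same inputs, same state, every time — this is the determinism claim.
replayed = ema_update(ema_update([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]), [0.0, 1.0, 0.0])
assert state == replayed
```

Because both functions are pure, replaying the same conversation reproduces the same state bit-for-bit, which is exactly what an LLM-in-the-loop design cannot guarantee.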

That choice has consequences:

  • Reproducibility. Semvec replays bit-for-bit across runs; Letta's behaviour depends on whatever the memory-management LLM decides this time.
  • Cost predictability. Semvec ingest is free of LLM tokens; Letta's per-turn cost includes whatever memory-management calls the LLM emits.
  • Audit trail. Semvec's append-only event store deterministically reconstructs the memory state at any point in time. Letta's archival operations are recorded but the in-context selection remains an LLM choice.
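The audit-trail point boils down to replay being a pure fold over an append-only log. The event schema below is invented for illustration (Semvec's actual schema is not public); the sketch only demonstrates the property that replaying identical events yields a bit-identical result:

```python
import hashlib
import json

# Hypothetical append-only event log (schema invented for this sketch).
events = [
    {"turn": 1, "text": "we chose OAuth for auth"},
    {"turn": 2, "text": "budget fixed at 1049.95"},
]

def replay(log):
    """Fold the log into a state fingerprint: no clocks, randomness, or LLM calls."""
    state = hashlib.sha256()
    for ev in log:
        state.update(json.dumps(ev, sort_keys=True).encode())
    return state.hexdigest()

assert replay(events) == replay(events)  # bit-exact across replays
```

With an LLM-managed memory, the log can record which tools were called, but re-running the agent may call different tools, so the log describes one run rather than determining every run.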

Code shape comparison

Semvec — closed-form ingest, deterministic retrieval:

from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))  # matches the fixed 384-d state vector

for text, embedding in conversation:
    state.update(embedding, text)            # zero LLM calls

top = state.memory.get_relevant_memories(
    embed("what did we decide about auth?"),
    top_k=3,
)

Letta — LLM-managed memory tools, swap in / out of context:

# pseudo-code shape; see Letta docs for the live API
from letta_client import Letta

client = Letta(token="...")
agent = client.agents.create(memory_blocks=[...])

# Letta's LLM decides to call core_memory_append, archival_memory_insert, ... per turn
client.agents.messages.create(agent_id=agent.id, messages=[...])

The user-facing surface differs in spirit: Semvec hands you a compressed context as a string that you paste into any LLM call, whereas Letta hosts the agent loop and runs the LLM on your behalf.
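That composition pattern can be sketched directly. Note the method name `render_context()` and the message shape are assumptions for illustration; the point is only that Semvec's output is a string you can feed to any chat-completion client:

```python
# Hypothetical glue code: Semvec yields a compressed context string,
# which is prepended to an ordinary chat call against any LLM provider.
def build_messages(compressed_context: str, user_turn: str):
    return [
        {"role": "system", "content": f"Conversation memory:\n{compressed_context}"},
        {"role": "user", "content": user_turn},
    ]

# e.g. compressed = state.render_context()  # assumed accessor name
msgs = build_messages("~300-token compressed state here",
                      "What did we decide about auth?")
assert msgs[0]["role"] == "system"
```

Letta inverts this: you send messages to the Letta server, and it runs the model and the memory tools on its side of the API.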

When to pick which

Pick Semvec when:

  • per-turn input cost must be O(1) — fixed system-prompt budget,
  • ingest must be free of LLM cost and deterministic across replays,
  • exact numeric / fact preservation matters (regulated workloads, audit, financial values),
  • you want a memory layer you can compose with any LLM client, not a hosted agent runtime,
  • you need an append-only event store with deterministic replay and signed deletion certificates.

Pick Letta when:

  • you want an OSS-licensed, batteries-included agent runtime with memory built in,
  • LLM-driven adaptive memory paging is the right shape for your problem,
  • you're happy to run the LLM through the Letta server and don't need closed-form recall.

Sources

  • Letta (formerly MemGPT): https://github.com/letta-ai/letta
  • MemGPT paper (Packer et al., 2023): https://arxiv.org/abs/2310.08560