Semvec vs. Letta (MemGPT)

Letta (originally MemGPT) is an OSS framework that implements OS-style memory paging for LLM agents. This page compares the two on architecture only — we have not run a head-to-head benchmark against Letta. The differences below are structural, derived from each project's public documentation.

Architectural differences

| Property | Semvec | Letta |
| --- | --- | --- |
| Memory model | Fixed-size 384-d state vector + 3 bounded memory tiers + verbatim literal cache | OS-style paging: in-context "core memory" blocks + external "archival memory" |
| Per-turn input footprint | Constant — compressed state (~150–350 tokens) | Sized by the in-context blocks Letta keeps loaded |
| What decides what's in context | Closed-form retrieval (cosine + tier weights + anchor / trigger boosts) | An LLM that calls memory-management tools to swap blocks in / out |
| Determinism on replay | Bit-exact across replays | Probabilistic (depends on the memory-management LLM) |
| Numeric / exact-value safety | Verbatim cache with Decimal precision (IBANs, amounts, IDs, dates) | Stored as text in core / archival blocks; no native exact-value layer |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Multi-agent supported; coordination is per-agent |
| Licensing | Proprietary (commercial) | Apache-2.0 |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS) |
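The "verbatim cache" row deserves a concrete illustration. The sketch below is hypothetical (Semvec's internal structure is not public); it only shows the idea that exact values are stored as `Decimal` or raw strings, never re-encoded through an embedding or an LLM paraphrase:

```python
from decimal import Decimal

# Hypothetical sketch of a verbatim literal cache. Amounts become Decimal,
# identifiers stay byte-for-byte strings, so nothing passes through lossy
# float arithmetic or model generation.
cache = {}

def remember_literal(key: str, raw: str) -> None:
    """Store an exact value verbatim; numeric strings use Decimal."""
    cache[key] = Decimal(raw) if raw.replace(".", "", 1).isdigit() else raw

remember_literal("invoice_amount", "1049.95")
remember_literal("iban", "DE89370400440532013000")

assert cache["invoice_amount"] == Decimal("1049.95")   # no float drift
assert cache["iban"] == "DE89370400440532013000"       # byte-for-byte
```

The contrast with Letta in this row is that Letta stores such values as free text inside memory blocks, so exactness depends on the LLM reproducing them faithfully.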

The fundamental split: Semvec is math-driven; Letta is LLM-driven. Semvec compresses every turn through a closed-form EMA update and retrieves through deterministic cosine similarity. Letta delegates memory management to an LLM that explicitly calls tools like core_memory_append or archival_memory_search.
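The "math-driven" half of that split can be sketched in a few lines. This is an illustrative pattern, not Semvec's actual formula: the state is an exponential moving average of turn embeddings, and retrieval is plain cosine similarity, with no LLM call in either path (`ALPHA` is an assumed smoothing factor):

```python
import math

ALPHA = 0.2  # assumed smoothing factor, for illustration only

def ema_update(state, embedding):
    """Closed-form state update: a pure function of (state, embedding)."""
    return [(1 - ALPHA) * s + ALPHA * e for s, e in zip(state, embedding)]

def cosine(a, b):
    """Deterministic retrieval score: plain cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

state = [0.0, 0.0, 0.0]
for emb in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    state = ema_update(state, emb)

# Same inputs, same state, every time — this is the determinism claim.
replayed = ema_update(ema_update([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]), [0.0, 1.0, 0.0])
assert state == replayed
```

Because both functions are pure, replaying the same conversation reproduces the same state bit-for-bit, which is exactly what an LLM-in-the-loop design cannot guarantee.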

That choice has consequences:

  • Reproducibility. Semvec replays bit-for-bit across runs; Letta's behaviour depends on whatever the memory-management LLM decides this time.
  • Cost predictability. Semvec ingest is free of LLM tokens; Letta's per-turn cost includes whatever memory-management calls the LLM emits.
  • Audit trail. Semvec's append-only event store deterministically reconstructs the memory state at any point in time. Letta's archival operations are recorded but the in-context selection remains an LLM choice.
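The audit-trail point boils down to replay being a pure fold over an append-only log. The event schema below is invented for illustration (Semvec's actual schema is not public); the sketch only demonstrates the property that replaying identical events yields a bit-identical result:

```python
import hashlib
import json

# Hypothetical append-only event log (schema invented for this sketch).
events = [
    {"turn": 1, "text": "we chose OAuth for auth"},
    {"turn": 2, "text": "budget fixed at 1049.95"},
]

def replay(log):
    """Fold the log into a state fingerprint: no clocks, randomness, or LLM calls."""
    state = hashlib.sha256()
    for ev in log:
        state.update(json.dumps(ev, sort_keys=True).encode())
    return state.hexdigest()

assert replay(events) == replay(events)  # bit-exact across replays
```

With an LLM-managed memory, the log can record which tools were called, but re-running the agent may call different tools, so the log describes one run rather than determining every run.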

Code shape comparison

Semvec — closed-form ingest, deterministic retrieval:

from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))  # matches the fixed 384-d state vector

for text, embedding in conversation:
    state.update(embedding, text)            # zero LLM calls

top = state.memory.get_relevant_memories(
    embed("what did we decide about auth?"),
    top_k=3,
)

Letta — LLM-managed memory tools, swap in / out of context:

# pseudo-code shape; see Letta docs for the live API
from letta_client import Letta

client = Letta(token="...")
agent = client.agents.create(memory_blocks=[...])

# Letta's LLM decides to call core_memory_append, archival_memory_insert, ... per turn
client.agents.messages.create(agent_id=agent.id, messages=[...])

The user-facing surface differs in spirit: Semvec hands you a compressed context as a string that you paste into any LLM call, whereas Letta hosts the agent loop and runs the LLM on your behalf.
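That composition pattern can be sketched directly. Note the method name `render_context()` and the message shape are assumptions for illustration; the point is only that Semvec's output is a string you can feed to any chat-completion client:

```python
# Hypothetical glue code: Semvec yields a compressed context string,
# which is prepended to an ordinary chat call against any LLM provider.
def build_messages(compressed_context: str, user_turn: str):
    return [
        {"role": "system", "content": f"Conversation memory:\n{compressed_context}"},
        {"role": "user", "content": user_turn},
    ]

# e.g. compressed = state.render_context()  # assumed accessor name
msgs = build_messages("~300-token compressed state here",
                      "What did we decide about auth?")
assert msgs[0]["role"] == "system"
```

Letta inverts this: you send messages to the Letta server, and it runs the model and the memory tools on its side of the API.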

When to pick which

Pick Semvec when:

  • per-turn input cost must be O(1) — fixed system-prompt budget,
  • ingest must be free of LLM cost and deterministic across replays,
  • exact numeric / fact preservation matters (regulated workloads, audit, financial values),
  • you want a memory layer you can compose with any LLM client, not a hosted agent runtime,
  • you need an append-only event store with deterministic replay and signed deletion certificates.

Pick Letta when:

  • you want an OSS-licensed, batteries-included agent runtime with memory built in,
  • LLM-driven adaptive memory paging is the right shape for your problem,
  • you're happy to run the LLM through the Letta server and don't need closed-form recall.

Sources

  • Letta (formerly MemGPT): https://github.com/letta-ai/letta
  • MemGPT paper (Packer et al., 2023): https://arxiv.org/abs/2310.08560