Semvec vs. mem0

mem0 is the closest commercial peer for "agent memory": both projects sit between an LLM and a vector store and manage long-running conversation context. Both report results on the LOCOMO benchmark (Maharana et al., 2024).

Architectural differences

| Property | Semvec | mem0 |
|---|---|---|
| Ingest LLM calls per turn | 0 (in-process deterministic update, no LLM) | LLM-driven fact extraction (one call per add() to extract atomic facts) |
| Storage form | Raw turns + cosine embedding + per-session BM25 index | Atomic facts extracted by an LLM, stored verbatim |
| Retrieval | Dense cosine + BM25 hybrid fusion, then cross-encoder rerank | Dense + sparse fusion over the fact store |
| Token-cost behaviour | Constant per turn (no LLM ingest, no growing summary) | Linear in conversation length × extracted-fact density |
| Determinism | Deterministic update; bit-identical replay possible within a release (no LLM stochasticity at ingest) | Each add() is an LLM call; reproducibility requires temperature=0 |
| Default LLM dependency | None for ingest; LLM only for answering | Required end-to-end |
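
The Retrieval row above can be illustrated with a toy hybrid ranker. The sketch below is not Semvec's implementation. It assumes reciprocal rank fusion (RRF) as the fusion rule, since the table does not say which rule is used. It substitutes bag-of-words vectors for learned embeddings and leaves out the cross-encoder rerank stage. All function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def cosine_scores(query, docs):
    """Cosine similarity over bag-of-words counts (stand-in for dense embeddings)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for doc in docs:
        d = Counter(tokenize(doc))
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(q[t] * d[t] for t in q)
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 with the commonly used smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

def rrf_fuse(score_lists, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + rank)."""
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            fused[idx] += 1.0 / (k + rank)
    return fused

def hybrid_search(query, docs, top_k=3):
    fused = rrf_fuse([cosine_scores(query, docs), bm25_scores(query, docs)])
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [docs[i] for i in order[:top_k]]

turns = [
    "Alice adopted a dog named Biscuit in March",
    "Bob is training for a marathon",
    "Alice took Biscuit to the vet last week",
]
print(hybrid_search("alice dog name", turns, top_k=2))
```

RRF is used here only because it fuses rankings without needing the two score scales to be comparable. A weighted score sum would be an equally plausible reading of "hybrid fusion".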

Head-to-head — LOCOMO (10 conversations, 1986 QAs)

LOCOMO is the standard suite both projects publish against. The Semvec run uses the same dataset version (snap-research/locomo v1), the same reader and judge model (gpt-4o-mini, T = 0), and the same judge prompt (byte-identical to mem0ai/mem0/evaluation/metrics/llm_judge.py) as the mem0 evaluation.

LLM-as-Judge accuracy (Cat 1-4, mem0 headline metric)

| Category | n | Semvec | Mem0 (paper) |
|---|---|---|---|
| single-hop | 282 | 0.582 | 0.671 |
| multi-hop | 321 | 0.502 | 0.512 |
| temporal | 96 | 0.469 | 0.555 |
| open-domain | 841 | 0.667 | 0.729 |
| OVERALL J | 1540 | 0.605 | 0.669 |

Stemmed-F1 (all 5 cats, official LOCOMO scoring)

| Category | n | Semvec |
|---|---|---|
| single-hop | 282 | 0.366 |
| multi-hop | 321 | 0.430 |
| temporal | 96 | 0.264 |
| open-domain | 841 | 0.497 |
| adversarial | 446 | 0.352 |
| OVERALL F1 | 1986 | 0.424 |
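
As a sanity check, the OVERALL rows in the Semvec columns of both tables are consistent with a question-count-weighted mean of the per-category scores. That the overall figure is computed this way is an assumption, not something the tables state. The helper name `weighted_mean` is illustrative only.

```python
# Per-category (n, score) pairs copied from the Semvec columns above.
judge_rows = {
    "single-hop": (282, 0.582),
    "multi-hop": (321, 0.502),
    "temporal": (96, 0.469),
    "open-domain": (841, 0.667),
}
f1_rows = {
    "single-hop": (282, 0.366),
    "multi-hop": (321, 0.430),
    "temporal": (96, 0.264),
    "open-domain": (841, 0.497),
    "adversarial": (446, 0.352),
}

def weighted_mean(rows):
    """Mean of category scores, weighted by the number of questions."""
    total_n = sum(n for n, _ in rows.values())
    return sum(n * score for n, score in rows.values()) / total_n

overall_j = weighted_mean(judge_rows)
overall_f1 = weighted_mean(f1_rows)
print(f"overall J  = {overall_j:.3f}")   # 0.605
print(f"overall F1 = {overall_f1:.3f}")  # 0.424
```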

Cost asymmetry, measured live in a head-to-head run. mem0's edge on the LLM-as-Judge score (J) comes structurally from its fact-extraction pipeline: an extra LLM pass on every add() that condenses raw turns into atomic facts before storage. Semvec runs zero LLM calls at ingest: every turn is embedded and indexed in-process, with no generative pass. Of the ten LOCOMO contenders, Semvec is the only dedicated memory system in that cost class; the others (LangMem, Zep, A-Mem, MemoryBank, Letta/MemGPT, plus mem0 / mem0-graph) all run one or more generative LLM passes per stored turn.

| Stage | Semvec | mem0 (real, head-to-head) |
|---|---|---|
| Replay 675 turns (LOCOMO conv-44 ingest) | ~3 min | ~24.5 min (~8× slower) |
| QA pass, 158 questions | ~2 min | ~3.5 min |
| End-to-end | ~5 min | ~28 min (~5.5× slower) |
| LLM calls per turn at ingest | 0 | 1+ |

Extrapolated to the full 1986-QA suite: Semvec finishes in ~95 minutes, mem0 would take ~6–8 hours.
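
The slowdown factors for conv-44 follow directly from the stage timings. The short calculation below reproduces them from the rounded minute values in the table, so the results agree with the quoted factors only to within rounding. It also shows that ingest dominates mem0's wall-clock time. The variable names are illustrative.

```python
# Rounded stage timings in minutes, copied from the conv-44 table above.
semvec = {"ingest": 3.0, "qa": 2.0}
mem0 = {"ingest": 24.5, "qa": 3.5}
turns_replayed = 675

semvec_total = semvec["ingest"] + semvec["qa"]   # 5.0 min
mem0_total = mem0["ingest"] + mem0["qa"]         # 28.0 min

ingest_ratio = mem0["ingest"] / semvec["ingest"]
end_to_end_ratio = mem0_total / semvec_total
mem0_ingest_share = mem0["ingest"] / mem0_total

# "1+" LLM calls per turn gives a lower bound on mem0's ingest calls.
min_mem0_ingest_calls = turns_replayed * 1

print(f"ingest slowdown:     {ingest_ratio:.1f}x")       # 8.2x
print(f"end-to-end slowdown: {end_to_end_ratio:.1f}x")   # 5.6x
print(f"mem0 time in ingest: {mem0_ingest_share:.1%}")   # 87.5%
print(f"mem0 ingest LLM calls: at least {min_mem0_ingest_calls}")
```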

Token efficiency

The figures below were measured live on a LOCOMO replay (mean across 10 queries on conv-44 with 100 turns seeded). Semvec's context block typically uses ~8.3k chars (~2k tokens), well below its 20k-char budget ceiling, because the top-K=30 reranked memory chunks rarely fill it.

| Setup | Context tokens per reader call |
|---|---|
| Full-context replay (avg LOCOMO conv, 544 turns) | ~16,300 |
| Full-context replay (large conv, 689 turns) | ~20,700 |
| Mem0 (typical) | ~2,000–5,000 |
| Semvec (measured) | ~2,000 |

Both memory systems target the same context-budget problem; Semvec is leaner at ingest (no LLM round-trips), competitive at retrieval, and substantially leaner than full-context replay.
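
The interaction between the top-K limit and the character budget can be sketched as follows. This is a minimal illustration, not Semvec's code: the function names are hypothetical, the 4-characters-per-token conversion is a common rule of thumb for English text rather than a tokenizer measurement, and stopping at the first chunk that would overflow the budget is an assumed policy.

```python
def build_context(ranked_chunks, max_chars=20_000, top_k=30):
    """Join the best-ranked chunks until top_k or the char budget is reached."""
    selected = []
    used = 0
    for chunk in ranked_chunks[:top_k]:
        cost = len(chunk) + (1 if selected else 0)  # +1 for the newline separator
        if used + cost > max_chars:
            break  # assumed policy: stop at the first chunk that would overflow
        selected.append(chunk)
        used += cost
    return "\n".join(selected)

def approx_tokens(text, chars_per_token=4.0):
    """Rough token estimate; a real count needs the reader model's tokenizer."""
    return round(len(text) / chars_per_token)

# 40 synthetic chunks of 276 chars each: top_k caps the block at 30 chunks.
chunks = ["x" * 276 for _ in range(40)]
context = build_context(chunks)
print(len(context), approx_tokens(context))  # 8309 2077
```

With chunks of this size the top-K limit binds long before the 20k-char ceiling, which matches the observation above that the budget is rarely filled.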

Reproduce

Install both stacks:

pip install "semvec[benchmarks,hybrid,api,mem0]"

Run the LOCOMO bench against Semvec with the same judge mem0 uses:

.venv/bin/python -u benchmarks/run_locomo.py --conv-id -1 --judge \
  --judge-model openai/gpt-4o-mini \
  -o "benchmarks/results/locomo_FULL_$(date +%Y%m%d_%H%M%S).json"

The mem0 SDK is installed via the [mem0] extra so you can wire it up as a side-by-side baseline in your own harness if needed.
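
A side-by-side harness could be organised around a small shared interface, as in the sketch below. Everything here is hypothetical: `MemoryBackend`, `KeywordBackend`, and `run_baseline` are illustrative names, and the toy backend only stands in for real adapters. An adapter for Semvec or for the mem0 SDK would implement the same two methods by wrapping the respective client, whose actual call signatures are not shown here.

```python
import time
from typing import Protocol

class MemoryBackend(Protocol):
    """Minimal interface a harness adapter would expose (hypothetical)."""
    def add(self, text: str) -> None: ...
    def search(self, query: str, top_k: int) -> list[str]: ...

class KeywordBackend:
    """Toy stand-in backend: ranks stored turns by word overlap with the query."""
    def __init__(self):
        self.turns = []

    def add(self, text):
        self.turns.append(text)

    def search(self, query, top_k):
        q = set(query.lower().split())
        ranked = sorted(
            self.turns,
            key=lambda t: len(q & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

def run_baseline(backend, turns, questions, top_k=5):
    """Time the ingest and retrieval stages separately for one backend."""
    start = time.perf_counter()
    for turn in turns:
        backend.add(turn)
    ingest_s = time.perf_counter() - start

    start = time.perf_counter()
    contexts = {q: backend.search(q, top_k) for q in questions}
    retrieve_s = time.perf_counter() - start
    return {"ingest_s": ingest_s, "retrieve_s": retrieve_s, "contexts": contexts}

result = run_baseline(
    KeywordBackend(),
    turns=["Alice adopted a dog", "Bob runs marathons"],
    questions=["who adopted a dog"],
    top_k=1,
)
print(result["contexts"])
```

Timing ingest and retrieval separately keeps the comparison aligned with the stage breakdown reported above.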

When to pick which

Pick Semvec when:

  • You can't afford an LLM call on every ingest turn (cost or latency).
  • You need deterministic, replayable memory state (audit, compliance).
  • You want adversarial-question discipline (Cat 5 adversarial F1 = 0.352 out of the box, per the Stemmed-F1 table above).
  • You're embedding into an existing Python stack (Rust core + thin Python API, no managed service required).
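
The replayability point above can be audited by fingerprinting the memory state after a replay and comparing fingerprints across runs. The sketch below shows the idea with a toy state. It does not use Semvec's state format, which is not described here; `embed`, `replay`, and `state_digest` are hypothetical helpers, and the hash-derived "embedding" is a placeholder for a real deterministic encoder.

```python
import hashlib
import json

def embed(text, dim=8):
    """Toy deterministic vector derived from SHA-256 bytes (not a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]

def replay(turns):
    """Deterministic ingest: no sampling, no clock, no external calls."""
    state = {"turns": [], "vectors": []}
    for turn in turns:
        state["turns"].append(turn)
        state["vectors"].append(embed(turn))
    return state

def state_digest(state):
    """SHA-256 over a canonical JSON serialisation of the state."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

turns = ["Alice adopted a dog", "Bob runs marathons"]
first = state_digest(replay(turns))
second = state_digest(replay(turns))
print(first == second)  # True: identical input yields identical state
```

An ingest path that calls an LLM would generally fail this check unless sampling is fully pinned, which is the contrast drawn in the Determinism row of the architecture table.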

Pick mem0 when:

  • You want a managed cloud service end-to-end.
  • Fact-extraction granularity at ingest is more important than per-turn cost.
  • You're already on the mem0 stack and the ingest LLM cost is in budget.

Sources

  • LOCOMO (Maharana et al., 2024): https://snap-research.github.io/locomo/
  • Mem0 paper: https://arxiv.org/html/2504.19413v1
  • Reproducing the Semvec number: see Running benchmarks for the exact env-var config and gpt-4o-mini reader + judge settings.