FAQ

What is Semvec, in one sentence?

A constant-cost semantic memory layer for LLM agents: a fixed-size 384-dimensional state vector plus a tiered, content-aware memory, so per-turn LLM input cost stays flat regardless of conversation length.

Is this RAG?

Not in the usual sense. RAG retrieves documents at query time. Semvec compresses the conversation itself into a fixed-size state. They compose well — many users run Semvec for conversational signal and a vector DB for document retrieval.

When should I pick Semvec over mem0?

Pick Semvec when:

  • you want per-turn input cost to be O(1) — a fixed 150–350-token block — instead of growing with the number of retrieved records,
  • you cannot afford LLM calls at ingest time (mem0 issues roughly 50 LLM fact-extraction calls per turn; Semvec uses zero),
  • you need numeric / IBAN / amount / date values to round-trip with Decimal precision,
  • you need an append-only event store with deterministic replay and signed deletion certificates.

Pick mem0 when you want an OSS-licensed turnkey memory layer with an established Python / TypeScript API and you are comfortable with LLM-driven fact extraction.

We measured Semvec vs. mem0 head-to-head on LongMemEval-S (gpt-oss-120b on H100, T = 0.0): 42.8 % vs. 36.2 % accuracy (+6.6 pp, McNemar p = 0.020) at 17× lower wall-clock time. Strongest deltas on single-session-assistant (+34 pp) and temporal-reasoning (+10.6 pp). See vs mem0 for details.

When should I pick Semvec over Letta (MemGPT)?

Letta implements OS-style memory paging — an LLM decides what to keep in-context vs. swap to an external archival store. Semvec recall is a closed-form function of the state vector and the literal cache, so it is deterministic and reproducible bit-for-bit across replays.

Pick Semvec when you need deterministic recall, exact-value preservation, or constant-cost prompt budgeting. Pick Letta when you want LLM-driven adaptive paging out of the box. We have not run a head-to-head benchmark against Letta. See vs Letta for the full architectural comparison.

When should I pick Semvec over LangChain Memory?

LangChain ships several Memory classes (buffer, summary, vector retriever) — each with different ingest and retrieval semantics and different per-turn footprint. Semvec is O(1) by construction across all use cases and ships a verbatim numeric / fact cache that LangChain does not address at the memory-layer level. Pick LangChain Memory when you want the broader LCEL / chain ecosystem and are happy to compose memory yourself. See vs LangChain Memory.

Does the state ever grow?

No. The state vector itself is fixed-size (384 d by default, configurable). The associated memory tiers are bounded by configured capacities — when full, the lowest-scoring entry is evicted (not the oldest). Configure with SemvecConfig(short_term_size=…, medium_term_size=…, long_term_size=…).
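
The score-based eviction can be pictured with a toy bounded tier. This is an illustrative sketch only, not the Semvec API: the class, its field names, and the scoring are assumptions.

```python
import heapq

class BoundedTier:
    """Toy bounded memory tier: when full, evict the lowest-scoring
    entry, not the oldest. Illustrative sketch, not the Semvec API."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []  # min-heap of (score, text)

    def add(self, score: float, text: str):
        if len(self.entries) < self.capacity:
            heapq.heappush(self.entries, (score, text))
        elif score > self.entries[0][0]:
            # replace the current lowest-scoring entry
            heapq.heapreplace(self.entries, (score, text))
        # else: the new entry scores below everything kept and is dropped

tier = BoundedTier(capacity=2)
tier.add(0.9, "IBAN mentioned")
tier.add(0.1, "small talk")
tier.add(0.5, "project deadline")   # evicts the 0.1 entry, not the oldest
print(sorted(s for s, _ in tier.entries))  # [0.5, 0.9]
```

Note that the 0.1 entry is evicted even though the 0.9 entry is older: capacity pressure removes the least valuable record, not the least recent one.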

Can I run it offline / air-gapped?

Yes. The Community tier needs no license key at all. Pro / Enterprise tiers verify Ed25519 JWT signatures locally, so there is no network call to a license server at runtime. Contact support@versino.de for offline-issued JWTs with custom TTLs. The optional one-time anonymous init ping is opt-out via SEMVEC_TELEMETRY=0.

How fast is it?

Per-turn update() is sub-millisecond on a recent x86_64 CPU at dimension 384 — dominated by NumPy / Rust matrix ops, not Python overhead. The whole point of the Rust port was to keep the math out of the GIL. End-to-end including embedding cost is dominated by the embedder, not Semvec itself.

Is the source available?

Compiled wheels are public on PyPI; the Rust source is held closed. Source access is part of Enterprise terms — contact support@versino.de.

What's the patent situation?

EP 25 188 105.8 — patent-pending, novelty acknowledged. The proprietary license you accept when running Semvec covers the patent for licensed use. Self-implementing the same architecture is outside the scope of this documentation.

GPU support?

Embedders run on whatever device you configure (cuda, mps, cpu); the Semvec core itself is CPU-only — the math is small enough that GPU offload would lose more in transfer than it gains.

Which embedder should I use?

For English-only, tight-domain prototypes: all-MiniLM-L6-v2 (384 d). Fast, small, OK on narrow domains.

For German, multilingual, or mixed-domain content: paraphrase-multilingual-mpnet-base-v2 (768 d) or larger. The 384-d default confuses generic terms (e.g. "filter" → coffee filter vs. data filter) on multilingual or mixed-domain text. We measured precision@3 rising from 66.67 % (MiniLM, 384 d) to 86.11 % (mpnet, 768 d) on 80 mixed-domain German notes.

See the embedders guide for ready-made wrappers and the concepts page for the protocol.

Can I use my own embedder?

Yes. Any object exposing

embedder.get_embedding(text: str) -> np.ndarray   # shape (dimension,), preferably L2-normalised
embedder.get_dimension() -> int                   # must match SemvecConfig.dimension

works — SentenceTransformers, OpenAI, ONNX int8, or anything you write. Semvec refuses silent hash-based fallbacks: methods that need an embedder raise a descriptive RuntimeError if you do not pass one.
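
A minimal adapter satisfying this protocol might look like the following. The wrapper shape follows the two-method protocol above; the `ConstantModel` stand-in and the `encode` method name are illustrative assumptions — plug in your real model.

```python
import numpy as np

class MyEmbedder:
    """Minimal adapter for the Semvec embedder protocol.
    `model` is any object with an encode(text) -> sequence-of-floats
    method (an assumption for this sketch; swap in your real model)."""

    def __init__(self, model, dimension: int = 384):
        self._model = model
        self._dimension = dimension

    def get_embedding(self, text: str) -> np.ndarray:
        vec = np.asarray(self._model.encode(text), dtype=np.float64)
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec  # L2-normalise as recommended

    def get_dimension(self) -> int:
        return self._dimension  # must match SemvecConfig.dimension

# Stand-in model for demonstration only
class ConstantModel:
    def encode(self, text):
        return [1.0] * 384

emb = MyEmbedder(ConstantModel())
v = emb.get_embedding("hello")
print(v.shape, round(float(np.linalg.norm(v)), 6))  # (384,) 1.0
```

If `get_dimension()` disagrees with `SemvecConfig.dimension`, configuration fails loudly rather than silently degrading.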

Does Semvec call any LLM during ingest?

No. state.update(embedding, text) is pure mathematical EMA over the embedding plus deterministic memory bookkeeping. There is no internal LLM call at any point in the ingest pipeline. This is a deliberate architectural choice — it makes ingest deterministic, free of external API cost, and fully reproducible bit-for-bit across replays.
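
The core of that update can be pictured as a normalized exponential moving average. This is a sketch under assumptions: the blend factor `alpha` and the re-normalisation step are illustrative, and the real `update()` also does memory-tier bookkeeping.

```python
import numpy as np

def ema_update(state: np.ndarray, embedding: np.ndarray,
               alpha: float = 0.1) -> np.ndarray:
    """Blend a new embedding into a fixed-size state and re-normalise.
    Purely deterministic: same inputs always yield the same state.
    Illustrative sketch, not Semvec's actual update rule."""
    new_state = (1.0 - alpha) * state + alpha * embedding
    norm = np.linalg.norm(new_state)
    return new_state / norm if norm > 0 else new_state

rng = np.random.default_rng(0)
state = np.zeros(384)
for _ in range(5):
    e = rng.standard_normal(384)
    state = ema_update(state, e / np.linalg.norm(e))

print(state.shape, round(float(np.linalg.norm(state)), 6))  # (384,) 1.0
```

The state never changes shape, no matter how many turns are ingested, which is what makes per-turn cost flat.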

How does the licensing system work?

Three tiers: Community (no key, 5 QPS sustained / 50 burst, base retrieval), Pro (200 / 2000 QPS, extended retrieval), Enterprise (unthrottled, all features). Pro and Enterprise require a signed Ed25519 JWT in SEMVEC_LICENSE_KEY. JWTs have a 30-day TTL. Expiry raises LicenseExpiredError; rate-limit exhaustion raises RateLimitError with a retry_after (datetime.timedelta) and the upgrade URL. See the licensing guide.
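
One common pattern is to sleep for `retry_after` and retry. The sketch below uses a locally defined stand-in exception class, since the constructor signature and field names of Semvec's real `RateLimitError` are not shown here; only the documented attributes (`retry_after` as a `datetime.timedelta`, plus an upgrade URL) are assumed.

```python
import datetime
import time

class RateLimitError(Exception):
    """Stand-in for semvec's RateLimitError (assumed constructor)."""
    def __init__(self, retry_after: datetime.timedelta, upgrade_url: str):
        self.retry_after = retry_after
        self.upgrade_url = upgrade_url

def call_with_backoff(fn, retries: int = 3):
    """Retry fn, sleeping for the server-suggested retry_after."""
    for _ in range(retries):
        try:
            return fn()
        except RateLimitError as e:
            time.sleep(e.retry_after.total_seconds())
    raise RuntimeError("still rate-limited after retries")

# Demonstration: fails once, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError(datetime.timedelta(milliseconds=1),
                             "https://example.com/upgrade")
    return "ok"

result = call_with_backoff(flaky)
print(result)  # ok
```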

Is Semvec GDPR / DSGVO compliant?

The compliance pack ships everything you need: append-only event store, retention sweeper, GDPR Art. 17 forget endpoint (POST /v1/users/{id}/forget) returning a cryptographically signed deletion certificate (RSA-PSS-SHA256 or Ed25519, auto-detected). Per-user embedding encryption is opt-in (SqliteEventStore(encryption_seed=…), AES-GCM with HKDF-derived per-user keys). See the Compliance guide.
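
The value of an append-only store is that history tampering is detectable. The toy hash-chained log below illustrates the principle with stdlib tools only; it is not Semvec's on-disk event-store format.

```python
import hashlib
import json

class HashChainedLog:
    """Toy append-only event log: each entry commits to the previous
    entry's hash, so any edit to history breaks verification.
    Illustrative only -- not Semvec's event-store format."""

    def __init__(self):
        self.entries = []
        self._prev = "0" * 64  # genesis hash

    def append(self, event: dict):
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": self._prev, "hash": digest})
        self._prev = digest

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = HashChainedLog()
log.append({"type": "ingest", "user": "u1"})
log.append({"type": "forget", "user": "u1"})
print(log.verify())  # True
```

A signed deletion certificate adds a signature over such a record, so the deletion itself becomes auditable evidence.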

Where does my data go? Does Semvec phone home?

Semvec is self-hosted by license. Embeddings, conversation data, and derived memory structures stay on your infrastructure and are never transmitted to Versino. The only outbound traffic is the optional one-time anonymous init ping per process — opt-out with SEMVEC_TELEMETRY=0. Full schema and retention at https://www.semvec.io/privacy .

Can I use Semvec inside Claude Code or Cursor?

Yes. The [coding] extra ships a FastMCP server (python -m semvec.coding.mcp_server) and two lifecycle hooks (PreCompact, SessionStart). Wire-up examples are in the Cursor guide and the README's coding-agent compaction section.

How big can a Semvec snapshot get?

On mpnet 768 d, roughly 8.8 kB per memory in JSON / to_bytes(compress=False) and 3.7 kB per memory in to_bytes(compress=True). So 10 000 memories ≈ 88 MB raw / 37 MB compressed. For 100 k memories and beyond, wrap your own NPZ / Parquet around the embedding payload.
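
Since the size scales linearly, you can budget snapshots directly from the per-memory figures above:

```python
def snapshot_size_mb(n_memories: int, kb_per_memory: float) -> float:
    """Estimate snapshot size in MB from a per-memory cost in kB."""
    return n_memories * kb_per_memory / 1000.0

print(snapshot_size_mb(10_000, 8.8))  # 88.0  (raw JSON)
print(snapshot_size_mb(10_000, 3.7))  # 37.0  (compressed)
```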