Skip to content

Patent applications pending: U.S. non-provisional Nos. 19/269,195, 19/550,466; European EP 25 188 105, EP 26 160 795

Persistent Semantic Memory State Engine with Constant Costs.

Semvec is a self-hosted semantic-memory layer for LLM applications. It maintains a fixed-size, persistent representation of the conversation and the agent’s knowledge across turns and sessions, so the per-turn LLM input cost stays constant regardless of conversation length — turn 10 and turn 10 000 carry the same input footprint.

pip install "semvec[api]" · Python 3.10–3.14 · pre-built wheels for Linux, macOS, Windows

Semvec

Semvec is a constant-cost semantic memory layer for LLM agents and chatbots, developed by Versino PsiOmega GmbH. It replaces the growing conversation history sent to an LLM with a fixed-size semantic state plus a structured, content-aware memory — so per-turn input cost stays flat regardless of conversation length, while the agent retains structured access to prior decisions, invariants, error patterns, and cross-session context.

LOCOMO J 0.605 — within 6 pp of mem0 (0.669) at a fundamentally different cost class: zero LLM calls at ingest (mem0 needs one per add()), ~8× fewer input tokens per reader call. 1540 non-adv QAs, 1:1 LLM-as-Judge.

Semvec is the right pick when: per-turn LLM input cost cannot grow with the conversation; ingest cannot afford an LLM round-trip; regulated workloads need deterministic replay and signed-deletion audit trails; or you currently use mem0, Letta, or LangChain Memory and need O(1) input cost, exact-value preservation, or on-premises / air-gapped deployment.

Start with Getting Started or the Quickstart.

What can I build with Semvec?

  • Constant-size compressed context

    semvec + semvec.token_reduction — per-call LLM input cost stops growing with conversation length. ~87 % fewer input tokens per reader call on LOCOMO vs full-context replay (see Benchmarks).

  • Tiered memory with selective forgetting

    semvec — three tiers (short / medium / long term) with importance-aware retention. Frequently-accessed older memories outlive never-touched newer ones.

  • Domain anchors + keyword-boosted retrieval

    semvec — bias retrieval toward known domains or specific keywords. No re-training, no embedding pipeline changes.

  • Drop-in chat proxy

    semvec.token_reduction.SemvecChatProxy — wrap any chat callable ((list[ChatMessage]) -> str) and get compressed context for free. Helpers for OpenAI- and Ollama-compatible endpoints ship in the same module.

  • Multi-agent coordination

    semvec.cortex — run several agents that share an aggregated view, vote on proposals, and exchange checksummed state vectors.

  • Coding-agent compaction

    semvec.coding — persistent memory across coding sessions. Full integration guides for Claude Code and Cursor.

  • REST API server

    semvec.api (pip install "semvec[api]") — semvec serve exposes the full surface over FastAPI.

  • Compliance pack

    semvec.compliance (pip install "semvec[compliance]") — append-only event store, deterministic replay, GDPR Art. 17 forget with signed certificates, HMAC + RS256.

What makes Semvec different from mem0, Letta, and LangChain Memory?

  • Constant per-turn input cost — independent of conversation length.
  • Zero LLM calls at ingeststate.update() is in-process and deterministic; no network round-trip.
  • One wheel covers Python 3.10–3.14 via stable ABI (abi3-py310).
  • Pre-built wheels for Linux (x86_64 + aarch64), macOS (x86_64 + arm64), Windows (x86_64).
  • Bring-your-own embedder — anything with get_embedding(text) → np.ndarray and get_dimension() → int.
  • Two deployment models — self-hosted on your infrastructure, or managed hosting by Versino. No multi-tenant SaaS; each deployment is dedicated.

How do I get started?

Goal Entry point
First touch — recommended start (semvec serve + curl) Quickstart (5 min)
End-to-end tour of every surface Full tour (15 min)
Pick REST vs in-process library vs Cortex Choose your path
Architectural fit Architecture overview
Deployment, licensing, compliance posture Enterprise · Licensing
Already integrating User Guide · API Reference

Coding-agent integrations

  • Coding (overview) — three usage paths and when to pick each.
  • Claude CodeMCP server + automatic SessionStart / PreCompact hooks.
  • CursorMCP server with a project rule.

Does Semvec support multi-agent and compliance workloads?

Support

  • Pricing & licensing: https://www.semvec.io
  • Sales / Enterprise: vertrieb@versino.de
  • Technical support (Pro / Enterprise): support@versino.de
  • Security disclosures: security@versino.de
  • Publisher: Versino PsiOmega GmbH