Patent applications pending: U.S. non-provisional Nos. 19/269,195, 19/550,466; European EP 25 188 105, EP 26 160 795
Persistent Semantic Memory State Engine with Constant Costs.
Semvec is a self-hosted semantic-memory layer for LLM applications. It maintains a fixed-size, persistent representation of the conversation and the agent’s knowledge across turns and sessions, so the per-turn LLM input cost stays constant regardless of conversation length — turn 10 and turn 10 000 carry the same input footprint.

Semvec is a constant-cost semantic memory layer for LLM agents and chatbots, developed by Versino PsiOmega GmbH. It replaces the growing conversation history sent to an LLM with a fixed-size semantic state plus a structured, content-aware memory — so per-turn input cost stays flat regardless of conversation length, while the agent retains structured access to prior decisions, invariants, error patterns, and cross-session context.
On LOCOMO, Semvec scores J = 0.605, about 6.4 pp below mem0 (0.669), at a fundamentally different cost class: zero LLM calls at ingest (mem0 needs one per add()) and ~8× fewer input tokens per reader call. Evaluated on 1540 non-adversarial QA pairs, 1:1 LLM-as-Judge.
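To make the cost claim concrete, here is a back-of-the-envelope sketch, not a measurement. It assumes a hypothetical 150 tokens per turn and a hypothetical fixed state of 1,200 tokens; neither number comes from Semvec. It compares the input size of full-history replay with a fixed-size state, and shows how an 8× reduction maps to a percentage.

```python
# Illustrative arithmetic only: both constants are assumed values,
# not Semvec measurements.
TOKENS_PER_TURN = 150
FIXED_STATE_TOKENS = 1_200


def replay_input_tokens(turn: int) -> int:
    """Input size at `turn` when the whole history is resent on every call."""
    return turn * TOKENS_PER_TURN


def fixed_state_input_tokens(turn: int) -> int:
    """Input size at `turn` when a fixed-size state replaces the history."""
    return FIXED_STATE_TOKENS + TOKENS_PER_TURN  # state + current message


def cumulative(cost_fn, turns: int) -> int:
    """Total input tokens paid over turns 1..turns."""
    return sum(cost_fn(t) for t in range(1, turns + 1))


for turn in (10, 10_000):
    print(turn, replay_input_tokens(turn), fixed_state_input_tokens(turn))

# An 8x reduction in input tokens means 1 - 1/8 of the tokens are saved.
reduction = 1 - 1 / 8
print(f"{reduction:.1%}")
```

Under these assumptions the replayed input grows linearly per call (and quadratically in total), while the fixed-state input is the same at turn 10 and turn 10 000. The last line shows that "8× fewer" is equivalent to 87.5% fewer tokens.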
Semvec is the right pick when: per-turn LLM input cost cannot grow with the conversation; ingest cannot afford an LLM round-trip; regulated workloads need deterministic replay and signed-deletion audit trails; or you currently use mem0, Letta, or LangChain Memory and need O(1) input cost, exact-value preservation, or on-premises / air-gapped deployment.
Start with Getting Started or the Quickstart.
What can I build with Semvec?
- Constant-size compressed context
  `semvec` + `semvec.token_reduction`: per-call LLM input cost stops growing with conversation length. ~87% fewer input tokens per reader call on LOCOMO vs full-context replay (see Benchmarks).
- Tiered memory with selective forgetting
  `semvec`: three tiers (short / medium / long term) with importance-aware retention. Frequently accessed older memories outlive never-touched newer ones.
- Domain anchors + keyword-boosted retrieval
  `semvec`: bias retrieval toward known domains or specific keywords. No re-training, no embedding pipeline changes.
- Drop-in chat proxy
  `semvec.token_reduction.SemvecChatProxy`: wrap any chat callable (`(list[ChatMessage]) -> str`) and get compressed context for free. Helpers for OpenAI- and Ollama-compatible endpoints ship in the same module.
- Multi-agent coordination
  `semvec.cortex`: run several agents that share an aggregated view, vote on proposals, and exchange checksummed state vectors.
- Coding-agent compaction
  `semvec.coding`: persistent memory across coding sessions. Full integration guides for Claude Code and Cursor.
- REST API server
  `semvec.api` (`pip install "semvec[api]"`): `semvec serve` exposes the full surface over FastAPI.
- Compliance pack
  `semvec.compliance` (`pip install "semvec[compliance]"`): append-only event store, deterministic replay, GDPR Art. 17 forget with signed certificates, HMAC + RS256.
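The chat callable shape named under "Drop-in chat proxy", `(list[ChatMessage]) -> str`, can be sketched as follows. The `ChatMessage` class here is a hypothetical stand-in with assumed `role` and `content` fields; the real type in `semvec.token_reduction` may differ. The proxy's constructor is deliberately not shown, because its signature is not documented on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatMessage:
    """Hypothetical stand-in; Semvec's own ChatMessage may look different."""
    role: str
    content: str


# The callable shape quoted above: a list of messages in, a reply string out.
ChatCallable = Callable[[list[ChatMessage]], str]


def echo_chat(messages: list[ChatMessage]) -> str:
    """Toy backend that replies to the most recent user message.

    A real implementation would forward `messages` to an LLM endpoint.
    """
    last_user = next(
        (m.content for m in reversed(messages) if m.role == "user"), ""
    )
    return f"echo: {last_user}"


backend: ChatCallable = echo_chat
reply = backend([
    ChatMessage("system", "You are terse."),
    ChatMessage("user", "ping"),
])
print(reply)
```

Any function with this shape, whether it calls a hosted model or a local one, is the kind of callable the proxy is described as wrapping.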
What makes Semvec different from mem0, Letta, and LangChain Memory?
- Constant per-turn input cost, independent of conversation length.
- Zero LLM calls at ingest: `state.update()` is in-process and deterministic; no network round-trip.
- One wheel covers Python 3.10–3.14 via stable ABI (`abi3-py310`).
- Pre-built wheels for Linux (x86_64 + aarch64), macOS (x86_64 + arm64), Windows (x86_64).
- Bring-your-own embedder: anything with `get_embedding(text) → np.ndarray` and `get_dimension() → int`.
- Two deployment models: self-hosted on your infrastructure, or managed hosting by Versino. No multi-tenant SaaS; each deployment is dedicated.
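The bring-your-own-embedder contract above can be illustrated with a minimal sketch. Only the two method names and their return types come from this page; the class name `HashingEmbedder`, the `float32` dtype, the L2 normalization, and the hashing scheme are assumptions made for the example, and how the object is handed to Semvec is not shown here.

```python
import hashlib

import numpy as np


class HashingEmbedder:
    """Toy embedder exposing the two methods named above.

    It hashes whitespace-separated tokens into a fixed number of buckets,
    so it captures word overlap only. A real deployment would wrap an
    actual embedding model behind the same two methods.
    """

    def __init__(self, dimension: int = 64) -> None:
        self._dimension = dimension

    def get_dimension(self) -> int:
        return self._dimension

    def get_embedding(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dimension, dtype=np.float32)
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self._dimension
            vec[bucket] += 1.0
        norm = np.linalg.norm(vec)
        # Return a unit vector, or the zero vector for empty input.
        return vec / norm if norm > 0 else vec


embedder = HashingEmbedder(dimension=64)
vector = embedder.get_embedding("constant cost semantic memory")
print(vector.shape, embedder.get_dimension())
```

The sketch is deterministic: the same text always maps to the same vector, which makes it convenient for tests even though it is not semantically meaningful.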
How do I get started?
| Goal | Entry point |
|---|---|
| First touch, recommended start (`semvec serve` + curl) | Quickstart (5 min) |
| End-to-end tour of every surface | Full tour (15 min) |
| Pick REST vs in-process library vs Cortex | Choose your path |
| Architectural fit | Architecture overview |
| Deployment, licensing, compliance posture | Enterprise · Licensing |
| Already integrating | User Guide · API Reference |
Coding-agent integrations
- Coding (overview): three usage paths and when to pick each.
- Claude Code: MCP server + automatic `SessionStart` / `PreCompact` hooks.
- Cursor: MCP server with a project rule.
Does Semvec support multi-agent and compliance workloads?
- Cortex (overview) — in-process vs service vs REST.
- Cortex over REST API — clusters, regions, observers.
- Compliance Pack — event store, retention, GDPR forget, signed deletion certificates.
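As a generic illustration of what an HMAC-signed deletion certificate involves, the sketch below signs and verifies a small JSON record using only Python's standard library. The field names, the key handling, and the certificate layout are all assumptions made for this example; this is not the Compliance Pack's actual format or API, and the RS256 variant (which needs an asymmetric key pair) is not covered.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_certificate(record: dict, key: bytes) -> dict:
    """Return the record plus an HMAC-SHA256 signature over its canonical JSON."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_certificate(certificate: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(certificate["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, certificate["signature"])


# Hypothetical key and record, for demonstration only.
key = b"demo-key-not-for-production"
cert = sign_certificate(
    {"subject_id": "user-123", "action": "forget", "deleted_at": "2025-01-01T00:00:00Z"},
    key,
)
print(verify_certificate(cert, key))
```

Changing any field of the record, or verifying with a different key, makes `verify_certificate` return `False`, which is the property an auditor relies on when checking that a deletion record has not been altered.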
Support
- Pricing & licensing: https://www.semvec.io
- Sales / Enterprise: vertrieb@versino.de
- Technical support (Pro / Enterprise): support@versino.de
- Security disclosures: security@versino.de
- Publisher: Versino PsiOmega GmbH