Changelog

Public release history for semvec on PyPI from 0.3.7 onwards. Earlier releases were development iterations and are not part of the public history.

This changelog highlights what changes for users of the library — added APIs, behaviour changes, fixes that affect existing call sites. Internal refactors, audit-cycle iterations, and test-suite-only changes are intentionally omitted; for the full engineering history, see the project's git log.

The format follows Keep a Changelog; versions follow Semantic Versioning and PEP 440.


[0.7.2] — 2026-06-18

Reliability fix: the sidecar embedder client now reconnects automatically after an idle connection is dropped. Previously, when the embedder daemon closed an idle Unix-socket connection — for example a worker left idle through a long client think-time — the next embedding request could hang until its timeout, visible under load as one failed request per virtual user after an idle period. The client now detects the dead connection and reconnects transparently, and retries a request once if its connection drops mid-flight.

Also fixes the ONNX embedder daemon for multilingual models: token_type_ids is now sent only to models that declare it, so XLM-RoBERTa-style embedders such as paraphrase-multilingual-mpnet-base-v2 (which expose only input_ids and attention_mask) run without the Invalid input name: token_type_ids error. No API or behaviour changes for existing code.

[0.7.1] — 2026-06-16

Durable memory, and cross-frontend dedup that survives a restart. Every 0.7.0 call site keeps working — the new behaviour is opt-in.

Added

  • Durable per-session state persistence (SEMVEC_STATE_PERSIST=1, default off). With a Postgres DATABASE_URL, each session's semantic state survives a worker restart: state is written-behind (flushed on a periodic tick and on SIGTERM) and lazily reloaded on first access. A graceful shutdown resumes bit-exact; a hard crash loses at most one flush interval. Assumes one owner worker per session (use sticky routing). See State persistence & durability.
  • Cross-frontend dedup over the Cortex cluster shared session. Every frontend on a cluster writes into one backing session, so a dedup_signal ({is_update, max_sim, matched_id}) now flags overlaps across frontends. Available on POST /v1/cluster/{id}/run (with a response) and POST /v1/cluster/{id}/store, and in-process via a shared SemvecSession (store_qa / run_sync). A per-call dedup_threshold override is accepted on the store path (REST and the in-process store_qa / update_state, plus *_async). Create a cluster with {"drift_exempt": true} to pin its shared session against regional realignment (a pinned cluster cannot be added to a region). See Cross-frontend dedup.
  • Optional Postgres sharding (SEMVEC_STATE_DB_SHARDS, advanced scaling). Comma-separated DSNs spread the state-blob table across backends, keyed by session_id via rendezvous (HRW) hashing (adding a shard moves only ~1/N keys); metadata tables stay on the primary.
  • Compressed state blobs. Persisted state and to_bytes() checkpoints are now compact binary, ~56% smaller than the legacy JSON encoding. Backward compatible: from_bytes() still reads legacy uncompressed blobs, so upgrades need no migration.
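
The rendezvous (HRW) placement mentioned for SEMVEC_STATE_DB_SHARDS can be illustrated with a minimal sketch. This is not the library's implementation: the hash function, the key format, and the DSNs below are assumptions chosen for illustration, and hrw_shard is a hypothetical helper name.

```python
import hashlib

def hrw_shard(session_id: str, shards: list[str]) -> str:
    """Return the shard whose (shard, session_id) hash score is highest."""
    def score(shard: str) -> int:
        digest = hashlib.sha256(f"{shard}|{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(shards, key=score)

# Hypothetical DSNs, for illustration only.
shards = ["postgres://db-a/semvec", "postgres://db-b/semvec", "postgres://db-c/semvec"]
new_shard = "postgres://db-d/semvec"
keys = [f"session-{i}" for i in range(3000)]

before = {k: hrw_shard(k, shards) for k in keys}
after = {k: hrw_shard(k, shards + [new_shard]) for k in keys}
# Keys whose owner changed after the fourth shard was added.
moved = [k for k in keys if before[k] != after[k]]
```

Because the scores of the existing shards do not change, a key can only move when the new shard wins, so every entry in moved lands on the new shard and roughly a quarter of the keys relocate.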

Changed

  • matched_id is now durable. The dedup_signal.matched_id identifier is preserved across to_dict / from_dict and a snapshot reload, instead of being regenerated on each load. A frontend can store a matched_id and correlate against it after a restart.

Fixed

  • dedup_threshold validation tightened to the cosine range [-1, 1] on every path that accepts it (update, preview_dedup, the store/run REST endpoints, and the in-process session methods). Out-of-range values are rejected up front rather than silently producing a meaningless decision.

[0.7.0] — 2026-06-04

Use Semvec as an in-process library — no server required.

Added

  • SemvecSession — a library facade for one conversation. Construct it with your own embedder and drive the full per-turn loop in-process:
from semvec import SemvecSession, SemvecState, SemvecConfig

session = SemvecSession(SemvecState(config=SemvecConfig(dimension=384)),
                        my_embedder, SemvecConfig(dimension=384))
result = session.run_sync("How does the deploy pipeline work?")
print(result.context, result.drift_phase)

run() (async) / run_sync() perform the same embed → retrieve → short-circuit → drift → context-block → update loop the REST /v1/run endpoint does, returning a TurnResult. Lower-level methods (store_qa, compute_drift, context_block, triggers/anchors/isolation, literal-cache, export_state/import_state) are available too. Bring your own embedder; cross-encoder reranking is injectable.

  • Pass SemvecSession(..., enable_bm25=True) to turn on per-session BM25-hybrid retrieval in-process; no environment variable is needed.
  • verify_license_token is now importable from the top-level semvec package for offline license checks; SidecarEmbedderClient and ComplianceState are now exported from semvec.embedder / semvec.compliance.
  • semvec.cortex.ops exposes dependency-light helpers for multi-agent coordination (vector coupling, semantic-delta transfer, consensus-trust EMA) that previously lived only behind the REST cluster/network layers.

See the new In-Process Library guide for a full walkthrough.

Changed

  • /v1/run produces identical results (the endpoint now shares its implementation with SemvecSession).
  • POST /v1/store now returns 422 (not a misleading 404) when an existing session's response can't be stored (zero-norm embedding); a missing session is still 404.
  • The cross-session aggregation flag is renamed to use_cortex. The REST API still accepts the legacy use_meta_pss alias, so existing clients keep working unchanged.

Fixed

  • Robustness / observability hardening. A triaged audit replaced previously-silent failure paths with explicit logging and correct error propagation across the REST, session, Cortex, compliance and embedder layers. For callers this means: failures surface (as logs, a 422, or a clean exception) instead of an anonymous 500, a misleading 404, or a silently-dropped result. TurnResult / RunResponse gain a retrieval_error flag so a memory-retrieval fault is distinguishable from an empty context, and the sidecar embedder no longer hangs callers when its worker faults.

[0.6.8] — 2026-06-02

Maintenance release — the remaining hardening items from the 0.6.7 audit. No API or behaviour changes for typical callers.

Fixed

  • Python 3.12 fix in the sidecar embedder client. A loop-acquisition path used the deprecated asyncio.get_event_loop(), which raises on 3.12 when no loop is running; it now uses get_running_loop() with a fallback.
  • License-cache scaling. Under multi-tenant load, one token expiring no longer wipes every tenant's cached verification — only the expired token is evicted, avoiding a burst of redundant signature re-verifies.
  • Malformed update results raise a clean error. A missing key in an update result now surfaces as a KeyError instead of an interpreter-level panic crossing the Python/Rust boundary.
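
The loop-acquisition pattern described in the first item can be sketched as follows. The changelog does not say what the fallback does, so creating a fresh loop is an assumption here, and acquire_loop is a hypothetical helper name rather than part of the semvec API.

```python
import asyncio

def acquire_loop() -> asyncio.AbstractEventLoop:
    """Return the running loop, or a fresh loop when none is running."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread. Creating one explicitly is the
        # assumed fallback; the caller is responsible for closing it.
        return asyncio.new_event_loop()

async def inside_a_loop() -> bool:
    # Inside a coroutine the helper must hand back the loop that is running it.
    return acquire_loop() is asyncio.get_running_loop()
```

Called from synchronous code, acquire_loop() returns a loop that is not yet running; called from a coroutine, it returns the active loop.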

Changed

  • get_client_ip always returns a string ("unknown" when the peer is unavailable) rather than None.

[0.6.7] — 2026-06-02

Hardening release. No breaking changes — every public API and return value is behaviour-compatible with 0.6.6 (verified by a release-readiness audit; the full Rust and Python test suites pass). The release closes a class of latent panics that could abort the host interpreter, tightens the REST surface, and adds input validation that surfaces as 422 instead of silent acceptance.

Security

  • Metrics Basic Auth is now constant-time. Credential comparison uses secrets.compare_digest on both fields, closing a timing side-channel.
  • Input validation on POST /v1/session/create. dimension is bounded to 1–16384 (previously unbounded — an oversized value could exhaust memory); out-of-range values now return 422.
  • session_id request fields are constrained to ^[A-Za-z0-9_-]+$ (max 128 chars), rejecting control characters that could be injected into log lines. The pattern matches the UUIDs the server already mints, so no legitimate id is affected.
  • New SEMVEC_TRUSTED_PROXIES env var (comma-separated IPs/CIDRs): X-Forwarded-For / X-Real-IP are honoured only when the direct peer is a configured proxy. Forward-looking — the client-IP helper is not yet consumed by request handling.
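
The three checks in this list can be sketched with the standard library alone. The helper names (valid_session_id, basic_auth_ok, client_ip) are hypothetical, and taking the left-most X-Forwarded-For entry is a simplification assumed for illustration; none of this is the server's actual code.

```python
import ipaddress
import re
import secrets
from typing import List, Optional

# Same character class as the documented pattern; fullmatch also rejects a
# trailing newline, which a bare `$` would let through in Python.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]+")

def valid_session_id(value: str) -> bool:
    return len(value) <= 128 and SESSION_ID_RE.fullmatch(value) is not None

def basic_auth_ok(user: str, password: str, want_user: str, want_password: str) -> bool:
    # Compare both fields before combining, so a wrong username does not
    # short-circuit the password comparison.
    user_ok = secrets.compare_digest(user.encode(), want_user.encode())
    pass_ok = secrets.compare_digest(password.encode(), want_password.encode())
    return user_ok and pass_ok

def client_ip(peer: str, forwarded_for: Optional[str], trusted: List[str]) -> str:
    """Honour X-Forwarded-For only when the direct peer is a trusted proxy."""
    networks = [ipaddress.ip_network(entry.strip(), strict=False) for entry in trusted]
    peer_addr = ipaddress.ip_address(peer)
    if forwarded_for and any(peer_addr in net for net in networks):
        return forwarded_for.split(",")[0].strip()
    return peer
```

For example, client_ip("10.0.0.5", "203.0.113.7", ["10.0.0.0/8"]) trusts the header, while the same header arriving from an unlisted peer is ignored.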

Fixed

  • FFI-boundary panics removed. Several unwrap() / expect() paths reachable from the Python bindings could abort the interpreter under panic = "abort"; they now raise Python exceptions or degrade gracefully (topic-switch detection, pattern matching, tier selection, and the dev-key / getrandom paths).
  • No more silently-swallowed errors on /v1/run and import_state — retrieval and state-deserialization failures are now logged instead of returning an empty/false result with no diagnostic trail.
  • cargo run --bin gen-dev-keys works again — the binary source had been truncated, which also broke cargo check without --lib.

Changed

  • POST /v1/dedup-check is now async internally (it awaits the embedder), so a slow sidecar embedder no longer ties up a worker thread under load. The HTTP request and response shapes are unchanged.

[0.6.6] — 2026-05-20

Performance release. Closes a real-world pathology in long-term consolidation: at the heuristic k = n/3, the long-term tier-consolidation initialiser scaled quadratically in k and dominated total cost. On a 200 000-record state the consolidation step took ~52 minutes; this release drops it into the single-digit-minute range.

No behaviour change for any API or return value — the math of the initialiser is bit-identical to 0.6.5.

Performance

  • Tier-consolidation init: previously O(n²) in k; now O(n) per pick. Instead of rescanning every prior centroid on every new pick, the algorithm keeps a persistent min-distance cache and updates it in place after each pick. Measured speedup (default workload n=10000, k=3333):

    workload | 0.6.5 | 0.6.6 | speedup
    n=2000, k=666 | 7.4 s | 25 ms | ~295×
    n=10000, k=3333 | >17 min/run | 633 ms | >1500×

Mathematically identical (verified by a regression test that pins the exact pick sequence from the 0.6.5 implementation). The next consolidation bottleneck is profiled in a follow-up release.
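
The min-distance cache idea can be illustrated on a toy initialiser. The changelog does not name the selection rule, so farthest-point selection is used here as an assumed stand-in, and pick_naive / pick_cached are hypothetical names. The sketch only shows why the cached variant yields the same pick sequence while doing O(n) work per pick.

```python
import math
import random

def pick_naive(points, k):
    """Rescan every prior pick for every candidate on every new pick."""
    picks = [0]
    while len(picks) < k:
        best = max(
            (i for i in range(len(points)) if i not in picks),
            key=lambda i: min(math.dist(points[i], points[c]) for c in picks),
        )
        picks.append(best)
    return picks

def pick_cached(points, k):
    """Keep each point's distance to its nearest pick and update it in place."""
    picks = [0]
    min_d = [math.dist(p, points[0]) for p in points]
    while len(picks) < k:
        best = max(range(len(points)), key=lambda i: min_d[i])
        picks.append(best)
        for i, p in enumerate(points):
            d = math.dist(p, points[best])
            if d < min_d[i]:
                min_d[i] = d  # only the newest pick can lower the cached value
    return picks

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(60)]
```

Both functions compare the same distance values in the same order, which is what makes an exact pick-sequence regression test possible.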


[0.6.5] — 2026-05-20

Setup-quality release. Closes a foot-gun in the Claude Code / Cursor MCP integration: bare pip install "semvec[coding]" did not pull in sentence-transformers, so the MCP server raised RuntimeError: sentence-transformers is required the first time it tried to embed anything. The [coding] extra now bundles both packages, so the MCP server is runnable in one install step.

No behaviour change for any existing API or SemvecState call site.

Fixed

  • pip install "semvec[coding]" now installs everything needed for the MCP server. Previously the extra declared only fastmcp>=2.0; sentence-transformers had to be installed separately. The extra now bundles fastmcp>=2.0 plus sentence-transformers>=3.0.
  • MCP-server ImportError message now points users at pip install "semvec[coding]" instead of the bare sentence-transformers install.

Docs

  • uv run launch alternative documented for Claude Code and Cursor MCP configs — lets uv resolve the project interpreter on the fly. Cross-platform, no hard-coded venv paths, no escaping of Windows backslashes.
  • claude mcp add CLI shortcut documented as an alternative to hand-editing .claude/settings.json.
  • Startup timeout guide for WSL2 / slow filesystems. Documents the two env vars Claude Code honours:

    Env var | Default | Covers
    MCP_TIMEOUT | 30 s | initial connect + tool calls
    MCP_CONNECT_TIMEOUT_MS | 5 s | /mcp reconnect

    Both must be exported in the parent shell; the env block inside mcpServers.semvec does not reach Claude Code itself. The /mcp reconnect -32001 (Request Timeout) failure mode is now correctly traced to MCP_CONNECT_TIMEOUT_MS (instead of being presented as an unfixable quirk).
  • WSL2 performance note: moving the project from /mnt/c/... to a native Linux filesystem (~/dev/...) drops MCP-server startup time below 5 seconds and removes the need for both timeout overrides.
  • Troubleshooting expanded with five new symptoms (ModuleNotFoundError: fastmcp / sentence_transformers, connection timed out after 30000ms, Failed to reconnect ... -32001, silent Failed to connect, 2 min startup on WSL2).


[0.6.4] — 2026-05-19

A read-only companion to the 0.6.3 dedup_signal: callers can now ask Semvec before they call update(). Strictly additive — every 0.6.x call site keeps working untouched.

Added

  • state.preview_dedup(embedding) — same {is_update, max_sim, matched_id} dict that an update() call would attach as dedup_signal, but without storing the candidate or mutating any state. The natural read-only counterpart for RAG / agent frontends that want to decide whether the incoming text is worth feeding into the downstream RAG index:

    sig = state.preview_dedup(embedding)
    if not sig["is_update"]:
        state.update(embedding, text)
    

    Same per-call dedup_threshold= override as update(). Does not consume the per-state rate-limit bucket — safe for high-frequency polling. Returned dict shape is bit-identical to the dedup_signal sub-dict from update(), so a caller's threshold choice transfers cleanly between the two paths.

  • REST POST /v1/dedup-check — HTTP surface for the same computation. Request: {session_id, text, dedup_threshold?}. Response: the DedupSignal payload that already shipped in 0.6.3. Authenticated like /v1/run. See the DedupSignal user-guide page.

Docs

  • The "What the signal does not do → No insert suppression" note in the DedupSignal guide is now followed by the preview_dedup() story — that is the right tool when you need to decide before storing.

[0.6.3] — 2026-05-19

A read-only similarity-hint on every update() call, plus stable per-memory identifiers. Strictly additive — every 0.6.x call site keeps working untouched.

Added

  • dedup_signal on SemvecState.update() — every update now returns an informational {is_update, max_sim, matched_id} block alongside the existing metrics. The signal lets RAG / agent frontends route an incoming memory as "update of an existing fact" vs. "genuinely new" without spending an LLM call. Storage stays append-only; the caller decides what to do with the hint. See the new DedupSignal user-guide page.

    res = state.update(embedding, text)
    sig = res["dedup_signal"]
    # {'is_update': True, 'max_sim': 0.91, 'matched_id': '019e3f65-...'}
    
  • MemoryUnit.id — every MemoryUnit carries a stable UUIDv7 that round-trips through to_dict() / from_dict(). Used as matched_id in dedup_signal; also useful for any application that needs a stable handle for a memory. Pre-0.6.3 snapshots that lack the field receive a fresh UUID on load (lossless migration).

  • SemvecConfig.dedup_update_threshold (default 0.85) — the cosine threshold above which dedup_signal.is_update flips to True. Tunable globally on the config, or per call:

    state.update(emb, text, dedup_threshold=0.95)
    
  • REST /v1/run: response gains an optional dedup_signal field with the same {is_update, max_sim, matched_id} shape. null when no state update happened on the call (e.g. a /run without a response field).
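
To make the shape of the signal concrete, here is a self-contained toy that mimics the described behaviour: cosine similarity against stored embeddings, a threshold that flips is_update, and append-only storage. ToyStore is hypothetical and is not the SemvecState implementation. It uses uuid4 instead of UUIDv7 for brevity, treats "above the threshold" as strictly greater, and reports the nearest id even when is_update is False; all three are assumptions.

```python
import math
import uuid

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

class ToyStore:
    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.memories = []  # (id, embedding, text) tuples, append-only

    def preview_dedup(self, embedding, dedup_threshold=None):
        """Compute the signal without storing anything."""
        threshold = self.threshold if dedup_threshold is None else dedup_threshold
        best_id, best_sim = None, 0.0
        for mem_id, emb, _text in self.memories:
            sim = cosine(embedding, emb)
            if best_id is None or sim > best_sim:
                best_id, best_sim = mem_id, sim
        is_update = best_id is not None and best_sim > threshold
        return {"is_update": is_update, "max_sim": best_sim, "matched_id": best_id}

    def update(self, embedding, text, dedup_threshold=None):
        """Attach the signal, then store the memory regardless of the hint."""
        signal = self.preview_dedup(embedding, dedup_threshold)
        self.memories.append((str(uuid.uuid4()), list(embedding), text))
        return {"dedup_signal": signal}
```

A near-duplicate embedding yields is_update=True against the default 0.85, and raising the per-call threshold turns the same candidate back into "genuinely new".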

Docs


[0.6.1] — 2026-05-13

Documentation + API hygiene patch on top of 0.6.0. No behaviour change on the happy path; one bucket-exhaust failure mode now returns a clean HTTP 429 instead of a generic 500.

Fixed

  • REST API: RateLimitError (raised by the Rust core when the per-SemvecState token bucket is empty) now surfaces as HTTP 429 with a dynamic Retry-After header derived from exc.retry_after. Previously the exception leaked through as a generic 500, masking the bucket signal from clients that wanted to back off and retry. Pro / Enterprise license tiers bypass the bucket (since 0.3.7), so this change is only observable on Community / unlicensed callers.
  • Docs: concepts-glossary.md referenced state.create_resonance_trigger(...) in a copy-pasteable example; the only method that exists on SemvecState is add_resonance_trigger(...). Copy-pasted code would have raised AttributeError.
  • Docs: Patent-status notice in enterprise/index.md previously linked to the EPO Patent Register. The application is in the 18-month confidentiality period under Art. 93 EPC and is not yet publicly available — the link has been replaced with an explicit pre-publication disclaimer and a commitment to add the Register link once the application is published.
  • Docs: Glossary entries for cluster_fallback_threshold, drift_threshold, resonance-trigger absorption, and anchor/trigger composition no longer leak internal mechanism — interface and behaviour only, per the llms.txt disclosure policy.
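
On the client side, the 429 plus Retry-After contract can be honoured with a small retry loop. This is a sketch under assumptions: call_with_backoff and the send callable are hypothetical, and the header is assumed to carry a numeric number of seconds.

```python
import time

def call_with_backoff(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send() is a caller-supplied function returning (status, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        # Assumed format: Retry-After holds seconds; default to 1 s if absent.
        sleep(float(headers.get("Retry-After", "1")))
```

Injecting sleep keeps the helper testable; in production the default time.sleep applies the server-provided delay.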

Removed

  • REST API: Unused slowapi.Limiter scaffold from semvec.api.routes and semvec.api.app (instantiated Limiter, registered RateLimitExceeded exception handler, set app.state.limiter). No @limiter.limit decorator was ever attached to any route, so the scaffold was dead code. Rate limiting has always been enforced one layer down in the Rust core, not in the HTTP middleware stack. The slowapi dependency has been dropped from the [api] extra.

[0.6.0] — 2026-05-13

Sharpening release. Adds production-shaped knobs to the REST API (sidecar embedder, session lifecycle, hybrid-retrieval tuning), keeps every /v1/run default identical to 0.5.6, and lands measurable per-turn speed-ups on the hot path.

Added — Retrieval

  • BM25-hybrid retrieval in /v1/run. Per-session lexical (BM25) index fused with dense cosine via Reciprocal Rank Fusion before the cross-encoder rerank stage. Default off — opt in via SEMVEC_HYBRID_BM25=1. Empirical lift on LOCOMO 10-convo (1986 QAs, gpt-4o): +2.6 pp weighted F1 vs the dense-only baseline (0.469 → 0.495). Strongest single-category lift: multi-hop +5.3 pp. Pulls in bm25s + nltk via the new semvec[hybrid] extra.
  • Weighted RRF fusion via SEMVEC_RRF_WEIGHTS="1.0,0.4" (dense, BM25). Lets you bias the fusion when BM25 hurts single-fact precision on your domain. Unset = uniform 1.0. Companion knobs: SEMVEC_BM25_FETCH_K (default 50), SEMVEC_BM25_REBUILD_EVERY (default 64 ingests between snapshot rebuilds), SEMVEC_RRF_K (default 60).
  • Cross-encoder rerank stage behind BM25-hybrid. Env-tunable via SEMVEC_RERANK_MODEL=<hf-id> (e.g. cross-encoder/ms-marco-MiniLM-L-6-v2), SEMVEC_RERANK_FETCH_K (default 50 candidates fed into the cross-encoder), SEMVEC_RERANK_BATCH (default 64), SEMVEC_RERANK_FP16=1, SEMVEC_RERANK_THREADS. Off by default — set SEMVEC_RERANK_MODEL to activate.
  • Tunable retrieval at /v1/run. Four env knobs replace the previous wheel-baked defaults: SEMVEC_RUN_TOP_K (top-K passed to retrieval, default 5), SEMVEC_MMR_FETCH_K (MMR candidate pool; default 0 = MMR off), SEMVEC_MMR_LAMBDA (relevance-vs-diversity, default 0.5), SEMVEC_CONTEXT_BUDGET_CHARS (total-text budget across all selected memories, default 4 000). The per-memory legacy 150-char cap is gone — sum-of-text is now the constraint.
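
Weighted Reciprocal Rank Fusion, as configured by SEMVEC_RRF_WEIGHTS and SEMVEC_RRF_K, can be sketched in a few lines. This is the textbook formula, score(d) = Σ w_i / (k + rank_i(d)), with ranks starting at 1; whether the library numbers ranks the same way is an assumption, and weighted_rrf is a hypothetical name.

```python
def weighted_rrf(rankings, weights=None, k=60):
    """Fuse ranked id lists; a higher fused score ranks first."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["a", "b", "c"]   # ids ordered by dense cosine
bm25 = ["b", "c", "a"]    # ids ordered by BM25

uniform = weighted_rrf([dense, bm25])
biased = weighted_rrf([dense, bm25], weights=[1.0, 0.4])
```

With uniform weights the fused order is b, a, c; down-weighting BM25 to 0.4 lets the dense ranking's top hit win, giving a, b, c.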

Added — REST API runtime

  • Sidecar embedder daemon. semvec serve --embedder-mode sidecar spawns a single embedder process and points every API worker at it over UDS (default) or TCP. Eliminates the per-worker model load on multi-worker deployments and shares one model copy across --workers N. The Python sidecar is the default. An optional Rust-native sidecar (SEMVEC_USE_RUST_EMBEDDER=1 / SEMVEC_EMBEDDER_BIN=<path>) is picked up automatically by the supervisor when the binary is present; see Embedders for the trade-off table.
  • semvec serve --embedder <URL> — point workers at an externally managed embedder daemon (e.g. on a dedicated host). Useful for GPU-pinning the embedder on one node and CPU-scaling the API on others. Coexists with --embedder-mode sidecar.
  • SessionManager lifecycle. Per-process session table now has an idle-TTL sweeper and a hard cap. SEMVEC_MAX_SESSIONS (default 10 000), SEMVEC_SESSION_IDLE_TTL_S (default 1 800 s), SEMVEC_SESSION_SWEEP_S (default 60 s). Eviction is LRU on idle time. In-memory only — sessions evicted by TTL or cap stop existing for that worker; persist via /v1/session/{id}/export if you need them back.
  • Graceful SIGTERM drain. SessionManager.shutdown() is wired to FastAPI's lifespan — on SIGTERM, in-flight requests complete, the embedder client closes cleanly, and the session table empties. Behind a load balancer this enables zero-error rolling restarts.
  • Embedder LRU cache + in-flight de-duplication. Off by default — set SEMVEC_EMBEDDER_CACHE_SIZE=<entries> (10 000 is a good starting point) to wrap the active embedder. Cache hits skip the model entirely; concurrent requests for the same text wait on one in-flight future instead of issuing duplicate model calls. ~2.9× throughput win on repeat-heavy chat traffic; no effect on cold workloads.
  • Per-request retrieval defaults are read at start-up. routes.py now reads SEMVEC_RUN_TOP_K / SEMVEC_MMR_* / SEMVEC_CONTEXT_BUDGET_CHARS / SEMVEC_RERANK_* / SEMVEC_BM25_* / SEMVEC_RRF_* / SEMVEC_HYBRID_BM25 once per worker. No per-call kwargs needed — set env, restart, done. See CLI reference for the full table.
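
The LRU cache with in-flight de-duplication described above can be sketched with asyncio and an OrderedDict. CachedEmbedder and fake_embed are hypothetical names, and the sketch ignores cancellation and error caching, which a production wrapper would need to handle.

```python
import asyncio
from collections import OrderedDict

class CachedEmbedder:
    def __init__(self, embed, max_entries):
        self._embed = embed          # async callable: text -> vector
        self._max = max_entries
        self._cache = OrderedDict()  # text -> vector, oldest entry first
        self._inflight = {}          # text -> task for a running model call

    async def __call__(self, text):
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        task = self._inflight.get(text)
        if task is None:
            # First request for this text: start exactly one model call.
            task = asyncio.ensure_future(self._embed(text))
            self._inflight[text] = task
            task.add_done_callback(lambda _t: self._inflight.pop(text, None))
        vector = await task  # concurrent callers share the same task
        self._cache[text] = vector
        self._cache.move_to_end(text)
        while len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the least recently used
        return vector

calls = []

async def fake_embed(text):
    calls.append(text)
    await asyncio.sleep(0)
    return [float(len(text))]

async def demo():
    embedder = CachedEmbedder(fake_embed, max_entries=2)
    first, second = await asyncio.gather(embedder("hi"), embedder("hi"))
    third = await embedder("hi")  # served from the cache
    return first, second, third
```

Running asyncio.run(demo()) issues a single call to fake_embed even though three requests were made: two share the in-flight task and the third is a cache hit.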

Added — Benchmarks

  • benchmarks/run_locomo.py --judge is an LLM-as-Judge re-evaluator that reuses the mem0 paper's judge prompt verbatim. Cross-paper numbers become apples-to-apples without a second run. benchmarks/run_locomo_judge.py remains as the dedicated entry point for re-judging an existing run.
  • No more openai SDK dependency for the judge. The OpenAI-compat adapter is now requests-backed, so the [benchmarks] extra alone is enough. Works against any OpenAI-compatible endpoint (vLLM, LiteLLM, OpenRouter, Ollama).
  • find_dotenv for the judge runner. .env lookup walks up from the runner's CWD, so the judge runs cleanly from worktrees and sub-directories — not just the repo root.

Performance

All 0.5.6 API surfaces unchanged; numbers below are end-to-end process-level wins, not micro-benchmarks.

  • /v1/run async-native rewrite: parallel embed of query and store-text, ASGI-middleware bypass for Depends(verify_license), LRU-cached Ed25519 verify (256 entries), CORSMiddleware skipped when no origins are configured, threadpool=200. End-to-end: +63 % cumulative throughput on a mixed /v1/run workload, +772 % on the QA-only flow vs 0.5.6.
  • Memory hot-path: single-pass safe_cosine_similarity (+91 % turn-rate) and additional inner-loop optimisations in the long-term consolidation path (+57 % turn-rate). Storage layout for retrieval matrices reworked to drop conversion overhead.
  • Prometheus high-cardinality leak fixed in REST request metrics (session IDs no longer leak into label keys).

Removed

  • All non-LOCOMO benchmark surfaces: LongBench, MT-Bench, longmemeval, scaling / load / k6 / cortex / consensus / coding runners and their datasets. The Python module semvec.benchmarks.longmemeval is gone. LOCOMO is now the single publication-grade bench shipped with the wheel.
  • semvec[longmemeval] extra (folded into semvec[benchmarks] which now only pulls sentence-transformers).
  • openai Python SDK as a [benchmarks] dependency.

Notes

  • No behaviour change at defaults. Every new knob ships off; an unmodified 0.5.6 caller sees the same /v1/run pipeline. Hybrid, abstain, boosters, sidecar, RRF-weights — all opt-in.
  • pip install "semvec[hybrid]" is required for BM25-hybrid; the base wheel does not ship BM25 dependencies.
  • LOCOMO drift envelope: gpt-4o via OpenRouter is non-deterministic even at temperature=0 (~40 % per-QA churn), aggregate drift ≤ ±0.5 pp. See parity envelope for the full picture.

[0.5.6] — 2026-05-05

Caller-controlled retrieval-text truncation. Adds the "top-1 ungutted, rest truncated" pattern to the direct-library path and removes the hardcoded 500-char cap from the REST API.

Added

  • SerializerConfig.full_first: bool = False in semvec.token_reduction. When set, SemvecStateSerializer returns the highest-ranked retrieved memory verbatim and continues to truncate the rest at max_memory_chars:
# snippet: assumes `state` is a populated SemvecState
from semvec.token_reduction import SemvecStateSerializer, SerializerConfig

cfg = SerializerConfig(top_k=5, max_memory_chars=200, full_first=True)
context = SemvecStateSerializer(cfg).serialize(state, query_text="...")
# Entry 1: full text. Entries 2..5: capped at 200 chars.
  • max_text_chars query parameter on GET /v1/state/context (range 1–100 000, default 500). Replaces the previously hardcoded 500-char slice. Combine with full_first=true to keep the top hit verbatim regardless of the cap.
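
The "top-1 ungutted, rest truncated" rule is easy to state in code. The helper below is a hypothetical stand-alone illustration of that rule only; it is not SemvecStateSerializer and it ignores ranking, which is assumed to have happened already.

```python
def truncate_ranked(texts, max_chars=500, full_first=False):
    """Cap each ranked text at max_chars, optionally sparing the top hit."""
    out = []
    for index, text in enumerate(texts):
        if full_first and index == 0:
            out.append(text)              # top hit stays verbatim
        else:
            out.append(text[:max_chars])  # everything else is capped
    return out

ranked = ["x" * 900, "y" * 900, "z" * 50]
default_view = truncate_ranked(ranked, max_chars=200)
full_first_view = truncate_ranked(ranked, max_chars=200, full_first=True)
```

With full_first left at its default, every entry is capped, which matches the pre-0.5.6 behaviour the compatibility note describes.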

Changed

  • Retrieval-output truncation is no longer wheel-baked on either surface. Defaults match pre-0.5.6 behaviour, so no existing caller needs to change anything.

Compatibility

Strictly additive — full_first defaults to False, max_text_chars defaults to 500. Existing 0.5.x callers see no behaviour change.


[0.5.5] — 2026-05-04

Activates the per-tier QPS limits documented in README/PRIVACY since 0.2.0.

Changed

  • Per-state token bucket is now active as the primary rate-limit layer. Both update() and the three calculate_* methods draw from one bucket per SemvecState. The bucket implements the README-documented tier caps:

    Tier | Sustained | Burst
    Community (no key) | 5 QPS | 50
    Pro | 200 QPS | 2000
    Enterprise | unlimited | unlimited

  • Sliding-window probe-defence (100/s update, 30/s calculate_*) is now Community-only. Pro and Enterprise skip this layer because the bucket already covers their use cases at higher caps; the previous "Pro / Enterprise bypass everything" branch admitted unlimited QPS contrary to the documented 200/2000 Pro contract.
  • RateLimitError message now includes the tier, the QPS/burst contract, the retry-after delay in ms, and the upgrade URL.
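
A token bucket with a sustained rate and a burst capacity, as in the Community tier's 5 QPS / 50 burst contract, can be sketched like this. TokenBucket here is a hypothetical illustration in pure Python; the real bucket lives in the Rust core and its internals are not documented in this changelog.

```python
import time

class TokenBucket:
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second (sustained QPS)
        self.burst = burst          # bucket capacity
        self.clock = clock
        self.tokens = float(burst)  # start full
        self.updated = clock()

    def try_acquire(self):
        """Take one token; return 0.0 on success, else seconds to wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate

# A fake clock makes the behaviour deterministic for the example.
fake_now = [0.0]
bucket = TokenBucket(rate=5, burst=50, clock=lambda: fake_now[0])
burst_results = [bucket.try_acquire() for _ in range(50)]
retry_after = bucket.try_acquire()   # 51st call in the same instant
fake_now[0] = 0.25                   # a quarter of a second later
after_wait = bucket.try_acquire()
```

The first 50 calls succeed immediately, the 51st reports a retry delay of one fifth of a second, and a call after that delay succeeds again.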

Compatibility

Conversational chat, MCP servers, smoke-tests, and small pytest suites are unaffected — their typical rates are well below 5 QPS sustained and fit inside the 50 burst window. Workloads above 5 QPS sustained (heavy batch ingest, large benchmark replays) should:

  • use update_batch() (one Python call, one bucket-acquire per item but one cross-language hop),
  • shard across multiple SemvecState instances (each has its own bucket),
  • or upgrade to Pro / Enterprise.

The compliance event-replay path (EventReplayService) bypasses both layers — replay must not lock itself out re-folding its own log.

See Licensing guide for the full picture including a per-workload QPS table.


[0.5.4] — 2026-05-03

Marketing-wording correction + repo cleanup. Companion to 0.5.3 — no PyPI-wheel content changes.

Changed

  • "Novelty acknowledged" framing of the EPO Search Report removed in full from README, the documentation site hero, llms.txt, llms-full.txt, and the FAQ. The European Search Report is a prior-art search; it is neither a grant nor a standalone novelty determination. Substantive examination is the next step in the EPO process. The patent-pending statement remains, scoped to "application EP 25 188 105 filed at the European Patent Office".
  • The FAQ "What's the patent situation?" answer now explains the role of the European Search Report explicitly.

Removed (post-tag repo cleanup)

  • The Cloudflare Worker semvec-telemetry.versino.workers.dev was deleted along with its KV namespace and any logged records. The endpoint now returns HTTP 404 (Cloudflare error 1042 — no Worker bound to subdomain).
  • The telemetry/worker/ source tree (the Worker's TypeScript code) was removed from the repository so the public source mirrors the deleted deployment. The Worker source was never part of the PyPI wheel — this is a repo-tree cleanup, not a release change.

No code, no API, no behaviour changes vs. 0.5.3.


[0.5.3] — 2026-05-03

Privacy release. semvec no longer phones home.

Removed

  • Anonymous init telemetry (semvec._telemetry). The default-on opt-out init ping — version, OS, architecture, Python version, per-machine pseudonym (SHA-256 of a local random salt and the machine ID) — is gone. No HTTP request leaves the package on import. The Cloudflare Worker endpoint is no longer contacted; the SEMVEC_TELEMETRY* environment variables are no longer read; the ~/.semvec/telemetry-salt file is no longer created or used (you can safely delete it from existing installs). The previously cited GDPR Art. 6(1)(f) basis ("legitimate interest in patent enforcement") is withdrawn in full.
  • HyperLogLog "diversity sketch" (semvec._diversity). A HyperLogLog cardinality-counting component that posted an estimate to the same Cloudflare Worker at process exit is removed in full.
  • The atexit-registered ping-completion join that blocked process exit for up to 700 ms waiting for the telemetry socket is gone with the module.
  • Custom User-Agent string (semvec-telemetry/<version>) is no longer sent because no telemetry request is made.

Why

The collection mechanism was disproportionate to its stated purpose, contradicted itself across three docstrings vs. the runtime default, and documented an Article 13 transparency gap (the diversity sketch was a second, undisclosed posting separate from the init ping). Public PyPI download statistics (pypistats overall semvec) cover the legitimate install-count signal without any client-side data flow. The Privacy Notice has been rewritten to reflect the new state — see the README's Telemetry section on PyPI.

Compatibility

  • License-JWT verification, inference, state updates, retrieval, REST API, Compliance Pack — all unchanged. License keys are still verified locally against the embedded Ed25519 public key with no network call.
  • No public-API surface change. Existing 0.5.x callers run untouched.
  • If you set SEMVEC_TELEMETRY=0 in your environment, you can remove the variable; it is no longer read.

[0.5.2] — 2026-05-03

Documentation release. No code changes.

Added

  • Full Claude Code integration guide at /guides/claude-code/: .claude/settings.json walk-through, automatic SessionStart and PreCompact lifecycle hooks explained, CLAUDE.md project rule template, end-to-end example session, troubleshooting.
  • Coding-Agents overview at /guides/coding/ — decision tree across the four usage paths (MCP + Claude Code hooks, MCP + Cursor rule, in-process CodingEngine, REST API).
  • Cortex overview at /guides/cortex/ and Cortex over REST API at /guides/cortex-rest/ — full coverage of the multi-agent stack including cluster, region, observer, and network endpoints with curl + httpx examples.
  • Guides landing page at /guides/ and consolidated nav.

Changed

  • docs/guides/compliance.md opens with an explicit layer table (library vs. cron vs. API) so it is clear which [compliance] capabilities need the [api,compliance] extra.

[0.5.1] — 2026-05-03

Documentation release. No code changes.

Added

  • llms.txt and llms-full.txt — machine-readable documentation indexes for AI search engines and LLM crawlers.
  • Architectural comparisons at /comparisons/ — head-to-head with mem0 (measured on LOCOMO), Letta, and LangChain Memory.
  • FAQ at /guides/faq/ — when to use semvec vs. mem0/Letta/LangChain Memory, GPU/offline/licensing/patent questions.
  • JSON-LD SoftwareApplication + SoftwareSourceCode schema in index.html for richer search-engine snippets.

Changed

  • PyPI metadata: expanded keywords and project URLs (FAQ, Quickstart, Comparisons, REST API reference, PyPI).

[0.5.0] — 2026-05-03

First production-stable release. Backward-compatible — every 0.4.x call site keeps working.

Added

  • Per-call meta= kwarg on SemvecState.update(). state.update(emb, text, meta={"confidence": 0.9, "source": "kg"}) lands the per-call dict on MemoryUnit.meta and travels through every snapshot. Symmetrical with the existing ComplianceState.update(meta=…) path.
  • include_adaptive_params=False privacy toggle on to_dict() / to_bytes(). Combined with the existing include_memory_text=False and include_literal_cache_text=False, snapshots can now be redacted along all three independent dimensions for hand-off to third-party support.
  • Ed25519 in sign_certificate / verify_certificate. Auto-detected from the loaded key — pass an Ed25519 PEM and you get a 64-byte signature; pass an RSA PEM and you keep the existing RSA-PSS-SHA256 path. Cross-algorithm verifies return False rather than raising.
  • Opt-in per-user embedding encryption. SqliteEventStore(encryption_seed=…) enables AES-GCM with HKDF-SHA256-derived per-user keys. An attacker who obtains a leaked backup can no longer recover the raw vectors. Default off (back-compat); the cosine query path keeps working transparently.
  • RetentionSweeper hooks. New rebuild_worker=, on_before_delete=, on_after_delete= kwargs mirror the FastAPI DELETE/forget enqueue pattern. Hook exceptions are swallowed and audit-logged so a misbehaving observer cannot take down a nightly retention run.
  • musllinux wheels for Linux (x86_64 + aarch64). pip install semvec now works on Alpine / k8s-slim / Lambda-custom-runtime without a compiler toolchain.
  • [jwt] extra for pyjwt>=2.9 — covers the issue-side of user JWTs without manual install. Verify-only path ([compliance]) is unchanged.
  • meta_filter= predicate on MultiResolutionMemory.get_relevant_memories. Caller-supplied Callable[[MemoryUnit], bool] runs after sort, before truncation — Source/Confidence policy can now be applied at retrieval time without post-filtering in Python.
  • protection_score= kwarg on inject_memory — bootstrap a state with persisted long-term decisions / invariants that survive selective forgetting.
  • Additional internal adaptive-tuning fields on SemvecConfig. Internal tuning surface; not part of the public configuration contract.
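
The ordering described for meta_filter= above (sort, then filter, then truncate) can be illustrated with a small standalone sketch. It does not use semvec: the `Unit` class and `top_k` function are hypothetical stand-ins, and only the order of the three steps is taken from the entry above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Unit:
    """Hypothetical stand-in for a memory unit; not semvec's MemoryUnit."""
    text: str
    score: float
    meta: dict = field(default_factory=dict)


def top_k(units: list, k: int,
          meta_filter: Optional[Callable[[Unit], bool]] = None) -> list:
    # 1. Sort by relevance, best first.
    ranked = sorted(units, key=lambda u: u.score, reverse=True)
    # 2. Apply the caller's predicate after sorting...
    if meta_filter is not None:
        ranked = [u for u in ranked if meta_filter(u)]
    # 3. ...and only then truncate, so rejected items do not use up slots.
    return ranked[:k]


units = [
    Unit("guess", 0.95, {"confidence": 0.2}),
    Unit("kg fact", 0.90, {"confidence": 0.9, "source": "kg"}),
    Unit("user note", 0.80, {"confidence": 0.8}),
]
trusted = top_k(units, k=2,
                meta_filter=lambda u: u.meta.get("confidence", 0.0) >= 0.5)
print([u.text for u in trusted])
```

Filtering before truncation is what lets a caller ask for k results and still get k results when the highest-scoring candidate fails the policy.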

Changed

  • ConsensusEngine.vote_on_proposal rejects unregistered voters. Pre-fix, the engine silently fell back to default_weight when a non-local voter wasn't registered via register_instance(…). It now raises a typed ValueError with a pointer to the registration call. The local-only path (voting_instance=None) is unchanged.

Documentation

  • New section in token reduction API: "When does the proxy pay for itself?" — explains the ~10-turn break-even point.
  • Compliance guide: explicit RSA-PSS-SHA256 with MGF1 note alongside the new Ed25519 path; KeyRegistry.register / rotate / revoke keyword-only example.
  • Core API: to_dict / to_bytes signatures expanded with all three privacy toggles plus a "Snapshot redaction" subsection with a worked example.
  • Correcting memories: ResonanceTrigger.weight picking-table (1.0 default / 2–5 specific / 6–10 hard pin / 0 input-isolation only).

[0.4.5] — 2026-05-02

Fixed

  • NegativeAttractor list now survives to_dict / from_dict / to_bytes / from_bytes round-trips. 0.4.4 added state.add_negative_attractor(...) but the persistence path silently dropped the list: a session snapshot taken after registering attractors restored to an empty list, so a coding agent that built up an "anti-pattern" library across sessions lost it on every restart. Pre-0.4.5 snapshots without the new key restore as an empty list (backward-compatible).

[0.4.4] — 2026-05-02

Added

  • Per-trigger weight= field on ResonanceTrigger (default 1.0, range [0, 10]). A corrected fact's trigger can now outrank topic-default triggers; weight=0 silences the boost while leaving the trigger active for input-isolation. The retrieval re-rank computes boost = γ · max_t(strength_t · weight_t).
  • Anti-resonance in standard retrieval. New state.add_negative_attractor(error_vector, description, source, severity), clear_negative_attractors(), and negative_attractor_count getter. Negative attractors now influence state.memory.get_relevant_memories(...) directly with a multiplicative penalty (1 − δ · max_strength) against candidates that align with any registered attractor above SemvecConfig.negative_attractor_threshold. Previously this was wired only into semvec.coding.CodingEngine. Default penalty δ is 0.5 (SemvecConfig.negative_attractor_penalty).
  • Per-call meta= kwarg on ComplianceState.update(emb, text, *, meta=None) — merges into default_meta with the per-call value winning on key conflicts. Source/Confidence-tagged events without rebuilding the wrapper.
  • DELETE / forget endpoints now enqueue a vector rebuild when set_compliance_dependencies(rebuild_worker=…) is configured. Pure-library callers without a FastAPI session manager keep the previous behaviour.
  • New guide: Correcting memories — covers the five mechanisms (Recency, Trigger weight, NegativeAttractor, Source/Confidence meta, Hard event delete) with code examples.
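
The multiplicative penalty (1 − δ · max_strength) from the anti-resonance entry above can be worked through in a few lines. This is a sketch under stated assumptions, not the library's code: it assumes "strength" is the cosine similarity between a candidate and an attractor, and the threshold value 0.7 is made up for illustration. Only δ = 0.5 comes from the entry.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def penalised_score(score, candidate, attractors, threshold=0.7, delta=0.5):
    # Assumption: strength is cosine similarity; threshold is hypothetical.
    strengths = [cosine(candidate, a) for a in attractors]
    # Only attractors above the threshold contribute; the strongest one wins.
    max_strength = max((s for s in strengths if s > threshold), default=0.0)
    return score * (1.0 - delta * max_strength)


attractors = [[1.0, 0.0]]
aligned = penalised_score(0.8, [1.0, 0.0], attractors)    # fully aligned
unrelated = penalised_score(0.8, [0.0, 1.0], attractors)  # orthogonal
print(aligned, unrelated)
```

With δ = 0.5, a candidate that aligns perfectly with an attractor keeps half of its score, and a candidate below the threshold is left untouched.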

Changed

  • The trigger boost loop no longer breaks on the first keyword match. With per-trigger weights, a heavier-weighted later trigger may produce a larger contribution than a saturated keyword match whose weight is small, so all triggers are now evaluated and the maximum contribution wins.
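
To see why evaluating every trigger matters, here is a small comparison of the two strategies using boost = γ · max_t(strength_t · weight_t). The function names and the value γ = 1.0 are assumptions for illustration; triggers are modelled as plain (strength, weight) pairs.

```python
def first_match_boost(triggers, gamma=1.0):
    """Old behaviour as described above: stop at the first trigger that matches."""
    for strength, weight in triggers:
        if strength > 0:
            return gamma * strength * weight
    return 0.0


def max_boost(triggers, gamma=1.0):
    """New behaviour: evaluate every trigger and keep the largest contribution."""
    return gamma * max((s * w for s, w in triggers), default=0.0)


# A saturated keyword match with a small weight, followed by a weaker
# match on a heavily weighted trigger (for example a corrected fact).
triggers = [(1.0, 0.5), (0.5, 5.0)]
print(first_match_boost(triggers), max_boost(triggers))
```

In this example the first-match strategy returns the contribution of the low-weight trigger, while the maximum picks up the heavier correction.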

[0.4.3] — 2026-05-02

Fixed

  • semvec.__version__ is now a single source of truth. The Python facade had a hard-coded __version__ = "0.4.1" literal that desynced from the actual wheel version when 0.4.2 shipped — pip show semvec reported 0.4.2 while import semvec; semvec.__version__ returned "0.4.1". The string is now imported from semvec._core.__version__, populated from CARGO_PKG_VERSION at compile time. CI guards against the regression.

[0.4.2] — 2026-05-01

Fixed

  • Python 3.10 compatibility. Several modules (semvec.api.models, semvec.api.middleware.compliance_auth, semvec.compliance.{audit,event_store,extractors,retention}) used from datetime import UTC, which only exists in Python 3.11+. On 3.10, the import raised ImportError on first use of the API or Compliance Pack. Replaced with timezone.utc everywhere; behaviour on 3.11+ is unchanged. Affects every 3.10 install of 0.4.0 / 0.4.1.
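
For code that has to run on both 3.10 and 3.11+, the portable spelling from the fix above looks like this in isolation (standard library only):

```python
from datetime import datetime, timezone

# `from datetime import UTC` requires Python 3.11+;
# `timezone.utc` is available on 3.10 as well and is the same object on 3.11+.
now = datetime.now(timezone.utc)
print(now.isoformat())
```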

Documentation

  • README and the documentation site reframed feature-first; patent appears once at the top of each, the rest reads as product documentation.
  • Documentation site published at https://semvec-docs.pages.dev.
  • HMAC middleware: documented that the query string is not part of the canonical request — sign the path only, treat query parameters as read-only-shape filters, do not put tamper-relevant input there.
  • POST /v1/compliance/users/{uid}/forget overrides the request-body reason field with the fixed value "user_request" before the certificate is signed — the cert is an operator-issued attestation, not user-supplied content. Callers that need a different reason use forget_user() from Python directly.

[0.4.1] — 2026-05-01

Fixed

  • SqliteEventStore(path=":memory:") now works end-to-end. Pre-fix, every store operation opened a fresh sqlite3.connect(":memory:"), so init_schema() and append() landed in disjoint ephemeral DBs and the very first append failed with no such table: memory_events. The :memory: branch now keeps a single connection alive for the store's lifetime, guarded by a threading.Lock so concurrent FastAPI workers cannot corrupt the DB. File-backed stores are unchanged.
  • /v1/compliance/users/{uid}/forget returns a typed 503 when the operator has not configured a compliance signing key. Pre-fix the endpoint deleted the user's events first and then failed with a generic 500 RuntimeError when sign_certificate could not find the private key — operator-side mis-configuration silently ate user data. forget_user() now resolves the private key before the delete; the endpoint returns HTTPException(503, detail="compliance_keypair_unconfigured").
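
The :memory: failure mode described above can be reproduced with the standard sqlite3 module alone. The `SharedMemoryStore` class below is a hypothetical sketch of the "one long-lived connection plus a lock" pattern, not semvec's SqliteEventStore, and the single-column schema is invented for brevity.

```python
import sqlite3
import threading

# Each connect(":memory:") call creates its own private, empty database.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE memory_events (id TEXT PRIMARY KEY)")
try:
    b.execute("INSERT INTO memory_events VALUES ('e1')")
    disjoint = False
except sqlite3.OperationalError as exc:
    disjoint = True  # the table created through `a` is invisible to `b`
    print(exc)


class SharedMemoryStore:
    """Hypothetical store that keeps one in-memory connection alive."""

    def __init__(self):
        # check_same_thread=False allows use from several threads;
        # the lock serialises access so statements do not interleave.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute(
                "CREATE TABLE memory_events (id TEXT PRIMARY KEY)")

    def append(self, event_id):
        with self._lock:
            self._conn.execute(
                "INSERT INTO memory_events VALUES (?)", (event_id,))
            self._conn.commit()

    def count(self):
        with self._lock:
            row = self._conn.execute(
                "SELECT COUNT(*) FROM memory_events").fetchone()
            return row[0]


store = SharedMemoryStore()
store.append("e1")
print(store.count())
```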

Documentation

  • README and Compliance guide clarify that [compliance] is the pure-Python extra (cryptography>=42) and that mounting the FastAPI compliance router needs [api] on top (pip install "semvec[api,compliance]").
  • KeyRegistry.register / rotate / revoke documented as keyword-only.

[0.4.0] — 2026-05-01

Major release: Compliance Pack (semvec.compliance).

Added

A new sub-package next to cortex and coding, adding the data-protection and cryptographic-verification layers that regulated tenants need on top of the base SemvecState. Every feature is gated behind a SEMVEC_ENABLE_* env var, all defaulting to off.

Foundations

  • EntityKind gains three new variants — numeric, date, identifier.
  • ComplianceConfig.from_env() reads five feature flags and two retention day counters.

Event store + replay

  • New MemoryEvent schema (UUID, tz-aware UTC, embedding, JSON-safe meta, optional source-event back-reference).
  • EventStore ABC + SqliteEventStore (file-backed, embeddings as JSON arrays). Cosine top-N via NumPy scan.
  • EventReplayService rebuilds SemvecState deterministically from the event log.
  • ComplianceState wrapper composes SemvecState and mirrors every successful update() into the store. Failures (dim mismatch, isolation reject) propagate without writing.

Retention + GDPR Art. 17

  • RetentionSweeper.sweep(retention_days=30) — idempotent purge with audit-log entries.
  • forget_user() — synchronous Art. 17 wipe + signed DeletionCertificate (RSA-PSS-SHA256). Always returns a certificate, even on an empty store.
  • Embedded public key shipped with the wheel so customers can verify_certificate(cert) without configuring anything.

Verbatim-precise facts

  • NumericFact / DateFact / IdFact dataclasses with Decimal, tz-aware datetime, and ISO-13616 IBAN mod-97 validation respectively. Pure regex — no LLM in the hot path.
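
For readers unfamiliar with the IBAN check, the mod-97 rule can be written in a few lines of plain Python. This is a generic sketch of the check, not the library's extractor: `iban_is_valid` is a hypothetical name, and per-country length tables are not applied.

```python
def iban_is_valid(iban: str) -> bool:
    """Generic IBAN mod-97 check (no per-country length validation)."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    # Move the country code and check digits to the end...
    rearranged = s[4:] + s[:4]
    # ...map letters to numbers (A=10 ... Z=35), digits stay as they are...
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # ...and the resulting integer must leave remainder 1 modulo 97.
    return int(digits) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
print(iban_is_valid("GB82 WEST 1234 5698 7654 33"))
```

Python integers are arbitrary precision, so the long digit string can be reduced modulo 97 directly without chunking.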

HMAC request signing + RS256 user JWT

  • AWS-SigV4-style canonical request, HMAC-SHA256, constant-time tag compare.
  • _internal_verify_user_rs256_jwt for per-user RS256 JWTs (private key never leaves the client device).
  • KeyRegistry Protocol + InMemoryKeyRegistry with register / rotate (24h grace) / revoke / lookup.
  • ComplianceHmacMiddleware enforces the full flow on every /v1/compliance/* request: mandatory headers, ±60 s timestamp window, path-user-id ↔ signed-user-id check, signature verify, nonce-replay check.
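
The signing side of this flow can be sketched with the standard library. The canonical-request layout below (method, path, timestamp, nonce, body hash joined by newlines) is an assumption made for illustration and is not the wire format the middleware uses. The ±60 s window and the constant-time comparison follow the entries above, and the nonce-replay check is left out.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 60  # the ±60 s timestamp window


def canonical_request(method, path, timestamp, nonce, body: bytes) -> bytes:
    # Hypothetical layout; the path is signed without a query string.
    body_hash = hashlib.sha256(body).hexdigest()
    parts = [method.upper(), path, str(timestamp), nonce, body_hash]
    return "\n".join(parts).encode("utf-8")


def sign(secret: bytes, canonical: bytes) -> str:
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()


def verify(secret, canonical, tag, timestamp, now=None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated request
    expected = sign(secret, canonical)
    # compare_digest runs in constant time to avoid leaking the tag via timing.
    return hmac.compare_digest(expected, tag)


secret = b"demo-secret"
ts = 1_000
req = canonical_request("post", "/v1/compliance/users/u1/forget",
                        ts, "nonce-1", b"{}")
tag = sign(secret, req)
print(verify(secret, req, tag, ts, now=1_030))
print(verify(secret, req, tag, ts, now=1_100))
```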

REST API

  • New router under /v1/compliance/users/{uid}/... exposing GET memory, DELETE memory[/event_id], POST forget, and GET facts?type=numeric|date|identifier. The forget endpoint serialises the signed DeletionCertificate so callers can verify it offline.

Async worker

  • InMemoryRebuildWorker decouples the post-DELETE rebuild from the request path. Single daemon thread, flush() test seam, shutdown() graceful-exit hook.
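
The shape described above (one daemon thread, a flush() seam for tests, a graceful shutdown()) maps naturally onto queue.Queue. The class below is a hypothetical sketch of that pattern and does not reproduce InMemoryRebuildWorker. The `enqueue` method and the `rebuild` callback are invented names.

```python
import queue
import threading


class RebuildWorkerSketch:
    """Hypothetical single-thread background worker."""

    def __init__(self, rebuild):
        self._rebuild = rebuild          # callable invoked once per job
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, user_id):
        self._jobs.put(user_id)          # returns immediately

    def _run(self):
        while True:
            user_id = self._jobs.get()
            try:
                if user_id is None:      # sentinel: stop the loop
                    return
                self._rebuild(user_id)
            except Exception:
                pass                     # a real worker would log this
            finally:
                self._jobs.task_done()

    def flush(self):
        self._jobs.join()                # block until the queue is drained

    def shutdown(self):
        self._jobs.put(None)
        self._thread.join()


done = []
worker = RebuildWorkerSketch(done.append)
worker.enqueue("u1")
worker.enqueue("u2")
worker.flush()
print(done)
worker.shutdown()
```

A single consumer thread processes jobs in FIFO order, which is why tests can call flush() and then assert on the result deterministically.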

Dependencies

  • New runtime dependency: cryptography>=42 (was already in [api] extras for RSA-PSS-SHA256).

[0.3.8] — 2026-04-30

Fixed

  • LiteralCache.clear() now wipes every field, not just entities. Pre-fix, clear() left decisions, invariants, error_patterns, test_history, and code_structures behind, so a follow-up record_* call appended to old data. The new semantics match the method name: a cleared cache is empty.
  • Bad input to SemvecState.update() now raises a typed ValueError instead of crashing the host process. Two cases covered: dimension mismatch between input_embedding and the configured state dimension (message tells you both numbers and how to fix); empty input_embedding. SemvecConfig(dimension=0) was already handled (raises ConfigurationError) — a regression test pins it.
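
The two rejected cases can be illustrated with a standalone validator. `check_embedding` is a hypothetical helper written for this example, and its messages are invented. Only the choice of ValueError and the two conditions come from the entry above.

```python
def check_embedding(input_embedding, dimension):
    """Hypothetical validator mirroring the two cases described above."""
    if len(input_embedding) == 0:
        raise ValueError("input_embedding is empty")
    if len(input_embedding) != dimension:
        raise ValueError(
            f"input_embedding has {len(input_embedding)} values but the "
            f"state is configured for dimension {dimension}"
        )


check_embedding([0.1, 0.2, 0.3], 3)  # accepted

errors = []
for bad in ([], [0.1, 0.2]):
    try:
        check_embedding(bad, 3)
    except ValueError as exc:
        errors.append(str(exc))
print(errors)
```

Call sites that previously had no way to recover can now wrap update() in an ordinary `except ValueError` block.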

Documentation

  • README documents the SemvecChatProxy break-even point (~10 turns) explicitly so very-short conversations don't trigger the proxy by default.

[0.3.7] — 2026-04-30

Changed

  • Pro and Enterprise license tiers now bypass the per-state rate limits on update() and calculate_*. Paying customers are no longer subject to throttles meant to discourage anonymous probing. The tier is read from SEMVEC_LICENSE_KEY once at SemvecState construction and cached. Community and anonymous (no license) users keep the existing limits (100/s update, 30/s calculate_*).
  • RateLimitError messages now state what to try (slow down, batch via update_batch(), shard across separate SemvecState instances, set a Pro/Enterprise license token) and where to upgrade.

Fixed

  • Privacy toggle now also covers the LiteralCache. The include_memory_text=False argument added in 0.3.6 only redacted the three memory tiers; state.to_dict() was still emitting every literal-cache entities[].value / context / key, plus decisions, invariants, error patterns, and code structures, in cleartext. New keyword-only argument include_literal_cache_text=True on to_dict() and to_bytes() (the default keeps the previous output, so existing callers are unaffected). Setting both flags to False produces a fully text-redacted snapshot:
# snippet: assumes a configured state
snap = state.to_dict(
    include_memory_text=False,
    include_literal_cache_text=False,
)

Embeddings, kind enum, timestamps, importances, and access counts always ride along — the redacted snapshot is still functionally restorable via SemvecState.from_dict() and retrieval against it works.


Earlier releases (0.3.0a1 through 0.3.6) were development iterations and are not part of the public history. Versions on PyPI before 0.3.7 have been removed.