Compliance Pack — Enterprise / Regulated Industries

Install

# Library-only use (event store, extractors, certificates,
# HMAC signing — no FastAPI):
pip install "semvec[compliance]"

# When you want the FastAPI router + middleware too:
pip install "semvec[api,compliance]"

The [compliance] extra pulls in cryptography>=42 for the DeletionCertificate signer and the RS256 user-JWT verifier. The FastAPI router (compliance_router) and middleware (ComplianceHmacMiddleware) live in semvec.api.* and need the heavier [api] extra (FastAPI, slowapi, SQLAlchemy, prometheus).

The Compliance Pack adds the cryptographic verification, retention, and selective-deletion layers that regulated tenants need on top of the base SemvecState. Every feature ships behind a SEMVEC_ENABLE_* environment variable, defaulting to off, so an existing deployment that imports semvec does not pick up new behaviour by accident.

What's in the pack

| Capability | Module | Why |
|---|---|---|
| Append-only event store | semvec.compliance.event_store | The 3-tier memory + EMA vector + literal cache become derived views, rebuildable from the events at any time. Single source of truth for what shaped the state. |
| Deterministic replay | semvec.compliance.event_replay | Two replays of the same event stream produce bit-identical semantic_states. Required for audit re-construction and after-deletion rebuilds. |
| Automatic 30-day retention | semvec.compliance.retention | Cron-friendly sweeper that purges anything older than retention_days and writes an audit record per affected user. |
| GDPR Art. 17 forget | semvec.compliance.retention.forget_user | Synchronous wipe + signed DeletionCertificate the customer can verify offline. |
| Verbatim-precise facts | semvec.compliance.extractors | Regex-based numeric / date / identifier extractors. Decimal precision; never roundtrips through float. Includes IBAN mod-97 checksum. |
| HMAC request signing | semvec.compliance.hmac_signing + api.middleware.compliance_auth | AWS-SigV4-style (METHOD, PATH, SHA256(body), TS, NONCE) canonical, HMAC-SHA256, constant-time verify, replay defence. |
| RS256 user JWT | semvec.compliance.rs256 + key_registry | Per-user public key registered server-side, private key never leaves the client. The server cannot forge tokens. |
| Async vector rebuild | semvec.compliance.workers.vector_rebuild | Decouples the post-DELETE replay from the request path: the API endpoint enqueues, the worker rebuilds, the session store gets the new vector. |
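
The first two capabilities above (an authoritative event log plus deterministic replay) can be illustrated with a small standalone sketch. It does not use the semvec API: the Event class, the replay() function, and the plain exponential-moving-average update are hypothetical stand-ins chosen for illustration, and the real update equation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int                   # position in the append-only log
    user_id: str
    vector: tuple[float, ...]  # embedding recorded with the event


def replay(events, dimension, alpha=0.1):
    """Fold events, in sequence order, into one EMA vector (assumed update rule)."""
    state = [0.0] * dimension
    for event in sorted(events, key=lambda e: e.seq):
        state = [(1 - alpha) * s + alpha * x for s, x in zip(state, event.vector)]
    return state


events = [
    Event(1, "user-42", (1.0, 0.0)),
    Event(2, "user-7", (0.0, 1.0)),
    Event(3, "user-42", (0.5, 0.5)),
]

# The derived vector depends only on the events and their sequence numbers,
# so two replays of the same stream give identical floats.
first = replay(events, dimension=2)
second = replay(list(reversed(events)), dimension=2)

# Forgetting a user means deleting their events and replaying what is left.
remaining = [e for e in events if e.user_id != "user-42"]
rebuilt = replay(remaining, dimension=2)
```

Because the fold always applies the same arithmetic in the same order, the sketch reproduces the "derived view" idea: the vector can be thrown away and rebuilt at any time, and a deletion only takes effect once the events themselves are gone.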

Quickstart

Wire an event-sourced state

from semvec import SemvecConfig
from semvec.compliance.event_store import SqliteEventStore
from semvec.compliance.state_proxy import ComplianceState

store = SqliteEventStore(path="events.sqlite")
store.init_schema()

state = ComplianceState(
    SemvecConfig(dimension=384),
    event_store=store,
    user_id="user-42",
    default_meta={"channel": "chat"},
)

# Every successful update appends a MemoryEvent. Failures (dim
# mismatch, isolation reject) propagate without writing.
state.update(my_embedder.get_embedding("Hello"), "Hello")

Extract verbatim facts

from semvec.compliance.extractors import extract_facts

text = "Mein Kontostand ist 1.247,38 € am 15.08.2026"
for fact in extract_facts(text):
    print(fact.kind, fact)
# numeric   NumericFact(value=Decimal('1247.38'), unit='EUR', ...)
# date      DateFact(value=datetime(2026, 8, 15, tzinfo=UTC), ...)

Decimal precision is enforced — Decimal('0.1') + Decimal('0.2') == Decimal('0.3') exactly. Float roundtrips are forbidden.
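
As a rough illustration of what "never roundtrips through float" means in practice, the sketch below parses a German-formatted amount straight into a Decimal and runs the standard IBAN mod-97 check. Both helpers (parse_german_amount, iban_is_valid) are hypothetical names written for this example; they are not the functions inside semvec.compliance.extractors.

```python
from decimal import Decimal


def parse_german_amount(text: str) -> Decimal:
    """Turn '1.247,38' into Decimal('1247.38') using string operations only."""
    normalized = text.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized)


def iban_is_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result modulo 97 to be 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


amount = parse_german_amount("1.247,38")
exact_sum = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # exact in Decimal
float_sum = 0.1 + 0.2 == 0.3                                   # not exact in float
valid = iban_is_valid("DE89 3704 0044 0532 0130 00")
invalid = iban_is_valid("DE89 3704 0044 0532 0130 01")
```

The float comparison is included only to show the failure the Decimal path avoids.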

Run the retention sweeper

from semvec.compliance.retention import RetentionSweeper

report = RetentionSweeper(store=store).sweep(retention_days=30)
print(report.deleted_total, report.deleted_per_user)

Idempotent — a second call with the same retention window is a no-op.
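
The idempotence property can be sketched against a throwaway SQLite table. The events table layout, the created_at column, and the sweep() helper below are assumptions made for the example, not the schema or implementation that RetentionSweeper uses.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def sweep(conn, retention_days, now):
    """Delete rows older than the cutoff; return the deleted count per user."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    rows = conn.execute(
        "SELECT user_id, COUNT(*) FROM events WHERE created_at < ? GROUP BY user_id",
        (cutoff,),
    ).fetchall()
    conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    conn.commit()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, created_at TEXT)")
now = datetime(2026, 8, 15, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("user-42", (now - timedelta(days=45)).isoformat()),
        ("user-42", (now - timedelta(days=31)).isoformat()),
        ("user-7", (now - timedelta(days=1)).isoformat()),
    ],
)

first = sweep(conn, 30, now)   # removes the two expired user-42 rows
second = sweep(conn, 30, now)  # nothing left to remove
left = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The second call is a no-op because the delete is defined purely by the cutoff: once every row older than the cutoff is gone, repeating the same sweep matches nothing.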

Issue a signed DeletionCertificate (GDPR Art. 17)

from semvec.compliance.audit import InMemoryAuditLog
from semvec.compliance.retention import forget_user

cert = forget_user(
    user_id="user-42",
    store=store,
    audit_log=InMemoryAuditLog(),
    issuer="versino-compliance",
)

# Customer-side verification (offline):
from semvec.compliance.certificates import verify_certificate
assert verify_certificate(cert)  # uses the wheel-embedded pubkey

The certificate's reason field is server-controlled. The POST /v1/compliance/users/{uid}/forget HTTP endpoint always writes reason="user_request" into the signed payload, even if the request body carries a different value (e.g. "reason":"user_request_dsgvo_art17"). This is intentional: the signed certificate is an attestation issued by the operator, so an arbitrary user-supplied string would dilute its evidentiary value. Use the forget_user() Python API directly if you need a custom reason (e.g. ttl_expired from a sweeper).
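
A minimal sketch of that rule, assuming the signed payload is canonical JSON with user_id and reason fields (the field set, the JSON encoding, and the build_signed_payload name are all hypothetical here):

```python
import json

SERVER_REASON = "user_request"  # fixed by the operator, never read from the client


def build_signed_payload(user_id: str, request_body: bytes) -> bytes:
    """Return the bytes that would be signed; any client-sent reason is dropped."""
    client_fields = json.loads(request_body or b"{}")
    client_fields.pop("reason", None)  # deliberately ignored
    payload = {"user_id": user_id, "reason": SERVER_REASON}
    # Sorted keys and compact separators keep the signed bytes stable.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


signed_bytes = build_signed_payload(
    "user-42", b'{"reason":"user_request_dsgvo_art17"}'
)
```

Whatever the caller sends, the bytes that reach the signer contain only operator-chosen values.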

The wheel ships with the operator's RSA-3072 public key embedded at build time (set the SEMVEC_COMPLIANCE_PUBKEY_PEM repository secret in CI). Customers can verify the certificate without any configuration. Operators on a self-managed deployment override the key via SEMVEC_COMPLIANCE_PUBKEY_FILE or SEMVEC_COMPLIANCE_PUBKEY_PEM.

Sign HTTP requests against the server

from semvec.compliance.hmac_signing import sign_request
from datetime import UTC, datetime
import secrets

body = b'{"reason":"user_request"}'
ts = datetime.now(UTC).isoformat()
nonce = secrets.token_hex(16)
signature = sign_request(
    secret=my_hmac_secret,
    method="POST",
    path="/v1/compliance/users/user-42/forget",
    body=body,
    timestamp=ts,
    nonce=nonce,
)

headers = {
    "X-Semvec-User-Id": "user-42",
    "X-Semvec-Key-Id": my_kid,
    "X-Semvec-Timestamp": ts,
    "X-Semvec-Nonce": nonce,
    "X-Semvec-Signature": signature,
}

Sign the path, not the URL. The middleware verifies against request.url.path only — the query string is not part of the canonical request. For GET /v1/compliance/users/user-42/facts?type=numeric the signing path is /v1/compliance/users/user-42/facts. Hitting the URL with the query string baked into the signed path produces a 401 bad_signature.

Practical consequence: do not put tamper-relevant input in the query string (?action=delete-style toggles). Filters that only shape the response (?type=numeric) are fine, because the worst a MitM can do is change the filter on a read-only request. A future release may include the canonical query string in the signed payload (AWS-SigV4 §3.2.4 style), which would be a breaking change to client signers; until then, call sites should limit query parameters to read-only filters that shape the response.
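
To make the path-only rule concrete, here is a standalone sketch of a canonical request built from (METHOD, PATH, SHA256(body), TS, NONCE) and signed with HMAC-SHA256. The newline-joined layout, the hex encoding, and the function names are assumptions for illustration; real clients should call sign_request, whose exact canonical format this sketch does not claim to reproduce.

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def canonical_request(method, url, body, timestamp, nonce):
    """Build the bytes to sign; only the URL path is used, never the query."""
    path = urlsplit(url).path
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, body_hash, timestamp, nonce]).encode()


def sign(secret, method, url, body, timestamp, nonce):
    message = canonical_request(method, url, body, timestamp, nonce)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


secret = b"demo-secret"
ts = "2026-08-15T12:00:00+00:00"
base = "/v1/compliance/users/user-42/facts"

sig_plain = sign(secret, "GET", base, b"", ts, "nonce-1")
sig_query = sign(secret, "GET", base + "?type=numeric", b"", ts, "nonce-1")
sig_body = sign(secret, "GET", base, b"tampered", ts, "nonce-1")
```

sig_plain and sig_query are identical because the query string is stripped before signing, which is exactly why a query parameter cannot carry anything whose integrity matters. Changing the body does change the signature.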

Mount the FastAPI middleware

from fastapi import FastAPI
from semvec.api.compliance_routes import (
    compliance_router,
    set_compliance_dependencies,
)
from semvec.api.middleware.compliance_auth import ComplianceHmacMiddleware
from semvec.compliance.audit import InMemoryAuditLog
from semvec.compliance.event_store import SqliteEventStore
from semvec.compliance.key_registry import InMemoryKeyRegistry
from semvec.compliance.nonce_cache import InMemoryNonceCache

store = SqliteEventStore(path="events.sqlite")
store.init_schema()
registry = InMemoryKeyRegistry()
nonce_cache = InMemoryNonceCache(window_seconds=60)

app = FastAPI()
app.add_middleware(
    ComplianceHmacMiddleware,
    registry=registry,
    nonce_cache=nonce_cache,
    protected_prefix="/v1/compliance",
)
set_compliance_dependencies(store=store, audit_log=InMemoryAuditLog())
app.include_router(compliance_router)

Failure modes the middleware enforces:

  • missing_signature — required X-Semvec-* headers absent.
  • timestamp_out_of_window — clock skew exceeds the configured window.
  • unknown_key — user/key pair not in the registry.
  • user_id_mismatch — signed user-id does not match the path's user-id.
  • bad_signature — HMAC verify failed.
  • nonce_replayed — same nonce already observed in the window (HTTP 409).
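
The failure modes can be pictured as a chain of checks. The sketch below is not the middleware's code: the check() function, the dictionary-based header fields, the canonical layout, and the order of the first four checks are assumptions. The one ordering taken from this page is that the signature is verified before the nonce is consumed. Nonce expiry after the window is left out for brevity.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=60)
REQUIRED = ("user_id", "key_id", "timestamp", "nonce", "signature")


def canonical(method, path, body, timestamp, nonce):
    # Assumed layout: newline-joined fields, body as a SHA-256 hex digest.
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method, path, body_hash, timestamp, nonce]).encode()


def check(fields, method, path, body, path_user_id, secrets_by_key, seen, now):
    """Return None on success, otherwise the failure code."""
    if any(not fields.get(name) for name in REQUIRED):
        return "missing_signature"
    try:
        ts = datetime.fromisoformat(fields["timestamp"])
    except ValueError:
        return "timestamp_out_of_window"
    if ts.tzinfo is None or abs(now - ts) > WINDOW:
        return "timestamp_out_of_window"
    secret = secrets_by_key.get((fields["user_id"], fields["key_id"]))
    if secret is None:
        return "unknown_key"
    if fields["user_id"] != path_user_id:
        return "user_id_mismatch"
    message = canonical(method, path, body, fields["timestamp"], fields["nonce"])
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), fields["signature"].encode()):
        return "bad_signature"
    # Only a correctly signed request gets to consume its nonce.
    if fields["nonce"] in seen:
        return "nonce_replayed"
    seen.add(fields["nonce"])
    return None


now = datetime(2026, 8, 15, 12, 0, tzinfo=timezone.utc)
keys = {("user-42", "kid-1"): b"demo-secret"}
path = "/v1/compliance/users/user-42/forget"
body = b'{"reason":"user_request"}'
ts = now.isoformat()
good = {
    "user_id": "user-42", "key_id": "kid-1", "timestamp": ts, "nonce": "n-1",
    "signature": hmac.new(
        keys[("user-42", "kid-1")], canonical("POST", path, body, ts, "n-1"),
        hashlib.sha256,
    ).hexdigest(),
}
forged = dict(good, signature="00" * 32)
seen = set()

results = [
    check(forged, "POST", path, body, "user-42", keys, seen, now),
    check(good, "POST", path, body, "user-42", keys, seen, now),
    check(good, "POST", path, body, "user-42", keys, seen, now),
]
```

In this run the forged request is rejected without touching the nonce set, the genuine request then succeeds with the same nonce, and a replay of the genuine request is caught.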

Runtime configuration

# Feature flags — every one defaults to off.
export SEMVEC_ENABLE_EVENT_STORE=1
export SEMVEC_ENABLE_RETENTION_SWEEPER=1
export SEMVEC_ENABLE_HMAC_SIGNING=1
export SEMVEC_ENABLE_RS256_JWT=1
export SEMVEC_ENABLE_NUMERIC_EXTRACTOR=1

# Retention windows.
export SEMVEC_RETENTION_DAYS_CHAT=30        # default 30
export SEMVEC_RETENTION_DAYS_AUDIT=2555     # default ~ 7 years

# DeletionCertificate keys.
export SEMVEC_COMPLIANCE_PRIVKEY_FILE=/path/to/compliance.priv.pem
# (Operators only; the matching public key is embedded in the wheel.)
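
For orientation, a default-off flag reader might look like the sketch below. The helper names and the rule that only the literal value "1" enables a feature are assumptions; the page does not state which values semvec itself accepts.

```python
import os


def flag_enabled(name, environ=os.environ):
    """Treat a SEMVEC_ENABLE_* variable as on only when it is exactly '1'."""
    return environ.get(name, "0").strip() == "1"


def retention_days(name, default, environ=os.environ):
    """Read a retention window in days, falling back to the documented default."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


demo_env = {
    "SEMVEC_ENABLE_EVENT_STORE": "1",
    "SEMVEC_RETENTION_DAYS_CHAT": "14",
}

event_store_on = flag_enabled("SEMVEC_ENABLE_EVENT_STORE", demo_env)
hmac_on = flag_enabled("SEMVEC_ENABLE_HMAC_SIGNING", demo_env)  # unset, so off
chat_days = retention_days("SEMVEC_RETENTION_DAYS_CHAT", 30, demo_env)
audit_days = retention_days("SEMVEC_RETENTION_DAYS_AUDIT", 2555, demo_env)
```

An unset variable reads as off, which matches the guarantee that importing semvec into an existing deployment changes nothing until a flag is exported.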

Architecture notes

  • Event store is authoritative; everything else is derived. A reset of the EMA vector or the 3-tier memory does not lose information — replay rebuilds them. A delete in the event store is the only way to genuinely forget something.
  • Replay never trips the rate limiter. The replay path uses the internal _internal_record_replay_step() accessor on SemvecState which skips the per-state community-tier limiter. Public update() keeps the limiter to discourage probing of the update equation.
  • HMAC verify is constant-time. subtle::ConstantTimeEq on the Rust side; the Python facade just forwards the bytes. Malformed signatures (wrong length, non-hex chars) return False instead of raising — never let a parser error escalate to a panic.
  • Body verify, then nonce. The middleware verifies the HMAC signature before it consumes the nonce, so a request that fails verification never burns a nonce. If a forged or corrupted request arrives first, the genuine client can still send its request with the same nonce.
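
The "return False instead of raising" behaviour from the constant-time note can be sketched in pure Python with hmac.compare_digest. This is an illustrative stand-in, not the Rust implementation behind the semvec facade; verify_signature is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, message: bytes, signature_hex: str) -> bool:
    """Constant-time HMAC-SHA256 check that never raises on malformed input."""
    try:
        provided = bytes.fromhex(signature_hex)
    except ValueError:
        # Non-hex characters or an odd number of digits: reject quietly.
        return False
    expected = hmac.new(secret, message, hashlib.sha256).digest()
    # compare_digest returns False for a length mismatch as well.
    return hmac.compare_digest(expected, provided)


secret = b"demo-secret"
message = b"POST\n/v1/compliance/users/user-42/forget"
good_sig = hmac.new(secret, message, hashlib.sha256).hexdigest()

ok = verify_signature(secret, message, good_sig)
truncated = verify_signature(secret, message, good_sig[:32])
not_hex = verify_signature(secret, message, "zz" * 32)
odd_length = verify_signature(secret, message, "abc")
```

Every malformed case takes the same exit as a wrong signature, so callers only ever see a boolean.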

Demo script

scripts/demo_compliance_pack.py walks every feature end-to-end. Runs in <2 s against a temporary SQLite store; uses the operator key at /mnt/c/Versino PsiOmega GmbH/semvec_pypi_private_key/compliance.priv.pem when available, otherwise mints an ephemeral key just for the demo.

SEMVEC_TELEMETRY=0 python scripts/demo_compliance_pack.py

Limitations

  • Single-process backends only by default. The SQLite event store and the in-memory nonce cache are fine for single-process / development / small deployments. For multi-replica production, swap in a Postgres + pgvector backend (the EventStore ABC pins the contract) and replace InMemoryNonceCache with a Redis or Postgres-backed cache. Both swaps are half-day ports against the existing tests.
  • HMAC secret bootstrap is on you. The Compliance Pack does not ship a "first key registration" flow. Customers exchange the HMAC secret with you out-of-band when they get their license JWT.
  • Replay can be slow on huge corpora. Re-folding a million events through SemvecState.update() is O(N). The async worker keeps the request path snappy, but the rebuild itself is still N steps. Future work: a merge-friendly checkpoint format that lets replays start from a snapshot.