2026-05-13

REST API (semvec[api])

The optional semvec[api] extra ships a FastAPI-based HTTP service that exposes Semvec's full feature surface plus a multi-layer multi-agent coordination stack. It is auth-gated by the bundled Ed25519 JWT licensing system — the same JWT already used for in-process licensing. No password store; no separate API-key table.

pip install "semvec[api]"
semvec serve --host 0.0.0.0 --port 8080
# or launch the app factory through uvicorn directly
python -m uvicorn semvec.api:create_app --factory --port 8080

Auth

Send the license JWT via either header:

Authorization: Bearer eyJhbGciOiJFZERTQSI...
# or
X-API-Key: eyJhbGciOiJFZERTQSI...
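
The two header forms above can be wrapped in a small client-side helper. This is a minimal sketch; `license_headers` is a hypothetical name and not part of Semvec.

```python
def license_headers(token: str, use_api_key: bool = False) -> dict[str, str]:
    """Return the header that carries the license JWT in one of the two accepted forms.

    Hypothetical helper for illustration; not part of the Semvec package.
    """
    token = token.strip()
    if not token:
        raise ValueError("license token must not be empty")
    if use_api_key:
        return {"X-API-Key": token}
    return {"Authorization": f"Bearer {token}"}
```

Pass the returned dict as the `headers` argument of whichever HTTP client you use.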

For local development, SEMVEC_ALLOW_ANONYMOUS=1 bypasses auth only when the wheel was built with the dev-anonymous Cargo feature. The official PyPI wheel ships without this feature, so every request requires a valid license JWT. To experiment locally without a license, either:

  1. Build from source: maturin develop --features dev-anonymous, or
  2. Issue yourself a short-TTL development license (see licensing).

Persistence

DATABASE_URL controls the SQLAlchemy engine. Default: sqlite:///semvec.db. Postgres is supported by setting e.g. DATABASE_URL=postgresql://user:pw@host/db. The hot semantic state lives in memory (SessionManager); the database stores only session/cluster/member/region/audit metadata.

Session lifecycle

The in-memory SessionManager enforces an idle-TTL plus a hard cap, so a long-running worker cannot leak unbounded SemvecState instances. Tunable per worker:

Variable Default What it does
SEMVEC_MAX_SESSIONS 10000 Hard cap on concurrent sessions per worker. On overflow the session that has been idle longest is evicted.
SEMVEC_SESSION_IDLE_TTL_S 1800 (30 min) A session that has not been touched for this long is eligible for eviction.
SEMVEC_SESSION_SWEEP_S 60 How often the background sweeper checks for expired sessions. Lower = more responsive, higher = less wake-up overhead.

Eviction is in-memory only. When a session is evicted, the SemvecState for that ID disappears from this worker — the SQLAlchemy row stays untouched, but the hot state has to be rebuilt from the next request (or restored via /v1/session/{session_id}/import). Persist proactively via GET /v1/session/{session_id}/export if you need a snapshot to survive eviction.
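
The idle-TTL plus hard-cap behaviour described above can be pictured with a toy model. This is an illustrative sketch only, not Semvec's SessionManager: the class name, method names, and the injectable clock are assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List


class ToySessionTable:
    """Illustrative idle-TTL + hard-cap table; NOT Semvec's SessionManager."""

    def __init__(self, max_sessions: int, idle_ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self.max_sessions = max_sessions
        self.idle_ttl_s = idle_ttl_s
        self._clock = clock
        self._state: Dict[str, Any] = {}
        self._last_touch: Dict[str, float] = {}

    def touch(self, session_id: str, state: Any = None) -> None:
        """Create or refresh a session, evicting the idlest one on overflow."""
        if session_id not in self._state and len(self._state) >= self.max_sessions:
            idlest = min(self._last_touch, key=self._last_touch.get)
            self._evict(idlest)
        if state is not None or session_id not in self._state:
            self._state[session_id] = state
        self._last_touch[session_id] = self._clock()

    def sweep(self) -> List[str]:
        """Evict every session idle for longer than the TTL; return their IDs."""
        now = self._clock()
        expired = [sid for sid, last in self._last_touch.items()
                   if now - last > self.idle_ttl_s]
        for sid in expired:
            self._evict(sid)
        return expired

    def _evict(self, session_id: str) -> None:
        # Only the in-memory entry goes away; a real deployment keeps the DB row.
        self._state.pop(session_id, None)
        self._last_touch.pop(session_id, None)

    def __contains__(self, session_id: str) -> bool:
        return session_id in self._state
```

In this model `sweep()` plays the role of the background sweeper that runs every SEMVEC_SESSION_SWEEP_S seconds, and `touch()` stands in for any request that reaches a session.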

Graceful SIGTERM drain. SessionManager.shutdown() is wired into FastAPI's lifespan. On SIGTERM the server stops accepting new requests, in-flight ones complete, the embedder client (and sidecar, when used) closes cleanly, then the session table empties. Behind a reverse proxy / load balancer this enables zero-error rolling restarts:

# Production-style restart: send SIGTERM, wait for the process to exit on its own,
# spawn the new worker. systemd's KillSignal=SIGTERM + TimeoutStopSec=60 does this
# for free.
kill -TERM $(cat /run/semvec.pid)

Performance characteristics

/v1/run is async-native end-to-end since the sharpening release:

  • Query + last-response embeds run in parallel when both are present on the same request.
  • License verification is LRU-cached (Ed25519 verify, 256 entries) and bypasses the FastAPI Depends() dispatcher via a dedicated ASGI middleware, so a typical /v1/run request no longer pays the verify cost twice.
  • CORS middleware is skipped when no CORS_ALLOW_ORIGINS is configured.
  • Threadpool default is 200 workers so synchronous sub-paths (e.g. cross-encoder reranks on CPU) do not starve other coroutines.
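
To illustrate the LRU-cached verification mentioned in the list above, here is a minimal sketch using functools.lru_cache with the same 256-entry size. The `expensive_verify` function is a stand-in, not a real Ed25519 check, and nothing here reflects how Semvec's middleware is actually written.

```python
from functools import lru_cache


def expensive_verify(token: str) -> bool:
    """Stand-in for a signature check; NOT a real Ed25519 verifier."""
    # A compact JWT has three dot-separated segments; that is all we test here.
    return token.count(".") == 2


@lru_cache(maxsize=256)
def cached_verify(token: str) -> bool:
    """Memoize verification results for the 256 most recently used tokens."""
    return expensive_verify(token)
```

A production cache would also have to account for token expiry, since a result cached while the token was valid must not outlive the token's `exp` claim.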

End-to-end measurements vs the 0.5.6 baseline on the same hardware:

Workload Δ throughput
Mixed /v1/run (store + retrieve) +63 %
QA-only flow (retrieve, no store) +772 %
Long-term tier-consolidation hot path +57 %
Single-pass similarity scoring +91 %

API surface is unchanged — these are infrastructure-level wins, not new endpoints or new request shapes.

Endpoint Overview

Sessions

Method Path Purpose
GET /v1/health liveness + active-session count (no auth)
POST /v1/run single-turn run: retrieve context + optionally store previous answer
POST /v1/store learn from an LLM response
POST /v1/session/create explicit session creation (optional template + policy vectors)
DELETE /v1/session/{session_id} delete a session
GET /v1/metrics/{session_id} full metrics snapshot. Convenience alias: GET /v1/state/metrics?session_id=… accepts the session id as a query parameter.
GET /v1/state/context?session_id=&top_k=&full_first=&max_text_chars= retrieve relevant memories; each item carries a memory_hash + truncated flag. The truncation cap is caller-controlled via max_text_chars (default 500, range 1–100 000). With full_first=true the top hit is returned untruncated regardless of the cap.
GET /v1/session/{session_id}/memories/{memory_hash} expand a single memory to full text + importance + access_count + timestamp

Session Control

Method Path Purpose
POST/DELETE /v1/session/{session_id}/trigger resonance triggers (keyword + embedding)
POST /v1/session/{session_id}/anchor drift anchors
GET /v1/session/{session_id}/anchor_score anchor score + drift threshold
PUT /v1/session/{session_id}/isolation isolation filter (OPEN / FILTER / QUARANTINE / LOCKDOWN)
POST /v1/session/{session_id}/isolation/release release quarantine
POST /v1/session/{session_id}/memory synthetic memory injection
GET /v1/session/{session_id}/export serialize with checksum
POST /v1/session/{session_id}/import restore from exported dict
POST /v1/session/{session_id}/verify behavioral consistency check

Cluster

Method Path Purpose
POST /v1/cluster/ create cluster (201); aggregation_mode = weighted_average or attention; coupling_factor ∈ [0, 1]
GET /v1/cluster/ list owned clusters
GET /v1/cluster/{cluster_id} state + aggregate_vector
DELETE /v1/cluster/{cluster_id} tears down backing session too
POST /v1/cluster/{cluster_id}/store seed Q&A into shared session
POST /v1/cluster/{cluster_id}/run query cluster session (cluster_id == session_id)
POST /v1/cluster/{cluster_id}/feedback blend aggregate back into members
POST/DELETE /v1/cluster/{cluster_id}/members / {session_id} membership CRUD

Region (Consensus)

Method Path Purpose
POST /v1/region/ create region (201); consensus_threshold, vote_window_seconds
GET /v1/region/ list owned
GET /v1/region/{region_id} state + last_realignment + recent drift events
DELETE /v1/region/{region_id} delete region + meta-session
POST/DELETE /v1/region/{region_id}/clusters / {cluster_id} attach/detach clusters
GET /v1/region/{region_id}/events?limit=20 recent drift events

Drift events are published internally when /run detects drift on a cluster-backing session. The DriftEventBus fans out to per-region callbacks; a realignment fires when the fraction of member clusters that vote within the rolling window exceeds the threshold.
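
The rolling-window vote can be sketched as follows. This is a toy model under stated assumptions: the class and parameter names are hypothetical, each drift event counts as one vote from its cluster, and timestamps are passed in non-decreasing order. It is not the DriftEventBus implementation.

```python
from collections import deque
from typing import Deque, Tuple


class ToyVoteWindow:
    """Illustrative rolling-window consensus check; NOT Semvec's DriftEventBus."""

    def __init__(self, member_count: int, threshold: float, window_s: float) -> None:
        if member_count < 1:
            raise ValueError("member_count must be at least 1")
        self.member_count = member_count
        self.threshold = threshold
        self.window_s = window_s
        self._votes: Deque[Tuple[float, str]] = deque()

    def vote(self, cluster_id: str, now: float) -> bool:
        """Record a drift vote; return True if a realignment should fire."""
        self._votes.append((now, cluster_id))
        # Drop votes that have fallen out of the rolling window.
        while self._votes and now - self._votes[0][0] > self.window_s:
            self._votes.popleft()
        # Repeated votes from the same cluster count once.
        voters = {cid for _, cid in self._votes}
        return len(voters) / self.member_count > self.threshold
```

With four member clusters and a threshold of 0.5, two distinct voters are not enough (0.5 is not greater than 0.5), while a third distinct voter inside the window triggers the realignment.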

Global Observer

Method Path Purpose
POST /v1/observer/ create or return existing (idempotent per license subject)
GET /v1/observer/summary observer state incl. anomaly_count
POST /v1/observer/sample trigger manual sample
GET /v1/observer/anomalies recent anomalies (newest first)
DELETE /v1/observer/anomalies clear anomaly log
POST/DELETE /v1/observer/regions / {region_id} register / unregister region

Anomaly types: cross_cluster_convergence (3+ clusters across ≥ 2 regions converged to the same non-initialization phase), systemic_drift (>50 % of observed clusters show drift indicators), cluster_divergence (cluster interaction_count >3× region average).
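
The three anomaly rules can be expressed as plain predicates. The sketch below is a rough illustration: the `ClusterObs` record, its field names, the literal phase name "initialization", and the choice to include a cluster in its own region average are all assumptions, not the observer's real data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterObs:
    """Hypothetical observation record; field names are illustrative."""
    cluster_id: str
    region_id: str
    phase: str
    drifting: bool
    interaction_count: int


def detect_anomalies(obs: List[ClusterObs]) -> List[str]:
    """Return the anomaly type names that the given observations trigger."""
    found: List[str] = []

    # cross_cluster_convergence: 3+ clusters across >= 2 regions share a
    # non-initialization phase.
    by_phase: Dict[str, List[ClusterObs]] = defaultdict(list)
    for o in obs:
        if o.phase != "initialization":
            by_phase[o.phase].append(o)
    if any(len(group) >= 3 and len({o.region_id for o in group}) >= 2
           for group in by_phase.values()):
        found.append("cross_cluster_convergence")

    # systemic_drift: more than 50 % of observed clusters show drift.
    if obs and sum(o.drifting for o in obs) / len(obs) > 0.5:
        found.append("systemic_drift")

    # cluster_divergence: interaction_count above 3x the region average.
    by_region: Dict[str, List[int]] = defaultdict(list)
    for o in obs:
        by_region[o.region_id].append(o.interaction_count)
    for o in obs:
        counts = by_region[o.region_id]
        if o.interaction_count > 3 * (sum(counts) / len(counts)):
            found.append("cluster_divergence")
            break

    return found
```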

Idempotency

Semvec ≤ 0.6.1 does not implement an Idempotency-Key header. Side-effecting POSTs (/v1/run, /v1/store, /v1/session/create, /v1/cluster/* writes, /v1/region/* writes, /v1/observer/*) are processed at-least-once: a client retry after a network timeout will re-apply the side effect.

Mitigations the operator owns until native support ships:

  • Generate session / cluster / region IDs client-side (UUID v4) and pass them in the request body where the schema accepts it. The server is (license_subject, id)-unique, so a retry with the same explicit ID returns 409 / 200 deterministically instead of creating a duplicate.
  • For /v1/run and /v1/store, hold a short-lived (session_id, content_hash) dedup map on the client and skip the retry if the previous attempt already completed.
  • Native idempotency support is on the roadmap; track it via the GitHub issue tracker.
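
The first two mitigations can be sketched on the client side as follows. `new_resource_id` and `StoreDeduper` are hypothetical names, and the `send` callable stands in for whatever function performs the actual HTTP POST.

```python
import hashlib
import uuid
from typing import Any, Callable, Dict, Tuple


def new_resource_id() -> str:
    """Generate a client-side UUID v4 to pass as an explicit resource ID."""
    return str(uuid.uuid4())


class StoreDeduper:
    """Client-side (session_id, content_hash) dedup map; illustrative only."""

    def __init__(self) -> None:
        self._done: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def key(session_id: str, content: str) -> Tuple[str, str]:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        return (session_id, digest)

    def send_once(self, session_id: str, content: str,
                  send: Callable[[str, str], Any]) -> Any:
        """Call send() unless an identical request already completed."""
        k = self.key(session_id, content)
        if k in self._done:
            return self._done[k]
        # If send() raises, nothing is recorded and the caller may retry.
        result = send(session_id, content)
        self._done[k] = result
        return result
```

This only suppresses retries of attempts known to have completed. A timeout where the server applied the side effect but the response was lost remains ambiguous, and a real map would also expire entries to stay short-lived.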

Audit events are not exposed via REST in semvec ≤ 0.6.1. Compliance routes (semvec[compliance]) exist internally but no /v1/audit/* HTTP endpoint is registered. Query the audit_log table directly via DATABASE_URL, or use the semvec.audit Python API (audit_log, audited) for programmatic access.

OpenAPI / interactive docs

The FastAPI app serves the standard schema and interactive docs with FastAPI defaults:

Path Purpose
GET /openapi.json OpenAPI 3.1 schema
GET /docs Swagger UI
GET /redoc ReDoc

All three are served behind LicenseAuthMiddleware; the public bypass list is only /v1/health and /metrics. To browse the schema you must send a valid license JWT (or, on a build with the dev-anonymous feature, run with SEMVEC_ALLOW_ANONYMOUS=1 for local development). If you need to expose /docs for an external auditor, terminate auth at a reverse proxy and have it inject the JWT, or generate a static HTML render of /openapi.json from a CI job and host it separately.

Pagination

There is no cursor-based pagination in semvec ≤ 0.6.1. Listing endpoints accept a limit query parameter with FastAPI-enforced ge/le bounds, but they do not emit X-Next-Cursor / X-Total-Estimate headers and the server keeps no scroll state — you cannot page past limit.

Endpoint limit default Min Max
GET /v1/region/{region_id}/events 20 1 1000
GET /v1/observer/anomalies 20 1 1000
GET /v1/session/{session_id}/entities (max_results) 20 1 1000
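
Because out-of-range values are rejected by validation, a client may want to clamp `limit` before sending. A minimal sketch for the region events endpoint, using the bounds from the table above; `events_path` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

LIMIT_MIN, LIMIT_MAX = 1, 1000  # bounds taken from the table above


def events_path(region_id: str, limit: int = 20) -> str:
    """Build the region events path with limit clamped into the accepted range."""
    clamped = max(LIMIT_MIN, min(LIMIT_MAX, limit))
    return f"/v1/region/{quote(region_id, safe='')}/events?" + urlencode({"limit": clamped})
```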

Listing endpoints that return the full owned set, unpaged:

  • GET /v1/cluster/ — every cluster owned by the calling license subject
  • GET /v1/region/ — every region owned by the calling license subject
  • GET /v1/network/users/active — single record; not affected

On multi-tenant or long-lived deployments these can grow unbounded. Until cursor pagination ships, cap them at the reverse-proxy layer (nginx client_max_body_size for safety + an explicit application-layer prune job) or shard licenses so no one subject ever owns more than a few thousand clusters/regions.

Rate-limit headers

The server does not emit X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers. A 429 response carries only Retry-After: 60 and {"detail": "Too many requests"}. Clients that need per-tier quota visibility have to derive it from their own license-tier metadata, not from response headers.
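
Since Retry-After is the only signal on a 429, a client can derive its back-off from that header alone. A minimal sketch; `retry_delay_s` is a hypothetical helper, and the 60-second fallback mirrors the value quoted above.

```python
from typing import Mapping, Optional


def retry_delay_s(status: int, headers: Mapping[str, str],
                  default_s: float = 60.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if not rate-limited."""
    if status != 429:
        return None
    # Header names are case-insensitive, so compare in lower case.
    raw = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if raw is None:
        return default_s
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Retry-After may also be an HTTP date; fall back to the default here.
        return default_s
```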

Pre-auth requests are not rate-limited. A client sending invalid licenses gets 401 Unauthorized per request with no IP-level shedding. A reverse-proxy WAF / nginx limit_req keyed on source IP is required for DoS protection against credential-stuffing or auth-flood patterns.

Per-tenant quotas

Two scopes coexist and they are not the same:

Concern Scope Source of truth
Resource ownership (sessions, clusters, regions, observers, entities) Per license_subject (sub claim of the license JWT) LicenseAuthMiddleware populates request.state.license; routes filter on license_subject(request)
Rate limiting Per remote IP (slowapi.util.get_remote_address) limiter = Limiter(key_func=get_remote_address) in semvec.api.routes

Implication: a single license shared across N hosts gets N × the per-IP quota; conversely, several licenses behind one NAT egress share one quota. If you need per-license rate limiting, terminate it at an API gateway in front of semvec serve and key the gateway's limiter on the JWT sub claim, not on the source IP.
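
A gateway that keys its limiter on the JWT sub claim needs to read that claim from the token. The sketch below decodes the payload segment of a compact JWT with the standard library. It deliberately does not verify the signature, so it is only suitable after the gateway has already validated the token; `jwt_subject_unverified` is a hypothetical helper name.

```python
import base64
import json


def jwt_subject_unverified(token: str) -> str:
    """Read the `sub` claim from a compact JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three segments")
    payload_b64 = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return str(claims["sub"])
```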

The community/Pro/Enterprise tier numbers documented in the licensing page describe the target enforcement, not the per-IP enforcement that ships in semvec ≤ 0.6.1. Until the limiter switches to license_subject keying, treat the tier numbers as a usage policy, not a server-side guarantee.

Network

Method Path Purpose
POST /v1/network/transfer semantic delta-vector transfer
POST /v1/network/users/switch switch user partition (saves current, activates target)
GET /v1/network/users/active currently active user
POST /v1/network/users/{user_id}/serialize serialize user partition
POST /v1/network/consensus propose consensus vector
GET /v1/network/consensus/trust current trust scores per instance

Literal cache

Method Path Purpose
POST /v1/session/{session_id}/entities store a verbatim code entity (201)
GET /v1/session/{session_id}/entities?q=&max_results= list / keyword-query
DELETE /v1/session/{session_id}/entities/{key:path} remove entity

Observability

/metrics exposes Prometheus metrics behind Basic Auth (METRICS_USER / METRICS_PASSWORD env vars). A request middleware collects semvec_requests_total{method, endpoint, status} and semvec_request_duration_seconds{method, endpoint} automatically.
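
A scraper or ad-hoc script has to send the Basic Auth credentials with each /metrics request. A minimal sketch of building that header with the standard library; `metrics_auth_header` is a hypothetical helper name.

```python
import base64


def metrics_auth_header(user: str, password: str) -> dict[str, str]:
    """Build the HTTP Basic Auth header for scraping /metrics."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

The arguments would normally be the same values the server reads from METRICS_USER and METRICS_PASSWORD.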

Error handling

All error responses carry a JSON body with a single detail field:

{"detail": "Session not found"}

Status When it fires
400 Malformed state-import payload (/v1/session/{session_id}/import), unknown aggregation_mode on cluster creation, unknown entity kind on literal-cache store.
401 Missing or invalid license JWT on any route except /v1/health. Also /metrics without valid Basic Auth.
402 License JWT signature is valid but the token is expired. Includes a "renew at …" hint pointing at https://www.semvec.io.
404 Session / cluster / region / observer / entity / memory not found, or caller's license subject does not own the resource (the server does not leak resource existence across tenants).
422 Pydantic validation failure — missing or out-of-range request field. The body conforms to FastAPI's standard {"detail": [{"loc": [...], "msg": "...", "type": "..."}]} shape.
429 Rate limit exceeded. The response carries Retry-After: 60. The Community/Pro/Enterprise QPS numbers come from the per-SemvecState in-process bucket (see licensing), not from a server-wide HTTP throttle; the per-IP versus per-license scoping is described in the section above. For DoS protection in front of semvec serve, terminate rate limiting at a reverse proxy.
500 Unhandled server error — logged via uvicorn access log with request ID. Investigate server logs.
503 /metrics endpoint hit without METRICS_USER / METRICS_PASSWORD env vars configured.

The detail string on 402 includes the upgrade URL; on 401 it distinguishes between "Missing license token" and "Invalid license: …"; on 404 it tells you whether the session or the specific sub-resource was missing.
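
A client can turn these responses into short log messages by branching on the status code and on the shape of detail. The following is a minimal sketch; `summarize_error` and its hint strings are illustrative and not part of any Semvec client library.

```python
from typing import Any, Dict

# Hypothetical hint texts, paraphrasing the status table above.
HINTS: Dict[int, str] = {
    401: "check the license JWT",
    402: "license expired, renew it",
    404: "not found or not owned by this license",
    429: "rate-limited, honor Retry-After",
    503: "metrics credentials not configured",
}


def summarize_error(status: int, body: Dict[str, Any]) -> str:
    """Turn an error response into a one-line message for logs."""
    detail = body.get("detail")
    if status == 422 and isinstance(detail, list):
        # FastAPI validation errors: one entry per offending field.
        parts = [
            ".".join(str(p) for p in err.get("loc", [])) + ": " + str(err.get("msg", ""))
            for err in detail
        ]
        return "validation failed: " + "; ".join(parts)
    return f"{status} {detail}: {HINTS.get(status, 'unexpected error')}"
```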

Minimal quickstart

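# Requires a running `semvec serve` on :8080. RunResponse exposes `session_id` and `context`.
import httpx

# The license JWT goes in X-API-Key (or Authorization: Bearer); the token here is truncated.
client = httpx.Client(
    base_url="http://localhost:8080/v1",
    headers={"X-API-Key": "eyJhbGciOiJFZERTQSI..."},
)

# Single-turn run: retrieve context for the message.
resp = client.post("/run", json={"message": "What is Kubernetes?"})
resp.raise_for_status()  # surface 401/402/422/429 instead of parsing an error body
run = resp.json()
sid = run["session_id"]

# Feed run["context"] to your LLM as the system prompt, then store its answer.
client.post("/store", json={"session_id": sid, "response": "Kubernetes..."}).raise_for_status()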

See also