REST API (semvec[api])¶
The optional semvec[api] extra ships a FastAPI-based HTTP service that exposes Semvec's full feature surface plus a multi-layer multi-agent coordination stack. It is auth-gated by the bundled Ed25519 JWT licensing system — the same JWT already used for in-process licensing. No password store; no separate API-key table.
pip install "semvec[api]"
semvec serve --host 0.0.0.0 --port 8080
# or programmatically
python -m uvicorn semvec.api:create_app --factory --port 8080
Auth¶
Send the license JWT via either header:
For local development, the wheel must be built with the dev-anonymous Cargo feature for SEMVEC_ALLOW_ANONYMOUS=1 to bypass auth. The official PyPI wheel ships without this feature; every request requires a valid license JWT. To experiment locally without a license, either:
- Build from source: maturin develop --features dev-anonymous, or
- Issue yourself a short-TTL development license (see licensing).
Persistence¶
DATABASE_URL controls the SQLAlchemy engine. Default: sqlite:///semvec.db. Postgres is supported by setting e.g. DATABASE_URL=postgresql://user:pw@host/db. The hot semantic state lives in-memory (SessionManager); SQLite stores only session/cluster/member/region/audit metadata.
Session lifecycle¶
The in-memory SessionManager enforces an idle-TTL plus a hard cap, so a long-running worker cannot leak unbounded SemvecState instances. Tunable per worker:
| Variable | Default | What it does |
|---|---|---|
| SEMVEC_MAX_SESSIONS | 10000 | Hard cap on concurrent sessions per worker. On overflow the LRU-on-idle-time session is evicted. |
| SEMVEC_SESSION_IDLE_TTL_S | 1800 (30 min) | A session that has not been touched for this long is eligible for eviction. |
| SEMVEC_SESSION_SWEEP_S | 60 | How often the background sweeper checks for expired sessions. Lower = more responsive, higher = less wake-up overhead. |
Eviction is in-memory only. When a session is evicted, the SemvecState for that ID disappears from this worker — the SQLAlchemy row stays untouched, but the hot state has to be rebuilt from the next request (or restored via /v1/session/{session_id}/import). Persist proactively via GET /v1/session/{session_id}/export if you need a snapshot to survive eviction.
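The interplay of the hard cap and the idle TTL can be modelled in a few lines. The following is a minimal sketch, not Semvec's SessionManager: the class name ToySessionTable, its methods, and the injectable clock are all hypothetical and exist only to illustrate "evict the longest-idle session on overflow, sweep anything older than the TTL".

```python
import time
from collections import OrderedDict


class ToySessionTable:
    """Illustrative model of idle-TTL + hard-cap eviction (not Semvec's code)."""

    def __init__(self, max_sessions=10000, idle_ttl_s=1800.0, clock=time.monotonic):
        self.max_sessions = max_sessions
        self.idle_ttl_s = idle_ttl_s
        self.clock = clock
        # session_id -> (state, last_touched); ordered longest-idle first.
        self._sessions = OrderedDict()

    def touch(self, session_id, state=None):
        """Create or refresh a session, evicting the longest-idle one on overflow."""
        now = self.clock()
        if session_id in self._sessions:
            old_state, _ = self._sessions.pop(session_id)
            state = old_state if state is None else state
        elif len(self._sessions) >= self.max_sessions:
            self._sessions.popitem(last=False)  # drop the longest-idle session
        self._sessions[session_id] = (state, now)

    def sweep(self):
        """Drop every session idle for longer than idle_ttl_s; return their IDs."""
        now = self.clock()
        expired = [sid for sid, (_, touched) in self._sessions.items()
                   if now - touched > self.idle_ttl_s]
        for sid in expired:
            del self._sessions[sid]
        return expired

    def __contains__(self, session_id):
        return session_id in self._sessions
```

In this model a background task would call sweep() every SEMVEC_SESSION_SWEEP_S seconds. An evicted ID simply stops being a member of the table, which mirrors the point above that only the hot state is lost.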
Graceful SIGTERM drain. SessionManager.shutdown() is wired into FastAPI's lifespan. On SIGTERM the server stops accepting new requests, in-flight ones complete, the embedder client (and sidecar, when used) closes cleanly, then the session table empties. Behind a reverse proxy / load balancer this enables zero-error rolling restarts:
# Production-style restart: send SIGTERM, wait for the process to exit on its own,
# spawn the new worker. systemd's KillSignal=SIGTERM + TimeoutStopSec=60 does this
# for free.
kill -TERM $(cat /run/semvec.pid)
Performance characteristics¶
/v1/run is async-native end-to-end since the sharpening release:
- Query + last-response embeds run in parallel when both are present on the same request.
- License verification is LRU-cached (Ed25519 verify, 256 entries) and bypasses the FastAPI Depends() dispatcher via a dedicated ASGI middleware, so a typical /v1/run no longer pays the verify cost twice.
- CORS middleware is skipped when no CORS_ALLOW_ORIGINS is configured.
- Threadpool default is 200 workers so synchronous sub-paths (e.g. cross-encoder reranks on CPU) do not starve other coroutines.
End-to-end measurements vs the 0.5.6 baseline on the same hardware:
| Workload | Δ throughput |
|---|---|
| Mixed /v1/run (store + retrieve) | +63 % |
| QA-only flow (retrieve, no store) | +772 % |
| Long-term tier-consolidation hot path | +57 % |
| Single-pass similarity scoring | +91 % |
API surface is unchanged — these are infrastructure-level wins, not new endpoints or new request shapes.
Endpoint Overview¶
Sessions¶
| Method | Path | Purpose |
|---|---|---|
| GET | /v1/health | liveness + active-session count (no auth) |
| POST | /v1/run | single-turn run: retrieve context + optionally store previous answer |
| POST | /v1/store | learn from an LLM response |
| POST | /v1/session/create | explicit session creation (optional template + policy vectors) |
| DELETE | /v1/session/{session_id} | delete a session |
| GET | /v1/metrics/{session_id} | full metrics snapshot. Convenience alias: GET /v1/state/metrics?session_id=… accepts the session id as a query parameter. |
| GET | /v1/state/context?session_id=&top_k=&full_first=&max_text_chars= | retrieve relevant memories; each item carries a memory_hash + truncated flag. The truncation cap is caller-controlled via max_text_chars (default 500, range 1–100 000). With full_first=true the top hit is returned untruncated regardless of the cap. |
| GET | /v1/session/{session_id}/memories/{memory_hash} | expand a single memory to full text + importance + access_count + timestamp |
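The truncation rules for /v1/state/context can be mirrored client-side to reason about what a given max_text_chars will return. This is a sketch of the documented semantics only: the function apply_text_cap and the "text" field name are assumptions, and the server's actual response shape may differ.

```python
def apply_text_cap(hits, max_text_chars=500, full_first=False):
    """Model the documented cap: truncate each hit, optionally sparing the top one."""
    if not 1 <= max_text_chars <= 100_000:
        raise ValueError("max_text_chars must be in 1..100000")
    out = []
    for index, hit in enumerate(hits):
        text = hit["text"]  # hypothetical field name
        keep_full = full_first and index == 0
        cut = (not keep_full) and len(text) > max_text_chars
        out.append({
            "memory_hash": hit["memory_hash"],
            "text": text[:max_text_chars] if cut else text,
            "truncated": cut,
        })
    return out
```

When an item comes back with truncated set, the memory_hash is what you would pass to GET /v1/session/{session_id}/memories/{memory_hash} to fetch the full text.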
Session Control¶
| Method | Path | Purpose |
|---|---|---|
| POST/DELETE | /v1/session/{session_id}/trigger | resonance triggers (keyword + embedding) |
| POST | /v1/session/{session_id}/anchor | drift anchors |
| GET | /v1/session/{session_id}/anchor_score | anchor score + drift threshold |
| PUT | /v1/session/{session_id}/isolation | isolation filter (OPEN / FILTER / QUARANTINE / LOCKDOWN) |
| POST | /v1/session/{session_id}/isolation/release | release quarantine |
| POST | /v1/session/{session_id}/memory | synthetic memory injection |
| GET | /v1/session/{session_id}/export | serialize with checksum |
| POST | /v1/session/{session_id}/import | restore from exported dict |
| POST | /v1/session/{session_id}/verify | behavioral consistency check |
Cluster¶
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/cluster/ | create cluster (201); aggregation_mode = weighted_average or attention; coupling_factor ∈ [0, 1] |
| GET | /v1/cluster/ | list owned clusters |
| GET | /v1/cluster/{cluster_id} | state + aggregate_vector |
| DELETE | /v1/cluster/{cluster_id} | tears down backing session too |
| POST | /v1/cluster/{cluster_id}/store | seed Q&A into shared session |
| POST | /v1/cluster/{cluster_id}/run | query cluster session (cluster_id == session_id) |
| POST | /v1/cluster/{cluster_id}/feedback | blend aggregate back into members |
| POST/DELETE | /v1/cluster/{cluster_id}/members / {session_id} | membership CRUD |
Region (Consensus)¶
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/region/ | create region (201); consensus_threshold, vote_window_seconds |
| GET | /v1/region/ | list owned |
| GET | /v1/region/{region_id} | state + last_realignment + recent drift events |
| DELETE | /v1/region/{region_id} | delete region + meta-session |
| POST/DELETE | /v1/region/{region_id}/clusters / {cluster_id} | attach/detach clusters |
| GET | /v1/region/{region_id}/events?limit=20 | recent drift events |
Drift events are published internally when /run detects drift on a cluster-backing session. The DriftEventBus fans out to per-region callbacks; a realignment fires when a fraction of members > threshold vote within the rolling window.
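The voting rule described above can be sketched as a rolling-window tally. This is an illustrative model, not the DriftEventBus: the class ToyDriftVotes is hypothetical, it assumes each cluster counts as one vote however many events it emits, and it assumes timestamps arrive in non-decreasing order. Only consensus_threshold and vote_window_seconds are names taken from the region API.

```python
from collections import deque


class ToyDriftVotes:
    """Rolling-window vote tally for one region (illustrative only)."""

    def __init__(self, member_count, consensus_threshold, vote_window_seconds):
        self.member_count = member_count
        self.consensus_threshold = consensus_threshold
        self.vote_window_seconds = vote_window_seconds
        self._votes = deque()  # (timestamp, cluster_id), oldest first

    def record(self, cluster_id, now):
        """Record a drift event; return True when a realignment should fire."""
        self._votes.append((now, cluster_id))
        cutoff = now - self.vote_window_seconds
        # Discard votes that have fallen out of the rolling window.
        while self._votes and self._votes[0][0] < cutoff:
            self._votes.popleft()
        voters = {cid for _, cid in self._votes}
        # Strictly greater than the threshold, matching "> threshold" above.
        return len(voters) / self.member_count > self.consensus_threshold
```

With four member clusters and a threshold of 0.5, two drifting clusters are not enough (0.5 is not greater than 0.5); the third vote inside the window triggers the realignment. Whether the real implementation clears the tally after firing is not specified here.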
Global Observer¶
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/observer/ | create or return existing (idempotent per license subject) |
| GET | /v1/observer/summary | observer state incl. anomaly_count |
| POST | /v1/observer/sample | trigger manual sample |
| GET | /v1/observer/anomalies | recent anomalies (newest first) |
| DELETE | /v1/observer/anomalies | clear anomaly log |
| POST/DELETE | /v1/observer/regions / {region_id} | register / unregister region |
Anomaly types: cross_cluster_convergence (3+ clusters across ≥ 2 regions converged to the same non-initialization phase), systemic_drift (>50 % of observed clusters show drift indicators), cluster_divergence (cluster interaction_count >3× region average).
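Two of these thresholds are simple enough to express directly. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it assumes the region average includes the cluster being tested, which the text above does not specify.

```python
def systemic_drift(drifting_flags):
    """True when more than 50 % of observed clusters show drift indicators."""
    flags = list(drifting_flags)
    return bool(flags) and sum(flags) / len(flags) > 0.5


def divergent_clusters(interaction_counts):
    """Return cluster IDs whose interaction_count exceeds 3x the region average."""
    if not interaction_counts:
        return []
    average = sum(interaction_counts.values()) / len(interaction_counts)
    return sorted(cid for cid, count in interaction_counts.items()
                  if count > 3 * average)
```

Because the outlier itself raises the average in this model, a region needs several quiet clusters before one busy cluster crosses the 3× line.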
Idempotency¶
Semvec ≤ 0.6.1 does not implement an Idempotency-Key header. Side-effecting POSTs (/v1/run, /v1/store, /v1/session/create, /v1/cluster/* writes, /v1/region/* writes, /v1/observer/*) are processed at-least-once: a client retry after a network timeout will re-apply the side effect.
Mitigations the operator owns until native support ships:
- Generate session / cluster / region IDs client-side (UUID v4) and pass them in the request body where the schema accepts it. The server is (license_subject, id)-unique, so a retry with the same explicit ID returns 409 / 200 deterministically instead of creating a duplicate.
- For /v1/run and /v1/store, hold a short-lived (session_id, content_hash) dedup map on the client and skip the retry if the previous attempt already completed.
- Native idempotency support is on the roadmap; track it via the GitHub issue tracker.
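The first two mitigations can be sketched on the client side as follows. Everything here is hypothetical scaffolding: RetryDedup, send_once, and the injected send callable are not part of Semvec, and SHA-256 is just one reasonable choice for the content hash.

```python
import hashlib
import time
import uuid


def new_resource_id() -> str:
    """Client-generated UUID v4 to pass as an explicit session/cluster/region ID."""
    return str(uuid.uuid4())


class RetryDedup:
    """Short-lived (session_id, content_hash) map that suppresses duplicate sends."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._done = {}  # (session_id, content_hash) -> completion time

    @staticmethod
    def content_hash(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def send_once(self, session_id, content, send):
        """Call send(session_id, content) unless an identical call already completed."""
        now = self.clock()
        # Forget entries older than the TTL so the map stays small.
        self._done = {k: t for k, t in self._done.items() if now - t <= self.ttl_s}
        key = (session_id, self.content_hash(content))
        if key in self._done:
            return None  # previous attempt completed; skip the retry
        result = send(session_id, content)  # if this raises, nothing is recorded
        self._done[key] = self.clock()
        return result
```

In practice send would wrap something like client.post("/store", ...). The map only knows about attempts that returned to the client, so a request that timed out after the server applied it is still retried; that residual window is what native idempotency keys would close.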
Audit events are not exposed via REST in semvec ≤ 0.6.1. Compliance routes (semvec[compliance]) exist internally but no /v1/audit/* HTTP endpoint is registered. Query the audit_log table directly via DATABASE_URL, or use the semvec.audit Python API (audit_log, audited) for programmatic access.
OpenAPI / interactive docs¶
The FastAPI app serves the standard schema and interactive docs with FastAPI defaults:
| Path | Purpose |
|---|---|
| GET /openapi.json | OpenAPI 3.1 schema |
| GET /docs | Swagger UI |
| GET /redoc | ReDoc |
All three are served behind LicenseAuthMiddleware — the public bypass list is only /v1/health and /metrics. To browse the schema you must send a valid license JWT (or run with SEMVEC_ALLOW_ANONYMOUS=1 for local development). If you need to expose /docs for an external auditor, terminate auth at a reverse proxy and have it inject the JWT, or generate a static HTML render of /openapi.json from a CI job and host it separately.
Pagination¶
There is no cursor-based pagination in semvec ≤ 0.6.1. Listing endpoints accept a limit query parameter with FastAPI-enforced ge/le bounds, but they do not emit X-Next-Cursor / X-Total-Estimate headers and the server keeps no scroll state — you cannot page past limit.
| Endpoint | limit default | Min | Max |
|---|---|---|---|
| GET /v1/region/{region_id}/events | 20 | 1 | 1000 |
| GET /v1/observer/anomalies | 20 | 1 | 1000 |
| GET /v1/session/{session_id}/entities (max_results) | 20 | 1 | 1000 |
Listing endpoints that return the full owned set, unpaged:
- GET /v1/cluster/: every cluster owned by the calling license subject
- GET /v1/region/: every region owned by the calling license subject
- GET /v1/network/users/active: single record; not affected
On multi-tenant or long-lived deployments these can grow unbounded. Until cursor pagination ships, cap them at the reverse-proxy layer (nginx client_max_body_size for safety + an explicit application-layer prune job) or shard licenses so no one subject ever owns more than a few thousand clusters/regions.
Rate-limit headers¶
The server does not emit X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers. A 429 response carries only Retry-After: 60 and {"detail": "Too many requests"}. Clients that need per-tier quota visibility have to derive it from their own license-tier metadata, not from response headers.
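Since Retry-After is the only signal a 429 carries, a client back-off can be as small as the helper below. This is a sketch: retry_delay_s is a hypothetical name, and headers is assumed to be any mapping with a get method (a plain dict here; a case-insensitive header object from an HTTP client would also work).

```python
def retry_delay_s(status_code, headers, default_s=60.0):
    """Seconds to wait before retrying, or None when no retry is indicated."""
    if status_code != 429:
        return None
    raw = headers.get("Retry-After")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not a number of seconds: fall back to the default.
        return default_s
```

A caller would sleep for the returned number of seconds and then resend. The 60-second default matches the value documented above, so the helper behaves sensibly even if a proxy strips the header.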
Pre-auth requests are not rate-limited. A client sending invalid
licenses gets 401 Unauthorized per request with no IP-level shedding.
A reverse-proxy WAF / nginx limit_req keyed on source IP is required
for DoS protection against credential-stuffing or auth-flood patterns.
Per-tenant quoting¶
Two scopes coexist and they are not the same:
| Concern | Scope | Source of truth |
|---|---|---|
| Resource ownership (sessions, clusters, regions, observers, entities) | Per license_subject (sub claim of the license JWT) | LicenseAuthMiddleware populates request.state.license; routes filter on license_subject(request) |
| Rate limiting | Per remote IP (slowapi.util.get_remote_address) | limiter = Limiter(key_func=get_remote_address) in semvec.api.routes |
Implication: a single license shared across N hosts gets N × the per-IP quota; conversely, several licenses behind one NAT egress share one quota. If you need per-license rate limiting, terminate it at an API gateway in front of semvec serve and key the gateway's limiter on the JWT sub claim, not on the source IP.
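A gateway-side key function for per-license limiting might look like the sketch below. It is an assumption, not part of Semvec: license_subject_unverified is a hypothetical helper that reads the sub claim without checking the signature, so it is only suitable as a bucketing key on a gateway that verifies the token separately or forwards it to semvec serve for verification.

```python
import base64
import json


def license_subject_unverified(token: str) -> str:
    """Read the `sub` claim from a JWT payload WITHOUT verifying the signature."""
    try:
        payload_b64 = token.split(".")[1]
        # JWTs use unpadded base64url; restore the padding before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
        return str(claims["sub"])
    except (IndexError, KeyError, ValueError, TypeError):
        # Malformed tokens all share one bucket instead of raising.
        return "anonymous"
```

Keying on an unverified claim lets a client with forged tokens spread its traffic across buckets, so pairing this with an IP-level limit for requests that later fail auth is a reasonable precaution.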
The Community/Pro/Enterprise tier numbers documented in the licensing page describe the target enforcement, not the per-IP enforcement that ships in semvec ≤ 0.6.1. Until the limiter switches to license_subject keying, treat the tier numbers as a usage policy, not a server-side guarantee.
Network¶
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/network/transfer | semantic delta-vector transfer |
| POST | /v1/network/users/switch | switch user partition (saves current, activates target) |
| GET | /v1/network/users/active | currently active user |
| POST | /v1/network/users/{user_id}/serialize | serialize user partition |
| POST | /v1/network/consensus | propose consensus vector |
| GET | /v1/network/consensus/trust | current trust scores per instance |
Literal cache¶
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/session/{session_id}/entities | store a verbatim code entity (201) |
| GET | /v1/session/{session_id}/entities?q=&max_results= | list / keyword-query |
| DELETE | /v1/session/{session_id}/entities/{key:path} | remove entity |
Observability¶
/metrics exposes Prometheus metrics behind Basic Auth (METRICS_USER / METRICS_PASSWORD env vars). A request middleware collects semvec_requests_total{method, endpoint, status} and semvec_request_duration_seconds{method, endpoint} automatically.
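For an ad-hoc scrape outside Prometheus, the Basic Auth header can be built from the same environment variables. This is a sketch: metrics_auth_header is a hypothetical helper, and reading the credentials from the client's environment is an assumption about how you deploy, not something the server requires.

```python
import base64
import os


def metrics_auth_header(user=None, password=None):
    """Build an HTTP Basic Authorization header for the /metrics endpoint."""
    if user is None:
        user = os.environ.get("METRICS_USER", "")
    if password is None:
        password = os.environ.get("METRICS_PASSWORD", "")
    if not user or not password:
        raise RuntimeError("METRICS_USER / METRICS_PASSWORD are not set")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dict can be passed as headers to any HTTP client when requesting /metrics. Most clients, httpx included, also accept a (user, password) tuple through their auth parameter, which achieves the same thing.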
Error handling¶
All error responses carry a JSON body with a single detail field:
| Status | When it fires |
|---|---|
| 400 | Malformed state-import payload (/v1/session/{session_id}/import), unknown aggregation_mode on cluster creation, unknown entity kind on literal-cache store. |
| 401 | Missing or invalid license JWT on any route except /v1/health. Also /metrics without valid Basic Auth. |
| 402 | License JWT signature is valid but the token is expired. Includes a "renew at …" hint pointing at https://www.semvec.io. |
| 404 | Session / cluster / region / observer / entity / memory not found, or caller's license subject does not own the resource (the server does not leak resource existence across tenants). |
| 422 | Pydantic validation failure — missing or out-of-range request field. The body conforms to FastAPI's standard {"detail": [{"loc": [...], "msg": "...", "type": "..."}]} shape. |
| 429 | Rate-limit exceeded. Response carries Retry-After: 60. The Community/Pro/Enterprise QPS numbers come from the per-SemvecState in-process bucket (see licensing), not from a server-wide HTTP throttle — see Per-tenant quoting above. For DoS protection in front of semvec serve, terminate rate-limiting at a reverse proxy. |
| 500 | Unhandled server error — logged via uvicorn access log with request ID. Investigate server logs. |
| 503 | /metrics endpoint hit without METRICS_USER / METRICS_PASSWORD env vars configured. |
The detail string on 402 includes the upgrade URL; on 401 it distinguishes between "Missing license token" and "Invalid license: …"; on 404 it tells you whether the session or the specific sub-resource was missing.
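Client code can turn the table above into a small dispatch step. The sketch below is one possible mapping, not a prescribed one: classify_error and the action labels are hypothetical, and only the status codes and the detail shapes come from this page.

```python
def classify_error(status_code, body):
    """Map an error response to a (suggested_action, message) pair."""
    detail = body.get("detail") if isinstance(body, dict) else None
    if status_code == 422 and isinstance(detail, list):
        # FastAPI validation errors: report the offending field paths.
        fields = [".".join(str(part) for part in item.get("loc", []))
                  for item in detail]
        return ("fix_request", "; ".join(fields))
    actions = {
        400: "fix_request",
        401: "check_license",
        402: "renew_license",
        404: "not_found",
        429: "retry_later",
        500: "report",
        503: "configure_metrics",
    }
    return (actions.get(status_code, "unknown"), str(detail))
```

For example, a 402 maps to "renew_license" with the server's hint as the message, while a 422 yields the dotted path of each invalid field, such as body.message.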
Minimal quickstart¶
import httpx

client = httpx.Client(
    base_url="http://localhost:8080/v1",
    headers={"X-API-Key": "eyJhbGciOiJFZERTQSI..."},
)

run = client.post("/run", json={"message": "What is Kubernetes?"}).json()
sid = run["session_id"]

# feed to your LLM with run["context"] as the system prompt ...
client.post("/store", json={"session_id": sid, "response": "Kubernetes..."})
See also¶
- Quickstart — 5-minute REST + library walk-through
- Cortex over REST — user-guide page that contextualises this API
- Architecture — abstract component model