REST API (`semvec[api]`)¶

The optional semvec[api] extra ships a FastAPI-based HTTP service that exposes Semvec's full feature surface plus a multi-layer multi-agent coordination stack. It is auth-gated by the bundled Ed25519 JWT licensing system — the same JWT already used for in-process licensing. No password store; no separate API-key table.

pip install "semvec[api]"
semvec serve --host 0.0.0.0 --port 8080
# or run the app factory directly with uvicorn
python -m uvicorn semvec.api:create_app --factory --port 8080

Auth¶

Send the license JWT via either header:

Authorization: Bearer eyJhbGciOiJFZERTQSI...
# or
X-API-Key: eyJhbGciOiJFZERTQSI...
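As a small illustration of the two header forms above, the following sketch builds either one from a token string. The helper name `license_headers` is hypothetical and not part of Semvec; only the header names come from this page.

```python
def license_headers(token: str, use_bearer: bool = True) -> dict:
    """Return the header that carries the license JWT.

    Hypothetical helper for illustration; the server accepts either form.
    """
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-API-Key": token}


# Placeholder token; substitute your real license JWT.
print(license_headers("<license-jwt>"))
print(license_headers("<license-jwt>", use_bearer=False))
```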

For local development, SEMVEC_ALLOW_ANONYMOUS=1 bypasses auth only if the wheel was built with the dev-anonymous Cargo feature. The official PyPI wheel ships without this feature, so every request requires a valid license JWT. To experiment locally, either:

Build from source: maturin develop --features dev-anonymous, or
Issue yourself a short-TTL development license (see licensing).

Persistence¶

DATABASE_URL controls the SQLAlchemy engine. Default: sqlite:///semvec.db. Postgres is supported by setting e.g. DATABASE_URL=postgresql://user:pw@host/db. The hot semantic state lives in-memory (SessionManager); SQLite stores only session/cluster/member/region/audit metadata.
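The lookup described above can be sketched as follows. This is an assumption about how a deployment script might inspect the setting, not Semvec's own code; the function name `resolve_database_url` is hypothetical, and only the variable name and default value come from this page.

```python
import os
from urllib.parse import urlsplit

DEFAULT_DATABASE_URL = "sqlite:///semvec.db"  # default stated in this section


def resolve_database_url(env=None):
    """Return (url, backend) using DATABASE_URL or the documented default."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL", DEFAULT_DATABASE_URL)
    # SQLAlchemy URLs may carry a driver suffix such as "postgresql+psycopg2";
    # keep only the backend part before the "+".
    backend = urlsplit(url).scheme.split("+")[0]
    return url, backend


print(resolve_database_url({}))
print(resolve_database_url({"DATABASE_URL": "postgresql://user:pw@host/db"}))
```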

Session lifecycle¶

The in-memory SessionManager enforces an idle-TTL plus a hard cap, so a long-running worker cannot leak unbounded SemvecState instances. Tunable per worker:

Variable	Default	What it does
`SEMVEC_MAX_SESSIONS`	`10000`	Hard cap on concurrent sessions per worker. On overflow the LRU-on-idle-time session is evicted.
`SEMVEC_SESSION_IDLE_TTL_S`	`1800` (30 min)	A session that has not been touched for this long is eligible for eviction.
`SEMVEC_SESSION_SWEEP_S`	`60`	How often the background sweeper checks for expired sessions. Lower = more responsive, higher = less wake-up overhead.

Eviction is in-memory only. When a session is evicted, the SemvecState for that ID disappears from this worker — the SQLAlchemy row stays untouched, but the hot state has to be rebuilt from the next request (or restored via /v1/session/{session_id}/import). Persist proactively via GET /v1/session/{session_id}/export if you need a snapshot to survive eviction.
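To make the interaction between the hard cap and the idle TTL concrete, here is a toy model of the policy described above. It is not the real SessionManager: the class `ToySessionTable` and its methods are hypothetical, timestamps are passed in explicitly, and touches are assumed to arrive in non-decreasing time order.

```python
from collections import OrderedDict


class ToySessionTable:
    """Illustrative idle-TTL plus hard-cap table; not Semvec's SessionManager."""

    def __init__(self, max_sessions=10000, idle_ttl_s=1800.0):
        self.max_sessions = max_sessions
        self.idle_ttl_s = idle_ttl_s
        # session_id -> last-touch time, ordered from longest idle to most recent
        self._last_touch = OrderedDict()

    def touch(self, session_id, now):
        # Re-inserting at the end keeps the dict ordered by idle time.
        self._last_touch.pop(session_id, None)
        self._last_touch[session_id] = now
        # Over the hard cap: evict the session that has been idle longest.
        while len(self._last_touch) > self.max_sessions:
            self._last_touch.popitem(last=False)

    def sweep(self, now):
        """Evict sessions idle for at least idle_ttl_s; return the evicted IDs."""
        evicted = []
        while self._last_touch:
            sid, last = next(iter(self._last_touch.items()))
            if now - last < self.idle_ttl_s:
                break  # everything after this entry was touched more recently
            self._last_touch.popitem(last=False)
            evicted.append(sid)
        return evicted

    def __contains__(self, session_id):
        return session_id in self._last_touch


table = ToySessionTable(max_sessions=2, idle_ttl_s=10)
table.touch("a", now=0)
table.touch("b", now=1)
table.touch("c", now=2)      # cap exceeded, "a" is dropped
print("a" in table)          # False
print(table.sweep(now=11))   # ['b']
```

In this model an evicted ID simply vanishes from the table, which mirrors the point above: anything that must survive eviction has to be exported beforehand.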

Graceful SIGTERM drain. SessionManager.shutdown() is wired into FastAPI's lifespan. On SIGTERM the server stops accepting new requests, in-flight ones complete, the embedder client (and sidecar, when used) closes cleanly, then the session table empties. Behind a reverse proxy / load balancer this enables zero-error rolling restarts:

# Production-style restart: send SIGTERM, wait for the process to exit on its own,
# spawn the new worker. systemd's KillSignal=SIGTERM + TimeoutStopSec=60 does this
# for free.
kill -TERM $(cat /run/semvec.pid)

Performance characteristics¶

/v1/run is async-native end-to-end since the sharpening release:

Query + last-response embeds run in parallel when both are present on the same request.
License verification is LRU-cached (Ed25519 verify, 256 entries) and bypasses the FastAPI Depends() dispatcher via a dedicated ASGI middleware — typical /v1/run no longer pays the verify cost twice.
CORS middleware is skipped when no CORS_ALLOW_ORIGINS is configured.
Threadpool default is 200 workers so synchronous sub-paths (e.g. cross-encoder reranks on CPU) do not starve other coroutines.

End-to-end measurements vs the 0.5.6 baseline on the same hardware:

Workload	Δ throughput
Mixed `/v1/run` (store + retrieve)	+63 %
QA-only flow (retrieve, no store)	+772 %
Long-term tier-consolidation hot path	+57 %
Single-pass similarity scoring	+91 %

API surface is unchanged — these are infrastructure-level wins, not new endpoints or new request shapes.

Endpoint Overview¶

Sessions¶

Method	Path	Purpose
GET	`/v1/health`	liveness + active-session count (no auth)
POST	`/v1/run`	single-turn run: retrieve context + optionally store previous answer
POST	`/v1/store`	learn from an LLM response
POST	`/v1/session/create`	explicit session creation (optional template + policy vectors)
DELETE	`/v1/session/{session_id}`	delete a session
GET	`/v1/metrics/{session_id}`	full metrics snapshot. Convenience alias: `GET /v1/state/metrics?session_id=…` accepts the session id as a query parameter.
GET	`/v1/state/context?session_id=&top_k=&full_first=&max_text_chars=`	retrieve relevant memories; each item carries a `memory_hash` + `truncated` flag. The truncation cap is caller-controlled via `max_text_chars` (default 500, range 1–100 000). With `full_first=true` the top hit is returned untruncated regardless of the cap.
GET	`/v1/session/{session_id}/memories/{memory_hash}`	expand a single memory to full text + importance + access_count + timestamp

Session Control¶

Method	Path	Purpose
POST/DELETE	`/v1/session/{session_id}/trigger`	resonance triggers (keyword + embedding)
POST	`/v1/session/{session_id}/anchor`	drift anchors
GET	`/v1/session/{session_id}/anchor_score`	anchor score + drift threshold
PUT	`/v1/session/{session_id}/isolation`	isolation filter (`OPEN` / `FILTER` / `QUARANTINE` / `LOCKDOWN`)
POST	`/v1/session/{session_id}/isolation/release`	release quarantine
POST	`/v1/session/{session_id}/memory`	synthetic memory injection
GET	`/v1/session/{session_id}/export`	serialize with checksum
POST	`/v1/session/{session_id}/import`	restore from exported dict
POST	`/v1/session/{session_id}/verify`	behavioral consistency check

Cluster¶

Method	Path	Purpose
POST	`/v1/cluster/`	create cluster (201); `aggregation_mode` = `weighted_average` or `attention`; `coupling_factor` ∈ [0, 1]
GET	`/v1/cluster/`	list owned clusters
GET	`/v1/cluster/{cluster_id}`	state + aggregate_vector
DELETE	`/v1/cluster/{cluster_id}`	tears down backing session too
POST	`/v1/cluster/{cluster_id}/store`	seed Q&A into shared session
POST	`/v1/cluster/{cluster_id}/run`	query cluster session (cluster_id == session_id)
POST	`/v1/cluster/{cluster_id}/feedback`	blend aggregate back into members
POST/DELETE	`/v1/cluster/{cluster_id}/members` / `{session_id}`	membership CRUD

Region (Consensus)¶

Method	Path	Purpose
POST	`/v1/region/`	create region (201); `consensus_threshold`, `vote_window_seconds`
GET	`/v1/region/`	list owned
GET	`/v1/region/{region_id}`	state + last_realignment + recent drift events
DELETE	`/v1/region/{region_id}`	delete region + meta-session
POST/DELETE	`/v1/region/{region_id}/clusters` / `{cluster_id}`	attach/detach clusters
GET	`/v1/region/{region_id}/events?limit=20`	recent drift events

Drift events are published internally when /run detects drift on a cluster-backing session. The DriftEventBus fans out to per-region callbacks; a realignment fires when the fraction of member clusters that vote within the rolling window exceeds the threshold.
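The voting rule can be sketched as a rolling window over drift events. This is a toy model, not the DriftEventBus: the class `ToyRegionVotes` is hypothetical, it assumes each cluster counts once per window, and it assumes a strict "greater than" comparison against `consensus_threshold`.

```python
from collections import deque


class ToyRegionVotes:
    """Illustrative rolling-window vote counter for one region."""

    def __init__(self, member_count, consensus_threshold, vote_window_seconds):
        self.member_count = member_count
        self.consensus_threshold = consensus_threshold
        self.vote_window_seconds = vote_window_seconds
        self._votes = deque()  # (timestamp, cluster_id), oldest first

    def record_drift(self, cluster_id, now):
        """Record a drift vote; return True if a realignment would fire."""
        self._votes.append((now, cluster_id))
        # Drop votes that have fallen out of the rolling window.
        while self._votes and now - self._votes[0][0] > self.vote_window_seconds:
            self._votes.popleft()
        if self.member_count <= 0:
            return False
        voters = {cid for _, cid in self._votes}
        return len(voters) / self.member_count > self.consensus_threshold


region = ToyRegionVotes(member_count=4, consensus_threshold=0.5, vote_window_seconds=60)
print(region.record_drift("c1", now=0))   # False (1 of 4)
print(region.record_drift("c2", now=10))  # False (2 of 4 is not above 0.5)
print(region.record_drift("c3", now=20))  # True  (3 of 4)
```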

Global Observer¶

Method	Path	Purpose
POST	`/v1/observer/`	create or return existing (idempotent per license subject)
GET	`/v1/observer/summary`	observer state incl. anomaly_count
POST	`/v1/observer/sample`	trigger manual sample
GET	`/v1/observer/anomalies`	recent anomalies (newest first)
DELETE	`/v1/observer/anomalies`	clear anomaly log
POST/DELETE	`/v1/observer/regions` / `{region_id}`	register / unregister region

Anomaly types: cross_cluster_convergence (3+ clusters across ≥ 2 regions converged to the same non-initialization phase), systemic_drift (>50 % of observed clusters show drift indicators), cluster_divergence (cluster interaction_count >3× region average).
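Two of the thresholds above are simple enough to express directly. The sketch below is an interpretation, not the observer's code: the function names are hypothetical, and it assumes the region average includes the cluster being tested.

```python
def systemic_drift(drift_flags):
    """True when more than 50 % of observed clusters show drift indicators."""
    if not drift_flags:
        return False
    return sum(1 for flag in drift_flags if flag) / len(drift_flags) > 0.5


def divergent_clusters(interaction_counts):
    """Return cluster IDs whose interaction_count exceeds 3x the region average."""
    if not interaction_counts:
        return []
    average = sum(interaction_counts.values()) / len(interaction_counts)
    return sorted(cid for cid, n in interaction_counts.items() if n > 3 * average)


print(systemic_drift([True, True, False]))                              # True
print(divergent_clusters({"a": 1, "b": 1, "c": 1, "d": 1, "e": 100}))   # ['e']
```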

Idempotency¶

Semvec ≤ 0.6.1 does not implement an Idempotency-Key header. Side-effecting POSTs (/v1/run, /v1/store, /v1/session/create, /v1/cluster/* writes, /v1/region/* writes, /v1/observer/*) are processed at-least-once: a client retry after a network timeout will re-apply the side effect.

Mitigations the operator owns until native support ships:

Generate session / cluster / region IDs client-side (UUID v4) and pass them in the request body where the schema accepts it. The server is (license_subject, id)-unique, so a retry with the same explicit ID returns 409 / 200 deterministically instead of creating a duplicate.
For /v1/run and /v1/store, hold a short-lived (session_id, content_hash) dedup map on the client and skip the retry if the previous attempt already completed.
Native idempotency support is on the roadmap; track it via the GitHub issue tracker.
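The two client-side mitigations above could look roughly like this. Everything here lives on the client and is hypothetical: `new_resource_id` and `RetryDedup` are illustrative names, SHA-256 is one possible choice of content hash, and the TTL value is arbitrary.

```python
import hashlib
import uuid


def new_resource_id() -> str:
    """Client-generated UUID v4 to pass as an explicit session/cluster/region ID."""
    return str(uuid.uuid4())


class RetryDedup:
    """Short-lived (session_id, content_hash) map used to skip redundant retries."""

    def __init__(self, ttl_s=30.0):
        self.ttl_s = ttl_s
        self._completed = {}  # (session_id, digest) -> completion time

    @staticmethod
    def _key(session_id, content):
        return (session_id, hashlib.sha256(content.encode("utf-8")).hexdigest())

    def mark_completed(self, session_id, content, now):
        self._completed[self._key(session_id, content)] = now

    def should_send(self, session_id, content, now):
        """False if an identical request completed within the last ttl_s seconds."""
        done_at = self._completed.get(self._key(session_id, content))
        return done_at is None or now - done_at > self.ttl_s


dedup = RetryDedup(ttl_s=30)
sid = new_resource_id()
dedup.mark_completed(sid, "Kubernetes...", now=0)
print(dedup.should_send(sid, "Kubernetes...", now=10))  # False: skip the retry
print(dedup.should_send(sid, "Kubernetes...", now=31))  # True: entry expired
```

The map only helps when the client knows the earlier attempt completed; a request that timed out without a response remains ambiguous and may still be applied twice.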

Audit events are not exposed via REST in semvec ≤ 0.6.1. Compliance routes (semvec[compliance]) exist internally but no /v1/audit/* HTTP endpoint is registered. Query the audit_log table directly via DATABASE_URL, or use the semvec.audit Python API (audit_log, audited) for programmatic access.

OpenAPI / interactive docs¶

The FastAPI app serves the standard schema and interactive docs with FastAPI defaults:

Path	Purpose
`GET /openapi.json`	OpenAPI 3.1 schema
`GET /docs`	Swagger UI
`GET /redoc`	ReDoc

All three are served behind LicenseAuthMiddleware; the public bypass list contains only /v1/health and /metrics. To browse the schema you must send a valid license JWT (or, for local development, run a dev-anonymous build with SEMVEC_ALLOW_ANONYMOUS=1). If you need to expose /docs for an external auditor, terminate auth at a reverse proxy and have it inject the JWT, or generate a static HTML render of /openapi.json from a CI job and host it separately.

Pagination¶

There is no cursor-based pagination in semvec ≤ 0.6.1. Listing endpoints accept a limit query parameter with FastAPI-enforced ge/le bounds, but they do not emit X-Next-Cursor / X-Total-Estimate headers and the server keeps no scroll state — you cannot page past limit.

Endpoint	`limit` default	Min	Max
`GET /v1/region/{region_id}/events`	20	1	1000
`GET /v1/observer/anomalies`	20	1	1000
`GET /v1/session/{session_id}/entities` (`max_results`)	20	1	1000

Listing endpoints that return the full owned set, unpaged:

GET /v1/cluster/ — every cluster owned by the calling license subject
GET /v1/region/ — every region owned by the calling license subject
GET /v1/network/users/active — single record; not affected

On multi-tenant or long-lived deployments these can grow unbounded. Until cursor pagination ships, cap them at the reverse-proxy layer (nginx client_max_body_size for safety + an explicit application-layer prune job) or shard licenses so no one subject ever owns more than a few thousand clusters/regions.

Rate-limit headers¶

The server does not emit X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers. A 429 response carries only Retry-After: 60 and {"detail": "Too many requests"}. Clients that need per-tier quota visibility have to derive it from their own license-tier metadata, not from response headers.
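Since Retry-After is the only signal on a 429, a client can derive its back-off from it alone. A minimal sketch, assuming a plain dict of headers with exact-case keys (HTTP client libraries usually offer case-insensitive lookup instead); the function name is hypothetical.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status_code: int, headers: Mapping[str, str],
                        default: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the response is not a 429."""
    if status_code != 429:
        return None
    raw = headers.get("Retry-After")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not a number: fall back to the documented 60 s.
        return default


print(retry_delay_seconds(429, {"Retry-After": "60"}))  # 60.0
print(retry_delay_seconds(200, {}))                     # None
```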

Pre-auth requests are not rate-limited. A client sending invalid licenses gets 401 Unauthorized per request with no IP-level shedding. A reverse-proxy WAF / nginx limit_req keyed on source IP is required for DoS protection against credential-stuffing or auth-flood patterns.

Per-tenant quoting¶

Two scopes coexist and they are not the same:

Concern	Scope	Source of truth
Resource ownership (sessions, clusters, regions, observers, entities)	Per `license_subject` (`sub` claim of the license JWT)	`LicenseAuthMiddleware` populates `request.state.license`; routes filter on `license_subject(request)`
Rate limiting	Per remote IP (`slowapi.util.get_remote_address`)	`limiter = Limiter(key_func=get_remote_address)` in `semvec.api.routes`

Implication: a single license shared across N hosts gets N × the per-IP quota; conversely, several licenses behind one NAT egress share one quota. If you need per-license rate limiting, terminate it at an API gateway in front of semvec serve and key the gateway's limiter on the JWT sub claim, not on the source IP.

The Community/Pro/Enterprise tier numbers documented in the licensing page describe the target enforcement, not the per-IP enforcement that ships in semvec ≤ 0.6.1. Until the limiter switches to license_subject keying, treat the tier numbers as a usage policy, not a server-side guarantee.

Network¶

Method	Path	Purpose
POST	`/v1/network/transfer`	semantic delta-vector transfer
POST	`/v1/network/users/switch`	switch user partition (saves current, activates target)
GET	`/v1/network/users/active`	currently active user
POST	`/v1/network/users/{user_id}/serialize`	serialize user partition
POST	`/v1/network/consensus`	propose consensus vector
GET	`/v1/network/consensus/trust`	current trust scores per instance

Literal cache¶

Method	Path	Purpose
POST	`/v1/session/{session_id}/entities`	store a verbatim code entity (201)
GET	`/v1/session/{session_id}/entities?q=&max_results=`	list / keyword-query
DELETE	`/v1/session/{session_id}/entities/{key:path}`	remove entity

Observability¶

/metrics exposes Prometheus metrics behind Basic Auth (METRICS_USER / METRICS_PASSWORD env vars). A request middleware collects semvec_requests_total{method, endpoint, status} and semvec_request_duration_seconds{method, endpoint} automatically.
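For a scraper that is not Prometheus itself, the Basic Auth header can be built by hand. This sketch only constructs the header; the helper name is hypothetical, and the credentials shown are placeholders for whatever METRICS_USER / METRICS_PASSWORD are set to on the server.

```python
import base64


def metrics_auth_header(user: str, password: str) -> dict:
    """Build the HTTP Basic Auth header expected by /metrics."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder credentials for illustration only.
print(metrics_auth_header("prom", "secret"))
```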

Error handling¶

All error responses carry a JSON body with a single detail field:

{"detail": "Session not found"}

Status	When it fires
400	Malformed state-import payload (`/v1/session/{session_id}/import`), unknown `aggregation_mode` on cluster creation, unknown entity `kind` on literal-cache store.
401	Missing or invalid license JWT on any route except `/v1/health`. Also `/metrics` without valid Basic Auth.
402	License JWT signature is valid but the token is expired. Includes a `"renew at …"` hint pointing at https://www.semvec.io.
404	Session / cluster / region / observer / entity / memory not found, or caller's license subject does not own the resource (the server does not leak resource existence across tenants).
422	Pydantic validation failure — missing or out-of-range request field. The body conforms to FastAPI's standard `{"detail": [{"loc": [...], "msg": "...", "type": "..."}]}` shape.
429	Rate-limit exceeded. Response carries `Retry-After: 60`. The Community/Pro/Enterprise QPS numbers come from the per-`SemvecState` in-process bucket (see licensing), not from a server-wide HTTP throttle — see Per-tenant quoting above. For DoS protection in front of `semvec serve`, terminate rate-limiting at a reverse proxy.
500	Unhandled server error — logged via uvicorn access log with request ID. Investigate server logs.
503	`/metrics` endpoint hit without `METRICS_USER` / `METRICS_PASSWORD` env vars configured.

The detail string on 402 includes the upgrade URL; on 401 it distinguishes between "Missing license token" and "Invalid license: …"; on 404 it tells you whether the session or the specific sub-resource was missing.
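Because detail is a string for most errors but a list for 422, client code usually needs to normalize it before logging. A minimal sketch with hypothetical helper names; the status grouping reflects the table above and is one possible client policy, not a Semvec recommendation.

```python
def error_message(body: dict) -> str:
    """Flatten the `detail` field into one line, handling the 422 list shape."""
    detail = body.get("detail", "")
    if isinstance(detail, list):
        return "; ".join(
            f"{'.'.join(str(part) for part in item.get('loc', []))}: {item.get('msg', '')}"
            for item in detail
        )
    return str(detail)


def suggested_action(status_code: int) -> str:
    """Map a status code from the table above to a coarse client reaction."""
    if status_code in (401, 402):
        return "check-license"
    if status_code == 429:
        return "retry-later"
    if status_code in (400, 404, 422):
        return "fix-request"
    if status_code >= 500:
        return "report-server-error"
    return "none"


print(error_message({"detail": "Session not found"}))
print(suggested_action(402))
```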

Minimal quickstart¶

snippet — requires a running `semvec serve` on :8080; RunResponse exposes `session_id` and `context`

import httpx

client = httpx.Client(
    base_url="http://localhost:8080/v1",
    headers={"X-API-Key": "eyJhbGciOiJFZERTQSI..."},
)

run = client.post("/run", json={"message": "What is Kubernetes?"}).json()
sid = run["session_id"]
# feed to your LLM with run["context"] as the system prompt ...
client.post("/store", json={"session_id": sid, "response": "Kubernetes..."})
