CLI (semvec)

The semvec command ships with the [api] extra and wraps uvicorn to run the REST API. Install it via:

pip install "semvec[api]"

Commands

semvec serve

Start the Semvec REST API server.

semvec serve [--host HOST] [--port PORT] [--workers N] \
             [--embedder URL] [--embedder-mode inline|sidecar] \
             [--reload] [--log-level LEVEL]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --host | str | 0.0.0.0 | Bind address. Use 127.0.0.1 to restrict to localhost. |
| --port | int | 8080 | TCP port. |
| --workers | int | 1 | Number of uvicorn worker processes. With more than one worker, the model would normally load once per worker (N times); combine with --embedder or --embedder-mode sidecar to share a single embedder across workers. Incompatible with --reload. |
| --embedder | str | unset | Sidecar URL (unix:///abs/path.sock or tcp://host:port). Workers inject a SidecarEmbedderClient instead of loading the model in-process. The daemon must already be running (python -m semvec.embedder --listen ...). |
| --embedder-mode | inline / sidecar | inline | inline: each worker loads its own model. sidecar: semvec serve spawns one embedder daemon, waits for READY, then starts the API workers and points them at it via UDS. Best for multi-worker deployments. |
| --reload | bool flag | off | Enable uvicorn auto-reload on source change. Development only. |
| --log-level | critical / error / warning / info / debug | info | Log level for both uvicorn and the semvec application. |
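
The two URL forms accepted by --embedder (unix:///abs/path.sock and tcp://host:port) can be told apart with the standard library. The following is a minimal sketch only: parse_embedder_url is a hypothetical helper, and how semvec itself parses the value is not documented here.

```python
from urllib.parse import urlsplit


def parse_embedder_url(url):
    """Split a sidecar URL into ("unix", path) or ("tcp", (host, port)).

    Hypothetical helper for illustration; not part of the semvec API.
    """
    parts = urlsplit(url)
    if parts.scheme == "unix":
        # unix:///abs/path.sock has an empty authority and an absolute path.
        if parts.netloc or not parts.path.startswith("/"):
            raise ValueError(f"unix URL needs an absolute path: {url!r}")
        return ("unix", parts.path)
    if parts.scheme == "tcp":
        if parts.hostname is None or parts.port is None:
            raise ValueError(f"tcp URL needs host and port: {url!r}")
        return ("tcp", (parts.hostname, parts.port))
    raise ValueError(f"unsupported embedder URL scheme: {url!r}")
```

Validating the URL before launching workers surfaces a mistyped value (for example a relative socket path) at start-up rather than on the first embedding request.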

The server loads semvec.api:create_app via uvicorn's --factory mode, so every worker process creates its own SessionManager, ClusterManager, and so on. That state is held in memory and is therefore per-worker; see the REST API for the SQLite metadata schema used for cross-worker persistence.

python -m semvec.embedder (sidecar daemon)

Stand-alone embedder daemon. The API workers connect to it over UDS or TCP. Use this when you want to scale API workers independently of the embedder, or run the embedder on a different host / GPU.

python -m semvec.embedder --listen unix:///run/semvec/embedder.sock \
                          --model all-MiniLM-L6-v2
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --listen | str | required | unix:///abs/path.sock (Linux/macOS) or tcp://host:port (Windows-friendly). |
| --model | str | all-MiniLM-L6-v2 | Any sentence-transformers model name. |
| --dimension | int | 384 | Output dimension; must match the model. |
| --batch-max | int | 32 | Max texts coalesced per encode call. |
| --batch-wait-ms | float | 5.0 | Max wait (ms) for a batch to fill. Lower = lower latency, higher = better GPU utilisation. |
| --ready-fd | int | unset | Inheritable fd on which the daemon writes READY\n once the listener is accepting. Used by semvec serve --embedder-mode sidecar for the parent/child handshake. |
| --log-level | critical / error / warning / info / debug | info | Daemon log level. |

The daemon installs SIGTERM / SIGINT handlers that drain in-flight batches before exit. Clients that lose the connection during drain receive a clean error and can reconnect once the daemon restarts.
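
The --ready-fd handshake described above follows a common parent/child pattern: the parent creates a pipe, lets the child inherit the write end, and blocks until READY\n arrives. The sketch below shows that pattern on POSIX under stated assumptions; spawn_and_wait_ready and demo_child_argv are hypothetical names, and this is not the code semvec serve runs.

```python
import os
import select
import subprocess
import sys


def spawn_and_wait_ready(build_argv, timeout_s=10.0):
    """Start a child and wait until it writes READY + newline on a pipe.

    build_argv(fd) returns the child's argv, given the inherited fd number.
    """
    read_fd, write_fd = os.pipe()
    try:
        # pass_fds keeps write_fd open, under the same number, in the child.
        proc = subprocess.Popen(build_argv(write_fd), pass_fds=(write_fd,))
    except BaseException:
        os.close(read_fd)
        raise
    finally:
        # Drop the parent's copy so a dying child produces EOF, not a hang.
        os.close(write_fd)

    with os.fdopen(read_fd, "rb", buffering=0) as reader:
        readable, _, _ = select.select([reader], [], [], timeout_s)
        line = reader.readline() if readable else b""
    if line != b"READY\n":
        proc.kill()
        proc.wait()
        raise RuntimeError(f"child did not become ready (got {line!r})")
    return proc


def demo_child_argv(fd):
    # Stand-in for a real daemon: a one-liner that reports READY on fd.
    return [sys.executable, "-c", f"import os; os.write({fd}, b'READY\\n')"]
```

With the real daemon, build_argv would presumably return something like the python -m semvec.embedder --listen ... command shown earlier with --ready-fd set to str(fd). Closing the parent's write end matters: without it, a child that crashes before becoming ready would leave the parent waiting for the full timeout.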

Environment variables read at start-up

All variables below are read once per worker at process start. To change a value, set it and then restart semvec serve. The variables are grouped by concern.

Server & auth

| Variable | Default | Purpose |
| --- | --- | --- |
| DATABASE_URL | sqlite:///semvec.db | SQLAlchemy URL for the session / cluster / audit metadata store. |
| CORS_ORIGINS | empty (no cross-origin access) | Comma-separated list of allowed origins, e.g. https://app.example.com,http://localhost:5173. When unset, the CORS middleware is skipped entirely, which saves a little per-request overhead. |
| SEMVEC_LICENSE_KEY | | Ed25519-signed license JWT (Pro / Enterprise features). |
| SEMVEC_ALLOW_ANONYMOUS | unset | Set to 1 to bypass license verification. Development only: every request is treated as anonymous community-tier. |
| METRICS_USER / METRICS_PASSWORD | | Basic Auth for the /metrics endpoint. Must both be set to enable the endpoint. |

Session lifecycle

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_MAX_SESSIONS | 10000 | Hard cap on concurrent sessions per worker. Oldest-touched sessions are evicted on overflow. |
| SEMVEC_SESSION_IDLE_TTL_S | 1800 (30 min) | Sessions untouched for this long are evicted by the background sweeper. Set to 0 to disable. |
| SEMVEC_SESSION_SWEEP_S | 60 | How often (in seconds) the background task scans for idle sessions. Set to 0 to disable the sweeper entirely (useful in tests). |
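
The two eviction rules above (a hard cap that drops the oldest-touched session, plus an idle TTL enforced by a periodic sweep) can be sketched with an ordered dictionary. This is an illustrative model only: SessionTable and its methods are hypothetical and do not reflect semvec's actual SessionManager.

```python
import time
from collections import OrderedDict


class SessionTable:
    """Hypothetical in-memory table with a size cap and an idle TTL."""

    def __init__(self, max_sessions=10000, idle_ttl_s=1800.0, clock=time.monotonic):
        self._max = max_sessions
        self._ttl = idle_ttl_s
        self._clock = clock
        self._sessions = OrderedDict()  # session_id -> last-touched time

    def touch(self, session_id):
        # Re-inserting moves the session to the most-recently-touched end.
        self._sessions.pop(session_id, None)
        self._sessions[session_id] = self._clock()
        while len(self._sessions) > self._max:
            self._sessions.popitem(last=False)  # evict the oldest-touched

    def sweep(self):
        """Evict sessions idle for at least the TTL; return their ids."""
        if self._ttl <= 0:  # a TTL of 0 disables idle eviction
            return []
        cutoff = self._clock() - self._ttl
        evicted = []
        while self._sessions:
            session_id, last_touched = next(iter(self._sessions.items()))
            if last_touched > cutoff:
                break  # everything after this entry is fresher
            del self._sessions[session_id]
            evicted.append(session_id)
        return evicted

    def __contains__(self, session_id):
        return session_id in self._sessions

    def __len__(self):
        return len(self._sessions)
```

Because entries stay ordered by last touch, the sweep can stop at the first fresh session instead of scanning the whole table.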

Embedder

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_EMBEDDER_URL | unset | Same effect as --embedder. When set, the lifespan injects a SidecarEmbedderClient instead of loading the model in-process. Read by every worker. |
| SEMVEC_EMBEDDER_MODEL | all-MiniLM-L6-v2 | Default sentence-transformers model name the sidecar daemon loads when --model is not provided. Override per-deployment. |
| SEMVEC_EMBEDDER_DIM | 384 | Output dimension expected from the sidecar; must match the model the daemon was launched with. |
| SEMVEC_EMBEDDER_CACHE_SIZE | 0 (disabled) | When >0, wraps the injected embedder in a CachedEmbedder with this LRU capacity. Cache hits skip the model; concurrent submits for the same text dedup onto one underlying encode. Cheapest path to 2–3× RPS on chat traffic. |
| SEMVEC_USE_RUST_EMBEDDER / SEMVEC_EMBEDDER_BIN | unset | Opt-in switches that make --embedder-mode sidecar spawn the Rust semvec-embedder binary instead of the Python daemon. See the Embedders guide. |
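
The behaviour described for SEMVEC_EMBEDDER_CACHE_SIZE (LRU hits skip the model, and concurrent requests for the same text share one encode) can be illustrated with asyncio. The class below is a hedged sketch with hypothetical names; it is not the CachedEmbedder shipped with semvec, whose interface is not shown in this page.

```python
import asyncio
from collections import OrderedDict


class DedupLruCache:
    """Hypothetical LRU cache that coalesces concurrent misses per text."""

    def __init__(self, encode, capacity):
        self._encode = encode        # async callable: text -> vector
        self._capacity = capacity
        self._cache = OrderedDict()  # text -> vector, oldest first
        self._inflight = {}          # text -> task for a running encode

    async def embed(self, text):
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        pending = self._inflight.get(text)
        if pending is not None:
            return await pending           # piggyback on the running encode
        pending = asyncio.ensure_future(self._encode(text))
        self._inflight[text] = pending
        try:
            vector = await pending
        finally:
            del self._inflight[text]
        self._cache[text] = vector
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # drop the least recently used
        return vector
```

The in-flight map is what provides the dedup: without it, a burst of identical prompts arriving before the first encode finishes would each miss the cache and hit the model.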

Retrieval (/v1/run)

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_RUN_TOP_K | 5 | Number of memories surfaced per /v1/run (used by the context block, short-circuit, and drift scoring). Raising it catches lexically-distant facts; lowering it keeps prompts tight. |
| SEMVEC_MMR_FETCH_K | 0 (disabled) | When > SEMVEC_RUN_TOP_K, fetch this many candidates and Maximal-Marginal-Relevance rerank down to SEMVEC_RUN_TOP_K. Demotes near-duplicate memories so diverse facts survive into the final set. 50–200 is a good starting range. |
| SEMVEC_MMR_LAMBDA | 0.5 | MMR relevance/diversity mix. 1.0 = pure cosine retrieval (no diversity), 0.0 = pure diversity (no relevance). |
| SEMVEC_CONTEXT_BUDGET_CHARS | 4000 | Total character budget for the context string returned by /v1/run, filled with a running total across retrieved memories. Replaces the legacy per-memory 150-char cap. Long memories use what they need; short ones don't waste budget. |
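
To make the interplay between SEMVEC_MMR_FETCH_K and SEMVEC_MMR_LAMBDA concrete, here is a minimal sketch of greedy Maximal-Marginal-Relevance selection in plain Python. It illustrates the general technique rather than semvec's internal code; cosine and mmr_rerank are hypothetical names, and vectors are plain lists of floats.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(query, candidates, top_k, lam=0.5):
    """Return indices of up to top_k candidates chosen greedily by MMR.

    score = lam * sim(query, c) - (1 - lam) * max sim(c, already selected)
    """
    remaining = list(range(len(candidates)))
    selected = []
    while remaining and len(selected) < top_k:

        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam=1.0 the redundancy term vanishes and the result is plain cosine ranking; lowering lam penalises candidates that resemble ones already chosen, which is how near-duplicates get demoted.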

BM25-hybrid retrieval (opt-in, needs semvec[hybrid])

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_HYBRID_BM25 | 0 (off) | Master switch. When 1, every session also maintains a per-session BM25 index and /v1/run fuses dense + lexical candidates via Reciprocal Rank Fusion. |
| SEMVEC_BM25_FETCH_K | 50 | BM25 top-K fed into the fusion. |
| SEMVEC_BM25_REBUILD_EVERY | 64 | Ingests between snapshot rebuilds of the per-session BM25 index. Lower = fresher BM25 at higher rebuild cost. |
| SEMVEC_RRF_K | 60 | RRF smoothing constant. The standard value from the RRF paper; rarely worth changing. |
| SEMVEC_RRF_WEIGHTS | unset (uniform) | Comma-separated per-list weights, e.g. "1.0,0.4" to down-weight the BM25 contribution to 0.4. Useful when BM25 hurts single-fact precision. |
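
Weighted Reciprocal Rank Fusion is compact enough to show in full. In the sketch below each list contributes weight / (k + rank) per document, with ranks starting at 1. The function names are hypothetical, and the assumption that the first weight applies to the dense list and the second to BM25 is inferred from the "1.0,0.4" example above rather than confirmed.

```python
def parse_weights(raw):
    """Parse a comma-separated weight string such as "1.0,0.4"."""
    return [float(part) for part in raw.split(",")]


def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse ranked id lists; return ids sorted by descending fused score."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)  # uniform, as when unset
    scores = {}
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Break score ties by id so the output order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]
bm25 = ["d", "b", "a"]
uniform = rrf_fuse([dense, bm25])
damped = rrf_fuse([dense, bm25], weights=parse_weights("1.0,0.4"))
```

In this toy input, "d" appears only in the BM25 list, so lowering the BM25 weight pushes it below "c", which appears only in the dense list.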

Cross-encoder rerank (opt-in)

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_RERANK_MODEL | unset (off) | HuggingFace model ID, e.g. cross-encoder/ms-marco-MiniLM-L-6-v2. When set, /v1/run reranks the BM25 / dense fusion output through this cross-encoder before returning the final top-K. |
| SEMVEC_RERANK_FETCH_K | 50 | Candidate pool fed into the cross-encoder. |
| SEMVEC_RERANK_BATCH | 64 | Cross-encoder batch size. Tune against the GPU/CPU running the worker. |
| SEMVEC_RERANK_FP16 | 0 | Set 1 for FP16 inference on GPU; typically 1.5–2× faster with no observable quality loss. |
| SEMVEC_RERANK_THREADS | os.cpu_count() | Torch intra-op thread cap for CPU inference. Set lower if you co-locate the API with other CPU-heavy tasks. |

Anchors & extraction

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_TOPIC_SWITCH | 1 | Master switch for the topic-switch detector. 0 disables it, which is useful for parity tests that must hold the state still. |
| PSS_TOPIC_SWITCH | | Deprecated legacy alias for SEMVEC_TOPIC_SWITCH, read as a fallback by the session manager; scheduled for removal in 1.0. Prefer the SEMVEC_*-prefixed variable. |
| SEMVEC_AUTO_ANCHOR_ON_TOPIC_SWITCH | 0 | Set 1 to snapshot semantic_state as a fresh anchor every time the detector fires. Capped by SEMVEC_MAX_AUTO_ANCHORS. |
| SEMVEC_AUTO_ANCHOR_FROM_EXTRACT | 0 | Set 1 to also create anchors from extracted-entity embeddings (when auto-extract is on). |
| SEMVEC_MAX_AUTO_ANCHORS | 8 | Cap on the number of anchors created via either auto-anchor path. |
| SEMVEC_AUTO_EXTRACT | 0 | Set 1 to enable best-effort numeric / entity extraction from ingested text. |
| SEMVEC_AUTO_EXTRACT_BROAD | 0 | Broader extractor profile (more recall, more noise). Implies SEMVEC_AUTO_EXTRACT=1. |
| SEMVEC_ENABLE_NUMERIC_EXTRACTOR | 1 | Set 0 to disable the numeric extractor (IBAN, amounts, IDs), which is useful when downstream code does its own extraction. |

Compliance & event store (semvec[compliance])

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_ENABLE_EVENT_STORE | 0 | Set 1 to write every state mutation into the append-only event store. Required for deterministic replay and signed deletion certificates. |
| SEMVEC_ENABLE_HMAC_SIGNING | 0 | Set 1 to sign every event-store entry with HMAC for tamper-evidence. Requires a key configured in the compliance config. |
| SEMVEC_ENABLE_RS256_JWT | 0 | Set 1 to issue RS256-signed user JWTs from the compliance routes (vs HS256). Requires a private key. |
| SEMVEC_ENABLE_RETENTION_SWEEPER | 0 | Set 1 to run the background retention sweeper that deletes events older than the configured retention horizon. |
| SEMVEC_RETENTION_DAYS_AUDIT | 2557 (≈ 7 years) | Retention horizon for audit events. |
| SEMVEC_RETENTION_DAYS_CHAT | 365 | Retention horizon for chat events. |
| SEMVEC_COMPLIANCE_PUBKEY_FILE | unset | Path to the compliance verifier public key (PEM). Used to verify signed deletion certificates and RS256 JWTs. |
| SEMVEC_COMPLIANCE_PUBKEY_PEM | unset | Inline PEM alternative to …_FILE. |
| SEMVEC_COMPLIANCE_PRIVKEY_FILE | unset | Path to the compliance signing private key. Issuer side only; never set on verifying instances. |
| SEMVEC_COMPLIANCE_PRIVKEY_PEM | unset | Inline PEM alternative to …_FILE. |
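
As background for SEMVEC_ENABLE_HMAC_SIGNING, the sketch below shows one generic way HMAC makes an append-only log tamper-evident, using only the standard library. It is an assumption-laden illustration: the function names, the JSON canonicalisation, and the chaining of each signature into the next entry are choices made for this example, not a description of semvec's event-store format.

```python
import hashlib
import hmac
import json


def sign_event(key, event, prev_signature=""):
    """Return a hex HMAC-SHA256 over the previous signature plus the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    message = prev_signature.encode("ascii") + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_event(key, log, event):
    """Append (event, signature) to log, chaining onto the last signature."""
    prev_signature = log[-1][1] if log else ""
    log.append((event, sign_event(key, event, prev_signature)))


def verify_log(key, log):
    """Recompute the chain; False if any entry was edited or reordered."""
    prev_signature = ""
    for event, signature in log:
        expected = sign_event(key, event, prev_signature)
        # compare_digest avoids leaking the mismatch position via timing.
        if not hmac.compare_digest(expected, signature):
            return False
        prev_signature = signature
    return True
```

Chaining is what lets verification catch reordered or removed entries in the middle of the log as well as edited ones; a per-entry HMAC without chaining would only detect edits.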

Licensing internals

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_LICENSE_KEY | | Ed25519-signed license JWT. Required for Pro / Enterprise features and quotas. |
| SEMVEC_LICENSE_LRU_SIZE | 256 | LRU cache size for verified JWTs. Higher = more memory, fewer signature verifies per second. |

API process

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_API_THREADPOOL | 200 | Size of the asyncio default executor thread pool. This cap bounds in-flight blocking work. |
| SEMVEC_STATE_DIR | .semvec | Default directory CodingEngine and adjacent components use for persistent state. |

The API contract version matches semvec.__version__ from the installed wheel (informational; not a runtime knob).

Build-time-only environment variables (wheel builders only)

Not consumed at runtime

The variables below are read only when building semvec from source (wheel / sdist construction). Setting them on a running semvec serve process has no effect, because the installed wheel already has the relevant values baked in.

| Variable | Default | Purpose |
| --- | --- | --- |
| SEMVEC_BASE_URL | unset | Public base URL baked into the built artefact for absolute-link generation. |
| SEMVEC_EMBEDDED_PUBKEY_PATH | build-time baked | Override the embedded verifier public-key path picked up by the build. |
| SEMVEC_PROD_PUBKEY_FILE | | Path to a production public-key bundle. Used by the build to bake the correct verifier. |
| SEMVEC_PROD_PUBKEY_PEM | | Inline PEM alternative to …_FILE. |
| SEMVEC_BUILD_ALLOW_DEV_KEY | 0 | Set 1 to allow the dev verifier in a release build (refused by default). |
| SEMVEC_COMPLIANCE_PUBKEY_TARGET | unset | Path the build rewrites with the latest pubkey when rotating from a hot key registry. |

Examples

Local development

export SEMVEC_ALLOW_ANONYMOUS=1
export DATABASE_URL="sqlite:///dev.db"
semvec serve --host 127.0.0.1 --port 8080 --reload --log-level debug

Production behind a reverse proxy

export DATABASE_URL="postgresql://semvec:pass@db/semvec"
export CORS_ORIGINS="https://app.example.com"
export METRICS_USER="prom"
export METRICS_PASSWORD="$(cat /run/secrets/metrics_password)"
semvec serve --host 0.0.0.0 --port 8080 --log-level info

Behind nginx / an ALB, the server trusts the X-Forwarded-For and X-Real-IP headers for client-IP resolution (used by the rate limiter and the audit log).

Programmatic start (without the CLI)

python -m uvicorn semvec.api:create_app --factory --host 0.0.0.0 --port 8080

This has the same effect as semvec serve with its default host and port, and is handy when you want to wire the factory into a larger ASGI app (e.g. mounted under a prefix).

See also