CLI (`semvec`)

The semvec command ships with the [api] extra and wraps uvicorn to run the REST API. Install it via:

pip install "semvec[api]"

Commands

`semvec serve`

Start the Semvec REST API server.

semvec serve [--host HOST] [--port PORT] [--workers N] \
             [--embedder URL] [--embedder-mode inline|sidecar] \
             [--reload] [--log-level LEVEL]

Flag	Type	Default	Description
`--host`	`str`	`0.0.0.0`	Bind address. Use `127.0.0.1` to restrict to localhost.
`--port`	`int`	`8080`	TCP port.
`--workers`	`int`	`1`	Number of uvicorn worker processes. With `>1`, each worker loads its own copy of the model (N copies in total) unless you combine it with `--embedder` or `--embedder-mode sidecar` to share a single embedder across workers. Incompatible with `--reload`.
`--embedder`	`str`	unset	Sidecar URL (`unix:///abs/path.sock` or `tcp://host:port`). Workers inject a `SidecarEmbedderClient` instead of loading the model in-process. The daemon must already be running (`python -m semvec.embedder --listen ...`).
`--embedder-mode`	`inline` / `sidecar`	`inline`	`inline`: each worker loads its own model. `sidecar`: `semvec serve` spawns one embedder daemon, waits for it to report READY, then starts the API workers and points them at it over a Unix domain socket (UDS). Best for multi-worker deployments.
`--reload`	bool flag	off	Enable uvicorn auto-reload on source change. Development only.
`--log-level`	`critical` / `error` / `warning` / `info` / `debug`	`info`	Log level for both uvicorn and the semvec application.
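Both `--embedder` here and the daemon's `--listen` flag accept two URL shapes. As an illustration only, the sketch below shows how such a URL could be split into connection parameters with the standard library; `parse_embedder_url` is a hypothetical helper, not part of semvec's API, and semvec's own parser may behave differently.

```python
from typing import Tuple, Union
from urllib.parse import urlparse

# Hypothetical helper for illustration; not semvec's actual parser.
Address = Union[str, Tuple[str, int]]


def parse_embedder_url(url: str) -> Tuple[str, Address]:
    """Split a sidecar URL into (transport, address).

    unix:///abs/path.sock -> ("unix", "/abs/path.sock")
    tcp://host:port       -> ("tcp", ("host", port))
    """
    parts = urlparse(url)
    if parts.scheme == "unix":
        # Three slashes leave netloc empty and keep the path absolute.
        if parts.netloc or not parts.path.startswith("/"):
            raise ValueError(f"unix URL needs an absolute path: {url!r}")
        return "unix", parts.path
    if parts.scheme == "tcp":
        port = parts.port  # raises ValueError if the port is not numeric
        if not parts.hostname or port is None:
            raise ValueError(f"tcp URL needs host and port: {url!r}")
        return "tcp", (parts.hostname, port)
    raise ValueError(f"unsupported scheme in {url!r}")
```

The returned address fits `asyncio.open_unix_connection(path)` for the `unix` case and `asyncio.open_connection(host, port)` for the `tcp` case.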

The server loads semvec.api:create_app in uvicorn's --factory mode, so every worker process creates its own SessionManager, ClusterManager, and so on. That state is in-memory and therefore per-worker; see the REST API reference for the SQLite metadata schema used for cross-worker persistence.

`python -m semvec.embedder` (sidecar daemon)

Stand-alone embedder daemon. The API workers connect to it over UDS or TCP. Use this when you want to scale API workers independently of the embedder, or run the embedder on a different host / GPU.

python -m semvec.embedder --listen unix:///run/semvec/embedder.sock \
                          --model all-MiniLM-L6-v2

Flag	Type	Default	Description
`--listen`	`str`	required	`unix:///abs/path.sock` (Linux/macOS) or `tcp://host:port` (Windows-friendly).
`--model`	`str`	`all-MiniLM-L6-v2`	Any sentence-transformers model name.
`--dimension`	`int`	`384`	Output dimension — must match the model.
`--batch-max`	`int`	`32`	Max texts coalesced per encode call.
`--batch-wait-ms`	`float`	`5.0`	Max wait (ms) for a batch to fill. Lower = lower latency, higher = better GPU utilisation.
`--ready-fd`	`int`	unset	Inheritable fd to write `READY\n` on once the listener is accepting. Used by `semvec serve --embedder-mode sidecar` for the parent/child handshake.
`--log-level`	`critical` / `error` / `warning` / `info` / `debug`	`info`	Daemon log level.
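`--batch-max` and `--batch-wait-ms` describe a familiar micro-batching pattern: once the first text arrives, wait at most the batch window for more, then encode up to `batch_max` texts in one call. A minimal asyncio sketch of that pattern follows. `MicroBatcher` and the fake `encode_fn` are hypothetical stand-ins for the daemon's internals and the sentence-transformers model.

```python
import asyncio
from typing import Callable, List, Tuple

Vector = List[float]


class MicroBatcher:
    """Coalesce concurrent encode requests into batches (hypothetical sketch).

    batch_max / batch_wait_ms play the roles of --batch-max / --batch-wait-ms;
    encode_fn stands in for the model's batched encode call.
    """

    def __init__(self, encode_fn: Callable[[List[str]], List[Vector]],
                 batch_max: int = 32, batch_wait_ms: float = 5.0) -> None:
        self.encode_fn = encode_fn
        self.batch_max = batch_max
        self.batch_wait_s = batch_wait_ms / 1000.0
        self.queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so the batching is observable

    async def submit(self, text: str) -> Vector:
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request
            deadline = loop.time() + self.batch_wait_s
            # Keep filling until the batch is full or the wait window closes.
            while len(batch) < self.batch_max:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            vectors = self.encode_fn([text for text, _ in batch])
            for (_, fut), vec in zip(batch, vectors):
                if not fut.done():  # skip callers that were cancelled meanwhile
                    fut.set_result(vec)


async def demo() -> Tuple[List[Vector], List[int]]:
    # Fake "model": the vector is just the text length.
    batcher = MicroBatcher(lambda texts: [[float(len(t))] for t in texts],
                           batch_max=4, batch_wait_ms=20)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit("x" * i) for i in range(10)))
    worker.cancel()
    return list(results), batcher.batch_sizes
```

Raising the wait window lets more requests share one encode call at the cost of extra latency for the first request in each batch, which is the trade-off the `--batch-wait-ms` description refers to.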

The daemon installs SIGTERM / SIGINT handlers that drain in-flight batches before exit. Clients that lose the connection during drain receive a clean error and can reconnect once the daemon restarts.
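The `--ready-fd` handshake used by `semvec serve --embedder-mode sidecar` follows a standard POSIX pattern: the parent creates a pipe, passes the write end to the child, and blocks until the child writes `READY\n`. The sketch below reproduces that pattern with a stand-in child process instead of the real daemon. `spawn_and_wait_ready` and `CHILD_CODE` are hypothetical, and the sketch is POSIX-only because `subprocess.Popen(pass_fds=...)` is not supported on Windows.

```python
import os
import subprocess
import sys
from typing import List

# Stand-in for the daemon: bind the listener (omitted), then signal readiness.
CHILD_CODE = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "os.write(fd, b'READY\\n')\n"
    "os.close(fd)\n"
)


def spawn_and_wait_ready(argv: List[str]) -> subprocess.Popen:
    """Start argv + [write_fd] and block until the child writes READY\\n."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(argv + [str(write_fd)], pass_fds=(write_fd,))
    # Close the parent's copy of the write end so a crashed child shows up as EOF.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        line = reader.readline()
    if line != b"READY\n":
        proc.kill()
        proc.wait()
        raise RuntimeError(f"child exited before signalling READY (got {line!r})")
    return proc
```

Closing the parent's write end is what makes the design robust: if the child dies before signalling, `readline()` returns `b""` instead of blocking forever.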

Environment variables read at start-up

All variables below are read once per worker at process start; after changing a value, restart semvec serve for it to take effect. They are grouped by concern.

Server & auth

Variable	Default	Purpose
`DATABASE_URL`	`sqlite:///semvec.db`	SQLAlchemy URL for the session / cluster / audit metadata store.
`CORS_ORIGINS`	empty (no cross-origin access)	Comma-separated list of allowed origins, e.g. `https://app.example.com,http://localhost:5173`. When unset, the CORS middleware is skipped entirely for a small per-request win.
`SEMVEC_LICENSE_KEY`	—	Ed25519-signed license JWT (Pro / Enterprise features).
`SEMVEC_ALLOW_ANONYMOUS`	unset	Set to `1` to bypass license verification. Development only: every request is then treated as an anonymous community-tier request.
`METRICS_USER` / `METRICS_PASSWORD`	—	Basic Auth credentials for the `/metrics` endpoint. Both must be set to enable the endpoint.

Session lifecycle

Variable	Default	Purpose
`SEMVEC_MAX_SESSIONS`	`10000`	Hard cap on concurrent sessions per worker. On overflow, the least-recently-touched sessions are evicted.
`SEMVEC_SESSION_IDLE_TTL_S`	`1800` (30 min)	Sessions untouched for this long are evicted by the background sweeper. Set to `0` to disable.
`SEMVEC_SESSION_SWEEP_S`	`60`	How often the background task scans for idle sessions. Set to `0` to disable the sweeper entirely (useful in tests).
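The cap and idle-TTL rules above amount to an LRU policy with an age cutoff. The sketch below models that policy with an `OrderedDict`; `SessionStore` is a hypothetical class, not semvec's `SessionManager`, and in this model a background task would call `sweep()` every `SEMVEC_SESSION_SWEEP_S` seconds. The injectable `clock` exists only to make the behaviour testable.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SessionStore:
    """Hypothetical sketch of a per-worker session cap plus idle-TTL sweep.

    max_sessions and idle_ttl_s mirror SEMVEC_MAX_SESSIONS and
    SEMVEC_SESSION_IDLE_TTL_S; this is not semvec's actual SessionManager.
    """

    def __init__(self, max_sessions: int = 10_000, idle_ttl_s: float = 1800,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_sessions = max_sessions
        self.idle_ttl_s = idle_ttl_s
        self.clock = clock
        # session_id -> (last_touched, state), least-recently-touched first.
        self._sessions: OrderedDict = OrderedDict()

    def touch(self, session_id: str, state: Optional[Any] = None) -> Any:
        if session_id in self._sessions:
            if state is None:
                state = self._sessions[session_id][1]
            self._sessions.move_to_end(session_id)
        self._sessions[session_id] = (self.clock(), state)
        # On overflow, evict from the least-recently-touched end.
        while len(self._sessions) > self.max_sessions:
            self._sessions.popitem(last=False)
        return state

    def sweep(self) -> int:
        """Evict sessions idle for longer than idle_ttl_s; 0 disables."""
        if self.idle_ttl_s <= 0:
            return 0
        cutoff = self.clock() - self.idle_ttl_s
        evicted = 0
        # Entries are ordered by last touch, so stop at the first fresh one.
        while self._sessions:
            session_id, (last_touched, _) = next(iter(self._sessions.items()))
            if last_touched >= cutoff:
                break
            del self._sessions[session_id]
            evicted += 1
        return evicted

    def __contains__(self, session_id: str) -> bool:
        return session_id in self._sessions
```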

Embedder

Variable	Default	Purpose
`SEMVEC_EMBEDDER_URL`	unset	Same effect as `--embedder`. When set, the lifespan injects a `SidecarEmbedderClient` instead of loading the model in-process. Read by every worker.
`SEMVEC_EMBEDDER_MODEL`	`all-MiniLM-L6-v2`	Default sentence-transformers model name the sidecar daemon loads when `--model` is not provided. Override per-deployment.
`SEMVEC_EMBEDDER_DIM`	`384`	Output dimension expected from the sidecar; must match the model the daemon was launched with.
`SEMVEC_EMBEDDER_CACHE_SIZE`	`0` (disabled)	When `>0`, wraps the injected embedder in a `CachedEmbedder` with this LRU capacity. Cache hits skip the model, and concurrent submits for the same text are deduplicated onto a single underlying encode. This is the cheapest path to roughly 2–3× RPS on chat traffic.
`SEMVEC_USE_RUST_EMBEDDER` / `SEMVEC_EMBEDDER_BIN`	unset	Opt-in switches that make `--embedder-mode sidecar` spawn the Rust `semvec-embedder` binary instead of the Python daemon. See Embedders guide.
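The caching behaviour described for `SEMVEC_EMBEDDER_CACHE_SIZE` combines an LRU cache with request coalescing. Below is a minimal asyncio sketch of both ideas, assuming an async `encode` callable; `CachedEncoder` is a hypothetical name and does not reproduce the real `CachedEmbedder`.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, Dict, List

Vector = List[float]


class CachedEncoder:
    """Hypothetical sketch: LRU cache plus in-flight dedup in front of an encoder."""

    def __init__(self, encode: Callable[[str], Awaitable[Vector]], capacity: int) -> None:
        self.encode = encode
        self.capacity = capacity
        self._cache: "OrderedDict[str, Vector]" = OrderedDict()
        self._inflight: Dict[str, asyncio.Future] = {}

    async def embed(self, text: str) -> Vector:
        if text in self._cache:
            self._cache.move_to_end(text)  # cache hit: mark as recently used
            return self._cache[text]
        if text in self._inflight:
            # Same text is already being encoded: share that result.
            # shield() stops a cancelled waiter from cancelling the shared future.
            return await asyncio.shield(self._inflight[text])
        fut = asyncio.get_running_loop().create_future()
        self._inflight[text] = fut
        try:
            vec = await self.encode(text)
        except BaseException as exc:
            fut.set_exception(exc)  # wake any waiters with the same error
            fut.exception()  # mark retrieved so asyncio does not log it as unhandled
            raise
        finally:
            del self._inflight[text]
        fut.set_result(vec)
        self._cache[text] = vec
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least-recently-used entry
        return vec


async def demo():
    calls: List[str] = []

    async def slow_encode(text: str) -> Vector:
        calls.append(text)
        await asyncio.sleep(0.01)  # simulate model latency
        return [float(len(text))]

    enc = CachedEncoder(slow_encode, capacity=2)
    # Two concurrent requests for "hi" share one encode call.
    first = await asyncio.gather(enc.embed("hi"), enc.embed("hi"), enc.embed("hello"))
    again = await enc.embed("hi")  # served from the cache
    return list(first), again, calls
```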

Retrieval (`/v1/run`)

Variable	Default	Purpose
`SEMVEC_RUN_TOP_K`	`5`	Number of memories surfaced per `/v1/run` (used by the context block, short-circuit, and drift scoring). Raising it catches lexically-distant facts; lowering it keeps prompts tight.
`SEMVEC_MMR_FETCH_K`	`0` (disabled)	When `> SEMVEC_RUN_TOP_K`, fetch this many candidates and Maximal-Marginal-Relevance rerank down to `SEMVEC_RUN_TOP_K`. Demotes near-duplicate memories so diverse facts survive into the final set. 50–200 is a good starting range.
`SEMVEC_MMR_LAMBDA`	`0.5`	MMR relevance/diversity mix. `1.0` = pure cosine retrieval (no diversity), `0.0` = pure diversity (no relevance).
`SEMVEC_CONTEXT_BUDGET_CHARS`	`4000`	Total character budget for the `context` string returned by `/v1/run`. Retrieved memories are added in order, each using its full length, until the budget is spent. This replaces the legacy per-memory 150-char cap: long memories use what they need, and short ones don't waste budget.
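The retrieval knobs can be illustrated with the standard Maximal Marginal Relevance formulation followed by greedy character-budget packing. This is a sketch under assumptions, not semvec's implementation: vectors are plain float lists, `candidates` stands for the `SEMVEC_MMR_FETCH_K` pool, `lam` maps to `SEMVEC_MMR_LAMBDA` (so `lam=1.0` reduces to plain cosine ranking), and `pack_context` assumes packing stops at the first memory that no longer fits.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Vector, candidates: Sequence[Vector], top_k: int, lam: float = 0.5) -> List[int]:
    """Pick argmax of lam*rel(d) - (1-lam)*max_sim(d, selected), top_k times.

    Returns candidate indices in selection order.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < top_k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected memory.
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def pack_context(memories: Sequence[str], budget_chars: int = 4000, sep: str = "\n") -> str:
    """Add memories in order until the next one would exceed the budget.

    Assumption: a memory that does not fit ends packing (no truncation).
    """
    parts: List[str] = []
    used = 0
    for text in memories:
        cost = len(text) + (len(sep) if parts else 0)
        if used + cost > budget_chars:
            break
        parts.append(text)
        used += cost
    return sep.join(parts)
```

With a near-duplicate pair in the pool, `lam=1.0` returns both duplicates, while `lam=0.5` swaps the second one for a less similar but still relevant memory, which is the demotion effect the `SEMVEC_MMR_FETCH_K` row describes.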

BM25-hybrid retrieval (opt-in, needs `semvec[hybrid]`)

Variable	Default	Purpose
`SEMVEC_HYBRID_BM25`	`0` (off)	Master switch. When `1`, every session also maintains a per-session BM25 index and `/v1/run` fuses dense + lexical candidates via Reciprocal Rank Fusion.
`SEMVEC_BM25_FETCH_K`	`50`	BM25 top-K fed into the fusion.
`SEMVEC_BM25_REBUILD_EVERY`	`64`	Ingests between snapshot rebuilds of the per-session BM25 index. Lower = fresher BM25 at higher rebuild cost.
`SEMVEC_RRF_K`	`60`	RRF smoothing constant. The standard value from the RRF paper; rarely worth changing.
`SEMVEC_RRF_WEIGHTS`	unset (uniform)	Comma-separated per-list weights, e.g. `"1.0,0.4"` to scale the BM25 contribution down to 40% of the dense one. Useful when BM25 hurts single-fact precision.
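Weighted Reciprocal Rank Fusion scores each memory as the sum over lists of `w_i / (k + rank_i)`, with 1-based ranks. Here is a small sketch, assuming the dense list comes first and the BM25 list second (the order implied by the `"1.0,0.4"` example); `rrf_fuse` and `parse_weights` are hypothetical helpers.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def parse_weights(raw: Optional[str], n_lists: int) -> List[float]:
    """Parse a SEMVEC_RRF_WEIGHTS-style string; unset means uniform weights."""
    if not raw:
        return [1.0] * n_lists
    weights = [float(part) for part in raw.split(",")]
    if len(weights) != n_lists:
        raise ValueError(f"expected {n_lists} weights, got {len(weights)}")
    return weights


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60,
             weights: Optional[Sequence[float]] = None) -> List[Hashable]:
    """Weighted RRF: score(d) = sum_i w_i / (k + rank_i(d)).

    Ranks are 1-based; a document missing from a list gets nothing from it.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[Hashable, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

With dense results `["a", "b", "c"]` and BM25 results `["b", "d"]`, uniform weights rank `d` (BM25-only) above `c` (dense-only); with `1.0,0.4`, `c` moves above `d`, which is the intended effect of down-weighting the lexical list.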

Cross-encoder rerank (opt-in)

Variable	Default	Purpose
`SEMVEC_RERANK_MODEL`	unset (off)	HuggingFace model ID, e.g. `cross-encoder/ms-marco-MiniLM-L-6-v2`. When set, `/v1/run` reranks the BM25 / dense fusion output through this cross-encoder before returning the final top-K.
`SEMVEC_RERANK_FETCH_K`	`50`	Candidate pool fed into the cross-encoder.
`SEMVEC_RERANK_BATCH`	`64`	Cross-encoder batch size. Tune against the GPU/CPU running the worker.
`SEMVEC_RERANK_FP16`	`0`	Set `1` for FP16 inference on GPU — typically 1.5–2× faster with no observable quality loss.
`SEMVEC_RERANK_THREADS`	`os.cpu_count()`	Torch intra-op thread cap for CPU inference. Set lower if you co-locate the API with other CPU-heavy tasks.

Anchors & extraction

Variable	Default	Purpose
`SEMVEC_TOPIC_SWITCH`	`1`	Master switch for the topic-switch detector. `0` disables — useful for parity tests that must hold the state still.
`PSS_TOPIC_SWITCH`	—	Deprecated legacy alias for `SEMVEC_TOPIC_SWITCH` read as a fallback by the session manager; scheduled for removal in 1.0. Prefer the `SEMVEC_*`-prefixed variable.
`SEMVEC_AUTO_ANCHOR_ON_TOPIC_SWITCH`	`0`	Set `1` to snapshot `semantic_state` as a fresh anchor every time the detector fires. Capped by `SEMVEC_MAX_AUTO_ANCHORS`.
`SEMVEC_AUTO_ANCHOR_FROM_EXTRACT`	`0`	Set `1` to also create anchors from extracted-entity embeddings (when auto-extract is on).
`SEMVEC_MAX_AUTO_ANCHORS`	`8`	Cap on the number of anchors created via either auto-anchor path.
`SEMVEC_AUTO_EXTRACT`	`0`	Set `1` to enable best-effort numeric / entity extraction from ingested text.
`SEMVEC_AUTO_EXTRACT_BROAD`	`0`	Broader extractor profile (more recall, more noise). Implies `SEMVEC_AUTO_EXTRACT=1`.
`SEMVEC_ENABLE_NUMERIC_EXTRACTOR`	`1`	Set `0` to disable the numeric extractor (IBAN, amounts, IDs) — useful when downstream code does its own extraction.
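Several switches in this table interact: `PSS_TOPIC_SWITCH` is only a fallback for `SEMVEC_TOPIC_SWITCH`, and the broad extractor implies `SEMVEC_AUTO_EXTRACT=1`. The sketch below shows one way to express those rules. The helpers `env_flag` and `extraction_settings` are hypothetical, and treating only the string `"1"` as "on" is an assumption.

```python
import os
import warnings
from typing import Any, Dict, Mapping, Optional


def env_flag(name: str, default: bool, env: Mapping[str, str] = os.environ,
             legacy: Optional[str] = None) -> bool:
    """Read a 0/1 switch; assumption: only "1" means on."""
    raw = env.get(name)
    if raw is None and legacy is not None:
        raw = env.get(legacy)
        if raw is not None:
            warnings.warn(f"{legacy} is deprecated; use {name}", DeprecationWarning, stacklevel=2)
    if raw is None or not raw.strip():
        return default
    return raw.strip() == "1"


def extraction_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    broad = env_flag("SEMVEC_AUTO_EXTRACT_BROAD", False, env)
    return {
        # The SEMVEC_* name wins; the legacy PSS_* alias is only a fallback.
        "topic_switch": env_flag("SEMVEC_TOPIC_SWITCH", True, env, legacy="PSS_TOPIC_SWITCH"),
        # The broad profile implies SEMVEC_AUTO_EXTRACT=1.
        "auto_extract": broad or env_flag("SEMVEC_AUTO_EXTRACT", False, env),
        "auto_extract_broad": broad,
        "numeric_extractor": env_flag("SEMVEC_ENABLE_NUMERIC_EXTRACTOR", True, env),
        "max_auto_anchors": int(env.get("SEMVEC_MAX_AUTO_ANCHORS", "8")),
    }
```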

Compliance & event store (`semvec[compliance]`)

Variable	Default	Purpose
`SEMVEC_ENABLE_EVENT_STORE`	`0`	Set `1` to write every state mutation into the append-only event store. Required for deterministic replay and signed deletion certificates.
`SEMVEC_ENABLE_HMAC_SIGNING`	`0`	Set `1` to sign every event-store entry with HMAC for tamper-evidence. Requires a key configured in the compliance config.
`SEMVEC_ENABLE_RS256_JWT`	`0`	Set `1` to issue RS256-signed user JWTs from the compliance routes (vs HS256). Requires a private key.
`SEMVEC_ENABLE_RETENTION_SWEEPER`	`0`	Set `1` to run the background retention sweeper that deletes events older than the configured retention horizon.
`SEMVEC_RETENTION_DAYS_AUDIT`	`2557` (≈ 7 years)	Retention horizon for audit events.
`SEMVEC_RETENTION_DAYS_CHAT`	`365`	Retention horizon for chat events.
`SEMVEC_COMPLIANCE_PUBKEY_FILE`	unset	Path to the compliance verifier public key (PEM). Used to verify signed deletion certificates and RS256 JWTs.
`SEMVEC_COMPLIANCE_PUBKEY_PEM`	unset	Inline PEM alternative to `…_FILE`.
`SEMVEC_COMPLIANCE_PRIVKEY_FILE`	unset	Path to the compliance signing private key. Issuer side only — never set on verifying instances.
`SEMVEC_COMPLIANCE_PRIVKEY_PEM`	unset	Inline PEM alternative to `…_FILE`.
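Per-entry HMAC signing and the retention horizon can be sketched with the standard library alone. The canonical-JSON serialization and the helper names `sign_event`, `verify_event` and `is_expired` are assumptions for illustration; semvec's event format and key handling may differ.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta
from typing import Any, Dict


def sign_event(event: Dict[str, Any], key: bytes) -> str:
    # Assumption: canonical JSON (sorted keys, no whitespace) is what gets signed.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_event(event: Dict[str, Any], signature: str, key: bytes) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign_event(event, key), signature)


def is_expired(created_at: datetime, retention_days: int, now: datetime) -> bool:
    """True if an event is older than its retention horizon (e.g. 365 for chat)."""
    return created_at < now - timedelta(days=retention_days)
```

Any edit to a signed entry, even a single field, changes the HMAC, so verification fails; that is the tamper-evidence property `SEMVEC_ENABLE_HMAC_SIGNING` provides.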

Licensing internals

Variable	Default	Purpose
`SEMVEC_LICENSE_KEY`	—	Ed25519-signed license JWT. Required for Pro / Enterprise features and quotas.
`SEMVEC_LICENSE_LRU_SIZE`	`256`	LRU cache size for verified JWTs. Higher = more memory, fewer signature verifications per second.

API process

Variable	Default	Purpose
`SEMVEC_API_THREADPOOL`	`200`	Size of the asyncio default executor thread pool, which bounds how much blocking work can be in flight at once.
`SEMVEC_STATE_DIR`	`.semvec`	Default directory that `CodingEngine` and related components use for persistent state.
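`SEMVEC_API_THREADPOOL` refers to asyncio's default executor, which `loop.run_in_executor(None, func)` and `asyncio.to_thread` both use. Below is a sketch of sizing it from the environment, with a demo showing that the pool size caps concurrent blocking calls; `install_default_executor` is a hypothetical helper, not semvec's startup code.

```python
import asyncio
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping


def install_default_executor(loop: asyncio.AbstractEventLoop,
                             env: Mapping[str, str] = os.environ) -> ThreadPoolExecutor:
    """Size the loop's default executor from SEMVEC_API_THREADPOOL (default 200)."""
    size = int(env.get("SEMVEC_API_THREADPOOL", "200"))
    executor = ThreadPoolExecutor(max_workers=size, thread_name_prefix="blocking")
    loop.set_default_executor(executor)
    return executor


async def demo(pool_size: int = 2, jobs: int = 6) -> int:
    """Run `jobs` blocking calls and report the peak number running at once."""
    loop = asyncio.get_running_loop()
    install_default_executor(loop, {"SEMVEC_API_THREADPOOL": str(pool_size)})
    lock = threading.Lock()
    active = peak = 0

    def blocking_work() -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.02)  # stands in for blocking I/O or CPU work
        with lock:
            active -= 1

    # run_in_executor(None, ...) submits to the default executor installed above.
    await asyncio.gather(*(loop.run_in_executor(None, blocking_work) for _ in range(jobs)))
    return peak
```

Work beyond the pool size queues inside the executor rather than spawning more threads, which is why the variable acts as a cap rather than a target.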

The API contract version matches semvec.__version__ from the installed wheel (informational; not a runtime knob).

Build-time-only environment variables (wheel builders only)

Not consumed at runtime

The variables below are read only when building semvec from source (wheel / sdist construction). Setting them on a running semvec serve process has no effect — the installed wheel already has the relevant values baked in.

Variable	Default	Purpose
`SEMVEC_BASE_URL`	unset	Public base URL baked into the built artefact for absolute-link generation.
`SEMVEC_EMBEDDED_PUBKEY_PATH`	build-time baked	Override the embedded verifier public-key path picked up by the build.
`SEMVEC_PROD_PUBKEY_FILE`	—	Path to a production public-key bundle. Used by the build to bake the correct verifier.
`SEMVEC_PROD_PUBKEY_PEM`	—	Inline PEM alternative to `…_FILE`.
`SEMVEC_BUILD_ALLOW_DEV_KEY`	`0`	Set `1` to allow the dev verifier in a release build (refused by default).
`SEMVEC_COMPLIANCE_PUBKEY_TARGET`	unset	Path the build rewrites with the latest pubkey when rotating from a hot key registry.

Examples

Local development

export SEMVEC_ALLOW_ANONYMOUS=1
export DATABASE_URL="sqlite:///dev.db"
semvec serve --host 127.0.0.1 --port 8080 --reload --log-level debug

Production behind a reverse proxy

export DATABASE_URL="postgresql://semvec:pass@db/semvec"
export CORS_ORIGINS="https://app.example.com"
export METRICS_USER="prom"
export METRICS_PASSWORD="$(cat /run/secrets/metrics_password)"
semvec serve --host 0.0.0.0 --port 8080 --log-level info

Behind nginx / an ALB, the server trusts the X-Forwarded-For and X-Real-IP headers for client-IP resolution (used by the rate limiter and the audit log).
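Below is a minimal sketch of that resolution order. It assumes `X-Forwarded-For` takes precedence over `X-Real-IP` and that the socket peer address is the fallback; the helper name and the precedence are illustrative, not semvec's code. Trusting these headers is only safe when the proxy in front sets or overwrites them, since clients can send arbitrary values.

```python
from typing import Mapping, Optional


def resolve_client_ip(headers: Mapping[str, str], peer: Optional[str]) -> Optional[str]:
    """Hypothetical proxy-aware client-IP lookup."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    # Each proxy appends to the right, so the left-most entry is the original client.
    first = forwarded.split(",")[0].strip()
    if first:
        return first
    real_ip = lowered.get("x-real-ip", "").strip()
    return real_ip or peer
```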

Programmatic start (without the CLI)

python -m uvicorn semvec.api:create_app --factory --host 0.0.0.0 --port 8080

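This has the same effect as semvec serve with the same host and port. Because create_app is a plain factory, you can also call it yourself to wire semvec into a larger ASGI app (e.g. mounted under a path prefix).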
