Licensing

Tier selection

Use case	Tier	Notes
Evaluation, prototyping, open-source side projects	Community	No license key required. Rate-limited per `SemvecState` instance.
Single product / single deployment, production traffic	Pro	Per-seat license. Higher throughput; full feature surface.
Multi-tenant / multi-deployment, B2B redistribution	Enterprise	Per-deployment license. SLA, indemnification, dedicated support.
Regulated workloads (audit, retention, signed deletion)	Pro or Enterprise	Compliance pack is unavailable on Community.

Tiers at a glance

Tier	Rate limit	Backends	Retrieval modes	Suitable for
Community (no key)	~5 calls/sec sustained, 50 burst	In-memory only	Base	One human user; bursty test loads.
Pro	~200 calls/sec sustained, 2000 burst	All	Extended	Production service, single team.
Enterprise	Unthrottled	All	All	Multi-tenant, regulated, distributed.

How the rate limit applies — library, REST, Cortex

The tier numbers above are enforced per SemvecState instance by an in-process token bucket inside the Rust core (see How the rate limiter works below). The bucket applies uniformly across every surface that drives a SemvecState:

Python library: every state.update(...) and state.calculate_*(...) call consumes one token.
REST API (semvec serve): every POST /v1/run, POST /v1/store, POST /v1/session/{id}/*, POST /v1/cluster/.../store, POST /v1/network/peer-transfer (and any other endpoint that touches a session) consumes one token from that session's bucket. An exhausted bucket surfaces as HTTP 429 with Retry-After in seconds — the same RateLimitError the library would raise, mapped by the FastAPI exception handler.
Cortex: each SemvecAgent owns a SemvecState, so per-agent buckets apply; aggregated network operations consume tokens from each participating agent's bucket.
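On the REST surface, the HTTP 429 plus Retry-After behaviour described above can be handled client-side roughly as follows. This is a minimal sketch and not part of semvec: `retry_after_seconds`, `call_with_429_retry`, and the `(status, headers, body)` response shape are hypothetical names chosen for illustration, and `send` stands in for whatever HTTP client call issues the request.

```python
import time
from typing import Callable, Mapping, Tuple

# Illustrative response shape: (status_code, headers, body).
# A real client would take these from its HTTP library of choice.
Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds, falling back to `default`.

    Assumes the header is keyed exactly "Retry-After"; most HTTP libraries
    offer a case-insensitive header mapping that should be preferred.
    """
    raw = headers.get("Retry-After", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        return default


def call_with_429_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out HTTP 429 responses for up to `max_attempts` tries."""
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_after_seconds(response[1]))
        response = send()
    return response
```

The attempt cap keeps a client from looping forever against an exhausted bucket; the last response is returned as-is so the caller can still inspect a final 429.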

What the bucket does not do: it is not an HTTP-level cross-session or cross-process throttle. A client opening N parallel sessions (or running N worker processes) gets N × the per-state quota — each SemvecState carries its own bucket. For multi-tenant DoS protection or per-JWT-subject HTTP rate-limiting, terminate that at a reverse proxy (nginx limit_req, Envoy local_ratelimit, or an API gateway keyed on the JWT sub claim).

Tier-specific behaviour: Community uses the 5 QPS / 50 burst bucket plus a sliding-window probe-defence layer (100/s on update, 30/s on calculate_*) intended for adversarial workloads; legitimate Community callers never reach the second layer because the bucket caps first. Pro uses a 200 QPS / 2000 burst bucket without the second layer. Enterprise is fully unthrottled — no bucket, no sliding window. The compliance event-replay path bypasses both layers regardless of tier.

Workload fit

Workload	Typical calls/sec	Fits Community
Single conversational user (one turn per 5–30 s)	0.05 – 0.2	yes
Coding-agent MCP server (per file save)	~0.1	yes
50-call quickstart smoke test	inside burst	yes
`pytest` suite (20 tests × 5 calls)	50 burst, then ~5/s sustained	yes
Production service, concurrent users	10 – 50	no — Pro
LOCOMO benchmark replay (~25 k calls)	sustained > 5/s	no — batch, shard, or Pro

For batch workloads use update_batch(), shard across multiple SemvecState instances (each has its own bucket), or move to Pro / Enterprise.
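To judge whether a batch fits a tier, a rough lower bound on wall-clock time follows from the bucket numbers alone. The helper below is a back-of-the-envelope sketch: `min_seconds` is a hypothetical name, it assumes every bucket starts full and that calls are spread evenly across shards, and it ignores the time the calls themselves take.

```python
import math


def min_seconds(total_calls: int, rate: float, burst: int, shards: int = 1) -> float:
    """Lower bound on the time needed to push `total_calls` through token buckets.

    Each shard models one SemvecState with its own bucket. The first `burst`
    calls per shard are free; the rest are paced at `rate` calls per second.
    """
    per_shard = math.ceil(total_calls / shards)
    beyond_burst = max(0, per_shard - burst)
    return beyond_burst / rate


# Community numbers from the table above (~5 calls/sec sustained, 50 burst).
print(min_seconds(100, rate=5, burst=50))                # 10.0   (20 tests x 5 calls)
print(min_seconds(25_000, rate=5, burst=50))             # 4990.0 (~83 minutes)
print(min_seconds(25_000, rate=5, burst=50, shards=10))  # 490.0  (~8 minutes)
```

Sharding only shortens the wait when the workload can actually be split across independent states; whether that holds for a given benchmark is for the caller to decide.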

Activating a license

Set the environment variable before importing semvec:

export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."

Keys are Ed25519-signed JWTs with a 30-day TTL. The verifying public key is baked into the wheel at build time, so verification works fully offline.

How the rate limiter works (developers)

A single bucket per SemvecState covers both update() and the on-demand calculate_* aggregate methods. The throughput budget is the combined operations-per-second on that state:

state.update(emb, text) consumes one token.
state.calculate_fsm(...) / calculate_metrics(...) / calculate_advanced_metrics(...) each consume one token.
The bucket refills at the tier's sustained rate up to the burst cap.

When the bucket is empty, the next call raises RateLimitError with a retry_after hint. A second per-state safety layer applies on the Community tier only and is intended for adversarial workloads; legitimate Community callers never hit it because the bucket caps first. The compliance event-replay path bypasses both layers (replay must not lock itself out re-folding its own log).
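The token-bucket behaviour described in this section can be modelled in a few lines. The sketch below is a generic token bucket written for illustration only; the actual limiter lives in the Rust core and its internals are not shown here. The class and method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/sec, capped at `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # assumption: the bucket starts full
        self.last = now

    def try_acquire(self, now: float) -> float:
        """Take one token if available.

        Returns 0.0 on success, otherwise the number of seconds until the
        next token is available (comparable to a retry_after hint).
        """
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


# Community-like numbers: 5 tokens/sec sustained, 50 burst.
bucket = TokenBucket(rate=5, burst=50)
burst_results = [bucket.try_acquire(now=0.0) for _ in range(50)]
wait = bucket.try_acquire(now=0.0)  # 51st call at the same instant is refused
```

With these numbers the first 50 calls succeed immediately and the 51st reports a positive wait, which matches the "burst, then sustained rate" pattern in the workload table.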

Claims schema

{
  "products": ["semvec", "cortex", "coding"],
  "tier":     "pro",
  "exp":      1799999999
}

products: array of strings naming the products this key unlocks.
tier: the licensed tier as a lowercase string, as in the example above: "community", "pro", or "enterprise".
exp: Unix timestamp (seconds) when the key expires.

A missing product, an invalid signature, or an expired timestamp each produce a descriptive error.
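For debugging, the claims of a key can be inspected with the standard library alone, since a JWT payload is base64url-encoded JSON. This is a sketch for inspection only: it does not verify the signature, so it is no substitute for the validation semvec performs, and `decode_claims` and `seconds_until_expiry` are hypothetical helper names.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims: dict, now: Optional[float] = None) -> float:
    """Seconds remaining before the `exp` claim; negative if already expired."""
    current = time.time() if now is None else now
    return claims["exp"] - current
```

A typical use would be reading `SEMVEC_LICENSE_KEY` from the environment, decoding it, and logging a warning when `seconds_until_expiry` drops below a threshold of your choosing.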

Error handling

snippet — assumes `state`, `embedding`, `text`, `time`, `logger` are set up in the surrounding scope

from semvec import RateLimitError, LicenseExpiredError

try:
    result = state.update(embedding, text)
except RateLimitError as e:
    # e.retry_after is a datetime.timedelta
    time.sleep(e.retry_after.total_seconds())
    result = state.update(embedding, text)
except LicenseExpiredError as e:
    logger.warning("semvec license expired — renew at %s", e.upgrade_url)
    raise

Both exceptions inherit from LicenseError, which inherits from the base SemvecError. See Troubleshooting for the full symptom table.
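The snippet above retries once; if the second call is also rate-limited, the error propagates. Where a bounded number of retries is preferred, a small wrapper along these lines may help. It is a sketch under stated assumptions: the exception classes below are local stand-ins that mirror the hierarchy described above so the example runs without semvec installed (in real code, import `RateLimitError` from semvec instead), and `call_with_retry` is a hypothetical helper, not a library function.

```python
import time
from datetime import timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins mirroring the documented hierarchy; replace with semvec's classes.
class SemvecError(Exception):
    pass


class LicenseError(SemvecError):
    pass


class RateLimitError(LicenseError):
    def __init__(self, retry_after: timedelta):
        super().__init__(f"rate limited, retry after {retry_after}")
        self.retry_after = retry_after  # a datetime.timedelta, as in the snippet above


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, sleeping for `retry_after` between rate-limited attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            sleep(e.retry_after.total_seconds())
    raise AssertionError("unreachable")
```

Usage would look like `call_with_retry(lambda: state.update(embedding, text))`. Because both license exceptions share `LicenseError` as a base, a single `except LicenseError` further up the stack catches whatever this wrapper re-raises.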

For regulated deployments

Need license validation in air-gapped environments or a custom public-key rotation schedule? Contact vertrieb@versino.de for Enterprise options, including:

Air-gapped license issuance
Custom TTL policies
Hardware-backed signing
SBOM + provenance attestations