2026-05-13

Licensing

Tier selection

| Use case | Tier | Notes |
| --- | --- | --- |
| Evaluation, prototyping, open-source side projects | Community | No license key required. Rate-limited per SemvecState instance. |
| Single product / single deployment, production traffic | Pro | Per-seat license. Higher throughput; full feature surface. |
| Multi-tenant / multi-deployment, B2B redistribution | Enterprise | Per-deployment license. SLA, indemnification, dedicated support. |
| Regulated workloads (audit, retention, signed deletion) | Pro or Enterprise | Compliance pack is unavailable on Community. |

Tiers at a glance

| Tier | Rate limit | Backends | Retrieval modes | Suitable for |
| --- | --- | --- | --- | --- |
| Community (no key) | ~5 calls/sec sustained, 50 burst | In-memory only | Base | One human user; bursty test loads. |
| Pro | ~200 calls/sec sustained, 2000 burst | All | Extended | Production service, single team. |
| Enterprise | Unthrottled | All | All | Multi-tenant, regulated, distributed. |

How the rate limit applies — library, REST, Cortex

The tier numbers above are enforced per SemvecState instance by an in-process token bucket inside the Rust core (see How the rate limiter works below). The bucket applies uniformly across every surface that drives a SemvecState:

  • Python library: every state.update(...) and state.calculate_*(...) call consumes one token.
  • REST API (semvec serve): every POST /v1/run, POST /v1/store, POST /v1/session/{id}/*, POST /v1/cluster/.../store, POST /v1/network/peer-transfer (and any other endpoint that touches a session) consumes one token from that session's bucket. An exhausted bucket surfaces as HTTP 429 with Retry-After in seconds — the same RateLimitError the library would raise, mapped by the FastAPI exception handler.
  • Cortex: each SemvecAgent owns a SemvecState, so per-agent buckets apply; aggregated network operations consume tokens from each participating agent's bucket.
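On the REST surface, a client can turn the 429 response described above into a wait time. The following is a minimal sketch, not part of semvec: the helper name `retry_delay` and the 1-second fallback are assumptions made for illustration, and it relies only on the documented behaviour that `Retry-After` carries a number of seconds.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed.

    Hypothetical client-side helper; `default` is an assumed fallback.
    """
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to the default wait.
        return default


print(retry_delay(429, {"Retry-After": "2"}))  # 2.0
print(retry_delay(200, {}))                    # None
```

The helper takes a plain status code and header mapping so it can sit behind whichever HTTP client the application already uses.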

What the bucket does not do: it is not an HTTP-level cross-session or cross-process throttle. A client opening N parallel sessions (or running N worker processes) gets N × the per-state quota — each SemvecState carries its own bucket. For multi-tenant DoS protection or per-JWT-subject HTTP rate-limiting, terminate that at a reverse proxy (nginx limit_req, Envoy local_ratelimit, or an API gateway keyed on the JWT sub claim).

Tier-specific behaviour: Community uses the 5 QPS / 50 burst bucket plus a sliding-window probe-defence layer (100/s on update, 30/s on calculate_*) intended for adversarial workloads; legitimate Community callers never reach the second layer because the bucket caps first. Pro uses a 200 QPS / 2000 burst bucket without the second layer. Enterprise is fully unthrottled — no bucket, no sliding window. The compliance event-replay path bypasses both layers regardless of tier.

Workload fit

| Workload | Typical calls/sec | Fits Community |
| --- | --- | --- |
| Single conversational user (one turn per 5–30 s) | 0.05 – 0.2 | yes |
| Coding-agent MCP server (per file save) | ~0.1 | yes |
| 50-call quickstart smoke test | inside burst | yes |
| pytest suite (20 tests × 5 calls) | 50 burst, then ~5/s sustained | yes |
| Production service, concurrent users | 10 – 50 | no (Pro) |
| LOCOMO benchmark replay (~25 k calls) | sustained > 5/s | no (batch, shard, or Pro) |

For batch workloads use update_batch(), shard across multiple SemvecState instances (each has its own bucket), or move to Pro / Enterprise.
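To judge whether sharding is enough for a batch job, a rough lower bound on wall-clock time can be computed from the tier numbers. This is a back-of-the-envelope sketch under stated assumptions: every bucket starts full, calls are spread evenly across shards, and per-call compute time is ignored. The function name `min_replay_seconds` is hypothetical.

```python
import math


def min_replay_seconds(calls: int, rate: float, burst: int, shards: int = 1) -> float:
    """Lower bound on the time needed to push `calls` through `shards` buckets."""
    per_shard = math.ceil(calls / shards)      # share handled by the busiest shard
    beyond_burst = max(0, per_shard - burst)   # calls that must wait for refill
    return beyond_burst / rate


# Community numbers from the tier table: 5 calls/sec sustained, 50 burst.
print(min_replay_seconds(25_000, rate=5, burst=50))            # 4990.0
print(min_replay_seconds(25_000, rate=5, burst=50, shards=4))  # 1240.0
print(min_replay_seconds(100, rate=5, burst=50))               # 10.0
```

Under these assumptions a 25 k-call replay needs about 83 minutes on a single Community state and about 21 minutes across four, while the 100-call pytest suite from the table finishes its throttled tail in roughly 10 seconds.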

Activating a license

Set the environment variable before importing semvec:

export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."

Keys are Ed25519-signed JWTs with a 30-day TTL. The verifying public key is baked into the wheel at build time, so verification works fully offline.

How the rate limiter works (developers)

A single bucket per SemvecState covers both update() and the on-demand calculate_* aggregate methods. The throughput budget is the combined operations-per-second on that state:

  • state.update(emb, text) consumes one token.
  • state.calculate_fsm(...) / calculate_metrics(...) / calculate_advanced_metrics(...) each consume one token.
  • The bucket refills at the tier's sustained rate up to the burst cap.

When the bucket is empty, the next call raises RateLimitError with a retry_after hint. A second per-state safety layer applies on the Community tier only and is intended for adversarial workloads; legitimate Community callers never hit it because the bucket caps first. The compliance event-replay path bypasses both layers (replay must not lock itself out re-folding its own log).
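The bucket semantics described above (one token per call, refill at the sustained rate, capped at the burst size) can be illustrated with a small model. This is a simplified sketch for intuition only; it is not the Rust core's implementation, and the class and method names are hypothetical.

```python
class TokenBucket:
    """Illustrative token bucket: `rate` tokens/sec, holding at most `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # assumption: the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def try_acquire(self, now: float) -> float:
        """Consume one token.

        Returns 0.0 on success, otherwise the number of seconds until a
        token becomes available (the analogue of a retry_after hint).
        """
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


bucket = TokenBucket(rate=5, burst=50)          # Community tier numbers
burst_ok = [bucket.try_acquire(0.0) for _ in range(50)]
print(all(wait == 0.0 for wait in burst_ok))    # True: the burst is absorbed
print(bucket.try_acquire(0.0))                  # 0.2: call 51 must wait
print(bucket.try_acquire(1.0))                  # 0.0: refilled after one second
```

Passing `now` explicitly keeps the model deterministic; a real limiter would read a monotonic clock instead.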

Claims schema

{
  "products": ["semvec", "cortex", "coding"],
  "tier":     "pro",
  "exp":      1799999999
}
  • products: array of strings naming the products this key unlocks.
  • tier: the tier this key grants (Community, Pro, or Enterprise), written as in the example above ("pro").
  • exp: Unix timestamp (seconds) when the key expires.

Missing product, wrong signature, and expired timestamps all produce descriptive errors.
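For debugging, the claims of a key can be inspected with the standard library alone, since a JWT payload is base64url-encoded JSON. The sketch below does not verify the signature and is not a substitute for semvec's own validation; the helper names and the dummy token are hypothetical, built here from the example claims above.

```python
import base64
import json
import time
from typing import Optional


def peek_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def days_remaining(claims: dict, now: Optional[float] = None) -> float:
    """Days until the `exp` claim, relative to `now` (defaults to the current time)."""
    now = time.time() if now is None else now
    return (claims["exp"] - now) / 86_400


def b64url(obj: dict) -> str:
    """Encode a dict as unpadded base64url JSON, as JWT segments are."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Dummy, unsigned token for demonstration; a real key carries an Ed25519 signature.
demo = ".".join([
    b64url({"alg": "EdDSA"}),
    b64url({"products": ["semvec", "cortex", "coding"], "tier": "pro", "exp": 1799999999}),
    "signature-placeholder",
])

claims = peek_claims(demo)
print(claims["tier"], "semvec" in claims["products"])    # pro True
print(days_remaining(claims, now=1799999999 - 86_400))   # 1.0
```

A check like `days_remaining` can feed a renewal reminder well before the 30-day TTL runs out.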

Error handling

Snippet: assumes `state`, `embedding`, and `text` are set up in the surrounding scope.
import logging
import time

from semvec import RateLimitError, LicenseExpiredError

logger = logging.getLogger(__name__)

try:
    result = state.update(embedding, text)
except RateLimitError as e:
    # e.retry_after is a datetime.timedelta; wait it out, then retry once.
    time.sleep(e.retry_after.total_seconds())
    # A second RateLimitError here propagates to the caller.
    result = state.update(embedding, text)
except LicenseExpiredError as e:
    logger.warning("semvec license expired; renew at %s", e.upgrade_url)
    raise

Both exceptions inherit from LicenseError, which inherits from the base SemvecError. See Troubleshooting for the full symptom table.
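Where a single retry is not enough, the same pattern can be wrapped in a bounded loop. The following is one possible sketch: `call_with_retry` is a hypothetical helper, and the local `RateLimitError` class is a stand-in with the documented `retry_after` attribute so the example runs without semvec installed. In real code, import the exception from semvec instead.

```python
import time
from datetime import timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for semvec.RateLimitError (assumption: retry_after is a timedelta)."""

    def __init__(self, retry_after: timedelta) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(fn: Callable[[], T], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, sleeping for the hinted delay after each RateLimitError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(e.retry_after.total_seconds())
    raise AssertionError("unreachable")


# Demonstration with a function that is throttled twice before succeeding.
outcomes = [RateLimitError(timedelta(seconds=0.2)),
            RateLimitError(timedelta(seconds=0.2)),
            "ok"]
waits = []


def flaky() -> str:
    item = outcomes.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


print(call_with_retry(flaky, sleep=waits.append))  # ok
print(waits)                                       # [0.2, 0.2]
```

With semvec available, usage would look like `call_with_retry(lambda: state.update(embedding, text))`. Injecting `sleep` keeps the helper testable without real delays.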

For regulated deployments

Need offline license validation or a custom public-key rotation schedule? Contact vertrieb@versino.de for Enterprise options including:

  • Air-gapped license issuance
  • Custom TTL policies
  • Hardware-backed signing
  • SBOM + provenance attestations