
Full tour (15 min)

What you will build

A working understanding of every entry point Semvec offers, in the recommended order — REST first (lowest setup friction), then the in-process library and its extensions:

  1. REST API server (semvec serve) + a curl probe.
  2. Cortex over REST (clusters, regions).
  3. In-process SemvecState with a real embedder.
  4. Token-reduction proxy (drop-in replacement for an LLM call).
  5. Cortex in-process (SemvecAgent / SemvecAgentNetwork).
  6. Coding-agent compaction pipeline.
  7. A benchmark run on a tiny LOCOMO slice.

This tour intentionally repeats imports so each block is self-contained — copy any block on its own and it should run.

1. REST API

pip install 'semvec[api]' sentence-transformers
semvec serve --port 8001 &
curl -s localhost:8001/v1/health

semvec serve loads all-MiniLM-L6-v2 in-process by default. Switch to a sidecar daemon (single embedder shared across workers) via --embedder-mode sidecar or --embedder unix:///path/to.sock.

curl -s -X POST localhost:8001/v1/run \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "demo", "input": "Hello from curl."}'

/v1/run returns the same compact context block that the in-process library produces — just over HTTP. See the REST API reference for the full endpoint catalogue and the CLI reference for semvec serve flags + SEMVEC_* environment variables.

2. Cortex over REST

Multi-agent coordination without leaving the HTTP boundary — create a cluster, register peers, and exchange state-vector packets via the /v1/cluster/* and /v1/network/* endpoints.

# Create a cluster
curl -s -X POST localhost:8001/v1/cluster/create \
  -H 'Content-Type: application/json' \
  -d '{"cluster_id": "team-eu"}'

# Join two sessions to it
curl -s -X POST localhost:8001/v1/cluster/team-eu/join \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "agent-a"}'
curl -s -X POST localhost:8001/v1/cluster/team-eu/join \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "agent-b"}'

For region pinning, observers, and trust-score consensus, see Cortex over REST.

3. In-process SemvecState

When the integration outgrows the HTTP boundary — tight per-turn latency, custom embedders inside the host process, shared in-process state — drop into the library directly.

from semvec import SemvecState, SemvecConfig
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
state = SemvecState(config=SemvecConfig(dimension=384))

text = "User: I want to discuss our European customer onboarding flow."
result = state.update(model.encode(text), text)
print(result["phase"], result["fsm"])

4. Token-reduction proxy

import os
from semvec.token_reduction import SemvecChatProxy, create_llm_client

# create_llm_client reads OPENAI_API_KEY / OPENAI_BASE_URL / OPENAI_MODEL
# (or OLLAMA_* equivalents) from the environment.
os.environ.setdefault("OPENAI_MODEL", "gpt-4o")
llm = create_llm_client("openai")

# SemvecChatProxy owns its own SemvecState and SerializerConfig.
# Pass an embedder if you have one, or install sentence-transformers
# and let the proxy auto-load `all-MiniLM-L6-v2`.
proxy = SemvecChatProxy(
    llm_call=llm,
    system_prompt="You are a helpful assistant.",
)

result = proxy.chat("Continue the customer onboarding discussion.")
print(result.response)

The proxy replaces your full chat history with a Semvec-compressed context block. Your LLM call uses constant input tokens regardless of how long the conversation has run. See the Token-reduction API reference for every constructor kwarg.
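The constant-input-token claim is easy to see with back-of-the-envelope arithmetic. A minimal sketch with illustrative figures (200 tokens per turn and a 600-token compressed block are assumptions for the example, not measured Semvec numbers):

```python
# Illustrative arithmetic only: cumulative input tokens for a full-history
# chat vs. a fixed-size compressed context block.
TOKENS_PER_TURN = 200      # assumed average tokens per turn
COMPRESSED_BLOCK = 600     # assumed size of the compressed context block

def full_history_input(turn: int) -> int:
    # Turn n resends every prior turn plus the new message: linear growth.
    return turn * TOKENS_PER_TURN

def proxy_input(turn: int) -> int:
    # The proxy sends the compressed block plus the new message: constant.
    return COMPRESSED_BLOCK + TOKENS_PER_TURN

for turn in (1, 10, 100):
    print(turn, full_history_input(turn), proxy_input(turn))
```

At turn 1 the proxy actually costs more; the win is that its input stays flat while full-history input grows without bound.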

5. Cortex in-process

SemvecAgent is the per-agent primitive — give it an instance_id and an embedder, then drive it with process_input(text):

import numpy as np
from sentence_transformers import SentenceTransformer
from semvec.cortex import SemvecAgent

model = SentenceTransformer("all-MiniLM-L6-v2")

class STEmbedder:
    def get_dimension(self): return 384
    def get_embedding(self, text):
        return np.asarray(model.encode(text), dtype=np.float64)

embedder = STEmbedder()
alice = SemvecAgent(instance_id="alice", embedder=embedder)

result = alice.process_input("Met with European customer; they need SEPA support.")
print(result["result"]["phase"], "coherence:", alice.local_coherence)

For multiple coordinated agents, register them in a SemvecAgentNetwork and drive turns via network.process_input(instance_id, text) — see Cortex for aggregation, feedback vectors, and consensus voting.
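Sections 5 and 6 both hand-roll the same SentenceTransformer adapter. Any object exposing the same two methods works, so for CI or wiring tests where downloading a model is unwanted, a deterministic stub can stand in. A sketch (not part of Semvec; the vectors carry no semantic signal, so use it only to exercise plumbing):

```python
import hashlib
import math
import struct

class StubEmbedder:
    """Deterministic stand-in for the get_dimension/get_embedding
    interface used above. For wiring tests only: the output is
    repeatable but semantically meaningless."""

    def __init__(self, dimension: int = 384):
        self.dimension = dimension

    def get_dimension(self) -> int:
        return self.dimension

    def get_embedding(self, text: str) -> list[float]:
        if not text.strip():
            return [0.0] * self.dimension
        # Derive a repeatable pseudo-embedding from SHA-256 of the text,
        # one hash per component, then L2-normalize.
        vec = []
        for i in range(self.dimension):
            digest = hashlib.sha256(f"{i}:{text}".encode()).digest()
            (n,) = struct.unpack(">Q", digest[:8])
            vec.append(n / 2**64 - 0.5)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]
```

This returns a plain list of floats; if your integration expects an ndarray like the examples above, wrap the result in np.asarray(..., dtype=np.float64).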

6. Coding-agent compaction

CodingEngine needs an explicit state_dir and an embedder (no hash-based fallback in 0.6.1):

import numpy as np
from sentence_transformers import SentenceTransformer
from semvec.coding import CodingEngine

model = SentenceTransformer("all-MiniLM-L6-v2")

class STEmbedder:
    def get_dimension(self): return 384
    def get_embedding(self, text):
        if not text.strip(): return np.zeros(384)
        return np.asarray(model.encode(text), dtype=np.float64)

engine = CodingEngine(state_dir="./.semvec", embedder=STEmbedder())
engine.register_code_change("auth.py", "JWT validation", "def verify(token)")
engine.record_error("TypeError: None has no attribute 'split'", source="runtime_error")
ctx = engine.get_compacted_context(
    "implement password reset flow",
    invariants=["never log plaintext passwords"],
)
print(ctx)
engine.save_state()

The compacted context is Markdown ready to paste into a fresh agent's system prompt — see Coding agents.
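Splicing that block into a fresh agent is plain string work. A minimal stdlib sketch assuming a chat-completions-style message list; the ctx string and the "Prior session context" heading here are placeholders, not real get_compacted_context output:

```python
def build_messages(base_prompt: str, compacted_context: str,
                   user_task: str) -> list[dict]:
    """Prepend a compacted-context Markdown block to the system prompt."""
    system = f"{base_prompt}\n\n## Prior session context\n{compacted_context}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_task},
    ]

messages = build_messages(
    "You are a coding agent.",
    "# Compacted context\n- auth.py: JWT validation",  # placeholder string
    "Implement the password reset flow.",
)
```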

7. Benchmarks (LOCOMO slice)

cd benchmarks
python run_locomo.py --conversations conv-1 --judge gpt-4o

The harness runs the same pipeline end-to-end and reports F1 and token efficiency against the published baselines. See Benchmarks.
