Full tour (15 min)
What you will build
A working understanding of every entry point Semvec offers, in the recommended order — REST first (lowest setup friction), then the in-process library and its extensions:
- REST API server (semvec serve) + a curl probe.
- Cortex over REST (clusters, regions).
- In-process SemvecState with a real embedder.
- Token-reduction proxy (drop-in replacement for an LLM call).
- Cortex in-process (SemvecAgent / SemvecAgentNetwork).
- Coding-agent compaction pipeline.
- A benchmark run on a tiny LOCOMO slice.
This tour intentionally repeats imports so each block is self-contained — copy any block on its own and it should run.
1. REST API
pip install 'semvec[api]' sentence-transformers
semvec serve --port 8001 &
curl -s localhost:8001/v1/health
semvec serve loads all-MiniLM-L6-v2 in-process by default. Switch
to a sidecar daemon (single embedder shared across workers) via
--embedder-mode sidecar or --embedder unix:///path/to.sock.
curl -s -X POST localhost:8001/v1/run \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "demo", "input": "Hello from curl."}'
/v1/run returns the same compact context block that the in-process
library produces — just over HTTP. See the
REST API reference for the full endpoint
catalogue and the CLI reference for semvec
serve flags + SEMVEC_* environment variables.
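The same probe can be driven from Python. The sketch below is a minimal urllib client for /v1/run; so that it runs standalone, it stands up a throwaway stub server whose response shape ({"context": ...}) is an assumption for illustration — only the request fields (session_id, input) are taken from the probe above.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Throwaway stub standing in for `semvec serve`. The response shape
# ({"context": ...}) is assumed for this demo, not Semvec's real schema.
class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = json.dumps(
            {"context": f"[compact context for {body['session_id']}]"}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the demo quiet
        pass

def run_turn(base_url: str, session_id: str, text: str) -> dict:
    """POST one turn to /v1/run, mirroring the curl probe above."""
    req = Request(
        f"{base_url}/v1/run",
        data=json.dumps({"session_id": session_id, "input": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())

server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = run_turn(f"http://127.0.0.1:{server.server_port}", "demo", "Hello from Python.")
server.shutdown()
print(result["context"])
```

Point base_url at a live semvec serve instance instead of the stub to exercise the real endpoint.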
2. Cortex over REST
Multi-agent coordination without leaving the HTTP boundary — create a
cluster, register peers, and exchange state-vector packets via the
/v1/cluster/* and /v1/network/* endpoints.
# Create a cluster
curl -s -X POST localhost:8001/v1/cluster/create \
  -H 'Content-Type: application/json' \
  -d '{"cluster_id": "team-eu"}'

# Join two sessions to it
curl -s -X POST localhost:8001/v1/cluster/team-eu/join \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "agent-a"}'

curl -s -X POST localhost:8001/v1/cluster/team-eu/join \
  -H 'Content-Type: application/json' \
  -d '{"session_id": "agent-b"}'
For region pinning, observers, and trust-score consensus see Cortex over REST.
3. In-process SemvecState
When the integration outgrows the HTTP boundary — tight per-turn latency, custom embedders inside the host process, shared in-process state — drop into the library directly.
from semvec import SemvecState, SemvecConfig
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
state = SemvecState(config=SemvecConfig(dimension=384))
text = "User: I want to discuss our European customer onboarding flow."
result = state.update(model.encode(text), text)
print(result["phase"], result["fsm"])
4. Token-reduction proxy
import os
from semvec.token_reduction import SemvecChatProxy, create_llm_client
# create_llm_client reads OPENAI_API_KEY / OPENAI_BASE_URL / OPENAI_MODEL
# (or OLLAMA_* equivalents) from the environment.
os.environ.setdefault("OPENAI_MODEL", "gpt-4o")
llm = create_llm_client("openai")
# SemvecChatProxy owns its own SemvecState and SerializerConfig.
# Pass an embedder if you have one, or install sentence-transformers
# and let the proxy auto-load `all-MiniLM-L6-v2`.
proxy = SemvecChatProxy(
    llm_call=llm,
    system_prompt="You are a helpful assistant.",
)
result = proxy.chat("Continue the customer onboarding discussion.")
print(result.response)
The proxy replaces your full chat history with a Semvec-compressed context block. Your LLM call uses constant input tokens regardless of how long the conversation has run. See the Token-reduction API reference for every constructor kwarg.
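To see why input tokens stay constant, here is a toy model of the pattern — illustrative only, not Semvec's compressor: the caller resends a bounded context block plus the new message instead of the full history.

```python
# Toy illustration of the constant-input-token pattern (NOT Semvec's
# compressor): each prompt is system + bounded context block + new message,
# never the full history.
MAX_CONTEXT_TOKENS = 64

def compress(history: list[str], budget: int) -> str:
    """Stand-in for semantic compression: keep the newest words up to budget."""
    words = " ".join(history).split()
    return " ".join(words[-budget:])

class ToyProxy:
    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.history: list[str] = []

    def prompt_for(self, user_msg: str) -> str:
        block = compress(self.history, MAX_CONTEXT_TOKENS)
        self.history.append(user_msg)
        return f"{self.system_prompt}\n[context] {block}\n[user] {user_msg}"

proxy = ToyProxy("You are a helpful assistant.")
sizes = []
for turn in range(50):
    prompt = proxy.prompt_for(f"turn {turn}: more onboarding details " * 5)
    sizes.append(len(prompt.split()))

# Prompt size plateaus once the budget is hit, no matter how long we run.
print(min(sizes[10:]), max(sizes[10:]))
```

The real proxy compresses semantically rather than truncating, but the token arithmetic is the same: prompt size is bounded by the block budget, not by conversation length.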
5. Cortex in-process
SemvecAgent is the per-agent primitive — give it an instance_id and
an embedder, then drive it with process_input(text):
import numpy as np
from sentence_transformers import SentenceTransformer
from semvec.cortex import SemvecAgent
model = SentenceTransformer("all-MiniLM-L6-v2")
class STEmbedder:
    def get_dimension(self):
        return 384

    def get_embedding(self, text):
        return np.asarray(model.encode(text), dtype=np.float64)
embedder = STEmbedder()
alice = SemvecAgent(instance_id="alice", embedder=embedder)
result = alice.process_input("Met with European customer; they need SEPA support.")
print(result["result"]["phase"], "coherence:", alice.local_coherence)
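SemvecAgent only needs an object exposing get_dimension() and get_embedding(text) — the duck-typed shape used above. For offline wiring tests where sentence-transformers is too heavy, a deterministic hash-seeded stub satisfies the same protocol. This is a sketch for plumbing only: its vectors carry no semantic meaning.

```python
import hashlib
import numpy as np

class HashEmbedder:
    """Deterministic stand-in for the get_dimension / get_embedding
    protocol shown above. For wiring tests only: the vectors are
    random, so expect no semantic quality from them."""

    def __init__(self, dimension: int = 384):
        self._dim = dimension

    def get_dimension(self) -> int:
        return self._dim

    def get_embedding(self, text: str) -> np.ndarray:
        # Seed a PRNG from the text so the same input always maps to
        # the same unit-norm vector.
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
        rng = np.random.default_rng(seed)
        vec = rng.standard_normal(self._dim)
        return vec / np.linalg.norm(vec)

emb = HashEmbedder()
v1 = emb.get_embedding("Met with European customer")
v2 = emb.get_embedding("Met with European customer")
print(v1.shape, float(np.dot(v1, v2)))  # identical text -> identical vector
```

Swap in a real embedder before measuring retrieval quality; the stub only proves the agent plumbing works.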
For multiple coordinated agents register them in a SemvecAgentNetwork
and drive turns via network.process_input(instance_id, text) — see
Cortex for aggregation, feedback vectors,
and consensus voting.
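As a rough mental model of the aggregation step — a toy sketch, not Semvec's actual consensus algorithm — the network can be pictured as combining peers' state vectors weighted by a per-agent trust score:

```python
import numpy as np

# Toy trust-weighted aggregation (illustrative only; Semvec's real
# consensus/feedback mechanics live in SemvecAgentNetwork).
def aggregate(states: dict[str, np.ndarray], trust: dict[str, float]) -> np.ndarray:
    total = sum(trust.values())
    return sum(trust[a] * states[a] for a in states) / total

states = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob":   np.array([0.0, 1.0, 0.0]),
}
trust = {"alice": 0.9, "bob": 0.3}
consensus = aggregate(states, trust)
print(consensus)  # pulled toward the more trusted agent
```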
6. Coding-agent compaction
CodingEngine needs an explicit state_dir and an embedder (no
hash-based fallback in 0.6.1):
import numpy as np
from sentence_transformers import SentenceTransformer
from semvec.coding import CodingEngine
model = SentenceTransformer("all-MiniLM-L6-v2")
class STEmbedder:
    def get_dimension(self):
        return 384

    def get_embedding(self, text):
        if not text.strip():
            return np.zeros(384)
        return np.asarray(model.encode(text), dtype=np.float64)
engine = CodingEngine(state_dir="./.semvec", embedder=STEmbedder())
engine.register_code_change("auth.py", "JWT validation", "def verify(token)")
engine.record_error("TypeError: None has no attribute 'split'", source="runtime_error")
ctx = engine.get_compacted_context(
    "implement password reset flow",
    invariants=["never log plaintext passwords"],
)
print(ctx)
engine.save_state()
The compacted context is Markdown ready to paste into a fresh agent's system prompt — see Coding agents.
7. Benchmarks (LOCOMO slice)
The harness runs the same pipeline end-to-end and reports F1 and token efficiency against the published baselines. See Benchmarks.
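For orientation, the F1 reported in LOCOMO-style QA evaluation is typically token-level overlap between the predicted and gold answers. A minimal sketch of that metric (real harnesses usually also normalise case, punctuation, and articles — check the harness docs for the exact rules it applies):

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1, the usual extractive-QA metric shape."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# A verbose but correct answer is penalised on precision:
print(token_f1("the onboarding call was in march", "march"))
```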
Next
- Pick your usage path: Choose your path
- Tune retrieval: Correcting memories
- Pick the right embedder: Embedders
- Productionise with audit / retention: Compliance pack