Cortex (overview)¶
semvec.cortex is the multi-agent coordination layer. Multiple agents — analyst, planner, critic, per-tenant agents, per-task agents — share an aggregated view, exchange checksummed state vectors, and vote on proposals through a five-level consensus engine. There are three usage paths; the right one depends on how many agents you run, where they live, and whether anyone outside your process needs to talk to them.
Pick a path¶
| You want to… | Path | Pulls in |
|---|---|---|
| Coordinate 2–10 agents inside one Python process | SemvecAgentNetwork (in-process) | [cortex] (marker; primitives are always available) |
| Plug a custom persistent store under the cortex (e.g. async fetch from Postgres / Redis / pgvector) | SemvecCortexService with pss_store= | [cortex] plus your store |
| Expose multi-agent coordination across machines, services, or tenants: clusters, regions, observers, drift events, trust scores | REST API: /v1/cluster/, /v1/region/, /v1/observer/, /v1/network/ | [api] (FastAPI, JWT) |
The REST path covers a much larger surface than the in-process API — it gates everything behind Ed25519 JWT auth, tracks ownership per license subject, persists session / cluster / region metadata in SQLite or Postgres, and adds machinery (drift bus, trust scores, anomaly detection) that does not exist in the in-process API. Treat it as the production surface when more than one process is involved. → Detailed walk-through: REST-hosted Cortex guide.
Path 1 — SemvecAgentNetwork (in-process)¶
Lightweight container for several SemvecAgent objects, aggregated into a single SemvecCortexObserver. The right shape when an analyst, planner, and critic all live in the same process and just need to share state and exchange feedback.
from semvec.cortex import SemvecAgentNetwork, AttentionAggregation
network = SemvecAgentNetwork(
    aggregation_strategy=AttentionAggregation(dimension=768),
    enable_feedback=True,
    feedback_strength=0.3,
    max_instances=10,
    dimension=768,
)
network.add_local_instance("analyst")
network.add_local_instance("planner")
network.add_local_instance("critic")
network.process_input("analyst", "quarterly revenue is up 23%")
network.process_input("planner", "we should redirect Q4 spend to retention")
state = network.get_network_state()
print(f"active agents: {state['active_instances']}/{state['total_instances']}")
# Pull per-agent feedback the next turn can blend into the embedding
feedback = network.get_feedback_for_agent("analyst")
Aggregation strategies: WeightedAverageAggregation, AttentionAggregation. The ConsensusEngine adds proposal voting at five levels (SIMPLE_MAJORITY, QUALIFIED_MAJORITY, UNANIMOUS, WEIGHTED_VOTE, ADAPTIVE_THRESHOLD); quorum is measured against the registered voter pool, not just votes-cast-so-far. StateVectorPacket round-trips bit-exactly via serialize()/deserialize() and verify_integrity() confirms byte equality.
→ API: semvec.cortex reference.
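The quorum rule described for the ConsensusEngine is easy to misread, so here is a minimal plain-Python sketch of the idea. It does not call semvec: `tally`, `registered_voters`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the real engine may differ in detail.

```python
from typing import Dict, Set


def tally(votes: Dict[str, bool], registered_voters: Set[str], threshold: float = 0.5) -> str:
    """Illustrative simple-majority tally (hypothetical helper, not the semvec API).

    Approval is measured against the whole registered pool, so a proposal
    cannot pass just because the first few voters happened to agree.
    """
    pool = len(registered_voters)
    if pool == 0:
        return "pending"
    # Ignore votes from agents that are not registered voters.
    valid = {agent: vote for agent, vote in votes.items() if agent in registered_voters}
    approvals = sum(1 for vote in valid.values() if vote)
    if approvals / pool > threshold:
        return "accepted"
    # Rejected once the voters still outstanding can no longer lift approvals past the threshold.
    remaining = pool - len(valid)
    if (approvals + remaining) / pool <= threshold:
        return "rejected"
    return "pending"


voters = {"analyst", "planner", "critic"}
print(tally({"analyst": True}, voters))                   # one approval out of three registered
print(tally({"analyst": True, "planner": True}, voters))  # two approvals out of three registered
```

A single approval is 100% of the votes cast but only a third of the pool, so under this rule the proposal stays open until a second voter agrees.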
Path 2 — SemvecCortexService with a custom store¶
SemvecCortexService is the service-shaped facade — it accepts an async pss_store and aggregates whatever active states the store exposes. Use it when your agents are persisted somewhere other than process memory (Postgres, Redis, pgvector, your own session DB) and you want the cortex to reflect all active sessions, not just those registered locally.
from semvec import SemvecState
from semvec.cortex import SemvecCortexService

class PostgresStore:
    async def list_active_states(self):
        """Async iterable of (agent_id, SemvecState) tuples."""
        # fetch_active_sessions() is a placeholder for your own async query helper
        async for row in fetch_active_sessions():
            yield row.agent_id, SemvecState.from_dict(row.snapshot)

svc = SemvecCortexService(
    pss_store=PostgresStore(),
    aggregation="attention",  # or "weighted"
    dimension=768,
)

# Call from inside an async function (await needs a running event loop)
result = await svc.update_global_state()
# {global_state, global_coherence, network_resonance, active_instances}
feedback = svc.get_feedback_for_agent("session_42")
# Pass into agent.process_input(text, global_feedback=feedback)
The service runs without a store too — when pss_store=None, it falls back to the in-memory cache populated via register_agent() + process_input(). Pick this path when your control plane is async and the cortex needs to see across process boundaries inside a single service.
→ API: SemvecCortexService reference.
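To try the store contract without a database, a stand-in along these lines may help. It is a sketch under stated assumptions: `InMemoryStore` and `collect_states` are hypothetical names, plain float lists stand in for SemvecState, and nothing here imports semvec. It only shows the shape of an async iterable of (agent_id, state) pairs and how a consumer drains it.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple


class InMemoryStore:
    """Hypothetical stand-in store; plain float lists replace SemvecState."""

    def __init__(self, states: Dict[str, List[float]]) -> None:
        self._states = states

    async def list_active_states(self) -> AsyncIterator[Tuple[str, List[float]]]:
        for agent_id, vector in self._states.items():
            await asyncio.sleep(0)  # hand control back to the loop, as a real fetch would
            yield agent_id, vector


async def collect_states(store: InMemoryStore) -> Dict[str, List[float]]:
    """Drain the async iterable the way a consumer of the store might."""
    return {agent_id: vector async for agent_id, vector in store.list_active_states()}


store = InMemoryStore({"session_42": [0.1, 0.9], "session_43": [0.7, 0.3]})
snapshot = asyncio.run(collect_states(store))
print(sorted(snapshot))
```

Because `list_active_states` is an async generator, it is consumed with `async for`, never awaited directly; a store that returns a list from a coroutine would not satisfy the same contract.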
Path 3 — REST API for multi-tenant Cortex¶
When clusters span machines, when several teams need their own region, when you want drift events fanned out automatically and a global observer watching for anomalies — switch to the REST surface. Every primitive in the in-process API has a REST counterpart, plus several that exist only at the REST layer:
| In-process | REST | What's added at REST |
|---|---|---|
| SemvecAgentNetwork | /v1/cluster/ | JWT-gated ownership, persistent membership, weighted-average or attention aggregation per cluster |
| ConsensusEngine | /v1/region/ | Region groups multiple clusters; consensus realignment fires on aggregated drift events |
| (none) | /v1/observer/ | Cross-cluster anomaly detection (cross_cluster_convergence, systemic_drift, cluster_divergence) |
| StateVectorPacket | /v1/network/transfer | Per-tenant user partitions, trust-score-weighted consensus, network-wide consensus proposals |
| (none) | /v1/cluster/{id}/feedback | One call blends the cluster aggregate back into all member sessions |
Auth: Authorization: Bearer <jwt> or X-API-Key: <jwt>. Ownership is per license subject — the server never leaks resource existence across tenants (404 on owned-by-another vs 200 on owned-by-me).
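The ownership rule can be pictured with a small lookup sketch. This is not the server's code: `OWNERS` and `lookup_status` are hypothetical, and the only point is that "does not exist" and "belongs to someone else" collapse into the same 404.

```python
from typing import Dict

# Hypothetical in-memory registry: resource id -> owning license subject.
OWNERS: Dict[str, str] = {"cluster-a": "tenant-1", "cluster-b": "tenant-2"}


def lookup_status(resource_id: str, subject: str) -> int:
    """Return the HTTP status a tenant-isolating server might send.

    A missing resource and a resource owned by another subject both yield 404,
    so a caller cannot probe for ids that belong to other tenants.
    """
    owner = OWNERS.get(resource_id)
    if owner is None or owner != subject:
        return 404
    return 200


print(lookup_status("cluster-a", "tenant-1"))  # owned by the caller
print(lookup_status("cluster-b", "tenant-1"))  # owned by another tenant
print(lookup_status("missing", "tenant-1"))    # does not exist at all
```

Returning 403 for the second case would confirm that the id exists, which is exactly the leak the 404 avoids.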
pip install "semvec[api]"
export SEMVEC_LICENSE_KEY="eyJhbGciOiJFZERTQSI..."
semvec serve --host 0.0.0.0 --port 8080
→ Full walk-through with curl + httpx examples for every endpoint group: REST-hosted Cortex guide.
Cross-frontend dedup (shared cluster session)¶
Every frontend in a cluster writes into one backing session
(cluster_id == session_id). So when frontend A stores a fact and frontend B
later stores a similar one on the same cluster, B's response carries a
dedup_signal flagging the overlap: the {is_update, max_sim, matched_id}
triple (dedup-signal guide), computed cross-frontend instead of within a
single session.
Make a fact visible to other frontends. A fact is only visible once it is written into the shared session:
# Frontend A stores a fact (lands in the shared session pool)
curl -X POST localhost:8080/v1/cluster/$CID/store \
-H "Authorization: Bearer $JWT" \
-d '{"message": "what is the SLA?", "response": "the SLA is 99.95% uptime"}'
# Frontend B paraphrases it on the same cluster — dedup_signal flags the overlap
curl -X POST localhost:8080/v1/cluster/$CID/store \
-H "Authorization: Bearer $JWT" \
-d '{"message": "uptime target?", "response": "we guarantee 99.95% availability"}'
# -> {"dedup_signal": {"is_update": true, "max_sim": 0.97, "matched_id": "019..."}, ...}
The matched_id is a durable handle to the matched fact: it stays the
same identifier across the shared session's to_dict()/from_dict() and
across a snapshot reload, so a frontend can store it and correlate later
(dedup-signal guide).
A cluster run that carries a response (POST /v1/cluster/{id}/run with
response=) writes into the same pool. A run without a response only reads.
Per-call threshold. Pass dedup_threshold (a cosine value in [-1, 1]) on
the store call to override the config default (dedup_update_threshold,
typically 0.85) for that one is_update decision. Storage is append-only
regardless — the override only flips the informational signal:
curl -X POST localhost:8080/v1/cluster/$CID/store -H "Authorization: Bearer $JWT" \
-d '{"message": "...", "response": "...", "dedup_threshold": 0.92}'
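The per-call threshold is easier to reason about with a toy model of the signal. The sketch below is an assumption-laden illustration, not Semvec's implementation: `store_with_signal` is a hypothetical helper, the vectors are hand-made two-dimensional stand-ins for embeddings, and the comparison is assumed to be `max_sim >= threshold`.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def store_with_signal(pool: List[Dict], fact_id: str, vector: Sequence[float],
                      dedup_threshold: float = 0.85) -> Dict:
    """Append a fact and report its closest existing neighbour (illustrative only)."""
    best_id: Optional[str] = None
    best_sim = -1.0
    for entry in pool:
        sim = cosine(vector, entry["vector"])
        if sim > best_sim:
            best_id, best_sim = entry["id"], sim
    # Append-only: the write happens whatever the signal says.
    pool.append({"id": fact_id, "vector": list(vector)})
    return {
        "is_update": best_id is not None and best_sim >= dedup_threshold,
        "max_sim": best_sim if best_id is not None else None,
        "matched_id": best_id,
    }


seed = {"id": "fact-1", "vector": [1.0, 0.0]}
paraphrase = [0.9, 0.436]  # cosine with the seed is roughly 0.90

pool_default = [dict(seed)]
loose = store_with_signal(pool_default, "fact-2", paraphrase)

pool_strict = [dict(seed)]
strict = store_with_signal(pool_strict, "fact-2", paraphrase, dedup_threshold=0.92)

print(loose["is_update"], strict["is_update"], len(pool_strict))
```

Both calls store the paraphrase and report the same `max_sim` and `matched_id`; only the `is_update` flag changes with the threshold.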
Pin a cluster against realignment. Create a cluster with
{"drift_exempt": true} to shield its shared session from regional
realignment — useful when the cluster is your durable dedup index. A pinned
cluster cannot be added to a region (the request is refused with 409).
Caveats (read before relying on it)¶
- Cross-FRONTEND, not cross-RAG. Only facts written through a Semvec frontend (cluster store / run-with-response) are visible. Batch jobs or direct-RAG writes that bypass Semvec never enter the shared session and are invisible to the dedup signal.
- No contradiction detection. is_update == true means the embeddings look alike, not that the two facts agree. Detecting contradictions is out of scope; the signal is a similarity hint, nothing more.
- Storage stays append-only. The signal never suppresses a write. Your frontend decides what to do (skip a re-store, merge into RAG, just log it).
- Durability needs the gate. A cross-frontend index that survives a worker restart requires SEMVEC_STATE_PERSIST plus a backing store (production hardening → state persistence). Without it the shared session is in-memory and per-worker: two workers see two separate pools, and dedup only works within each.
In-process equivalent¶
Cross-frontend dedup is not REST-only. The REST cluster shared session is
just a managed wrapper over a single shared SemvecSession — you get the same
behaviour in-process by holding one session and feeding every frontend's
turns through it:
from semvec import SemvecSession, SemvecState, SemvecConfig
# ONE shared session is the "cluster" — every frontend writes into it.
cfg = SemvecConfig(dimension=768)
shared = SemvecSession(SemvecState(config=cfg), my_embedder, cfg)
# Frontend A stores a fact (mirrors POST /v1/cluster/{id}/store).
res_a = shared.store_qa("what is the SLA?", "the SLA is 99.95% uptime")
# Frontend B paraphrases it on the SAME shared session.
res_b = shared.store_qa("uptime target?", "we guarantee 99.95% availability")
print(res_b["dedup_signal"])
# {'is_update': True, 'max_sim': 0.97, 'matched_id': '019...'}
- Read the signal. store_qa(...) returns the update metrics with a dedup_signal ({is_update, max_sim, matched_id}); a full turn via run_sync(...) / run(...) returns a TurnResult whose .dedup_signal carries the same triple (None when the turn didn't store). See the DedupSignal guide.
- Per-call threshold. Override the config default for a single is_update decision on the store path: shared.store_qa(..., dedup_threshold=0.92) (also on update_state(...) and the *_async variants). It is not on run(); mirroring REST, only the store path takes the override.
- Durability. Persist the shared session's underlying state with shared.state.to_bytes(compress=True) and reload via SemvecState.from_bytes(...) (rebuild the SemvecSession around the restored state); the matched_id survives the round-trip. See Persisting state in-process.
The mapping is direct: POST /v1/cluster/{id}/store ↔ store_qa,
POST /v1/cluster/{id}/run (with a response) ↔ run_sync, and the cluster's
backing session ↔ this one shared SemvecSession.
Common building blocks (every path)¶
| Concept | What it does | Where it lives |
|---|---|---|
| SemvecAgent | Per-agent state with embedder + process_input(text) | API: SemvecAgent |
| SemvecCortexObserver | Aggregator turning N agent states into one global state | API: SemvecCortexObserver |
| Aggregation strategies | WeightedAverageAggregation, AttentionAggregation | API: aggregation |
| ConsensusEngine + ConsensusLevel | Proposal voting (5 levels), quorum-aware finalisation | API: ConsensusEngine |
| StateVectorPacket + TransferType | Inter-agent state transfer with checksummed integrity | API: StateVectorPacket |
When to choose which¶
- Two analysts on one developer's laptop → SemvecAgentNetwork in-process. Keep it simple.
- One service hosting a cortex on top of an existing session store → SemvecCortexService with the store you already have.
- Production deployment with several services / tenants / regions → REST API. Drift events, observer anomalies, ownership boundaries, trust scores, and cluster realignment only exist at the REST layer.
Where to next¶
- REST-hosted Cortex guide: the deep dive on clusters, regions, observers, network endpoints.
- semvec.cortex API reference: every class and method.
- REST API reference: endpoint catalogue.
- Coding (overview): sister-guide for semvec.coding.