# Quickstart

A one-page tour of the semvec public API. Every snippet runs in a fresh environment after `pip install "semvec[cortex,coding]" sentence-transformers`.
## 1. Core PSS state

```python
import numpy as np
from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))

# `conversation` is any iterable of (text, embedding) pairs
for text, embedding in conversation:
    result = state.update(embedding, text)
    print(
        f"phase={result['phase']:14} "
        f"similarity={result['similarity']:.3f} "
        f"beta={result['beta']:.3f} "
        f"norm={result['norm']:.3f}"
    )

# Serialise a full checkpoint (SHA-256 integrity-checked, JSON-safe)
checkpoint = state.to_dict()
restored = SemvecState.from_dict(checkpoint)
```
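Because `to_dict()` is JSON-safe, persisting a checkpoint across processes is a plain `json` round-trip; a minimal sketch, using a stand-in dict in place of a real `state.to_dict()` result (the `checkpoint.json` filename is illustrative):

```python
import json
from pathlib import Path

# Stand-in for checkpoint = state.to_dict(); any JSON-safe dict works here.
checkpoint = {"dimension": 384, "phase": "EXPLORATION", "vector": [0.0, 0.1, 0.2]}

path = Path("checkpoint.json")
path.write_text(json.dumps(checkpoint))   # persist to disk

restored = json.loads(path.read_text())   # reload in a later session
assert restored == checkpoint
# `restored` can then be handed to SemvecState.from_dict(restored).
```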
See the Core API reference for every available metric and method.
## 2. Token-reduced LLM context

```python
from semvec.token_reduction import SemvecStateSerializer

serializer = SemvecStateSerializer()
context = serializer.serialize(state, query_text="what did we decide about auth?")
# `context` is a compact 150–350-token string suitable for an LLM system prompt.
```
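One common way to use the compact string is as part of the system prompt in an OpenAI-style chat payload. A sketch under that assumption, with a made-up stand-in value for the serializer output:

```python
# Stand-in for serializer.serialize(...); any compact state string works here.
context = "[semvec] phase=CONVERGENCE topics=auth,sessions decision=JWT+refresh"

system_prompt = (
    "You are a helpful assistant.\n"
    "Compressed conversation state (semvec):\n"
    f"{context}"
)

# OpenAI-style messages list; the compact state rides along with every request.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "what did we decide about auth?"},
]

assert messages[0]["role"] == "system"
assert context in messages[0]["content"]
```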
## 3. Drop-in chat proxy

`SemvecChatProxy` wraps any callable LLM behind PSS-compressed context and tracks per-turn token accounting.

```python
from semvec.token_reduction import SemvecChatProxy, create_llm_client

llm = create_llm_client("openai")  # reads OPENAI_BASE_URL/MODEL/API_KEY from env
proxy = SemvecChatProxy(llm_call=llm, system_prompt="You are a helpful assistant.")

for question in ["summarise Q3", "compare with Q2", "what was the biggest miss?"]:
    result = proxy.chat(question)
    print(f"turn {result.turn_number} ({result.phase}): {result.response}")

print(proxy.get_summary())
```
> **Explicit embedder required**
>
> `SemvecChatProxy`, `CodingEngine`, `LongMemEvalRunner`, and `SemvecAgent.process_input` all require an explicit embedder; there is no hash-based fallback. Pass `embedding_service=` / `embedder=` with any object that exposes `get_embedding(text) -> np.ndarray` and `get_dimension() -> int`, or install sentence-transformers and let the class build one from `SemvecConfig.model_name` + `SemvecConfig.device`.
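Any object with those two methods satisfies the embedder protocol. A minimal sketch of a deterministic toy embedder (the class name `ToyEmbedder` is illustrative) that is handy for tests and offline runs but carries no semantic meaning; real usage should wrap an actual embedding model:

```python
import hashlib

import numpy as np


class ToyEmbedder:
    """Deterministic stand-in satisfying the protocol semvec expects:
    get_embedding(text) -> np.ndarray and get_dimension() -> int.
    Not semantically meaningful; use a real model in production."""

    def __init__(self, dimension: int = 384):
        self._dimension = dimension

    def get_dimension(self) -> int:
        return self._dimension

    def get_embedding(self, text: str) -> np.ndarray:
        # Seed a PRNG from the text so identical inputs map to identical vectors.
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        vec = np.random.default_rng(seed).standard_normal(self._dimension)
        return vec / np.linalg.norm(vec)  # unit-norm, like typical embedders


embedder = ToyEmbedder()
assert embedder.get_dimension() == 384
v1, v2 = embedder.get_embedding("auth"), embedder.get_embedding("auth")
assert np.allclose(v1, v2)
```

An instance can then be passed wherever `embedder=` / `embedding_service=` is required.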
## 4. Multi-agent coordination (Cortex)

```python
from semvec.cortex import SemvecAgentNetwork, AttentionAggregation

network = SemvecAgentNetwork(aggregation_strategy=AttentionAggregation())
network.add_local_instance("analyst")
network.add_local_instance("planner")

network.process_input("analyst", "quarterly revenue is up 23%")

state = network.get_network_state()
print(f"active agents: {state['active_instances']}/{state['total_instances']}")
```
See Cortex API for the full set of aggregation strategies and the ConsensusEngine.
## 5. Coding-agent compaction

```python
from semvec.coding import CodingEngine

engine = CodingEngine(state_dir="~/.semvec/project-x", embedder=my_embedder)
engine.ingest_transcript("path/to/claude_code_session.jsonl")

context = engine.get_compacted_context(
    "implement password reset flow",
    invariants=["never log plaintext passwords"],
)
```
## 6. Claude Code integration (MCP + hooks)

Add to `.claude/settings.json`:

```json
{
  "mcpServers": {
    "semvec": {
      "command": "python",
      "args": ["-m", "semvec.coding.mcp_server"],
      "env": {
        "SEMVEC_STATE_DIR": ".semvec",
        "SEMVEC_EMBED_MODEL": "all-MiniLM-L6-v2"
      }
    }
  },
  "hooks": {
    "PreCompact": [{"command": "python -m semvec.coding.hooks.pre_compact", "timeout": 30000}],
    "SessionStart": [{"command": "python -m semvec.coding.hooks.session_start", "timeout": 10000}]
  }
}
```
See Coding API for the six MCP tools and their argument shapes.
## 7. LongMemEval benchmark

```bash
.venv/bin/python -m semvec.benchmarks.longmemeval \
    --variant S --multi-pss --temperature 0.0 \
    --embed-device cuda \
    --per-type 10 --n-judges 3 \
    --output results/semvec_full.json
```

Flags mirror the pss reference. See Benchmarks overview.
## Next steps

- Migrating from pss — import-table + API diffs
- Licensing — tier flags, `SEMVEC_LICENSE_KEY`, error handling
- API Reference — every public symbol with its signature