Concepts & Glossary¶
What update() returns¶
Every call to state.update(embedding, text) returns a metric dict and updates rolling histories on the state object (beta_history, similarity_history, …). You consume these to drive UI, dispatch, or logging.
| Key | What it means for your app | Typical range |
|---|---|---|
| similarity | How close the new input is to the current state, before the update. Low values mean the user just changed direction. | [-1, 1] |
| beta | How much of the previous state survives this turn. High = stable, low = absorbing aggressively. Treat as an opaque indicator. | (0, 1) |
| pattern_strength | How strongly retrieved memories pulled the new state. Higher = more memory influence. | [0, ~1.5] |
| norm | L2 norm of the state vector after the update. Stays bounded automatically. | [0, 1.2] |
| fsm | Stability score in [0, 1]. Low = state is oscillating, high = converged. Useful for dispatching: gate expensive actions on fsm > 0.7. | [0, 1] |
| phase | One of six labels (see below). Use it to switch prompts, log session breakpoints, or skip retrieval when the state is still warming up. | enum |
| topic_switch | Magnitude of a detected topic switch. Non-zero = the user just pivoted. | [0, 1] |
| novelty_score | How surprising the new input was. High novelty boosts attention to the input on subsequent turns. | [0, 1] |
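A minimal sketch of consuming the metric dict to gate work, assuming only the keys listed in the table. The helper name plan_turn, the decision keys, and the choice to hold back expensive actions on a pivot are illustrative assumptions, not part of Semvec.

```python
def plan_turn(result: dict, fsm_gate: float = 0.7) -> dict:
    """Turn one update() metric dict into simple dispatch decisions."""
    # Non-zero topic_switch means the user just pivoted.
    pivoted = result.get("topic_switch", 0.0) > 0.0
    return {
        # The table suggests gating expensive actions on fsm > 0.7.
        # Skipping them right after a pivot is an extra assumption made here.
        "run_expensive_action": result.get("fsm", 0.0) > fsm_gate and not pivoted,
        "show_pivot_hint": pivoted,
    }


# Hand-written sample shaped like the documented return value (values are made up).
sample = {"similarity": 0.82, "beta": 0.6, "fsm": 0.91,
          "phase": "stability", "topic_switch": 0.0, "novelty_score": 0.1}
decision = plan_turn(sample)
```

In a real application the sample dict would be replaced by the value returned from state.update(embedding, text).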
The six conversation phases¶
Phase detection is fully automatic — you do not configure it, you just consume it. Read it from result["phase"] after every update.
| Phase | What you might do when you see it |
|---|---|
| initialization | Skip "summarise prior work" prompts; there is none yet. |
| exploration | Lean on the LLM's general knowledge; retrieval has little to add. |
| convergence | Start surfacing relevant prior context aggressively. |
| resonance | Cheap turn: a short context block is fine. |
| stability | Promote checkpoint persistence; this is a good moment to save. |
| instability | Consider injecting drift anchors or letting the user clarify. |
The detector decides phases internally from interaction history. Treat the output as an opaque classification; do not assume any specific transition rule.
Phase changes are tracked in state.phase_history. The current phase is always state.phase_detector.current_phase.
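One way to act on the phase label is a plain lookup table. The sketch below assumes the six labels arrive as lowercase strings exactly as written in the table; the action names on the right are hypothetical placeholders for whatever your application does.

```python
# Phase label -> application-level action (action names are made up).
PHASE_ACTIONS = {
    "initialization": "skip_prior_work_summary",
    "exploration": "rely_on_general_knowledge",
    "convergence": "surface_prior_context",
    "resonance": "use_short_context",
    "stability": "save_checkpoint",
    "instability": "ask_user_to_clarify",
}


def action_for(result: dict) -> str:
    # Unknown or missing labels fall back to a neutral default instead of raising,
    # so the app keeps working if the label set ever changes.
    return PHASE_ACTIONS.get(result.get("phase"), "default_prompt")


next_action = action_for({"phase": "stability"})
```

Because the classification is opaque, the mapping only reacts to the current label and never tries to predict the next one.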
Memory tiers¶
state.memory is a three-tier MultiResolutionMemory:
| Tier | Default capacity | Promotion rule |
|---|---|---|
| Short-term | 15 slots | every turn lands here |
| Medium-term | 50 slots | promoted on access + importance |
| Long-term | 200 slots | consolidated clusters; built up gradually |
Capacities are configurable via SemvecConfig(short_term_size=…, medium_term_size=…, long_term_size=…).
Selective forgetting¶
When a tier overflows, Semvec keeps memories with the higher retention score — a composite scoring function that takes importance, recency, and access count into account. A frequently-accessed older memory therefore survives over a never-touched newer one. The exact weighting is tuned for production workloads and is not user-configurable.
use_selective_forgetting=False falls back to FIFO if you genuinely want
pure recency.
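To make the idea of a composite retention score concrete, here is an illustrative sketch. The Memory class, the weights, and the decay constants are all invented for this example; the real weighting is internal to Semvec and is not documented here.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float   # assumed to lie in [0, 1]
    age_turns: int      # turns since the memory was stored
    access_count: int   # how often retrieval has returned it


def retention_score(m: Memory) -> float:
    # Hypothetical blend of the three documented ingredients.
    recency = math.exp(-m.age_turns / 20.0)
    access = min(math.log1p(m.access_count) / math.log1p(10), 1.0)
    return 0.4 * m.importance + 0.3 * recency + 0.3 * access


def evict_to_capacity(memories: list, capacity: int) -> list:
    # Keep the `capacity` highest-scoring memories, drop the rest.
    return sorted(memories, key=retention_score, reverse=True)[:capacity]


old_but_used = Memory("API base URL", importance=0.5, age_turns=30, access_count=8)
new_untouched = Memory("small talk", importance=0.5, age_turns=1, access_count=0)
kept = evict_to_capacity([new_untouched, old_but_used], capacity=1)
```

With these made-up weights the frequently accessed older memory outscores the untouched newer one, which is the behaviour the paragraph above describes; a FIFO policy would have kept the newer one instead.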
NegativeAttractor¶
NegativeAttractor is an internal stability safeguard against pathological
state drift. It is configurable via negative_attractor_penalty (default 0.5);
only change it after running benchmarks.
Retrieval¶
state.memory.get_relevant_memories(query_vec, top_k=N) returns the most
relevant memories across all three tiers. Scoring takes cosine similarity
and per-tier weighting into account, with optional anchor / trigger boosts.
| Knob | Default | What it does |
|---|---|---|
| short_term_weight | 1.0 | scoring weight for the most recent tier |
| medium_term_weight | 0.95 | medium-term tier; almost flat with short-term |
| long_term_weight | 0.9 | long-term tier; kept competitive so older domains stay reachable |
| cluster_fallback_threshold | 0.85 | controls retrieval breadth for uncertain matches. Higher values keep older domains reachable; lower values stay narrow. |
| anchor_retrieval_boost (α) | 0.6 | scoring boost applied when registered anchors align with the candidate; tune in [0.1, 0.6] |
| trigger_retrieval_boost (γ) | 0.3 | scoring boost applied when a registered ResonanceTrigger matches; tune in [0.1, 0.6] |
Anchor and trigger boosts are combined so that redundant matches do not double-count. Exact composition is implementation-defined; you only see the user-visible effect through retrieval order.
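The effect of per-tier weighting can be sketched with plain cosine similarity multiplied by the default tier weights. This is an assumption about the general shape of the scoring, not the library's implementation; the anchor and trigger boosts are left out because their composition is implementation-defined, and the rank helper and tuple layout are hypothetical.

```python
import math

# Default tier weights from the knob table.
TIER_WEIGHTS = {"short": 1.0, "medium": 0.95, "long": 0.9}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, memories, top_k=2):
    """memories: iterable of (text, tier, vector) tuples."""
    scored = [(cosine(query, vec) * TIER_WEIGHTS[tier], text)
              for text, tier, vec in memories]
    # Highest weighted similarity first.
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]


memories = [
    ("recent chit-chat", "short", [0.0, 1.0]),
    ("old API note", "long", [1.0, 0.1]),
    ("mid-term API note", "medium", [1.0, 0.4]),
]
ranked = rank([1.0, 0.0], memories)
```

Because the weights are nearly flat, a closely matching long-term memory still ranks ahead of a looser medium-term match and an unrelated short-term one.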
Anchors and triggers¶
Two complementary tools for shaping retrieval.
Drift anchors¶
Reference embeddings that pull retrieval toward known domains:
state.add_anchor(embed("SAP Business One Service Layer OData REST API"))
state.add_anchor(embed("italienische Kueche Kochen Pasta Pizza"))
After a few turns, candidate memories that align with one of your anchors win the tie-break against generic phrases. Register one anchor per domain you care about. The current alignment score is exposed as state.anchor_score (mean cosine of state vs all anchors); when the score falls below drift_threshold, realignment begins over the next few turns. Exact realignment dynamics are implementation-defined.
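The alignment score described above (mean cosine of the state against all anchors) can be reproduced by hand for intuition. The function below is a standalone sketch; returning 0.0 when no anchors are registered and the drift_threshold value of 0.6 are assumptions made for the example.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anchor_score(state_vec, anchors):
    # Mean cosine of the state against every registered anchor.
    if not anchors:
        return 0.0  # assumption: no anchors means no alignment signal
    return sum(cosine(state_vec, a) for a in anchors) / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]         # two orthogonal toy domains
score = anchor_score([1.0, 0.0], anchors)  # fully aligned with the first only
needs_realignment = score < 0.6            # 0.6 is a made-up drift_threshold
```

Since the score is a mean over all anchors, registering many unrelated domains lowers it even when the state sits squarely on one of them, which is worth keeping in mind when choosing drift_threshold.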
auto_anchor_on_topic_switch=True (opt-in) snapshots semantic_state as a fresh anchor whenever a topic switch fires, capped by max_auto_anchors (default 8). Useful when your domain has clean topic boundaries; off by default because it tends to capture per-turn noise on real-world embeddings.
Resonance triggers¶
Boost memories on a specific keyword or vector match:
from semvec import ResonanceTrigger
state.add_resonance_trigger(ResonanceTrigger(
    keyword="security review",
    embedding=embed("security audit threat model"),
    threshold=0.7,
))
A trigger fires when either:
- the trigger's keyword appears as a substring in the input text, OR
- the cosine similarity between the input embedding and the trigger's embedding is ≥ threshold.
When a trigger fires, matched memories receive the trigger retrieval boost and the input is treated as high-salience for the current turn. Exact effect on the state update is implementation-defined.
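The firing rule can be written out as a small predicate. The Trigger dataclass below is a stand-in for illustration, not semvec.ResonanceTrigger, and the case-sensitive substring check is an assumption, since case handling is not specified above.

```python
import math
from dataclasses import dataclass


@dataclass
class Trigger:  # illustrative stand-in, not the library class
    keyword: str
    embedding: list
    threshold: float


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fires(trigger: Trigger, text: str, input_embedding: list) -> bool:
    # Condition 1: the keyword appears as a substring of the input text.
    keyword_hit = trigger.keyword in text
    # Condition 2: the input embedding is close enough to the trigger embedding.
    vector_hit = cosine(input_embedding, trigger.embedding) >= trigger.threshold
    return keyword_hit or vector_hit


trigger = Trigger("security review", [1.0, 0.0], 0.7)
by_keyword = fires(trigger, "please schedule a security review", [0.0, 1.0])
by_vector = fires(trigger, "check the threat model", [0.9, 0.1])
no_match = fires(trigger, "lunch plans", [0.0, 1.0])
```

Either condition is sufficient on its own, so a trigger with a broad keyword fires even when the embedding is far from the reference point.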
Choosing between them¶
| Goal | Use |
|---|---|
| Bias retrieval toward known domains, prototype-style | Anchors (one per domain) |
| Boost memories on a specific keyword or hard-match phrase | Triggers (keyword) |
| Boost memories whose embeddings are near a reference point but the user has no specific keyword | Triggers (embedding + threshold) |
| Both anchor-style and keyword-style signals | Anchors + Triggers — safe to combine; composition is implementation-defined |
When in doubt: start with anchors only and add triggers later if you
have a clear keyword or embedding cue separate from your anchor
prototypes. Defaults for both boosts are user-tunable on
SemvecConfig.
Topic-switch detection¶
When enable_topic_switch=True (default), Semvec watches for the user
pivoting to a different topic. Every detected switch lands on
state.topic_switch_history with {timestamp, magnitude, phase,
auto_anchored} — bounded list, useful for diagnostics and UI cues
regardless of whether you opt in to auto-anchoring.
The detector exposes two coarse tunables on SemvecConfig
(topic_switch_threshold, topic_switch_window). Defaults are calibrated
for production workloads; treat them as opaque knobs and only adjust if
you observe the detector is consistently too sensitive or too dull on
your data.
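For diagnostics or UI cues, the history entries can be filtered like any list of dicts. The sketch assumes each entry is a dict with the four documented keys, that timestamp is a float of epoch seconds, and that entries are stored oldest first; recent_pivots and its parameters are hypothetical.

```python
def recent_pivots(history, min_magnitude=0.5, last_n=3):
    """Return the latest `last_n` switches at or above `min_magnitude`."""
    strong = [entry for entry in history if entry["magnitude"] >= min_magnitude]
    return strong[-last_n:]


# Hand-written entries shaped like the documented fields (values are made up).
history = [
    {"timestamp": 1700000000.0, "magnitude": 0.3, "phase": "exploration", "auto_anchored": False},
    {"timestamp": 1700000060.0, "magnitude": 0.8, "phase": "instability", "auto_anchored": False},
    {"timestamp": 1700000120.0, "magnitude": 0.6, "phase": "convergence", "auto_anchored": True},
]
pivots = recent_pivots(history, min_magnitude=0.5, last_n=1)
```

In practice the list would be read from state.topic_switch_history instead of being written by hand.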
Short-circuit (serving layer)¶
When you run Semvec through the semvec serve SessionManager (or the REST API),
each query is checked against the session's cached memories before the reader is
called. compute_short_circuit(query_embedding) returns (top_similarity,
short_circuit_flag):
- top_similarity: cosine similarity to the closest cached memory (surfaced as a /v1/run response field).
- short_circuit: True when top_similarity clears a configurable threshold, meaning the incoming query is a paraphrase of one already answered and the cached answer can be served without an LLM call. Cluster runs set it via short_circuit_threshold (e.g. 0.85; higher = stricter).
This is a serving-layer feature, not part of the core update() return dict above.
See the token-reduction reference.
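The check can be pictured as a nearest-neighbour lookup over cached embeddings. The function below is a standalone illustration, not the SessionManager implementation; treating "clears the threshold" as greater than or equal, and returning 0.0 for an empty cache, are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def short_circuit_sketch(query_embedding, cached_embeddings, threshold=0.85):
    # Similarity to the closest cached memory; 0.0 when nothing is cached.
    top_similarity = max(
        (cosine(query_embedding, c) for c in cached_embeddings), default=0.0
    )
    return top_similarity, top_similarity >= threshold


cached = [[1.0, 0.1], [0.0, 1.0]]
top, flag = short_circuit_sketch([1.0, 0.0], cached)
```

Raising the threshold makes the flag stricter, so fewer queries are served from cache and more reach the LLM.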
Input isolation (QUARANTINE)¶
A session can be put behind an input-isolation filter so off-domain or injected
inputs are caught before they reach the reader. set_isolation(session_id, level=…)
takes one of four levels:
| Level | Effect |
|---|---|
| OPEN | no filtering (default) |
| FILTER | off-domain inputs down-weighted before the update |
| QUARANTINE | off-domain inputs held back from the LLM and recorded; lift with release_quarantine() |
| LOCKDOWN | strictest: only allowlisted inputs pass |
Domain membership is judged by embedding similarity against exclusion_embeddings
/ allowlist_embeddings, with a similarity_threshold (default 0.7). Anchor a
session to, say, oncology medication safety, and a question about credit-card
numbers falls below the threshold and is filtered before it hits the LLM — a
semantic-layer complement to the prompt-level guards. The underlying primitive is
the Rust-core InputIsolationFilter; the same ResonanceTrigger machinery with
weight = 0.0 participates in the input filter (see
Correcting memories).
Over REST: PUT /v1/session/{id}/isolation + POST /v1/session/{id}/isolation/release.
Persistence¶
Two persistence formats; both round-trip the full state including memories, anchors, topic-switch history, and the entire LiteralCache.
| Format | Use for | Pros | Cons |
|---|---|---|---|
| to_dict() / from_dict() | Systems that only speak JSON | Human-readable, JSON-safe | Largest |
| to_bytes(compress=True) | Cold-storage checkpoints | ~2.4× smaller than JSON | Slowest (gzip cost) |
| to_bytes(compress=False) | Hot-path persistence | Same size as JSON, only ~1.9× slower than json.dumps | Binary (still self-describing + corruption-checked) |
Both formats include an integrity checksum; tampered snapshots raise StateCorruptionError on restore rather than corrupting silently.
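The idea of a checksum-protected snapshot can be illustrated with the standard library alone. This is not Semvec's on-disk format: the pack and unpack helpers, the SHA-256 prefix layout, and the use of ValueError in place of StateCorruptionError are all assumptions made for the sketch.

```python
import gzip
import hashlib
import json


def pack(state: dict, compress: bool = True) -> bytes:
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    # 64 ASCII hex characters of SHA-256, stored in front of the payload.
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    blob = digest + payload
    return gzip.compress(blob) if compress else blob


def unpack(blob: bytes, compressed: bool = True) -> dict:
    raw = gzip.decompress(blob) if compressed else blob
    digest, payload = raw[:64], raw[64:]
    # Refuse to restore when the payload no longer matches its checksum.
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("snapshot failed its integrity check")
    return json.loads(payload)


snapshot = {"phase": "stability", "norm": 1.0}
restored = unpack(pack(snapshot))
```

Flipping a single byte of the payload changes its hash, so a tampered blob is rejected on restore instead of being loaded with silently corrupted values.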
The LiteralCache¶
state.literal_cache is a structured-memory layer for things that should survive verbatim across sessions: design decisions, invariants, recurring error patterns with fixes, per-checkpoint test diffs, and parsed code structures. The full surface is documented in Coding API. The headline method is build_handoff_context(next_checkpoint), which produces a Markdown block ready to paste into the next session's system prompt.
Embedding interface¶
Every API that takes an embedder= parameter expects an object exposing two methods:
embedder.get_embedding(text: str) -> np.ndarray # shape (dimension,), preferably L2-normalised
embedder.get_dimension() -> int # must match SemvecConfig.dimension
See the embedders guide for ready-made wrappers (SentenceTransformers, OpenAI, ONNX int8).
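For tests or offline experiments, any object with those two methods will do. The class below is a toy written for illustration: it hashes tokens into buckets, so its vectors are deterministic and L2-normalised but carry no real semantics. The name HashingEmbedder and the default dimension of 64 are assumptions, and the dimension must be changed to match your SemvecConfig.dimension.

```python
import hashlib

import numpy as np


class HashingEmbedder:
    """Toy embedder: deterministic, not semantically meaningful."""

    def __init__(self, dimension: int = 64):
        self._dimension = dimension

    def get_dimension(self) -> int:
        return self._dimension

    def get_embedding(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dimension, dtype=np.float32)
        for token in text.lower().split():
            # Hash each token into one bucket of the vector.
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % self._dimension] += 1.0
        norm = np.linalg.norm(vec)
        # L2-normalise; an empty input stays the zero vector.
        return vec / norm if norm > 0 else vec


embedder = HashingEmbedder(dimension=64)
vector = embedder.get_embedding("security audit threat model")
```

A stand-in like this is handy for exercising the plumbing without loading a model; for anything user-facing, use one of the real wrappers.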
Further reading¶
- Quickstart: three-line examples for every surface
- Embedders: pick the right model
- REST API: every endpoint
- Coding API: full LiteralCache surface and the MCP tools