Core API (semvec)

The semantic-state engine. All symbols listed here are importable directly from semvec.

SemvecState

The persistent semantic state.

from semvec import SemvecState, SemvecConfig

state = SemvecState(config=SemvecConfig(dimension=384))

Constructor

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| config | SemvecConfig \| None | SemvecConfig() | Configuration bundle (see below). |

update(input_embedding, text, *, meta=None) -> dict

Fold a single (embedding, text) pair into the state. meta is an optional dict attached to the resulting memory entry.

Returns a metric dict:

| Key | Type | Meaning |
| --- | --- | --- |
| similarity | float | Cosine similarity between input and current state, pre-update. |
| beta | float | Adaptive blending coefficient for this turn. |
| pattern_strength | float | How strongly retrieved memories pulled the state. |
| fsm | float | Stability score in [0, 1] (high = converged, low = oscillating). |
| phase | str | Detected phase (initialization / exploration / convergence / resonance / stability / instability). |
| norm | float | L2 norm of the post-update state vector. |
| topic_switch | float | Signalled topic-switch magnitude (0 = none). |
| novelty_score | float | How semantically novel this input was. |
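
The returned dict can drive simple application-side dispatch. The sketch below uses a hypothetical helper, `triage_turn`, which is not part of semvec. It reads the documented keys with `.get()` so that a missing key degrades gracefully. The 0.7 stability cutoff and the sample values are illustrative assumptions, not library defaults.

```python
from typing import Any, Mapping


def triage_turn(metrics: Mapping[str, Any], *, stable_above: float = 0.7) -> dict:
    """Summarise one update() result for application-side dispatch.

    `metrics` is the dict returned by SemvecState.update(). `stable_above`
    is an assumed, application-chosen cutoff.
    """
    fsm = float(metrics.get("fsm", 0.0))
    topic_switch = float(metrics.get("topic_switch", 0.0))
    return {
        "phase": metrics.get("phase", "unknown"),
        "is_stable": fsm > stable_above,
        "topic_switched": topic_switch > 0.0,  # documented: 0 means no switch
    }


# Hand-written sample shaped like the documented return value (values invented).
sample = {
    "similarity": 0.82, "beta": 0.12, "pattern_strength": 0.4, "fsm": 0.91,
    "phase": "convergence", "norm": 1.18, "topic_switch": 0.0, "novelty_score": 0.1,
}
summary = triage_turn(sample)
print(summary)
```

In real use, `sample` would be replaced by the value returned from `state.update(embedding, text)`.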

Other methods

| Method | Purpose |
| --- | --- |
| add_anchor(embedding) | Register a drift anchor, which biases retrieval toward the anchor's domain. |
| add_resonance_trigger(trigger) | Register a pre-built ResonanceTrigger(keyword=..., embedding=..., threshold=0.7, weight=1.0) instance (e.g. one you've constructed manually or round-tripped through a snapshot). Boosts memories on keyword or embedding match during retrieval. |
| to_dict(*, include_memory_text=True, include_literal_cache_text=True, include_adaptive_params=True) -> dict | Checksummed, JSON-safe full-state snapshot. The three keyword-only privacy toggles are independent; see "Snapshot redaction" below. |
| from_dict(data) -> SemvecState | Restore from a to_dict() snapshot. Raises StateCorruptionError on checksum mismatch. Tolerates pre-redaction snapshots that lack the optional sections. |
| to_bytes(compress=True, *, include_memory_text=True, include_literal_cache_text=True, include_adaptive_params=True) -> bytes | Compact binary checkpoint with magic header + corruption check. Same redaction kwargs as to_dict(). compress=False skips gzip for hot-path persistence. (provided by the Rust core; not surfaced in Python type stubs) |
| from_bytes(blob) -> SemvecState | Restore from a to_bytes() blob (auto-detects compressed vs uncompressed via the version byte). (provided by the Rust core; not surfaced in Python type stubs) |
| set_retrieval_projection_weights(matrix) | Inject a custom retrieval projection matrix (advanced; pins retrieval scoring across replays). |
| get_retrieval_projection_weights() -> list[list[float]] | Snapshot the current projection matrix. Useful for parity tests; treat the contents as opaque. |

Snapshot redaction

to_dict / to_bytes carry three independent privacy toggles for sharing snapshots outside trusted hands. All three default to True (back-compat) and can be combined freely:

| Toggle | Default | When False, redacts |
| --- | --- | --- |
| include_memory_text | True | The user-prose text on every memory entry. Embedding, hash, and scores stay, so retrieval against the snapshot remains functional. |
| include_literal_cache_text | True | The verbatim text inside LiteralCache extracted facts (Decisions / Errors / Code structures). The structured fields and hashes stay. |
| include_adaptive_params | True | The adaptive_params block (beta_max, decay_rate, reinforcement_threshold, diversity_penalty, learning_rate) plus the internal config block (beta_basis, beta_max_default, alpha_attention_scalar, state_norm_setpoint). Strip when sharing snapshots outside trusted hands. |

from_dict / from_bytes accept redacted snapshots — missing fields fall back to SemvecConfig defaults, and the integrity checksum is computed against those same defaults so the verification still succeeds.

# A snapshot you can hand to a third-party support engineer:
blob = state.to_bytes(
    include_memory_text=False,           # no user prose
    include_literal_cache_text=False,    # no verbatim facts
    include_adaptive_params=False,       # no internal tuning state
)
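
Because to_dict() is described as JSON-safe, the same redaction toggles can feed a plain JSON file. The following is a minimal sketch under that assumption: `save_redacted_snapshot` is a hypothetical helper (not part of semvec) that only relies on the documented `to_dict()` keyword arguments, and the write-then-rename step is a general file-handling precaution rather than anything the library requires.

```python
import json
import os
import tempfile
from typing import Any


def save_redacted_snapshot(state: Any, path: str) -> None:
    """Write a fully redacted to_dict() snapshot to `path` as JSON.

    `state` is expected to be a SemvecState; only its to_dict() method with
    the three documented keyword-only toggles is used.
    """
    snapshot = state.to_dict(
        include_memory_text=False,
        include_literal_cache_text=False,
        include_adaptive_params=False,
    )
    # Write to a temp file in the target directory, then rename, so a crash
    # never leaves a half-written snapshot at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Restoring would then go through `SemvecState.from_dict(json.load(handle))`, which this reference describes as accepting redacted snapshots.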

Aggregate diagnostics methods

SemvecState exposes a small set of on-demand diagnostics methods for dashboards and ops monitoring beyond what update() returns. Treat their return values as opaque indicators — useful for UI / monitoring / dispatch logic in your application, but not as a window into engine mechanics.

| Method | Returns | Purpose |
| --- | --- | --- |
| calculate_fsm(...) | float in [0, 1] | Overall stability score (higher = more stable). Useful for gating expensive actions on > 0.7. |
| calculate_metrics(...) | dict[str, float] | Internal diagnostics dict for ops dashboards. |
| calculate_advanced_metrics(...) | dict[str, float] | Extended diagnostics dict (super-set of calculate_metrics). |

Keys and argument signatures are unstable

The exact keys in the diagnostics dicts, their interpretation, and the argument signatures of these methods are implementation details and may change between releases without notice.

Iterate the returned dict defensively (for k, v in d.items()) instead of hard-coding key names so your code stays forward-compatible.

The values are deterministic for a given (subject, dimension, input) tuple within a release; do not assume cross-release or cross-instance comparability. Outputs are licensing-bound — see licensing.
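
One way to follow the defensive-iteration advice is to forward whatever numeric entries the dict happens to contain, without naming any key. The helper below is a hypothetical sketch (`format_diagnostics` and the `semvec` prefix are illustrative, not library API), and the sample keys are deliberately made up because the real key names are unstable.

```python
from typing import Any, Mapping


def format_diagnostics(diagnostics: Mapping[str, Any], prefix: str = "semvec") -> list[str]:
    """Render every numeric entry as 'prefix.key=value', ignoring the rest."""
    lines = []
    for key, value in sorted(diagnostics.items()):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{prefix}.{key}={value:.4f}")
    return lines


# Invented sample; a real dict would come from calculate_metrics(...).
sample = {"made_up_score": 0.5, "made_up_count": 2, "made_up_label": "x"}
for line in format_diagnostics(sample):
    print(line)
```

Because no key is hard-coded, a release that adds, renames, or drops entries changes only what gets logged, not whether the code runs.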

Attributes

| Attribute | Type | Notes |
| --- | --- | --- |
| semantic_state | np.ndarray | The live state vector, shape (dim,). Readable and writable. |
| interaction_count | int | Total update() calls since construction or reset. (runtime attribute, not in stub) |
| timestamp | int | Monotonic tick counter. (runtime attribute, not in stub) |
| memory | MultiResolutionMemory | Short-term / medium-term / long-term tiers. (runtime attribute, not in stub) |
| phase_detector | PhaseDetector | Automatic phase detector. (runtime attribute, not in stub) |
| literal_cache | LiteralCache | Verbatim structured-memory layer. (runtime attribute, not in stub) |
| anchor_count | int | Number of registered drift anchors. (runtime attribute, not in stub) |
| anchor_score | float | Mean cosine of state vs all registered anchors. (runtime attribute, not in stub) |
| topic_switch_history | list[dict] | Bounded list of detected switches. (runtime attribute, not in stub) |
| phase_history | list[str] | Phase transitions recorded since construction. |

Additional methods

Further public methods on SemvecState, surfaced here for completeness. Argument lists reflect the installed wheel; treat return values as opaque indicators (see the diagnostics warning above).

| Method | Purpose |
| --- | --- |
| inject_memory(embedding, text, tier, importance=1.0, timestamp=0.0, access_count=0, meta=None, protection_score=0.0) | Manually seed a memory at a specific tier; useful for warmstarts and migration. |
| consolidate_long_term() | Run a single consolidation pass over the long-term tier. |
| get_all_memories_flat() | Flat list view of every memory across all tiers. |
| get_total_stored() | Total number of memories currently held across all tiers. |
| get_metrics() | Snapshot of the last computed metric dict. |
| get_phase() | Current phase label. |
| get_dynamic_top_k() | Suggested retrieval top-k for the current state. |
| query_similarities_vectorized(query_embedding) | Batch similarity scoring against the live state. |
| add_negative_attractor(error_vector, description=..., source=..., severity=1.0) | Register a negative attractor to demote in retrieval. |
| clear_negative_attractors() | Remove all registered negative attractors. |
| clear_resonance_triggers() | Remove all registered resonance triggers. |
| set_isolation_filter(filter_) | Restrict retrieval to memories matching the supplied filter. |
| release_quarantine_count() | Number of memories released from quarantine since construction. |
| update_batch(...) | Batched variant of update() for bulk ingestion. |

Diagnostic attributes

Rolling-history and counter attributes exposed for dashboards and ops monitoring. Treat their contents as opaque indicators — useful for UI / monitoring in your application, not as a window into engine mechanics. Lengths are bounded by SemvecConfig.history_length.

| Attribute | Type | Notes |
| --- | --- | --- |
| similarity_history | list[float] | Rolling history of per-turn similarity values. |
| norm_history | list[float] | Rolling history of post-update state-vector norms. |
| beta_history | list[float] | Rolling history of adaptive blending coefficients. |
| fsm_history | list[float] | Rolling history of stability scores. |
| drift_threshold | float | Current drift threshold used by the detector. |
| negative_attractor_count | int | Number of currently registered negative attractors. |
| resonance_trigger_count | int | Number of currently registered resonance triggers. |
| realignment_remaining | int | Remaining steps in any active realignment. |
| operator_state_vector | np.ndarray | Companion vector tracking operator-side state. |
| config | SemvecConfig | The config object the state was constructed with. |

SemvecConfig

Immutable configuration dataclass passed into SemvecState(config=…). Every field is a keyword argument; everything has a default so SemvecConfig() is a valid call.

from semvec import SemvecConfig

cfg = SemvecConfig(dimension=384)

Fields

Out-of-range values raise ConfigurationError in the constructor.

Field descriptions describe what each knob is for, not the underlying mechanism. The mechanism is out of scope for this reference.

Identity & embedder

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| model_name | str | "all-MiniLM-L6-v2" | Preferred embedder label (informational hint; the state never loads a model itself). |
| dimension | int | 384 | Embedding dimension. Must match your embedder's get_dimension(). |
| device | str | "cpu" | Device hint for the embedder (informational). |
| debug | bool | False | Enable verbose core logging. |

Memory tiers & retention

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| short_term_size | int | 15 | Short-term memory capacity. |
| medium_term_size | int | 50 | Medium-term memory capacity. |
| long_term_size | int | 200 | Long-term memory capacity. |
| use_selective_forgetting | bool | True | Use score-based eviction instead of FIFO when a tier overflows. |
| compression_ratio | float | 0.3 | Text-compression ratio on promotion between tiers. Lower = shorter compressed output per turn, less retained nuance. Tune in [0.1, 0.5]. |

Phase detector & rolling windows

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| phase_detection_window | int | 50 | Sliding-window size the phase detector consumes. |
| context_window | int | 20 | Recent-input window kept for novelty / topic-switch scoring. |
| history_length | int | 20 | Cap on rolling history arrays (norm_history, similarity_history, beta_history). |

Topic-switch detector

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| enable_topic_switch | bool | True | Master switch for the topic-switch detector. |
| topic_switch_threshold | float | 0.3 | Sensitivity knob: raise to make the detector less twitchy on noisy domains, lower to make it fire sooner. |
| topic_switch_window | int | 5 | Number of consecutive turns the detector watches. |
| auto_anchor_on_topic_switch | bool | False | When True, snapshot the current semantic_state as a fresh anchor on every detected switch. |
| max_auto_anchors | int | 8 | Cap on anchors created via auto_anchor_on_topic_switch. |

Retrieval boosts & tier weights

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| anchor_retrieval_boost | float | 0.6 | Score boost applied when registered anchors align with the candidate. Tune in [0.1, 0.6]. |
| trigger_retrieval_boost | float | 0.3 | Score boost applied when a ResonanceTrigger matches. Tune in [0.1, 0.6]. |
| short_term_weight | float | 1.0 | Tier weight for short-term memories during retrieval. |
| medium_term_weight | float | 0.95 | Tier weight for medium-term memories. |
| long_term_weight | float | 0.9 | Tier weight for long-term memories. |
| negative_attractor_penalty | float | 0.5 | Overall strength of NegativeAttractor demotion in retrieval ([0, 1]). (advanced; rarely changed. Internal stability safeguard: leave at default unless a benchmark instructs otherwise) |
| negative_attractor_threshold | float | 0.3 | Cosine floor below which attractors are ignored. |
| cluster_fallback_threshold | float | 0.85 | Controls retrieval breadth for uncertain matches against the long-term tier. Higher values keep older domains reachable; lower values stay narrow. |

Internal tuning constants

The following constants are used internally and exposed only for reproducibility of benchmark runs. They are not user-tunable — do not change them outside a benchmark harness. Names are listed without describing the underlying mechanism; see llms.txt for the disclosure policy.

| Constant | Default |
| --- | --- |
| beta_basis | 0.05 |
| beta_max_default | 0.35 |
| alpha_attention_scalar | 0.3 |
| state_norm_setpoint | 1.2 |

These four fields plus the adaptive_params block (beta_max, decay_rate, reinforcement_threshold, diversity_penalty, learning_rate) are stripped from snapshots when you pass include_adaptive_params=False to to_dict() / to_bytes() — use that toggle whenever you need to share a snapshot outside trusted hands.

MultiResolutionMemory

Three-tier episodic memory. Exposed via state.memory.

| Method | Returns | Purpose |
| --- | --- | --- |
| get_relevant_memories(query_embedding, top_k=10, *, meta_filter=None) | list[MemoryUnit] | Cosine-similarity retrieval across all tiers. Optional meta_filter dict restricts results to memories whose meta matches. |
| short_term / medium_term / long_term | list[MemoryUnit] | Direct tier access. |

MemoryUnit

| Attribute | Type |
| --- | --- |
| embedding | np.ndarray |
| text | str |
| importance | float |
| access_count | int |
| timestamp | float |
| semantic_hash | str (8-hex-char content hash) |
| protection_score | float (retention bias for selective forgetting) |
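
As an illustration of reading these attributes, the sketch below formats a list of memory-like objects for a debug view. `describe_memories` is a hypothetical helper, not part of semvec, and the `SimpleNamespace` stand-ins only mimic the documented attribute names. With a live state, the input would be the list returned by `state.memory.get_relevant_memories(...)`.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def describe_memories(memories: Iterable[Any], limit: int = 5) -> list[str]:
    """One line per memory, most important first, text truncated to 60 chars."""
    ranked = sorted(memories, key=lambda m: m.importance, reverse=True)
    return [
        f"[{m.semantic_hash}] importance={m.importance:.2f} "
        f"hits={m.access_count} {m.text[:60]}"
        for m in ranked[:limit]
    ]


# Stand-in objects with invented values, carrying the documented attribute names.
sample = [
    SimpleNamespace(text="first note", importance=0.2, access_count=1, semantic_hash="0a1b2c3d"),
    SimpleNamespace(text="second note", importance=0.9, access_count=4, semantic_hash="deadbeef"),
]
lines = describe_memories(sample)
print("\n".join(lines))
```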

PhaseDetector

Exposed via state.phase_detector.

  • current_phase: str
  • phase_transitions: list[dict] — historical transitions with timestamps and metric snapshots.
  • update(metrics, timestamp) -> Optional[str] — returns the new phase if a transition occurred.
  • signal_topic_switch(magnitude) — bias the next rule-scoring pass toward exploration.

LiteralCache

Exact-text structured-memory layer. See Coding API for the full surface — recording, querying (query(text, max_results=10)), and build_handoff_context() for cross-session prompts.

safe_cosine_similarity(a, b, eps=1e-8) -> float

Cosine similarity that returns 0.0 on zero-norm vectors instead of NaN.
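
The documented behaviour can be illustrated with a pure-Python sketch. This is not the library's implementation: the name `safe_cosine_sketch`, the use of `eps` as a norm threshold, and the length check are all assumptions made for the example.

```python
import math
from typing import Sequence


def safe_cosine_sketch(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity that yields 0.0 when either vector has (near-)zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumed role of eps: treat tiny norms as zero instead of dividing by them.
    if norm_a < eps or norm_b < eps:
        return 0.0
    return dot / (norm_a * norm_b)


print(safe_cosine_sketch([0.0, 0.0], [1.0, 2.0]))  # zero-norm input gives 0.0, not NaN
```

A naive `dot / (norm_a * norm_b)` would divide by zero for the all-zero vector, which is the case the guard is there to absorb.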

Exceptions

from semvec import (
    SemvecError,                # base class
    ConfigurationError,
    EmbeddingError,
    StateCorruptionError,
    LicenseError,            # base for licensing issues
    LicenseExpiredError,
    RateLimitError,
)

All inherit from SemvecError. License-related exceptions inherit from LicenseError → SemvecError.

RateLimitError exceptions carry the standard Python args tuple; parse for retry hints if needed. (A dedicated retry_after attribute is planned for a future release.)
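
Until a dedicated attribute exists, a caller might scan `args` for something that looks like a delay. The sketch below is speculative: the layout of `args` is not documented here, so treating the first non-negative number as a retry hint in seconds is purely an assumption, and both helpers (`retry_hint_seconds`, `call_with_retry`) are hypothetical names. The exception type is passed in so the example runs without semvec installed; in practice it would be `RateLimitError`.

```python
import time
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def retry_hint_seconds(exc: BaseException) -> Optional[float]:
    """Return the first non-negative number in exc.args, or None (assumed layout)."""
    for arg in exc.args:
        if isinstance(arg, bool):
            continue  # bool is an int subclass; never a delay
        if isinstance(arg, (int, float)) and arg >= 0:
            return float(arg)
    return None


def call_with_retry(
    fn: Callable[[], T],
    exc_type: Type[BaseException],
    *,
    attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on exc_type up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_type as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            hint = retry_hint_seconds(exc)
            sleep(hint if hint is not None else default_delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real delays; if `args` carries no number, the fixed `default_delay` is used instead.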

See also