Token Reduction (semvec.token_reduction)

Utilities for serialising PSS state into a compact LLM context and for wiring real LLM endpoints behind SemvecChatProxy.

SemvecStateSerializer

Formats a SemvecState into a 150-350-token context string.

from semvec.token_reduction import SemvecStateSerializer, SerializerConfig

ser = SemvecStateSerializer(SerializerConfig(top_k=10, max_memory_chars=500))
context = ser.serialize(state, query_embedding=emb, last_response=prev)

SerializerConfig

| Field | Default | Purpose |
| --- | --- | --- |
| top_k | 5 | Number of retrieved memories included. |
| max_memory_chars | 300 | Per-memory truncation budget. |
| include_phase | True | Prepend phase + phase-specific prompt. |
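
As a rough sanity check on the 150-350-token figure above (illustrative arithmetic, not a guarantee): the defaults allow at most top_k × max_memory_chars = 5 × 300 = 1,500 characters of memory text, i.e. about 375 tokens under the chars/4 heuristic used by estimate_tokens; in practice most memories come in under the truncation budget, which is how serialized contexts land in the quoted range.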

serialize(state, *, query_embedding=None, query_text=None, last_response=None) -> str

  • At least one of query_embedding / query_text should be provided for relevance-sorted retrieval. Without either, the serializer falls back to recency-sorted top-k.
  • last_response is included verbatim in the context (budget-checked).
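
A minimal sketch of both modes (state, emb, and prev are assumed to be an existing SemvecState, a query embedding, and the previous assistant reply; the names are illustrative):

from semvec.token_reduction import SemvecStateSerializer

ser = SemvecStateSerializer()

# Relevance-sorted retrieval: supply the query embedding (or query_text).
context = ser.serialize(state, query_embedding=emb, last_response=prev)

# No query provided: falls back to recency-sorted top-k.
recent = ser.serialize(state)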

SemvecChatProxy

Production-shaped chat loop: routes each turn through PSS-compressed context, stores Q&A chunks, tracks token counts.

from semvec.token_reduction import SemvecChatProxy, create_llm_client

llm = create_llm_client("openai")
proxy = SemvecChatProxy(llm_call=llm, system_prompt="You are a helpful assistant.")
result = proxy.chat("what's up with Q3?")

Constructor

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| llm_call | Callable[[list[ChatMessage]], str] \| None | built-in echo mock | Your LLM callable. |
| system_prompt | str | "You are a helpful assistant." | Injected before every turn's context. |
| pss_config | SemvecConfig \| None | SemvecConfig() | Internal state config. |
| serializer_config | SerializerConfig \| None | defaults | Context assembly config. |
| embedding_service | object \| None | auto SentenceTransformer | Any object with get_embedding(text) and get_dimension(). |

If embedding_service is omitted and SentenceTransformer is not installed, the constructor raises RuntimeError; see the module docstring for the full exception message and a copy-paste wrapper.
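
Any duck-typed object works; a deterministic toy service is enough for local testing (a sketch; ToyEmbeddingService is not part of the library, and its hash-bucket vectors carry no semantic meaning):

from semvec.token_reduction import SemvecChatProxy

class ToyEmbeddingService:
    """Stand-in embedding backend satisfying get_embedding/get_dimension."""

    def __init__(self, dimension: int = 32):
        self._dimension = dimension

    def get_embedding(self, text: str) -> list[float]:
        # Bucket character codes into a fixed-width, L2-normalised vector.
        vec = [0.0] * self._dimension
        for i, ch in enumerate(text):
            vec[(i + ord(ch)) % self._dimension] += 1.0
        norm = sum(v * v for v in vec) ** 0.5 or 1.0
        return [v / norm for v in vec]

    def get_dimension(self) -> int:
        return self._dimension

proxy = SemvecChatProxy(embedding_service=ToyEmbeddingService())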

chat(user_message) -> TurnResult

Returns a dataclass:

| Field | Type | Notes |
| --- | --- | --- |
| response | str | |
| pss_input_tokens | int \| None | From llm_call.last_usage, else None. |
| baseline_input_tokens | int \| None | Always None in this shape. |
| pss_prompt | str | |
| phase | str | |
| turn_number | int | |
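
For example, tracking per-turn usage across a short session (a sketch; proxy is a SemvecChatProxy constructed as above):

for question in ("what's up with Q3?", "and how did margins move?"):
    result = proxy.chat(question)
    print(f"turn {result.turn_number} [{result.phase}]: {result.response[:60]}")
    if result.pss_input_tokens is not None:  # set only when llm_call exposes last_usage
        print(f"  prompt tokens: {result.pss_input_tokens}")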

ChatMessage

ChatMessage(role="user", content="hello")

Dataclass with two fields: role ("system" / "user" / "assistant") and content.

LLMConfig

from semvec.token_reduction import LLMConfig

cfg = LLMConfig.from_env("openai")
cfg.validate()

  • LLMConfig.from_env(provider, prefix="") — reads [PREFIX_]PROVIDER_BASE_URL, [PREFIX_]PROVIDER_MODEL, [PREFIX_]PROVIDER_API_KEY.
  • prefix="JUDGE" flips every variable to JUDGE_OPENAI_* with graceful fallback to the unprefixed variant.
  • validate() raises ValueError for missing required fields.
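
For example (values are placeholders; the JUDGE-prefixed lookup follows the fallback rule above):

import os
from semvec.token_reduction import LLMConfig

os.environ["OPENAI_BASE_URL"] = "https://api.example.com/v1"
os.environ["OPENAI_MODEL"] = "gpt-4"
os.environ["OPENAI_API_KEY"] = "sk-..."

cfg = LLMConfig.from_env("openai")                    # reads OPENAI_*
judge = LLMConfig.from_env("openai", prefix="JUDGE")  # prefers JUDGE_OPENAI_*, falls back to OPENAI_*
cfg.validate()                                        # raises ValueError if a required field is missing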

OpenAIClient / OllamaClient

from semvec.token_reduction import OpenAIClient, LLMConfig, ChatMessage

client = OpenAIClient(LLMConfig(
    provider="openai",
    base_url="https://api.example.com/v1",
    model="gpt-4",
    api_key="sk-...",
    temperature=0.3,
    max_tokens=512,
))
text = client([ChatMessage(role="user", content="hi")])
usage = client.last_usage  # {"prompt_tokens": ..., "completion_tokens": ...}

Both accept a single list[ChatMessage] and return a string. They populate last_usage from the provider's usage field when present.
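
Because the proxy only needs a callable of that shape, any function or class mapping list[ChatMessage] to str will do; the optional last_usage attribute is what pss_input_tokens is read from. A toy client (a sketch, not part of the library):

from semvec.token_reduction import ChatMessage

class EchoClient:
    """Minimal llm_call: list[ChatMessage] -> str, with a last_usage dict."""

    def __init__(self):
        self.last_usage = None

    def __call__(self, messages: list[ChatMessage]) -> str:
        reply = f"echo: {messages[-1].content}"
        prompt_chars = sum(len(m.content) for m in messages)
        self.last_usage = {"prompt_tokens": prompt_chars // 4,
                           "completion_tokens": len(reply) // 4}
        return reply

client = EchoClient()
print(client([ChatMessage(role="user", content="hi")]))  # echo: hi
print(client.last_usage)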

create_llm_client(provider="openai", prefix="") -> BaseLLMClient

Factory that calls LLMConfig.from_env + validate() and returns the right subclass.
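
Putting the pieces together (assumes the OPENAI_* variables from the LLMConfig example are set):

from semvec.token_reduction import create_llm_client, SemvecChatProxy

llm = create_llm_client("openai")  # LLMConfig.from_env("openai") + validate() under the hood
proxy = SemvecChatProxy(llm_call=llm)
result = proxy.chat("summarise the Q3 numbers")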

TokenCounter / TurnTokens / estimate_tokens

Utility helpers for tracking and estimating token counts per turn. estimate_tokens(text: str) uses a simple chars/4 heuristic (compatible with pss).
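
A quick check of the heuristic (exact rounding is an implementation detail, so the printed value may differ by a token):

from semvec.token_reduction import estimate_tokens

print(estimate_tokens("hello world"))  # 11 chars -> roughly 2-3 tokens under chars/4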

get_phase_prompt(phase) -> str / PHASE_PROMPTS

Phase-specific instruction snippets used by the serializer when include_phase=True.
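
To inspect the available prompts (assuming PHASE_PROMPTS is a mapping keyed by phase name):

from semvec.token_reduction import PHASE_PROMPTS, get_phase_prompt

for phase in PHASE_PROMPTS:
    print(f"{phase}: {get_phase_prompt(phase)[:60]}")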