Parity envelope

The values below are the release gate for every semvec build. They cover engine-internal parity (Rust core vs the pure-Python reference implementation) and API determinism. They are independent of the LOCOMO numbers in Benchmarks, which target end-to-end answer quality.

Structural parity (must hold)

Property | Envelope
Phase-detector decision | Bit-identical on identical input across the parity test suite
Serializer output (short haystack, < 100 chunks) | Byte-identical
Consensus decision (all 5 levels) | 100 % agreement over 25 LLM-driven rounds
network_resonance parity | ≤ 1.1 × 10⁻¹⁶ (≈ 2⁻⁵³, the float64 unit roundoff, sometimes called machine epsilon)
BM25-hybrid index → cosine-only fallback | Identical when SEMVEC_HYBRID_BM25=0
/v1/run context block (same input) | Byte-identical within a single process
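
The two kinds of envelope listed above (exact byte equality and a numeric bound near 2⁻⁵³) can be sketched as small comparison helpers. This is a minimal illustration, not the project's test code: the names within_envelope, byte_identical and digest are hypothetical, and treating the numeric bound as an absolute difference between two scalars is an assumption.

```python
import hashlib

# 2**-53 ≈ 1.1e-16: float64 unit roundoff, the bound quoted for network_resonance.
UNIT_ROUNDOFF = 2.0 ** -53


def within_envelope(rust_value: float, reference_value: float,
                    tol: float = UNIT_ROUNDOFF) -> bool:
    """Assumption: the envelope is an absolute difference between two scalars."""
    return abs(rust_value - reference_value) <= tol


def byte_identical(first: bytes, second: bytes) -> bool:
    """Strict byte-level equality, with no whitespace or encoding normalisation."""
    return first == second


def digest(payload: bytes) -> str:
    """SHA-256 hex digest, useful for logging which output a failing check saw."""
    return hashlib.sha256(payload).hexdigest()
```

Comparing raw bytes rather than decoded strings keeps the check honest about trailing newlines and encoding differences, which a byte-identical guarantee is meant to rule out.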

Per-turn numeric deltas (documented drift)

Per-turn metric values are deterministic within a release and trend-comparable across releases. Exact numeric tolerances are not published. For parity testing you can pin the retrieval projection matrix via the methods documented in the Core API.

LLM-call stochasticity (NOT parity)

When measuring end-to-end answer quality (LOCOMO and similar benchmarks), note that gpt-4o via OpenRouter is non-deterministic even at temperature=0: OpenRouter routes requests between providers (OpenAI direct, Azure, …) whose outputs differ minutely. Empirically:

  • ~60 % of LOCOMO QA predictions are byte-identical across repeat runs
  • ~40 % drift in punctuation, casing, or answering "no info" versus guessing
  • aggregate F1 drifts by at most ±0.5 pp

Plan for this when reading bench reports: drift inside ±1 pp on the aggregate is expected, not a regression.
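
One way to apply this when comparing two bench reports is sketched below. The helper names identical_share and is_expected_drift are hypothetical, and the sketch assumes predictions are available as plain strings in the same question order and that F1 is expressed in percent; the default band of 1 pp is the figure quoted above.

```python
from typing import Sequence


def identical_share(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Fraction of predictions that are exactly equal across two repeat runs."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    same = sum(1 for a, b in zip(run_a, run_b) if a == b)
    return same / len(run_a)


def is_expected_drift(baseline_f1_pct: float, candidate_f1_pct: float,
                      band_pp: float = 1.0) -> bool:
    """True when the aggregate F1 moved by no more than `band_pp` percentage points."""
    return abs(candidate_f1_pct - baseline_f1_pct) <= band_pp
```

A change that falls outside the band is worth investigating as a possible regression; one inside it cannot be distinguished from provider-routing noise without more repeat runs.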

Test suite

The parity assertions live in tests/test_core_port.py, tests/test_compaction_port.py, tests/test_cortex_port.py and tests/test_audit*.py. They run under pytest without external dependencies (no LLM, no embedder).