Parity envelope

The values below are the release gate for every semvec build. See MIGRATION.md §Numerical Fidelity for the full technical writeup of each drift source.

Structural parity (must hold)

| Property | Envelope |
| --- | --- |
| Token-savings ratio | Identical to 3 decimals on LongBench-v2 (98.778% vs 98.778%) and LongMemEval-S (99.661% vs 99.661%) |
| Phase-detector decision | Bit-identical on identical input (282 parity tests + parity_compare.py) |
| Serializer output (short haystack, < 100 chunks) | Byte-identical |
| Consensus decision (all 5 levels) | 100% agreement over 25 LLM-driven rounds |
| network_resonance parity | ≤ 1.1 × 10⁻¹⁶ (machine epsilon) |

Per-turn numeric deltas (documented drift)

Measured over 20-turn identical-embedding runs:

| Metric | Max Δ |
| --- | --- |
| similarity | 3.3 × 10⁻⁵ |
| beta | 3.1 × 10⁻⁷ |
| fsm | 1.1 × 10⁻⁶ |
| norm | 8.4 × 10⁻⁵ |
| pattern_strength | 5.6 × 10⁻⁴ (grows to ≈0.15 over 200+ turns due to W_down RNG bootstrap) |
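A release gate over these deltas could be sketched as follows. This is a minimal illustration, not the project's actual gate: the function names and the per-turn dictionary layout are assumptions, and only the threshold values come from the table above.

```python
# Max per-turn deltas copied from the table above (20-turn, identical embeddings).
ENVELOPE = {
    "similarity": 3.3e-5,
    "beta": 3.1e-7,
    "fsm": 1.1e-6,
    "norm": 8.4e-5,
    "pattern_strength": 5.6e-4,
}


def max_deltas(pss_turns, semvec_turns):
    """Largest absolute per-metric difference across paired turns (hypothetical helper)."""
    if len(pss_turns) != len(semvec_turns):
        raise ValueError("runs must have the same number of turns")
    worst = dict.fromkeys(ENVELOPE, 0.0)
    for pss, semvec in zip(pss_turns, semvec_turns):
        for name in ENVELOPE:
            worst[name] = max(worst[name], abs(pss[name] - semvec[name]))
    return worst


def violations(pss_turns, semvec_turns):
    """Metrics whose observed max delta exceeds the documented envelope."""
    observed = max_deltas(pss_turns, semvec_turns)
    return {name: delta for name, delta in observed.items() if delta > ENVELOPE[name]}


# Toy one-turn runs; the values are made up for illustration.
pss_run = [{"similarity": 0.9, "beta": 0.5, "fsm": 0.1, "norm": 1.0, "pattern_strength": 0.2}]
semvec_ok = [dict(pss_run[0], similarity=0.90001)]   # delta ~1e-5, inside the envelope
semvec_bad = [dict(pss_run[0], beta=0.500001)]       # delta ~1e-6, outside the beta envelope

print(violations(pss_run, semvec_ok))           # {}
print(sorted(violations(pss_run, semvec_bad)))  # ['beta']
```

Tracking the maximum rather than the mean matches how the table is stated: a single out-of-envelope turn is enough to fail the gate.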

The pattern_strength drift is the only structural drift. pss initializes the W_down matrix with NumPy's PCG64 and semvec initializes it with Rust's StdRng; both use the same seed, but the algorithms differ, so the initial matrices differ. To eliminate the drift, inject the pss weights into the semvec state:

```python
state.set_retrieval_projection_weights(pss_state._retrieval_projection.W_down)
```
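The reason a shared seed is not enough on its own can be shown with NumPy alone. In this sketch, MT19937 is only a stand-in for "a different algorithm" (it is not what Rust's StdRng uses), and the matrix shape and the normal distribution are assumptions made for illustration.

```python
import numpy as np

SEED = 42
SHAPE = (4, 3)  # hypothetical W_down shape, for illustration only

# Same seed, two different bit generators: the resulting streams are unrelated.
w_pcg = np.random.Generator(np.random.PCG64(SEED)).standard_normal(SHAPE)
w_other = np.random.Generator(np.random.MT19937(SEED)).standard_normal(SHAPE)

# Same seed, same bit generator: the matrix is reproduced exactly.
w_pcg_again = np.random.Generator(np.random.PCG64(SEED)).standard_normal(SHAPE)

print(np.allclose(w_pcg, w_other))         # matrices from different algorithms
print(np.array_equal(w_pcg, w_pcg_again))  # matrices from the same algorithm
```

Copying the matrix across implementations sidesteps the mismatch entirely, which is what the weight injection does.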

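After injection, the first 5 turns agree to within 1 × 10⁻¹⁴ (effectively bit-identical). From turn 6 onward, deltas stay within ~1 × 10⁻²; the growth comes from tie-break amplification in the sort-by-similarity step, where near-equal scores can change rank.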
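Tie-break amplification can be illustrated with a toy ranking. The scores and the `top_k` helper below are hypothetical; the point is only that a perturbation of about 10⁻¹⁵ is enough to change which item is retrieved, after which the two runs process different inputs.

```python
def top_k(similarities, k):
    """Indices of the k highest scores, highest first (hypothetical helper)."""
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return order[:k]


# Candidates 1 and 2 are separated by about 1e-15; which one is ahead differs per run.
pss_scores = [0.80, 0.75 + 1e-15, 0.75, 0.10]
semvec_scores = [0.80, 0.75, 0.75 + 1e-15, 0.10]

print(top_k(pss_scores, 2))     # [0, 1]
print(top_k(semvec_scores, 2))  # [0, 2]
```

Once the retrieved sets differ, later turns are no longer computing on the same state, so a difference at the rounding level can grow by many orders of magnitude.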

Known caveats

  • K-means++ cluster init in consolidate_long_term: affected by the same RNG mismatch as W_down. It runs only when long-term memory reaches 80% capacity, which most short-lived conversations never do.
  • semantic_hash: pss hashes the embedding's raw NumPy tobytes(); semvec hashes the little-endian float64 bytes of the same values. The two hashes agree when the input is np.float64 and differ when pss received float32, because semvec up-casts at the API boundary.
  • LLM non-determinism at temperature > 0: changes the serializer output, which in turn changes retrieval order. Use --temperature 0.0 for reproducibility; with ensemble judging (--n-judges 3), use a temperature of 0.2–0.3 to reduce judge variance.
  • SentenceTransformer non-determinism on GPU: --embed-device cuda may introduce ~10⁻⁶ noise per embedding compared with CPU. For a fair pss-vs-semvec comparison, use the same --embed-device on both runs.
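The semantic_hash caveat can be reproduced in a few lines. This is a sketch under assumptions: SHA-256 stands in for whatever hash function the two libraries actually use, and the two helper functions are hypothetical names that only mirror the byte-level behavior described above.

```python
import hashlib

import numpy as np


def pss_style_hash(embedding: np.ndarray) -> str:
    # Hashes the array's raw bytes in whatever dtype it arrived in.
    return hashlib.sha256(embedding.tobytes()).hexdigest()


def semvec_style_hash(embedding: np.ndarray) -> str:
    # Up-casts to little-endian float64 first, then hashes those bytes.
    return hashlib.sha256(embedding.astype("<f8").tobytes()).hexdigest()


values = [0.1, 0.25, -1.5]
as_f64 = np.array(values, dtype=np.float64)
as_f32 = np.array(values, dtype=np.float32)

print(pss_style_hash(as_f64) == semvec_style_hash(as_f64))  # True on little-endian hosts
print(pss_style_hash(as_f32) == semvec_style_hash(as_f32))  # False: 12 bytes vs 24 bytes
```

Casting embeddings to np.float64 before they reach pss is one way to keep the two hashes aligned.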

What the pss reference guarantees

From the pss internal reference run (n = 500, single-PSS):

| Question type | PSS accuracy | Baseline accuracy | Δ |
| --- | --- | --- | --- |
| single-session-user | 60% | 26% | +34 pp |
| multi-session | 28% | 15% | +13 pp |
| temporal-reasoning | 29% | 18% | +11 pp |
| knowledge-update | 54% | 46% | +8 pp |
| single-session-assistant | 66% | 57% | +9 pp |
| single-session-preference | 27% | 23% | +4 pp |
| Overall | 40.8% | 27.4% | +13.4 pp |

A semvec release is expected to reproduce these numbers within ±2 pp at --n-judges 3 --temperature 0.2 --embed-device cuda. The token-savings ratio must stay at 99.6-99.7% on every configuration.
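The reproduction criterion can be expressed as a small check. This is a minimal sketch, not part of the semvec tooling: the function names are hypothetical, and it assumes the ±2 pp tolerance applies to every row of the table, including the overall figure.

```python
# PSS reference accuracies in percent, copied from the table above.
PSS_REFERENCE = {
    "single-session-user": 60.0,
    "multi-session": 28.0,
    "temporal-reasoning": 29.0,
    "knowledge-update": 54.0,
    "single-session-assistant": 66.0,
    "single-session-preference": 27.0,
    "overall": 40.8,
}
TOLERANCE_PP = 2.0
SAVINGS_RANGE = (99.6, 99.7)  # token-savings ratio, percent


def out_of_envelope(candidate):
    """Rows where the candidate run deviates from the reference by more than 2 pp."""
    return {
        name: round(candidate[name] - reference, 1)
        for name, reference in PSS_REFERENCE.items()
        if abs(candidate[name] - reference) > TOLERANCE_PP
    }


def savings_ok(ratio_percent):
    """True when the token-savings ratio lies in the documented range."""
    low, high = SAVINGS_RANGE
    return low <= ratio_percent <= high


# Made-up candidate run: identical to the reference except for one row.
candidate = dict(PSS_REFERENCE)
candidate["multi-session"] = 25.0

print(out_of_envelope(candidate))  # {'multi-session': -3.0}
print(savings_ok(99.661))          # True
```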