Parity envelope
The values below are the release gate for every semvec build. See MIGRATION.md §Numerical Fidelity for the full technical writeup of each drift source.
Structural parity (must hold)
| Property | Envelope |
|---|---|
| Token-savings ratio | Identical to 3 decimals on LongBench-v2 (98.778% vs 98.778%) and LongMemEval-S (99.661% vs 99.661%) |
| Phase-detector decision | Bit-identical on identical input (282 parity tests + parity_compare.py) |
| Serializer output (short haystack, < 100 chunks) | Byte-identical |
| Consensus decision (all 5 levels) | 100% agreement over 25 LLM-driven rounds |
| `network_resonance` parity | ≤ 1.1 × 10⁻¹⁶ (machine epsilon) |
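The table distinguishes "bit-identical", "byte-identical", and "identical to 3 decimals". A minimal sketch of how those three comparisons differ is shown here; the helper names are hypothetical and are not taken from `parity_compare.py`.

```python
import struct


def bit_identical(a: float, b: float) -> bool:
    """True only if both float64 values have the same 8-byte pattern."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def byte_identical(a: bytes, b: bytes) -> bool:
    """Serializer outputs match only if every byte matches."""
    return a == b


def same_to_3_decimals(a: float, b: float) -> bool:
    """Compare two percentages after rounding to 3 decimal places."""
    return round(a, 3) == round(b, 3)


# 0.0 and -0.0 compare equal numerically but differ in the sign bit,
# so a bit-level check is stricter than ==.
numerically_equal = (0.0 == -0.0)
bitwise_equal = bit_identical(0.0, -0.0)
```

A bit-level comparison is the stricter choice for the phase-detector row, while the token-savings row only needs agreement after rounding.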
Per-turn numeric deltas (documented drift)
Measured over 20-turn identical-embedding runs:
| Metric | Max Δ |
|---|---|
| `similarity` | 3.3 × 10⁻⁵ |
| `beta` | 3.1 × 10⁻⁷ |
| `fsm` | 1.1 × 10⁻⁶ |
| `norm` | 8.4 × 10⁻⁵ |
| `pattern_strength` | 5.6 × 10⁻⁴ (grows to ≈0.15 over 200+ turns due to W_down RNG bootstrap) |
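The envelope in the table above can be checked mechanically. The sketch below assumes each run is available as a list of per-turn dictionaries keyed by metric name; that data layout and the function names are assumptions for illustration, not the format used by the actual parity tooling.

```python
# Max allowed per-turn delta, copied from the table above.
ENVELOPE = {
    "similarity": 3.3e-5,
    "beta": 3.1e-7,
    "fsm": 1.1e-6,
    "norm": 8.4e-5,
    "pattern_strength": 5.6e-4,
}


def max_deltas(run_a, run_b):
    """Per-metric max |a - b| across turns; each run is a list of dicts."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and have the same number of turns")
    return {
        name: max(abs(a[name] - b[name]) for a, b in zip(run_a, run_b))
        for name in ENVELOPE
    }


def envelope_violations(run_a, run_b):
    """Return only the metrics whose max delta exceeds the envelope."""
    deltas = max_deltas(run_a, run_b)
    return {name: d for name, d in deltas.items() if d > ENVELOPE[name]}
```

An empty result from `envelope_violations` means the two runs stay inside the documented drift for every metric.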
The pattern_strength drift is the only structural drift. It arises because NumPy PCG64 (pss) and Rust StdRng (semvec) are different algorithms, so the same seed yields a different initial W_down matrix in each implementation. Eliminate it by injecting a shared W_down matrix instead of bootstrapping it from the seed.
After injection, the first 5 turns agree to within 1 × 10⁻¹⁴ (effectively bit-identical). Turns 6+ stay within ~1 × 10⁻² due to tie-break amplification on sort-by-similarity.
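Tie-break amplification can be illustrated with a toy ranking: when two candidates have nearly equal similarity, a difference on the order of 10⁻¹⁴ is enough to change which one is selected, and everything downstream of that choice then diverges. The `top_k` function below is a hypothetical stand-in, not the retrieval code of pss or semvec.

```python
def top_k(similarities, k):
    """Indices of the k highest similarities; exact ties go to the lower index."""
    order = sorted(range(len(similarities)), key=lambda i: (-similarities[i], i))
    return order[:k]


# Candidates 1 and 2 are exactly tied in the reference run.
reference = [0.80, 0.75, 0.75, 0.10]

# The other implementation differs by 1e-14 on candidate 2 only.
drifted = list(reference)
drifted[2] += 1e-14

picked_reference = top_k(reference, 2)  # tie resolved by index
picked_drifted = top_k(drifted, 2)      # tiny delta now decides the order
```

Here the numeric difference is far below any per-turn envelope, yet the two runs retrieve different items, which is how a sub-10⁻¹⁴ delta can grow to ~10⁻² in later turns.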
Known caveats
- K-means++ cluster init in `consolidate_long_term`: same RNG-mismatch story. Fires only when long-term memory reaches 80% capacity; most short-lived conversations never hit it.
- `semantic_hash`: pss hashes the embedding's raw NumPy `tobytes()`; semvec hashes the little-endian float64 bytes of the same values. The two agree when the input is `np.float64` and differ when pss received `float32` (semvec up-casts at the API boundary).
- LLM non-determinism at `temperature > 0`: shifts serializer output, then retrieval order. Use `--temperature 0.0` for reproducibility; with ensemble judging (`--n-judges 3`), use `temperature 0.2-0.3` to reduce judge variance.
- SentenceTransformer non-determinism on GPU: `--embed-device cuda` may introduce ~10⁻⁶ noise per embedding vs CPU. For a fair pss-vs-semvec comparison, use the same `--embed-device` on both runs.
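The `semantic_hash` caveat comes down to hashing different byte buffers for the same values. The sketch below reproduces the mismatch with the standard library only; SHA-256 and all helper names are assumptions for illustration, since the actual hash function is not specified here.

```python
import hashlib
import struct


def float32_le_bytes(values):
    """Bytes a little-endian float32 buffer would hold for these values."""
    return struct.pack(f"<{len(values)}f", *values)


def float64_le_bytes(values):
    """Little-endian float64 bytes of the given values."""
    return struct.pack(f"<{len(values)}d", *values)


def upcast_from_float32(values):
    """Round each value to float32, then widen it back to float64."""
    return list(struct.unpack(f"<{len(values)}f", float32_le_bytes(values)))


def digest(raw):
    # SHA-256 is an illustrative choice, not the documented algorithm.
    return hashlib.sha256(raw).hexdigest()


embedding = [0.1, -0.25, 3.0]

# pss-style path with float32 input: hash the raw 4-byte-per-value buffer.
pss_style = digest(float32_le_bytes(embedding))

# semvec-style path: up-cast to float64 first, then hash 8 bytes per value.
semvec_style = digest(float64_le_bytes(upcast_from_float32(embedding)))
```

The values are numerically the same after the up-cast, but the buffers differ in length (12 vs 24 bytes here), so the digests cannot match. Converting embeddings to float64 before they reach pss avoids the mismatch.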
What the pss reference guarantees
From the pss internal reference run (n = 500, single-PSS):
| Question type | PSS accuracy | Baseline accuracy | Δ |
|---|---|---|---|
| single-session-user | 60% | 26% | +34 pp |
| multi-session | 28% | 15% | +13 pp |
| temporal-reasoning | 29% | 18% | +11 pp |
| knowledge-update | 54% | 46% | +8 pp |
| single-session-assistant | 66% | 57% | +9 pp |
| single-session-preference | 27% | 23% | +4 pp |
| Overall | 40.8% | 27.4% | +13.4 pp |
A semvec release is expected to reproduce these numbers within ±2 pp at --n-judges 3 --temperature 0.2 --embed-device cuda. The token-savings ratio must stay at 99.6-99.7% on every configuration.
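The release expectation above can be expressed as a small gate. This is a sketch under two assumptions: that the ±2 pp tolerance applies to every row of the table (not only the overall figure), and that results arrive as a plain dictionary of accuracies in percent. The function and constant names are hypothetical.

```python
# PSS accuracy (%) from the reference table above.
REFERENCE_PSS_ACCURACY = {
    "single-session-user": 60.0,
    "multi-session": 28.0,
    "temporal-reasoning": 29.0,
    "knowledge-update": 54.0,
    "single-session-assistant": 66.0,
    "single-session-preference": 27.0,
    "Overall": 40.8,
}
TOLERANCE_PP = 2.0
TOKEN_SAVINGS_RANGE = (99.6, 99.7)  # percent, inclusive


def gate_failures(measured_accuracy, token_savings_pct):
    """Return human-readable reasons the run misses the gate (empty if it passes)."""
    failures = []
    for name, ref in REFERENCE_PSS_ACCURACY.items():
        got = measured_accuracy.get(name)
        if got is None:
            failures.append(f"{name}: missing")
        elif abs(got - ref) > TOLERANCE_PP:
            failures.append(f"{name}: {got} vs reference {ref}")
    low, high = TOKEN_SAVINGS_RANGE
    if not low <= token_savings_pct <= high:
        failures.append(f"token savings {token_savings_pct}% outside {low}-{high}%")
    return failures
```

For example, `gate_failures(results, 99.661)` returns an empty list when every question type is within 2 pp of the reference.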