Correcting Memories¶
"How do I make sure that when an old stored memory contains something wrong or outdated and a new piece of information comes in, only the newer (correct) information is retrieved? The newest isn't automatically the correct one."
This is the classic factual-correction problem. Pure semantic memory knows nothing about true vs. false — it only knows about similarity. "Most recent wins" is a poor default, because it silently overrides perfectly valid older facts every time you mention something tangentially related.
Semvec gives you five independent mechanisms for this. They compose; pick the cheapest one that actually solves your problem, and use the heavier ones for the specific cases that need them.
The five mechanisms¶
| # | Mechanism | Pure Python | Persistent | Best for |
|---|---|---|---|---|
| 1 | Recency-Bias (default) | yes | yes | "newest tends to be right" cases — billions of trivial updates |
| 2 | Resonance Triggers + per-trigger weight | yes | yes | A specific corrected fact must outrank the topic default |
| 3 | NegativeAttractor in retrieval | yes | yes | Old answer was definitively wrong and must never resurface |
| 4 | Per-call meta= Source/Confidence | needs [compliance] | yes (via event store) | Multiple sources of varying trust (user vs. ERP vs. web) |
| 5 | Hard delete from event store | needs [compliance] | yes | False information must be gone for legal / compliance reasons |
The decision flow:
```
┌───────────────────────────────────┐
│  Old fact superseded by new one?  │
└────────────────┬──────────────────┘
                 │ yes
   ┌─────────────┼──────────────────────┐
   │             │                      │
"Most recent   "Specific             "Old answer was
 wins is fine   correction must       definitively wrong;
 here"          outrank topic"        never resurface"
   │             │                      │
   ▼             ▼                      ▼
[#1 Recency]  [#2 Trigger weight]  [#3 NegativeAttractor]
                                        │
                                        │ legal / GDPR scope
                                        ▼
                           [#5 Hard event-store delete]
                                        │
                                        │ multiple competing sources
                                        ▼
                           [#4 Source / Confidence meta]
```
1. Recency-Bias (default, no API needed)¶
Semvec assigns every memory a retention score that combines
importance, recency, and access frequency. Frequently-accessed
older memories outlive never-touched newer ones, and selective
forgetting (SemvecConfig(use_selective_forgetting=True), default)
prunes low-score entries when the long-term tier fills up.
Use this when: the newest version of a fact is usually the right one, and the cost of an occasional mismatch is low.
Don't rely on this alone when: the wrong answer is correlated with something the user mentions often (it gets boosted into the short-term tier and stays in retrieval).
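The exact weighting of the retention score is internal to Semvec; the following toy sketch only illustrates the *shape* of the trade-off (a weighted sum of importance, exponential recency decay, and log-scaled access frequency — all weights here are made up):

```python
import math
import time

def retention_score(importance, last_access_ts, access_count,
                    now=None, half_life=86400.0):
    """Toy retention score: weighted mix of importance, recency decay,
    and log-scaled access frequency. Illustrative weights only; the
    real Semvec formula is internal."""
    now = time.time() if now is None else now
    recency = math.exp(-(now - last_access_ts) / half_life)
    frequency = math.log1p(access_count)
    return 0.5 * importance + 0.3 * recency + 0.2 * frequency

# A frequently-accessed older memory outscores an untouched newer one:
old = retention_score(0.5, last_access_ts=0, access_count=50, now=3 * 86400)
new = retention_score(0.5, last_access_ts=3 * 86400, access_count=0, now=3 * 86400)
```

This is why "never-touched newer" entries are the first candidates for pruning once the long-term tier fills up.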
2. Resonance Triggers + per-trigger weight¶
A ResonanceTrigger boosts memories that match a keyword (substring
in the memory text) or an embedding (cosine ≥ threshold). The
per-trigger weight lets you say "this corrected fact should
outrank topic-default triggers". The boost is multiplicative on the
candidate score:
$$\text{boost} = \gamma \cdot \max_t (\text{strength}_t \cdot \text{weight}_t)$$
```python
from semvec import ResonanceTrigger, SemvecConfig, SemvecState

state = SemvecState(SemvecConfig(dimension=384, trigger_retrieval_boost=0.5))

# Write the corrected fact normally.
state.update(
    embedder.get_embedding("Account balance: 1,247.38 EUR"),
    "Account balance: 1,247.38 EUR",
)

# Topic-level baseline trigger (everything mentioning "balance" gets a
# small lift):
state.add_resonance_trigger(ResonanceTrigger(keyword="balance", weight=1.0))

# Heavy-weight trigger that promotes the corrected memory above older
# rivals, even if their cosine to the query is higher:
state.add_resonance_trigger(
    ResonanceTrigger(keyword="1,247.38", weight=5.0)
)
```
weight defaults to 1.0 and is bounded to [0, 10]. weight=0
silences the boost contribution while leaving the trigger active for
input-isolation purposes. weight > 1 is the explicit
"override-the-baseline" lever.
Trigger boosts the topic, not the value
A keyword trigger like keyword="balance" boosts every memory
containing "balance" — old AND new. To make the corrected memory
rank specifically, give the specific value its own trigger
(keyword="1,247.38" or an embedding match against the corrected
text), not the topic.
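To see why the per-trigger weight is enough to overturn a cosine ranking, here is a plain-Python sketch of the boost formula above. How the boost folds into the candidate score is an assumption here (score × (1 + boost)); γ corresponds to trigger_retrieval_boost:

```python
def apply_trigger_boost(score, trigger_hits, gamma=0.5):
    """Sketch of the multiplicative trigger boost:
    boost = gamma * max(strength * weight) over matching triggers.
    The (1 + boost) composition is assumed, not taken from the API."""
    boost = gamma * max((s * w for s, w in trigger_hits), default=0.0)
    return score * (1 + boost)

# A weight=5.0 trigger lifts a lower-cosine candidate (0.70) past a
# higher-cosine rival (0.80) that only matches the weight=1.0 baseline:
corrected = apply_trigger_boost(0.70, [(1.0, 5.0)])  # 0.70 * 3.5
stale = apply_trigger_boost(0.80, [(1.0, 1.0)])      # 0.80 * 1.5
```

With no matching triggers the score passes through unchanged, which is why untouched topics are unaffected by corrections elsewhere.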
3. NegativeAttractor in standard retrieval (new in 0.4.4)¶
The opposite direction: register a region the retrieval should push away from. Each registered attractor demotes any memory whose embedding aligns with it above a configurable threshold:
$$\text{score} \leftarrow \text{score} \cdot \left(1 - \delta \cdot \max_a \text{strength}_a\right)$$
```python
# A previous answer turned out to be wrong: store it as an attractor
# so retrieval avoids it from now on.
state.add_negative_attractor(
    embedder.get_embedding("Account balance: 850 EUR"),
    description="Old wrong balance from last month's stale ERP sync",
    source="user_correction",
    severity=1.0,
)

# Inspect, clear when the situation changes:
print(state.negative_attractor_count)  # 1
state.clear_negative_attractors()
```
Tuning knobs (on SemvecConfig):
- negative_attractor_penalty: float — overall strength δ (default 0.5, range [0, 1]). At δ = 0.5, a perfectly-aligned attractor halves the candidate score; at δ = 1.0 it zeros it.
- negative_attractor_threshold: float — cosine floor below which attractors are ignored (default 0.3). Stops noise vectors from acting as a stealth deny-list.
Attractor vs. delete
An attractor demotes — the memory is still in storage, just
pushed down in retrieval. If the legal requirement is that the
information must be gone (GDPR Art. 17), use mechanism #5.
Attractors are also not persisted across process restarts by
default (they live on SemvecState only) — wire them into your own
bootstrap if you need persistence.
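The demotion formula above is easy to sanity-check in isolation. This sketch is not the library's internal code; it just replays the two tuning knobs (δ as delta, the cosine floor as threshold) against a candidate score:

```python
def apply_attractor_penalty(score, attractor_cosines,
                            delta=0.5, threshold=0.3):
    """Sketch of the NegativeAttractor demotion: attractors whose
    cosine to the memory falls below the threshold are ignored; the
    strongest remaining one scales the score by (1 - delta * strength)."""
    hits = [c for c in attractor_cosines if c >= threshold]
    if not hits:
        return score
    return score * (1 - delta * max(hits))

# Strong alignment with an attractor nearly halves the score at the
# default delta; a noise vector below the threshold changes nothing:
aligned = apply_attractor_penalty(0.9, [0.95])  # 0.9 * (1 - 0.5 * 0.95)
noise = apply_attractor_penalty(0.9, [0.1])     # below threshold: unchanged
```

Setting delta=1.0 turns a perfectly-aligned attractor into a hard veto, which is the closest an attractor gets to a delete without touching storage.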
4. Per-call meta= Source / Confidence (new in 0.4.4)¶
When multiple sources push facts into the same memory and you need
the application layer to disambiguate, attach metadata per update.
Requires the [compliance] extra so each update lands as a
MemoryEvent in the event store.
```python
from semvec import SemvecConfig
from semvec.compliance.event_store import SqliteEventStore
from semvec.compliance.state_proxy import ComplianceState

store = SqliteEventStore(path="events.sqlite")
store.init_schema()

state = ComplianceState(
    SemvecConfig(dimension=384),
    event_store=store,
    user_id="customer-42",
    default_meta={"channel": "chat"},
)

# User self-reports their balance — moderate trust:
state.update(
    embedder.get_embedding("My balance is 1,200 EUR"),
    "My balance is 1,200 EUR",
    meta={"source": "user", "confidence": 0.6},
)

# Authoritative ERP push — highest trust:
state.update(
    embedder.get_embedding("Account balance: 1,247.38 EUR"),
    "Account balance: 1,247.38 EUR",
    meta={"source": "erp", "confidence": 1.0},
)
```
The merge order is {**default_meta, **meta} — the per-call dict
wins on key conflicts, since the call site is the more specific
source. The wrapper copies the merged dict before storing, so
mutating the kwarg dict after the call doesn't retroactively rewrite
the previous event.
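The merge-and-copy semantics can be demonstrated with plain dicts, independent of the library:

```python
default_meta = {"channel": "chat", "source": "unknown"}
call_meta = {"source": "user", "confidence": 0.6}

# Per-call keys win on conflicts ("source"); defaults fill the gaps:
merged = {**default_meta, **call_meta}

# Mirror the wrapper's defensive copy: the stored event keeps its own
# dict, so mutating the caller's dict afterwards changes nothing.
stored_event_meta = dict(merged)
call_meta["confidence"] = 0.1
```

Without the copy, a caller that reuses one meta dict across updates would silently rewrite history in the event store.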
At retrieval time the application reads event.meta["confidence"]
and either filters or weights — Semvec doesn't impose a policy, but
the typical pattern is "if a confidence ≥ 0.9 event exists for
this topic, ignore everything below 0.5".
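That pattern is a few lines of application code. The event shape here (dicts with a "meta" key) is a simplification of the stored MemoryEvent; thresholds are the ones from the text:

```python
def filter_by_confidence(events, hi=0.9, lo=0.5):
    """If any event on the topic carries confidence >= hi, drop
    everything below lo; otherwise leave the list untouched."""
    confs = [e.get("meta", {}).get("confidence", 0.0) for e in events]
    if any(c >= hi for c in confs):
        return [e for e, c in zip(events, confs) if c >= lo]
    return events

events = [
    {"text": "My balance is 1,200 EUR",
     "meta": {"source": "user", "confidence": 0.6}},
    {"text": "Balance might be 900 EUR",
     "meta": {"source": "web", "confidence": 0.3}},
    {"text": "Account balance: 1,247.38 EUR",
     "meta": {"source": "erp", "confidence": 1.0}},
]
kept = filter_by_confidence(events)  # the 0.3 web guess is dropped
```

Note the fallback branch: when no authoritative event exists, low-confidence guesses are still better than nothing, so the list passes through unchanged.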
5. Hard event-store delete (GDPR Art. 17)¶
When the wrong information must be gone — not demoted, not
deprecated, but physically removed and verifiable in an audit —
delete the event from the store. Requires the [compliance] extra.
```
# 5a — single event by id (exact correction)
DELETE /v1/compliance/users/customer-42/memory/{event_id}

# 5b — full GDPR Art. 17 forget with signed certificate
POST /v1/compliance/users/customer-42/forget
```
In 0.4.4 these endpoints automatically enqueue a vector rebuild via
the InMemoryRebuildWorker you registered:
```python
from semvec.api.compliance_routes import (
    compliance_router,
    set_compliance_dependencies,
)
from semvec.compliance.workers.vector_rebuild import (
    InMemoryRebuildWorker,
)

worker = InMemoryRebuildWorker(
    store=store,
    sink=my_session_manager.write_rebuilt_state,
    config_factory=lambda: SemvecConfig(dimension=384),
    subject="anonymous",
)

set_compliance_dependencies(
    store=store,
    audit_log=audit,
    rebuild_worker=worker,
)
```
After a DELETE or /forget returns, the running SemvecState is
asynchronously rebuilt against the remaining events. Without a
worker, the row is gone from the store but in-process retrieval
keeps the deleted memory until the next process restart.
Hard delete is destructive
forget_user always wipes the cryptographic certificate's
reason field server-side to "user_request", even when the
HTTP body sends a different value. The certificate is an
operator-issued attestation; user-supplied prose would dilute
its evidentiary value. If you need a custom reason
(e.g. "ttl_expired" from a sweeper), call the
forget_user() Python API directly.
Pattern: combining the mechanisms¶
A real correction pipeline usually layers a few of these:
```python
# 1. Write the new authoritative fact (per-call meta = high confidence)
state.update(emb_new, text_new, meta={"source": "erp", "confidence": 1.0})

# 2. Demote the old fact in retrieval (NegativeAttractor)
state.add_negative_attractor(emb_old, description="superseded")

# 3. Boost the new fact specifically (per-trigger weight)
state.add_resonance_trigger(ResonanceTrigger(keyword="1,247.38", weight=5.0))

# 4. (optional) Hard-delete the old event for compliance
client.delete(f"/v1/compliance/users/customer-42/memory/{old_event_id}")
```
The order matters less than picking one mechanism per concern:
- What's retrieved — Trigger weight, NegativeAttractor, Recency.
- What's stored — Per-call meta, Hard delete.
If you mix concerns in the same mechanism (e.g. trying to use a trigger for both promotion and "this old fact is wrong"), you end up with policies that are hard to reason about. Keep the layers separate.
Cross-references¶
- Concepts & Glossary — phases, tiers, anchors, triggers explained from first principles.
- Compliance Pack — full event-sourcing, retention, GDPR forget, signed certificates.
- REST API — HTTP surface for the compliance / DELETE / forget endpoints.