Correcting Memories

"How do I make sure that when an old stored memory contains something wrong or outdated and a new piece of information comes in, only the newer (correct) information is retrieved? The newest isn't automatically the correct one."

This is the classic factual-correction problem. Pure semantic memory knows nothing about true vs. false — it only knows about similarity. "Most recent wins" is a poor default, because it silently overrides perfectly valid older facts every time you mention something tangentially related.

Semvec gives you five independent mechanisms for this. They compose; pick the cheapest one that actually solves your problem, and use the heavier ones for the specific cases that need them.

The five mechanisms

| # | Mechanism | Pure Python | Persistent | Best for |
|---|-----------|-------------|------------|----------|
| 1 | Recency-Bias (default) | yes | yes | "Newest tends to be right" cases, e.g. billions of trivial updates |
| 2 | Resonance Triggers + per-trigger weight | yes | yes | A specific corrected fact must outrank the topic default |
| 3 | NegativeAttractor in retrieval | yes | not by default (in-memory only, see section 3) | Old answer was definitively wrong and must never resurface |
| 4 | Per-call meta= Source/Confidence | needs [compliance] | yes (via event store) | Multiple sources of varying trust (User vs. ERP vs. Web) |
| 5 | Hard delete from event store | needs [compliance] | yes | False information must be gone for legal or compliance reasons |

The decision flow:

                  ┌───────────────────────────────────┐
                  │  Old fact superseded by new one?  │
                  └────────────────┬──────────────────┘
                                   │ yes
            ┌──────────────────────┼─────────────────────┐
            │                      │                     │
   "Most recent wins              "Specific           "Old answer was
    is fine here"               correction must      definitively wrong;
                                 outrank topic"      never resurface"
            │                      │                     │
            ▼                      ▼                     ▼
      [#1 Recency]         [#2 Trigger weight]   [#3 NegativeAttractor]

                                   │ legal / GDPR scope
                        [#5 Hard event-store delete]
                                   │ multiple competing sources
                        [#4 Source / Confidence meta]

1. Recency-Bias (default, no API needed)

Semvec assigns every memory a retention score that combines importance, recency, and access frequency. Frequently accessed older memories therefore outlive never-touched newer ones, and selective forgetting (SemvecConfig(use_selective_forgetting=True), the default) prunes low-score entries when the long-term tier fills up.

Use this when: the newest version of a fact is usually the right one, and the cost of an occasional mismatch is low.

Don't rely on this alone when: the wrong answer is correlated with something the user mentions often (it gets boosted into the short-term tier and stays in retrieval).
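To make the trade-off concrete, here is a toy retention score. This page does not give Semvec's actual formula, so the weights, the half-life, and the function name `retention_score` are all assumptions chosen for illustration. The sketch only shows how a frequently accessed old memory can outscore a fresh one that was never touched.

```python
def retention_score(importance, age_hours, access_count,
                    half_life_hours=72.0, weights=(0.4, 0.2, 0.4)):
    """Toy weighted sum of importance, recency, and access frequency.

    Hypothetical formula: the weights and half-life are illustrative only
    and are not Semvec's real values.
    """
    w_imp, w_rec, w_acc = weights
    recency = 0.5 ** (age_hours / half_life_hours)   # 1.0 for a brand-new memory
    frequency = 1.0 - 1.0 / (1.0 + access_count)     # 0.0 if never accessed
    return w_imp * importance + w_rec * recency + w_acc * frequency


# A month-old memory that is read often vs. a one-hour-old memory nobody read.
old_but_popular = retention_score(importance=0.5, age_hours=720, access_count=40)
new_but_untouched = retention_score(importance=0.5, age_hours=1, access_count=0)

print(old_but_popular > new_but_untouched)
```

If the popular memory is the wrong one, recency alone will not displace it, which is the case the heavier mechanisms below are for.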


2. Resonance Triggers + per-trigger weight

A ResonanceTrigger boosts memories that match a keyword (substring in the memory text) or an embedding (cosine ≥ threshold). The per-trigger weight lets you say "this corrected fact should outrank topic-default triggers". The boost is multiplicative on the candidate score:

$$\text{boost} = \gamma \cdot \max_t (\text{strength}_t \cdot \text{weight}_t)$$

from semvec import ResonanceTrigger, SemvecConfig, SemvecState

state = SemvecState(SemvecConfig(dimension=384, trigger_retrieval_boost=0.5))

# Write the corrected fact normally.
# `embedder` is your own embedding model wrapper (not part of Semvec).
state.update(embedder.get_embedding("Account balance: 1,247.38 EUR"),
             "Account balance: 1,247.38 EUR")

# Topic-level baseline trigger (everything mentioning "balance" gets a
# small lift):
state.add_resonance_trigger(ResonanceTrigger(keyword="balance", weight=1.0))

# Heavy-weight trigger that promotes the corrected memory above older
# rivals, even if their cosine to the query is higher:
state.add_resonance_trigger(
    ResonanceTrigger(keyword="1,247.38", weight=5.0)
)

weight defaults to 1.0 and is bounded to [0, 10]. weight=0 silences the boost contribution while leaving the trigger active for input-isolation purposes. weight > 1 is the explicit "override-the-baseline" lever.

Trigger boosts the topic, not the value

A keyword trigger like keyword="balance" boosts every memory containing "balance" — old AND new. To make the corrected memory rank specifically, give the specific value its own trigger (keyword="1,247.38" or an embedding match against the corrected text), not the topic.
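The difference between a topic trigger and a value trigger can be checked with a small standalone sketch of the boost formula. It does not call Semvec. The `Trigger` class is a hypothetical stand-in for a keyword `ResonanceTrigger`, keyword strength is assumed to be 1.0 on a substring match and 0.0 otherwise, and `gamma` is assumed to correspond to γ in the formula above.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical stand-in for a keyword ResonanceTrigger."""
    keyword: str
    weight: float = 1.0

    def __post_init__(self):
        # Mirror the documented bound on weight: [0, 10].
        self.weight = min(max(self.weight, 0.0), 10.0)


def trigger_boost(text, triggers, gamma):
    """boost = gamma * max_t(strength_t * weight_t).

    Assumption: a keyword trigger has strength 1.0 when the keyword is a
    substring of the memory text, and 0.0 otherwise.
    """
    best = max(
        ((1.0 if t.keyword in text else 0.0) * t.weight for t in triggers),
        default=0.0,
    )
    return gamma * best


triggers = [Trigger("balance", 1.0), Trigger("1,247.38", 5.0)]
old_text = "Account balance: 850 EUR"
new_text = "Account balance: 1,247.38 EUR"

boost_old = trigger_boost(old_text, triggers, gamma=0.5)
boost_new = trigger_boost(new_text, triggers, gamma=0.5)
print(boost_old, boost_new)  # 0.5 2.5
```

With only the "balance" trigger, both memories would receive the same boost of 0.5. The value-specific trigger is what separates them.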


3. NegativeAttractor in standard retrieval (new in 0.4.4)

The opposite direction: register a region the retrieval should push away from. Each registered attractor demotes any memory whose embedding aligns with it above a configurable threshold:

$$\text{score}(\text{memory}) \leftarrow \text{score}(\text{memory}) \cdot \left(1 - \delta \cdot \max_a (\text{strength}_a)\right)$$

# A previous answer turned out to be wrong: store it as an attractor
# so retrieval avoids it from now on.
state.add_negative_attractor(
    embedder.get_embedding("Account balance: 850 EUR"),
    description="Old wrong balance from last month's stale ERP sync",
    source="user_correction",
    severity=1.0,
)

# Inspect, clear when the situation changes:
print(state.negative_attractor_count)   # 1
state.clear_negative_attractors()

Tuning knobs (on SemvecConfig):

  • negative_attractor_penalty: float — overall strength δ (default 0.5, range [0, 1]). At δ = 0.5, a perfectly-aligned attractor halves the candidate score; at δ = 1.0 it zeros it.
  • negative_attractor_threshold: float — cosine floor below which attractors are ignored (default 0.3). Stops noise vectors from acting as a stealth deny-list.
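The effect of the two knobs can be traced with a short standalone sketch of the penalty formula. It does not use Semvec. It assumes that an attractor's strength is the cosine similarity between the memory and the attractor, counted only when it reaches the threshold, and it ignores `severity`. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def penalized_score(score, memory_vec, attractors, delta=0.5, threshold=0.3):
    """Apply score * (1 - delta * max_a strength_a).

    Assumption: strength_a is the cosine between memory and attractor,
    and attractors below `threshold` contribute nothing.
    """
    strengths = (cosine(memory_vec, a) for a in attractors)
    strongest = max((s for s in strengths if s >= threshold), default=0.0)
    return score * (1.0 - delta * strongest)


attractor = [1.0, 0.0]
aligned = penalized_score(0.8, [1.0, 0.0], [attractor])             # cosine 1.0
unrelated = penalized_score(0.8, [0.0, 1.0], [attractor])           # cosine 0.0
zeroed = penalized_score(0.8, [2.0, 0.0], [attractor], delta=1.0)   # cosine 1.0

print(aligned, unrelated, zeroed)  # 0.4 0.8 0.0
```

The aligned memory is halved at δ = 0.5 and zeroed at δ = 1.0, while the orthogonal memory falls below the threshold and keeps its score.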

Attractor vs. delete

An attractor demotes: the memory is still in storage, just pushed down in retrieval. If the legal requirement is that the information must be gone (GDPR Art. 17), use mechanism #5. Attractors are also not persisted across process restarts by default (they live on MrmState only), so wire them into your own bootstrap if you need persistence.
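One way to wire attractors into a bootstrap is to keep your own record of them and replay it at startup. The sketch below is a minimal version of that idea, not a Semvec feature. The JSON-lines file format and the helper names `save_attractor` and `restore_attractors` are hypothetical, and `fake_register` stands in for `state.add_negative_attractor` so the example runs without Semvec.

```python
import json
import tempfile
from pathlib import Path


def save_attractor(path, embedding, description,
                   source="user_correction", severity=1.0):
    """Append one attractor record to a JSON-lines file (hypothetical format)."""
    record = {
        "embedding": list(embedding),
        "description": description,
        "source": source,
        "severity": severity,
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def restore_attractors(path, register):
    """Replay stored records through `register`; return how many were replayed."""
    path = Path(path)
    if not path.exists():
        return 0
    count = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            register(rec["embedding"], description=rec["description"],
                     source=rec["source"], severity=rec["severity"])
            count += 1
    return count


# Stand-in for state.add_negative_attractor, so the demo runs on its own.
registered = []

def fake_register(embedding, description, source, severity):
    registered.append((embedding, description, source, severity))


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "attractors.jsonl"
    save_attractor(store_path, [0.1, 0.9], "Old wrong balance")
    restored = restore_attractors(store_path, fake_register)

print(restored, registered[0][1])  # 1 Old wrong balance
```

In a real bootstrap you would pass `state.add_negative_attractor` as `register` right after constructing the state, and call `save_attractor` wherever you register a new attractor.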


4. Per-call meta= Source / Confidence (new in 0.4.4)

When multiple sources push facts into the same memory and you need the application layer to disambiguate, attach metadata per update. Requires the [compliance] extra so each update lands as a MemoryEvent in the event store.

from semvec import SemvecConfig
from semvec.compliance.event_store import SqliteEventStore
from semvec.compliance.state_proxy import ComplianceState

store = SqliteEventStore(path="events.sqlite")
store.init_schema()

state = ComplianceState(
    SemvecConfig(dimension=384),
    event_store=store,
    user_id="customer-42",
    default_meta={"channel": "chat"},
)

# User self-reports their balance — moderate trust:
state.update(
    embedder.get_embedding("My balance is 1,200 EUR"),
    "My balance is 1,200 EUR",
    meta={"source": "user", "confidence": 0.6},
)

# Authoritative ERP push — highest trust:
state.update(
    embedder.get_embedding("Account balance: 1,247.38 EUR"),
    "Account balance: 1,247.38 EUR",
    meta={"source": "erp", "confidence": 1.0},
)

The merge order is {**default_meta, **meta} — the per-call dict wins on key conflicts, since the call site is the more specific source. The wrapper copies the merged dict before storing, so mutating the kwarg dict after the call doesn't retroactively rewrite the previous event.
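The merge and copy behaviour can be reproduced in plain Python. The helper name `merge_meta` is hypothetical and this is not the wrapper's actual code. It only shows what `{**default_meta, **meta}` does and why later mutation of the caller's dict leaves the stored result untouched.

```python
def merge_meta(default_meta, meta=None):
    """Return a new dict in which per-call keys override the defaults."""
    return {**(default_meta or {}), **(meta or {})}


default_meta = {"channel": "chat", "source": "unknown"}
call_meta = {"source": "erp", "confidence": 1.0}

stored = merge_meta(default_meta, call_meta)

# Mutating the caller's dict afterwards does not touch the merged copy.
call_meta["confidence"] = 0.1

print(stored)  # {'channel': 'chat', 'source': 'erp', 'confidence': 1.0}
```

Note that dict unpacking makes a shallow copy: top-level keys are independent, but a nested list or dict inside `meta` would still be shared with the caller in this sketch.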

At retrieval time the application reads event.meta["confidence"] and either filters or weights — Semvec doesn't impose a policy, but the typical pattern is "if a confidence ≥ 0.9 event exists for this topic, ignore everything below 0.5".
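The "typical pattern" quoted above could look like the following application-side sketch. Events are modelled here as plain dicts with `text` and `meta` keys, which is an assumed shape rather than the real `MemoryEvent` type. The list is assumed to be already restricted to one topic, and the web entry is made-up sample data.

```python
def apply_confidence_policy(events, high=0.9, low=0.5):
    """If any event reaches `high` confidence, drop events below `low`.

    The surviving events are returned with the most trusted first.
    """
    def confidence(event):
        # Assumption: events without a confidence value count as 0.0.
        return event["meta"].get("confidence", 0.0)

    if any(confidence(e) >= high for e in events):
        events = [e for e in events if confidence(e) >= low]
    return sorted(events, key=confidence, reverse=True)


events = [
    {"text": "My balance is 1,200 EUR",
     "meta": {"source": "user", "confidence": 0.6}},
    {"text": "Balance is around 900 EUR",
     "meta": {"source": "web", "confidence": 0.3}},
    {"text": "Account balance: 1,247.38 EUR",
     "meta": {"source": "erp", "confidence": 1.0}},
]

kept = apply_confidence_policy(events)
print([e["meta"]["source"] for e in kept])  # ['erp', 'user']
```

Without the ERP event, no entry reaches 0.9, so nothing is filtered and the low-confidence web entry stays available as a fallback.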


5. Hard event-store delete (GDPR Art. 17)

When the wrong information must be gone — not demoted, not deprecated, but physically removed and verifiable in an audit — delete the event from the store. Requires the [compliance] extra.

# 5a — single event by id (exact correction)
DELETE /v1/compliance/users/customer-42/memory/{event_id}

# 5b — full GDPR Art. 17 forget with signed certificate
POST   /v1/compliance/users/customer-42/forget

In 0.4.4 these endpoints automatically enqueue a vector rebuild via the InMemoryRebuildWorker you registered:

from semvec.api.compliance_routes import (
    compliance_router,
    set_compliance_dependencies,
)
from semvec.compliance.workers.vector_rebuild import (
    InMemoryRebuildWorker,
)

worker = InMemoryRebuildWorker(
    store=store,
    sink=my_session_manager.write_rebuilt_state,
    config_factory=lambda: SemvecConfig(dimension=384),
    subject="anonymous",
)
set_compliance_dependencies(
    store=store,
    audit_log=audit,
    rebuild_worker=worker,
)

After a DELETE or /forget call returns, the running SemvecState is rebuilt asynchronously from the remaining events. Without a registered worker, the row is gone from the store, but in-process retrieval keeps serving the deleted memory until the next process restart.
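The gap between the store and the in-process state can be illustrated with a toy event store. Everything in this sketch is hypothetical (`ToyEventStore`, `rebuild_view`) and far simpler than the real worker. It only shows why a delete without a rebuild leaves stale data in memory.

```python
class ToyEventStore:
    """Hypothetical in-memory stand-in for an event store."""

    def __init__(self):
        self.events = {}

    def append(self, event_id, text):
        self.events[event_id] = text

    def delete(self, event_id):
        self.events.pop(event_id, None)


def rebuild_view(store):
    """Replay every remaining event into a fresh in-process view."""
    return list(store.events.values())


toy_store = ToyEventStore()
toy_store.append("e1", "Account balance: 850 EUR")
toy_store.append("e2", "Account balance: 1,247.38 EUR")

view = rebuild_view(toy_store)      # state built at process start

toy_store.delete("e1")              # hard delete touches the store only
stale_view = list(view)             # in-process state still holds the old text

view = rebuild_view(toy_store)      # what a rebuild worker would trigger
print(len(stale_view), len(view))   # 2 1
```

The real worker does this replay asynchronously, so a short window in which the old memory is still retrievable remains even with a worker registered.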

Hard delete is destructive

A hard delete cannot be undone. In addition, forget_user always overwrites the reason field of the cryptographic certificate server-side with "user_request", even when the HTTP body sends a different value. The certificate is an operator-issued attestation; user-supplied prose would dilute its evidentiary value. If you need a custom reason (e.g. "ttl_expired" from a sweeper), call the forget_user() Python API directly.


Pattern: combining the mechanisms

A real correction pipeline usually layers a few of these:

# 1. Write the new authoritative fact (per-call meta = high
#    confidence)
state.update(emb_new, text_new, meta={"source": "erp", "confidence": 1.0})

# 2. Demote the old fact in retrieval (NegativeAttractor)
state.add_negative_attractor(emb_old, description="superseded")

# 3. Boost the new fact specifically (per-trigger weight)
state.add_resonance_trigger(ResonanceTrigger(keyword="1,247.38", weight=5.0))

# 4. (optional) Hard-delete the old event for compliance
client.delete(f"/v1/compliance/users/customer-42/memory/{old_event_id}")

The order matters less than picking one mechanism per concern:

  • What's retrieved — Trigger weight, NegativeAttractor, Recency.
  • What's stored — Per-call meta, Hard delete.

If you mix concerns in the same mechanism (e.g. trying to use a trigger for both promotion and "this old fact is wrong"), you end up with policies that are hard to reason about. Keep the layers separate.


Cross-references

  • Concepts & Glossary — phases, tiers, anchors, triggers explained from first principles.
  • Compliance Pack — full event-sourcing, retention, GDPR forget, signed certificates.
  • REST API — HTTP surface for the compliance / DELETE / forget endpoints.