
Semvec vs. LangChain Memory

LangChain ships a family of Memory classes — ConversationBufferMemory, ConversationSummaryMemory, ConversationSummaryBufferMemory, VectorStoreRetrieverMemory, ConversationKGMemory, and others. Each has different ingest and retrieval semantics and different per-turn input dynamics. This page compares Semvec to that family at the architectural level. We have not run a head-to-head benchmark against any of the LangChain Memory classes.

Architectural differences

| Property | Semvec | LangChain Memory (across classes) |
|---|---|---|
| Per-turn input footprint | Constant by construction (~150–350 tokens) | Depends on the chosen class: buffer ≈ O(n), summary ≈ bounded but lossy, vector retriever ≈ O(retrieved-k) |
| LLM calls during ingest | 0 (deterministic EMA over embeddings) | None for buffer / vector classes; per-turn LLM calls for summary classes (predict_new_summary) |
| Ingest determinism | Bit-exact across replays | Deterministic for buffer / vector; probabilistic for summary classes |
| Numeric / exact-value layer | Verbatim cache with Decimal precision | Not addressed at the memory layer; values stored as text in buffer / summary / KG entries |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Composed by the developer (chains / agents / state-passing) |
| What you ingest | A list of (text, embedding) pairs | A chat_message_history plus per-class hooks |
| What you get out | A compact system-prompt block via SemvecStateSerializer | Per class: buffer text, summary text, retrieved documents, or KG triples |
| Licensing | Proprietary (commercial) | MIT |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS, your hosting) |

The fundamental split: Semvec is one purpose-built layer with a single set of guarantees (O(1) input, deterministic recall, exact-value preservation). LangChain Memory is a toolbox of memory styles you wire into chains; each class has its own trade-off and you pick per use case.
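
To make the "deterministic EMA over embeddings" row concrete, here is an illustrative sketch of that ingest style in plain numpy. It is not Semvec's internal code; the decay constant and the unit-norm step are assumptions made only for the illustration.

import numpy as np

def ema_update(state, embedding, alpha=0.1):
    # Blend the new turn's embedding into the running state vector.
    new_state = (1.0 - alpha) * state + alpha * embedding
    return new_state / np.linalg.norm(new_state)   # keep the state unit-length

rng = np.random.default_rng(0)
state = rng.standard_normal(768)
state /= np.linalg.norm(state)
for emb in rng.standard_normal((5, 768)):
    state = ema_update(state, emb / np.linalg.norm(emb))
# No LLM call, no sampling: replaying the same embeddings reproduces the same state.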

Per-class comparison

| LangChain class | Per-turn input | Ingest LLM | Where Semvec differs structurally |
|---|---|---|---|
| ConversationBufferMemory | Full history concat, O(n) | None | Semvec compresses to a fixed size; same use case, but cost stops growing |
| ConversationBufferWindowMemory | Last k turns; bounded but lossy at the window boundary | None | Semvec keeps the whole history's signal in the state vector, not just the last k turns |
| ConversationSummaryMemory | Summary text; bounded | LLM call per turn (predict_new_summary) | Semvec ingest does not call any LLM; deterministic on replay |
| ConversationSummaryBufferMemory | Summary + last k tokens; bounded | LLM call when the buffer overflows | As above |
| VectorStoreRetrieverMemory | Top-k retrieved messages, O(k) | None (embedding only) | Semvec keeps the global state in the vector; retrieval is over compressed memory tiers, not the raw history |
| ConversationKGMemory | KG triples relevant to the query; bounded | LLM call per turn (entity / triple extraction) | Semvec ingest is LLM-free; LiteralCache covers the structured-fact use case deterministically |

If you currently use:

  • ConversationBufferMemory / BufferWindowMemory: Semvec is the closest direct upgrade — same "wrap a chat" use case, but constant input cost.
  • ConversationSummaryMemory / SummaryBufferMemory: Semvec covers the same "fit long conversations into a small prompt" goal without the per-turn LLM call. The compressed state carries semantic signal; for explicit textual summaries you can still run an LLM on demand.
  • VectorStoreRetrieverMemory: Semvec is not a vector database — its long-term tier is bounded, not unbounded. If you're storing million-document corpora as memory, keep your vector DB and use Semvec for the conversational layer on top.
  • ConversationKGMemory: Semvec's LiteralCache covers structured facts (decisions, invariants, error patterns, code structures) without LLM extraction.

Code shape comparison

Semvec — single layer, drop into any LLM call:

from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer

state = SemvecState(config=SemvecConfig(dimension=768))
for text, embedding in conversation:
    state.update(embedding, text)             # zero LLM calls, deterministic

context = SemvecStateSerializer().serialize(state, query_text="what did we decide?")
# Plug `context` into any chat completion as a system prompt block.
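
For example, with the OpenAI Python SDK (the client setup and model name below are placeholders, not part of Semvec):

from openai import OpenAI

client = OpenAI()                                           # assumes OPENAI_API_KEY is set
response = client.chat.completions.create(
    model="gpt-4o-mini",                                    # any chat model works here
    messages=[
        {"role": "system", "content": context},             # the Semvec block from above
        {"role": "user", "content": "what did we decide?"},
    ],
)
print(response.choices[0].message.content)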

LangChain — pick a class, wire it into a chain:

from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
memory = ConversationSummaryMemory(llm=llm)   # calls the LLM every turn to update the summary
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="what's the Q3 plan?")

The user-facing difference: Semvec hands you a string you paste anywhere. LangChain hands you a chain that owns the LLM call.

When to pick which

Pick Semvec when:

  • per-turn input cost must be O(1) regardless of conversation length,
  • ingest must be free of LLM cost and deterministic across replays,
  • exact-value preservation matters (numbers, IBANs, dates, IDs),
  • you want a single memory layer you can compose with any LLM client, not a chain framework.

Pick LangChain Memory when:

  • you're already standardised on LCEL chains / LangGraph and want memory inside that abstraction,
  • you need the breadth of memory styles (buffer / summary / KG / vector retriever) to mix and match per chain,
  • you want OSS-licensed memory primitives.

The two compose. Many users wrap a LangChain chain whose system prompt includes a Semvec compressed-context block, getting LangChain's chain ergonomics with Semvec's constant-cost memory layer underneath.
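
A minimal sketch of that composition, reusing the `state` built in the "Code shape comparison" section; the prompt wiring below is ordinary LCEL, and the model choice is a placeholder:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from semvec.token_reduction import SemvecStateSerializer

# `state` is the SemvecState populated during ingest, as in the earlier example.
context = SemvecStateSerializer().serialize(state, query_text="what's the Q3 plan?")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Conversation memory:\n{memory}"),   # Semvec's compressed-context block
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI()                        # LCEL: the prompt output feeds the model
reply = chain.invoke({"memory": context, "input": "what's the Q3 plan?"})
print(reply.content)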

Sources

  • LangChain Memory documentation: https://python.langchain.com/docs/versions/migrating_memory/
  • LangChain (GitHub): https://github.com/langchain-ai/langchain