Semvec vs. LangChain Memory¶
LangChain ships a family of Memory classes — ConversationBufferMemory,
ConversationSummaryMemory, ConversationSummaryBufferMemory,
VectorStoreRetrieverMemory, ConversationKGMemory, and others. Each has different ingest and
retrieval semantics and different per-turn input dynamics. This page compares Semvec to that
family at the architectural level. We have not run a head-to-head benchmark against any of
the LangChain Memory classes.
Architectural differences¶
| Property | Semvec | LangChain Memory (across classes) |
|---|---|---|
| Per-turn input footprint | Constant by construction (~150–350 tokens) | Depends on the chosen class: buffer ≈ O(n), summary ≈ bounded but lossy, vector retriever ≈ O(retrieved-k) |
| LLM calls during ingest | 0 (deterministic EMA over embeddings) | None for buffer / vector classes; per-turn LLM calls for summary classes (predict_new_summary) |
| Ingest determinism | Bit-exact across replays | Deterministic for buffer / vector; probabilistic for summary classes |
| Numeric / exact-value layer | Verbatim cache with Decimal precision | Not addressed at the memory-layer level — values stored as text in buffer / summary / KG entries |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Composed by the developer (chains / agents / state-passing) |
| What you ingest | A list of (text, embedding) pairs | A chat_message_history plus per-class hooks |
| What you get out | A compact system-prompt block via SemvecStateSerializer | Per-class — buffer text, summary text, retrieved documents, or KG triples |
| Licensing | Proprietary (commercial) | MIT |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS, your hosting) |
The fundamental split: Semvec is one purpose-built layer with a single set of guarantees (O(1) input, deterministic recall, exact-value preservation). LangChain Memory is a toolbox of memory styles you wire into chains; each class has its own trade-off and you pick per use case.
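To make the first row concrete, here is a toy footprint sketch in plain Python. Word counts stand in for tokens, the per-turn size and the 350-token budget are illustrative assumptions drawn from the table above, and nothing here is a measurement of either library:

```python
# Toy illustration of per-turn input footprint, not a benchmark.
# Word counts stand in for tokens; 350 is the upper end of the
# ~150–350 range quoted in the table above.
def buffer_footprint(n_turns: int, words_per_turn: int = 40) -> int:
    """Full-history buffer: prompt size grows linearly with turn count."""
    return n_turns * words_per_turn

def semvec_footprint(n_turns: int) -> int:
    """Compressed state block: size is constant by construction."""
    return 350

for n in (10, 100, 1_000):
    print(n, buffer_footprint(n), semvec_footprint(n))
# 10    400    350
# 100   4000   350
# 1000  40000  350
```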
Per-class comparison¶
| LangChain class | Per-turn input | Ingest LLM | Where Semvec differs structurally |
|---|---|---|---|
| ConversationBufferMemory | Full history concat — O(n) | None | Semvec compresses to fixed-size; same use case but cost stops growing |
| ConversationBufferWindowMemory | Last k turns — bounded but lossy at the window boundary | None | Semvec keeps the whole history's signal in the state vector, not just the last k |
| ConversationSummaryMemory | Summary text — bounded | LLM call per turn (predict_new_summary) | Semvec ingest does not call any LLM; deterministic on replay |
| ConversationSummaryBufferMemory | Summary + last k tokens — bounded | LLM call when buffer overflows | As above |
| VectorStoreRetrieverMemory | Top-k retrieved messages — O(k) | None (embedding only) | Semvec keeps the global state in the vector; retrieval is over compressed memory tiers, not the raw history |
| ConversationKGMemory | KG triples relevant to query — bounded | LLM call per turn (entity / triple extraction) | Semvec ingest is LLM-free; LiteralCache covers the structured-fact use case deterministically |
If you currently use:

- ConversationBufferMemory / ConversationBufferWindowMemory: Semvec is the closest direct upgrade — same "wrap a chat" use case, but constant input cost (see the migration sketch after this list).
- ConversationSummaryMemory / ConversationSummaryBufferMemory: Semvec covers the same "fit long conversations into a small prompt" goal without the per-turn LLM call. The compressed state carries semantic signal; for explicit textual summaries you can still run an LLM on demand.
- VectorStoreRetrieverMemory: Semvec is not a vector database — its long-term tier is bounded, not unbounded. If you're storing million-document corpora as memory, keep your vector DB and use Semvec for the conversational layer on top.
- ConversationKGMemory: Semvec's LiteralCache covers structured facts (decisions, invariants, error patterns, code structures) without LLM extraction.
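For the most common migration (buffer-style memory to Semvec), the before/after looks roughly like this. `embed()` is a placeholder for your own embedding helper, and the Semvec calls follow the shapes shown in the code-shape section below:

```python
# Migration sketch only: embed() is a stand-in for your embedding model,
# and the Semvec calls mirror the code-shape section below.
from langchain.memory import ConversationBufferMemory
from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer

def embed(text: str) -> list[float]:
    """Placeholder: call your real embedding model here."""
    return [0.0] * 768

# Before: every turn is appended to the buffer, and the whole "history"
# string is sent to the model on each call.
buffer = ConversationBufferMemory()
buffer.save_context({"input": "what's the Q3 plan?"}, {"output": "Ship v2 by September."})
prompt_context = buffer.load_memory_variables({})["history"]  # grows O(n) with turns

# After: every turn updates a fixed-size state instead of appending text.
state = SemvecState(config=SemvecConfig(dimension=768))
for text in ("what's the Q3 plan?", "Ship v2 by September."):
    state.update(embed(text), text)
prompt_context = SemvecStateSerializer().serialize(state, query_text="Q3 plan")  # ~constant size
```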
Code shape comparison¶
Semvec — single layer, drop into any LLM call:
```python
from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer

state = SemvecState(config=SemvecConfig(dimension=768))
for text, embedding in conversation:
    state.update(embedding, text)  # zero LLM calls, deterministic

context = SemvecStateSerializer().serialize(state, query_text="what did we decide?")
# Plug `context` into any chat completion as a system prompt block.
```
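For example, with the OpenAI Python client (shown only as one possible consumer; the model name is an arbitrary choice, and any chat-completion client works the same way):

```python
# One possible consumer of the serialized block; `context` comes from the
# SemvecStateSerializer call above, and the model name is arbitrary.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": context},              # Semvec compressed-context block
        {"role": "user", "content": "what did we decide?"},
    ],
)
print(response.choices[0].message.content)
```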
LangChain — pick a class, wire it into a chain:
```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
memory = ConversationSummaryMemory(llm=llm)  # LLM call per turn
chain = ConversationChain(llm=llm, memory=memory)
chain.predict(input="what's the Q3 plan?")
```
The user-facing difference: Semvec hands you a string you paste anywhere. LangChain hands you a chain that owns the LLM call.
When to pick which¶
Pick Semvec when:
- per-turn input cost must be O(1) regardless of conversation length,
- ingest must be free of LLM cost and deterministic across replays,
- exact-value preservation matters (numbers, IBANs, dates, IDs),
- you want a single memory layer you can compose with any LLM client, not a chain framework.
Pick LangChain Memory when:
- you're already standardised on LCEL chains / LangGraph and want memory inside that abstraction,
- you need the breadth of memory styles (buffer / summary / KG / vector retriever) to mix and match per chain,
- you want OSS-licensed memory primitives.
The two compose. Many users wrap a LangChain chain whose system prompt includes a Semvec compressed-context block, getting LangChain's chain ergonomics with Semvec's constant-cost memory layer underneath.
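A sketch of that composition, assuming the `state` object from the code-shape section above and current LCEL prompt/pipe APIs (one reasonable wiring, not the only one):

```python
# Composition sketch: the Semvec block is injected as a system-message
# variable in an LCEL chain. `state` is the SemvecState built earlier.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from semvec.token_reduction import SemvecStateSerializer

context = SemvecStateSerializer().serialize(state, query_text="what's the Q3 plan?")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Conversation memory:\n{semvec_context}"),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI()
chain.invoke({"semvec_context": context, "input": "what's the Q3 plan?"})
```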
Sources¶
- LangChain Memory documentation (memory migration guide): https://python.langchain.com/docs/versions/migrating_memory/
- LangChain (GitHub): https://github.com/langchain-ai/langchain