
Semvec vs. LangChain Memory

LangChain ships a family of Memory classes — ConversationBufferMemory, ConversationSummaryMemory, ConversationSummaryBufferMemory, VectorStoreRetrieverMemory, ConversationKGMemory, and others. Each has different ingest and retrieval semantics and different per-turn input dynamics. This page compares Semvec to that family at the architectural level. We have not run a head-to-head benchmark against any of the LangChain Memory classes.

Architectural differences

| Property | Semvec | LangChain Memory (across classes) |
|---|---|---|
| Per-turn input footprint | Constant by construction (~150–350 tokens) | Depends on the chosen class: buffer ≈ O(n), summary ≈ bounded but lossy, vector retriever ≈ O(retrieved-k) |
| LLM calls during ingest | 0 (deterministic EMA over embeddings) | None for buffer / vector classes; per-turn LLM calls for summary classes (predict_new_summary) |
| Ingest determinism | Bit-exact across replays | Deterministic for buffer / vector; probabilistic for summary classes |
| Numeric / exact-value layer | Verbatim cache with Decimal precision | Not addressed at the memory layer; values stored as text in buffer / summary / KG entries |
| Multi-agent coordination | Built-in (Cortex: aggregations + 5-level consensus) | Composed by the developer (chains / agents / state-passing) |
| What you ingest | A list of (text, embedding) pairs | A chat_message_history plus per-class hooks |
| What you get out | A compact system-prompt block via SemvecStateSerializer | Per class: buffer text, summary text, retrieved documents, or KG triples |
| Licensing | Proprietary (commercial) | MIT |
| Self-hosted | Yes (proprietary license, on-prem) | Yes (OSS, your hosting) |

The fundamental split: Semvec is one purpose-built layer with a single set of guarantees (O(1) input, deterministic recall, exact-value preservation). LangChain Memory is a toolbox of memory styles you wire into chains; each class has its own trade-off and you pick per use case.
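
To make the "deterministic EMA over embeddings" row concrete, here is an illustrative sketch of that ingest style in plain numpy. It is not Semvec's internal code; the decay constant and the unit-norm step are assumptions made only for the illustration.

import numpy as np

def ema_update(state, embedding, alpha=0.1):
    # Blend the new turn's embedding into the running state vector.
    new_state = (1.0 - alpha) * state + alpha * embedding
    return new_state / np.linalg.norm(new_state)   # keep the state unit-length

rng = np.random.default_rng(0)
state = rng.standard_normal(768)
state /= np.linalg.norm(state)
for emb in rng.standard_normal((5, 768)):
    state = ema_update(state, emb / np.linalg.norm(emb))
# No LLM call, no sampling: replaying the same embeddings reproduces the same state.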

Per-class comparison

| LangChain class | Per-turn input | Ingest LLM | Where Semvec differs structurally |
|---|---|---|---|
| ConversationBufferMemory | Full history concat, O(n) | None | Semvec compresses to a fixed size; same use case, but cost stops growing |
| ConversationBufferWindowMemory | Last k turns; bounded but lossy at the window boundary | None | Semvec keeps the whole history's signal in the state vector, not just the last k turns |
| ConversationSummaryMemory | Summary text; bounded | LLM call per turn (predict_new_summary) | Semvec ingest does not call any LLM; deterministic on replay |
| ConversationSummaryBufferMemory | Summary + last k tokens; bounded | LLM call when the buffer overflows | As above |
| VectorStoreRetrieverMemory | Top-k retrieved messages, O(k) | None (embedding only) | Semvec keeps the global state in the vector; retrieval is over compressed memory tiers, not the raw history |
| ConversationKGMemory | KG triples relevant to the query; bounded | LLM call per turn (entity / triple extraction) | Semvec ingest is LLM-free; LiteralCache covers the structured-fact use case deterministically |

If you currently use:

  • ConversationBufferMemory / BufferWindowMemory: Semvec is the closest direct upgrade — same "wrap a chat" use case, but constant input cost.
  • ConversationSummaryMemory / SummaryBufferMemory: Semvec covers the same "fit long conversations into a small prompt" goal without the per-turn LLM call. The compressed state carries semantic signal; for explicit textual summaries you can still run an LLM on demand.
  • VectorStoreRetrieverMemory: Semvec is not a vector database — its long-term tier is bounded, not unbounded. If you're storing million-document corpora as memory, keep your vector DB and use Semvec for the conversational layer on top.
  • ConversationKGMemory: Semvec's LiteralCache covers structured facts (decisions, invariants, error patterns, code structures) without LLM extraction.

Code shape comparison

Semvec — single layer, drop into any LLM call:

from semvec import SemvecState, SemvecConfig
from semvec.token_reduction import SemvecStateSerializer

state = SemvecState(config=SemvecConfig(dimension=768))
for text, embedding in conversation:
    state.update(embedding, text)             # zero LLM calls, deterministic

context = SemvecStateSerializer().serialize(state, query_text="what did we decide?")
# Plug `context` into any chat completion as a system prompt block.
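
For example, with the OpenAI Python SDK (the client setup and model name below are placeholders, not part of Semvec):

from openai import OpenAI

client = OpenAI()                                           # assumes OPENAI_API_KEY is set
response = client.chat.completions.create(
    model="gpt-4o-mini",                                    # any chat model works here
    messages=[
        {"role": "system", "content": context},             # the Semvec block from above
        {"role": "user", "content": "what did we decide?"},
    ],
)
print(response.choices[0].message.content)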

LangChain — pick a class, wire it into a chain:

from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
memory = ConversationSummaryMemory(llm=llm)   # calls the LLM every turn to update the summary
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="what's the Q3 plan?")

The user-facing difference: Semvec hands you a string you paste anywhere. LangChain hands you a chain that owns the LLM call.

When to pick which

Pick Semvec when:

  • per-turn input cost must be O(1) regardless of conversation length,
  • ingest must be free of LLM cost and deterministic across replays,
  • exact-value preservation matters (numbers, IBANs, dates, IDs),
  • you want a single memory layer you can compose with any LLM client, not a chain framework.

Pick LangChain Memory when:

  • you're already standardised on LCEL chains / LangGraph and want memory inside that abstraction,
  • you need the breadth of memory styles (buffer / summary / KG / vector retriever) to mix and match per chain,
  • you want OSS-licensed memory primitives.

The two compose. Many users wrap a LangChain chain whose system prompt includes a Semvec compressed-context block, getting LangChain's chain ergonomics with Semvec's constant-cost memory layer underneath.
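
A minimal sketch of that composition, reusing the `state` built in the "Code shape comparison" section; the prompt wiring below is ordinary LCEL, and the model choice is a placeholder:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from semvec.token_reduction import SemvecStateSerializer

# `state` is the SemvecState populated during ingest, as in the earlier example.
context = SemvecStateSerializer().serialize(state, query_text="what's the Q3 plan?")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Conversation memory:\n{memory}"),   # Semvec's compressed-context block
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI()                        # LCEL: the prompt output feeds the model
reply = chain.invoke({"memory": context, "input": "what's the Q3 plan?"})
print(reply.content)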

Sources

  • LangChain Memory documentation: https://python.langchain.com/docs/versions/migrating_memory/
  • LangChain (GitHub): https://github.com/langchain-ai/langchain