Memory Systems
Context windows are short-term memory; durable agent state needs more
A model's context window is the agent's working memory. Anything that needs to persist beyond a single session — user preferences, prior decisions, learned facts — has to live somewhere else and be retrieved on demand.
Short-Term vs Long-Term Memory
- Short-term — what's in the current context. Lasts one session, capped by the window.
- Long-term — stored externally, retrieved into context when needed. The persistent layer.
The interesting design choices concern long-term memory: what to store, how to retrieve it, and when to evict.
What to Store
A practical taxonomy:
- Facts about the user — preferences, identity, prior choices.
- Facts about the world — entities the agent has interacted with, decisions it has made.
- Procedural memory — learned recipes for how to handle recurring situations.
- Episodic memory — full transcripts or summaries of past sessions.
Most production systems start with just user-fact memory (a key-value store of preferences and profile data) and add episodic memory only when the use case warrants it.
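As a rough illustration of the user-fact starting point described above, here is a minimal in-memory sketch. The class name UserFactMemory, its methods, and the sample keys are hypothetical and chosen only for this example; a production system would back this with a real database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserFactMemory:
    """Per-user key-value store for explicit preference and profile facts."""
    _facts: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def set_fact(self, user_id: str, key: str, value: str) -> None:
        # Writing the same key again overwrites the previous value.
        self._facts.setdefault(user_id, {})[key] = value

    def get_fact(self, user_id: str, key: str) -> Optional[str]:
        return self._facts.get(user_id, {}).get(key)

    def as_prompt_block(self, user_id: str) -> str:
        # Render the user's facts as lines that can be placed in the context.
        facts = self._facts.get(user_id, {})
        return "\n".join(f"{k}: {v}" for k, v in sorted(facts.items()))


memory = UserFactMemory()
memory.set_fact("u1", "timezone", "Europe/Berlin")
memory.set_fact("u1", "language", "German")
print(memory.as_prompt_block("u1"))
```

Because the facts are keyed, the whole profile is usually small enough to load into context at the start of every session without any retrieval step.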
Storage Choices
- Key-value store — for explicit facts. Cheap, easy.
- Vector store — for fuzzy retrieval over many memories. The default once memory grows.
- Graph — for memories that have rich relationships. More complex; only reach for it when relationships matter.
- Document store — for full transcripts you'll re-read whole.
Most memory systems end up combining a key-value store for structured facts and a vector store for fuzzy text retrieval.
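The combination might look roughly like the sketch below. It is an assumption-laden toy: embed is a bag-of-words stand-in for a real embedding model, the list of notes stands in for a real vector store, and CombinedMemory and its method names are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CombinedMemory:
    def __init__(self) -> None:
        self.facts: Dict[str, str] = {}             # structured facts, exact lookup
        self.notes: List[Tuple[str, Counter]] = []  # free text plus its vector

    def put_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_note(self, text: str) -> None:
        self.notes.append((text, embed(text)))

    def search_notes(self, query: str, k: int = 2) -> List[str]:
        # Rank every note by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.notes, key=lambda n: cosine(q, n[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


mem = CombinedMemory()
mem.put_fact("preferred_editor", "vim")
mem.add_note("User is migrating the billing service from Python 2 to Python 3")
mem.add_note("User asked about vacation policy for contractors")
print(mem.facts["preferred_editor"])
print(mem.search_notes("python migration status", k=1))
```

The split keeps exact lookups cheap and predictable while reserving similarity search for text that has no natural key.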
Writing Memory
When does the agent remember something? Two common patterns:
- Explicit save — the model decides what to save by calling a remember(fact) tool.
- Background extraction — a separate process reviews session transcripts and extracts facts.
Explicit save is more controllable; background extraction catches things the model didn't notice were important.
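The two write paths can be sketched side by side. Everything here is illustrative: the REMEMBER_TOOL schema is a generic shape rather than any particular vendor's tool format, handle_tool_call assumes tool calls arrive as parsed dictionaries, and extract_facts uses a keyword rule where a real system would most likely ask a model to review the transcript.

```python
from typing import Any, Dict, List

# Hypothetical tool description; the exact schema depends on the model API in use.
REMEMBER_TOOL: Dict[str, Any] = {
    "name": "remember",
    "description": "Save one durable fact about the user for future sessions.",
    "parameters": {
        "type": "object",
        "properties": {"fact": {"type": "string"}},
        "required": ["fact"],
    },
}

saved_facts: List[str] = []


def remember(fact: str) -> str:
    """Explicit save: runs only when the model chooses to call the tool."""
    fact = fact.strip()
    if fact and fact not in saved_facts:
        saved_facts.append(fact)
    return "saved"


def handle_tool_call(call: Dict[str, Any]) -> str:
    # `call` mimics a parsed tool call: {"name": ..., "arguments": {...}}.
    if call["name"] == "remember":
        return remember(call["arguments"]["fact"])
    raise ValueError(f"unknown tool: {call['name']}")


def extract_facts(transcript: List[str]) -> List[str]:
    """Background extraction stand-in: a keyword rule plays the role that a
    model-based reviewer would play in a real system."""
    return [line for line in transcript if line.lower().startswith("i prefer")]


# Path 1: the model called the tool during the session.
handle_tool_call({"name": "remember", "arguments": {"fact": "User is vegetarian"}})

# Path 2: a separate pass over the finished transcript.
for fact in extract_facts(["Hi there", "I prefer metric units", "Thanks"]):
    remember(fact)

print(saved_facts)
```

Routing both paths through the same remember function means deduplication and any later validation live in one place.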
Retrieving Memory
The naive pattern — embed the user message, fetch top-K memories — works surprisingly well. Improvements:
- Query rewriting — let the model rewrite the user's message into a better retrieval query.
- Hybrid retrieval — combine vector search with keyword search for better recall on rare entities.
- Recency weighting — bias toward recent memories when relevance is otherwise tied.
- Filtered retrieval — restrict by user, by topic, by time.
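Three of the improvements above (hybrid scoring, recency weighting, and filtering) can be combined in one scoring function, sketched below. The weights 0.7, 0.3, and 0.05 and the 30-day decay are arbitrary assumptions, not recommended values, and vector_score is a word-count cosine standing in for real embedding search. Query rewriting is left out because it needs a model call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    user_id: str
    text: str
    age_days: float


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def vector_score(query: str, text: str) -> float:
    # Toy cosine similarity over word counts; stands in for embedding search.
    a, b = Counter(tokens(query)), Counter(tokens(text))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear verbatim in the memory.
    q = set(tokens(query))
    return len(q & set(tokens(text))) / len(q) if q else 0.0


def retrieve(query: str, memories: List[Memory], user_id: str, k: int = 3) -> List[Memory]:
    # Filtered retrieval: never consider another user's memories.
    candidates = [m for m in memories if m.user_id == user_id]

    def score(m: Memory) -> float:
        relevance = 0.7 * vector_score(query, m.text) + 0.3 * keyword_score(query, m.text)
        recency = 0.05 * math.exp(-m.age_days / 30)  # small, so it mostly breaks ties
        return relevance + recency

    return sorted(candidates, key=score, reverse=True)[:k]


memories = [
    Memory("u1", "Prefers window seats on flights", age_days=400),
    Memory("u1", "Prefers aisle seats on flights", age_days=2),
    Memory("u1", "Allergic to peanuts", age_days=10),
    Memory("u2", "Prefers business class flights", age_days=1),
]
top = retrieve("seat preference for flights", memories, user_id="u1", k=2)
print([m.text for m in top])
```

In this example the two seat memories are equally relevant to the query, so the recency term is what puts the newer one first.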
Eviction and Conflict
Memories rot. Old preferences get superseded; old facts become wrong. A serious memory system has to:
- Detect contradictions between new and old memories.
- Update or supersede rather than just append.
- Bound storage — old, unused memories should age out.
Most early systems skip this and pay for it later when stale memories start poisoning answers.
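For keyed facts, the three requirements can be sketched as follows. This is a simplified illustration: contradiction detection here only means "same key, different value", which will not catch conflicts between free-text memories, and VersionedMemory, its fields, and the idle threshold are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    value: str
    written_at: float      # timestamps in arbitrary but consistent units
    last_used_at: float
    superseded: bool = False


class VersionedMemory:
    def __init__(self) -> None:
        self._entries: Dict[str, List[Entry]] = {}

    def write(self, key: str, value: str, now: float) -> bool:
        """Store a value; return True if it contradicted the current one."""
        history = self._entries.setdefault(key, [])
        current = next((e for e in history if not e.superseded), None)
        if current is not None and current.value == value:
            current.last_used_at = now   # same fact restated; just refresh it
            return False
        conflict = current is not None
        if current is not None:
            current.superseded = True    # supersede rather than blindly append
        history.append(Entry(value, now, now))
        return conflict

    def read(self, key: str, now: float) -> Optional[str]:
        for e in self._entries.get(key, []):
            if not e.superseded:
                e.last_used_at = now     # reads keep a memory from aging out
                return e.value
        return None

    def prune(self, now: float, max_idle: float) -> int:
        """Drop superseded entries and entries unused for longer than max_idle."""
        removed = 0
        for key in list(self._entries):
            kept = [e for e in self._entries[key]
                    if not e.superseded and now - e.last_used_at <= max_idle]
            removed += len(self._entries[key]) - len(kept)
            if kept:
                self._entries[key] = kept
            else:
                del self._entries[key]
        return removed


mem = VersionedMemory()
first = mem.write("diet", "vegetarian", now=0)
second = mem.write("diet", "vegan", now=100)    # contradicts the earlier value
mem.write("city", "Lisbon", now=0)
current_diet = mem.read("diet", now=200)
removed = mem.prune(now=1000, max_idle=900)
print(first, second, current_diet, removed)
```

Keeping superseded entries until the next prune leaves a short audit trail, which can help when debugging why an answer changed.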
Privacy
Long-term memory means storing user data over time. Plan for per-user encryption, retention limits, user-visible memory ("here's what I remember about you"), and one-click delete. Memory features that ship without these controls tend to turn into privacy and trust problems that are hard to fix after the fact.
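Three of these controls (retention limits, user-visible memory, and delete) reduce to a small interface, sketched below with hypothetical names. Encryption is deliberately left out because it depends on the storage backend and key management in use; the 90-day retention window is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredMemory:
    text: str
    created_at: float  # seconds


class UserMemoryVault:
    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._by_user: Dict[str, List[StoredMemory]] = {}

    def add(self, user_id: str, text: str, now: float) -> None:
        self._by_user.setdefault(user_id, []).append(StoredMemory(text, now))

    def enforce_retention(self, now: float) -> int:
        """Drop every memory older than the retention limit; return the count."""
        removed = 0
        for user_id in list(self._by_user):
            kept = [m for m in self._by_user[user_id]
                    if now - m.created_at <= self.retention_seconds]
            removed += len(self._by_user[user_id]) - len(kept)
            self._by_user[user_id] = kept
        return removed

    def export(self, user_id: str) -> List[str]:
        # User-visible memory: exactly what is stored, nothing hidden.
        return [m.text for m in self._by_user.get(user_id, [])]

    def delete_all(self, user_id: str) -> int:
        # One-click delete: remove everything held for this user.
        return len(self._by_user.pop(user_id, []))


DAY = 86_400
vault = UserMemoryVault(retention_seconds=90 * DAY)
vault.add("u1", "Lives in Lisbon", now=0)
vault.add("u1", "Prefers email over phone", now=80 * DAY)
expired = vault.enforce_retention(now=100 * DAY)
visible = vault.export("u1")
deleted = vault.delete_all("u1")
print(expired, visible, deleted)
```

In a real deployment the delete path would also need to reach any derived copies, such as vector indexes and backups, which this sketch does not model.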