Memory Systems

A model's context window is the agent's working memory. Anything that needs to persist beyond a single session — user preferences, prior decisions, learned facts — has to live somewhere else and be retrieved on demand.

Short-Term vs Long-Term Memory

Short-term — what's in the current context. Lasts one session, capped by the window.
Long-term — stored externally, retrieved into context when needed. The persistent layer.

The interesting design choices concern long-term memory: what to store, how to retrieve it, and when to evict it.

What to Store

A practical taxonomy:

Facts about the user — preferences, identity, prior choices.
Facts about the world — entities the agent has interacted with, decisions it has made.
Procedural memory — learned recipes for how to handle recurring situations.
Episodic memory — full transcripts or summaries of past sessions.

Most production systems start with just user-fact memory (a key-value store of preferences and profile data) and add episodic memory only when the use case warrants it.
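As a rough illustration of that starting point, the sketch below tags each record with one of the four kinds above and implements only the user-fact store. The names MemoryKind, MemoryRecord, and UserFactStore are hypothetical, and an in-process dictionary stands in for a real key-value database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MemoryKind(Enum):
    USER_FACT = "user_fact"    # preferences, identity, prior choices
    WORLD_FACT = "world_fact"  # entities and decisions
    PROCEDURAL = "procedural"  # recipes for recurring situations
    EPISODIC = "episodic"      # transcripts or summaries of past sessions


@dataclass
class MemoryRecord:
    user_id: str
    kind: MemoryKind
    key: str
    value: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class UserFactStore:
    """Hypothetical in-memory key-value store holding only user-fact memories."""

    def __init__(self) -> None:
        # Keyed by (user_id, key) so one user's facts never shadow another's.
        self._facts: dict[tuple[str, str], MemoryRecord] = {}

    def put(self, user_id: str, key: str, value: str) -> None:
        self._facts[(user_id, key)] = MemoryRecord(user_id, MemoryKind.USER_FACT, key, value)

    def get(self, user_id: str, key: str) -> Optional[str]:
        record = self._facts.get((user_id, key))
        return record.value if record else None

    def all_for(self, user_id: str) -> dict[str, str]:
        return {k: r.value for (uid, k), r in self._facts.items() if uid == user_id}


store = UserFactStore()
store.put("u1", "language", "French")
store.put("u1", "units", "metric")
print(store.all_for("u1"))
```

Keeping the kind on every record means episodic or procedural memories can be added later without changing the record shape.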

Storage Choices

Key-value store — for explicit facts. Cheap, easy.
Vector store — for fuzzy retrieval over many memories. The default once memory grows.
Graph — for memories that have rich relationships. More complex; only reach for it when relationships matter.
Document store — for full transcripts you'll re-read whole.

Most memory systems end up combining a key-value store for structured facts and a vector store for fuzzy text retrieval.
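The combination might look something like the sketch below. It is an assumption-heavy toy: toy_embed is a bag-of-words stand-in for a real embedding model, the "vector store" is a plain list scanned linearly, and CombinedMemory is a hypothetical name.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CombinedMemory:
    def __init__(self) -> None:
        self.facts: dict[str, str] = {}              # structured facts, exact lookup
        self.notes: list[tuple[str, Counter]] = []   # free text plus its vector

    def set_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_note(self, text: str) -> None:
        self.notes.append((text, toy_embed(text)))

    def search_notes(self, query: str, k: int = 3) -> list[str]:
        """Return the k notes most similar to the query, best first."""
        q = toy_embed(query)
        ranked = sorted(self.notes, key=lambda note: cosine(q, note[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


mem = CombinedMemory()
mem.set_fact("timezone", "Europe/Paris")
mem.add_note("user is planning a trip to Japan in spring")
mem.add_note("user prefers vegetarian recipes")
print(mem.facts["timezone"])
print(mem.search_notes("trip to Japan", k=1))
```

Exact facts go through the dictionary, where a lookup either hits or misses. Anything phrased in free text goes through similarity search, where a near match is still useful.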

Writing Memory

When does the agent remember something? Two common patterns:

Explicit save — the model decides what to save by calling a remember(fact) tool.
Background extraction — a separate process reviews session transcripts and extracts facts.

Explicit save is more controllable; background extraction catches things the model didn't notice were important.
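A minimal sketch of the explicit-save pattern follows. The tool description uses a generic JSON-Schema style and the tool-call wire format is hypothetical; real model APIs differ in both, so treat the shapes here as assumptions. Background extraction is not shown.

```python
import json

# Tool description in a generic JSON-Schema style. The exact format a given
# model API expects is an assumption here and varies by provider.
REMEMBER_TOOL = {
    "name": "remember",
    "description": "Save a durable fact about the user for future sessions.",
    "parameters": {
        "type": "object",
        "properties": {"fact": {"type": "string"}},
        "required": ["fact"],
    },
}

saved_facts: list[str] = []


def remember(fact: str) -> str:
    """Store a fact, skipping empty strings and exact duplicates."""
    fact = fact.strip()
    if not fact:
        return "ignored: empty fact"
    if fact in saved_facts:
        return "ignored: already stored"
    saved_facts.append(fact)
    return "stored"


def handle_tool_call(raw_call: str) -> str:
    """Dispatch a tool call serialized as JSON (hypothetical wire format)."""
    call = json.loads(raw_call)
    if call.get("name") != "remember":
        return "error: unknown tool"
    return remember(call["arguments"]["fact"])


print(handle_tool_call('{"name": "remember", "arguments": {"fact": "Prefers metric units"}}'))
```

Returning a short status string gives the model feedback it can act on, for example by not retrying a fact that is already stored.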

Retrieving Memory

The naive pattern — embed the user message, fetch top-K memories — works surprisingly well. Improvements:

Query rewriting — let the model rewrite the user's message into a better retrieval query.
Hybrid retrieval — combine vector search with keyword search for better recall on rare entities.
Recency weighting — bias toward recent memories when relevance is otherwise tied.
Filtered retrieval — restrict by user, by topic, by time.
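The sketch below combines the naive top-K pattern with two of the improvements listed above: filtering by user and recency weighting. It assumes vectors come from some embedding model that is not shown, and the Memory fields, the recency_weight value, and the half-life decay are illustrative choices rather than recommended settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    user_id: str
    text: str
    vector: list[float]  # produced by some embedding model (not shown)
    age_days: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    memories: list[Memory],
    user_id: str,
    k: int = 3,
    recency_weight: float = 0.05,
    half_life_days: float = 30.0,
) -> list[Memory]:
    scored = []
    for m in memories:
        if m.user_id != user_id:  # filtered retrieval: only this user's memories
            continue
        relevance = cosine(query_vec, m.vector)
        # Recency is 1.0 for a brand-new memory and halves every half_life_days.
        recency = 0.5 ** (m.age_days / half_life_days)
        scored.append((relevance + recency_weight * recency, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


memories = [
    Memory("u1", "old coffee preference", [1.0, 0.0], age_days=300.0),
    Memory("u1", "new coffee preference", [1.0, 0.0], age_days=1.0),
    Memory("u2", "another user's note", [1.0, 0.0], age_days=0.0),
    Memory("u1", "unrelated note", [0.0, 1.0], age_days=0.0),
]
for m in retrieve([1.0, 0.0], memories, user_id="u1", k=2):
    print(m.text)
```

Because recency_weight is small, recency mostly acts as a tie-breaker: two equally relevant memories are ordered newest first, but a fresh irrelevant memory does not outrank a relevant old one.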

Eviction and Conflict

Memories rot. Old preferences get superseded; old facts become wrong. A serious memory system has to:

Detect contradictions between new and old memories.
Update or supersede rather than just append.
Bound storage — old, unused memories should age out.

Most early systems skip this and pay for it later when stale memories start poisoning answers.
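The three requirements can be sketched for the simplest case, keyed facts, as below. This is a deliberately narrow illustration: a contradiction here only means "same key, different value", whereas detecting contradictions between free-text memories is a much harder problem that this sketch does not attempt. Fact, FactLog, and the idle-time eviction rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    key: str
    value: str
    written_at: float    # seconds since epoch
    last_used_at: float
    superseded_by: Optional[str] = None  # the newer value, kept for audit


class FactLog:
    def __init__(self) -> None:
        self.current: dict[str, Fact] = {}
        self.history: list[Fact] = []

    def write(self, key: str, value: str, now: float) -> bool:
        """Write a fact; return True if it contradicted an existing value."""
        old = self.current.get(key)
        conflict = old is not None and old.value != value
        if conflict:
            # Supersede rather than append: retire the old value to history.
            old.superseded_by = value
            self.history.append(old)
        if old is None or conflict:
            self.current[key] = Fact(key, value, now, now)
        return conflict

    def read(self, key: str, now: float) -> Optional[str]:
        fact = self.current.get(key)
        if fact is None:
            return None
        fact.last_used_at = now  # reads keep a memory alive
        return fact.value

    def evict_unused(self, now: float, max_idle_seconds: float) -> list[str]:
        """Remove facts not read or written within max_idle_seconds."""
        stale = [k for k, f in self.current.items() if now - f.last_used_at > max_idle_seconds]
        for k in stale:
            del self.current[k]
        return stale


log = FactLog()
log.write("diet", "vegetarian", now=0.0)
print(log.write("diet", "vegan", now=10.0))  # True: the old value is superseded
print(log.read("diet", now=20.0))
```

Keeping superseded values in a history list rather than deleting them makes it possible to audit why an answer changed.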

Privacy

Long-term memory means storing user data over time. Plan for per-user encryption, retention limits, user-visible memory ("here's what I remember about you"), and one-click delete. Memory features that ship without these controls tend to erode user trust and create compliance problems later.
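Three of those controls (retention limits, user-visible memory, and delete) can be sketched as plain operations on the store, as below. Encryption is deliberately left out because it depends on the storage backend and key management in use. PrivateMemoryStore and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredMemory:
    user_id: str
    text: str
    created_at: float  # seconds since epoch


class PrivateMemoryStore:
    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._items: list[StoredMemory] = []

    def add(self, user_id: str, text: str, now: float) -> None:
        self._items.append(StoredMemory(user_id, text, now))

    def enforce_retention(self, now: float) -> int:
        """Drop memories older than the retention limit; return how many were dropped."""
        before = len(self._items)
        self._items = [m for m in self._items if now - m.created_at <= self.retention_seconds]
        return before - len(self._items)

    def what_i_remember(self, user_id: str) -> list[str]:
        """User-visible view: everything stored about this user."""
        return [m.text for m in self._items if m.user_id == user_id]

    def delete_user(self, user_id: str) -> int:
        """One-call delete: remove every memory for this user; return the count."""
        before = len(self._items)
        self._items = [m for m in self._items if m.user_id != user_id]
        return before - len(self._items)


store = PrivateMemoryStore(retention_seconds=100.0)
store.add("u1", "likes tea", now=0.0)
store.add("u1", "lives in Oslo", now=90.0)
store.enforce_retention(now=150.0)  # drops "likes tea", which is now too old
print(store.what_i_remember("u1"))
print(store.delete_user("u1"))
```

In a real deployment the delete would also have to reach backups, caches, and any derived indexes such as a vector store, which this in-memory sketch does not model.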