Every LLM agent framework does stop-the-world compaction when context fills — pause, summarize, resume. The agent freezes, the user waits, and the post-compaction agent wakes up with a lossy summary.
You can avoid this with double buffering. At ~70% capacity, summarize into a checkpoint and start a back buffer. Keep working. Append new messages to both. When the active context hits the wall, swap. The new context has compressed old history + full-fidelity recent messages.
Same single summarization call you'd make anyway, just earlier — when the model isn't at the attention cliff. 40-year-old technique (graphics, databases, stream processing). Nobody had applied it to LLM context. Worst case degrades to exactly today's status quo.
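A minimal sketch of the mechanics, assuming you already have a `summarize()` call (the same one your framework makes for compaction) and a token counter; the class name, the 70% threshold, and the 128k limit are illustrative, not any particular framework's API.

```python
# Minimal sketch of double-buffered compaction. `summarize` and
# `count_tokens` are assumed to exist already; everything else here is
# illustrative, not a real framework's API.
CHECKPOINT_AT = 0.70      # summarize early, before the attention cliff
MAX_TOKENS = 128_000

class DoubleBufferedContext:
    def __init__(self, summarize, count_tokens):
        self.summarize = summarize        # messages -> one checkpoint message
        self.count_tokens = count_tokens  # messages -> token count
        self.active: list[dict] = []      # what the model actually sees
        self.back: list[dict] | None = None

    def append(self, message: dict) -> None:
        self.active.append(message)
        if self.back is not None:
            self.back.append(message)     # recent messages stay full-fidelity
        used = self.count_tokens(self.active) / MAX_TOKENS
        if self.back is None and used >= CHECKPOINT_AT:
            # ~70% full: checkpoint the history and start the back buffer.
            self.back = [self.summarize(self.active)]
        elif self.back is not None and used >= 1.0:
            # Hit the wall: swap. New context = checkpoint + everything since.
            self.active, self.back = self.back, None

    def messages(self) -> list[dict]:
        return self.active
```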
The article covers Mem0, Letta, Cognee, Graphiti, Hindsight, EverMemOS, Tacnode, and Hyperspell. Built a small prototype with each and traced the open-source implementations end-to-end, from API through storage through retrieval. Tacnode is closed-source and Hyperspell is a managed platform; both were analyzed from documentation and open-source client code.
A few threads in there:
- The design spectrum from minimal structure (Mem0: two LLM calls, no schema) to rich structure (Graphiti: bi-temporal edges, two-phase entity dedup, per-edge contradiction detection)
- Hindsight running 4-way parallel retrieval with cross-encoder reranking on a single PostgreSQL database (a generic sketch of that pattern follows this list)
- Hyperspell prioritizing data access over knowledge construction — 43 OAuth integrations, zero extraction. Not a different approach to the same problem, a bet that the bottleneck is upstream.
- Structure without selection pressure is art. Many of these systems build elaborate relationship schemas with no mechanism to decide what's worth remembering.
- Tacnode thinking from the infrastructure layer up — ACID, time travel, multi-modal storage. Nobody else is really working from that depth. Both layers matter.
- The article also asks what memory actually is for agents that need to plan and adapt, not just recall. Most systems converge on extract-store-retrieve. Some are hinting at something deeper.
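For readers who haven't seen the fan-out-and-rerank pattern, here's a generic sketch of it; this is not Hindsight's code, and the four `*_search` stubs stand in for whatever real queries (pgvector similarity, full-text, recency, graph joins) run against the one database.

```python
# Generic sketch: parallel multi-strategy retrieval + cross-encoder reranking.
# The *_search stubs are placeholders for real SQL against one Postgres DB.
import asyncio
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

async def vector_search(query: str) -> list[str]:   # pgvector similarity
    return ["vector hit"]

async def keyword_search(query: str) -> list[str]:  # full-text search
    return ["keyword hit"]

async def recency_search(query: str) -> list[str]:  # newest memories first
    return ["recent hit"]

async def graph_search(query: str) -> list[str]:    # entity-linked memories
    return ["graph hit"]

async def retrieve(query: str, k: int = 10) -> list[str]:
    # All four passes run concurrently against the same database.
    batches = await asyncio.gather(
        vector_search(query), keyword_search(query),
        recency_search(query), graph_search(query),
    )
    candidates = list({hit for batch in batches for hit in batch})  # dedupe
    # The cross-encoder scores each (query, candidate) pair jointly,
    # slower than bi-encoder similarity but much more precise.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [text for _, text in ranked[:k]]

print(asyncio.run(retrieve("what did we decide last week?")))
```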
All systems at pinned versions. Point-in-time, not a ranking.