Agent memory performance matters because recall sits directly in the agent loop. If every turn waits on slow context retrieval, the agent feels slow before the model is even called.
ZeptoDB benchmarks the memory layer separately from the time-series core, then shows how both fit together: microsecond evidence retrieval from live tables, millisecond memory search, exact/semantic cache lookup, and zero-copy Python for model-side workflows.
For any external comparison, use the benchmark criteria first: scope, hardware, build flags, dataset shape, cache state, run protocol, and tail latency must be disclosed with the number.
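One way to keep those criteria attached to a published number is to carry them in the same record. The sketch below is illustrative only: `BenchmarkDisclosure` and its field names are hypothetical and not part of ZeptoDB. The empty strings mark criteria this section does not state.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class BenchmarkDisclosure:
    """Hypothetical record: one latency figure plus the context needed to compare it."""
    scope: str
    hardware: str
    build_flags: str
    dataset_shape: str
    cache_state: str
    run_protocol: str
    p50_ms: float
    p95_ms: float  # tail latency travels with the median

    def missing(self) -> List[str]:
        """Return the names of descriptive fields that were left blank."""
        blank = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                blank.append(f.name)
        return blank

    def is_comparable(self) -> bool:
        """A number is only comparable when every criterion is disclosed."""
        return not self.missing() and self.p95_ms >= self.p50_ms


# Latencies are the memory-search row from this page; blank fields are
# criteria that this section does not state.
report = BenchmarkDisclosure(
    scope="memory search top-K",
    hardware="",
    build_flags="",
    dataset_shape="client-supplied 128-dimensional float32 embeddings",
    cache_state="",
    run_protocol="",
    p50_ms=1.23,
    p95_ms=1.40,
)
print(report.missing())
```

A check like `is_comparable()` can gate publication of a comparison table, so a bare number never ships without its context.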
The Agent Memory benchmark uses client-supplied 128-dimensional float32 embeddings. It measures memory search top-K, context assembly, exact and semantic cache lookup, and snapshot save and load.
The memory layer ranks candidates by tenant/session filters, embedding similarity, importance, pinned boost, recency, and access count. Context assembly deduplicates repeated content and respects an optional token budget.
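The ranking and assembly steps described above can be sketched as follows. This is a minimal illustration, not ZeptoDB's implementation. The `Memory` type, the weights in `score`, the one-day recency decay, and the whitespace token count are all assumptions chosen to keep the example readable.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Memory:  # hypothetical record shape
    tenant: str
    session: str
    text: str
    embedding: Sequence[float]
    importance: float = 0.0
    pinned: bool = False
    age_seconds: float = 0.0
    access_count: int = 0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(m: Memory, query: Sequence[float]) -> float:
    # Illustrative weights only; the real weighting is not given in the text.
    recency = math.exp(-m.age_seconds / 86_400.0)
    usage = math.log1p(m.access_count)
    return (cosine(m.embedding, query) + 0.3 * m.importance
            + (0.5 if m.pinned else 0.0) + 0.2 * recency + 0.05 * usage)


def search(memories: List[Memory], query: Sequence[float],
           tenant: str, session: str, k: int) -> List[Memory]:
    # Tenant/session filters apply before any scoring.
    scoped = [m for m in memories if m.tenant == tenant and m.session == session]
    return sorted(scoped, key=lambda m: score(m, query), reverse=True)[:k]


def assemble_context(ranked: List[Memory],
                     token_budget: Optional[int] = None) -> List[str]:
    seen, out, used = set(), [], 0
    for m in ranked:
        key = " ".join(m.text.lower().split())  # normalize to catch repeats
        if key in seen:
            continue
        cost = len(m.text.split())  # crude token estimate (assumption)
        if token_budget is not None and used + cost > token_budget:
            continue  # skip entries that do not fit; smaller ones still may
        seen.add(key)
        out.append(m.text)
        used += cost
    return out


memories = [
    Memory("t1", "s1", "restart the ingest worker", [1.0, 0.0], pinned=True),
    Memory("t1", "s1", "Restart the  ingest worker", [1.0, 0.0]),
    Memory("t1", "s1", "disk usage alert at 91 percent", [0.0, 1.0]),
    Memory("t2", "s1", "other tenant note", [1.0, 0.0], importance=1.0),
]
ranked = search(memories, [1.0, 0.0], tenant="t1", session="s1", k=3)
print(assemble_context(ranked))
print(assemble_context(ranked, token_budget=4))
```

The other tenant's memory never reaches scoring, the repeated runbook line is emitted once, and the budgeted call keeps only what fits.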
| Operation | p50 | p95 |
|---|---|---|
| Memory search top-K | 1.23ms | 1.40ms |
| Context assembly | 1.34ms | 1.41ms |
| Exact cache lookup | <0.01ms | <0.01ms |
| Semantic cache lookup | 0.07ms | 0.07ms |
| Snapshot save | 5.79ms | - |
| Snapshot load | 11.60ms | - |
For many operational agents, 10K scoped memories is already a meaningful working set: current user/session memory, incident summaries, pinned runbooks, prior diagnoses, and cache entries.
Sparse-projection ANN is a derived in-memory candidate index. It can reduce filtered-search latency at larger memory counts, but it is recall-sensitive and can fall back to filtered scan when it cannot produce enough filtered candidates.
| Records | Search p50 | Search p95 | Context p50 | Context p95 | ANN rebuild |
|---|---|---|---|---|---|
| 10K | 0.19ms | 0.41ms | 0.38ms | 0.52ms | 12.36ms |
| 100K | 2.41ms | 4.68ms | 2.77ms | 2.98ms | 138.37ms |
| 1M | 32.03ms | 36.27ms | 25.48ms | 29.96ms | 1691.56ms |
This is useful as a current baseline, not the final word on million-memory search. Stronger ANN index families remain a follow-up area.
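The fallback behavior of a derived candidate index can be illustrated with a small sketch. It assumes a sign-hash over sparse random projections, which is one plausible reading of "sparse-projection ANN" and not necessarily ZeptoDB's design. `SparseProjectionIndex` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Record = Tuple[str, Sequence[float]]  # (tenant, embedding)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class SparseProjectionIndex:
    """Hypothetical index: bucket records by the signs of sparse projections."""

    def __init__(self, dim: int, bits: int = 8, density: float = 0.25, seed: int = 0):
        rng = random.Random(seed)
        # Each projection row is mostly zeros with a few +1/-1 entries.
        self.rows = [
            [rng.choice((-1.0, 1.0)) if rng.random() < density else 0.0
             for _ in range(dim)]
            for _ in range(bits)
        ]
        self.records: List[Record] = []
        self.buckets = defaultdict(list)

    def _signature(self, v: Sequence[float]) -> tuple:
        return tuple(dot(row, v) >= 0.0 for row in self.rows)

    def rebuild(self, records: List[Record]) -> None:
        # The index is derived data: it can always be rebuilt from the records.
        self.records = list(records)
        self.buckets.clear()
        for i, (_, emb) in enumerate(self.records):
            self.buckets[self._signature(emb)].append(i)

    def search(self, query: Sequence[float], tenant: str, k: int):
        bucket = self.buckets.get(self._signature(query), [])
        cand = [i for i in bucket if self.records[i][0] == tenant]
        used_fallback = len(cand) < k
        if used_fallback:
            # Not enough filtered candidates: scan every record for the tenant.
            cand = [i for i, (t, _) in enumerate(self.records) if t == tenant]
        cand.sort(key=lambda i: dot(self.records[i][1], query), reverse=True)
        return cand[:k], used_fallback


e = [1.0, 2.0, 0.5, 0.0]
index = SparseProjectionIndex(dim=4)
index.rebuild([
    ("a", e),
    ("a", [2 * x for x in e]),
    ("a", [3 * x for x in e]),
    ("b", [-x for x in e]),
])
print(index.search(e, tenant="a", k=2))  # served from the query's bucket
print(index.search(e, tenant="b", k=2))  # too few candidates, falls back
```

The second call shows why the approach is recall-sensitive: a tenant with few matching candidates in the query's bucket pays for a full filtered scan.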
5.52M events/sec
The ingestion path captures live observations, tool calls, cache events, and model-call telemetry without turning the agent stack into a separate logging system.
272µs query on 1M rows
Evidence retrieval stays fast enough to happen before the agent acts, not only after an incident review.
522ns Python zero-copy
Query results can move into Python, NumPy, Pandas, and PyTorch without serialization overhead.
0.07ms semantic cache lookup
Repeated operational prompts can reuse prior responses when application policy allows it.
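A two-tier lookup of this kind can be sketched as below. This is an assumption-laden illustration, not ZeptoDB's API. `ResponseCache`, the 0.95 similarity threshold, and the `allow_reuse` policy flag are hypothetical names and values.

```python
import hashlib
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ResponseCache:
    """Hypothetical cache: exact match on normalized text, then embedding similarity."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.exact = {}
        self.semantic: List[Tuple[List[float], str]] = []

    @staticmethod
    def _key(prompt: str) -> str:
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def put(self, prompt: str, embedding: Sequence[float], response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.semantic.append((list(embedding), response))

    def get(self, prompt: str, embedding: Sequence[float],
            allow_reuse: bool = True) -> Optional[Tuple[str, str]]:
        if not allow_reuse:  # application policy decides whether reuse is allowed
            return None
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return ("exact", hit)
        best, best_sim = None, self.threshold
        for emb, response in self.semantic:  # linear scan keeps the sketch short
            sim = cosine(emb, embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return ("semantic", best) if best is not None else None


cache = ResponseCache()
cache.put("Why is ingest lagging?", [1.0, 0.0], "Check worker backlog.")
print(cache.get("why is  ingest lagging?", [0.0, 1.0]))
print(cache.get("ingest seems slow, why?", [0.99, 0.05]))
print(cache.get("what is the weather", [0.0, 1.0]))
```

The exact tier is a single hash lookup, which is consistent with it being much cheaper than the semantic tier in the table above.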
Raw p50 latency is only one part of the picture. For an agent workload, measure the full turn end to end rather than each operation in isolation.
That is the workload ZeptoDB is designed around: one timeline for facts, context, cache, and decisions.
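A full-turn measurement can be sketched with a small timing harness. The stage names in the usage example are placeholders based on the operations this page mentions, not a documented ZeptoDB workflow. `measure_turns` and the nearest-rank `percentile` helper are hypothetical.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample covering p percent of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def measure_turns(stages: Dict[str, Callable[[], object]],
                  turns: int) -> Dict[str, Dict[str, float]]:
    """Time every stage and the whole turn, then report p50 and p95 in milliseconds."""
    timings: Dict[str, List[float]] = {name: [] for name in stages}
    timings["full_turn"] = []
    for _ in range(turns):
        turn_start = time.perf_counter()
        for name, stage in stages.items():
            start = time.perf_counter()
            stage()
            timings[name].append((time.perf_counter() - start) * 1000.0)
        timings["full_turn"].append((time.perf_counter() - turn_start) * 1000.0)
    return {
        name: {"p50_ms": percentile(values, 50), "p95_ms": percentile(values, 95)}
        for name, values in timings.items()
    }


# Placeholder stages; replace each lambda with the real call being measured.
stats = measure_turns(
    {
        "evidence_query": lambda: sum(range(1000)),
        "memory_search": lambda: sorted(range(1000), reverse=True),
        "context_assembly": lambda: " ".join(str(i) for i in range(100)),
        "cache_lookup": lambda: {"k": 1}.get("k"),
    },
    turns=50,
)
for name, row in stats.items():
    print(f"{name}: p50={row['p50_ms']:.3f}ms p95={row['p95_ms']:.3f}ms")
```

Reporting the `full_turn` row next to the per-stage rows shows whether a fast individual operation actually translates into a fast turn.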