ZeptoDB has two cooperating layers: a microsecond time-series engine for live operational facts, and an Agent Memory layer for the context agents need to decide, reuse, and explain.
Agent-scoped memory
Store durable memories with tenant, namespace, user, session, agent, type, content, metadata JSON, token count, importance, TTL, pinned status, access count, and timestamps.
Client-supplied embeddings
Applications provide float32 embeddings. ZeptoDB stores, validates, filters, and ranks them without calling embedding providers or LLMs from the server.
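As an illustration of the kind of client-side check that pairs with server-side validation, the sketch below converts a vector to float32 and rejects wrong dimensions and non-finite values. The function name `validate_embedding` and the specific checks are assumptions made for this example, not ZeptoDB's actual validation rules.

```python
import math
from array import array


def validate_embedding(values, expected_dim):
    """Return a float32 array, or raise ValueError if the vector is unusable.

    Hypothetical helper: the checks ZeptoDB really applies may differ.
    """
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    vec = array("f", values)  # typecode "f" stores C floats (float32)
    # Check after conversion so values that overflow float32 to inf are caught.
    if not all(math.isfinite(x) for x in vec):
        raise ValueError("embedding contains NaN or infinite values")
    return vec
```

Validating before upload keeps malformed vectors out of the ranking path, where a single NaN would make similarity scores meaningless.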
Context assembly
Retrieve top-K memories under a token budget, deduplicate repeated content, and rank by semantic similarity, importance, pinned boost, recency, and access count.
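The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not ZeptoDB's implementation: the `Memory` class, the `assemble_context` function, and every weight in `score` are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    embedding: list
    token_count: int
    importance: float = 0.0   # assumed to lie in 0..1
    pinned: bool = False
    age_seconds: float = 0.0
    access_count: int = 0


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(m, query):
    # Hypothetical weights; the real ranking formula is not documented here.
    recency = math.exp(-m.age_seconds / 86_400.0)  # decays over roughly a day
    popularity = math.log1p(m.access_count)
    return (cosine(m.embedding, query) + 0.3 * m.importance
            + (0.5 if m.pinned else 0.0) + 0.2 * recency + 0.05 * popularity)


def assemble_context(memories, query, top_k, token_budget):
    ranked = sorted(memories, key=lambda m: score(m, query), reverse=True)
    picked, seen, used = [], set(), 0
    for m in ranked:
        if len(picked) >= top_k:
            break
        key = " ".join(m.content.lower().split())  # normalized text for dedup
        if key in seen or used + m.token_count > token_budget:
            continue  # skip duplicates and memories that would exceed the budget
        seen.add(key)
        picked.append(m)
        used += m.token_count
    return picked
```

Skipping an oversized memory rather than stopping lets smaller, lower-ranked memories still fill the remaining budget.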
Exact + semantic cache
Check normalized prompt matches and embedding-similar responses before calling an external model provider. Useful for repeated operational questions.
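A two-stage lookup of this kind might look like the following sketch. The `PromptCache` class, its normalization rule (lowercase, collapsed whitespace), and the 0.95 similarity threshold are assumptions for illustration; they are not the `connection.cache` API.

```python
import math


def normalize(prompt):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(prompt.lower().split())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class PromptCache:
    """Hypothetical in-process cache: exact match first, then semantic."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.exact = {}     # normalized prompt -> response
        self.entries = []   # (embedding, response) pairs

    def put(self, prompt, embedding, response):
        self.exact[normalize(prompt)] = response
        self.entries.append((embedding, response))

    def lookup(self, prompt, embedding):
        hit = self.exact.get(normalize(prompt))
        if hit is not None:
            return "exact", hit
        best_sim, best_resp = 0.0, None
        for emb, resp in self.entries:
            sim = cosine(emb, embedding)
            if sim > best_sim:
                best_sim, best_resp = sim, resp
        if best_resp is not None and best_sim >= self.threshold:
            return "semantic", best_resp
        return "miss", None
```

On a `"miss"` the caller would invoke the model provider and then `put` the new response, so the next similar question is served from the cache.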
Sidecar snapshots
Persist memory records and vectors to records.bin and vectors.bin, with configurable mutation-count flushing and stop-time force flush.
AgentOps telemetry
Track agent runs, retrieval events, cache events, LLM calls, and tool calls in ordinary ZeptoDB time-series tables beside the memory subsystem.
5.52M events/sec
Lock-free MPMC ring buffer with Highway SIMD batch copy. Zero allocation on the hot path for telemetry, ticks, traces, tool calls, and sensor streams.
272μs query on 1M rows
LLVM JIT compiled execution and SIMD aggregation for live facts the agent must retrieve before acting.
ASOF JOIN
Point-in-time correct joins across heterogeneous streams. Foundational for sensor fusion, market data, incidents, and agent action replay.
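To make the point-in-time semantics concrete, here is a small pure-Python sketch of an as-of join. It assumes both inputs are sorted by timestamp and that a match means "the latest right-hand row at or before the left-hand timestamp"; ZeptoDB's SQL syntax and exact matching rules are not shown here and may differ.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Attach to each left row the latest right value with ts <= left ts.

    left, right: lists of (timestamp, value) tuples sorted by timestamp.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        i = bisect_right(right_ts, ts)           # count of right rows with ts <= left ts
        rval = right[i - 1][1] if i > 0 else None  # None when nothing precedes
        out.append((ts, lval, rval))
    return out


trades = [(5, "t0"), (10, "t1"), (20, "t2")]
quotes = [(8, 100.0), (10, 101.0), (15, 102.0)]
joined = asof_join(trades, quotes)
# joined == [(5, "t0", None), (10, "t1", 101.0), (20, "t2", 102.0)]
```

Because the join never looks forward in time, replaying an agent action sees only the facts that were available when the action was taken.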
Window JOIN
Join within a time window for sensor clock drift, event correlation, tool-call timelines, and trade/quote alignment.
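The window variant can be sketched the same way: instead of one preceding row, each left-hand row collects every right-hand row inside a tolerance around its timestamp. The function name and the inclusive `before`/`after` bounds are assumptions for this example, not ZeptoDB's documented semantics.

```python
from bisect import bisect_left, bisect_right


def window_join(left, right, before, after):
    """For each left row, gather right values with ts in [ts - before, ts + after].

    left, right: lists of (timestamp, value) tuples sorted by timestamp.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        lo = bisect_left(right_ts, ts - before)   # first row inside the window
        hi = bisect_right(right_ts, ts + after)   # one past the last row inside
        out.append((ts, lval, [v for _, v in right[lo:hi]]))
    return out


tool_calls = [(100, "call-a"), (200, "call-b")]
readings = [(95, 1), (103, 2), (150, 3), (198, 4), (205, 5)]
matched = window_join(tool_calls, readings, before=5, after=5)
# matched == [(100, "call-a", [1, 2]), (200, "call-b", [4, 5])]
```

A symmetric window like this tolerates clocks that drift in either direction; an asymmetric one fits cases where only prior events are valid evidence.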
Temporal analytics
EMA, VWAP, xbar, mavg, percentile, GROUP BY, window functions, CTEs, and subqueries in standard SQL.
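For readers unfamiliar with two of these functions, the sketch below shows the standard definitions of an exponential moving average and a volume-weighted average price in plain Python. The `alpha` parameterization is an assumption; ZeptoDB's SQL functions may take a span or period instead.

```python
def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def vwap(prices, volumes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total = sum(volumes)
    if total == 0:
        raise ValueError("total volume is zero")
    return sum(p * v for p, v in zip(prices, volumes)) / total


smoothed = ema([10.0, 20.0, 30.0], alpha=0.5)   # [10.0, 15.0, 22.5]
avg_price = vwap([100.0, 102.0], [1, 3])        # 101.5
```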
Standard SQL
Analysts, quants, ML engineers, and platform teams can query the live timeline without learning a proprietary DSL.
| Need | Time-series core | Agent Memory layer |
|---|---|---|
| Live observations | Ingest sensors, ticks, traces, logs, tool calls | Store what the agent learned from them |
| Temporal recall | Query by timestamp, entity, window, ASOF relation | Retrieve prior episodes, summaries, and decisions |
| Grounding | Keep raw evidence and exact event order | Assemble relevant context under a token budget |
| Provider cost control | Store cache telemetry as events | Exact and semantic prompt cache lookup |
| Debugging | Replay the event timeline | Replay retrieved context, model calls, and agent decisions |
| Feature | Description |
|---|---|
| In-memory column store | Columnar layout optimized for SIMD scan and aggregation |
| Arena allocator | Custom memory management with no malloc on the hot path |
| Partition manager | Symbol and table scoped partitioning for locality and parallelism |
| Parquet HDB | Historical database on S3 / GCS / NFS. Hot + cold in one query |
| Agent Memory sidecar | records.bin for scalar metadata and vectors.bin for row-major float32 embeddings |
| Compression | LZ4 for WAL, Parquet columnar compression for HDB |
| API | Typical use |
|---|---|
| Python zero-copy | ML pipelines, notebooks, agent loops, quant research |
| Python connection.memory | Put/search memories and assemble context |
| Python connection.cache | Store and look up exact/semantic prompt cache entries |
| HTTP REST | SQL queries plus /api/ai/* memory and cache endpoints |
| C++ API | Embedded low-latency applications and custom services |
| Arrow Flight | Batch data transfer and distributed training |
| SQL CLI | Ad-hoc queries, operations, debugging |
X-Zepto-Tenant-Id request header with conflict rejection
Edition note: The current Agent Memory implementation is single-node. In cluster mode, route /api/ai/* to a sticky pod or treat the layer as a per-pod cache until cluster-consistent memory routing lands.
See the Security Operations Guide for configuration.
Multi-node time-series clustering requires an Enterprise license. Agent Memory cluster-wide search and replicated memory writes are a follow-up design area.
See Multi-Node Cluster for setup.
/api/ai/* production pilots
See Production Deployment for reference architectures.