Features

ZeptoDB has two cooperating layers: a microsecond time-series engine for live operational facts, and an Agent Memory layer for the context agents need to decide, reuse, and explain.

Agent Memory layer

Agent-scoped memory

Store durable memories with tenant, namespace, user, session, agent, type, content, metadata JSON, token count, importance, TTL, pinned status, access count, and timestamps.

Client-supplied embeddings

Applications provide float32 embeddings. ZeptoDB stores, validates, filters, and ranks them without calling embedding providers or LLMs from the server.
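As an illustration of the client side of this contract, the sketch below validates an embedding and packs it as float32 bytes before it is sent. The helper name `pack_embedding`, the fixed-dimension check, and the finiteness check are assumptions made for the example; they are not ZeptoDB's documented validation rules.

```python
import math
import struct


def pack_embedding(values, expected_dim):
    """Validate an embedding and pack it as little-endian float32 bytes.

    Hypothetical helper: the checks below are assumptions about what a
    server could reasonably validate, not ZeptoDB's actual rules.
    """
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    # "<" selects little-endian byte order, "f" is a 4-byte IEEE 754 float.
    return struct.pack(f"<{expected_dim}f", *values)
```

Each dimension occupies four bytes, so a 3-dimensional embedding packs to 12 bytes.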

Context assembly

Retrieve top-K memories under a token budget, deduplicate repeated content, and rank by semantic similarity, importance, pinned boost, recency, and access count.
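The ranking and budgeting steps can be pictured with a small sketch. Everything here is an assumption for illustration: the `Memory` fields mirror the attributes listed above, but the scoring weights, the recency formula, and the whitespace/case normalization used for deduplication are invented and may differ from what ZeptoDB does.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    embedding: list
    token_count: int
    importance: float = 0.0
    pinned: bool = False
    access_count: int = 0
    created_at: float = 0.0  # seconds since epoch


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(mem, query, now):
    # Hypothetical weights: similarity dominates, other signals nudge the rank.
    age_hours = max(0.0, now - mem.created_at) / 3600.0
    recency = 1.0 / (1.0 + age_hours)
    return (cosine(mem.embedding, query)
            + 0.3 * mem.importance
            + (0.5 if mem.pinned else 0.0)
            + 0.1 * recency
            + 0.05 * math.log1p(mem.access_count))


def assemble_context(memories, query, top_k, token_budget, now):
    ranked = sorted(memories, key=lambda m: score(m, query, now), reverse=True)
    picked, seen, used = [], set(), 0
    for mem in ranked:
        key = " ".join(mem.content.lower().split())  # dedup key
        if key in seen:
            continue  # repeated content
        if used + mem.token_count > token_budget:
            continue  # does not fit; a smaller memory further down still might
        seen.add(key)
        picked.append(mem)
        used += mem.token_count
        if len(picked) == top_k:
            break
    return picked
```

Skipping an oversized memory instead of stopping lets lower-ranked but smaller memories fill the remaining budget; a stricter policy would stop at the first item that does not fit.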

Exact + semantic cache

Check normalized prompt matches and embedding-similar responses before calling an external model provider. Useful for repeated operational questions.
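A minimal sketch of the two-stage lookup, assuming case and whitespace normalization for the exact match and a cosine-similarity threshold for the semantic match. The `PromptCache` class, its method names, and the 0.95 default threshold are hypothetical and do not describe ZeptoDB's cache API.

```python
import math


def normalize_prompt(prompt):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(prompt.lower().split())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PromptCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.exact = {}    # normalized prompt -> response
        self.entries = []  # (embedding, response) pairs for semantic lookup

    def put(self, prompt, embedding, response):
        self.exact[normalize_prompt(prompt)] = response
        self.entries.append((embedding, response))

    def lookup(self, prompt, embedding):
        # Stage 1: exact match on the normalized prompt.
        hit = self.exact.get(normalize_prompt(prompt))
        if hit is not None:
            return "exact", hit
        # Stage 2: best embedding match, accepted only above the threshold.
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            s = cosine(cached_embedding, embedding)
            if s > best_score:
                best_score, best_response = s, response
        if best_response is not None and best_score >= self.threshold:
            return "semantic", best_response
        return "miss", None
```

Only on a `"miss"` would the application go on to call the external model provider and then store the new response with `put`.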

Sidecar snapshots

Persist memory records and vectors to `records.bin` and `vectors.bin`, with configurable mutation-count flushing and a forced flush at stop time.
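The flushing policy can be sketched as a mutation counter that triggers a rewrite of the vector file. This is an illustration only: the `VectorSnapshot` class, the `flush_every` parameter, and the write-then-rename strategy are assumptions, and the byte layout shown (headerless little-endian float32 rows) is not a specification of the real sidecar files.

```python
import os
import struct


class VectorSnapshot:
    def __init__(self, path, dim, flush_every=100):
        self.path = path
        self.dim = dim
        self.flush_every = flush_every
        self.rows = []
        self.dirty = 0     # mutations since the last flush
        self.flushes = 0

    def put(self, vector):
        if len(vector) != self.dim:
            raise ValueError("dimension mismatch")
        self.rows.append(list(vector))
        self.dirty += 1
        if self.dirty >= self.flush_every:
            self.flush()

    def flush(self):
        # Write the full snapshot to a temporary file, then rename it over
        # the target so readers never observe a half-written file.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            for row in self.rows:
                f.write(struct.pack(f"<{self.dim}f", *row))
        os.replace(tmp, self.path)
        self.dirty = 0
        self.flushes += 1

    def stop(self):
        # Forced flush at shutdown, regardless of the mutation count.
        self.flush()
```

A lower `flush_every` narrows the window of unflushed mutations at the cost of more frequent rewrites.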

AgentOps telemetry

Track agent runs, retrieval events, cache events, LLM calls, and tool calls in ordinary ZeptoDB time-series tables beside the memory subsystem.

Time-series engine

5.52M events/sec

Lock-free MPMC ring buffer with Highway SIMD batch copy. Zero allocation on the hot path for telemetry, ticks, traces, tool calls, and sensor streams.

272μs query on 1M rows

LLVM JIT-compiled execution and SIMD aggregation for the live facts an agent must retrieve before acting.

ASOF JOIN

Point-in-time correct joins across heterogeneous streams. Foundational for sensor fusion, market data, incidents, and agent action replay.
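To make the semantics concrete, the sketch below matches each left row with the most recent right row at or before its timestamp, which is the usual meaning of an as-of join. It is a plain-Python illustration that assumes the right side is sorted by timestamp; it says nothing about how ZeptoDB executes the join.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Join (timestamp, value) rows; `right` must be sorted by timestamp."""
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, left_value in left:
        # Number of right rows whose timestamp is <= ts.
        i = bisect_right(right_ts, ts)
        right_value = right[i - 1][1] if i else None
        out.append((ts, left_value, right_value))
    return out
```

For example, a trade at time 10 joined against quotes at times 8, 20, and 25 picks the quote from time 8, and a trade before the first quote gets `None`, so no future information leaks into the result.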

Window JOIN

Join within a time window for sensor clock drift, event correlation, tool-call timelines, and trade/quote alignment.
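A window join can be pictured as collecting, for each left row, every right row whose timestamp falls inside a tolerance around it. The function below is an illustrative sketch with an inclusive window and hypothetical parameter names (`before`, `after`); ZeptoDB's SQL syntax and boundary rules may differ.

```python
from bisect import bisect_left, bisect_right


def window_join(left, right, before, after):
    """For each left row, gather right values with timestamps in
    [ts - before, ts + after]. `right` must be sorted by timestamp."""
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, left_value in left:
        lo = bisect_left(right_ts, ts - before)   # first row inside the window
        hi = bisect_right(right_ts, ts + after)   # one past the last row inside
        out.append((ts, left_value, [v for _, v in right[lo:hi]]))
    return out
```

With a tool call at time 100 and a tolerance of 5 on each side, events at times 98 and 103 are matched while events at 90 and 110 are not.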

Temporal analytics

EMA, VWAP, xbar, mavg, percentile, GROUP BY, window functions, CTEs, and subqueries in standard SQL.
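For readers unfamiliar with the less common names, here are reference definitions of three of these functions in plain Python. They follow the conventional meanings (EMA seeded with the first value, volume-weighted average price, and kdb+-style `xbar` bucketing); the exact semantics of ZeptoDB's SQL functions, such as how the EMA is seeded, are assumed rather than confirmed.

```python
def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def vwap(prices, volumes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total = sum(volumes)
    if not total:
        return None
    return sum(p * v for p, v in zip(prices, volumes)) / total


def xbar(width, ts):
    """Round a timestamp down to the start of its bucket of size `width`."""
    return ts - ts % width
```

Grouping by `xbar(60, ts)` on second-resolution timestamps yields one-minute bars.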

Standard SQL

Analysts, quants, ML engineers, and platform teams can query the live timeline without learning a proprietary DSL.

How the layers work together

Need	Time-series core	Agent Memory layer
Live observations	Ingest sensors, ticks, traces, logs, tool calls	Store what the agent learned from them
Temporal recall	Query by timestamp, entity, window, ASOF relation	Retrieve prior episodes, summaries, and decisions
Grounding	Keep raw evidence and exact event order	Assemble relevant context under a token budget
Provider cost control	Store cache telemetry as events	Exact and semantic prompt cache lookup
Debugging	Replay the event timeline	Replay retrieved context, model calls, and agent decisions

Storage

Feature	Description
In-memory column store	Columnar layout optimized for SIMD scan and aggregation
Arena allocator	Custom memory management with no malloc on the hot path
Partition manager	Symbol and table scoped partitioning for locality and parallelism
Parquet HDB	Historical database on S3 / GCS / NFS. Query hot and cold data in one statement
Agent Memory sidecar	`records.bin` for scalar metadata and `vectors.bin` for row-major `float32` embeddings
Compression	LZ4 for WAL, Parquet columnar compression for HDB

Client APIs

API	Typical use
Python zero-copy	ML pipelines, notebooks, agent loops, quant research
Python `connection.memory`	Put/search memories and assemble context
Python `connection.cache`	Store and look up exact/semantic prompt cache entries
HTTP REST	SQL queries plus `/api/ai/*` memory and cache endpoints
C++ API	Embedded low-latency applications and custom services
Arrow Flight	Batch data transfer and distributed training
SQL CLI	Ad-hoc queries, operations, debugging

Security and isolation

Tenant and namespace fields on every memory and cache entry
Optional `X-Zepto-Tenant-Id` request header with conflict rejection
Memory content, prompts, responses, and metadata omitted from aggregate stats
TLS 1.3 with optional mTLS on endpoints and cluster RPC
RBAC with 5 built-in roles: admin, writer, reader, analyst, monitor
JWT / OIDC authentication and API key management
Audit logging for auth events, queries, and admin actions
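One plausible reading of the header's conflict rejection is sketched below: if a request names a tenant both in the `X-Zepto-Tenant-Id` header and in its body, the two must agree. The function name, the body field name `tenant`, and the choice of exception are hypothetical; the actual server behavior is described in the Security Operations Guide, not here.

```python
def resolve_tenant(headers, body):
    """Return the effective tenant, rejecting header/body disagreement.

    Assumption: the body may carry a "tenant" field. HTTP header names are
    case-insensitive, so the header lookup ignores case.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    header_tenant = lowered.get("x-zepto-tenant-id")
    body_tenant = body.get("tenant")
    if header_tenant and body_tenant and header_tenant != body_tenant:
        raise PermissionError(
            f"tenant conflict: header={header_tenant!r} body={body_tenant!r}"
        )
    return header_tenant or body_tenant
```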

Edition note: The current Agent Memory implementation is single-node. In cluster mode, route `/api/ai/*` to a sticky pod or treat the layer as a per-pod cache until cluster-consistent memory routing lands.

See the Security Operations Guide for configuration.

Clustering

Multi-node time-series storage with automatic sharding by symbol
Consistent hashing for partition placement
Health monitoring and automatic failover
Rolling upgrades with zero downtime
Cross-node time-series query routing and aggregation
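Consistent hashing, mentioned above for partition placement, can be illustrated with a small hash ring. This is a generic textbook sketch: the `HashRing` class, the use of SHA-256, and the 64 virtual nodes per physical node are assumptions for the example, not details of ZeptoDB's placement logic.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several points on the ring to even out the load.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, symbol):
        # First ring point clockwise from the symbol's hash, wrapping around.
        i = bisect_right(self.keys, self._hash(symbol)) % len(self.ring)
        return self.ring[i][1]
```

The property that matters for failover and rebalancing is that removing a node only reassigns the symbols that node owned; every other symbol keeps its placement.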

Multi-node time-series clustering requires an Enterprise license. Cluster-wide Agent Memory search and replicated memory writes are not yet available; they are planned as follow-up design work.

See Multi-Node Cluster for setup.

Deployment

Docker for local development, CI, and single-node Agent Memory trials
Kubernetes / Helm for clustered time-series deployments
Sticky routing option for current `/api/ai/*` production pilots
Bare metal with NUMA / CPU pinning optimization
ARM Graviton for cost-efficient edge and cloud
Parquet HDB on S3, GCS, or local NFS

See Production Deployment for reference architectures.