
Features

ZeptoDB has two cooperating layers: a microsecond time-series engine for live operational facts, and an Agent Memory layer for the context agents need to decide, reuse, and explain.


Agent-scoped memory

Store durable memories with tenant, namespace, user, session, agent, type, content, metadata JSON, token count, importance, TTL, pinned status, access count, and timestamps.

Client-supplied embeddings

Applications provide float32 embeddings. ZeptoDB stores, validates, filters, and ranks them without calling embedding providers or LLMs from the server.

Context assembly

Retrieve top-K memories under a token budget, deduplicate repeated content, and rank by semantic similarity, importance, pinned boost, recency, and access count.
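The ranking and budgeting described above can be sketched in plain Python. This is only an illustration of the idea, not ZeptoDB's implementation: the `Memory` class, the `score` weights, the one-day recency decay, and the whitespace/case normalization used for deduplication are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    # Hypothetical record shape; field names are illustrative only.
    content: str
    embedding: list[float]
    token_count: int
    importance: float = 0.0
    pinned: bool = False
    age_seconds: float = 0.0
    access_count: int = 0


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(mem, query):
    """Blend the five ranking signals. All weights are assumptions."""
    recency = math.exp(-mem.age_seconds / 86_400.0)  # decays over about a day
    popularity = math.log1p(mem.access_count)        # dampened access count
    return (
        1.0 * cosine(mem.embedding, query)
        + 0.3 * mem.importance
        + (0.5 if mem.pinned else 0.0)
        + 0.2 * recency
        + 0.05 * popularity
    )


def assemble_context(memories, query, top_k, token_budget):
    """Pick up to top_k distinct memories whose token counts fit the budget."""
    ranked = sorted(memories, key=lambda m: score(m, query), reverse=True)
    picked, seen, used = [], set(), 0
    for mem in ranked:
        key = " ".join(mem.content.lower().split())  # dedupe on normalized text
        if key in seen:
            continue
        if used + mem.token_count > token_budget:
            continue  # too large; a smaller, lower-ranked memory may still fit
        seen.add(key)
        picked.append(mem)
        used += mem.token_count
        if len(picked) == top_k:
            break
    return picked
```

Skipping an oversized memory instead of stopping at it lets smaller, lower-ranked items fill the remaining budget. A real ranker may make a different trade-off.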

Exact + semantic cache

Check normalized prompt matches and embedding-similar responses before calling an external model provider. Useful for repeated operational questions.
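A minimal sketch of the two-stage lookup: an exact match on the normalized prompt first, then a nearest-embedding match above a similarity threshold. The `PromptCache` class, its method names, the normalization rule, and the default threshold are hypothetical and do not reflect ZeptoDB's actual cache API.

```python
import math


def normalize(prompt):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(prompt.lower().split())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class PromptCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed similarity cut-off
        self.exact = {}             # normalized prompt -> response
        self.entries = []           # (embedding, response) pairs

    def put(self, prompt, embedding, response):
        self.exact[normalize(prompt)] = response
        self.entries.append((embedding, response))

    def lookup(self, prompt, embedding):
        """Return (kind, response), where kind is 'exact', 'semantic', or 'miss'."""
        hit = self.exact.get(normalize(prompt))
        if hit is not None:
            return "exact", hit
        best_sim, best = 0.0, None
        for emb, resp in self.entries:
            sim = cosine(emb, embedding)
            if sim > best_sim:
                best_sim, best = sim, resp
        if best is not None and best_sim >= self.threshold:
            return "semantic", best
        return "miss", None
```

On a `miss`, the application would call the model provider and then `put` the new response, so the next similar question is served from the cache.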

Sidecar snapshots

Persist memory records and vectors to records.bin and vectors.bin, with configurable mutation-count flushing and stop-time force flush.
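The flush policy can be pictured as a counter that triggers a write every N mutations, plus a final flush on shutdown. The sketch below is illustrative only: `SnapshotWriter`, `flush_every`, and `flush_fn` are hypothetical names, and the actual writing of records.bin and vectors.bin is left to the callback.

```python
class SnapshotWriter:
    """Counts mutations and calls flush_fn every `flush_every` of them."""

    def __init__(self, flush_every, flush_fn):
        self.flush_every = flush_every
        self.flush_fn = flush_fn  # called with the number of pending mutations
        self.pending = 0

    def record_mutation(self):
        self.pending += 1
        if self.pending >= self.flush_every:
            self._flush()

    def stop(self):
        # Force a flush at stop time so unflushed mutations are not lost.
        # This sketch skips the write when nothing is pending (an assumption).
        if self.pending:
            self._flush()

    def _flush(self):
        self.flush_fn(self.pending)
        self.pending = 0
```

A smaller `flush_every` narrows the window of mutations that could be lost on a crash, at the cost of more frequent writes.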

AgentOps telemetry

Track agent runs, retrieval events, cache events, LLM calls, and tool calls in ordinary ZeptoDB time-series tables beside the memory subsystem.


5.52M events/sec

Lock-free MPMC ring buffer with Highway SIMD batch copy. Zero allocation on the hot path for telemetry, ticks, traces, tool calls, and sensor streams.

272μs query on 1M rows

LLVM JIT-compiled execution and SIMD aggregation for live facts the agent must retrieve before acting.

ASOF JOIN

Point-in-time correct joins across heterogeneous streams. Foundational for sensor fusion, market data, incidents, and agent action replay.
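To make the semantics concrete, here is a small pure-Python sketch of an as-of join: each left-hand row is paired with the most recent right-hand row whose timestamp is not later than its own. It shows the general technique only; ZeptoDB's SQL syntax and execution strategy are not shown here, and the `asof_join` function is a hypothetical helper.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Join (ts, value) rows point-in-time.

    `right` must be sorted by timestamp. For each left row, attach the value
    of the latest right row with right_ts <= left_ts, or None if none exists.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        i = bisect_right(right_ts, ts)  # count of right rows with ts <= left ts
        rval = right[i - 1][1] if i > 0 else None
        out.append((ts, lval, rval))
    return out


quotes = [(90, 10.0), (200, 10.5)]
trades = [(50, "a"), (100, "b"), (200, "c"), (250, "d")]
joined = asof_join(trades, quotes)
```

Because only rows at or before the left timestamp are considered, the join never leaks future information into a past event, which is what makes replay and backtesting results trustworthy.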

Window JOIN

Join within a time window for sensor clock drift, event correlation, tool-call timelines, and trade/quote alignment.
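A window join differs from an as-of join in that each left-hand row collects every right-hand row inside a time interval around it. The sketch below is a plain-Python illustration of that idea; the `window_join` function and its `before`/`after` parameters are hypothetical and do not represent ZeptoDB's SQL syntax.

```python
from bisect import bisect_left, bisect_right


def window_join(left, right, before, after):
    """For each left (ts, value), gather right values with
    ts - before <= right_ts <= ts + after. `right` must be sorted by timestamp.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        lo = bisect_left(right_ts, ts - before)   # first row inside the window
        hi = bisect_right(right_ts, ts + after)   # one past the last row inside
        out.append((ts, lval, [v for _, v in right[lo:hi]]))
    return out


readings = [(95, "q1"), (100, "q2"), (104, "q3"), (120, "q4")]
events = [(100, "t1"), (130, "t2")]
matched = window_join(events, readings, before=5, after=5)
```

A symmetric window tolerates clocks that drift in either direction; setting `after=0` restricts matches to rows at or before the event.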

Temporal analytics

EMA, VWAP, xbar, mavg, percentile, GROUP BY, window functions, CTEs, and subqueries in standard SQL.
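For readers unfamiliar with these names, the following sketch gives reference definitions of three of them in plain Python. It assumes the conventional meanings (xbar in the kdb+ sense of rounding down to a bucket boundary); ZeptoDB's exact signatures and edge-case behavior may differ.

```python
def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def vwap(prices, volumes):
    """Volume-weighted average price; None when total volume is zero."""
    total = sum(volumes)
    if not total:
        return None
    return sum(p * v for p, v in zip(prices, volumes)) / total


def xbar(ts, width):
    """Round a timestamp down to the start of its bucket of size `width`."""
    return ts - ts % width
```

Bucketing timestamps with `xbar` and then grouping by the result is the usual way to build fixed-interval bars, for example one VWAP per minute.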

Standard SQL

Analysts, quants, ML engineers, and platform teams can query the live timeline without learning a proprietary DSL.


| Need | Time-series core | Agent Memory layer |
| --- | --- | --- |
| Live observations | Ingest sensors, ticks, traces, logs, tool calls | Store what the agent learned from them |
| Temporal recall | Query by timestamp, entity, window, ASOF relation | Retrieve prior episodes, summaries, and decisions |
| Grounding | Keep raw evidence and exact event order | Assemble relevant context under a token budget |
| Provider cost control | Store cache telemetry as events | Exact and semantic prompt cache lookup |
| Debugging | Replay the event timeline | Replay retrieved context, model calls, and agent decisions |

| Feature | Description |
| --- | --- |
| In-memory column store | Columnar layout optimized for SIMD scan and aggregation |
| Arena allocator | Custom memory management with no malloc on the hot path |
| Partition manager | Symbol and table scoped partitioning for locality and parallelism |
| Parquet HDB | Historical database on S3 / GCS / NFS. Hot + cold in one query |
| Agent Memory sidecar | records.bin for scalar metadata and vectors.bin for row-major float32 embeddings |
| Compression | LZ4 for WAL, Parquet columnar compression for HDB |

| API | Typical use |
| --- | --- |
| Python zero-copy | ML pipelines, notebooks, agent loops, quant research |
| Python connection.memory | Put/search memories and assemble context |
| Python connection.cache | Store and lookup exact/semantic prompt cache entries |
| HTTP REST | SQL queries plus /api/ai/* memory and cache endpoints |
| C++ API | Embedded low-latency applications and custom services |
| Arrow Flight | Batch data transfer and distributed training |
| SQL CLI | Ad-hoc queries, operations, debugging |

  • Tenant and namespace fields on every memory and cache entry
  • Optional X-Zepto-Tenant-Id request header with conflict rejection
  • Memory content, prompts, responses, and metadata omitted from aggregate stats
  • TLS 1.3 with optional mTLS on endpoints and cluster RPC
  • RBAC with 5 built-in roles: admin, writer, reader, analyst, monitor
  • JWT / OIDC authentication and API key management
  • Audit logging for auth events, queries, and admin actions
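One way to picture the tenant header's conflict rejection is the sketch below. It assumes that a "conflict" means the X-Zepto-Tenant-Id header disagrees with a tenant field supplied in the request body; the source does not spell out the rule, and `resolve_tenant` and `TenantConflict` are hypothetical names.

```python
class TenantConflict(ValueError):
    """Raised when the header and body name different tenants (assumed rule)."""


def resolve_tenant(header_tenant, body_tenant):
    """Return the effective tenant, or raise if the two sources disagree.

    Either argument may be None, since the header is optional.
    """
    if header_tenant and body_tenant and header_tenant != body_tenant:
        raise TenantConflict(
            f"header tenant {header_tenant!r} != body tenant {body_tenant!r}"
        )
    return header_tenant or body_tenant
```

Rejecting the request, instead of silently preferring one source, avoids writing a memory or cache entry into the wrong tenant's scope.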

Edition note: The current Agent Memory implementation is single-node. In cluster mode, route /api/ai/* to a sticky pod or treat the layer as a per-pod cache until cluster-consistent memory routing lands.

See the Security Operations Guide for configuration.


  • Multi-node time-series storage with automatic sharding by symbol
  • Consistent hashing for partition placement
  • Health monitoring and automatic failover
  • Rolling upgrades with zero downtime
  • Cross-node time-series query routing and aggregation
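Consistent hashing for placement can be illustrated with a small hash ring. This is a generic sketch of the technique, not ZeptoDB's placement code: the `HashRing` class, the SHA-256 based hash, and the virtual-node count are assumptions chosen for the example.

```python
import hashlib
from bisect import bisect_right


def _hash(key):
    """Stable 64-bit hash derived from SHA-256 (an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def node_for(self, symbol):
        """Return the node owning the first ring point clockwise of the key."""
        i = bisect_right(self._points, _hash(symbol)) % len(self._ring)
        return self._ring[i][1]
```

The property that matters operationally is that adding a node only moves keys onto the new node, so most symbols keep their existing placement during a scale-out.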

Multi-node time-series clustering requires an Enterprise license. Agent Memory cluster-wide search and replicated memory writes are a follow-up design area.

See Multi-Node Cluster for setup.


  • Docker for local development, CI, and single-node Agent Memory trials
  • Kubernetes / Helm for clustered time-series deployments
  • Sticky routing option for current /api/ai/* production pilots
  • Bare metal with NUMA / CPU pinning optimization
  • ARM Graviton for cost-efficient edge and cloud
  • Parquet HDB on S3, GCS, or local NFS

See Production Deployment for reference architectures.