Benchmarks

Reproducible benchmarks on commodity hardware. The same engine, the same numbers, whether the input is tick data, PMU streams, factory sensors, or vehicle telemetry. All numbers are measured on a single node unless a section explicitly says otherwise.

Benchmark Criteria

These criteria are part of the benchmark claim. If a ZeptoDB number is copied without the test shape, hardware, build, and measurement method, treat it as an incomplete claim rather than a comparable result.

A comparable benchmark must disclose:

Scope: engine-only, HTTP, Python, EKS, or RDMA path; whether network, parsing, serialization, WAL, snapshot, or provider/model time is included.
Build: ZeptoDB commit or release, compiler, optimization flags, SIMD target, CMake options, and whether LTO/PGO/tcmalloc/hugepages were enabled.
Hardware: CPU model, core count, RAM, storage, kernel, cloud instance type or bare-metal host, NUMA placement, and CPU governor when available.
Dataset shape: row count, table schema, symbol/session/tenant cardinality, timestamp distribution, batch size, embedding dimensions, memory-record count, and cache state.
Run protocol: warm-up count, measured iterations, thread count, client count, duration, whether data is preloaded, and whether results are cold, warm, or cache-hit runs.
Metrics: p50, p95, and p99 where applicable; throughput plus tail latency for ingestion; rebuild/load/save time for derived indexes and snapshots.
Failure conditions: dropped rows, rejected requests, fallback-to-scan counts, out-of-memory behavior, and any retries or timeout exclusions.

For ZeptoDB-published numbers, the result is valid only for the scope shown in the section. Single-node numbers must not be reused as distributed claims. Engine-only numbers must not be presented as end-to-end HTTP or Python numbers. Cache-hit latency must not be presented as model-call latency. ANN results must report the index rebuild cost and whether search fell back to filtered scan.

Comparison tables are directional unless the same workload is rerun on the same hardware with equivalent durability, batching, schema, concurrency, and query semantics. Public third-party claims that omit those details are useful context, but they are not audited ZeptoDB benchmark results.
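One way to apply the disclosure list above is to attach a small record to every published number and flag any result whose record is incomplete. The sketch below is illustrative only: `BenchmarkDisclosure`, its field names, and `missing_fields` are hypothetical and are not part of ZeptoDB.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BenchmarkDisclosure:
    """Hypothetical record with one field per disclosure item listed above."""
    scope: Optional[str] = None               # e.g. "engine-only", "HTTP", "Python"
    build: Optional[str] = None               # commit, compiler, flags, SIMD target
    hardware: Optional[str] = None            # CPU, RAM, storage, kernel, instance
    dataset_shape: Optional[str] = None       # rows, schema, cardinality, cache state
    run_protocol: Optional[str] = None        # warm-up, iterations, threads, duration
    metrics: Optional[str] = None             # p50/p95/p99, throughput
    failure_conditions: Optional[str] = None  # drops, rejects, fallbacks, retries


def missing_fields(d: BenchmarkDisclosure) -> List[str]:
    """Return the names of disclosure items that are absent or blank."""
    return [f.name for f in fields(d) if not (getattr(d, f.name) or "").strip()]


# A number copied with only its scope and metric is an incomplete claim.
copied = BenchmarkDisclosure(scope="engine-only", metrics="p99 181ns")
print(missing_fields(copied))
```

A non-empty list means the number should be treated as context rather than as a comparable result.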

Hardware

Component	Spec
CPU	AMD EPYC 9654 (96 cores) / Intel Xeon Platinum 8488C
RAM	256 GB DDR5-4800 ECC
Storage	NVMe Gen4 (for WAL & Parquet HDB)
OS	Amazon Linux 2023, kernel 6.1
Compiler	Clang 19, `-O3 -march=native`

Ingestion Throughput

Scenario	Events/sec	Latency (p99)
Single stream (tick / sensor)	5.52M	181ns
Multi-symbol (1,000 streams)	4.8M	210ns
Kafka consumer (batch 10K)	3.2M	850μs per batch
FIX 4.4 market data	1.1M	420ns (parse + ingest)

Lock-free MPMC ring buffer with Highway SIMD batch copy. Zero allocation on hot path. The ingestion path does not care whether a row came from an exchange, a PMU, a robot, or a vehicle bus.

Physical AI and Logistics Proof Workloads

The P9 logistics proof suite defines repeatable workload shapes for AGV, sorter, RFID, and cold-chain systems. These are benchmark shapes and pass criteria, not blanket production guarantees.

Workload	Rows/sec target	Query proof
2K AGV pose streams	200,000	Geofence and proximity filter
1M sorter lane events	1,000,000	Per-lane jam and anomaly aggregate
50K RFID reads	50,000	Entity timeline reconstruction
Cold-chain sensors	100,000	Audit range scan by shipment

Pass criteria include sustained target ingest for 10 minutes, no decode or ingest failures, p50/p99 query latency per workload, deterministic result parity, and matching result counts across x86_64 and aarch64 runs.
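The pass criteria above can be expressed as a pair of checks: one per run, and one across architectures. This is a minimal sketch under stated assumptions. `RunResult`, its fields, and the use of a digest string to represent "deterministic result parity" are hypothetical; the suite's real harness may record these differently. The p50/p99 query latencies are reported values rather than thresholds, so they are not checked here.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one proof run on one architecture."""
    arch: str
    sustained_seconds: float
    observed_rows_per_sec: float
    decode_failures: int
    ingest_failures: int
    result_count: int
    result_digest: str  # assumed: a hash of the ordered query output


def passes(run: RunResult, target_rows_per_sec: float) -> bool:
    """Sustained target ingest for 10 minutes with no decode or ingest failures."""
    return (
        run.sustained_seconds >= 600
        and run.observed_rows_per_sec >= target_rows_per_sec
        and run.decode_failures == 0
        and run.ingest_failures == 0
    )


def parity(x86: RunResult, arm: RunResult) -> bool:
    """Matching result counts and identical output across x86_64 and aarch64."""
    return (
        x86.result_count == arm.result_count
        and x86.result_digest == arm.result_digest
    )
```

For the AGV workload, for example, both runs would be checked with `passes(run, 200_000)` and then compared with `parity(x86_run, arm_run)`.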

Factory 10 kHz live competitor proof

The factory 10 kHz proof was rerun against ZeptoDB, InfluxDB, and TimescaleDB with a fixed 10,000 rows/sec target for 60 seconds. This is a correctness and sustained-rate proof, not a maximum-throughput shootout.

System	Result	Duration	Inserted	Verified	Observed rows/sec
ZeptoDB	PASS	60.000s	600,000	600,000	9,999.98
InfluxDB	PASS	60.000s	600,000	600,000	9,999.98
TimescaleDB	PASS	60.008s	600,000	600,000	9,998.68
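The last column of the table is inserted rows divided by duration, and the PASS column depends on every inserted row being read back. The sketch below recomputes both from the table's values; `ProofRun` and the helper names are illustrative. Because the durations shown are rounded to the millisecond, the recomputed rates can differ from the published column by a few hundredths of a row per second.

```python
from typing import NamedTuple


class ProofRun(NamedTuple):
    system: str
    duration_s: float
    inserted: int
    verified: int


def observed_rate(run: ProofRun) -> float:
    """Rows per second achieved over the measured duration."""
    return run.inserted / run.duration_s


def all_rows_verified(run: ProofRun) -> bool:
    """Correctness check: every inserted row was read back."""
    return run.inserted == run.verified


runs = [
    ProofRun("ZeptoDB", 60.000, 600_000, 600_000),
    ProofRun("InfluxDB", 60.000, 600_000, 600_000),
    ProofRun("TimescaleDB", 60.008, 600_000, 600_000),
]
for r in runs:
    print(f"{r.system}: {observed_rate(r):,.2f} rows/sec, verified={all_rows_verified(r)}")
```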

For logistics query patterns, see Logistics & Edge Automation.

Query Latency

All queries run on a 1M-row in-memory table with a single thread. Table names (trades, quotes, sensors) are illustrative; the engine treats them identically.

Query	Latency
`SELECT * FROM trades WHERE sym='AAPL' AND ts > now()-1h`	272μs
`SELECT avg(price), max(volume) FROM trades GROUP BY sym`	185μs
`SELECT * FROM trades ASOF JOIN quotes USING(sym, ts)`	410μs
`SELECT sensor_id, ema(vibration, 100) FROM sensors`	320μs
`SELECT xbar(1m, ts) AS bucket, avg(reading) FROM sensors GROUP BY bucket`	290μs
Window JOIN (±500ms, sensor fusion)	580μs

LLVM JIT compilation. Vectorized execution with SIMD aggregation.
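To compare a local measurement against the latencies above, the run protocol matters as much as the query: warm up first, time a fixed number of iterations on one thread, and report percentiles rather than a single average. The harness below is a generic sketch using only the Python standard library. It is not the `bench_query` implementation, and the workload passed to it is a stand-in for a real query call.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    fn: Callable[[], object], warmup: int = 10, iterations: int = 100
) -> Dict[str, float]:
    """Time fn() after a warm-up phase and return p50/p95/p99 in microseconds."""
    for _ in range(warmup):
        fn()  # warm caches and any lazy initialisation; not measured

    samples_us = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples_us.append((time.perf_counter_ns() - start) / 1_000)

    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(samples_us, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Stand-in workload; replace with a real query call when benchmarking.
result = measure_latency(lambda: sum(range(1000)), warmup=5, iterations=100)
print(result)
```

Reporting the warm-up count and iteration count alongside the percentiles keeps the result within the benchmark criteria stated earlier.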

Python Zero-Copy

Operation	Latency
`conn.query("SELECT * FROM trades")` → NumPy array	522ns
DataFrame view (1M rows × 5 cols)	1.2μs
PyTorch tensor from query result	890ns

Direct memory-mapped view. No serialization, no copy, no Arrow conversion.
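The difference between a view and a copy can be shown with the standard library alone. In the sketch below, an `array` stands in for an engine-owned column buffer and a `memoryview` plays the role of the zero-copy result. This illustrates the general buffer-sharing idea and is not the ZeptoDB client API.

```python
from array import array


def demo_zero_copy():
    column = array("d", [101.5, 101.7, 101.6])  # stand-in for an engine-owned column
    view = memoryview(column)                   # zero-copy: shares the same bytes
    copied = column.tolist()                    # a real copy, for contrast

    column[0] = 99.0                            # mutate the underlying buffer
    return view[0], copied[0], view.nbytes


seen_by_view, seen_by_copy, nbytes = demo_zero_copy()
print(seen_by_view, seen_by_copy, nbytes)
```

The view observes the change to the underlying buffer and the copy does not. Creating a view only records a pointer, shape, and type, which is why view-creation cost does not grow with row count in the way a copy does.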

Agent Memory

Agent Memory benchmarks use client-supplied 128-dimensional embeddings and measure memory search, context assembly, exact cache lookup, semantic cache lookup, and sidecar snapshot save/load.

Embedding generation and LLM/provider calls are not included in these timings. Applications own embedding/model providers; ZeptoDB measures the database-side memory, cache, context, and snapshot paths.

10K memory records

Operation	p50	p95
Memory search top-K	1.23ms	1.40ms
Context assembly	1.34ms	1.41ms
Exact cache lookup	<0.01ms	<0.01ms
Semantic cache lookup	0.07ms	0.07ms
Snapshot save	5.79ms	—
Snapshot load	11.60ms	—

The memory layer ranks candidates by tenant/session filters, embedding similarity, importance, pinned boost, recency, and access count. Context assembly deduplicates repeated content and respects an optional token budget.
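The ranking and context-assembly steps described above can be sketched as follows. Every detail here is an assumption made for illustration: the `Memory` fields, the weights in `score`, the one-day recency decay, and the whitespace token estimate are hypothetical and do not describe ZeptoDB's actual formula.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Memory:
    content: str
    embedding: Sequence[float]
    importance: float       # assumed range 0..1
    pinned: bool
    age_seconds: float
    access_count: int
    tenant: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(m: Memory, query: Sequence[float]) -> float:
    # Illustrative weights only; the real ranking function is not published here.
    recency = math.exp(-m.age_seconds / 86_400)
    return (
        0.6 * cosine(m.embedding, query)
        + 0.2 * m.importance
        + 0.1 * recency
        + 0.05 * math.log1p(m.access_count)
        + (0.25 if m.pinned else 0.0)
    )


def search(memories: List[Memory], query: Sequence[float], tenant: str, k: int) -> List[Memory]:
    """Filter by tenant first, then rank the survivors and keep the top k."""
    candidates = [m for m in memories if m.tenant == tenant]
    return sorted(candidates, key=lambda m: score(m, query), reverse=True)[:k]


def assemble_context(ranked: List[Memory], token_budget: Optional[int] = None) -> List[str]:
    """Drop repeated content and stop once the optional token budget is used up."""
    seen, out, used = set(), [], 0
    for m in ranked:
        if m.content in seen:
            continue
        cost = len(m.content.split())  # crude whitespace token estimate
        if token_budget is not None and used + cost > token_budget:
            break
        seen.add(m.content)
        out.append(m.content)
        used += cost
    return out
```

Filtering before ranking keeps other tenants' records out of the candidate set entirely, so a high score can never leak a record across tenants.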

ANN modes and fixtures

On the current 8 vCPU benchmark instance, sparse-projection ANN reduced filtered-search latency at larger memory counts:

Records	Search p50	Search p95	Context p50	Context p95	ANN rebuild
10K	0.19ms	0.41ms	0.38ms	0.52ms	12.36ms
100K	2.41ms	4.68ms	2.77ms	2.98ms	138.37ms
1M	32.03ms	36.27ms	25.48ms	29.96ms	1691.56ms

The ANN path now includes sparse projection, HNSW, and IVF candidate modes, plus clustered and real-embedding fixtures. The index remains derived in-memory state, and final filtering and ranking still apply. Stats expose rebuilds, fallbacks, memory bytes, tombstone entries, and sidecar byte counts. When an index cannot produce enough filtered candidates, the system falls back to a filtered scan.
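The fallback behaviour can be sketched as: ask the index for candidates, apply the filter, and scan only if too few candidates survive. The function below is a simplified illustration. `filtered_top_k`, the record layout, the tenant-only filter, and the squared-L2 distance are assumptions, and `ann_candidates` stands in for whichever candidate mode is active.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Record = Tuple[Vector, str]  # (embedding, tenant); hypothetical layout


def l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; enough for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_top_k(
    query: Vector,
    k: int,
    records: Dict[int, Record],
    tenant: str,
    ann_candidates: Callable[[Vector], List[int]],
) -> Tuple[List[int], bool]:
    """Return (top-k ids, fell_back) for records that match the tenant filter."""
    # 1. The index proposes candidates; the filter is applied afterwards.
    candidates = [i for i in ann_candidates(query) if records[i][1] == tenant]

    # 2. Too few survivors: fall back to scanning every record that passes the filter.
    fell_back = len(candidates) < k
    if fell_back:
        candidates = [i for i, (_, t) in records.items() if t == tenant]

    # 3. Final ranking always runs on exact distances.
    ranked = sorted(candidates, key=lambda i: l2(records[i][0], query))
    return ranked[:k], fell_back
```

Returning the `fell_back` flag alongside the results is what lets a benchmark report fallback-to-scan counts, as the criteria section requires for ANN results.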

Comparison

These numbers summarize the operating envelope, not an audited vendor bake-off. Use the benchmark criteria above before comparing external results or republishing a single metric.

	ZeptoDB	kdb+	ClickHouse	TimescaleDB	InfluxDB
Ingestion (events/sec)	5.52M	~5M	100K	50K	50K
Point query latency	272μs	~300μs	~5ms	~10ms	~15ms
ASOF JOIN	✓	✓	✗	✗	✗
SQL	Standard	q lang	✓	✓	InfluxQL
Python zero-copy	522ns	IPC (~ms)	—	—	—
License cost	Free Community (BUSL-1.1)	$100K+/yr	Free	Free	Free

EKS Multi-Node (3× r7i.2xlarge)

Distributed benchmarks on EKS with 3 data nodes + 1 load generator, single AZ placement. Representative of fleet-scale telemetry, multi-venue tick capture, or multi-line sensor ingestion.

Scenario	Target	Notes
Distributed ingestion (3 nodes)	>12M events/sec	Linear scale from 4M/node
Per-node ingestion	>4M events/sec	Lock-free MPMC + consistent hash routing
Scatter-gather query (Tier A, single-node routing)	<1ms overhead	Direct routing via partition map
Scatter-gather query (Tier B, 3-node fan-out)	<5ms total	Fan-out <1ms + merge <1ms
Distributed ASOF JOIN	Sub-ms overhead	Cross-node timestamp alignment
Failover recovery	<15s	HealthMonitor dead_timeout=10s + pod restart
Linear scalability (1→2→3 nodes)	Near-linear	GROUP BY throughput scales with node count

Cluster: EKS zepto-bench (ap-northeast-2), K8s v1.35, Helm chart deployment. Cost: ~$12/run (2 hours) or ~$1.17/run with sleep/wake automation.

The full EKS rebalance integrity run now passes Stage 5/6 across amd64 and arm64: each architecture verified 50/50 symbols after rebalance using cluster HTTP SELECT, stable table identifiers, and the QueryCoordinator path.
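The "consistent hash routing" noted in the table above is what keeps most symbols on their original node when the node set changes. The ring below is a textbook sketch of the technique, not ZeptoDB's router: the `HashRing` class, the SHA-256 hash, the virtual-node count, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def route(self, symbol: str) -> str:
        """Return the node owning the first ring point clockwise from the symbol."""
        idx = bisect.bisect_right(self._keys, self._hash(symbol)) % len(self._keys)
        return self._ring[idx][1]


three = HashRing(["node-a", "node-b", "node-c"])
two = HashRing(["node-a", "node-b"])
symbols = [f"SYM{i}" for i in range(50)]
moved = [s for s in symbols if three.route(s) != two.route(s)]
print(f"{len(moved)} of {len(symbols)} symbols change owner when node-c leaves")
```

Only symbols that were owned by the departing node move; everything else keeps its owner, which is the property a rebalance integrity check relies on.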

amd64 vs arm64 (Graviton)

Tested on EKS with 6× amd64 (r7i/m7i/c7i) + 5× arm64 (m7g, Karpenter). All K8s tests passed 38/38 on both architectures. Graviton validation also covered Arrow IPC and Flight paths: Arrow IPC unit coverage passed 14/14, the full unit suite passed with the S3-only skip, live S3 checks passed 2/2, Arrow smoke inserted 3 rows with 0 failures, and the rebalance smoke passed.

Ingestion and E2E Query Throughput

Metric	amd64	arm64	Winner
Single-thread (batch=1)	4.39M/s	4.49M/s	arm64 +2%
Single-thread (batch=64)	4.85M/s	4.48M/s	amd64 +8%
Concurrent (1 thread)	1.73M/s	2.46M/s	arm64 +42%
Concurrent (4 threads)	1.88M/s	2.20M/s	arm64 +17%
E2E query throughput	983.7M rows/s	1608.1M rows/s	arm64 +63%
E2E query latency	10,166μs	6,218μs	arm64 −39%
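The Winner column uses two conventions: for throughput, the percentage is how much higher the winner's rate is; for latency, it is how much lower the winner's time is. The helpers below reproduce the published percentages from the table values. The function names are illustrative.

```python
def throughput_gain(winner: float, loser: float) -> float:
    """Percent by which the winner's throughput exceeds the loser's."""
    return (winner / loser - 1) * 100


def latency_reduction(winner_us: float, loser_us: float) -> float:
    """Percent by which the winner's latency is below the loser's."""
    return (1 - winner_us / loser_us) * 100


print(round(throughput_gain(2.46, 1.73)))      # Concurrent (1 thread), arm64 vs amd64
print(round(throughput_gain(4.85, 4.48)))      # Single-thread (batch=64), amd64 vs arm64
print(round(throughput_gain(1608.1, 983.7)))   # E2E query throughput, arm64 vs amd64
print(round(latency_reduction(6218, 10166)))   # E2E query latency, arm64 vs amd64
```

Keeping the two conventions separate avoids a common misreading: a 39% latency reduction corresponds to a 63% throughput gain, not a 39% one.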

SIMD Performance (Highway)

Operation (1M rows)	amd64 (AVX2)	arm64 (NEON)	Winner
sum_i64	264μs	241μs	arm64
filter_gt_i64	1,387μs	4,847μs	amd64 3.5×
vwap	530μs	466μs	arm64

amd64 (AVX2) is about 3.5× faster on filter/scan operations (BitMask). The sum and vwap kernels are comparable, with arm64 slightly ahead.

SQL Performance

Query	amd64	arm64	Winner
ASOF JOIN (parse)	10.37μs	7.41μs	arm64 −29%
VWAP (execute)	161.93μs	382.45μs	amd64 2.4×
Filter price (execute)	2,873μs	5,820μs	amd64 2.0×

SQL parsing is faster on arm64 (branch prediction). SQL execution is 2–2.4× faster on amd64 (SIMD vectorized scan).

Recommendation

Workload	Best Architecture	Why
Ingestion-heavy	arm64 (Graviton)	+17–42% concurrent throughput, ~20% cheaper
Query-heavy with filters	amd64	AVX2 SIMD 2–3.5× advantage on scan/filter
Mixed workloads	arm64	Better cost-performance; NEON gap closing with SVE2

RDMA / AWS EFA

UCX transport on AWS EFA (Elastic Fabric Adapter) for kernel-bypass networking.

Transport	64B Write Latency	4KB Bulk Write	Ingestion (3 nodes)
TCP RPC	~60μs	~3 GB/s	~12M events/sec
UCX/EFA RDMA	~2–5μs	~20 GB/s	~20–25M events/sec

Cost: ~$2.25/run (4× m7a.4xlarge Spot, 2 hours). See EKS Cluster Requirements for setup details.

Reproduce

git clone https://github.com/zeptodb/zeptodb.git && cd zeptodb
mkdir -p build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang-19 -DCMAKE_CXX_COMPILER=clang++-19
ninja -j$(nproc)

# Ingestion benchmark
./bench/bench_ingestion --symbols 1 --duration 10s

# Query benchmark
./bench/bench_query --rows 1000000 --iterations 100

# Python zero-copy
python3 ../bench/bench_python_zerocopy.py

See the ZeptoDB repository for source code, benchmark entry points, and release context.