
ZeptoDB — Completed Features

Last updated: 2026-04-07


  • Phase E — E2E Pipeline MVP (5.52M ticks/sec)
  • Phase B — SIMD + JIT (BitMask 11x, filter within kdb+ range)
  • Phase A — HDB Tiered Storage (LZ4, 4.8GB/s flush)
  • Phase D — Python Bridge (zero-copy, 4x vs Polars)
  • Phase C — Distributed Cluster (UCX transport, 2ns routing)
  • SQL + HTTP — Parser (1.5–4.5μs) + ClickHouse API (port 8123)
  • SQL Phase 1 — IN operator, IS NULL/NOT NULL, NOT, HAVING clause
  • SQL Phase 2 — SELECT arithmetic (price * volume AS notional), CASE WHEN, multi-column GROUP BY
  • SQL Phase 3 — Date/time functions (DATE_TRUNC/NOW/EPOCH_S/EPOCH_MS), LIKE/NOT LIKE, UNION ALL/DISTINCT/INTERSECT/EXCEPT
  • SQL subqueries / CTE — WITH clause, FROM subquery, chained CTEs, distributed CTE — 12 tests
  • SQL INSERT — INSERT INTO table VALUES, multi-row, column list, HTTP API (ClickHouse Compatible)
  • SQL UPDATE / DELETE — UPDATE SET WHERE, DELETE FROM WHERE, in-place compaction
  • JOIN — ASOF, Hash, LEFT, RIGHT, FULL OUTER, Window JOIN
  • FlatHashMap for joins — CRC32 intrinsic open-addressing hash map, replaces std::unordered_map in all join operators (ASOF, Hash, Window) — 9 unit tests
  • Window functions — EMA, DELTA, RATIO, SUM, AVG, MIN, MAX, LAG, LEAD, ROW_NUMBER, RANK, DENSE_RANK
  • Financial functions — xbar, FIRST, LAST, Window JOIN, UNION JOIN (uj), PLUS JOIN (pj), AJ0
  • Parallel query — LocalQueryScheduler (scatter/gather, 3.48x@8T), CHUNKED mode
  • Time range index — O(log n) binary search within partitions, O(1) partition skip
  • Sorted column index — p#/g# style sorted attribute, O(log n) binary search range scan, 269x vs full scan — 13 tests
  • Materialized View — CREATE/DROP MATERIALIZED VIEW, incremental aggregation on ingest, OHLCV/SUM/COUNT/MIN/MAX/FIRST/LAST, xbar time bucket
  • MV query rewrite — Automatic rewrite of SELECT GROUP BY into direct MV lookup when matching MV exists. O(n) → O(1) for aggregation queries — 6 tests (devlog 064)
  • Parquet HDB — SNAPPY/ZSTD/LZ4_RAW, DuckDB/Polars/Spark direct query (Arrow C++ API)
  • S3 HDB Flush — async upload, MinIO compatible, cloud data lake
  • Storage tiering — Hot (memory) → Warm (SSD) → Cold (S3) → Drop, ALTER TABLE SET STORAGE POLICY, FlushManager auto-tiering
  • DDL / Schema Management — CREATE TABLE, DROP TABLE (IF EXISTS), ALTER TABLE (ADD/DROP COLUMN, SET TTL), TTL auto-eviction — 8 tests
  • Data Durability — Intra-day auto-snapshot (60s default), recovery replays on restart — max data loss ≤ 60s
  • Feed Handlers — FIX, NASDAQ ITCH (350ns parsing)
  • Kafka consumer — JSON/binary/human-readable decode, backpressure retry, Prometheus metrics, commit modes — 26 tests
  • Connection hooks & session tracking — on_connect/on_disconnect callbacks, session list, idle eviction, query count — 7 tests
  • Python Ecosystem — zepto_py: from_pandas/polars/arrow, ArrowSession, StreamingSession, ApexConnection — 208 tests
  • Python execute() — Full SQL access (SELECT, INSERT, UPDATE, DELETE, DDL, MV)
  • Enterprise Security — TLS/HTTPS, API Key + JWT/OIDC, RBAC, Rate Limiting, Admin REST API, Query Timeout/Kill, Secrets Management (Vault/File/Env), Audit Log (SOC2/EMIR/MiFID II) — 69 tests
  • Vault-backed API Key Store — Write-through sync of API keys to HashiCorp Vault KV v2, multi-node key sharing via Vault, graceful degradation when Vault unavailable — 8 tests
  • Multi-tenancy — TenantManager, per-tenant query concurrency quota, table namespace isolation, usage tracking
  • Cluster Integrity — Unified PartitionRouter, FencingToken in RPC (24-byte header), split-brain defense (K8s Lease), CoordinatorHA auto re-registration — 13 tests
  • Distributed DML routing — INSERT routes to symbol node, UPDATE/DELETE broadcast, DDL broadcast
  • RingConsensus (P8-Critical) — RingConsensus abstract interface + EpochBroadcastConsensus implementation. Coordinator epoch broadcast synchronizes the ring across all nodes. RING_UPDATE/RING_ACK RPC messages. ClusterConfig::is_coordinator flag. Plugin architecture replaceable with Raft (set_consensus())
  • CoordinatorHA ↔ K8sLease integration (P8-Critical) — K8sLease acquisition required on standby→active promotion path (require_lease), FencingToken::advance() + RPC client epoch propagation, automatic demote on lease loss
  • WalReplicator replication guarantee (P8-Critical) — Quorum write (quorum_w), failure retry queue (max_retries/retry_queue_capacity), backpressure (producer blocks when the queue is full), backward compatible with existing async/sync modes
  • Failover data recovery (P8-Critical) — Auto re-replication built into FailoverManager (auto_re_replicate/async_re_replicate). PartitionMigrator integration, node registration via register_node(), graceful fallback when unregistered
  • Internal RPC security (P8-Critical) — RpcSecurityConfig shared-secret HMAC authentication. AUTH_HANDSHAKE/AUTH_OK/AUTH_REJECT protocol. mTLS configuration structure prepared
  • HealthMonitor DEAD recovery (P8-High) — REJOINING state added (DEAD→REJOINING→ACTIVE). on_rejoin() callback for data resynchronization control. Router automatically re-adds the node on the REJOINING→ACTIVE transition in ClusterNode
  • HealthMonitor UDP fault tolerance (P8-High) — Consecutive miss verification (default 3 times), fatal error on bind failure, secondary TCP heartbeat (dual verification with TCP probe before SUSPECT→DEAD transition)
  • TcpRpcServer resource management (P8-High) — Thread pool conversion (detach→fixed worker pool+task queue), payload size limit (64MB), graceful drain (30-second timeout), concurrent connection limit (1024)
  • PartitionRouter concurrency (P8-High) — Built-in ring_mutex_ (shared_mutex). add/remove uses unique_lock, route/plan uses shared_lock. TOCTOU eliminated
  • TcpRpcClient::ping() connection leak (P8-High) — connect_to_server()+close() → acquire()/release() pool recycling
  • GossipNodeRegistry data race (P8-Medium) — bool running_ → std::atomic<bool>. Multithreaded UB eliminated
  • K8sNodeRegistry deadlock (P8-Medium) — fire_event_unlocked() removed; the lock is now released before callbacks are invoked
  • ClusterNode node rejoin (P8-Medium) — Counts successful seed connections; throws std::runtime_error if all seeds fail. Bootstrap with no seeds is still allowed
  • SnapshotCoordinator consistency (P8-Medium) — 2PC (PREPARE→COMMIT/ABORT). Pauses ingest on all nodes then flushes at a consistent point-in-time. ABORT on all nodes on failure. take_snapshot_legacy() backward compatible
  • K8sNodeRegistry actual implementation (P8-Medium) — poll_loop() performs K8s Endpoints API HTTP GET. Auto-detects environment variables, SA token authentication, parse_endpoints_json()+reconcile() diff→JOINED/LEFT events
  • PartitionMigrator atomicity (P8-Medium) — MoveState state machine (PENDING→DUAL_WRITE→COPYING→COMMITTED/FAILED), MigrationCheckpoint JSON disk persistence (save/load), resume_plan() retry (max_retries=3), rollback_move() — sends DELETE to dest on failure
  • Dual-write ingestion wiring (P8-Feature) — ClusterNode::ingest_tick() checks migration_target() before routing; during partition migration, ticks are sent to both source and destination nodes to prevent data loss
  • Live rebalancing (P8-Feature) — RebalanceManager orchestrates zero-downtime partition migration on node add/remove. Background thread with pause/resume/cancel, checkpoint support, sequential move execution via PartitionMigrator
  • Load-based auto-rebalancing (P8-Feature) — RebalancePolicy with configurable imbalance ratio, check interval, and cooldown. Background policy thread monitors per-node partition counts via LoadProvider callback and auto-triggers start_remove_node() on overloaded nodes
  • Rebalance admin HTTP API (P8-Feature) — 5 REST endpoints (/admin/rebalance/{status,start,pause,resume,cancel}) for live rebalance control. Admin RBAC enforced, JSON request/response, 503 when not in cluster mode
  • Rebalance hardening: peer_rpc_clients_ thread safety (P8-Feature) — std::shared_mutex protects the peer_rpc_clients_ map in ClusterNode. shared_lock for reads in the remote_ingest() hot path, unique_lock for writes. Race-safe lazy client creation — 1 test
  • Rebalance hardening: move timeout (P8-Feature) — move_timeout_sec in RebalanceConfig (default 300s). PartitionMigrator::execute_move() wraps migrate_symbol() in std::async + wait_for. On timeout: FAILED + dual-write ended — 2 tests
  • Rebalance hardening: query routing safety (P8-Feature) — recently_migrated_ map in PartitionRouter. After end_migration(), recently_migrated(symbol) returns {from, to} during a grace period (default 30s). Auto-expires. Query layer reads from both nodes during the transition — 5 tests
  • Partial-move rebalance API (P8-Feature) — start_move_partitions(vector<Move>) moves specific symbols between existing nodes without a full drain. HTTP move_partitions action in /admin/rebalance/start. No ring topology broadcast — 6 tests
  • Rebalance progress in Web UI (P8-Feature) — cluster dashboard panel showing live rebalance state, progress bar, completed/failed/total moves, current symbol. Auto-refreshes every 2s via /admin/rebalance/status
  • Rebalance history endpoint (P8-Feature) — GET /admin/rebalance/history returns past rebalance events (action, node, moves, duration, cancelled). In-memory ring buffer (max 50). Web UI history table on cluster dashboard — 5 tests
  • Rebalance ring broadcast (P8-Feature) — RebalanceManager calls RingConsensus::propose_add/remove() after all moves complete, synchronizing the hash ring across all cluster nodes. Skipped on cancel. set_consensus() setter, RebalanceAction enum — 3 tests
  • Rebalance bandwidth throttling (P8-Feature) — BandwidthThrottler rate-limits partition migration data transfer. Configurable max_bandwidth_mbps (0 = unlimited), runtime adjustable via set_max_bandwidth_mbps(). Sliding 1-second window with sleep-based backpressure, thread-safe atomic counters. Wired into PartitionMigrator::migrate_symbol(), exposed in /admin/rebalance/status JSON — 10 tests
  • PTP clock sync detection (P8-Feature) — PtpClockDetector checks PTP hardware/chrony/timesyncd synchronization quality. 4 states (SYNCED/DEGRADED/UNSYNC/UNAVAILABLE). strict_mode rejects distributed ASOF JOIN on bad sync. GET /admin/clock endpoint — 22 tests
  • Production operations — monitoring, backup, systemd service
  • Kubernetes operations — Helm chart (PDB/HPA/ServiceMonitor), rolling upgrade, K8s operations guide, Karpenter Fleet API
  • ARM Graviton build verification — aarch64 (Amazon Linux 2023, Clang 19.1.7), 766/766 tests passing, xbar 7.99ms (1M rows)
  • Metrics provider — pluggable Prometheus metrics, Kafka stats integration — 4 tests
  • Task scheduler — interval/once jobs, cancel, exception-safe, monotonic clock — 18 tests
  • Multi-node metrics collection — METRICS_REQUEST/METRICS_RESULT RPC, parallel fan-out, ClusterNode callback registration — 10 tests
  • HTTP observability — structured JSON access log, slow query log (>100ms), X-Request-Id tracing, server lifecycle events, Prometheus http_requests_total/active_sessions — 2 tests
  • /whoami endpoint — returns authenticated role and subject for reliable client-side role detection — 1 test
  • Web UI cluster page — node status table, per-node metrics history charts (ingestion/queries/latency), recharts type fix
  • API key granular control — symbol/table ACL, tenant binding, key expiry, PATCH update endpoint, Web UI create/edit dialogs — 6 tests
  • Query Editor: resizable height (QE-10) — drag divider between editor and result area, 80–600px range, replaces fixed 180px
  • Query Editor: schema sidebar (QE-6) — left panel with table/column tree, click to insert into editor, refresh button
  • Query Editor: ZeptoDB function autocomplete (QE-7) — xbar, vwap, ema, wma, mavg, msum, deltas, ratios, fills + SQL keyword snippets (ASOF JOIN, EXPLAIN, etc.)
  • Query Editor: result chart view (QE-5) — table/chart toggle (line/bar), X/Y column selectors, Recharts, 500-row cap
  • Query Editor: multi-tab editor (QE-1) — add/close/rename tabs, independent code & results per tab, localStorage persistence
  • Query Editor: multi-statement run (QE-9) — ;-split sequential execution, per-statement result sub-tabs, per-statement error display
  • SSO/JWT CLI + JWKS auto-fetch — --jwt-* / --jwks-url CLI flags, JWKS background key rotation, kid-based multi-key, POST /admin/auth/reload runtime refresh — 3 tests
  • Bare-metal tuning guide — CPU pinning, NUMA, hugepages, C-state, tcmalloc/LTO/PGO build, network tuning, benchmarking — docs/deployment/BARE_METAL_TUNING.md
  • Migration toolkit — kdb+ HDB loader, q→SQL, ClickHouse DDL/query translation, DuckDB Parquet, TimescaleDB hypertable — 126 tests
  • Native float/double — IEEE 754 float32/float64 in storage, SQL, and HTTP output
  • String symbol (dictionary-encoded) — INSERT/SELECT/WHERE/GROUP BY/VWAP/FIRST/LAST with 'AAPL' syntax, LowCardinality dictionary encoding, distributed scatter-gather support — 29 tests
  • Arrow Flight server (P3) — gRPC-based Arrow Flight RPC: DoGet (SQL→RecordBatch stream), DoPut (ingest), GetFlightInfo, ListFlights, DoAction (ping/healthcheck). Python pyarrow.flight.connect("grpc://host:8815") for remote zero-copy-grade streaming. Stub mode when built without Flight. — 7 tests
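
The ASOF JOIN listed above pairs each left-hand row with the most recent right-hand row at or before that row's timestamp. A minimal Python sketch of the semantics — function and data shapes are illustrative, not ZeptoDB's C++ internals:

```python
from bisect import bisect_right

def asof_join(trades, quotes):
    """For each trade, attach the latest quote with ts <= trade ts.

    trades: list of (ts, symbol, price)
    quotes: dict symbol -> time-sorted list of (ts, bid)
    Returns a list of (ts, symbol, price, bid_or_None).
    """
    out = []
    for ts, sym, price in trades:
        qs = quotes.get(sym, [])
        # Index of the first quote strictly after ts; the one before it matches.
        i = bisect_right([q[0] for q in qs], ts)
        bid = qs[i - 1][1] if i > 0 else None
        out.append((ts, sym, price, bid))
    return out
```

A trade at t=5 against quotes at t=1 and t=7 picks the t=1 quote; a symbol with no quotes yields None.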
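
Among the window functions listed above, EMA is the one whose recurrence is easiest to get wrong. A sketch of the usual definition — the exact smoothing parameterization ZeptoDB uses is not specified here, so treat alpha's meaning as an assumption:

```python
def ema(alpha, xs):
    """Exponential moving average over a column, seeded with the first value."""
    out, prev = [], None
    for x in xs:
        # Standard recurrence: new = alpha * x + (1 - alpha) * previous
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out
```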
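
The xbar financial function and the OHLCV materialized-view aggregation above rest on the same time-bucketing idea: round each timestamp down to the start of its bucket, then fold rows into per-bucket bars. An illustrative sketch (names and the row shape are assumptions):

```python
def xbar(bucket_ns, ts_ns):
    """Round a timestamp down to the start of its bucket (kdb+ xbar style)."""
    return ts_ns - (ts_ns % bucket_ns)

def ohlcv(rows, bucket_ns):
    """Fold (ts, price, size) rows into [open, high, low, close, volume] bars per bucket."""
    bars = {}
    for ts, price, size in rows:
        b = xbar(bucket_ns, ts)
        if b not in bars:
            bars[b] = [price, price, price, price, 0]  # open, high, low, close, volume
        bar = bars[b]
        bar[1] = max(bar[1], price)   # high
        bar[2] = min(bar[2], price)   # low
        bar[3] = price                # close = last price seen
        bar[4] += size                # volume
    return bars
```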
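
The LowCardinality dictionary encoding behind the string-symbol feature stores each distinct string once and keeps a small integer code per row. A Python sketch of the idea — not ZeptoDB's actual column layout:

```python
class DictColumn:
    """Dictionary-encoded string column: distinct strings once, rows as int codes."""
    def __init__(self):
        self.codes = []      # per-row integer code
        self.strings = []    # code -> string
        self._index = {}     # string -> code (for O(1) appends)

    def append(self, s):
        code = self._index.get(s)
        if code is None:
            code = len(self.strings)
            self._index[s] = code
            self.strings.append(s)
        self.codes.append(code)

    def __getitem__(self, row):
        return self.strings[self.codes[row]]
```

GROUP BY and equality filters can then operate on the integer codes instead of comparing strings.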
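
The time-range index entry combines an O(1) partition skip with an O(log n) binary search inside each surviving partition. A sketch of that two-level scan, assuming per-partition min/max metadata and sorted timestamps (the data shapes are illustrative):

```python
from bisect import bisect_left, bisect_right

def range_scan(partitions, t0, t1):
    """partitions: list of (min_ts, max_ts, sorted_ts_list). Return all ts in [t0, t1]."""
    hits = []
    for lo, hi, ts in partitions:
        if hi < t0 or lo > t1:
            continue                      # O(1) partition skip via min/max metadata
        i = bisect_left(ts, t0)           # O(log n) lower bound
        j = bisect_right(ts, t1)          # O(log n) upper bound
        hits.extend(ts[i:j])
    return hits
```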
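
The FencingToken mentioned under Cluster Integrity is a standard split-brain defense: every RPC carries a monotonically increasing epoch, and a receiver rejects anything older than the highest epoch it has seen, so a deposed coordinator's writes are fenced off. A minimal sketch (class and method names are illustrative):

```python
class FencedReceiver:
    """Rejects RPCs carrying an epoch older than the highest epoch seen so far."""
    def __init__(self):
        self.highest_epoch = 0

    def accept(self, epoch, payload):
        if epoch < self.highest_epoch:
            return False            # stale leader: fenced off
        self.highest_epoch = epoch  # equal or newer epochs advance the fence
        return True
```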
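
The bandwidth throttler described in the rebalance entries uses a sliding one-second window with sleep-based backpressure. A simplified, single-threaded Python sketch of that mechanism — the real implementation uses thread-safe atomic counters, and this version is deliberately approximate:

```python
import time

class BandwidthThrottler:
    """Sliding 1-second window byte-rate limiter (sketch, not the C++ code)."""
    def __init__(self, max_bytes_per_sec):
        self.limit = max_bytes_per_sec   # 0 = unlimited
        self.window = []                 # (timestamp, nbytes) events in the last second

    def throttle(self, nbytes):
        """Block (sleep) if sending nbytes now would exceed the per-second budget."""
        if self.limit <= 0:
            return
        now = time.monotonic()
        self.window = [(t, n) for t, n in self.window if now - t < 1.0]
        used = sum(n for _, n in self.window)
        if used + nbytes > self.limit and self.window:
            # Sleep until the oldest event falls out of the 1-second window.
            time.sleep(max(0.0, 1.0 - (now - self.window[0][0])))
        self.window.append((time.monotonic(), nbytes))
```

With a 100-byte/s limit, a second 100-byte send immediately after the first blocks for about a second.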

Documentation — Getting Started & Onboarding

  • Quick Start Guide — 5-minute onboarding: Docker → INSERT → SELECT → Python → Web UI
  • Interactive Playground design — Browser-based sandboxed SQL editor with preloaded datasets, session isolation, rate limiting
  • Example Dataset Bundle design — --demo flag: 350K rows (trades/quotes/sensors), deterministic generation, starter queries on stdout
  • Dark/light theme toggle (QE-11) — CodeMirror theme syncs with MUI palette mode (TopBar toggle)
  • Result column sorting (QE-13) — click column header to cycle ASC/DESC/none, arrow indicators, numeric-aware sort
  • Result column filtering (QE-14) — per-column text filter row (toggle via filter icon), case-insensitive, match count display
  • Query history search & pin (QE-2) — search input in history panel, pin/unpin toggle, pinned items sorted to top, localStorage persistence
  • Saved queries (QE-3) — name + save to localStorage, load/delete from Saved panel, separate from history
  • Syntax error inline marker (QE-8) — parse error line from server response, highlight error line in CodeMirror with red decoration
  • Query execution cancel (QE-12) — AbortController-based cancellation, Run button becomes Cancel while loading, abort signal passed to fetch
  • Execution time history sparkline (QE-15) — SVG sparkline of last 20 query execution times, displayed in result header
  • EXPLAIN visualization (QE-4) — EXPLAIN results rendered as visual tree with colored operation/path/table nodes (+ server fix: string_rows JSON serialization for EXPLAIN/DDL)
  • Table detail page (/tables/[name]) — dedicated route with schema, column stats (min/max), row count cards, data preview; tables list navigates on click
  • Settings page enhancement — server info section (engine version, build date, health status) alongside runtime config
  • Login page polish — gradient accent, tagline chip, keyboard hint, Quick Start link, footer branding
  • Dashboard overview page — Health status, version info, 5 stat cards (ingested/stored/queries/partitions/latency), drop rate warning, ingestion rate live chart, tables summary with row counts, rows-per-table bar chart, avg query cost
  • Cluster status dashboard — Node topology ring visualization, partition distribution pie chart, node health table with store ratio bars, ticks-stored bar chart, time-series charts (ingestion/queries/latency per node), drop rate alert
  • Dashboard as default landing — / redirects to /dashboard, Dashboard first in sidebar, visible to all roles (admin/writer/reader/analyst/metrics)
  • API client template literal fix — Fixed broken string literals in api.ts and auth.tsx (backtick+double-quote mix from API variable introduction), all fetch URLs now use proper template literals with ${API} prefix
  • API URL consistency — All API calls use configurable API base path constant, supports both same-origin (Docker) and proxy (Next.js dev) modes
  • Docs site (docs.zeptodb.com) — mkdocs-material deployment
  • Docs nav update — Added 40+ missing pages (devlog 024-040, Flight API, multinode_stability, etc.)
  • Performance comparison page — vs kdb+/ClickHouse/TimescaleDB benchmark charts
  • SEO basics — sitemap, Open Graph, meta tags (mkdocs-material auto-generated)
  • GitHub README renewal — badges with logos, architecture diagram, emoji sections, GIF demo placeholder, navigation links, community section, updated test count (830+)
  • Community infrastructure — CONTRIBUTING.md, CODE_OF_CONDUCT.md, GitHub Issue templates (bug/feature/perf), FUNDING.yml
  • Community setup guide — Discord server structure (channels/roles/bots), GitHub Discussions categories — docs/community/COMMUNITY_SETUP.md
  • Registry submission content — Awesome Time-Series DB PR text, DB-Engines form data, DBDB/AlternativeTo/StackShare — docs/community/REGISTRY_SUBMISSIONS.md
  • Launch post drafts — Show HN, Reddit (r/programming, r/cpp, r/algotrading, r/selfhosted), timing strategy, launch day checklist — docs/community/LAUNCH_POSTS.md
  • Discord server created — Server ID 1492174712359354590, invite link https://discord.gg/zeptodb
  • Discord links added to Web UI — Join Discord button on home page, Discord link in sidebar
  • OIDC Discovery — OidcDiscovery::fetch(issuer_url) auto-populates jwks_uri, authorization/token endpoints from /.well-known/openid-configuration. AuthManager auto-registers IdP + JWT validator — 2 tests
  • Server-side sessions — SessionStore with cookie-based session management. Configurable TTL (1h default), sliding window refresh, HttpOnly/SameSite cookies. AuthManager::check_session() resolves cookie → AuthContext — 10 tests
  • Web UI SSO login flow — OAuth2 Authorization Code Flow: /auth/login (redirect to IdP), /auth/callback (code exchange → session cookie → redirect), /auth/session (Bearer → session), /auth/logout, /auth/me. Web UI “Sign in with SSO” button enabled, session-aware auth provider — 3 tests
  • JWT Refresh Token — OAuth2TokenExchange::refresh() exchanges refresh_token for a new access_token. POST /auth/refresh server endpoint. Session store tracks refresh_token per session. Web UI useAuth().refresh() hook — 4 tests
  • INTERVAL syntax — INTERVAL 'N unit' in SELECT and WHERE expressions. Supports seconds/minutes/hours/days/weeks/ms/μs/ns. Evaluates to nanoseconds. Works with NOW() - INTERVAL '5 minutes' in WHERE clauses — 3 tests
  • Prepared statement cache — Parsed AST cached by SQL hash (up to 4096 entries). Eliminates tokenize+parse overhead (~2-5μs) on repeated queries. Thread-safe with clear_prepared_cache() API — 1 test
  • Query result cache — TTL-based result cache for SELECT queries. enable_result_cache(max_entries, ttl_seconds). Auto-invalidated on INSERT/UPDATE/DELETE. Oldest-entry eviction when full — 2 tests
  • SAMPLE clause — SELECT * FROM trades SAMPLE 0.1 reads ~10% of rows. Deterministic hash-based sampling (splitmix64) for reproducible results. Works with WHERE, GROUP BY, aggregation. Shown in EXPLAIN plan — 8 tests
  • Scalar subqueries in WHERE — WHERE price > (SELECT avg(price) FROM trades) and WHERE symbol IN (SELECT symbol FROM ...). Uncorrelated subqueries evaluated once and substituted as literals before the outer scan. IN results auto-deduplicated. Error on multi-row/multi-column scalar subqueries — 8 tests
  • Docker Hub official image — docker pull zeptodb/zeptodb:0.0.1. GitHub Actions workflow (docker-publish.yml) builds on tag push (v*) or manual dispatch. Multi-stage build, non-root user, health check endpoint
  • GitHub Releases + binaries — Release workflow builds amd64 + arm64 tarballs, creates GitHub Release with download links on tag push
  • Homebrew Formula — homebrew-tap repo with auto-update workflow triggered on release via repository_dispatch
  • Node.js 24 migration — All workflows set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true to preempt June 2026 deprecation
  • Deprecated docs.yml cleanup — Removed legacy MkDocs deploy workflow (replaced by Astro Starlight site)
  • TestPyPI workflow fix — Changed test-pypi.yml to target TestPyPI (test.pypi.org) with separate testpypi environment
  • Product website — Astro Starlight site (zeptodb-site/). Landing page with hero + benchmark comparison table + use case cards + CTA
  • Features page — Ingestion engine, query engine, storage, client APIs, security, clustering, deployment
  • Benchmarks page — Hardware specs, ingestion throughput, query latency, Python zero-copy numbers
  • Use Cases (4 pages) — Trading & Finance, IoT, Robotics, Autonomous Vehicles with architecture diagrams and SQL examples
  • Competitor comparisons (4 pages) — vs kdb+, vs ClickHouse, vs InfluxDB, vs TimescaleDB
  • Pricing page — Community (Free/OSS) vs Enterprise tiers with FAQ
  • Blog (4 posts) — Introducing ZeptoDB, How ASOF JOIN Works, Zero-Copy Python (522ns), Lock-Free Ingestion (5.52M/sec)
  • About / Contact / Community pages — Mission, tech philosophy, contributing guide, roadmap
  • Security page — TLS, Auth, RBAC, Rate Limiting, Audit, Compliance matrix (SOC2/MiFID II/GDPR/PCI)
  • Integrations page — Feed handlers, client libraries, monitoring, storage/cloud, auth providers, roadmap integrations
  • Docs site deployment automation — GitHub Actions build-deploy.yml (push + repository_dispatch), sync-docs.mjs for ZeptoDB docs sync
  • Custom header navigation — Product/Solutions/Docs/Pricing/Community top nav with GitHub Stars badge
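
The SAMPLE clause listed above relies on splitmix64 for deterministic, reproducible sampling: a row is kept when its hash falls below a cutoff proportional to the sampling fraction. A sketch of the technique — whether ZeptoDB hashes the row index or some other key is an assumption here:

```python
MASK = 0xFFFFFFFFFFFFFFFF

def splitmix64(x):
    """Standard splitmix64 finalizer: mixes a 64-bit input into a 64-bit hash."""
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)

def sample(rows, fraction):
    """Deterministic SAMPLE: keep row i when splitmix64(i) < fraction * 2^64."""
    cutoff = int(fraction * 2**64)
    return [r for i, r in enumerate(rows) if splitmix64(i) < cutoff]
```

Because the hash depends only on the row index, the same query always returns the same sample.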
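
The query result cache entry above combines three policies: a TTL per entry, invalidate-everything on DML, and oldest-entry eviction when full. A compact Python sketch of those policies together (API and field names are illustrative):

```python
import time

class ResultCache:
    """TTL result cache for SELECTs; any INSERT/UPDATE/DELETE invalidates everything."""
    def __init__(self, max_entries, ttl_seconds):
        self.max_entries, self.ttl = max_entries, ttl_seconds
        self.entries = {}  # sql -> (inserted_at, rows)

    def get(self, sql):
        hit = self.entries.get(sql)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        self.entries.pop(sql, None)   # expired or missing
        return None

    def put(self, sql, rows):
        if len(self.entries) >= self.max_entries:
            oldest = min(self.entries, key=lambda k: self.entries[k][0])
            del self.entries[oldest]  # evict oldest entry when full
        self.entries[sql] = (time.monotonic(), rows)

    def invalidate_all(self):
        """Called on any DML, since any cached result may now be stale."""
        self.entries.clear()
```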
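
The INTERVAL literal described above evaluates to nanoseconds. A sketch of that evaluation, using the units the entry lists — the exact accepted unit spellings are an assumption:

```python
# Nanoseconds per unit; spellings here are illustrative, not ZeptoDB's grammar.
UNIT_NS = {
    "ns": 1, "us": 1_000, "μs": 1_000, "ms": 1_000_000,
    "seconds": 1_000_000_000, "minutes": 60_000_000_000,
    "hours": 3_600_000_000_000, "days": 86_400_000_000_000,
    "weeks": 604_800_000_000_000,
}

def interval_ns(literal):
    """Evaluate an INTERVAL body like '5 minutes' to a nanosecond count."""
    n, unit = literal.split()
    return int(n) * UNIT_NS[unit]
```

So NOW() - INTERVAL '5 minutes' becomes a plain nanosecond subtraction on the timestamp column.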

  • K8s compatibility test suite — 27 automated tests covering Helm lint/template, pod lifecycle, networking, rolling updates, PDB, scale up/down (tests/k8s/test_k8s_compat.py)
  • K8s HA + performance test suite — 6 HA tests (3-node spread, node drain, concurrent drain PDB block, pod kill recovery, zero-downtime rolling update, scale 3→5→3) + 5 performance benchmarks (tests/k8s/test_k8s_ha_perf.py)
  • EKS test cluster config — Lightweight cluster definition for automated testing (tests/k8s/eks-compat-cluster.yaml)
  • K8s test report — Full results, benchmark numbers, Helm chart issues found (docs/operations/K8S_TEST_REPORT.md)

  • bench_rebalance binary — HTTP-based load test measuring rebalance impact on throughput/latency (tests/bench/bench_rebalance.cpp)
  • Helm rebalance config — bench-rebalance-values.yaml with RebalanceManager enabled (deploy/helm/bench-rebalance-values.yaml)
  • Orchestration script — Automated test execution on EKS (deploy/scripts/run_rebalance_bench.sh)
  • Benchmark guide — Prerequisites, execution, expected results, cost estimate (docs/bench/rebalance_benchmark_guide.md)

Client API Compatibility Matrix: docs/design/client_compatibility.md