ZeptoDB — Completed Features
Last updated: 2026-04-07
Core Engine
- Phase E — E2E Pipeline MVP (5.52M ticks/sec)
- Phase B — SIMD + JIT (BitMask 11x, filter within kdb+ range)
- Phase A — HDB Tiered Storage (LZ4, 4.8GB/s flush)
- Phase D — Python Bridge (zero-copy, 4x vs Polars)
- Phase C — Distributed Cluster (UCX transport, 2ns routing)
SQL Engine
- SQL + HTTP — Parser (1.5~4.5μs) + ClickHouse API (port 8123)
- SQL Phase 1 — IN operator, IS NULL/NOT NULL, NOT, HAVING clause
- SQL Phase 2 — SELECT arithmetic (`price * volume AS notional`), CASE WHEN, multi-column GROUP BY
- SQL Phase 3 — Date/time functions (DATE_TRUNC/NOW/EPOCH_S/EPOCH_MS), LIKE/NOT LIKE, UNION ALL/DISTINCT/INTERSECT/EXCEPT
- SQL subqueries / CTE — WITH clause, FROM subquery, chained CTEs, distributed CTE — 12 tests
- SQL INSERT — INSERT INTO table VALUES, multi-row, column list, HTTP API (ClickHouse Compatible)
- SQL UPDATE / DELETE — UPDATE SET WHERE, DELETE FROM WHERE, in-place compaction
JOIN & Window Functions
- JOIN — ASOF, Hash, LEFT, RIGHT, FULL OUTER, Window JOIN
- FlatHashMap for joins — CRC32-intrinsic open-addressing hash map; replaces `std::unordered_map` in all join operators (ASOF, Hash, Window) — 9 unit tests
- Window functions — EMA, DELTA, RATIO, SUM, AVG, MIN, MAX, LAG, LEAD, ROW_NUMBER, RANK, DENSE_RANK
- Financial functions — xbar, FIRST, LAST, Window JOIN, UNION JOIN (uj), PLUS JOIN (pj), AJ0
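The FlatHashMap entry above is an open-addressing table probed with CRC32 hardware hashing. As a rough illustration of open addressing (not the engine's actual C++ code; the names and the use of Python's built-in `hash` are simplifications), a linear-probing map looks like:

```python
class FlatMap:
    """Open-addressing hash map with linear probing (sketch of the
    FlatHashMap idea; the real C++ version hashes with CRC32 intrinsics)."""

    _EMPTY = object()  # sentinel distinguishing empty slots from stored None

    def __init__(self, capacity=16):
        self._keys = [self._EMPTY] * capacity
        self._vals = [None] * capacity
        self._size = 0

    def _slot(self, key):
        # Probe linearly from the hash slot until the key or an empty slot.
        cap = len(self._keys)
        i = hash(key) % cap
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % cap
        return i

    def put(self, key, value):
        if (self._size + 1) * 2 > len(self._keys):  # keep load factor <= 0.5
            self._grow()
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i] = key
        self._vals[i] = value

    def get(self, key, default=None):
        i = self._slot(key)
        return default if self._keys[i] is self._EMPTY else self._vals[i]

    def _grow(self):
        old = [(k, v) for k, v in zip(self._keys, self._vals)
               if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (len(self._keys) * 2)
        self._vals = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self.put(k, v)
```

Open addressing keeps keys and values in flat arrays, so probes stay cache-friendly compared to the node-per-entry layout of `std::unordered_map`, which is the property the join operators exploit.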
Query Execution
- Parallel query — LocalQueryScheduler (scatter/gather, 3.48x@8T), CHUNKED mode
- Time range index — O(log n) binary search within partitions, O(1) partition skip
- Sorted column index — `p#`/`g#`-style sorted attribute, O(log n) binary search range scan, 269x vs full scan — 13 tests
- Materialized View — CREATE/DROP MATERIALIZED VIEW, incremental aggregation on ingest, OHLCV/SUM/COUNT/MIN/MAX/FIRST/LAST, xbar time bucket
- MV query rewrite — Automatic rewrite of SELECT GROUP BY into direct MV lookup when matching MV exists. O(n) → O(1) for aggregation queries — 6 tests (devlog 064)
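The time-range index above does an O(log n) binary search inside each partition and an O(1) skip of non-overlapping partitions. A sketch of the per-partition step, using Python's `bisect` (illustrative only; the engine implements this in C++ over columnar storage):

```python
import bisect

def range_scan(timestamps, values, t_start, t_end):
    """Return values whose timestamp t satisfies t_start <= t < t_end.

    `timestamps` must be sorted ascending, as in a time-partitioned column;
    two binary searches locate the window in O(log n) instead of a full scan.
    """
    lo = bisect.bisect_left(timestamps, t_start)
    hi = bisect.bisect_left(timestamps, t_end)
    return values[lo:hi]
```

Partition skip is the same idea one level up: compare each partition's [min, max] timestamp range against the query window and skip non-overlapping partitions without touching their data.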
Storage
- Parquet HDB — SNAPPY/ZSTD/LZ4_RAW, DuckDB/Polars/Spark direct query (Arrow C++ API)
- S3 HDB Flush — async upload, MinIO compatible, cloud data lake
- Storage tiering — Hot (memory) → Warm (SSD) → Cold (S3) → Drop, ALTER TABLE SET STORAGE POLICY, FlushManager auto-tiering
- DDL / Schema Management — CREATE TABLE, DROP TABLE (IF EXISTS), ALTER TABLE (ADD/DROP COLUMN, SET TTL), TTL auto-eviction — 8 tests
- Data Durability — Intra-day auto-snapshot (60s default), recovery replays on restart — max data loss ≤ 60s
Ingestion & Feed Handlers
- Feed Handlers — FIX, NASDAQ ITCH (350ns parsing)
- Kafka consumer — JSON/binary/human-readable decode, backpressure retry, Prometheus metrics, commit modes — 26 tests
- Connection hooks & session tracking — on_connect/on_disconnect callbacks, session list, idle eviction, query count — 7 tests
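The Kafka consumer entry mentions backpressure retry without spelling out the policy; a common shape is bounded exponential backoff, sketched here (the function and parameter names are hypothetical, not the zepto_py API):

```python
import time

def deliver_with_retry(send, record, max_retries=5, base_delay=0.05,
                       sleep=time.sleep):
    """Retry a failing downstream write with exponential backoff.

    `send` returns True on success. On repeated failure the record is
    surfaced to the caller so the consumer can pause the partition
    (backpressure) instead of dropping data. `sleep` is injectable so
    the backoff schedule is testable without real waits.
    """
    for attempt in range(max_retries + 1):
        if send(record):
            return True
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 50ms, 100ms, 200ms, ...
    return False  # caller pauses consumption / leaves the offset uncommitted
```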
Python Ecosystem
- zepto_py client library — from_pandas/polars/arrow, ArrowSession, StreamingSession, ApexConnection — 208 tests
- Python execute() — Full SQL access (SELECT, INSERT, UPDATE, DELETE, DDL, MV)
Security & Multi-Tenancy
- Enterprise Security — TLS/HTTPS, API Key + JWT/OIDC, RBAC, Rate Limiting, Admin REST API, Query Timeout/Kill, Secrets Management (Vault/File/Env), Audit Log (SOC2/EMIR/MiFID II) — 69 tests
- Vault-backed API Key Store — Write-through sync of API keys to HashiCorp Vault KV v2, multi-node key sharing via Vault, graceful degradation when Vault unavailable — 8 tests
- Multi-tenancy — TenantManager, per-tenant query concurrency quota, table namespace isolation, usage tracking
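The security list names Rate Limiting but not the algorithm; a token bucket is one standard choice. The sketch below is an assumption for illustration, with the clock passed in explicitly so refill behavior is testable:

```python
class TokenBucket:
    """Token-bucket rate limiter sketch. The feature list says only
    'Rate Limiting'; token bucket is a common implementation, shown here
    as an assumption, not the engine's documented algorithm."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec      # tokens refilled per second
        self.burst = burst            # maximum bucket size
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                  # request rejected (e.g. HTTP 429)
```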
Cluster & HA
- Cluster Integrity — Unified PartitionRouter, FencingToken in RPC (24-byte header), split-brain defense (K8s Lease), CoordinatorHA auto re-registration — 13 tests
- Distributed DML routing — INSERT routes to symbol node, UPDATE/DELETE broadcast, DDL broadcast
- RingConsensus (P8-Critical) — `RingConsensus` abstract interface + `EpochBroadcastConsensus` implementation. Coordinator epoch broadcast synchronizes the ring across all nodes. `RING_UPDATE`/`RING_ACK` RPC messages. `ClusterConfig::is_coordinator` flag. Plugin architecture replaceable with Raft (`set_consensus()`)
- CoordinatorHA ↔ K8sLease integration (P8-Critical) — K8sLease acquisition required on the standby→active promotion path (`require_lease`), `FencingToken::advance()` + RPC client epoch propagation, automatic demote on lease loss
- WalReplicator replication guarantee (P8-Critical) — Quorum write (`quorum_w`), failure retry queue (`max_retries`/`retry_queue_capacity`), backpressure (`backpressure` — producer block), backward compatible with existing async/sync modes
- Failover data recovery (P8-Critical) — Auto re-replication built into FailoverManager (`auto_re_replicate`/`async_re_replicate`). PartitionMigrator integration, node registration via `register_node()`, graceful fallback when unregistered
- Internal RPC security (P8-Critical) — `RpcSecurityConfig` shared-secret HMAC authentication. AUTH_HANDSHAKE/AUTH_OK/AUTH_REJECT protocol. mTLS configuration structure prepared
- HealthMonitor DEAD recovery (P8-High) — `REJOINING` state added (DEAD→REJOINING→ACTIVE). `on_rejoin()` callback for data resynchronization control. Router auto-readds the node on the REJOINING→ACTIVE transition in ClusterNode
- HealthMonitor UDP fault tolerance (P8-High) — Consecutive miss verification (default 3 times), fatal error on bind failure, secondary TCP heartbeat (dual verification with a TCP probe before the SUSPECT→DEAD transition)
- TcpRpcServer resource management (P8-High) — Thread pool conversion (detach → fixed worker pool + task queue), payload size limit (64MB), graceful drain (30-second timeout), concurrent connection limit (1024)
- PartitionRouter concurrency (P8-High) — Built-in `ring_mutex_` (`shared_mutex`). add/remove uses `unique_lock`, route/plan uses `shared_lock`. TOCTOU eliminated
- TcpRpcClient::ping() connection leak (P8-High) — `connect_to_server()` + `close()` → `acquire()`/`release()` pool recycling
- GossipNodeRegistry data race (P8-Medium) — `bool running_` → `std::atomic<bool>`. Multithreaded UB eliminated
- K8sNodeRegistry deadlock (P8-Medium) — `fire_event_unlocked()` removed. Changed to release the lock before invoking callbacks
- ClusterNode node rejoin (P8-Medium) — Seed connection success count, `std::runtime_error` on total failure. Bootstrap (no seeds) allowed normally
- SnapshotCoordinator consistency (P8-Medium) — 2PC (PREPARE→COMMIT/ABORT). Pauses ingest on all nodes then flushes at a consistent point-in-time. ABORT on all nodes on failure. `take_snapshot_legacy()` backward compatible
- K8sNodeRegistry actual implementation (P8-Medium) — `poll_loop()` performs a K8s Endpoints API HTTP GET. Auto-detects environment variables, SA token authentication, `parse_endpoints_json()` + `reconcile()` diff → JOINED/LEFT events
- PartitionMigrator atomicity (P8-Medium) — MoveState state machine (PENDING→DUAL_WRITE→COPYING→COMMITTED/FAILED), MigrationCheckpoint JSON disk persistence (save/load), `resume_plan()` retry (max_retries=3), `rollback_move()` — sends DELETE to dest on failure
- Dual-write ingestion wiring (P8-Feature) — `ClusterNode::ingest_tick()` checks `migration_target()` before routing; during partition migration, ticks are sent to both source and destination nodes to prevent data loss
- Live rebalancing (P8-Feature) — `RebalanceManager` orchestrates zero-downtime partition migration on node add/remove. Background thread with pause/resume/cancel, checkpoint support, sequential move execution via `PartitionMigrator`
- Load-based auto-rebalancing (P8-Feature) — `RebalancePolicy` with configurable imbalance ratio, check interval, and cooldown. Background policy thread monitors per-node partition counts via a `LoadProvider` callback and auto-triggers `start_remove_node()` on overloaded nodes
- Rebalance admin HTTP API (P8-Feature) — 5 REST endpoints (`/admin/rebalance/{status,start,pause,resume,cancel}`) for live rebalance control. Admin RBAC enforced, JSON request/response, 503 when not in cluster mode
- Rebalance hardening: `peer_rpc_clients_` thread safety (P8-Feature) — `std::shared_mutex` protects the `peer_rpc_clients_` map in `ClusterNode`. `shared_lock` for reads in the `remote_ingest()` hot path, `unique_lock` for writes. Race-safe lazy client creation — 1 test
- Rebalance hardening: move timeout (P8-Feature) — `move_timeout_sec` in `RebalanceConfig` (default 300s). `PartitionMigrator::execute_move()` wraps `migrate_symbol()` in `std::async` + `wait_for`. On timeout: FAILED + dual-write ended — 2 tests
- Rebalance hardening: query routing safety (P8-Feature) — `recently_migrated_` map in `PartitionRouter`. After `end_migration()`, `recently_migrated(symbol)` returns `{from, to}` during a grace period (default 30s). Auto-expires. Query layer reads from both nodes during the transition — 5 tests
- Partial-move rebalance API (P8-Feature) — `start_move_partitions(vector<Move>)` moves specific symbols between existing nodes without a full drain. HTTP `move_partitions` action in `/admin/rebalance/start`. No ring topology broadcast — 6 tests
- Rebalance progress in Web UI (P8-Feature) — cluster dashboard panel showing live rebalance state, progress bar, completed/failed/total moves, current symbol. Auto-refreshes every 2s via `/admin/rebalance/status`
- Rebalance history endpoint (P8-Feature) — `GET /admin/rebalance/history` returns past rebalance events (action, node, moves, duration, cancelled). In-memory ring buffer (max 50). Web UI history table on the cluster dashboard — 5 tests
- Rebalance ring broadcast (P8-Feature) — `RebalanceManager` calls `RingConsensus::propose_add/remove()` after all moves complete, synchronizing the hash ring across all cluster nodes. Skipped on cancel. `set_consensus()` setter, `RebalanceAction` enum — 3 tests
- Rebalance bandwidth throttling (P8-Feature) — `BandwidthThrottler` rate-limits partition migration data transfer. Configurable `max_bandwidth_mbps` (0 = unlimited), runtime adjustable via `set_max_bandwidth_mbps()`. Sliding 1-second window with sleep-based backpressure, thread-safe atomic counters. Wired into `PartitionMigrator::migrate_symbol()`, exposed in `/admin/rebalance/status` JSON — 10 tests
- PTP clock sync detection (P8-Feature) — `PtpClockDetector` checks PTP hardware/chrony/timesyncd synchronization quality. 4 states (SYNCED/DEGRADED/UNSYNC/UNAVAILABLE). `strict_mode` rejects distributed ASOF JOIN on bad sync. `GET /admin/clock` endpoint — 22 tests
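The `BandwidthThrottler` bullets describe a sliding 1-second window with sleep-based backpressure. A minimal sketch of that accounting, with clock and sleep injected for testability (the real implementation is C++ with atomic counters; this mirrors only the window logic):

```python
class BandwidthThrottler:
    """Sliding-window bandwidth throttler sketch.

    Bytes sent in the current 1-second window are accumulated; when a
    transfer would exceed the budget, the caller sleeps until the window
    rolls over (backpressure on the sender)."""

    def __init__(self, max_bandwidth_mbps, clock, sleep):
        self.limit_bytes = max_bandwidth_mbps * 1024 * 1024  # bytes/second
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.window_bytes = 0

    def acquire(self, nbytes):
        """Account nbytes of transfer; block until it fits the window."""
        if self.limit_bytes == 0:            # 0 = unlimited, per the config
            return
        now = self.clock()
        if now - self.window_start >= 1.0:   # roll the 1-second window
            self.window_start = now
            self.window_bytes = 0
        if self.window_bytes + nbytes > self.limit_bytes:
            self.sleep(self.window_start + 1.0 - now)  # back-pressure
            self.window_start = self.clock()
            self.window_bytes = 0
        self.window_bytes += nbytes
```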
Operations & Deployment
- Production operations — monitoring, backup, systemd service
- Kubernetes operations — Helm chart (PDB/HPA/ServiceMonitor), rolling upgrade, K8s operations guide, Karpenter Fleet API
- ARM Graviton build verification — aarch64 (Amazon Linux 2023, Clang 19.1.7), 766/766 tests passing, xbar 7.99ms (1M rows)
- Metrics provider — pluggable Prometheus metrics, Kafka stats integration — 4 tests
- Task scheduler — interval/once jobs, cancel, exception-safe, monotonic clock — 18 tests
- Multi-node metrics collection — METRICS_REQUEST/METRICS_RESULT RPC, parallel fan-out, ClusterNode callback registration — 10 tests
- HTTP observability — structured JSON access log, slow query log (>100ms), X-Request-Id tracing, server lifecycle events, Prometheus http_requests_total/active_sessions — 2 tests
- `/whoami` endpoint — returns authenticated role and subject for reliable client-side role detection — 1 test
- Web UI cluster page — node status table, per-node metrics history charts (ingestion/queries/latency), recharts type fix
- API key granular control — symbol/table ACL, tenant binding, key expiry, PATCH update endpoint, Web UI create/edit dialogs — 6 tests
- Query Editor: resizable height (QE-10) — drag divider between editor and result area, 80–600px range, replaces fixed 180px
- Query Editor: schema sidebar (QE-6) — left panel with table/column tree, click to insert into editor, refresh button
- Query Editor: ZeptoDB function autocomplete (QE-7) — `xbar`, `vwap`, `ema`, `wma`, `mavg`, `msum`, `deltas`, `ratios`, `fills` + SQL keyword snippets (ASOF JOIN, EXPLAIN, etc.)
- Query Editor: result chart view (QE-5) — table/chart toggle (line/bar), X/Y column selectors, Recharts, 500-row cap
- Query Editor: multi-tab editor (QE-1) — add/close/rename tabs, independent code & results per tab, localStorage persistence
- Query Editor: multi-statement run (QE-9) — `;`-split sequential execution, per-statement result sub-tabs, per-statement error display
- SSO/JWT CLI + JWKS auto-fetch — `--jwt-*`/`--jwks-url` CLI flags, JWKS background key rotation, kid-based multi-key, `POST /admin/auth/reload` runtime refresh — 3 tests
- Bare-metal tuning guide — CPU pinning, NUMA, hugepages, C-state, tcmalloc/LTO/PGO build, network tuning, benchmarking — `docs/deployment/BARE_METAL_TUNING.md`
Migration Toolkit
- Migration toolkit — kdb+ HDB loader, q→SQL, ClickHouse DDL/query translation, DuckDB Parquet, TimescaleDB hypertable — 126 tests
Data Types
- Native float/double — IEEE 754 float32/float64 in storage, SQL, and HTTP output
- String symbol (dictionary-encoded) — `INSERT`/`SELECT`/`WHERE`/`GROUP BY`/`VWAP`/`FIRST`/`LAST` with `'AAPL'` syntax, LowCardinality dictionary encoding, distributed scatter-gather support — 29 tests
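LowCardinality dictionary encoding stores each distinct symbol once and keeps small integer codes per row. A minimal sketch (the class and field names are illustrative, not the engine's API):

```python
class DictColumn:
    """Dictionary-encoded string column sketch: distinct strings live in
    `dict` exactly once; each row stores only an integer code into it."""

    def __init__(self):
        self.dict = []    # code -> string (one entry per distinct symbol)
        self.codes = {}   # string -> code (reverse lookup for ingest)
        self.rows = []    # per-row integer codes

    def append(self, symbol):
        code = self.codes.get(symbol)
        if code is None:               # first time we see this symbol
            code = len(self.dict)
            self.codes[symbol] = code
            self.dict.append(symbol)
        self.rows.append(code)

    def get(self, i):
        return self.dict[self.rows[i]]
```

A filter like `WHERE symbol = 'AAPL'` then compares one integer code per row instead of a string, which is also what makes distributed scatter-gather on symbol cheap.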
Connectivity
- Arrow Flight server (P3) — gRPC-based Arrow Flight RPC: DoGet (SQL → RecordBatch stream), DoPut (ingest), GetFlightInfo, ListFlights, DoAction (ping/healthcheck). Python `pyarrow.flight.connect("grpc://host:8815")` for remote zero-copy-grade streaming. Stub mode when built without Flight — 7 tests
Documentation — Getting Started & Onboarding
- Quick Start Guide — 5-minute onboarding: Docker → INSERT → SELECT → Python → Web UI
- Interactive Playground design — Browser-based sandboxed SQL editor with preloaded datasets, session isolation, rate limiting
- Example Dataset Bundle design — `--demo` flag: 350K rows (trades/quotes/sensors), deterministic generation, starter queries on stdout
Query Editor Enhancements (Phase 2)
- Dark/light theme toggle (QE-11) — CodeMirror theme syncs with MUI palette mode (TopBar toggle)
- Result column sorting (QE-13) — click column header to cycle ASC/DESC/none, arrow indicators, numeric-aware sort
- Result column filtering (QE-14) — per-column text filter row (toggle via filter icon), case-insensitive, match count display
- Query history search & pin (QE-2) — search input in history panel, pin/unpin toggle, pinned items sorted to top, localStorage persistence
- Saved queries (QE-3) — name + save to localStorage, load/delete from Saved panel, separate from history
- Syntax error inline marker (QE-8) — parse error line from server response, highlight error line in CodeMirror with red decoration
- Query execution cancel (QE-12) — AbortController-based cancellation, Run button becomes Cancel while loading, abort signal passed to fetch
- Execution time history sparkline (QE-15) — SVG sparkline of last 20 query execution times, displayed in result header
- EXPLAIN visualization (QE-4) — EXPLAIN results rendered as visual tree with colored operation/path/table nodes (+ server fix: string_rows JSON serialization for EXPLAIN/DDL)
- Table detail page (`/tables/[name]`) — dedicated route with schema, column stats (min/max), row count cards, data preview; tables list navigates on click
- Settings page enhancement — server info section (engine version, build date, health status) alongside runtime config
- Login page polish — gradient accent, tagline chip, keyboard hint, Quick Start link, footer branding
Web UI — Dashboard & Overview (P1)
- Dashboard overview page — Health status, version info, 5 stat cards (ingested/stored/queries/partitions/latency), drop rate warning, ingestion rate live chart, tables summary with row counts, rows-per-table bar chart, avg query cost
- Cluster status dashboard — Node topology ring visualization, partition distribution pie chart, node health table with store ratio bars, ticks-stored bar chart, time-series charts (ingestion/queries/latency per node), drop rate alert
- Dashboard as default landing — `/` redirects to `/dashboard`, Dashboard first in sidebar, visible to all roles (admin/writer/reader/analyst/metrics)
Bug Fixes
- API client template literal fix — Fixed broken string literals in `api.ts` and `auth.tsx` (backtick + double-quote mix from the `API` variable introduction); all fetch URLs now use proper template literals with the `${API}` prefix
- API URL consistency — All API calls use the configurable `API` base path constant, supporting both same-origin (Docker) and proxy (Next.js dev) modes
Website & Docs (P2)
- Docs site (docs.zeptodb.com) — mkdocs-material deployment
- Docs nav update — Added 40+ missing pages (devlog 024-040, Flight API, multinode_stability, etc.)
- Performance comparison page — vs kdb+/ClickHouse/TimescaleDB benchmark charts
SEO & Community (P2)
- SEO basics — sitemap, Open Graph, meta tags (mkdocs-material auto-generated)
- GitHub README renewal — badges with logos, architecture diagram, emoji sections, GIF demo placeholder, navigation links, community section, updated test count (830+)
- Community infrastructure — CONTRIBUTING.md, CODE_OF_CONDUCT.md, GitHub Issue templates (bug/feature/perf), FUNDING.yml
- Community setup guide — Discord server structure (channels/roles/bots), GitHub Discussions categories — `docs/community/COMMUNITY_SETUP.md`
- Registry submission content — Awesome Time-Series DB PR text, DB-Engines form data, DBDB/AlternativeTo/StackShare — `docs/community/REGISTRY_SUBMISSIONS.md`
- Launch post drafts — Show HN, Reddit (r/programming, r/cpp, r/algotrading, r/selfhosted), timing strategy, launch day checklist — `docs/community/LAUNCH_POSTS.md`
- Discord server created — Server ID 1492174712359354590, invite link https://discord.gg/zeptodb
- Discord links added to Web UI — Join Discord button on home page, Discord link in sidebar
SSO / Identity Enhancement (P6)
- OIDC Discovery — `OidcDiscovery::fetch(issuer_url)` auto-populates jwks_uri and the authorization/token endpoints from `/.well-known/openid-configuration`. AuthManager auto-registers IdP + JWT validator — 2 tests
- Server-side sessions — `SessionStore` with cookie-based session management. Configurable TTL (1h default), sliding-window refresh, HttpOnly/SameSite cookies. `AuthManager::check_session()` resolves cookie → AuthContext — 10 tests
- Web UI SSO login flow — OAuth2 Authorization Code Flow: `/auth/login` (redirect to IdP), `/auth/callback` (code exchange → session cookie → redirect), `/auth/session` (Bearer → session), `/auth/logout`, `/auth/me`. Web UI "Sign in with SSO" button enabled, session-aware auth provider — 3 tests
- JWT Refresh Token — `OAuth2TokenExchange::refresh()` exchanges refresh_token for a new access_token. `POST /auth/refresh` server endpoint. Session store tracks refresh_token per session. Web UI `useAuth().refresh()` hook — 4 tests
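The server-side sessions entry describes a TTL with sliding-window refresh. A minimal sketch of that expiry policy (the names and the injectable clock are illustrative, not the server's actual `SessionStore` interface):

```python
import time

class SessionStore:
    """Session store sketch with TTL and sliding-window refresh:
    every valid lookup pushes the expiry forward by the full TTL,
    so only an idle session actually expires."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.sessions = {}   # session_id -> (auth_context, expires_at)

    def create(self, session_id, auth_context):
        self.sessions[session_id] = (auth_context, self.clock() + self.ttl)

    def check(self, session_id):
        """Resolve cookie -> auth context; each valid hit slides the expiry."""
        entry = self.sessions.get(session_id)
        if entry is None:
            return None
        ctx, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self.sessions[session_id]              # expired: evict
            return None
        self.sessions[session_id] = (ctx, now + self.ttl)  # sliding refresh
        return ctx
```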
Engine Performance (P7 Tier A)
- INTERVAL syntax — `INTERVAL 'N unit'` in SELECT and WHERE expressions. Supports seconds/minutes/hours/days/weeks/ms/μs/ns. Evaluates to nanoseconds. Works with `NOW() - INTERVAL '5 minutes'` in WHERE clauses — 3 tests
- Prepared statement cache — Parsed AST cached by SQL hash (up to 4096 entries). Eliminates tokenize+parse overhead (~2–5μs) on repeated queries. Thread-safe, with a `clear_prepared_cache()` API — 1 test
- Query result cache — TTL-based result cache for SELECT queries. `enable_result_cache(max_entries, ttl_seconds)`. Auto-invalidated on INSERT/UPDATE/DELETE. Oldest-entry eviction when full — 2 tests
- SAMPLE clause — `SELECT * FROM trades SAMPLE 0.1` reads ~10% of rows. Deterministic hash-based sampling (splitmix64) for reproducible results. Works with WHERE, GROUP BY, and aggregation. Shown in the EXPLAIN plan — 8 tests
- Scalar subqueries in WHERE — `WHERE price > (SELECT avg(price) FROM trades)` and `WHERE symbol IN (SELECT symbol FROM ...)`. Uncorrelated subqueries are evaluated once and substituted as literals before the outer scan. IN results auto-deduplicated. Error on multi-row/multi-column scalar subqueries — 8 tests
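The SAMPLE clause's determinism comes from hashing each row with splitmix64 and comparing the hash against a fixed threshold, so the same rows are kept on every run. A sketch (the mixing constants are the standard public splitmix64 ones; how the engine keys each row is an assumption here, any stable per-row integer works):

```python
MASK64 = (1 << 64) - 1

def splitmix64(x):
    """splitmix64 finalizer/mixer (public-domain constants)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def sample(row_ids, fraction):
    """Keep ~fraction of rows, as in SELECT ... SAMPLE 0.1.

    Hash-based thresholding: a row survives iff its mixed id falls below
    fraction * 2^64, so the choice is deterministic and reproducible.
    """
    threshold = int(fraction * (1 << 64))
    return [r for r in row_ids if splitmix64(r) < threshold]
```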
Package Distribution (P2)
- Docker Hub official image — `docker pull zeptodb/zeptodb:0.0.1`. GitHub Actions workflow (`docker-publish.yml`) builds on tag push (`v*`) or manual dispatch. Multi-stage build, non-root user, health check endpoint
- GitHub Releases + binaries — Release workflow builds amd64 + arm64 tarballs and creates a GitHub Release with download links on tag push
- Homebrew Formula — `homebrew-tap` repo with an auto-update workflow triggered on release via `repository_dispatch`
CI/CD (P2)
- Node.js 24 migration — All workflows set `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` to preempt the June 2026 deprecation
- Deprecated docs.yml cleanup — Removed the legacy MkDocs deploy workflow (replaced by the Astro Starlight site)
- TestPyPI workflow fix — Changed `test-pypi.yml` to target TestPyPI (test.pypi.org) with a separate `testpypi` environment
Website (P2)
- Product website — Astro Starlight site (`zeptodb-site/`). Landing page with hero + benchmark comparison table + use case cards + CTA
- Benchmarks page — Hardware specs, ingestion throughput, query latency, Python zero-copy numbers
- Use Cases (4 pages) — Trading & Finance, IoT, Robotics, Autonomous Vehicles with architecture diagrams and SQL examples
- Competitor comparisons (4 pages) — vs kdb+, vs ClickHouse, vs InfluxDB, vs TimescaleDB
- Pricing page — Community (Free/OSS) vs Enterprise tiers with FAQ
- Blog (4 posts) — Introducing ZeptoDB, How ASOF JOIN Works, Zero-Copy Python (522ns), Lock-Free Ingestion (5.52M/sec)
- About / Contact / Community pages — Mission, tech philosophy, contributing guide, roadmap
- Security page — TLS, Auth, RBAC, Rate Limiting, Audit, Compliance matrix (SOC2/MiFID II/GDPR/PCI)
- Integrations page — Feed handlers, client libraries, monitoring, storage/cloud, auth providers, roadmap integrations
- Docs site deployment automation — GitHub Actions `build-deploy.yml` (push + repository_dispatch), `sync-docs.mjs` for ZeptoDB docs sync
- Custom header navigation — Product/Solutions/Docs/Pricing/Community top nav with GitHub Stars badge
Kubernetes Compatibility & HA Testing
- K8s compatibility test suite — 27 automated tests covering Helm lint/template, pod lifecycle, networking, rolling updates, PDB, scale up/down (`tests/k8s/test_k8s_compat.py`)
- K8s HA + performance test suite — 6 HA tests (3-node spread, node drain, concurrent drain PDB block, pod kill recovery, zero-downtime rolling update, scale 3→5→3) + 5 performance benchmarks (`tests/k8s/test_k8s_ha_perf.py`)
- EKS test cluster config — Lightweight cluster definition for automated testing (`tests/k8s/eks-compat-cluster.yaml`)
- K8s test report — Full results, benchmark numbers, Helm chart issues found (`docs/operations/K8S_TEST_REPORT.md`)
Live Rebalancing Load Test
- bench_rebalance binary — HTTP-based load test measuring rebalance impact on throughput/latency (`tests/bench/bench_rebalance.cpp`)
- Helm rebalance config — `bench-rebalance-values.yaml` with RebalanceManager enabled (`deploy/helm/bench-rebalance-values.yaml`)
- Orchestration script — Automated test execution on EKS (`deploy/scripts/run_rebalance_bench.sh`)
- Benchmark guide — Prerequisites, execution, expected results, cost estimate (`docs/bench/rebalance_benchmark_guide.md`)
Client API Compatibility Matrix
- Per-client, per-endpoint support matrix — `docs/design/client_compatibility.md`