ZeptoDB Kubernetes Operations Guide
Last updated: 2026-04-30
Table of Contents
- Architecture Overview
- Initial Deployment
- Day-2 Operations
- Monitoring & Alerting
- Scaling
- Backup & Recovery
- Upgrades & Rollback
- Security
- Cluster Mode
- Troubleshooting
- Runbooks
See also: Failure Scenarios & Recovery Guide — Automatic/manual recovery procedures for 8 failure scenarios
1. Architecture Overview
    ┌─────────────────────────────────────────────────────────────┐
    │                     Kubernetes Cluster                      │
    │                                                             │
    │  ┌──────────────────────────────────────────────────────┐   │
    │  │  Namespace: zeptodb                                  │   │
    │  │                                                      │   │
    │  │  ┌─────────┐   ┌─────────┐  ┌─────────┐              │   │
    │  │  │  Pod-0  │   │  Pod-1  │  │  Pod-2  │  ← Deployment│   │
    │  │  │ ZeptoDB │   │ ZeptoDB │  │ ZeptoDB │  (3 replicas)│   │
    │  │  │  :8123  │   │  :8123  │  │  :8123  │              │   │
    │  │  └────┬────┘   └────┬────┘  └────┬────┘              │   │
    │  │       │             │            │                   │   │
    │  │  ┌────┴─────────────┴────────────┴────┐              │   │
    │  │  │ Service (LoadBalancer :8123)       │              │   │
    │  │  │ + Headless Service (pod discovery) │              │   │
    │  │  └────────────────────────────────────┘              │   │
    │  │                                                      │   │
    │  │  ConfigMap │ PVC (gp3 500Gi) │ PDB │ HPA │ ServiceMon│   │
    │  └──────────────────────────────────────────────────────┘   │
    │                                                             │
    │  ┌──────────────────────┐  ┌─────────────────────────────┐  │
    │  │ Prometheus           │  │ Grafana                     │  │
    │  │ ServiceMonitor 15s   │  │ Dashboard + 9 Alert Rules   │  │
    │  └──────────────────────┘  └─────────────────────────────┘  │
    └─────────────────────────────────────────────────────────────┘

Helm Chart Components
| Resource | Template | Purpose |
|---|---|---|
| Deployment | deployment.yaml | ZeptoDB pods (rolling update) |
| Service | service.yaml | LoadBalancer + Headless |
| ConfigMap | configmap.yaml | zeptodb.conf |
| PVC | pvc.yaml | gp3 500Gi persistent storage |
| HPA | hpa.yaml | Auto-scaling (3–10 replicas) |
| PDB | pdb.yaml | minAvailable: 2 |
| ServiceMonitor | servicemonitor.yaml | Prometheus scrape config |
2. Initial Deployment
Prerequisites

    # Required
    kubectl version --client   # 1.26+
    helm version               # 3.x

    # Verify cluster access
    kubectl cluster-info
    kubectl get nodes

Deploy with Helm (Recommended)

    # Create namespace
    kubectl create namespace zeptodb

    # Install
    helm install zeptodb ./deploy/helm/zeptodb \
      -n zeptodb \
      --set image.repository=your-registry/zeptodb \
      --set image.tag=1.0.0

    # Verify
    kubectl get all -n zeptodb

Production values override
values-prod.yaml:

    replicaCount: 3

    image:
      repository: your-registry/zeptodb
      tag: "1.0.0"

    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
      limits:
        cpu: "8"
        memory: "32Gi"

    persistence:
      storageClass: gp3
      size: 500Gi

    config:
      workerThreads: 8
      parallelThreshold: 100000

    autoscaling:
      enabled: true
      minReplicas: 3
      maxReplicas: 10

    podDisruptionBudget:
      enabled: true
      minAvailable: 2

    # Graviton (ARM) nodes
    nodeSelector:
      kubernetes.io/arch: arm64
      # or for x86:
      # kubernetes.io/arch: amd64

Install with the overrides:

    helm install zeptodb ./deploy/helm/zeptodb \
      -n zeptodb \
      -f values-prod.yaml \
      --wait --timeout 5m

Post-Deploy Verification

    # All pods running
    kubectl get pods -n zeptodb -o wide

    # Health check
    export LB=$(kubectl get svc zeptodb -n zeptodb \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    curl -s http://$LB:8123/health
    curl -s http://$LB:8123/ready

    # Test query
    curl -X POST http://$LB:8123/ -d 'SELECT 1'

3. Day-2 Operations
Daily Checks

    #!/bin/bash
    # daily-check.sh: run from cron or manually

    NS=zeptodb
    LB=$(kubectl get svc zeptodb -n $NS \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

    echo "=== Pod Status ==="
    kubectl get pods -n $NS -o wide

    echo "=== Health ==="
    curl -sf http://$LB:8123/health && echo " OK" || echo " FAIL"

    echo "=== Readiness ==="
    curl -sf http://$LB:8123/ready && echo " OK" || echo " FAIL"

    echo "=== HPA ==="
    kubectl get hpa -n $NS

    echo "=== PVC ==="
    kubectl get pvc -n $NS

    echo "=== Recent Events ==="
    kubectl get events -n $NS --sort-by='.lastTimestamp' | tail -10

Configuration Changes
When a ConfigMap is changed, the checksum/config annotation automatically triggers a rollout.

    # Change worker threads
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set config.workerThreads=16 \
      --wait

    # Change multiple settings at once
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      -f values-prod.yaml \
      --set config.workerThreads=16 \
      --set config.queryCacheSize=2000 \
      --wait

Checking Logs

    # Logs for a specific pod
    kubectl logs -f <pod-name> -n zeptodb

    # Logs for all pods (stern recommended)
    stern zeptodb -n zeptodb

    # Previous crash logs
    kubectl logs <pod-name> -n zeptodb --previous

    # Logs since a specific time
    kubectl logs <pod-name> -n zeptodb --since=1h

Pod Restart

    # Full rolling restart (zero-downtime)
    kubectl rollout restart deployment/zeptodb -n zeptodb

    # Delete a specific pod only (Deployment auto-recreates it)
    kubectl delete pod <pod-name> -n zeptodb

4. Monitoring & Alerting
Prometheus Setup

    # Enable ServiceMonitor (requires Prometheus Operator)
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set serviceMonitor.enabled=true \
      --set serviceMonitor.interval=15s

In environments without ServiceMonitor, use Pod annotation-based scraping:

    # Already included in deployment.yaml
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8123"
      prometheus.io/path: "/metrics"

Key Metrics

    # Check directly
    curl -s http://$LB:8123/metrics

| Metric | Type | Alert Threshold |
|---|---|---|
| zepto_server_up | gauge | == 0 → critical |
| zepto_server_ready | gauge | == 0 for 5m → warning |
| zepto_ticks_ingested_total | counter | rate < 1000/s → warning |
| zepto_ticks_dropped_total | counter | rate > 1000/s → warning |
| zepto_queries_executed_total | counter | rate > 100/s → info |
| zepto_rows_scanned_total | counter | rate > 10M/s → warning |
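The counter thresholds above are per-second rates, so a one-off manual check needs two samples of the same counter from the same pod. The sketch below shows only the arithmetic. The helper name `counter_rate` is hypothetical, and it assumes integer counter values and that a lower second sample means the pod restarted and the counter began again at zero.

```shell
# counter_rate FIRST SECOND SECONDS
# Prints the per-second increase of a monotonic counter between two samples.
# If SECOND is lower than FIRST, the counter is assumed to have reset to zero.
counter_rate() {
  local first=$1 second=$2 seconds=$3
  if [ "$second" -lt "$first" ]; then
    first=0
  fi
  echo $(( (second - first) / seconds ))
}
```

Take both samples from a single pod, for example through `kubectl port-forward`, because requests sent through the LoadBalancer may reach a different pod each time.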
Alert Rules (9 rules)
Defined in monitoring/zeptodb-alerts.yml:
| Alert | Severity | Condition |
|---|---|---|
| ApexDBDown | critical | zepto_server_up == 0 for 1m |
| ApexDBNotReady | warning | zepto_server_ready == 0 for 5m |
| HighTickDropRate | warning | drop rate > 1000/s for 2m |
| HighQueryRate | info | query rate > 100/s for 5m |
| HighRowScanRate | warning | scan rate > 10M/s for 5m |
| LowIngestionRate | warning | ingestion < 1000/s for 10m |
| HighDiskUsage | warning | disk < 20% free for 5m |
| HighMemoryUsage | warning | memory < 10% free for 5m |
| HighCPUUsage | warning | CPU > 90% for 10m |
Grafana Dashboard

    # Import dashboard
    kubectl create configmap grafana-zeptodb \
      -n monitoring \
      --from-file=monitoring/grafana-dashboard.json

    # Or import via Grafana UI → Import → monitoring/grafana-dashboard.json

Grafana can also query ZeptoDB directly through its ClickHouse data source, because port 8123 serves a ClickHouse-compatible API.
5. Scaling
Cluster Requirements
See EKS Cluster Requirements for full cluster setup including K8s version, Auto Mode, and custom NodePool configuration.
EKS Auto Mode (Node Auto-Scaling)
EKS Auto Mode includes built-in Karpenter, so no separate install is needed. Nodes are provisioned via the EC2 Fleet API when pods are pending.

    # Check node pools (built-in + custom)
    kubectl get nodepools
    kubectl get nodeclasses

    # Check node claims (active nodes)
    kubectl get nodeclaims

    # Monitor scaling events
    kubectl describe nodepool zepto-realtime
    kubectl describe nodepool zepto-analytics

Two custom node pools are configured:
| Pool | Trigger | Capacity | Consolidation |
|---|---|---|---|
| zepto-realtime | Pending pods with zeptodb.com/role: realtime | On-Demand only | WhenEmpty, after 30m |
| zepto-analytics | Pending pods with zeptodb.com/role: analytics | Spot + On-Demand | WhenEmptyOrUnderutilized, after 5m |
Scaling flow: HPA increases replicas → pods pending → Auto Mode provisions node (30-60s) → pods scheduled.
Horizontal Pod Autoscaler (HPA)
Default configuration: auto-scales between 3 and 10 replicas based on CPU 70% / Memory 80% thresholds.

    # Check HPA status
    kubectl get hpa -n zeptodb
    kubectl describe hpa zeptodb -n zeptodb

    # Manual scale
    kubectl scale deployment zeptodb -n zeptodb --replicas=5

    # Change HPA settings
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set autoscaling.minReplicas=5 \
      --set autoscaling.maxReplicas=20 \
      --set autoscaling.targetCPU=60

Ingest-rate HPA (P8-I4, devlog 117)
For production ingest workloads, CPU/memory utilization is a poor proxy:
a pod can be CPU-idle while its ring buffer saturates, or CPU-busy on
queries while ingest is light. ZeptoDB exposes
zepto_ingest_ticks_per_sec on GET /metrics (a per-pod gauge of the
instantaneous ingest rate) so the HPA can autoscale on real ingest load.
CPU/memory metrics remain configured on the same HPA as a safety net.
Prerequisites. The custom Pods metric requires prometheus-adapter
to expose zepto_ingest_ticks_per_sec as pods/zepto_ingest_ticks_per_sec.
A minimal rule snippet:

    # prometheus-adapter ConfigMap
    rules:
      - seriesQuery: 'zepto_ingest_ticks_per_sec{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: { resource: namespace }
            pod: { resource: pod }
        name:
          matches: "^(.*)$"
          as: "$1"
        metricsQuery: |
          avg_over_time(<<.Series>>{<<.LabelMatchers>>}[1m])

Enable on the chart. Off by default; set both flags on helm upgrade:

    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set autoscaling.ingestRateEnabled=true \
      --set autoscaling.targetIngestRate=50000   # ticks/sec per pod

targetIngestRate is the AverageValue for the HPA Pods metric: scale-out
is triggered when the per-pod 1-minute average ingest rate exceeds it.
Tune it to ~70–80% of a single pod's measured sustained ingest ceiling
(see docs/devlog/102_ingest_scale_phase1.md for the underlying
ingest-path tunables, pipeline.drainThreads and
pipeline.ringBufferCapacity).
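As a rough illustration of the 70–80% rule, the sketch below derives a targetIngestRate from a measured per-pod ceiling. The helper name `target_ingest_rate` and the 75% default are assumptions made for this example; they are not part of the chart.

```shell
# target_ingest_rate CEILING [PERCENT]
# CEILING: measured sustained ticks/sec of a single pod.
# PERCENT: share of the ceiling to use as the HPA target (assumed default: 75).
target_ingest_rate() {
  local ceiling=$1 percent=${2:-75}
  echo $(( ceiling * percent / 100 ))
}
```

For a pod that sustains 70000 ticks/sec, `target_ingest_rate 70000` prints 52500, which could then be passed to `--set autoscaling.targetIngestRate=...`.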
Karpenter compatibility. No special config required — the standard HPA → pending pods → Karpenter scale-out path works unchanged. When the ingest-rate metric pushes HPA above currently-scheduled capacity, Karpenter provisions a new node from the realtime pool in the usual 30–60s.
Scale-Down Protection

    # Already configured in values.yaml
    autoscaling:
      scaleDown:
        stabilizationSeconds: 300   # Scale down after 5-minute stabilization
      scaleUp:
        stabilizationSeconds: 60    # Scale up after 1-minute stabilization

Vertical Scaling

    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set resources.requests.cpu=8 \
      --set resources.requests.memory=32Gi \
      --set resources.limits.cpu=16 \
      --set resources.limits.memory=64Gi \
      --wait

Node Selection (Graviton / x86)

    # Graviton (ARM) nodes
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set nodeSelector."kubernetes\.io/arch"=arm64

    # Dedicated instance type
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set nodeSelector."node\.kubernetes\.io/instance-type"=c7g.4xlarge

Vertical ingest tuning (devlog 102)
Phase 1 of the ingest scale-out plan exposes two single-pod ingest
knobs via Helm. Both default to 0 (engine default) so existing
charts are unchanged.
| Helm value | Engine default (when 0) | Purpose |
|---|---|---|
| pipeline.drainThreads | max(2, hw_concurrency() / 4) | Number of drain threads moving ticks from TickPlant → storage. Lock-free MPMC → scales near-linearly. |
| pipeline.ringBufferCapacity | 65536 slots | TickPlant ring-buffer size. Absorbs ingest bursts before the synchronous store_tick() fallback (~34× slower) kicks in. Must be a power of two in [4096, 16777216]. |
When to tune
| Workload | Tags × rate | pipeline.drainThreads | pipeline.ringBufferCapacity |
|---|---|---|---|
| IoT pilot | 1 k × 1 Hz | 0 (auto) | 65536 (default) |
| Auto factory | 5 k × 100 Hz | 4 | 262144 |
| Semi fab (CMP burst) | 30 k × 10 kHz | 8 | 1048576 |

    # Raise both for a CMP-burst semi-fab workload
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set pipeline.drainThreads=8 \
      --set pipeline.ringBufferCapacity=1048576

How to observe
Both effective values are emitted on ZeptoPipeline::start():

    kubectl logs -n zeptodb zeptodb-0 | grep "drain_threads="
    # [info] ZeptoPipeline 시작 완료 (drain_threads=8, ring_capacity=1048576)
    # The Korean text "시작 완료" in the log line means "startup complete".

Also exposed as pod env vars for K8s-side inspection:
ZEPTO_DRAIN_THREADS, ZEPTO_RING_BUFFER_CAPACITY.
When TickPlant queue full! Dropping tick seq=… appears in logs,
raise pipeline.ringBufferCapacity first (power of two), then
pipeline.drainThreads. A non-power-of-two or out-of-range capacity
causes the pod to fail fast at startup with a clear
std::invalid_argument in the crash log.
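Because an invalid capacity only surfaces as a crash at pod startup, it can be cheaper to check the value before running helm upgrade. The following is a minimal sketch of the documented constraint (power of two in [4096, 16777216], with 0 meaning "engine default"). The function name `is_valid_ring_capacity` is hypothetical and is not shipped with ZeptoDB.

```shell
# is_valid_ring_capacity N
# Succeeds when N is 0 (engine default) or a power of two in [4096, 16777216].
is_valid_ring_capacity() {
  local n=$1
  # Reject empty input, non-digits, and leading zeros (which would parse as octal).
  case "$n" in
    ''|*[!0-9]*|0?*) return 1 ;;
  esac
  if [ "$n" -eq 0 ]; then
    return 0
  fi
  if [ "$n" -lt 4096 ] || [ "$n" -gt 16777216 ]; then
    return 1
  fi
  # A power of two has exactly one bit set, so n & (n - 1) is zero.
  [ $(( n & (n - 1) )) -eq 0 ]
}
```

A wrapper script could call `is_valid_ring_capacity "$CAP" || exit 1` before passing the value to `--set pipeline.ringBufferCapacity`.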
This is a single-pod vertical scaling knob. Horizontal scale-out
— a stateless zepto_ingest_node tier plus an ingest-rate HPA — is
Phase 2 and is tracked in docs/BACKLOG.md under P8 — Cluster.
Sizing and placement for enterprise factory workloads
Horizontal scale-out only delivers linear ingest gain when each replica lands on a
distinct node. The Helm chart defaults enforce this via hard podAntiAffinity plus
topologySpread; see docs/devlog/104_pod_placement_hardening.md for the root-cause
analysis. Use this table as a starting point and right-size per sector.
| Sector | Replicas | Nodes | resources.requests (cpu / memory) | podAntiAffinity.required |
|---|---|---|---|---|
| Dev / sandbox | 2 | 1 is OK | 1c / 2 Gi | false |
| Small IoT pilot | 3 | 3 | 2c / 4 Gi (default) | true |
| Auto factory | 5 | 5 | 4c / 8 Gi | true |
| Semi fab (CMP / lithography) | 10 | 10 | 8c / 16 Gi | true |
Why required: true is the production default
A soft preferredDuringSchedulingIgnoredDuringExecution lets Kubernetes co-locate
two pods on the same node as soon as HPA scales replicas > nodes. When that happens
the two ZeptoDB processes fight for the same CPU, halving ingest throughput, and a
single node failure takes down both replicas at once — breaking the scale-out
guarantee silently.
Flip to required: false only on dev clusters where a tight fixed node count makes
co-location acceptable (e.g. a 3-replica chart on a 1-node kind cluster). Everywhere
else, leave it on and rely on EKS Auto Mode / Karpenter to provision the Nth node
(typically 30–60 s per §5 above) when a hard antiAffinity leaves a pod Pending.
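To confirm that no two replicas share a node, the pod-to-node mapping can be summarised with a small filter. This is a sketch: the function name `colocated_nodes` is hypothetical, and it expects whitespace-separated "POD NODE" pairs on stdin. Pods that are not yet scheduled (node shown as `<none>`) are ignored.

```shell
# colocated_nodes
# Reads "POD NODE" pairs on stdin and prints "NODE COUNT" for every node
# that hosts more than one pod. Prints nothing when replicas are fully spread.
colocated_nodes() {
  awk 'NF >= 2 && $2 != "<none>" { count[$2]++ }
       END { for (n in count) if (count[n] > 1) print n, count[n] }' | sort
}

# Usage against a live cluster (label selector as used elsewhere in this guide):
#   kubectl get pods -n zeptodb -l app.kubernetes.io/name=zeptodb --no-headers \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName | colocated_nodes
```

Empty output means every scheduled replica sits on its own node.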
Why topologySpread + maxSkew: 1 alongside
required: true alone refuses to schedule extras beyond the current node count.
topologySpreadConstraints with maxSkew: 1 is the smarter complement when
replicas > nodes is a legitimate transient state (brief HPA spike ahead of node
provision, planned drain): it spreads pods as evenly as possible across hostnames
and still allows more replicas than nodes. Set
topologySpread.whenUnsatisfiable: ScheduleAnyway if you want the spread hint
without the scheduling block.
Resource sizing rules of thumb
- CPU request = 1 core for HTTP/RPC + pipeline.drainThreads cores for ingest draining. If drainThreads is 0 (auto), the engine picks max(2, hw_concurrency / 4), so plan for 1 + 2 = 3 cores minimum on any node smaller than 16 vCPU. The 2c/4c default covers a 2-core drain pool plus HTTP/RPC, sized for ~200K–500K ticks/s.
- Memory request = ~100 MB baseline + 32 MB per active arena + ringBufferCapacity × 64 bytes for the TickPlant ring. Example: 200 active partitions + 1 M-slot ring ≈ 100 MB + 6.4 GB + 64 MB ≈ 6.6 GB; round up to an 8 GB limit for headroom. Raise limits.memory before raising pipeline.ringBufferCapacity.
- Bare-metal trading (Guaranteed QoS): pin requests.cpu == limits.cpu and requests.memory == limits.memory in an overlay. Keep hugepages-2Mi on both sides to retain the HugePages reservation.
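The two rules of thumb above reduce to simple integer arithmetic, sketched below. The function names are hypothetical. The sketch treats MB as MiB and assumes that hw_concurrency equals the vCPU count the pod can see; both are simplifications rather than statements about the engine.

```shell
# estimate_cpu_cores DRAIN_THREADS NODE_VCPUS
# 1 core for HTTP/RPC plus the drain pool; DRAIN_THREADS=0 applies the
# documented auto rule max(2, hw_concurrency / 4) using NODE_VCPUS.
estimate_cpu_cores() {
  local drain=$1 vcpus=$2
  if [ "$drain" -eq 0 ]; then
    drain=$(( vcpus / 4 ))
    if [ "$drain" -lt 2 ]; then
      drain=2
    fi
  fi
  echo $(( 1 + drain ))
}

# estimate_memory_mb ACTIVE_ARENAS RING_SLOTS
# 100 MB baseline + 32 MB per arena + 64 bytes per ring slot, in MB.
estimate_memory_mb() {
  local arenas=$1 ring_slots=$2
  echo $(( 100 + arenas * 32 + ring_slots * 64 / 1048576 ))
}
```

With 200 arenas and a 1048576-slot ring, `estimate_memory_mb 200 1048576` prints 6564, which matches the ≈ 6.6 GB worked example above.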

    # Auto-factory profile (5 replicas × 5 nodes × 4c/8G)
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set replicaCount=5 \
      --set autoscaling.minReplicas=5 \
      --set resources.requests.cpu=4000m \
      --set resources.requests.memory=8Gi \
      --set resources.limits.cpu=8000m \
      --set resources.limits.memory=16Gi

    # Dev overlay (2 replicas on 1 node, co-location OK)
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set replicaCount=2 \
      --set podAntiAffinity.required=false \
      --set resources.requests.cpu=1000m \
      --set resources.requests.memory=2Gi

6. Backup & Recovery
In-Cluster Backup (CronJob)

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: zeptodb-backup
      namespace: zeptodb
    spec:
      schedule: "0 2 * * *"   # Daily at 02:00 UTC
      concurrencyPolicy: Forbid
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: backup
                  image: amazon/aws-cli:latest
                  env:
                    - name: S3_BUCKET
                      value: "your-zeptodb-backups"
                    - name: DATA_DIR
                      value: "/opt/zeptodb/data"
                  command:
                    - /bin/sh
                    - -c
                    - |
                      TIMESTAMP=$(date +%Y%m%d_%H%M%S)
                      tar -czf /tmp/zeptodb-${TIMESTAMP}.tar.gz -C ${DATA_DIR} .
                      aws s3 cp /tmp/zeptodb-${TIMESTAMP}.tar.gz \
                        s3://${S3_BUCKET}/backups/zeptodb-${TIMESTAMP}.tar.gz \
                        --storage-class STANDARD_IA
                      echo "Backup completed: zeptodb-${TIMESTAMP}.tar.gz"
                  volumeMounts:
                    - name: data
                      mountPath: /opt/zeptodb/data
                      readOnly: true
              volumes:
                - name: data
                  persistentVolumeClaim:
                    claimName: zeptodb-data

Apply the CronJob and trigger a run:

    kubectl apply -f deploy/k8s/backup-cronjob.yaml

    # Trigger manual backup
    kubectl create job --from=cronjob/zeptodb-backup zeptodb-backup-manual -n zeptodb

    # Check backup status
    kubectl get jobs -n zeptodb
    kubectl logs job/zeptodb-backup-manual -n zeptodb

PVC Snapshot (EBS)

    # VolumeSnapshot (requires CSI driver)
    cat <<EOF | kubectl apply -f -
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: zeptodb-snap-$(date +%Y%m%d)
      namespace: zeptodb
    spec:
      volumeSnapshotClassName: ebs-csi-snapclass
      source:
        persistentVolumeClaimName: zeptodb-data
    EOF

    # Verify snapshot
    kubectl get volumesnapshot -n zeptodb

Recovery from Snapshot

    # Create new PVC from snapshot
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: zeptodb-data-restored
      namespace: zeptodb
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: gp3
      resources:
        requests:
          storage: 500Gi
      dataSource:
        name: zeptodb-snap-20260324
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
    EOF

    # Replace PVC in Deployment
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set persistence.existingClaim=zeptodb-data-restored \
      --wait

7. Upgrades & Rollback
For details: Rolling Upgrade Guide
Standard Upgrade

    # 1. Pre-flight
    kubectl get pods -n zeptodb -o wide
    curl -s http://$LB:8123/health

    # 2. Upgrade
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set image.tag=1.1.0 \
      --wait --timeout 5m

    # 3. Monitor
    kubectl rollout status deployment/zeptodb -n zeptodb

    # 4. Verify
    curl -s http://$LB:8123/health
    curl -X POST http://$LB:8123/ -d 'SELECT 1'

Zero-Downtime Guarantee Mechanisms
| Setting | Value | Effect |
|---|---|---|
| maxSurge | 1 | Create 1 new pod first |
| maxUnavailable | 0 | Maintain existing pod count |
| PDB minAvailable | 2 | Guarantee minimum 2 pods |
| preStop sleep | 15s | Wait for in-flight queries to complete |
| readinessProbe | /ready | Only ready pods receive traffic |
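The PDB row determines how many pods a drain or other voluntary eviction may remove at once: healthy pods minus minAvailable, never below zero. The sketch below shows only that arithmetic. The helper name `allowed_disruptions` is hypothetical; on a live cluster the same figure appears in the ALLOWED DISRUPTIONS column of `kubectl get pdb -n zeptodb`.

```shell
# allowed_disruptions HEALTHY_PODS MIN_AVAILABLE
# Prints how many pods may be evicted voluntarily without violating the PDB.
allowed_disruptions() {
  local healthy=$1 min_available=$2 n
  n=$(( healthy - min_available ))
  if [ "$n" -lt 0 ]; then
    n=0
  fi
  echo "$n"
}
```

With the default 3 replicas and minAvailable: 2, one pod can be evicted at a time. If a replica is already unhealthy, evictions are blocked until it recovers.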
Rollback

    # Immediate rollback
    helm rollback zeptodb -n zeptodb

    # Rollback to a specific revision
    helm history zeptodb -n zeptodb
    helm rollback zeptodb <REVISION> -n zeptodb

    # kubectl rollback (without Helm)
    kubectl rollout undo deployment/zeptodb -n zeptodb

Canary Deployment

    # 1. Canary deployment (1 replica)
    helm install zeptodb-canary ./deploy/helm/zeptodb -n zeptodb \
      --set replicaCount=1 \
      --set image.tag=2.0.0 \
      --set service.type=ClusterIP \
      --set autoscaling.enabled=false \
      --set podDisruptionBudget.enabled=false

    # 2. Canary testing
    kubectl port-forward svc/zeptodb-canary 8124:8123 -n zeptodb
    curl -X POST http://localhost:8124/ -d 'SELECT vwap(price, volume) FROM trades WHERE symbol = 1'

    # 3a. Success → promote
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb --set image.tag=2.0.0 --wait
    helm uninstall zeptodb-canary -n zeptodb

    # 3b. Failure → remove
    helm uninstall zeptodb-canary -n zeptodb

8. Security
TLS Termination

    # Create TLS Secret
    kubectl create secret tls zeptodb-tls \
      -n zeptodb \
      --cert=/path/to/cert.pem \
      --key=/path/to/key.pem

    # Ingress with TLS
    cat <<EOF | kubectl apply -f -
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: zeptodb
      namespace: zeptodb
      annotations:
        nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
    spec:
      tls:
        - hosts:
            - zeptodb.example.com
          secretName: zeptodb-tls
      rules:
        - host: zeptodb.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: zeptodb
                    port:
                      number: 8123
    EOF

API Key / JWT Secrets

    # API keys file
    kubectl create secret generic zeptodb-auth \
      -n zeptodb \
      --from-file=keys.txt=/path/to/keys.txt

    # JWT secret
    kubectl create secret generic zeptodb-jwt \
      -n zeptodb \
      --from-literal=JWT_SECRET='your-jwt-secret'

    # Vault integration (Secrets Store CSI)
    # → SecretsProvider chain: Vault KV v2 → K8s file → env var

Network Policy

    # Allow access only from same namespace + monitoring.
    # Namespaces are matched on kubernetes.io/metadata.name, which Kubernetes
    # sets automatically; a bare "name" label only matches if you added it yourself.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: zeptodb-netpol
      namespace: zeptodb
    spec:
      podSelector:
        matchLabels:
          app.kubernetes.io/name: zeptodb
      policyTypes: [Ingress]
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: zeptodb
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: monitoring
          ports:
            - port: 8123
              protocol: TCP

RBAC (Kubernetes)

    # Role for operators
    # RBAC matches full resource names, so the HPA entry must be
    # "horizontalpodautoscalers"; the kubectl short name "hpa" is not matched.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: zeptodb-operator
      namespace: zeptodb
    rules:
    - apiGroups: ["", "apps", "autoscaling"]
      resources: ["pods", "deployments", "services", "configmaps", "horizontalpodautoscalers"]
      verbs: ["get", "list", "watch", "update", "patch"]
    - apiGroups: [""]
      resources: ["pods/log", "pods/exec"]
      verbs: ["get", "create"]

9. Cluster Mode
This section covers operating a ZeptoDB distributed cluster on Kubernetes.
Enable Cluster

    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set cluster.enabled=true \
      --set cluster.rpcPortOffset=100 \
      --set cluster.heartbeatPort=9100 \
      --set headless.enabled=true

Direct pod-to-pod communication via Headless Service:
- RPC: <pod-name>.zeptodb-headless.zeptodb.svc:8223
- Heartbeat: UDP :9100
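The RPC address above is consistent with the HTTP port (8123) plus cluster.rpcPortOffset (100). The sketch below builds a peer address under that assumption. The function name `peer_rpc_addr` is hypothetical, the offset semantics are inferred from the numbers in this guide rather than from the engine source, and the `zeptodb-<ordinal>` pod naming follows the log examples elsewhere in this guide.

```shell
# peer_rpc_addr ORDINAL [HTTP_PORT] [RPC_PORT_OFFSET]
# Prints the headless-service DNS name and RPC port for one pod.
# Assumption: RPC port = HTTP port + cluster.rpcPortOffset.
peer_rpc_addr() {
  local ordinal=$1 http_port=${2:-8123} offset=${3:-100}
  echo "zeptodb-${ordinal}.zeptodb-headless.zeptodb.svc:$(( http_port + offset ))"
}
```

This is handy when probing peer connectivity from inside a pod, for example with a TCP check against the printed address.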
Write-path routing (devlog 111)
When cluster mode is enabled, every pod automatically wires a
CoordinatorRoutingAdapter so that HTTP/SQL INSERT statements and
Python Pipeline.ingest_* calls are routed to the partition owner via
the PartitionRouter consistent-hash ring. Without this wire-up (the
state before devlog 111), writes would land on whichever pod the Service
LoadBalancer happened to pick, silently mis-partitioning data.
Verify the routing is live by checking any pod’s startup log:

    kubectl logs -n zeptodb zeptodb-0 | grep -E 'Cluster routing|Peer RPC'
    # Expected output:
    #   Peer RPC server: port 8223
    #   Cluster routing: enabled (N remote nodes)

Feed consumers (KafkaConsumer, MqttConsumer, OpcUaConsumer)
route through their own set_routing() hook, bypassing the HTTP LB
entirely — use them as the primary ingest path for production multi-pod
deployments.
DDL replication (devlog 112). CREATE / DROP / ALTER TABLE sent to
any pod is fire-and-forget replicated to every remote pod via
QueryCoordinator::forward_ddl_to_remotes. Per-remote failures emit
ZEPTO_WARN but never fail the client request, so operators should
still pre-provision critical tables at deploy time if a pod might be
unreachable at DDL time.
Stateless ingest tier (optional)
For workloads where ingest load scales independently of query/storage
load, deploy a dedicated stateless ingest tier (P8-I3, devlog 113).
Each ingest pod runs the zepto_ingest_node binary, holds zero data,
and forwards every HTTP INSERT to the correct storage pod via
CoordinatorRoutingAdapter (same routing path as devlog 111).
Topology:

    clients ──► ingest Service (ClusterIP) ──► N × zepto_ingest_node pods
                                                 │   (owns no data,
                                                 │    node_id=99999)
                                                 ▼
                                           TCP RPC fan-out
                                                 ▼
                   storage StatefulSet (zeptodb-N): owns partitions, runs queries

Enable via Helm (opt-in):

    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set ingest.enabled=true \
      --set ingest.replicas=3 \
      --set-string 'ingest.extraArgs={--add-node,0:zeptodb-0.zeptodb-headless:8123,--add-node,1:zeptodb-1.zeptodb-headless:8123,--add-node,2:zeptodb-2.zeptodb-headless:8123}'

Notes:
- Storage-pod discovery is currently manual via ingest.extraArgs. A future init container will generate --add-node flags from the headless service automatically.
- ingest.noAuth: true is the default, so put the ingest Service behind an auth-enforcing ingress if you expose it outside the cluster.
- The ingest tier can be scaled with its own HPA independently of the storage StatefulSet. Ingest-rate HPA on zepto_pipeline_ticks_per_sec is tracked as BACKLOG P8-I4.
- DDL (CREATE / DROP / ALTER TABLE) sent to an ingest pod replicates to every storage pod automatically via devlog 112's forward_ddl_to_remotes.
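Until discovery is automated, the ingest.extraArgs value can be generated rather than typed by hand. The sketch below reproduces the list format used in the Helm command above for N storage pods. The function name `ingest_add_node_args` is hypothetical, and the pod naming and port are taken from the example rather than from the chart.

```shell
# ingest_add_node_args COUNT [PORT]
# Prints a Helm list literal with one "--add-node,<id>:<host>:<port>" pair
# per storage pod, numbered from 0 to COUNT-1.
ingest_add_node_args() {
  local count=$1 port=${2:-8123} i=0 out=""
  while [ "$i" -lt "$count" ]; do
    out="${out:+$out,}--add-node,${i}:zeptodb-${i}.zeptodb-headless:${port}"
    i=$(( i + 1 ))
  done
  echo "{$out}"
}
```

The result can be passed as `--set-string "ingest.extraArgs=$(ingest_add_node_args 3)"`. Regenerate it and rerun the upgrade whenever the number of storage pods changes.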
Cluster Health

    # Check cluster status for each pod
    for pod in $(kubectl get pods -n zeptodb -l app.kubernetes.io/name=zeptodb -o name); do
      echo "--- $pod ---"
      kubectl exec -n zeptodb $pod -- curl -s http://localhost:8123/health
      echo
    done

Cluster Upgrade Considerations
- CoordinatorHA handles automatic re-registration
- FencingToken prevents split-brain
- Increase gracefulShutdown time during upgrades to ensure WAL flush

    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set image.tag=1.1.0 \
      --set gracefulShutdown.preStopSleepSeconds=30 \
      --set gracefulShutdown.terminationGracePeriodSeconds=60 \
      --wait --timeout 10m

10. Troubleshooting
Pod Fails to Start

    # Check status
    kubectl describe pod <pod> -n zeptodb

    # Common causes:
    # - ImagePullBackOff → Check image path/authentication
    # - Pending → Insufficient resources (kubectl describe node)
    # - CrashLoopBackOff → Check logs (kubectl logs --previous)

Readiness Probe Failure

    kubectl logs <pod> -n zeptodb | grep -i "error\|fail\|ready"

    # Check directly from inside the pod
    kubectl exec -n zeptodb <pod> -- curl -s http://localhost:8123/ready

PVC Not Bound

    kubectl describe pvc zeptodb-data -n zeptodb

    # Check StorageClass
    kubectl get sc
    # If gp3 StorageClass does not exist, it needs to be created

OOMKilled

    # Check memory usage
    kubectl top pods -n zeptodb

    # Increase limits
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb \
      --set resources.limits.memory=64Gi \
      --wait

Slow Queries

    # Check query plan with EXPLAIN
    curl -X POST http://$LB:8123/ -d 'EXPLAIN SELECT ...'

    # Check running queries via Admin API
    curl -H "Authorization: Bearer $ADMIN_KEY" http://$LB:8123/admin/queries

    # Kill slow query
    curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
      http://$LB:8123/admin/queries/<query-id>

HPA Not Scaling

    kubectl describe hpa zeptodb -n zeptodb

    # Check metrics-server
    kubectl top pods -n zeptodb
    # "error: Metrics API not available" → metrics-server needs to be installed

11. Runbooks
Runbook: Emergency Restart

    # 1. Record current state
    kubectl get pods -n zeptodb -o wide > /tmp/zeptodb-state.txt

    # 2. Rolling restart (zero-downtime)
    kubectl rollout restart deployment/zeptodb -n zeptodb
    kubectl rollout status deployment/zeptodb -n zeptodb --timeout=5m

    # 3. Verify
    curl -s http://$LB:8123/health
    curl -X POST http://$LB:8123/ -d 'SELECT 1'

Runbook: Disk Full

    # 1. Check
    kubectl exec -n zeptodb <pod> -- df -h /opt/zeptodb/data

    # 2. Clean up old HDB data (TTL setting)
    curl -X POST http://$LB:8123/ \
      -d "ALTER TABLE trades SET TTL 90 DAYS"

    # 3. Expand PVC (if StorageClass has allowVolumeExpansion: true)
    kubectl patch pvc zeptodb-data -n zeptodb \
      -p '{"spec":{"resources":{"requests":{"storage":"1Ti"}}}}'

Runbook: Node Drain (Maintenance)

    # PDB guarantees minAvailable: 2, so drain is safe
    kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

    # After maintenance is complete
    kubectl uncordon <node>

Runbook: Complete Redeployment

    # 1. Backup
    kubectl create job --from=cronjob/zeptodb-backup zeptodb-pre-redeploy -n zeptodb
    kubectl wait --for=condition=complete job/zeptodb-pre-redeploy -n zeptodb --timeout=10m

    # 2. Delete
    helm uninstall zeptodb -n zeptodb
    # A chart-templated PVC survives helm uninstall only if it carries the
    # helm.sh/resource-policy: keep annotation; confirm before continuing:
    kubectl get pvc -n zeptodb

    # 3. Redeploy
    helm install zeptodb ./deploy/helm/zeptodb -n zeptodb -f values-prod.yaml --wait

    # 4. Verify (re-resolve $LB first; the recreated Service may get a new hostname)
    LB=$(kubectl get svc zeptodb -n zeptodb \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    curl -s http://$LB:8123/health

Quick Reference

    # === Status ===
    kubectl get all -n zeptodb
    kubectl get hpa -n zeptodb
    kubectl get pvc -n zeptodb
    kubectl get events -n zeptodb --sort-by='.lastTimestamp' | tail -20

    # === Logs ===
    kubectl logs -f deployment/zeptodb -n zeptodb
    kubectl logs <pod> -n zeptodb --previous

    # === Health ===
    curl http://$LB:8123/health
    curl http://$LB:8123/ready
    curl http://$LB:8123/metrics

    # === Helm ===
    helm list -n zeptodb
    helm history zeptodb -n zeptodb
    helm get values zeptodb -n zeptodb

    # === Upgrade ===
    helm upgrade zeptodb ./deploy/helm/zeptodb -n zeptodb --set image.tag=X.Y.Z --wait
    helm rollback zeptodb -n zeptodb

    # === Scale ===
    kubectl scale deployment zeptodb -n zeptodb --replicas=5
    kubectl top pods -n zeptodb