Last updated: 2026-04-08
| Item | Requirement | Notes |
|------|-------------|-------|
| Kubernetes version | 1.35 | Latest standard support |
| Mode | EKS Auto Mode | AWS manages compute, networking, storage, LB, DNS |
| Provisioning | `eksctl` | `deploy/k8s/eks-bench-cluster.yaml` |
| Component | Previously Required | Auto Mode |
|-----------|---------------------|-----------|
| Karpenter | Separate Helm install | Built-in |
| VPC CNI | EKS addon | Built-in |
| CoreDNS | EKS addon | Built-in |
| kube-proxy | EKS addon | Built-in |
| EBS CSI driver | EKS addon | Built-in |
| AWS Load Balancer Controller | Separate install | Built-in |
| Spot interruption handling | SQS queue + controller | Built-in |
| Node patching / AMI updates | Manual | Automatic (21-day max lifetime) |
| Pod Identity Agent | Separate install | Built-in |
EKS Auto Mode creates two default node pools:
| Pool | Purpose |
|------|---------|
| `system` | Cluster infrastructure (CoreDNS, etc.) |
| `general-purpose` | General workloads |
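For orientation, the snippet below is a minimal sketch of an eksctl cluster config that enables Auto Mode with both default pools. It is not a copy of `deploy/k8s/eks-bench-cluster.yaml`. The cluster name and region are hypothetical placeholders, and the `autoModeConfig` block follows the eksctl schema as commonly documented.

```yaml
# Sketch only: illustrates enabling Auto Mode with the two default node pools.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zepto-bench        # hypothetical cluster name
  region: us-east-1        # hypothetical region
  version: "1.35"          # Kubernetes version from the requirements above
autoModeConfig:
  enabled: true
  # Default pools created by Auto Mode
  nodePools:
    - system
    - general-purpose
```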
Custom NodePools are applied with kubectl after cluster creation (see `deploy/scripts/setup_eks.sh`).
Realtime NodePool: for ingestion and low-latency queries.
apiVersion: karpenter.sh/v1
- key: kubernetes.io/arch
- key: karpenter.sh/capacity-type
- key: eks.amazonaws.com/instance-category
- key: eks.amazonaws.com/instance-cpu
consolidationPolicy: WhenEmpty
| Setting | Value | Rationale |
|---------|-------|-----------|
| Capacity | On-Demand only | No Spot interruption for trading workloads |
| Arch | arm64 (Graviton) | Best price-performance |
| Instance categories | c (compute), i (storage) | NVMe for HDB, compute for ingestion |
| Consolidation | WhenEmpty, 30m | Conservative: avoid churn during trading hours |
| Disruption budget | 1 node max | Protect availability |
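The realtime settings above could be assembled into a NodePool manifest along the following lines. This is a sketch built from the listed values, not the manifest from the repository. The NodePool name is an assumption that mirrors the `zepto-realtime` NodeClass, and the `instance-cpu` requirement is left out because its values are not shown here.

```yaml
# Sketch only: assembled from the documented settings.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: zepto-realtime              # assumed name, mirrors the NodeClass
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com    # Auto Mode NodeClass API group
        kind: NodeClass
        name: zepto-realtime
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]         # Graviton
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]     # no Spot for trading workloads
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "i"]        # compute and NVMe storage families
        # An eks.amazonaws.com/instance-cpu requirement also exists in the
        # original manifest; its values are not shown, so it is omitted.
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30m           # wait 30 minutes before removing empty nodes
    budgets:
      - nodes: "1"                  # disrupt at most one node at a time
```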
Analytics NodePool: for backtesting and batch queries.
apiVersion: karpenter.sh/v1
- key: kubernetes.io/arch
- key: karpenter.sh/capacity-type
values: ["spot", "on-demand"]
- key: eks.amazonaws.com/instance-category
- key: eks.amazonaws.com/instance-cpu
values: ["4", "8", "16", "32"]
consolidationPolicy: WhenEmptyOrUnderutilized
| Setting | Value | Rationale |
|---------|-------|-----------|
| Capacity | Spot + On-Demand | Cost optimization, wide instance selection reduces interruption |
| Instance categories | c, m, r | Wide selection = more Spot pools = lower interruption |
| Consolidation | WhenEmptyOrUnderutilized, 5m | Aggressive: reclaim idle batch nodes fast |
| Disruption budget | 20% | Allow faster consolidation |
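A comparable sketch for the analytics pool, again built only from the values listed above. The NodePool name is assumed to mirror the `zepto-analytics` NodeClass, and the architecture requirement is omitted because its values are not shown for this pool.

```yaml
# Sketch only: assembled from the documented settings.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: zepto-analytics             # assumed name, mirrors the NodeClass
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: zepto-analytics
      requirements:
        # A kubernetes.io/arch requirement also exists in the original
        # manifest; its values are not shown, so it is omitted.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]   # wide selection across Spot pools
        - key: eks.amazonaws.com/instance-cpu
          operator: In
          values: ["4", "8", "16", "32"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m            # reclaim idle batch nodes quickly
    budgets:
      - nodes: "20%"                # up to 20% of nodes may be disrupted at once
```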
| NodeClass | Ephemeral Storage | IOPS | Throughput |
|-----------|-------------------|------|------------|
| `zepto-realtime` | 100 Gi | 6000 | 400 MB/s |
| `zepto-analytics` | 200 Gi | default | default |
Note: EKS Auto Mode uses eks.amazonaws.com/v1 API for NodeClass (not karpenter.k8s.aws/v1). Labels also use eks.amazonaws.com/* prefix instead of karpenter.k8s.aws/*.
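As an illustration of the `eks.amazonaws.com/v1` NodeClass API, the realtime storage settings might be expressed roughly as follows. The role name and the selector tags are hypothetical placeholders that would need to match the actual cluster. Only the `ephemeralStorage` values come from this document.

```yaml
# Sketch only: role and selector tags are placeholders, not real values.
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: zepto-realtime
spec:
  role: "REPLACE_WITH_NODE_ROLE_NAME"       # placeholder IAM role for nodes
  subnetSelectorTerms:
    - tags:
        example.com/cluster: "REPLACE_ME"   # hypothetical tag
  securityGroupSelectorTerms:
    - tags:
        example.com/cluster: "REPLACE_ME"   # hypothetical tag
  ephemeralStorage:
    size: "100Gi"
    iops: 6000
    throughput: 400                         # documented above as 400 MB/s
```

For `zepto-analytics`, the same shape would apply with `size: "200Gi"` and the `iops` and `throughput` fields left unset so the defaults are used.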
→ HPA scales Deployment replicas (CPU > 70% or Memory > 80%)
→ New pods are Pending (no capacity)
→ Auto Mode detects Pending pods (built-in Karpenter)
→ EC2 Fleet API → new node in 30-60s
→ Pods scheduled on new node
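The first step of this flow depends on a HorizontalPodAutoscaler with the stated CPU and memory thresholds. A minimal sketch is shown below. The object names and the replica bounds are assumptions, since the real values live in the Helm chart.

```yaml
# Sketch only: names and replica bounds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zeptodb                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zeptodb                # hypothetical Deployment name
  minReplicas: 2                 # assumption
  maxReplicas: 10                # assumption
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out above 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80 # scale out above 80% memory
```

With multiple metrics, the HPA computes a desired replica count for each and uses the largest, so either threshold alone can trigger a scale-out.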
| Component | Type | Setting |
|-----------|------|---------|
| ZeptoDB pods | PDB | `minAvailable: 2` (Helm chart) |
| Realtime NodePool | Karpenter budget | max 1 node at a time |
| Analytics NodePool | Karpenter budget | max 20% nodes |
| Auto Mode default | Node lifetime | 21 days max (auto-replaced) |
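The pod-level protection corresponds to a PodDisruptionBudget like the sketch below. The Helm chart renders the real object. The name and selector label here are assumptions for illustration.

```yaml
# Sketch only: name and selector label are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zeptodb                          # hypothetical name
spec:
  minAvailable: 2                        # keep at least two pods running
  selector:
    matchLabels:
      app.kubernetes.io/name: zeptodb    # assumed pod label
```

Voluntary disruptions such as consolidation or the 21-day node replacement are evicted through this budget, so a node drain waits rather than taking the workload below two available pods.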
For bench clusters, scale node groups to 0 when not in use:
./tools/eks-bench.sh sleep # Scale to 0 → ~$0.10/hr (control plane only)
./tools/eks-bench.sh wake # Restore nodes → ready in 3-5 min
./tools/eks-bench.sh status # Check current state
| State | Hourly Cost | Nodes |
|-------|-------------|-------|
| Wake | ~$3.60/hr | 6 (3× r7i.2xlarge + 2× m7i.large + 1× c7i.xlarge) |
| Sleep | ~$0.10/hr | 0 (control plane only) |
| File | Description |
|------|-------------|
| `tools/eks-bench.sh` | Sleep/wake script for cost optimization |
| `deploy/scripts/setup_eks.sh` | EKS Auto Mode setup script |
| `deploy/k8s/eks-bench-cluster.yaml` | eksctl cluster config |
| `deploy/helm/zeptodb/values.yaml` | Helm values |
| `docs/operations/KUBERNETES_OPERATIONS.md` | Day-2 operations guide |