
EKS Cluster Requirements for ZeptoDB

Last updated: 2026-04-08


| Item | Requirement | Notes |
|---|---|---|
| Kubernetes version | 1.35 | Latest standard support |
| Mode | EKS Auto Mode | AWS manages compute, networking, storage, LB, DNS |
| Provisioning | eksctl | deploy/k8s/eks-bench-cluster.yaml |
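For orientation, a minimal eksctl ClusterConfig that turns on Auto Mode might look like the sketch below. It is not the contents of deploy/k8s/eks-bench-cluster.yaml; the cluster name and region are hypothetical placeholders, and it assumes an eksctl release that supports `autoModeConfig`.

```yaml
# Hypothetical sketch; not the repository's eks-bench-cluster.yaml.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zepto-bench        # hypothetical cluster name
  region: us-east-1        # hypothetical region
  version: "1.35"          # Kubernetes version from the requirements table
autoModeConfig:
  enabled: true            # AWS manages compute, networking, storage, LB, DNS
  # Omitting nodePools keeps eksctl's defaults (general-purpose and system).
```

A file like this is passed to `eksctl create cluster -f <file>`.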

2. What EKS Auto Mode Manages (No Manual Install Needed)

| Component | Previously Required | Auto Mode |
|---|---|---|
| Karpenter | Separate Helm install | Built-in |
| VPC CNI | EKS add-on | Built-in |
| CoreDNS | EKS add-on | Built-in |
| kube-proxy | EKS add-on | Built-in |
| EBS CSI driver | EKS add-on | Built-in |
| AWS Load Balancer Controller | Separate install | Built-in |
| Spot interruption handling | SQS queue + controller | Built-in |
| Node patching / AMI updates | Manual | Automatic (21-day max node lifetime) |
| Pod Identity Agent | Separate install | Built-in |
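The built-in EBS CSI driver registers under its own provisioner name, `ebs.csi.eks.amazonaws.com`, so persistent volumes need a StorageClass that points at it. A sketch assuming gp3 volumes; the class name is hypothetical:

```yaml
# Sketch: StorageClass for the Auto Mode built-in EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zepto-gp3                          # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.eks.amazonaws.com     # Auto Mode provisioner, not ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer    # create the volume in the AZ of the scheduled pod
parameters:
  type: gp3
  encrypted: "true"
```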

EKS Auto Mode creates two default node pools:

| Pool | Purpose |
|---|---|
| system | Cluster infrastructure (CoreDNS, etc.) |
| general-purpose | General workloads |

The custom NodePools below, and the NodeClasses they reference, are applied with kubectl after cluster creation (see deploy/scripts/setup_eks.sh).

zepto-realtime: for ingestion and low-latency queries.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: zepto-realtime
spec:
  weight: 10
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: zepto-realtime
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "i"]
        - key: eks.amazonaws.com/instance-cpu
          operator: In
          values: ["4", "8", "16"]
  limits:
    cpu: "64"
    memory: 128Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30m
    budgets:
      - nodes: "1"
```

| Setting | Value | Rationale |
|---|---|---|
| Capacity | On-Demand only | No Spot interruptions for trading workloads |
| Arch | arm64 (Graviton) | Best price-performance |
| Instance categories | c (compute), i (storage) | NVMe for HDB, compute for ingestion |
| Consolidation | WhenEmpty, 30m | Conservative: avoid churn during trading hours |
| Disruption budget | 1 node max | Protect availability |
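To keep ingestion pods on this pool, a pod template can select the `karpenter.sh/nodepool` label that Karpenter sets on the nodes it launches. A sketch of the relevant Deployment fragment; the Deployment name, labels, image, and requests are hypothetical, and the Helm chart may already handle placement.

```yaml
# Hypothetical Deployment fragment pinning ingestion pods to zepto-realtime.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zepto-ingest                          # hypothetical
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zepto-ingest
  template:
    metadata:
      labels:
        app: zepto-ingest
    spec:
      nodeSelector:
        karpenter.sh/nodepool: zepto-realtime # label Karpenter puts on launched nodes
      containers:
        - name: ingest
          image: example.com/zeptodb:latest   # hypothetical image
          resources:
            requests:
              cpu: "2"                        # requests drive Karpenter's instance sizing
              memory: 4Gi
```

Karpenter tries higher-weight NodePools first, and zepto-analytics has the higher weight (50 vs 10), so unpinned pods that fit both pools are likely to land there. That is one reason to pin latency-sensitive workloads explicitly.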

zepto-analytics: for backtesting and batch queries.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: zepto-analytics
spec:
  weight: 50
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: zepto-analytics
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: eks.amazonaws.com/instance-cpu
          operator: In
          values: ["4", "8", "16", "32"]
  limits:
    cpu: "128"
    memory: 512Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
    budgets:
      - nodes: "20%"
```

| Setting | Value | Rationale |
|---|---|---|
| Capacity | Spot + On-Demand | Cost optimization; wide instance selection reduces interruptions |
| Instance categories | c, m, r | Wider selection = more Spot pools = fewer interruptions |
| Consolidation | WhenEmptyOrUnderutilized, 5m | Aggressive: reclaim idle batch nodes fast |
| Disruption budget | 20% | Allow faster consolidation |
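Because this pool allows Spot and consolidates aggressively, batch pods can be evicted mid-run. One way to keep those evictions from exhausting a Job's retry budget is a `podFailurePolicy` that ignores failures carrying the `DisruptionTarget` condition. A sketch; the Job name, image, and requests are hypothetical.

```yaml
# Hypothetical backtest Job on zepto-analytics that tolerates node disruptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: zepto-backtest                        # hypothetical
spec:
  backoffLimit: 3                             # counts genuine failures only (see rule below)
  podFailurePolicy:
    rules:
      - action: Ignore                        # evictions from Spot reclaim or consolidation
        onPodConditions:                      # do not count toward backoffLimit
          - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never                    # required when podFailurePolicy is set
      nodeSelector:
        karpenter.sh/nodepool: zepto-analytics
      containers:
        - name: backtest
          image: example.com/zeptodb-backtest:latest   # hypothetical image
          resources:
            requests:
              cpu: "8"
              memory: 32Gi
```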

| NodeClass | Ephemeral Storage | IOPS | Throughput |
|---|---|---|---|
| zepto-realtime | 100 Gi | 6000 | 400 MB/s |
| zepto-analytics | 200 Gi | default | default |

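Note: EKS Auto Mode uses the NodeClass kind from the eks.amazonaws.com/v1 API, not EC2NodeClass from karpenter.k8s.aws/v1 as in self-managed Karpenter. Instance labels likewise use the eks.amazonaws.com/* prefix instead of karpenter.k8s.aws/*; karpenter.sh/* labels such as karpenter.sh/capacity-type are unchanged.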
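A sketch of how the zepto-realtime row of the storage table maps onto an Auto Mode NodeClass. The IAM role name and the tag-based subnet and security-group selectors are hypothetical placeholders; only the name and the ephemeral storage values come from this page.

```yaml
# Sketch of the zepto-realtime NodeClass; role and selector values are hypothetical.
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: zepto-realtime                  # referenced by the NodePool's nodeClassRef
spec:
  role: ZeptoEKSAutoNodeRole            # hypothetical IAM node role name
  subnetSelectorTerms:
    - tags:
        Name: zepto-private             # hypothetical subnet tag
  securityGroupSelectorTerms:
    - tags:
        Name: zepto-node-sg             # hypothetical security group tag
  ephemeralStorage:
    size: "100Gi"
    iops: 6000
    throughput: 400                     # MB/s
```

zepto-analytics would differ only in its name and `size: "200Gi"`, leaving IOPS and throughput at their defaults.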

Scale-out sequence:

Pod demand increases
→ HPA scales Deployment replicas (CPU > 70% or Memory > 80%)
→ New pods are Pending (no capacity)
→ Auto Mode detects Pending pods (built-in Karpenter)
→ EC2 Fleet API → new node in 30-60s
→ Pods scheduled on new node
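The HPA step can be expressed with two resource metrics; the HPA computes a replica count for each metric and uses the larger, which gives the "CPU > 70% or Memory > 80%" behavior. Utilization is measured against pod requests. The names and replica bounds are hypothetical.

```yaml
# Sketch of an HPA with the thresholds above; names and replica bounds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zeptodb                        # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zeptodb                      # hypothetical Deployment name
  minReplicas: 3                       # hypothetical; with minAvailable: 2, one pod can still be evicted
  maxReplicas: 10                      # hypothetical
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70       # average CPU as % of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80       # average memory as % of requests
```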

Disruption protection:

| Component | Type | Setting |
|---|---|---|
| ZeptoDB pods | PDB | minAvailable: 2 (Helm chart) |
| Realtime NodePool | Karpenter budget | Max 1 node at a time |
| Analytics NodePool | Karpenter budget | Max 20% of nodes |
| Auto Mode default | Node lifetime | 21 days max (auto-replaced) |
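The Helm chart renders the PDB; a standalone equivalent looks roughly like this. Karpenter drains nodes through the Eviction API, so an eviction that would leave fewer than 2 matching pods available is refused until replacements are ready. The selector label is a hypothetical placeholder and must match the chart's pod labels.

```yaml
# Sketch of a PDB equivalent to the chart's minAvailable: 2.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zeptodb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: zeptodb   # hypothetical; must match the pod labels
```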

Security recommendations:

  • API server endpoint: restrict publicAccessCidrs (not 0.0.0.0/0)
  • Enable private endpoint access
  • Enable control plane logging (api, audit, authenticator)
  • Use EKS Pod Identity to grant AWS permissions to service accounts
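In an eksctl ClusterConfig, these items map onto the fragment below. The CIDR, namespace, service account, and policy ARN are placeholders, not values from this project.

```yaml
# Hypothetical eksctl ClusterConfig fragment for the hardening items above.
vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: true                        # enable the private endpoint
  publicAccessCIDRs: ["203.0.113.0/24"]        # placeholder office/VPN range, not 0.0.0.0/0
cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator"]
iam:
  podIdentityAssociations:
    - namespace: zeptodb                       # hypothetical namespace
      serviceAccountName: zeptodb              # hypothetical service account
      permissionPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess   # example managed policy
```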

For bench clusters, scale node groups to 0 when not in use:

```bash
./tools/eks-bench.sh sleep   # Scale to 0 → ~$0.10/hr (control plane only)
./tools/eks-bench.sh wake    # Restore nodes → ready in 3-5 min
./tools/eks-bench.sh status  # Check current state
```

| State | Hourly Cost | Nodes |
|---|---|---|
| Wake | ~$3.60/hr | 6 (3× r7i.2xlarge + 2× m7i.large + 1× c7i.xlarge) |
| Sleep | ~$0.10/hr | 0 (control plane only) |

| File | Description |
|---|---|
| tools/eks-bench.sh | Sleep/wake script for cost optimization |
| deploy/scripts/setup_eks.sh | EKS Auto Mode setup script |
| deploy/k8s/eks-bench-cluster.yaml | eksctl cluster config |
| deploy/helm/zeptodb/values.yaml | Helm values |
| docs/operations/KUBERNETES_OPERATIONS.md | Day-2 operations guide |