EKS Architecture Benchmark: amd64 vs arm64, 76/76 PASS

Running on Graviton in a single-node test is one thing. Running a full Kubernetes deployment, with rolling updates, pod disruption budgets, service failover, and Karpenter autoscaling, on both amd64 and arm64 is the real validation. This post covers ZeptoDB’s EKS architecture benchmark: all 38 tests passing on each architecture (76/76 in total), with no meaningful operational difference.

Test Matrix

Two test suites ran on both architectures:

Suite	Tests	Coverage
K8s Compatibility	27	Pod lifecycle, ConfigMap, Secret, PVC, Service, Ingress, HPA, Karpenter, Helm
HA + Performance	11	Rolling update, PDB, failover, HTTP throughput, network latency, pod startup
Total	38 × 2 architectures = 76

Every test passed on both amd64 and arm64. Zero architecture-specific failures.

Bugs Fixed Along the Way

The benchmark run exposed 7 bugs, none of them architecture-specific and all rooted in test infrastructure assumptions. Four of them are described here:

1. Default Argument Evaluation

# Before (broken): the default is evaluated once, when the function is defined
def get_pods(namespace="default", label=RELEASE):
    ...

# After: RELEASE is looked up at call time
def get_pods(namespace="default", label=None):
    if label is None:
        label = RELEASE
    ...

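When monkey-patching RELEASE for multi-arch runs, the default argument retained the original value. This is Python's early-binding behavior for default arguments: the default expression is evaluated once, when the def statement runs, so later reassignments of RELEASE are never seen. It is the same mechanism behind the well-known mutable-default gotcha, although no mutable object is involved here.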
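The early-binding behavior can be reproduced in a few lines. This is a minimal sketch, not the actual test code: the release names and the two function names are hypothetical, chosen only to show the stale default next to the fixed lookup.

```python
RELEASE = "zeptodb-amd64"  # hypothetical release name


def get_pods_early(label=RELEASE):
    # The default was captured once, when this def statement ran.
    return label


def get_pods_late(label=None):
    # The module-level name is looked up again on every call.
    if label is None:
        label = RELEASE
    return label


# Simulate the multi-arch run re-pointing RELEASE at the second architecture.
RELEASE = "zeptodb-arm64"

early = get_pods_early()
late = get_pods_late()
print(early)  # zeptodb-amd64 (stale value from definition time)
print(late)   # zeptodb-arm64
```

Using None as a sentinel keeps the signature unchanged for callers that pass label explicitly.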

2. PDB Drain Bypass

The HA03 test used kubectl drain to verify Pod Disruption Budgets. In this test, however, sequential kubectl drain operations could slip past the PDB when pods rescheduled between drains. The fix calls the Kubernetes Eviction API directly, where each eviction request is checked against the PDB and rejected while the budget would be violated.
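As a rough illustration of what an Eviction API call involves, the sketch below builds the request for the pod eviction subresource and classifies the response status. It uses only the standard library and does not send anything; the helper names, pod name, and namespace are hypothetical, and the real HA03 test may well use a client library instead.

```python
import json


def build_eviction(pod, namespace="default"):
    """Return (path, body) for a policy/v1 Eviction of a single pod."""
    path = f"/api/v1/namespaces/{namespace}/pods/{pod}/eviction"
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod, "namespace": namespace},
    }
    return path, json.dumps(body)


def eviction_outcome(status_code):
    """Map the HTTP status of an eviction POST to a coarse outcome."""
    if 200 <= status_code < 300:
        return "evicted"
    if status_code == 429:
        # 429 Too Many Requests: the eviction is not currently allowed,
        # typically because it would violate a PodDisruptionBudget.
        return "blocked_by_pdb"
    return "error"


path, body = build_eviction("zeptodb-0", namespace="bench")  # hypothetical names
print(path)
print(eviction_outcome(429))
```

A PDB test built this way can assert that an eviction which would exceed the budget comes back as blocked_by_pdb, rather than inferring enforcement from the side effects of a drain.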

3. Warmup After Rollback

PERF03 (network latency) failed intermittently after PERF02 triggered a rollback. The newly-rolled-back pods weren’t ready to serve traffic. Adding a warmup request before measurement fixed the flake.
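One way to structure such a warmup is sketched below. The post only says a warmup request was added; the retry loop, the function names, and the injected send_request callable are assumptions made for illustration.

```python
import time


def warm_up(send_request, attempts=10, delay=0.5, sleep=time.sleep):
    """Call send_request until it succeeds once; return the attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            send_request()
            return attempt
        except OSError:
            # Connection-level failures: the pod is probably not serving yet.
            if attempt == attempts:
                raise
            sleep(delay)


def measure_latency(send_request, samples=5, clock=time.perf_counter):
    """Time each request individually and return the list of durations."""
    timings = []
    for _ in range(samples):
        start = clock()
        send_request()
        timings.append(clock() - start)
    return timings
```

Calling warm_up(send) before measure_latency(send) keeps the first slow or failing request out of the measured samples.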

4. Karpenter Node Wait

Both test files’ setup() functions didn’t wait for pods to be scheduled on Karpenter-provisioned nodes. On fresh clusters, Karpenter needs time to provision instances before pods can start. Added explicit wait-for-ready logic.
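A wait-for-ready step can be as simple as a polling loop with a deadline. The sketch below is an assumption about its shape, not the code from the test files: list_pods is a hypothetical callable returning pod objects as dicts (the structure of kubectl get pods -o json items), and the timeout values are placeholders.

```python
import time


def pod_is_ready(pod):
    """True if the pod dict carries a Ready condition with status "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def wait_for_ready(list_pods, expected, timeout=300.0, interval=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll list_pods() until `expected` pods are Ready or the timeout expires."""
    deadline = clock() + timeout
    while True:
        ready = sum(1 for pod in list_pods() if pod_is_ready(pod))
        if ready >= expected:
            return ready
        if clock() >= deadline:
            raise TimeoutError(
                f"only {ready}/{expected} pods ready after {timeout}s"
            )
        sleep(interval)
```

The timeout needs to cover node provisioning as well as image pull and container start, so on a fresh cluster it should be noticeably longer than a typical pod startup time.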

Key Numbers

Metric	amd64	arm64
Pod startup	4.92s	6.04s
Rolling update (3 replicas)	25.13s	25.38s
HTTP throughput	155 req/s	141 req/s
Service failover	3.57s	3.45s

Analysis

Pod startup: arm64 is ~1.1s (about 23%) slower, likely due to Karpenter provisioning differences (Graviton instance types have different availability). Not operationally significant.
Rolling update: Effectively identical. The update strategy (maxSurge/maxUnavailable) dominates, not the architecture.
HTTP throughput: amd64 shows ~10% higher throughput in this test. This is a single-pod measurement and may reflect instance type differences rather than architectural ones.
Service failover: Effectively identical. Kubernetes Service routing and endpoint updates are architecture-independent.

The bottom line: no metric shows a difference large enough to affect operational decisions.
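The relative differences behind this analysis can be recomputed directly from the Key Numbers table. The short script below is only a convenience for checking the arithmetic; it expresses each arm64 value relative to the amd64 baseline.

```python
# (amd64, arm64) values copied from the Key Numbers table.
results = {
    "Pod startup (s)": (4.92, 6.04),
    "Rolling update (s)": (25.13, 25.38),
    "HTTP throughput (req/s)": (155, 141),
    "Service failover (s)": (3.57, 3.45),
}


def relative_diff(amd64, arm64):
    """Percentage change of arm64 relative to the amd64 baseline."""
    return (arm64 - amd64) / amd64 * 100.0


diffs = {metric: relative_diff(a, b) for metric, (a, b) in results.items()}
for metric, pct in diffs.items():
    print(f"{metric:26s} {pct:+6.1f}%")
```

With amd64 as the baseline, arm64 throughput comes out about 9% lower; taking arm64 as the baseline instead gives the roughly 10% advantage for amd64 quoted in the analysis (155 / 141 ≈ 1.10).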

What Was Tested

The K8s compatibility suite validates that ZeptoDB’s Helm chart and container images work correctly across the full Kubernetes API surface:

K8S01  Pod lifecycle (create, ready, delete)
K8S02  ConfigMap injection
K8S03  Secret mounting
K8S04  PVC provisioning (EBS gp3)
K8S05  Service (ClusterIP, NodePort)
K8S06  Ingress (ALB)
K8S07  HPA scaling
K8S08  Karpenter node provisioning
K8S09  Helm install/upgrade/rollback
...
K8S27  Multi-AZ pod distribution

The HA + performance suite validates production operational scenarios:

HA01   Rolling update (zero-downtime)
HA02   Pod failure recovery
HA03   PDB enforcement during drain
HA04   Service failover timing
PERF01 Pod startup latency
PERF02 Rolling update duration
PERF03 Network latency (intra-cluster)
PERF04 HTTP throughput (sustained)
...

arm64 Production Readiness

The conclusion for this benchmark suite is straightforward: arm64 (Graviton) reached parity with x86_64 for ZeptoDB on EKS.

Functional parity: all 38 tests pass on each architecture (76/76 in total)
Operational parity: no meaningful difference in startup, update, failover, or throughput
Instance efficiency: Graviton showed lower EC2 cost for this benchmark environment
No code changes: same container image build pipeline, same Helm chart, same configuration

The only consideration is instance type availability — Graviton instances may have different spot-market availability and AZ coverage than their x86 equivalents. This is an infrastructure planning concern, not a ZeptoDB concern.

76/76 PASS

Full K8s compatibility + HA/performance suites passing on both amd64 and arm64 EKS.

Lower EC2 cost in this run

Graviton instances cost less in this benchmark environment, with no functional or operational penalty.

7 bugs fixed

Including default arg evaluation, PDB drain bypass, warmup timing, and Karpenter wait. All were test infrastructure fixes.

No operational difference

Pod startup, rolling update, failover, throughput: no difference between architectures large enough to affect operational decisions.