
Karpenter Node Autoprovisioning: Cut Kubernetes Costs 40%+

GeekOnCloud · 6 min read

Ever watched your Kubernetes cluster spin up m5.24xlarge instances for pods that need 500m CPU? Or waited 3 minutes for Cluster Autoscaler to provision a node while your deployment sat pending? If you're running EKS at any real scale, you've felt these pain points. Karpenter fixes both—and typically cuts compute costs 30-60% in the process.

I migrated a 200-node production cluster from Cluster Autoscaler to Karpenter last quarter. Monthly EC2 bill dropped from $47K to $28K. Here's exactly how Karpenter works and how to implement it without torching your production environment.

Why Cluster Autoscaler Falls Short

Cluster Autoscaler was designed for a simpler time. It works with node groups—predefined templates that specify exact instance types. You want m5.large nodes? Create a node group. Need c5.2xlarge for compute workloads? Another node group. GPU instances? You guessed it.

This creates three expensive problems:

Bin-packing inefficiency: When a pod needs 3GB RAM, Cluster Autoscaler might provision a 16GB node because that's what the node group specifies. You're paying for 13GB of unused memory.

Scaling latency: Cluster Autoscaler scans for pending pods every 10-15 seconds, then calls the Auto Scaling API and waits for the node to register. Real-world time from pod pending to running: 2-4 minutes.

Operational overhead: Managing 15+ node groups, keeping AMIs updated, coordinating instance type availability across AZs—it's a full-time job nobody wants.

Karpenter takes a different approach. No node groups. You define constraints (instance families, sizes, capacity types), and Karpenter provisions the exact right node for pending pods in real-time.

Karpenter Architecture in 60 Seconds

Karpenter runs as a deployment in your cluster with two core components:

The controller watches for pending pods and nodes, making provisioning and deprovisioning decisions. The webhook validates and mutates Karpenter custom resources.

When pods go pending, Karpenter:

  1. Groups pending pods by scheduling constraints
  2. Runs a bin-packing simulation across all available instance types
  3. Selects the cheapest instance(s) that fit the workload
  4. Calls EC2 directly (not ASG) to launch instances
  5. The new node joins the cluster in 45-90 seconds
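Step 3's "cheapest that fits" selection is easy to sketch offline. Given a small catalog of instance shapes with illustrative prices (not live pricing—Karpenter queries the real catalog and Spot markets), filter by the pod's requests and take the minimum:

```shell
#!/bin/sh
# Toy instance catalog: name  vCPU  memGiB  hourly_usd (illustrative numbers only)
cat > /tmp/catalog.txt <<'EOF'
c5.large    2  4   0.085
c5.xlarge   4  8   0.170
m5.large    2  8   0.096
m5.xlarge   4  16  0.192
r5.large    2  16  0.126
EOF

# A pending pod asking for 3 vCPU and 6 GiB: keep instances that fit, pick the cheapest
awk -v cpu=3 -v mem=6 '$2 >= cpu && $3 >= mem' /tmp/catalog.txt | sort -k4 -n | head -1
# → c5.xlarge   4  8   0.170
```

The real simulation also accounts for daemonset overhead, topology constraints, and multi-node splits, but the core decision is this filter-then-minimize step.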

That direct EC2 integration is key. No Auto Scaling Group overhead, no launch template indirection. Karpenter makes one API call and you get a node.

Installing Karpenter on EKS

Prerequisites: EKS 1.25+, Helm 3.x, and IAM permissions for EC2, IAM, and SSM.

First, set up the IAM roles. Karpenter needs permission to create instances, pass roles, and read SSM parameters for AMI discovery:

export KARPENTER_VERSION="0.37.0"
export CLUSTER_NAME="production-eks"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}"

# Create the IAM role (using eksctl for brevity—Terraform works too).
# Assumes an IAM OIDC provider is associated with the cluster and the
# KarpenterControllerPolicy already exists (see Karpenter's getting-started guide).
eksctl create iamserviceaccount \
  --cluster="${CLUSTER_NAME}" \
  --name=karpenter \
  --namespace=kube-system \
  --role-name="KarpenterControllerRole-${CLUSTER_NAME}" \
  --attach-policy-arn="arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy" \
  --override-existing-serviceaccounts \
  --approve

# Install Karpenter via Helm
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

The interruption queue handles Spot termination notices. Don't skip it if you're using Spot instances.

Configuring NodePools and EC2NodeClasses

Karpenter uses two CRDs: NodePool defines what workloads can run on provisioned nodes, and EC2NodeClass defines how to configure the underlying EC2 instances.

Here's a production-ready configuration that prioritizes Spot instances, allows fallback to On-Demand, and restricts to current-gen instance families:

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge", "2xlarge", "4xlarge"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "production-eks"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "production-eks"
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        deleteOnTermination: true
  tags:
    Environment: production
    ManagedBy: karpenter

Key configuration choices here:

  • Instance generation > 4: Excludes older generations (m4, c4) that cost more per vCPU
  • Instance categories c/m/r: Compute, general purpose, and memory-optimized cover 95% of workloads
  • consolidationPolicy: WhenUnderutilized: Karpenter will actively move pods and terminate underused nodes as soon as it finds a cheaper arrangement
  • Note: in the v1beta1 API, consolidateAfter only pairs with consolidationPolicy: WhenEmpty, so there's no delay knob here—expect some churn, and set disruption budgets in sensitive environments

Spot Integration and Interruption Handling

Karpenter handles Spot brilliantly. By allowing multiple instance types and sizes, you tap into Spot's diversified allocation strategy—AWS picks from available capacity pools, reducing interruption rates.

Create an SQS queue for interruption handling:

aws sqs create-queue --queue-name "${CLUSTER_NAME}" --attributes '{
  "MessageRetentionPeriod": "300",
  "Policy": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"events.amazonaws.com\",\"sqs.amazonaws.com\"]},\"Action\":\"sqs:SendMessage\",\"Resource\":\"arn:aws:sqs:us-east-1:'${AWS_ACCOUNT_ID}':'${CLUSTER_NAME}'\"}]}"
}'

# Create EventBridge rules for Spot interruptions and rebalance recommendations,
# both targeting the queue above
aws events put-rule \
  --name "${CLUSTER_NAME}-spot-interruption" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'

aws events put-targets \
  --rule "${CLUSTER_NAME}-spot-interruption" \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:${AWS_ACCOUNT_ID}:${CLUSTER_NAME}"

aws events put-rule \
  --name "${CLUSTER_NAME}-rebalance" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance Rebalance Recommendation"]}'

aws events put-targets \
  --rule "${CLUSTER_NAME}-rebalance" \
  --targets "Id"="1","Arn"="arn:aws:sqs:us-east-1:${AWS_ACCOUNT_ID}:${CLUSTER_NAME}"

When AWS sends a 2-minute Spot interruption notice, Karpenter cordons the node, drains pods gracefully, and they're rescheduled on new capacity before termination.
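That drain honors PodDisruptionBudgets, so anything latency-sensitive running on Spot should carry one. A minimal sketch (the app: api selector is a hypothetical label—match your own deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # never drain below two ready replicas
  selector:
    matchLabels:
      app: api           # hypothetical label for your workload's pods
```

Without a PDB, a simultaneous interruption of several Spot nodes can take every replica of a service down at once.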

Measuring Real Cost Savings

After running Karpenter for 30 days, here's how I measured impact:

Before (Cluster Autoscaler):

  • Average node utilization: 34%
  • Spot usage: 45% (manually managed node groups)
  • Monthly compute: $47,200

After (Karpenter):

  • Average node utilization: 71%
  • Spot usage: 78% (automatic with fallback)
  • Monthly compute: $28,100

The utilization jump comes from right-sizing. Karpenter provisions a c5.xlarge for a pod requesting 3 vCPU, where Cluster Autoscaler would have launched a c5.4xlarge simply because that's what the node group specified.
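The arithmetic behind that example, using rough on-demand prices (illustrative, not a current quote):

```shell
# One pod requesting 3 vCPU: a fixed node group launches a c5.4xlarge,
# right-sizing picks a c5.xlarge instead (prices are illustrative)
awk 'BEGIN {
  c5_4xlarge = 0.680   # ~$/hr for 16 vCPU
  c5_xlarge  = 0.170   # ~$/hr for 4 vCPU
  printf "overpay factor: %.0fx\n", c5_4xlarge / c5_xlarge
}'
# → overpay factor: 4x
```

Multiply that kind of per-pod overpay across hundreds of nodes and the 34% → 71% utilization shift stops looking surprising.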

Use these commands to monitor your migration:

# Check Karpenter's provisioning decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller | grep -iE "(nodeclaim|disrupt)"

# View current node composition
kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type,karpenter.sh/nodepool

# Check allocatable vs. total CPU per node and its NodePool (low-request nodes are consolidation candidates)
kubectl get nodes -L karpenter.sh/nodepool -o custom-columns=NAME:.metadata.name,ALLOCATABLE:.status.allocatable.cpu,CAPACITY:.status.capacity.cpu
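To turn the capacity-type column from that node listing into a Spot percentage like the one above, pipe it through awk. A sample listing is inlined here so the pipeline runs anywhere; in the cluster you'd feed it from `kubectl get nodes -L karpenter.sh/capacity-type --no-headers` instead:

```shell
# Sample capacity-type column (stand-in for live kubectl output)
cat <<'EOF' | awk '/spot/ {s++} {t++} END {printf "spot share: %.0f%%\n", 100*s/t}'
spot
spot
on-demand
spot
EOF
# → spot share: 75%
```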

Migration Path from Cluster Autoscaler

Don't rip and replace. Run both systems in parallel:

  1. Install Karpenter with a NodePool that has a unique taint
  2. Deploy non-critical workloads with tolerations for that taint
  3. Monitor for a week—watch provisioning times, Spot interruption handling, consolidation behavior
  4. Gradually migrate more workloads by adding the toleration, then remove the taint from the NodePool to open it up generally
  5. Scale down Cluster Autoscaler node groups as utilization drops
  6. Delete Cluster Autoscaler when all workloads run on Karpenter nodes
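Step 1's tainted NodePool only needs a taints block in the template. A sketch (the karpenter-migration key is an arbitrary name chosen for illustration; it reuses the default EC2NodeClass from earlier):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: migration
spec:
  template:
    spec:
      taints:
        - key: karpenter-migration   # arbitrary key; pods opt in with a matching toleration
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
```

Only pods that tolerate karpenter-migration land on Karpenter nodes, so a bad configuration can't pull in critical workloads by accident.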

The parallel approach costs a few extra nodes for a week. That's nothing compared to the risk of a botched migration during peak traffic.

Tag your Karpenter-provisioned nodes and set up a CloudWatch dashboard showing cost by tag. You'll see the crossover point clearly—usually within the first week.

Start with your dev cluster tomorrow. The 20-minute installation pays back in hours of saved node-group wrangling, and the cost savings fund themselves within the first billing cycle.

Written by GeekOnCloud

DevOps & Infrastructure engineer at geekoncloud.com
