Why your Kubernetes cluster wastes 40% of its budget
Most Kubernetes clusters run at 35 to 50% real utilization. The rest is capacity you are paying for and not using. The good news: almost all of that waste comes from four predictable leaks, and you can close them in a few weeks without touching application code or sacrificing reliability.
TL;DR
- Clusters typically operate at 35 to 50% real utilization. The rest is paid-for headroom.
- Four leaks account for almost all of it: over-requested resource limits, poor bin-packing, idle non-production environments, and orphaned storage and load balancers.
- Karpenter, VPA in recommendation mode, and a deliberate spot and on-demand mix recover 30 to 50% of compute spend in 4 to 8 weeks.
- None of it requires application changes or reliability trade-offs.
Your bill says one thing, your dashboards say another
Engineers set defensive resource requests, say 2 CPU and 4Gi of memory, for a service that actually uses 400m CPU and 800Mi. That headroom exists only on paper. Kubernetes schedules pods by their requests, and the cluster autoscaler adds nodes when pending pods' requests no longer fit, regardless of how much is actually in use, so you pay for the gap. One company we looked at watched its bill climb from $38K to $96K a month (about 2.5x) over fourteen months while headcount grew only 40%.
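The gap in that example is easy to quantify. The sketch below parses Kubernetes resource quantities and divides request by actual usage. `parse_quantity` and `over_request_ratio` are hypothetical helpers written for this illustration, and they handle only the common suffixes (m, k, M, G, T, Ki, Mi, Gi, Ti), not the full Kubernetes quantity grammar.

```python
from fractions import Fraction

# Common Kubernetes quantity suffixes (a subset of the full grammar).
SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30), "Ti": Fraction(2**40),
    "m": Fraction(1, 1000),
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9), "T": Fraction(10**12),
}

def parse_quantity(quantity: str) -> Fraction:
    """Parse a quantity such as '400m', '2', or '4Gi' into an exact number."""
    for suffix, multiplier in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * multiplier
    return Fraction(quantity)  # plain number, e.g. '2' CPUs

def over_request_ratio(requested: str, used: str) -> Fraction:
    """How many times larger the request is than actual usage."""
    return parse_quantity(requested) / parse_quantity(used)

cpu = over_request_ratio("2", "400m")      # 2 CPU requested, 400m used
mem = over_request_ratio("4Gi", "800Mi")   # 4Gi requested, 800Mi used
print(f"CPU: {float(cpu):.2f}x, memory: {float(mem):.2f}x")  # CPU: 5.00x, memory: 5.12x
```

A 5x ratio means roughly 80% of what the scheduler reserves for this service is never used, yet the cluster autoscaler still buys nodes to hold it.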
Four leaks, four fixes
- Over-requested resources (15 to 25%). Requests run 2 to 4x real p95 usage. Fix: VPA in recommendation mode.
- Poor bin-packing (5 to 15%). Half-empty nodes. Fix: Karpenter consolidation.
- Idle non-production (5 to 10%). Dev and staging running 24/7. Fix: scheduled scale-to-zero.
- Orphaned storage & load balancers (1 to 5%). Resources outliving workloads. Fix: a monthly sweep.
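Adding up the ranges above shows where the headline number comes from. This is a rough sketch that assumes the four leaks don't overlap; in practice they do (right-sizing requests also improves bin-packing), so the sum may overstate the combined opportunity. `combined_range` is a hypothetical helper.

```python
# Share of total spend attributed to each leak, as (low %, high %), from the list above.
LEAKS = {
    "over-requested resources": (15, 25),
    "poor bin-packing": (5, 15),
    "idle non-production": (5, 10),
    "orphaned storage and load balancers": (1, 5),
}

def combined_range(leaks):
    """Naive combined range: sum the lows and the highs, ignoring overlap between leaks."""
    low = sum(lo for lo, _ in leaks.values())
    high = sum(hi for _, hi in leaks.values())
    return low, high

low, high = combined_range(LEAKS)
print(f"{low} to {high}% of spend, midpoint {(low + high) / 2}%")  # 26 to 55% of spend, midpoint 40.5%
```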
1. Over-requested resources (15 to 25% of spend)
Compare actual usage against resources.requests over two weeks, at p95 rather than averages. kubectl top pods shows only a current snapshot from the metrics API, so a two-week p95 needs a metrics store such as Prometheus. Requested CPU typically runs 2 to 4x actual p95 usage. Deploy the Vertical Pod Autoscaler in recommendation mode (updateMode: "Off"), not Auto, for two weeks, then batch-apply cuts of 30 to 50% on high-spend, low-risk services.
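Once you have two weeks of usage samples for a container, the comparison and a capped cut can be sketched in a few lines. This is not how VPA computes its recommendations; it is a quick cross-check under stated assumptions: samples are CPU millicores, p95 uses the nearest-rank method, and the 20% headroom and 50% maximum cut per batch are illustrative defaults (the cap mirrors the 30 to 50% batches above). `p95` and `suggest_request` are hypothetical names.

```python
def p95(samples):
    """Nearest-rank 95th percentile of usage samples (e.g. CPU millicores)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]

def suggest_request(current_request, samples, headroom=1.2, max_cut=0.5):
    """Suggest a new request: p95 usage plus headroom, never cutting more than
    max_cut of the current request in one batch, and never raising it."""
    target = p95(samples) * headroom
    floor = current_request * (1 - max_cut)
    return min(current_request, max(target, floor))
```

For a service requesting 2000m whose p95 is under 100m, the sketch stops at 1000m for the first batch instead of jumping straight to the p95 figure, leaving room for a second batch once the first has proven safe.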
2. Poor bin-packing (5 to 15% of spend)
The default scheduler spreads pods across nodes, and the Cluster Autoscaler is conservative about removing underused ones, so you end up running six half-empty m5.2xlarge instances instead of three full ones. Replace the Cluster Autoscaler with Karpenter, which provisions right-sized nodes for pending pods, consolidates aggressively, and mixes instance types. That alone usually cuts node count by 20 to 35% for the same workload.
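Bin-packing is easy to see with a toy packer. The sketch below uses first-fit decreasing on CPU requests alone; it is not Karpenter's algorithm, which also weighs memory, scheduling constraints, and instance pricing, among other things. The 8000m node capacity corresponds to an m5.2xlarge's 8 vCPUs and ignores capacity reserved for system daemons; the pod requests are made up, and `first_fit_decreasing` and `utilization` are hypothetical helpers.

```python
def first_fit_decreasing(requests, capacity):
    """Pack pod CPU requests (millicores) onto identical nodes, largest first.
    Returns one list of requests per node."""
    nodes = []
    for req in sorted(requests, reverse=True):
        if req > capacity:
            raise ValueError(f"pod request {req}m exceeds node capacity {capacity}m")
        for node in nodes:
            if sum(node) + req <= capacity:
                node.append(req)
                break
        else:  # no existing node had room, so open a new one
            nodes.append([req])
    return nodes

def utilization(nodes, capacity):
    """Fraction of provisioned CPU that is actually requested."""
    return sum(map(sum, nodes)) / (len(nodes) * capacity)

# Hypothetical workload: ten pods totalling 17000m on 8000m nodes.
pods = [3000, 3000, 2500, 2000, 2000, 1500, 1000, 1000, 500, 500]
packed = first_fit_decreasing(pods, 8000)
print(len(packed), f"{utilization(packed, 8000):.0%}")  # 3 71%
```

The same ten pods spread across six nodes would leave those nodes only about 35% requested.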
3. Idle non-production environments (5 to 10% of spend)
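Dev and staging clusters often run around the clock at production scale, even though nobody uses them at night or on weekends. Scale their workloads to zero outside business hours with a CronJob or kube-downscaler, and let the node autoscaler remove the emptied nodes; scaling pods alone does not shrink the bill. That recovers 60 to 70% of non-production compute cost.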
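The 60 to 70% figure falls out of simple schedule arithmetic. A minimal sketch, assuming compute cost is proportional to node-hours and that idle nodes are actually removed once workloads scale to zero; `offhours_savings` is a hypothetical helper.

```python
HOURS_PER_WEEK = 7 * 24

def offhours_savings(on_hours_per_day, on_days_per_week):
    """Fraction of an always-on environment's compute cost saved by running it
    only on a schedule. Assumes cost scales with node-hours."""
    if not (0 <= on_hours_per_day <= 24 and 0 <= on_days_per_week <= 7):
        raise ValueError("schedule must fit within a week")
    return 1 - (on_hours_per_day * on_days_per_week) / HOURS_PER_WEEK

for hours in (10, 12):
    print(f"{hours}h x 5 days: {offhours_savings(hours, 5):.0%} saved")
# 10h x 5 days: 70% saved
# 12h x 5 days: 64% saved
```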
4. Orphaned storage and load balancers (1 to 5% of spend)
PersistentVolumeClaims and LoadBalancer Services outlive the workloads that created them. A monthly automated sweep keeps the long tail in check: list PVCs across all namespaces (kubectl get pvc -A) and flag any that no pod mounts, then match ELBs and NLBs to live Services through their cloud tags and flag the ones with no owner.
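The PVC half of the sweep can be sketched on top of kubectl's JSON output. It flags claims that no current pod references, so treat the result as a review list rather than a delete list: a StatefulSet scaled to zero, a CronJob between runs, or a pod using a generic ephemeral volume will all show up as false positives. `kubectl_json` and `unmounted_pvcs` are hypothetical helpers; the load balancer half depends on how your cloud provider and controller tag ELBs and NLBs, so it is not shown.

```python
import json
import subprocess

def kubectl_json(*args):
    """Run a kubectl command with -o json and return the parsed result."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)

def unmounted_pvcs(pvc_list, pod_list):
    """Return sorted (namespace, name) pairs for PVCs that no current pod references."""
    mounted = set()
    for pod in pod_list.get("items", []):
        namespace = pod["metadata"]["namespace"]
        for volume in pod.get("spec", {}).get("volumes") or []:
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))
    candidates = []
    for pvc in pvc_list.get("items", []):
        key = (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        if key not in mounted:
            candidates.append(key)
    return sorted(candidates)
```

From a machine with cluster access, call it as `unmounted_pvcs(kubectl_json("get", "pvc", "-A"), kubectl_json("get", "pods", "-A"))`.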
What this looks like in practice
One Series C SaaS team cut its EKS bill from $96K to $61K a month, a 36% reduction. The findings were familiar: CPU requests averaging 3.1x actual p95 usage, eleven nodes running under 35% utilization, and three full-size staging environments running continuously. A VPA rollout in batches, a Karpenter migration, and scheduled scale-down for non-prod took six weeks, with zero customer-facing incidents.
Trade-offs worth knowing
- Don't run VPA in Auto mode on stateful or latency-sensitive services at first. It evicts pods to apply new values, and they restart with the new requests.
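- Spot instances suit batch jobs, CI runners, and stateless services that shut down gracefully, not singletons or stateful workloads that cannot tolerate an interruption. On AWS, a spot interruption comes with only a two-minute warning.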
- Don't chase the last 5%. The first 30% of savings costs a few weeks of focused work; the rest costs far more for far less.
What to do next
Run a two-week comparison of actual p95 usage against requested resources, per namespace. Start VPA in recommendation mode on your ten highest-spend services. If you want a second set of eyes, a Cloud Infrastructure Assessment will surface the waste with specific instance IDs and a prioritized fix list.
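To start VPA in recommendation mode across several services at once, you can generate the objects rather than hand-write them. A minimal sketch, assuming the VPA CRDs (autoscaling.k8s.io/v1) are installed and your targets are Deployments; updateMode "Off" tells VPA to publish recommendations without evicting pods. The namespaces and Deployment names below are placeholders, and `vpa_manifest` and `vpa_list` are hypothetical helpers.

```python
import json

def vpa_manifest(namespace, deployment):
    """Build a recommendation-only VerticalPodAutoscaler for one Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            # "Off": compute recommendations only; never evict or resize pods.
            "updatePolicy": {"updateMode": "Off"},
        },
    }

def vpa_list(targets):
    """Wrap manifests in a v1 List so kubectl can apply them in one call."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [vpa_manifest(ns, name) for ns, name in targets],
    }

if __name__ == "__main__":
    # Placeholder targets: replace with your ten highest-spend Deployments.
    targets = [("payments", "api"), ("search", "indexer")]
    print(json.dumps(vpa_list(targets), indent=2))
```

Pipe the output to `kubectl apply -f -`, then read the recommendations with `kubectl describe vpa <name> -n <namespace>` after the two-week window.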