Kubernetes Cost Audit Blueprint: The 60% Reduction Playbook

A structured approach to finding the waste in Kubernetes clusters - based on the methodology that reduced one client's AWS spend by $420k a year.

Kubernetes FinOps AWS EKS Cost Optimisation Karpenter Spot Instances Rightsizing

Most Kubernetes clusters we audit use only 15–25% of the CPU and memory their workloads request. The waste is invisible because the cluster looks healthy - pods are running, nothing is alerting, the dashboards are green. The bill just keeps growing. This blueprint documents the methodology we used to reduce one client's Kubernetes spend from $700k to $280k annually, a 60% reduction worth $420k a year. The numbers are real. The methodology is repeatable. We've run the same process across four different clusters since then with similar results.

What's inside

The document has seven sections: six audit phases and a FinOps tagging policy. Each is self-contained - you can use individual sections as standalone references or work through the document in sequence.

Phase 1: Baseline measurement

How to pull two weeks of actual CPU and memory utilisation data using Prometheus and kube-state-metrics, and how to read the output in a way that identifies rightsizing opportunities vs genuine load. Includes the specific PromQL queries that surface the worst offenders.
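
As a rough illustration of the usage-versus-request comparison described above, here is a minimal Python sketch. The PromQL strings assume the standard cAdvisor metric container_cpu_usage_seconds_total and the kube-state-metrics v2 metric kube_pod_container_resource_requests; they are not the blueprint's own queries. The worst_offenders helper, the 25% threshold, and the sample figures are all hypothetical.

```python
# Illustrative PromQL (an assumption, not taken from the blueprint).
CPU_USAGE_QUERY = (
    'sum by (namespace, pod) '
    '(rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
)
CPU_REQUEST_QUERY = (
    'sum by (namespace, pod) '
    '(kube_pod_container_resource_requests{resource="cpu"})'
)


def worst_offenders(usage, requests, threshold=0.25):
    """Return (workload, utilisation) pairs below `threshold`, lowest first.

    `usage` and `requests` map a workload name to CPU cores, for example
    the two-week averages of the two queries above.
    """
    ratios = []
    for name, requested in requests.items():
        if requested <= 0:
            continue  # no request set, so there is no ratio to compute
        ratios.append((name, usage.get(name, 0.0) / requested))
    return sorted((r for r in ratios if r[1] < threshold), key=lambda r: r[1])


# Hypothetical sample data, in CPU cores.
sample_usage = {"api": 0.3, "worker": 1.8, "cache": 0.1}
sample_requests = {"api": 2.0, "worker": 2.0, "cache": 1.0}
offenders = worst_offenders(sample_usage, sample_requests)
```

In this sample, "worker" uses 90% of its request and is left alone, while "cache" and "api" fall under the threshold and would be the first candidates for rightsizing.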

Phase 2: Rightsizing workloads

The Goldilocks + VPA setup we use for recommendations, the decision rules for applying recommendations safely vs staging them in canary deployments first, and the categories of workload that need manual review regardless of what VPA suggests.
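
To make the idea of a decision rule concrete, the sketch below classifies a single recommendation. It is not the blueprint's rule set: the 30% cut-off, the stateful flag as a trigger for manual review, and the function name are placeholders chosen for illustration.

```python
def rightsizing_action(current_request, recommended, stateful=False,
                       auto_apply_limit=0.3):
    """Classify a recommendation as 'apply', 'canary', or 'manual-review'.

    Values can be in any consistent unit (millicores, MiB).
    All thresholds here are hypothetical placeholders.
    """
    if stateful:
        return "manual-review"  # assumed category that always gets a human look
    if current_request <= 0:
        return "manual-review"  # nothing to compare the recommendation against
    change = abs(recommended - current_request) / current_request
    return "apply" if change <= auto_apply_limit else "canary"
```

With these placeholder values, a move from 1000m to 800m would be applied directly, while a move from 1000m to 300m would be staged in a canary first.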

Phase 3: Node group restructuring

How to identify which instance families fit your workload profiles (compute-optimised for API servers, memory-optimised for caches), how to configure node affinity to route workloads correctly, and the node group sizing changes that reduce bin-packing waste.
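
One simple way to approximate the family-matching step is to compare a workload's memory-to-CPU ratio with the ratio each instance family offers. The sketch below assumes the usual figures for current-generation AWS families (about 2 GiB per vCPU for c, 4 for m, 8 for r); the suggest_family helper is hypothetical and ignores price, network, and storage considerations.

```python
# Approximate GiB of memory per vCPU for current-generation AWS families.
FAMILY_RATIOS = {"c": 2.0, "m": 4.0, "r": 8.0}


def suggest_family(cpu_cores, memory_gib):
    """Pick the family whose memory-per-vCPU ratio is closest to the workload's."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    ratio = memory_gib / cpu_cores
    return min(FAMILY_RATIOS, key=lambda family: abs(FAMILY_RATIOS[family] - ratio))
```

A CPU-heavy API server requesting 4 cores and 6 GiB lands on the c family, and a cache requesting 1 core and 8 GiB lands on r, which matches the pairing described above.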

Phase 4: Spot instance migration

The workload categorisation framework - Spot-safe, mixed, On-Demand only - and the Karpenter NodePool configuration for mixed provisioning. Includes the pod topology spread constraints that prevent single-AZ Spot concentration.
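
The fragments involved can be sketched as plain Python dictionaries that mirror the YAML. This is an illustration only, not the blueprint's configuration: it uses the karpenter.sh/capacity-type requirement key and the standard topology.kubernetes.io/zone label, while the "checkout" app name, the DoNotSchedule choice, and the helper functions are assumptions.

```python
import json

# NodePool requirement allowing both capacity types (mixed provisioning).
CAPACITY_REQUIREMENT = {
    "key": "karpenter.sh/capacity-type",
    "operator": "In",
    "values": ["spot", "on-demand"],
}


def zone_spread_constraint(app_label, max_skew=1):
    """Build a pod-spec topologySpreadConstraints entry keyed on the zone label."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {"app": app_label}},
    }


def zone_skew(pods_per_zone):
    """Difference between the fullest and the emptiest zone."""
    counts = list(pods_per_zone.values())
    return max(counts) - min(counts)


constraint = zone_spread_constraint("checkout")  # "checkout" is a made-up app name
print(json.dumps(constraint, indent=2))
```

The zone_skew helper gives a quick check on an existing deployment: a spread of 3, 1, and 2 pods across three zones has a skew of 2, which a maxSkew of 1 would not allow.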

Phase 5: Autoscaling

The HPA configuration changes that pair with rightsizing, the KEDA ScaledObject setup using Prometheus RPS metrics for services where CPU is a lagging indicator, and the off-peak scale-down configuration that reduces overnight costs without SLA risk.
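
The reason HPA settings have to change alongside rightsizing is that CPU utilisation is measured relative to the request. The sketch below applies the documented HPA scaling rule, desired = ceil(current replicas × current metric / target metric), to a hypothetical service before and after its request is lowered. It ignores the HPA's tolerance band and its min/max replica limits, and all figures are made up.

```python
import math


def desired_replicas(current_replicas, current_utilisation, target_utilisation):
    """Kubernetes HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilisation / target_utilisation)


def utilisation_percent(usage_millicores, request_millicores):
    """CPU utilisation as the HPA sees it: usage relative to the request."""
    return 100.0 * usage_millicores / request_millicores


# Hypothetical service: 4 replicas, 200m average usage per pod, 70% CPU target.
before = desired_replicas(4, utilisation_percent(200, 1000), 70)  # request 1000m
after = desired_replicas(4, utilisation_percent(200, 400), 70)    # request 400m
```

The same 200m of usage reads as 20% utilisation against a 1000m request and 50% against a 400m request, so the HPA asks for 2 replicas before rightsizing and 3 after. Targets tuned against the old requests need revisiting.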

Phase 6: Orphaned resource cleanup

The audit process for identifying unused PersistentVolumes, idle LoadBalancers, stale snapshots, and forgotten node groups - the category that's often overlooked but consistently delivers $15–30k of quick wins before any structural change.
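
For the PersistentVolume part of such an audit, one starting point is to list volumes whose phase is Available or Released, meaning no claim is currently bound to them. The sketch below filters the JSON shape that `kubectl get pv -o json` returns; the sample data and the unbound_volumes helper are hypothetical, and a flagged volume still needs a human check before deletion.

```python
import json


def unbound_volumes(pv_list):
    """Return (name, phase, capacity) for PersistentVolumes not bound to a claim."""
    results = []
    for item in pv_list.get("items", []):
        phase = item.get("status", {}).get("phase", "")
        if phase in ("Available", "Released"):
            name = item["metadata"]["name"]
            capacity = item.get("spec", {}).get("capacity", {}).get("storage", "?")
            results.append((name, phase, capacity))
    return results


# Trimmed, hypothetical output of `kubectl get pv -o json`.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "pv-a"}, "spec": {"capacity": {"storage": "100Gi"}},
     "status": {"phase": "Bound"}},
    {"metadata": {"name": "pv-b"}, "spec": {"capacity": {"storage": "500Gi"}},
     "status": {"phase": "Released"}},
    {"metadata": {"name": "pv-c"}, "spec": {"capacity": {"storage": "20Gi"}},
     "status": {"phase": "Available"}}
  ]
}
""")
candidates = unbound_volumes(sample)
```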

FinOps tagging policy

The Kubernetes label taxonomy and AWS tag propagation setup that makes cost attribution by team, environment, and workload actually work. The policy template we use on every EKS engagement, with the Terraform module for enforcement.
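
A label taxonomy only helps if it is checked. As a minimal sketch of that check, the function below reports which required labels a workload is missing. The three keys follow the attribution dimensions named above (team, environment, workload), but the exact key names are an assumption; the blueprint's own taxonomy and its Terraform enforcement module are not reproduced here.

```python
# Hypothetical required label keys; the real taxonomy may use prefixed keys.
REQUIRED_LABELS = ("team", "environment", "workload")


def missing_labels(labels):
    """Return the required label keys that are absent or empty."""
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


# Hypothetical metadata.labels from a Deployment.
deployment_labels = {"team": "payments", "environment": "prod"}
gaps = missing_labels(deployment_labels)
```

Here the result is a one-item list containing "workload", which is the kind of gap that leaves spend unattributed in a cost report.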

What this doesn't cover

This blueprint is written for AWS EKS. The rightsizing methodology and Spot migration patterns apply across cloud providers, but the specific tooling references (Karpenter, AWS Cost Explorer queries, RDS connection pooling) are AWS-specific. GCP GKE and Azure AKS equivalents are noted only where they differ significantly.

Who this is for

Platform engineers or DevOps leads with an EKS cluster and a growing AWS bill

Engineering managers preparing a FinOps initiative for their organisation

CTOs evaluating whether Kubernetes cost optimisation is worth the engineering time (it almost always is)

Teams that have done basic rightsizing but aren't sure what else to look at

How it was built

This blueprint is built from the methodology used in a real Q4 2024 EKS engagement. The Terraform modules are from the same codebase used in production. It was updated in December 2024 to include Karpenter v1 API changes.

Every resource Sequere publishes is written by the engineers who ran the actual engagement - not by a content team working from secondhand notes. The trade-off is that we publish less frequently. The benefit is that the specifics are real.

Download

This resource is free to download with no account or signup required. The PDF downloads immediately.

If you use this resource on a real project and have feedback - things that were missing, out of date, or wrong - we want to hear it. Every update to this document has come from people who used it in production.