Cost Optimization
Resource Right-Sizing
Analyze Current Usage
# View actual CPU and memory usage (requires metrics-server)
kubectl top pods -n production
# View configured requests and limits to compare against actual usage
kubectl get pods -n production -o custom-columns=\
"NAME:.metadata.name,\
CPU_REQ:.spec.containers[*].resources.requests.cpu,\
CPU_LIM:.spec.containers[*].resources.limits.cpu,\
MEM_REQ:.spec.containers[*].resources.requests.memory,\
MEM_LIM:.spec.containers[*].resources.limits.memory"
# Get VPA recommendations (if VPA installed)
kubectl get vpa -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.status.recommendation.containerRecommendations[*]}{"\n\n"}{end}'
Right-Sized Resource Spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  template:
    spec:
      containers:
        - name: myapp
          resources:
            requests:
              # Set to average usage + 10-20% buffer
              cpu: 100m
              memory: 128Mi
            limits:
              # CPU: 2-4x requests for burst capacity
              # Memory: 1.5-2x requests (OOM prevention)
              cpu: 400m
              memory: 256Mi
Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    # Off - only provide recommendations
    # Initial - apply only on pod creation
    # Auto - apply on pod creation and at runtime (evicts and recreates pods)
    # Avoid Initial/Auto together with an HPA that scales the same workload on CPU or memory
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2000m
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
VPA Recommendation Only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa-recommender
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"
Horizontal Pod Autoscaler (HPA) Tuning
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory-based scaling
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metrics (e.g., requests per second)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
Spot/Preemptible Instances
Node Pool with Spot Instances (GKE)
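# Illustrative only: GKE node pools are not created from a core Kubernetes manifest.
# Apply the equivalent settings with gcloud, Terraform, or Config Connector.
apiVersion: container.google.com/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  config:
    machineType: e2-standard-4
    # Spot VMs carry the cloud.google.com/gke-spot label used below;
    # legacy preemptible VMs are labeled cloud.google.com/gke-preemptible instead
    spot: true
    taints:
      - key: cloud.google.com/gke-spot
        value: "true"
        effect: NoSchedule
  autoscaling:
    enabled: true
    minNodeCount: 0
    maxNodeCount: 10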
Workload Tolerating Spot Nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: production
spec:
  template:
    spec:
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: cloud.google.com/gke-spot
                    operator: In
                    values: ["true"]
      containers:
        - name: processor
          # ... container spec
Pod Disruption Budget for Spot
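apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
  namespace: production
spec:
  # Keep minAvailable below the running replica count; if they are equal,
  # voluntary evictions such as node drains are blocked
  minAvailable: 2
  # OR maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp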
Namespace Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    requests.storage: 500Gi
    pods: "50"
    services: "20"
    secrets: "50"
    configmaps: "50"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-object-counts
  namespace: production
spec:
  hard:
    count/deployments.apps: "20"
    count/statefulsets.apps: "5"
    count/jobs.batch: "10"
LimitRange
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
    # Default limits for containers
    - type: Container
      default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      min:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: 4000m
        memory: 8Gi
    # Pod-level limits
    - type: Pod
      max:
        cpu: 8000m
        memory: 16Gi
    # PVC limits
    - type: PersistentVolumeClaim
      min:
        storage: 1Gi
      max:
        storage: 100Gi
Cluster Autoscaler Configuration
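# Cluster Autoscaler reads these settings from command-line flags on its own
# Deployment, not from a ConfigMap. Fragment only: image, cloud provider flags,
# and RBAC are omitted. Managed offerings expose similar settings through the provider.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --scale-down-delay-after-add=10m
            - --scale-down-delay-after-delete=0s
            - --scale-down-delay-after-failure=3m
            - --scale-down-unneeded-time=10m
            - --scale-down-unready-time=20m
            - --scale-down-utilization-threshold=0.5
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=true
            - --balance-similar-node-groups=true
            - --expander=least-waste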
Cost Monitoring
Kubecost Deployment
# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN"
Prometheus Cost Metrics
# Pod cost labels for attribution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    cost-center: engineering
    team: platform
    environment: production
spec:
  template:
    metadata:
      labels:
        cost-center: engineering
        team: platform
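
Pod labels are not exported to Prometheus by default, so the labels above may not appear in cost queries on their own. A minimal sketch, assuming kube-state-metrics v2 is the exporter feeding Prometheus (the document does not say which exporter is in use), is to allowlist the two pod labels on its container:

```yaml
# Fragment of a kube-state-metrics Deployment; the container name and the
# surrounding Deployment are assumptions, only the flag is the point here.
spec:
  template:
    spec:
      containers:
        - name: kube-state-metrics
          args:
            # Expose the pod labels on the kube_pod_labels metric as
            # label_cost_center and label_team
            - --metric-labels-allowlist=pods=[cost-center,team]
```

With this in place, usage metrics can be joined against `kube_pod_labels` to group cost by `label_cost_center` or `label_team`.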
Scheduled Scaling
# Scale down dev environments overnight
# Schedules use the kube-controller-manager time zone (usually UTC) unless spec.timeZone is set
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: development
spec:
  schedule: "0 20 * * 1-5"  # 8 PM Mon-Fri
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment --all --replicas=0 -n development
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-dev
  namespace: development
spec:
  schedule: "0 8 * * 1-5"  # 8 AM Mon-Fri
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl scale deployment frontend --replicas=2 -n development
                  kubectl scale deployment backend --replicas=2 -n development
          restartPolicy: OnFailure
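
Both CronJobs run as the `scaler` service account, which needs permission to list Deployments and change their scale. The following is a minimal sketch of that RBAC; the Role and RoleBinding name `deployment-scaler` is hypothetical, and the exact verbs `kubectl scale` needs may vary slightly between kubectl versions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
  namespace: development
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler  # hypothetical name
  namespace: development
rules:
  # Needed to resolve "--all" and to look up named Deployments
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
  # kubectl scale writes through the scale subresource
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: development
subjects:
  - kind: ServiceAccount
    name: scaler
    namespace: development
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
```

Scoping this to a namespaced Role rather than a ClusterRole keeps the scaler from touching workloads outside `development`.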
Priority Classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
# Never: these pods do not preempt others, but can still be preempted themselves
preemptionPolicy: Never
description: "Batch jobs that can be preempted"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-job
spec:
  template:
    spec:
      priorityClassName: low-priority
      # ...
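
Priority classes can also be tied back to namespace quotas, so that low-priority batch work cannot consume the whole namespace budget. A sketch, assuming the batch workload runs in `production` (the Deployment above does not set a namespace); the quota name and the limits are illustrative values, not recommendations:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: low-priority-quota  # hypothetical name
  namespace: production     # assumed namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "10"
  # Count only pods that use the low-priority PriorityClass
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["low-priority"]
```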
Best Practices
- Set resource requests on all containers (enables efficient scheduling)
- Use VPA recommendations to right-size workloads
- Tune HPA stabilization to prevent thrashing
- Leverage spot instances for fault-tolerant workloads
- Implement PDBs to maintain availability during disruptions
- Set namespace quotas to prevent resource hogging
- Use LimitRanges to enforce sensible defaults
- Label resources for cost attribution
- Schedule dev environments to scale down off-hours
- Monitor with Kubecost or cloud cost tools
- Use priority classes to ensure critical workloads run
- Review unused resources regularly (idle deployments, orphaned PVCs)