
Kubernetes AutoScaling

Kubernetes AutoScaling is the automated process of adjusting system resources to handle fluctuations in workload.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md] Instead of manually modifying configurations, AutoScaling mechanisms monitor resource usage and perform horizontal, vertical, or multidimensional scaling based on predefined settings.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

Effective AutoScaling relies on two prerequisites: proper resource configuration (requests and limits) and the installation of a Metrics Server to provide the necessary usage data.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
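As a concrete illustration of the first prerequisite, a deployment's containers should declare requests and limits so the autoscalers have a baseline to compare usage against. The names and values below are illustrative, not taken from the source:

```yaml
# Illustrative deployment fragment: requests/limits must be set for
# utilization-based autoscaling to work meaningfully.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:      # baseline used by the scheduler and autoscalers
            cpu: 100m
            memory: 128Mi
          limits:        # hard ceiling for the container
            cpu: 500m
            memory: 256Mi
```

Utilization targets (e.g., "60% CPU") are evaluated relative to these requests, which is why autoscaling behaves unpredictably when they are left unset.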

Types of Autoscalers

Kubernetes offers several autoscaling components that operate at different levels of the infrastructure.

Cluster Autoscaler (CA)

The Cluster Autoscaler operates at the cluster level, managing the number of nodes (instances) in the node pool.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

  • Scale-up: Triggered when pods remain unschedulable due to insufficient resources; the CA scans for such pods roughly every ten seconds.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md] While the scaling decision happens quickly, provisioning new nodes may take several minutes.
  • Scale-down: The CA periodically checks (every 10 seconds by default) whether a node's summed CPU and memory requests fall below 50% of its allocatable capacity and whether any scheduling restrictions (such as Pod Disruption Budgets) prevent its removal.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Protection: Nodes can be protected from being scaled down using the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true".^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

This functionality is primarily relevant for cloud-based Kubernetes platforms (e.g., GCP GKE, AWS EKS) rather than local single-node environments like Docker Desktop.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
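The scale-down protection annotation sits on the node object itself. A sketch of what that looks like (the node name is hypothetical):

```yaml
# Node metadata fragment: this annotation tells the Cluster Autoscaler
# to skip the node during scale-down evaluation.
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1    # hypothetical node name
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
```

In practice this is usually applied to a live node with `kubectl annotate node worker-node-1 cluster-autoscaler.kubernetes.io/scale-down-disabled=true` rather than by editing the manifest.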

Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler operates at the Pod level, automatically adjusting the number of replicas in a deployment (scaling out or in).^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

  • Mechanism: It polls metrics from the Metrics Server; if resource usage exceeds the target threshold, it increases the replica count, and if usage drops below it, it decreases the count.
  • Overrides: Any replica count defined directly in the deployment is overridden by the HPA configuration.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Stabilization: After a scaling event, HPA waits 3–5 minutes for the system to stabilize before checking metrics again.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Calculation: The target replica count is calculated using the formula: ceil[currentReplicas * (currentMetricValue / desiredMetricValue)].^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • API Versions: The API evolves rapidly; v2beta2 and later support memory metrics, whereas v1 supported only CPU utilization.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
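To make the formula concrete: 4 replicas running at 90% average CPU against a 60% target yields ceil[4 × (90 / 60)] = 6 replicas. A minimal HPA manifest expressing such a target might look like the following (names and thresholds are illustrative):

```yaml
# Illustrative HPA (autoscaling/v2): scales a hypothetical deployment
# between 2 and 10 replicas, targeting 60% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app        # hypothetical target deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

The utilization percentage is measured against each container's CPU request, which is why requests must be defined before the HPA can act.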

Vertical Pod Autoscaler (VPA)

The Vertical Pod Autoscaler automatically sets optimal resource requests and limits (CPU and Memory) for pods, eliminating the need for manual tuning.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

  • Updates: When the VPA determines values need to change, it updates the resource requests and necessitates a Pod restart (eviction and recreation) to apply the new settings.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • History: Recommendations are based on the Pod's historical usage data.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Limitations: VPA and HPA generally cannot be used together on the same metric (e.g., CPU) because they may conflict.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Implementation: Like the Metrics Server, VPA is not installed by default in all Kubernetes clusters; it ships as a set of custom resources and controllers that must be added separately.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
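Assuming the VPA components are installed in the cluster, a minimal manifest might look like the following (names are illustrative):

```yaml
# Illustrative VPA: lets the autoscaler rewrite requests/limits and
# evict/recreate pods ("Auto") when its recommendations change.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa      # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app        # hypothetical target deployment
  updatePolicy:
    updateMode: "Auto"   # "Off" would only publish recommendations
```

Setting `updateMode: "Off"` is a common first step: it surfaces recommendations based on historical usage without triggering pod restarts.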

Multidimensional Pod Autoscaler (MPA)

Multidimensional Pod Autoscaling (MPA) allows for scaling using multiple methods simultaneously.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]

  • Availability: This feature is currently specific to GCP GKE (Beta version) and is not available in the open-source Kubernetes community or other platforms.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]
  • Strategy: It enables using HPA for CPU scaling while simultaneously using VPA for memory scaling.^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md] This requires CPU requests/limits to be predefined in the deployment, as VPA only handles memory in this configuration.
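As a sketch only: MPA is GKE-specific and in beta, so the API group and field names below are drawn from the GKE beta API and may change; verify them against current GKE documentation before use.

```yaml
# Hedged sketch of a GKE MultidimPodAutoscaler (beta API): horizontal
# scaling on CPU, vertical scaling on memory, in one resource.
apiVersion: autoscaling.gke.io/v1beta1
kind: MultidimPodAutoscaler
metadata:
  name: web-app-mpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app        # hypothetical target deployment
  goals:
    metrics:
    - type: Resource
      resource:
        name: cpu                  # horizontal (HPA-style) scaling on CPU
        target:
          type: Utilization
          averageUtilization: 60
  constraints:
    global:
      minReplicas: 2
      maxReplicas: 10
    containerControlledResources: [memory]  # vertical (VPA-style) scaling on memory
  policy:
    updateMode: Auto
```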

Sources

^[400-devops-06-kubernetes-k8s-ithelp-day25-readme.md]