Canary deployment with autoscaling

Canary deployment with autoscaling combines two practices: routing a subset of traffic to a new software version, and scaling each version's replicas dynamically with load. The pattern shows how traffic-routing rules and replica autoscaling operate together on the same deployments.

Overview

In this hybrid approach, Istio controls how traffic is routed between service versions (e.g., v1 and v2), while a Kubernetes Horizontal Pod Autoscaler (HPA) manages the capacity of each version^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].
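The routing half of this setup can be sketched as a small shell helper that prints Istio routing rules for a chosen traffic split. This is an illustration under assumptions, not the sample's own configuration: the function name canary_routing, the host name helloworld, and the version=v1 / version=v2 pod labels are all hypothetical choices made for the example.

```shell
# Print a VirtualService and DestinationRule that split traffic between v1 and v2.
# $1 = percentage of traffic sent to v2 (0-100); v1 receives the remainder.
# Assumptions: the service host is "helloworld" and the pods are labelled
# version=v1 / version=v2.
canary_routing() {
  v2_weight=$1
  v1_weight=$((100 - v2_weight))
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld
  http:
  - route:
    - destination:
        host: helloworld
        subset: v1
      weight: $v1_weight
    - destination:
        host: helloworld
        subset: v2
      weight: $v2_weight
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: helloworld
spec:
  host: helloworld
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF
}

# Usage (requires cluster access): send 10% of traffic to v2.
# canary_routing 10 | kubectl apply -f -
```

The weights set the traffic split only. The number of replicas behind each subset is left to the autoscaler, which is why the two mechanisms can be adjusted separately.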

Prerequisites for Autoscaling

The Horizontal Pod Autoscaler (HPA) only works if every container in the pods requests CPU resources^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].

In an Istio-enabled environment, this covers both the application containers defined in the deployment and the injected istio-proxy sidecar container^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md]. If any container in a pod lacks a CPU request, the autoscaler cannot compute CPU utilization for that pod and will not scale the deployment^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].
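One way to check this prerequisite is to list the CPU request of every container in the running pods. The helper below is a minimal sketch. The function name cpu_requests is hypothetical, and the label selector app=helloworld in the usage line is an assumption about how the pods are labelled.

```shell
# List every container in the matching pods together with its CPU request.
# $1 = label selector (assumed in the usage line below: app=helloworld).
# An empty value after the colon means that container has no CPU request.
cpu_requests() {
  kubectl get pods -l "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .spec.containers[*]}{"  "}{.name}{": "}{.resources.requests.cpu}{"\n"}{end}{end}'
}

# Usage (requires cluster access):
# cpu_requests app=helloworld
```

Each pod should show a non-empty value for both the application container and istio-proxy before an autoscaler is attached.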

Configuration

To enable autoscaling on specific versions of a deployment, the kubectl autoscale command is applied to each service version individually^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].

For example, the following commands autoscale versions v1 and v2 to a target CPU utilization of 50%, keeping each deployment between 1 and 10 replicas^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md]:

kubectl autoscale deployment helloworld-v1 --cpu-percent=50 --min=1 --max=10
kubectl autoscale deployment helloworld-v2 --cpu-percent=50 --min=1 --max=10
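If the autoscalers need to live in version control, the same settings can be written declaratively. The sketch below prints an autoscaling/v2 manifest that should be equivalent to the commands above. The helper name hpa_manifest is hypothetical, and it assumes a cluster that serves the autoscaling/v2 API.

```shell
# Print an autoscaling/v2 HPA manifest for one Deployment.
# $1 = Deployment name; the HPA gets the same name, which is also the
# default that kubectl autoscale uses.
hpa_manifest() {
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: $1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
}

# Usage (requires cluster access): one autoscaler per version.
# for v in v1 v2; do hpa_manifest "helloworld-$v" | kubectl apply -f -; done
```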

You can verify the status of the autoscalers using kubectl get hpa^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].

Verification

To confirm that the autoscaler reacts correctly to the traffic routed to the canary versions, a load generator script can be executed to simulate traffic^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].
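A load generator for this purpose can be as simple as a loop around curl. The following is a minimal sketch, not the script referenced by the source. The function name generate_load, the GATEWAY_URL variable, and the /hello path are assumptions made for illustration.

```shell
# Send a fixed number of sequential requests and discard the responses.
# $1 = URL to request, $2 = number of requests.
generate_load() {
  i=0
  while [ "$i" -lt "$2" ]; do
    curl -s -o /dev/null "$1"
    i=$((i + 1))
  done
}

# Usage (GATEWAY_URL and the /hello path are assumptions):
# generate_load "http://$GATEWAY_URL/hello" 1000
```

A single sequential loop may not push CPU utilization past the 50% target. Running several copies in parallel from separate terminals is one way to raise the load.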

After running the load test, checking the HPA status should show that the REPLICAS count has risen above 1 to handle the demand^[400-devops__07-Monitoring-and-Observability__k8s-istio__samples__helloworld__README.md].
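To read the replica count for one version without scanning the full table, the HPA status can be queried directly. This is a small sketch. The helper name hpa_replicas is hypothetical, and it assumes the autoscalers carry the same names as their deployments.

```shell
# Print the current replica count reported by one HPA.
# $1 = HPA name (assumed to match the Deployment name, e.g. helloworld-v2).
hpa_replicas() {
  kubectl get hpa "$1" -o jsonpath='{.status.currentReplicas}'
}

# Usage (requires cluster access):
# hpa_replicas helloworld-v1
# hpa_replicas helloworld-v2
# To follow changes live instead: kubectl get hpa --watch
```

Comparing the two values while load is running shows whether each version is scaling according to the traffic it actually receives.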

Sources