Blue-green node pool migration

Blue-green node pool migration is a deployment strategy used to upgrade or modify Kubernetes node pools with minimal downtime and service interruption^[400-devops-05-cloud-provider-gcp.md]. This method involves creating a temporary set of nodes (the "green" environment) to handle traffic while the original nodes (the "blue" environment) are being updated^[400-devops-05-cloud-provider-gcp.md].

Procedure

The migration process involves creating a temporary node pool, shifting workloads to it, updating the original pool, and then shifting workloads back^[400-devops-05-cloud-provider-gcp.md].

  1. Infrastructure Update: Configure the desired GKE version in the Terraform variables (e.g., gke_version) and apply changes to create a new resource definition^[400-devops-05-cloud-provider-gcp.md].
  2. Temporary Pool Creation: Rename the existing application configuration file (e.g., 11-app.tf) to a backup name, then create a new configuration file that provisions a temporary node pool (e.g., named node_pool_1 or temp)^[400-devops-05-cloud-provider-gcp.md].
  3. Workload Migration (Cordon & Drain): Change the nodeSelector in the Kubernetes manifests (e.g., from nodeSelector: pool: app to nodeSelector: pool: temp) and apply the changes^[400-devops-05-cloud-provider-gcp.md]. To ensure stability, workloads such as Kafka should be migrated sequentially (e.g., repointed to the new pool one at a time), with intervals of approximately 3 minutes between shifts^[400-devops-05-cloud-provider-gcp.md].
  4. Original Pool Upgrade: Update the infrastructure configuration for the original node pool (e.g., in 03-node-pool.tf) to the new GKE version and apply the upgrade^[400-devops-05-cloud-provider-gcp.md].
  5. Restoration: Redirect the nodeSelector back to the original (upgraded) pool (e.g., pool: app) and re-apply the Kubernetes manifests to move workloads back^[400-devops-05-cloud-provider-gcp.md].
  6. Cleanup: Remove the temporary configuration file and perform any necessary manual checks, such as adjusting VPC network operations IPs^[400-devops-05-cloud-provider-gcp.md].
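
The nodeSelector switch and the staggered rollout in steps 3 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the source's tooling: the helper names retarget_pool and staggered_schedule, the manifest layout (a pod template under spec.template.spec), and the workload names kafka-0 to kafka-2 are assumptions made for the example. It only computes the patched manifest and the timing plan; applying the result to a cluster is left out.

```python
from __future__ import annotations

from copy import deepcopy
from datetime import timedelta


def retarget_pool(manifest: dict, pool: str) -> dict:
    """Return a copy of a workload manifest whose pod template selects `pool`.

    Assumes a Deployment/StatefulSet-style layout (spec.template.spec).
    The input manifest is left unchanged.
    """
    patched = deepcopy(manifest)
    pod_spec = patched["spec"]["template"]["spec"]
    # Create the nodeSelector map if it is missing, then set the pool label.
    pod_spec.setdefault("nodeSelector", {})["pool"] = pool
    return patched


def staggered_schedule(
    workloads: list[str],
    interval: timedelta = timedelta(minutes=3),
) -> list[tuple[str, timedelta]]:
    """Pair each workload with its start offset, one interval apart."""
    return [(name, interval * i) for i, name in enumerate(workloads)]


# Hypothetical manifest for one Kafka broker, currently pinned to pool "app".
kafka = {
    "kind": "StatefulSet",
    "metadata": {"name": "kafka-0"},
    "spec": {"template": {"spec": {"nodeSelector": {"pool": "app"}}}},
}

# Step 3: point the workload at the temporary pool.
moved = retarget_pool(kafka, "temp")

# Step 5: point it back at the original (upgraded) pool.
restored = retarget_pool(moved, "app")

# Sequential migration plan with roughly 3 minutes between shifts.
plan = staggered_schedule(["kafka-0", "kafka-1", "kafka-2"])
for name, offset in plan:
    print(f"{name}: shift at +{int(offset.total_seconds() // 60)} min")
```

Returning a patched copy rather than mutating the input keeps the original manifest available for the restoration step, which is the same change with the pool names reversed.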

Sources

  • 400-devops-05-cloud-provider-gcp.md