GKE version control strategy

The GKE version control strategy outlines a procedure for upgrading the Google Kubernetes Engine (GKE) version while minimizing downtime and ensuring service continuity.^[400-devops-05-cloud-provider-gcp.md]

Preparation

Before starting the upgrade, make sure you are authenticated against the target environment and that the local Terraform state matches the actual infrastructure.^[400-devops-05-cloud-provider-gcp.md] Identify the target GKE version from the official release notes by looking for the "No channel" version, that is, the version listed for clusters not enrolled in a release channel.^[400-devops-05-cloud-provider-gcp.md]
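One way to check that the state matches the infrastructure is to run `terraform plan -detailed-exitcode` and inspect its exit code (0: no changes, 1: error, 2: changes pending). The following is a minimal sketch, not the procedure from the source; the function names and status labels are hypothetical, and it assumes the `terraform` binary is on the PATH and the working directory has already been initialized.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
# The status labels on the right are hypothetical names for this sketch.
PLAN_STATUS = {
    0: "in-sync",  # plan succeeded and found nothing to change
    1: "error",    # plan itself failed
    2: "diff",     # plan succeeded and found pending changes
}


def classify_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status label."""
    return PLAN_STATUS.get(code, "unknown")


def check_state(workdir: str) -> str:
    """Run `terraform plan` in workdir and report whether anything would change.

    Assumes terraform is installed and `terraform init` has already been run.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

A result of "diff" means the configuration and the real infrastructure disagree; it does not say which side is out of date, so the plan output still needs to be read before continuing.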

The configuration is then updated by modifying the gke_version variable in the relevant Terraform files to target the desired version (e.g., 1.21.13-gke.900).^[400-devops-05-cloud-provider-gcp.md]
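The variable change can also be scripted. The sketch below rewrites a `gke_version = "..."` assignment in the text of a Terraform file. It assumes the variable is set as a plain quoted string on a single line, which the source does not confirm; the function name is hypothetical.

```python
import re

# Matches a line such as:  gke_version = "1.2.3-gke.4"
# Assumption: the value is a double-quoted string on one line.
_GKE_VERSION_RE = re.compile(r'^(\s*gke_version\s*=\s*")[^"]*(")', re.MULTILINE)


def set_gke_version(tf_text: str, new_version: str) -> str:
    """Return tf_text with every gke_version assignment set to new_version."""
    updated, count = _GKE_VERSION_RE.subn(
        # A function replacement avoids ambiguity between group references
        # and the digits at the start of the version string.
        lambda m: m.group(1) + new_version + m.group(2),
        tf_text,
    )
    if count == 0:
        raise ValueError("no gke_version assignment found")
    return updated


if __name__ == "__main__":
    # The old version below is a placeholder, not a real release.
    sample = 'gke_version = "0.0.0-placeholder"\n'
    print(set_gke_version(sample, "1.21.13-gke.900"), end="")
```

Raising an error when nothing matches keeps a silent no-op from being mistaken for a successful update.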

Migration Procedure

The strategy uses a blue-green approach: a temporary node pool hosts the workloads while the original pool is upgraded, and the workloads are moved back afterwards.

  1. Enable Configuration: Restore the backend configuration file by renaming 11-app.tf.back to 11-app.tf.^[400-devops-05-cloud-provider-gcp.md]
  2. Create Temporary Pool: Rename the existing pool definition to "temp" in the configuration and run terraform apply to create a new node pool named node_pool_1 (temp).^[400-devops-05-cloud-provider-gcp.md] This temporary pool acts as a buffer to host workloads during the upgrade.
  3. Migrate Workloads (Out): Modify the Kubernetes manifests to change the nodeSelector from pool: app to pool: temp.^[400-devops-05-cloud-provider-gcp.md] This causes the affected pods (e.g., Kafka pods) to be rescheduled onto the temporary nodes. Roll the change out one workload at a time (e.g., kafka1, then kafka2, then kafka3), waiting approximately 3 minutes between each to maintain stability.^[400-devops-05-cloud-provider-gcp.md]
  4. Upgrade Original Pool: Update the original google_container_node_pool resource (specifically the node_pool in the site module) to the new GKE version and apply the changes.^[400-devops-05-cloud-provider-gcp.md]
  5. Migrate Workloads (In): Once the upgrade is complete, switch the nodeSelector back from pool: temp to pool: app to move workloads back to the upgraded node pool.^[400-devops-05-cloud-provider-gcp.md]
  6. Cleanup: Finally, rename 11-app.tf back to 11-app.tf.back so that Terraform no longer loads the file and the temporary pool configuration is excluded from future runs.^[400-devops-05-cloud-provider-gcp.md]
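The two workload migration steps above can be sketched in code. This is an illustrative sketch only: it assumes each workload manifest is a Deployment- or StatefulSet-style dictionary with a pod template under spec.template.spec, and the names `with_pool` and `staggered_rollout` are hypothetical. The `apply_fn` callback stands in for whatever actually applies a manifest (for example a `kubectl apply` wrapper), which the source does not specify.

```python
import copy
import time
from typing import Callable, Iterable, List


def with_pool(manifest: dict, pool: str) -> dict:
    """Return a copy of a workload manifest whose pod template targets `pool`."""
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    pod_spec = updated["spec"]["template"]["spec"]
    pod_spec.setdefault("nodeSelector", {})["pool"] = pool
    return updated


def staggered_rollout(
    names: Iterable[str],
    apply_fn: Callable[[str], None],
    interval_seconds: float = 180.0,  # roughly the 3-minute gap from step 3
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply a change to each workload in order, pausing between workloads."""
    done: List[str] = []
    for index, name in enumerate(names):
        if index > 0:
            sleep(interval_seconds)  # no wait before the first workload
        apply_fn(name)
        done.append(name)
    return done
```

For the outbound migration, `with_pool(manifest, "temp")` would be applied to each of kafka1, kafka2, and kafka3 through `staggered_rollout`; the inbound migration is the same call with "app". Passing `sleep` as a parameter lets the rollout order be tested without actually waiting.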

Post-Upgrade Tasks

After the infrastructure upgrade, manual intervention is required to update network configurations.^[400-devops-05-cloud-provider-gcp.md] Specifically, the VPC network IP reserved for Jenkins (or the DevOps environment) may need to be adjusted manually in the GCP console.^[400-devops-05-cloud-provider-gcp.md]

Sources

  • 400-devops-05-cloud-provider-gcp.md