GKE version control strategy
The GKE version control strategy outlines a procedure for upgrading the Google Kubernetes Engine (GKE) version while minimizing downtime and ensuring service continuity.^[400-devops-05-cloud-provider-gcp.md]
Preparation
Before starting the upgrade, ensure the environment is authenticated and the local Terraform state matches the actual infrastructure.^[400-devops-05-cloud-provider-gcp.md] Identify the specific target GKE version from the official release notes by looking for the version listed under "No channel".^[400-devops-05-cloud-provider-gcp.md]
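One possible way to check that the state and the infrastructure agree is `terraform plan -detailed-exitcode`, which exits with 0 when nothing would change, 1 on error, and 2 when changes are pending. The sketch below is an assumption about how that check could be scripted, not part of the documented procedure. The helper names `describe_plan_exit_code` and `check_drift` are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_CODES = {
    0: "in sync",
    1: "error",
    2: "changes pending",
}


def describe_plan_exit_code(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` exit code into a label."""
    return PLAN_EXIT_CODES.get(code, "unknown")


def check_drift(workdir: str = ".") -> str:
    """Run `terraform plan` in workdir and label the outcome.

    Assumes terraform is on PATH and the directory is already initialized
    and authenticated (both are assumptions, not checked here).
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return describe_plan_exit_code(result.returncode)
```

Any result other than "in sync" suggests drift or an error that is worth resolving before the version variable is changed.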
The configuration is then updated by modifying the gke_version variable in the relevant Terraform files to target the desired version (e.g., 1.21.13-gke.900).^[400-devops-05-cloud-provider-gcp.md]
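As a guard against typos in the version string, a small check could be run before applying. This is a minimal sketch that assumes the `MAJOR.MINOR.PATCH-gke.BUILD` shape of the example above. The helper names `parse_gke_version` and `is_upgrade` are hypothetical and are not part of Terraform or GKE tooling.

```python
import re
from typing import Tuple

# Pattern assumed from the example version string: MAJOR.MINOR.PATCH-gke.BUILD
GKE_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> Tuple[int, ...]:
    """Split a full GKE version string into comparable integer parts."""
    match = GKE_VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a full GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_upgrade(current: str, target: str) -> bool:
    """Return True only when target is strictly newer than current."""
    return parse_gke_version(target) > parse_gke_version(current)
```

Comparing the parsed tuples rather than the raw strings avoids lexicographic surprises, such as "1.9" sorting after "1.21".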
Migration Procedure
The strategy utilizes a blue-green deployment approach using a temporary node pool to facilitate a safe transition.
- Enable Configuration: Restore the backend configuration file by renaming `11-app.tf.back` to `11-app.tf`.^[400-devops-05-cloud-provider-gcp.md]
- Create Temporary Pool: Rename the existing pool definition to "temp" in the configuration and run `terraform apply` to create a new node pool named `node_pool_1` (temp).^[400-devops-05-cloud-provider-gcp.md] This temporary pool acts as a buffer to host workloads during the upgrade.
- Migrate Workloads (Out): Modify the Kubernetes manifests to change the `nodeSelector` from `pool: app` to `pool: temp`.^[400-devops-05-cloud-provider-gcp.md] This reschedules the workloads (e.g., Kafka pods) onto the temporary nodes. Roll this change out sequentially (e.g., for kafka1, 2, 3) at intervals of approximately 3 minutes to maintain stability.^[400-devops-05-cloud-provider-gcp.md]
- Upgrade Original Pool: Update the original `google_container_node_pool` resource (specifically the `node_pool` in the site module) to the new GKE version and apply the changes.^[400-devops-05-cloud-provider-gcp.md]
- Migrate Workloads (In): Once the upgrade is complete, switch the `nodeSelector` back from `pool: temp` to `pool: app` to move workloads back to the upgraded node pool.^[400-devops-05-cloud-provider-gcp.md]
- Cleanup: Finally, rename the active Terraform file back to `11-app.tf.back` to disable the temporary pool configuration for future runs.^[400-devops-05-cloud-provider-gcp.md]
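The two workload migration steps above amount to flipping one `nodeSelector` value and staggering the change across workloads. The following is a minimal sketch of that logic on manifests already loaded as Python dictionaries. It assumes a Deployment- or StatefulSet-style layout (`spec.template.spec.nodeSelector`). The helper names `with_pool_selector` and `staggered_schedule` are hypothetical, and the 180-second default mirrors the roughly 3-minute interval mentioned above.

```python
import copy
from typing import Dict, List, Tuple


def with_pool_selector(manifest: Dict, pool: str) -> Dict:
    """Return a copy of a workload manifest whose pod template targets `pool`.

    Assumes the pod template lives at spec.template.spec, as in a
    Deployment or StatefulSet; the input manifest is left untouched.
    """
    updated = copy.deepcopy(manifest)
    pod_spec = (
        updated.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec.setdefault("nodeSelector", {})["pool"] = pool
    return updated


def staggered_schedule(
    workloads: List[str], interval_seconds: int = 180
) -> List[Tuple[str, int]]:
    """Pair each workload with its start offset in seconds, spaced evenly."""
    return [(name, index * interval_seconds) for index, name in enumerate(workloads)]
```

Calling `with_pool_selector(manifest, "temp")` covers the outbound move and `with_pool_selector(manifest, "app")` the return. `staggered_schedule(["kafka1", "kafka2", "kafka3"])` gives the order and delay at which each updated manifest would be applied.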
Post-Upgrade Tasks
After the infrastructure upgrade, some network configuration must be updated manually.^[400-devops-05-cloud-provider-gcp.md] Specifically, the VPC network IP reserved for Jenkins (or the DevOps environment) may need to be adjusted by hand in the GCP console.^[400-devops-05-cloud-provider-gcp.md]
Related Concepts
- Blue-Green Deployment
- Terraform
- [[Kubernetes Node Selectors]]
Sources
- 400-devops-05-cloud-provider-gcp.md