# Upgrade Operations

Use this guide when node image rotation, Kubernetes version changes, or node-pool upgrade settings appear to be the failure domain.

## Node Image / OS Upgrade Issues

> ⚠️ **Warning:** `az aks nodepool upgrade` and `az aks nodepool update --max-surge ...` change cluster state. During diagnostics, do not recommend or run upgrade actions by default. Only surface these commands after the user explicitly approves remediation or confirms the change window / change-control context.

```bash
# Check current node image versions
az aks nodepool show -g <rg> --cluster-name <cluster> -n <nodepool> \
  --query "{nodeImageVersion:nodeImageVersion, osType:osType}"

# Check available upgrades
az aks nodepool get-upgrades -g <rg> --cluster-name <cluster> --nodepool-name <nodepool>

# Upgrade node image only (changes cluster state; surge nodes reduce disruption while nodes are drained)
az aks nodepool upgrade -g <rg> --cluster-name <cluster> -n <nodepool> --node-image-only
```

---

## Kubernetes Version Upgrade Failures

**Pre-upgrade check:**

```bash
# Check for deprecated API usage before upgrading
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis

# Verify available upgrade paths (supported clusters cannot skip a minor version; upgrade one minor version at a time)
az aks get-upgrades -g <rg> -n <cluster> -o table
```

**Upgrade stuck or failed:**

```bash
# Check control plane provisioning state
az aks show -g <rg> -n <cluster> --query "provisioningState"

# If stuck: check AKS diagnostics blade in portal
# Azure Portal -> AKS cluster -> Diagnose and solve problems -> Upgrade
```

Common causes: PDB blocking drain (`kubectl get pdb -A`), deprecated APIs in use, custom admission webhooks failing (`kubectl get validatingwebhookconfiguration`).

---

## Zero-Downtime Node Pool Upgrades

`maxSurge` controls how many extra nodes are provisioned during an upgrade.

```bash
# Check current maxSurge
az aks nodepool show -g <rg> --cluster-name <cluster> -n <nodepool> \
  --query "upgradeSettings.maxSurge"

# Set maxSurge (changes cluster state; see the warning above)
az aks nodepool update -g <rg> --cluster-name <cluster> -n <nodepool> \
  --max-surge 33%
```

**Upgrade stuck / nodes not draining:**

```bash
kubectl get pdb -A
kubectl describe pdb <pdb-name> -n <ns>
```

If the PDB reports `Allowed disruptions: 0` (`status.disruptionsAllowed` is `0`), scale up the workload or temporarily relax `minAvailable`.
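
To find the PDBs that currently allow zero disruptions across all namespaces without describing each one, a read-only filter along these lines may help. This is a minimal sketch: the function name `blocked_pdbs` is hypothetical, and it assumes input in three whitespace-separated columns (namespace, name, allowed disruptions) as produced by the `kubectl` command shown in the comment.

```bash
# Hypothetical helper: print "<namespace>/<name>" for every PDB whose
# allowed-disruptions column is 0. Reads three whitespace-separated
# columns from stdin; rows with a missing status (e.g. "<none>") are skipped.
blocked_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}

# Assumed usage (read-only, does not change cluster state):
#   kubectl get pdb -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#     | blocked_pdbs
```

Each PDB it lists is a candidate for blocking node drain; confirm with `kubectl describe pdb <pdb-name> -n <ns>` before changing replicas or `minAvailable`.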