Debug and troubleshoot Azure Container Apps and Function Apps using logs, KQL, and health checks.
troubleshooting/aks/upgrade-operations.md
# Upgrade Operations

Use this guide when node image rotation, Kubernetes version changes, or node-pool upgrade settings appear to be the failure domain.

## Node Image / OS Upgrade Issues

> ⚠️ **Warning:** `az aks nodepool upgrade` and `az aks nodepool update --max-surge ...` change cluster state. During diagnostics, do not recommend or run upgrade actions by default. Only surface these commands after the user explicitly approves remediation or confirms the change window / change-control context.

```bash
# Check current node image versions
az aks nodepool show -g <rg> --cluster-name <cluster> -n <nodepool> \
  --query "{nodeImageVersion:nodeImageVersion, osType:osType}"

# Check available upgrades
az aks nodepool get-upgrades -g <rg> --cluster-name <cluster> --nodepool-name <nodepool>

# Upgrade node image (non-disruptive with surge)
az aks nodepool upgrade -g <rg> --cluster-name <cluster> -n <nodepool> --node-image-only
```

---

## Kubernetes Version Upgrade Failures

**Pre-upgrade check:**

```bash
# Check for deprecated API usage before upgrading
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis

# Verify available upgrade paths (minor versions cannot be skipped on a supported cluster)
az aks get-upgrades -g <rg> -n <cluster> -o table
```

**Upgrade stuck or failed:**

```bash
# Check control plane provisioning state
az aks show -g <rg> -n <cluster> --query "provisioningState"

# If stuck: check AKS diagnostics blade in portal
# Azure Portal -> AKS cluster -> Diagnose and solve problems -> Upgrade
```

Common causes: PDB blocking drain (`kubectl get pdb -A`), deprecated APIs in use, custom admission webhooks failing (`kubectl get validatingwebhookconfiguration`).

---

## Zero-Downtime Node Pool Upgrades

`maxSurge` controls how many extra nodes are provisioned during upgrade.

```bash
# Check current maxSurge
az aks nodepool show -g <rg> --cluster-name <cluster> -n <nodepool> \
  --query "upgradeSettings.maxSurge"

# Set maxSurge (changes cluster state; see the warning above)
az aks nodepool update -g <rg> --cluster-name <cluster> -n <nodepool> \
  --max-surge 33%
```

**Upgrade stuck / nodes not draining:**

```bash
kubectl get pdb -A
kubectl describe pdb <pdb-name> -n <ns>
```

If `DisruptionsAllowed: 0`, scale up the workload or temporarily relax `minAvailable`.
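
On a cluster with many PodDisruptionBudgets, it can help to narrow the list to those that currently allow zero disruptions before describing them one by one. The following is a minimal, read-only sketch. It assumes the default column order of `kubectl get pdb -A --no-headers` (namespace, name, min available, max unavailable, allowed disruptions, age). `blocking_pdbs` is a hypothetical helper name, and the sample rows are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print "<namespace>/<name>" for every PDB whose allowed-disruptions
# column is 0. Assumes six whitespace-separated columns per row, as produced
# by `kubectl get pdb -A --no-headers` with default output.

# blocking_pdbs is a hypothetical helper, not part of kubectl.
blocking_pdbs() {
  # Compare as a string so short or malformed rows never match by accident.
  awk 'NF >= 6 && $5 == "0" { print $1 "/" $2 }'
}

# Demo with invented sample rows (not real cluster output):
blocking_pdbs <<'EOF'
default       web       2     N/A   0   5d
kube-system   coredns   N/A   1     1   30d
EOF
# prints: default/web

# Read-only usage against a live cluster:
#   kubectl get pdb -A --no-headers | blocking_pdbs
```

Each result can then be passed to `kubectl describe pdb <pdb-name> -n <ns>` to see which workload it protects. The helper only reads and filters output, so it does not change cluster state.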