General AKS Investigation & Diagnostics
"What happened in my cluster?"
When a user asks a broad question such as "what happened in my AKS cluster?" or "check my AKS status", work through the following checks in order. The commands listed after the checks correspond to them one-to-one:
- Cluster health
- Recent events
- Node status
- Unhealthy pods
- All pods overview
- System pods health
- Activity log
az aks show -g <rg> -n <cluster> --query "provisioningState"
kubectl get events -A --sort-by='.lastTimestamp' | tail -n 40
kubectl get nodes -o wide
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
kubectl get pods -A -o wide
kubectl get pods -n kube-system -o wide
az monitor activity-log list -g <rg> --max-events 20 -o table
AKS CLI Tools
# Get cluster credentials (required before kubectl commands)
az aks get-credentials -g <rg> -n <cluster>
# View node pools
az aks nodepool list -g <rg> --cluster-name <cluster> -o table
AppLens (MCP) for AKS
For AI-powered diagnostics:
mcp_azure_mcp_applens
  intent: "diagnose AKS cluster issues"
  command: "diagnose"
  parameters:
    resourceId: "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster>"
💡 Tip: Given the cluster resource ID, AppLens automatically detects common issues and provides remediation recommendations.
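The resourceId parameter above follows the standard Azure resource ID layout, so it can be assembled from the subscription ID, resource group, and cluster name. The following is a minimal sketch; `aks_resource_id` is a hypothetical helper name introduced here for illustration, not part of the Azure CLI or of AppLens.

```shell
# Hypothetical helper (not an Azure CLI command): build the resource ID
# of an AKS managed cluster from its three identifying parts.
aks_resource_id() {
  if [ "$#" -ne 3 ]; then
    echo "usage: aks_resource_id <subscription-id> <resource-group> <cluster>" >&2
    return 2
  fi
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' \
    "$1" "$2" "$3"
}

# Alternative, assuming az is installed and logged in: ask Azure for the ID.
#   az aks show -g <rg> -n <cluster> --query id -o tsv
```

Calling `aks_resource_id <sub> <rg> <cluster>` prints a string that can be pasted into the resourceId field. Querying the ID with `az aks show` avoids typos in the path, at the cost of requiring an authenticated CLI session.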
Best Practices
- Start with kubectl get/describe - Always check basic status first
- Check events - kubectl get events -A reveals recent issues
- Use systematic isolation - Pod -> Node -> Cluster -> Network
- Document changes - Note what you tried and what worked
- Escalate when needed - For control plane issues, contact Azure support
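One way to apply the first two practices is to wrap the triage flow from the start of this section in a single shell function, so that every investigation runs the same checks in the same order. This is a sketch under assumptions: `aks_triage` is a hypothetical name, az is assumed to be logged in, and kubectl is assumed to already point at the cluster (for example after `az aks get-credentials`).

```shell
# Sketch only: aks_triage is a hypothetical wrapper, not an Azure or kubectl command.
# Assumes az is logged in and kubectl already targets the cluster.
aks_triage() {
  if [ "$#" -ne 2 ]; then
    echo "usage: aks_triage <resource-group> <cluster>" >&2
    return 2
  fi
  local rg="$1"
  local cluster="$2"

  echo "== Cluster health =="
  az aks show -g "$rg" -n "$cluster" --query "provisioningState" -o tsv

  echo "== Recent events (newest last) =="
  # --sort-by sorts ascending, so tail keeps the most recent entries.
  kubectl get events -A --sort-by='.lastTimestamp' | tail -n 40

  echo "== Node status =="
  kubectl get nodes -o wide

  echo "== Unhealthy pods =="
  kubectl get pods -A --field-selector='status.phase!=Running,status.phase!=Succeeded'

  echo "== All pods overview =="
  kubectl get pods -A -o wide

  echo "== System pods health =="
  kubectl get pods -n kube-system -o wide

  echo "== Activity log =="
  az monitor activity-log list -g "$rg" --max-events 20 -o table
}
```

Running `aks_triage <rg> <cluster> 2>&1 | tee triage.log` keeps a copy of the output, which also helps with documenting what was checked. The function deliberately does not stop on the first failing command, so a problem in one step does not hide the results of the later ones.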