General AKS Investigation & Diagnostics

"What happened in my cluster?"

When a user asks a broad question like "what happened in my AKS cluster?" or "check my AKS status", follow this systematic flow:

1. Cluster health: az aks show -g <rg> -n <cluster> --query "provisioningState"
2. Recent events: kubectl get events -A --sort-by='.lastTimestamp' | tail -40
3. Node status: kubectl get nodes -o wide
4. Unhealthy pods: kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
5. All pods overview: kubectl get pods -A -o wide
6. System pods health: kubectl get pods -n kube-system -o wide
7. Activity log: az monitor activity-log list -g <rg> --max-events 20 -o table

Note: --sort-by sorts in ascending order, so the most recent events are at the end of the output; tail shows them.
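
The flow above can be wrapped in a small script so the seven checks always run in the same order. This is a minimal sketch, not an official tool: the function names step and aks_triage and the DRY_RUN variable are illustrative assumptions, and the events output is left untruncated for simplicity.

```shell
#!/usr/bin/env bash
# Sketch only: step, aks_triage and DRY_RUN are hypothetical names
# introduced for this example, not part of az or kubectl.

# Print a step header, then run the command.
# With DRY_RUN=1 the command is only echoed, never executed.
step() {
  local title="$1"; shift
  printf '\n== %s ==\n' "$title"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    # Keep going if one check fails so later checks still run.
    "$@" || printf 'step failed: %s\n' "$title" >&2
  fi
}

# Run the seven triage checks for one cluster.
# $1 = resource group, $2 = cluster name
aks_triage() {
  local rg="$1" cluster="$2"
  step "Cluster health"      az aks show -g "$rg" -n "$cluster" --query provisioningState -o tsv
  step "Recent events"       kubectl get events -A --sort-by=.lastTimestamp
  step "Node status"         kubectl get nodes -o wide
  step "Unhealthy pods"      kubectl get pods -A --field-selector='status.phase!=Running,status.phase!=Succeeded'
  step "All pods overview"   kubectl get pods -A -o wide
  step "System pods health"  kubectl get pods -n kube-system -o wide
  step "Activity log"        az monitor activity-log list -g "$rg" --max-events 20 -o table
}
```

After sourcing the file, DRY_RUN=1 aks_triage <rg> <cluster> previews the commands without touching the cluster; dropping DRY_RUN runs them, assuming az aks get-credentials has already been run for that cluster.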

AKS CLI Tools

# Get cluster credentials (required before kubectl commands)
az aks get-credentials -g <rg> -n <cluster>

# View node pools
az aks nodepool list -g <rg> --cluster-name <cluster> -o table

AppLens (MCP) for AKS

For AI-powered diagnostics:

mcp_azure_mcp_applens
  intent: "diagnose AKS cluster issues"
  command: "diagnose"
  parameters:
    resourceId: "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster>"

💡 Tip: Given the cluster resource ID, AppLens automatically detects common issues and recommends remediations.
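
The resourceId parameter follows the fixed pattern shown above, so it can be assembled from the subscription ID, resource group, and cluster name. A minimal sketch, assuming a helper named aks_resource_id (a hypothetical name used only for illustration):

```shell
# Sketch only: aks_resource_id is a hypothetical helper, not an az command.
# Compose the ARM resource ID of a managed cluster from its parts.
# $1 = subscription ID, $2 = resource group, $3 = cluster name
aks_resource_id() {
  local sub="$1" rg="$2" cluster="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' \
    "$sub" "$rg" "$cluster"
}
```

If the Azure CLI is already logged in, az aks show -g <rg> -n <cluster> --query id -o tsv returns the same ID directly and avoids typos in the path.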

Best Practices

Start with kubectl get/describe - Always check basic status first
Check events - kubectl get events -A reveals recent issues
Use systematic isolation - Pod -> Node -> Cluster -> Network
Document changes - Note what you tried and what worked
Escalate when needed - For control plane issues, contact Azure support
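
The Pod -> Node -> Cluster -> Network order can be turned into a concrete checklist. The sketch below only prints the commands for each layer rather than running them; isolation_plan is a hypothetical helper name, and the specific commands chosen per layer are one reasonable selection, not a prescribed procedure.

```shell
# Sketch only: isolation_plan is a hypothetical helper for illustration.
# Print the commands for each isolation layer, narrowest scope first.
# $1 = namespace, $2 = pod name, $3 = resource group, $4 = cluster name
isolation_plan() {
  local ns="$1" pod="$2" rg="$3" cluster="$4"
  cat <<EOF
# 1. Pod: status, events, and logs from the previous container instance
kubectl describe pod $pod -n $ns
kubectl logs $pod -n $ns --previous
# 2. Node: find the node hosting the pod, then inspect its conditions
kubectl get pod $pod -n $ns -o jsonpath='{.spec.nodeName}'
kubectl describe node <node-from-previous-command>
# 3. Cluster: provisioning and power state
az aks show -g $rg -n $cluster --query "{provisioning:provisioningState, power:powerState.code}" -o table
# 4. Network: services and their endpoints in the namespace
kubectl get svc,endpoints -n $ns
EOF
}
```

Printing the plan instead of executing it keeps each layer's findings separate, which also makes it easier to note what was tried at each step.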
