troubleshooting/aks/networking.md
# Networking Troubleshooting

For CNI-specific issues, check CNI pod health and review [AKS networking concepts](https://learn.microsoft.com/azure/aks/concepts-network).

## Service Unreachable / Connection Refused

**Diagnostics - always start here:**

```bash
# 1. Verify service exists and has endpoints (read-only)
kubectl get svc <service-name> -n <ns>
kubectl get endpoints <service-name> -n <ns>

# 2. Optional connectivity test from inside the namespace
# This creates a temporary pod. Prefer read-only checks first.
# Only use it after the user explicitly approves a mutating test.
kubectl run netdebug --image=curlimages/curl -it --rm -n <ns> -- \
  curl -sv http://<service>.<ns>.svc.cluster.local:<port>/healthz
```

**Decision tree:**

| Observation | Cause | Fix |
| --------------------------------------- | ---------------------------------- | ----------------------------------------------- |
| Endpoints shows `<none>` | Label selector mismatch | Align selector with pod labels; check for typos |
| Endpoints has IPs but unreachable | Port mismatch or app not listening | Confirm `targetPort` = actual container port |
| Works from some pods, fails from others | Network policy blocking | See Network Policy section |
| Works inside cluster, fails externally | Load balancer issue | See Load Balancer section |
| `ECONNREFUSED` immediately | App not listening on that port | Check listening ports in the pod |

Pods that are running but not Ready are removed from Endpoints. Check `kubectl get pod <pod> -n <ns>`.

---

## DNS Resolution Failures

**Diagnostics:**

The live DNS test creates a temporary pod. Prefer `get`, `describe`, `logs`, or `exec` into an existing pod first.
Only use it after the user explicitly approves creating the test pod.

```bash
# Confirm CoreDNS is running and healthy (read-only)
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl top pod -n kube-system -l k8s-app=kube-dns

# Optional live DNS test from the same namespace as the failing pod
kubectl run dnstest --image=busybox:1.28 -it --rm -n <ns> -- \
  nslookup <service-name>.<ns>.svc.cluster.local

# CoreDNS logs - errors show here first
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
```

**DNS failure patterns:**

| Symptom | Cause | Fix |
| ------------------------------------- | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `NXDOMAIN` for `svc.cluster.local` | CoreDNS down or pod network broken | After confirming the diagnostics above, coordinate with the cluster operator to restart or redeploy CoreDNS and verify CNI |
| Internal resolves, external NXDOMAIN | Custom DNS not forwarding to `168.63.129.16` | Fix upstream forwarder |
| Intermittent SERVFAIL under load | CoreDNS CPU throttled | Remove CPU limits or add replicas |
| Private cluster - external names fail | Custom DNS missing privatelink forwarder | Add conditional forwarder to Azure DNS |
| `i/o timeout` not `NXDOMAIN` | Port 53 blocked by NetworkPolicy or NSG | Allow UDP/TCP 53 from pods to kube-dns ClusterIP |

> ⚠️ **Warning:** The fixes in this table can change cluster state.
> Use them only after performing the read-only diagnostics above, and only with explicit confirmation from the cluster owner or operator.

```bash
# Look up the kube-dns ClusterIP (read-only)
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

Custom VNet DNS must forward `.cluster.local` to the CoreDNS ClusterIP and other lookups to `168.63.129.16`.

---

## Detailed Networking Guides

- [Load Balancer And Ingress Troubleshooting](load-balancer-and-ingress.md) for pending services, ingress controller state, backend routing, and TLS failures.
- [Network Policy Troubleshooting](network-policy.md) for default-deny checks, Azure NPM or Calico validation, and ingress or egress rule audits.
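
As a rough illustration of how the DNS failure patterns table in the DNS Resolution Failures section might be applied during triage, the sketch below maps an error string to the likely cause listed there. The function name `classify_dns_error` and the sample error text are hypothetical and not part of any tool. The match is plain text matching and gives only a first guess, since several table rows depend on context (load, private cluster) that an error string alone does not carry. It runs no `kubectl` commands and changes nothing in the cluster.

```shell
#!/bin/sh
# Hypothetical helper: map a DNS error string to the likely cause from the
# "DNS failure patterns" table. Text matching only; no cluster access.
classify_dns_error() {
  case "$1" in
    *"i/o timeout"*)
      echo "Port 53 blocked by NetworkPolicy or NSG" ;;
    *SERVFAIL*)
      echo "CoreDNS CPU throttled" ;;
    *NXDOMAIN*)
      case "$1" in
        # NXDOMAIN for an in-cluster name points at CoreDNS or the pod network
        *svc.cluster.local*)
          echo "CoreDNS down or pod network broken" ;;
        # NXDOMAIN for any other name points at upstream forwarding
        *)
          echo "Custom DNS not forwarding to 168.63.129.16" ;;
      esac ;;
    *)
      echo "No match; review CoreDNS logs" ;;
  esac
}

# Illustrative input only; the service name and namespace are made up.
classify_dns_error "** server can't find web.prod.svc.cluster.local: NXDOMAIN"
# prints: CoreDNS down or pod network broken
```

Treat the result as a pointer to the matching table row, then confirm it with the read-only diagnostics before anyone applies a fix.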