Diagnose Azure service issues, query logs, and troubleshoot failures using GitHub Copilot for Azure
troubleshooting/aks/networking.md
# Networking Troubleshooting

For CNI-specific issues, check CNI pod health and review [AKS networking concepts](https://learn.microsoft.com/azure/aks/concepts-network).

## Service Unreachable / Connection Refused

**Diagnostics - always start here:**

```bash
# 1. Verify service exists and has endpoints (read-only)
kubectl get svc <service-name> -n <ns>
kubectl get endpoints <service-name> -n <ns>

# 2. Optional connectivity test from inside the namespace
# This creates a temporary pod. Prefer read-only checks first.
# Only use it after the user explicitly approves a mutating test.
kubectl run netdebug --image=curlimages/curl -it --rm -n <ns> -- \
  curl -sv http://<service>.<ns>.svc.cluster.local:<port>/healthz
```

**Decision tree:**

| Observation                             | Cause                              | Fix                                             |
| --------------------------------------- | ---------------------------------- | ----------------------------------------------- |
| Endpoints shows `<none>`                | Label selector mismatch            | Align selector with pod labels; check for typos |
| Endpoints has IPs but unreachable       | Port mismatch or app not listening | Confirm `targetPort` = actual container port    |
| Works from some pods, fails from others | Network policy blocking            | See Network Policy section                      |
| Works inside cluster, fails externally  | Load balancer issue                | See Load Balancer section                       |
| `ECONNREFUSED` immediately              | App not listening on that port     | Check listening ports in the pod                |

Pods that are running but not Ready are removed from Endpoints. Check `kubectl get pod <pod> -n <ns>`.

**Deep diagnostics with Inspektor Gadget** (when the above checks are inconclusive):

Use the [IG base command pattern](references/inspektor-gadget.md) with `--k8s-namespace <ns> --k8s-podname <pod-name>` and these gadgets:

- `snapshot_socket` (timeout 5): check what ports the pod is listening on
- `trace_tcp` (timeout 30): trace connect/accept/close events
- `trace_tcpretrans` (timeout 30): packet retransmissions

See [references/inspektor-gadget.md](references/inspektor-gadget.md).

---

## DNS Resolution Failures

**Diagnostics:**

The live DNS test creates a temporary pod. Prefer `get`, `describe`, `logs`, or `exec` into an existing pod first. Only use it after the user explicitly approves creating the test pod.

```bash
# Confirm CoreDNS is running and healthy (read-only)
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl top pod -n kube-system -l k8s-app=kube-dns

# Optional live DNS test from the same namespace as the failing pod
kubectl run dnstest --image=busybox:1.28 -it --rm -n <ns> -- \
  nslookup <service-name>.<ns>.svc.cluster.local

# CoreDNS logs - errors show here first
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
```

**DNS failure patterns:**

| Symptom                               | Cause                                        | Fix                                                                                                                        |
| ------------------------------------- | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `NXDOMAIN` for `svc.cluster.local`    | CoreDNS down or pod network broken           | After confirming the diagnostics above, coordinate with the cluster operator to restart or redeploy CoreDNS and verify CNI |
| Internal resolves, external NXDOMAIN  | Custom DNS not forwarding to `168.63.129.16` | Fix upstream forwarder                                                                                                     |
| Intermittent SERVFAIL under load      | CoreDNS CPU throttled                        | Remove CPU limits or add replicas                                                                                          |
| Private cluster - external names fail | Custom DNS missing privatelink forwarder     | Add conditional forwarder to Azure DNS                                                                                     |
| `i/o timeout` not `NXDOMAIN`          | Port 53 blocked by NetworkPolicy or NSG      | Allow UDP/TCP 53 from pods to kube-dns ClusterIP                                                                           |

> ⚠️ **Warning:** The fixes in this table can change cluster state. Use them only after performing the read-only diagnostics above, and only with explicit confirmation from the cluster owner or operator.

```bash
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

Custom VNet DNS must forward `.cluster.local` to the CoreDNS ClusterIP and other lookups to `168.63.129.16`.

**Deep diagnostics with Inspektor Gadget** (when the above checks are inconclusive):

Use the [IG base command pattern](references/inspektor-gadget.md) with `--k8s-namespace <ns> --k8s-podname <pod-name>` and `trace_dns` (timeout 30). Key signals: `rcode=3` (NXDOMAIN), `rcode=2` (SERVFAIL), high `latency` values, queries going to unexpected destinations.

See [references/inspektor-gadget.md](references/inspektor-gadget.md).

---

## Detailed Networking Guides

- [Load Balancer And Ingress Troubleshooting](load-balancer-and-ingress.md) for pending services, ingress controller state, backend routing, and TLS failures.
- [Network Policy Troubleshooting](network-policy.md) for default-deny checks, Azure NPM or Calico validation, and ingress or egress rule audits.
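
As a rough illustration of the DNS failure patterns table in the DNS Resolution Failures section, the sketch below maps text captured from a failed lookup to the likely cause listed there. The function name `classify_dns_symptom` and the substring matching are assumptions made for this example, not part of any Azure or Kubernetes tooling. Resolver and application error text varies by image and version, so treat the result as a pointer to a table row, not a diagnosis.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or AKS tooling): map captured
# lookup or error text to a likely cause from the DNS failure patterns table.
# It only inspects the string passed in and never touches the cluster.
classify_dns_symptom() {
  local output="$1"
  case "$output" in
    # Timeouts are checked first: the table treats "i/o timeout" as
    # distinct from NXDOMAIN.
    *"i/o timeout"*)
      echo "Port 53 may be blocked by NetworkPolicy or NSG" ;;
    *SERVFAIL*)
      echo "CoreDNS may be CPU throttled" ;;
    # NXDOMAIN for an in-cluster name
    *NXDOMAIN*svc.cluster.local*|*svc.cluster.local*NXDOMAIN*)
      echo "CoreDNS may be down or the pod network broken" ;;
    # NXDOMAIN for any other (external) name
    *NXDOMAIN*)
      echo "Custom DNS may not be forwarding to 168.63.129.16" ;;
    *)
      echo "No matching pattern; review CoreDNS logs" ;;
  esac
}
```

For instance, `classify_dns_symptom "dial udp 10.0.0.10:53: i/o timeout"` (the address is purely illustrative) prints the port 53 hint. The suggested fixes still fall under the warning above and need operator confirmation before anything is changed.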