Networking Troubleshooting

For CNI-specific issues, check CNI pod health and review AKS networking concepts.

Service Unreachable / Connection Refused

Diagnostics - always start here:

# 1. Verify service exists and has endpoints (read-only)
kubectl get svc <service-name> -n <ns>
kubectl get endpoints <service-name> -n <ns>

# 2. Optional connectivity test from inside the namespace
# This creates a temporary pod. Prefer read-only checks first.
# Only use it after the user explicitly approves a mutating test.
kubectl run netdebug --image=curlimages/curl -it --rm -n <ns> -- \
  curl -sv http://<service>.<ns>.svc.cluster.local:<port>/healthz

Decision tree:

Observation	Cause	Fix
Endpoints shows `<none>`	Label selector mismatch	Align selector with pod labels; check for typos
Endpoints has IPs but unreachable	Port mismatch or app not listening	Confirm `targetPort` = actual container port
Works from some pods, fails from others	Network policy blocking	See Network Policy section
Works inside cluster, fails externally	Load balancer issue	See Load Balancer section
`ECONNREFUSED` immediately	App not listening on that port	Check listening ports in the pod

Pods that are running but not Ready are removed from Endpoints. Check kubectl get pod <pod> -n <ns>.

Deep diagnostics with Inspektor Gadget (when the above checks are inconclusive):

Use the IG base command pattern with --k8s-namespace <ns> --k8s-podname <pod-name> and these gadgets:

snapshot_socket (timeout 5) — check what ports the pod is listening on
trace_tcp (timeout 30) — trace connect/accept/close events
trace_tcpretrans (timeout 30) — packet retransmissions

See references/inspektor-gadget.md.

DNS Resolution Failures

Diagnostics:

The live DNS test creates a temporary pod. Prefer get, describe, logs, or exec into an existing pod first. Only use it after the user explicitly approves creating the test pod.

# Confirm CoreDNS is running and healthy (read-only)
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl top pod -n kube-system -l k8s-app=kube-dns

# Optional live DNS test from the same namespace as the failing pod
kubectl run dnstest --image=busybox:1.28 -it --rm -n <ns> -- \
  nslookup <service-name>.<ns>.svc.cluster.local

# CoreDNS logs - errors show here first
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

DNS failure patterns:

Symptom	Cause	Fix
`NXDOMAIN` for `svc.cluster.local`	CoreDNS down or pod network broken	After confirming the diagnostics above, coordinate with the cluster operator to restart or redeploy CoreDNS and verify CNI
Internal resolves, external NXDOMAIN	Custom DNS not forwarding to `168.63.129.16`	Fix upstream forwarder
Intermittent SERVFAIL under load	CoreDNS CPU throttled	Remove CPU limits or add replicas
Private cluster - external names fail	Custom DNS missing privatelink forwarder	Add conditional forwarder to Azure DNS
`i/o timeout` not `NXDOMAIN`	Port 53 blocked by NetworkPolicy or NSG	Allow UDP/TCP 53 from pods to kube-dns ClusterIP

⚠️ Warning: The fixes in this table can change cluster state. Use them only after performing the read-only diagnostics above, and only with explicit confirmation from the cluster owner or operator.

kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'

Custom VNet DNS must forward .cluster.local to the CoreDNS ClusterIP and other lookups to 168.63.129.16.

Deep diagnostics with Inspektor Gadget (when the above checks are inconclusive):

Use the IG base command pattern with --k8s-namespace <ns> --k8s-podname <pod-name> and trace_dns (timeout 30). Key signals: rcode=3 (NXDOMAIN), rcode=2 (SERVFAIL), high latency values, queries going to unexpected destinations.

See references/inspektor-gadget.md.

Detailed Networking Guides

Load Balancer And Ingress Troubleshooting for pending services, ingress controller state, backend routing, and TLS failures.
Network Policy Troubleshooting for default-deny checks, Azure NPM or Calico validation, and ingress or egress rule audits.

Preparing the source view

Azure Diagnostics

troubleshooting/aks/networking.md

Networking Troubleshooting

Service Unreachable / Connection Refused

DNS Resolution Failures

Detailed Networking Guides