troubleshooting/aks/networking.md
# Networking Troubleshooting

For CNI-specific issues, check CNI pod health and review [AKS networking concepts](https://learn.microsoft.com/azure/aks/concepts-network).

## Service Unreachable / Connection Refused

**Diagnostics - always start here:**

```bash
# 1. Verify service exists and has endpoints (read-only)
kubectl get svc <service-name> -n <ns>
kubectl get endpoints <service-name> -n <ns>

# 2. Optional connectivity test from inside the namespace
# This creates a temporary pod. Prefer read-only checks first.
# Only use it after the user explicitly approves a mutating test.
kubectl run netdebug --image=curlimages/curl -it --rm -n <ns> -- \
  curl -sv http://<service>.<ns>.svc.cluster.local:<port>/healthz
```

**Decision tree:**

| Observation | Cause | Fix |
| --------------------------------------- | ---------------------------------- | ----------------------------------------------- |
| Endpoints shows `<none>` | Label selector mismatch | Align selector with pod labels; check for typos |
| Endpoints has IPs but unreachable | Port mismatch or app not listening | Confirm `targetPort` = actual container port |
| Works from some pods, fails from others | Network policy blocking | See Network Policy section |
| Works inside cluster, fails externally | Load balancer issue | See Load Balancer section |
| `ECONNREFUSED` immediately | App not listening on that port | Check listening ports in the pod |

Pods that are running but not Ready are removed from Endpoints. Check `kubectl get pod <pod> -n <ns>`.

**Deep diagnostics with Inspektor Gadget** (when the above checks are inconclusive):

Use the [IG base command pattern](references/inspektor-gadget.md) with `--k8s-namespace <ns> --k8s-podname <pod-name>` and these gadgets:

- `snapshot_socket` (timeout 5): check what ports the pod is listening on
- `trace_tcp` (timeout 30): trace connect/accept/close events
- `trace_tcpretrans` (timeout 30): packet retransmissions

See [references/inspektor-gadget.md](references/inspektor-gadget.md).

---

## DNS Resolution Failures

**Diagnostics:**

The live DNS test creates a temporary pod. Prefer `get`, `describe`, `logs`, or `exec` into an existing pod first. Only use it after the user explicitly approves creating the test pod.

```bash
# Confirm CoreDNS is running and healthy (read-only)
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl top pod -n kube-system -l k8s-app=kube-dns

# Optional live DNS test from the same namespace as the failing pod
kubectl run dnstest --image=busybox:1.28 -it --rm -n <ns> -- \
  nslookup <service-name>.<ns>.svc.cluster.local

# CoreDNS logs - errors show here first
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
```

**DNS failure patterns:**

| Symptom | Cause | Fix |
| ------------------------------------- | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `NXDOMAIN` for `svc.cluster.local` | CoreDNS down or pod network broken | After confirming the diagnostics above, coordinate with the cluster operator to restart or redeploy CoreDNS and verify CNI |
| Internal resolves, external NXDOMAIN | Custom DNS not forwarding to `168.63.129.16` | Fix upstream forwarder |
| Intermittent SERVFAIL under load | CoreDNS CPU throttled | Remove CPU limits or add replicas |
| Private cluster - external names fail | Custom DNS missing privatelink forwarder | Add conditional forwarder to Azure DNS |
| `i/o timeout` not `NXDOMAIN` | Port 53 blocked by NetworkPolicy or NSG | Allow UDP/TCP 53 from pods to kube-dns ClusterIP |

> ⚠️ **Warning:** The fixes in this table can change cluster state. Use them only after performing the read-only diagnostics above, and only with explicit confirmation from the cluster owner or operator.

```bash
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

Custom VNet DNS must forward `.cluster.local` to the CoreDNS ClusterIP and other lookups to `168.63.129.16`.

**Deep diagnostics with Inspektor Gadget** (when the above checks are inconclusive):

Use the [IG base command pattern](references/inspektor-gadget.md) with `--k8s-namespace <ns> --k8s-podname <pod-name>` and `trace_dns` (timeout 30). Key signals: `rcode=3` (NXDOMAIN), `rcode=2` (SERVFAIL), high `latency` values, queries going to unexpected destinations.

See [references/inspektor-gadget.md](references/inspektor-gadget.md).

---

## Detailed Networking Guides

- [Load Balancer And Ingress Troubleshooting](load-balancer-and-ingress.md) for pending services, ingress controller state, backend routing, and TLS failures.
- [Network Policy Troubleshooting](network-policy.md) for default-deny checks, Azure NPM or Calico validation, and ingress or egress rule audits.
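
**Appendix sketch: checking a label selector mismatch offline**

The first row of the Service Unreachable decision tree (Endpoints shows `<none>` because of a label selector mismatch) can be checked without touching the cluster, once the selector and the pod labels have been copied from read-only output. The following is a minimal sketch under stated assumptions, not part of kubectl: `selector_gaps` is a hypothetical helper name, both inputs are assumed to be comma-separated `key=value` lists, and the sample values are made up for illustration.

```bash
# Hypothetical helper: print every selector pair that the pod's labels lack.
#   $1: Service selector, e.g. "app=web,tier=frontend"
#   $2: pod labels,       e.g. "app=web,tier=frontend,pod-template-hash=5d4f8"
# The body runs in a subshell, so the IFS change does not leak to the caller.
# Exit status is 0 when every selector pair is present, 1 otherwise.
selector_gaps() (
  labels=",$2,"          # wrap in commas so each pair can be matched whole
  missing=0
  IFS=','
  for pair in $1; do     # unquoted on purpose: split the selector on commas
    case "$labels" in
      *",$pair,"*) ;;    # pair present on the pod
      *) echo "missing: $pair"; missing=1 ;;
    esac
  done
  exit "$missing"
)

# Example with made-up values: the pod label is misspelled as "front-end".
selector_gaps "app=web,tier=frontend" \
  "app=web,tier=front-end,pod-template-hash=5d4f8" || true
# prints: missing: tier=frontend
```

One way to obtain the two inputs, staying read-only: the `SELECTOR` column of `kubectl get svc <service-name> -n <ns> -o wide` and the `LABELS` column of `kubectl get pods -n <ns> --show-labels`. Matching is exact on whole `key=value` pairs, so `app=web` does not match a pod labeled `app=webapp`.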