Select, configure, and scale Azure compute resources—VMs, App Service, AKS, and Container Apps
workflows/vm-troubleshooter/vm-troubleshooter.md
# Azure VM Connectivity Troubleshooting

> Diagnose and resolve Azure VM connectivity failures (RDP/SSH) by identifying symptoms, routing to the right solution, fetching the latest Microsoft documentation, and guiding the user through resolution.

## Quick Reference

| Property | Details |
| --- | --- |
| Best for | RDP/SSH connection failures, NSG/firewall misconfiguration, credential resets, NIC issues |
| Primary tools | Azure CLI, Azure PowerShell, Serial Console, Boot Diagnostics, Run Command |
| Reference | [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) |

## MCP Tools

| Tool | Purpose | Parameters |
| --- | --- | --- |
| `fetch_webpage` | Fetch latest Microsoft troubleshooting docs at runtime | `urls` (Required): Array of doc URLs from reference file; `query` (Optional): User's symptom for relevant extraction |

## Triggers

Activate this skill when the user mentions:

- "can't connect to my VM" / "can't RDP" / "can't SSH"
- "RDP not working" / "SSH refused" / "connection timed out"
- "black screen" on VM
- "reset VM password" / "forgot password"
- "NSG blocking" / "firewall blocking" / "port 3389"
- "serial console" access
- "internal error" on RDP
- "VM not reachable" / "public IP not working"
- "RDP disconnects" / "session dropped"

---

## Guardrails

- **Default to read-only diagnostics.** Gather evidence before suggesting any fix.
- Do not run extension-backed commands (`az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`) without first passing [Pre-Flight Safety Checks](#phase-25-pre-flight-safety-checks-before-extension-backed-operations).
- Do not restart, redeploy, deallocate, or delete a VM unless the user explicitly asks for remediation.
- Do not conclude root cause without quoting the evidence that supports it (e.g., NSG rule output, VM agent status, extension state).
- When multiple issues are found (e.g., NSG + credential), fix the network-layer issue first before attempting agent-dependent fixes.

## Evidence Order

Gather diagnostic evidence in this order before suggesting remediation:

1. **VM state:** power state, provisioning state, VM agent health, extension states
2. **Network layer:** public IP, NSG rules (NIC + subnet), effective routes, IP flow verify
3. **Guest OS layer (if agent is healthy):** service status via Run Command, firewall rules, sshd/TermService config

---

## Workflow

### Phase 1: Determine User Intent

Infer the connectivity issue from the user's message. If the issue is clear, proceed to Phase 2. If ambiguous, ask **one** clarifying question:

| Signal in User Message | Inferred Category |
| --- | --- |
| "can't RDP", "RDP timeout", "RDP error", "black screen", "internal error" | Unable to RDP |
| "can't SSH", "SSH refused", "permission denied", "publickey" | Unable to SSH |
| "NSG", "firewall", "port blocked", "no public IP", "NIC disabled" | Network / Firewall |
| "credentials", "password", "wrong password", "access denied" | Credential / Auth |
| "VM agent", "Run Command not working", "Serial Console" | VM Agent / Tools |

If unclear, ask: **"Are you trying to connect via RDP (Windows) or SSH (Linux), and what error message or behavior are you seeing?"**

If the user shares an Azure VM name or resource ID, use the azure-resource-lookup skill if it is available; otherwise, fall back to the Azure CLI.

### Phase 2: Route to Solution

Open [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) and use its routing table to identify the symptom category and open the matching sub-reference for the full **Symptoms → Solutions** table and Quick Commands.

If additional details are needed to narrow to a specific solution row, ask the user. For example:
- "What error message do you see in the RDP dialog?"
- "Does the connection time out, or do you get an error immediately?"
- "Is this a Windows or Linux VM?"

### Phase 2.5: Pre-Flight Safety Checks (Before Extension-Backed Operations)

> ⚠️ **Warning:** This phase is **mandatory** before running any command that depends on the VM agent or extensions. Skipping these checks can deadlock the VM and require manual portal recovery.

**Extension-backed commands include:** `az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`, and any operation that installs or invokes a VM extension.

Run the pre-flight checks from [references/cannot-connect-to-vm.md: Pre-Flight Safety Checks](references/cannot-connect-to-vm.md#pre-flight-safety-checks) and evaluate:

| Check | Required Value | If Failed |
| ----- | -------------- | --------- |
| VM power state | `PowerState/running` | Start the VM first |
| VM provisioning state | `ProvisioningState/succeeded` | Do NOT run extension commands. Wait for current operation to complete, or use Serial Console / offline repair |
| VM agent status | `Ready` | Do NOT run extension commands. Use Serial Console or offline repair instead |
| Existing extensions | No extensions in `Creating`, `Updating`, or `Deleting` state | Do NOT add new extensions. Wait for completion, remove stuck extensions via Portal, or use Serial Console |

> 💡 **Tip:** If any check returns `null`, empty, or the CLI command itself errors, treat the result as **unsafe**.

**If any check fails:**
1. **Stop.** Do NOT attempt any extension-backed remediation.
2. **Inform the user** which check(s) failed and what the current state is.
3. **Suggest non-agent alternatives:** Serial Console, offline repair VM, or Portal-based actions.
4. If the state appears transient (e.g., VM just started), wait 30–60 seconds and **re-run the pre-flight checks**. Do not run the extension command until all checks pass.

### Phase 3: Fetch Documentation

Once you've identified the specific solution row, fetch the linked Microsoft documentation URL for the latest troubleshooting guidance:

```javascript
fetch_webpage({
  urls: ["<documentation-url-from-solution-row>"],
  query: "<user's specific symptom or error message>"
})
```

This ensures the user gets current guidance even if Microsoft updates their docs.

### Phase 4: Diagnose and Respond

Combine the fetched documentation with the quick commands from the reference file to give the user a response:

1. **Explain the likely cause** based on their symptom
2. **Provide the immediate diagnostic/fix commands** from the reference file's Quick Commands section
3. **Summarize the key resolution steps** from the fetched documentation
4. **If the user is logged into Azure**, offer to run diagnostic CLI commands to confirm the root cause before applying fixes
5. **Recommend next steps:** what to verify after the fix, and what to do if it doesn't work

### Phase 5: Escalation (if needed)

If the symptom doesn't match any solution in the reference file, or the fix doesn't resolve the issue:

1. Check the VM's health via its instance view statuses (power and provisioning state): `az vm get-instance-view --name <vm> -g <rg> --query "instanceView.statuses" -o table`
2. Offer to restart the VM (requires user approval): `az vm restart --name <vm> -g <rg>`
3. Offer to redeploy the VM (requires user approval; moves the VM to a new host): `az vm redeploy --name <vm> -g <rg>`
4. Fetch the comprehensive guide: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection) or [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)

---

## Error Handling

| Error | Likely Cause | Action |
| --- | --- | --- |
| `fetch_webpage` fails or returns empty | URL may have changed | Fall back to quick commands in reference file; suggest user check the URL manually |
| CLI command fails with "not found" | VM name or resource group wrong | Ask user to verify VM name and resource group |
| Run Command times out | VM agent not responding | Route to "VM Agent Not Responding" section in reference file |
| Serial Console not available | Boot diagnostics not enabled | Run `az vm boot-diagnostics enable` first |
| Password reset fails | VMAccess extension error | Check reference file for VMAccess alternatives (offline reset, Serial Console) |
| VM stuck in "Updating" after extension operation | Extension deadlocked the VM agent | Do NOT add more extensions. Remove stuck extensions via Portal, then restart. See [Pre-Flight Safety Checks](references/cannot-connect-to-vm.md#pre-flight-safety-checks) |
| `VMAgentStatusCommunicationError` | Agent not reporting status | Do NOT run extension commands. Use Serial Console or offline repair VM |

---

## References

- [Cannot Connect to VM: Symptom Router](references/cannot-connect-to-vm.md)
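
---

## Example: Pre-Flight Evaluation Sketch

The Phase 2.5 decision table can be read as a single pass/fail gate. The block below is a minimal sketch of that gate, not part of the skill itself. It assumes the four values have already been collected (for example from `az vm get-instance-view`) and passed in as a plain object. The function name `evaluatePreflight` and the field names `powerState`, `provisioningState`, `agentStatus`, and `extensionStates` are hypothetical. It also assumes that an empty extension list means "no extensions installed" and therefore passes, while a missing list counts as unsafe.

```javascript
// Hypothetical helper: evaluates pre-flight state that has ALREADY been
// collected. It makes no Azure calls and is only an illustration.
const BUSY_EXTENSION_STATES = new Set(["creating", "updating", "deleting"]);

function evaluatePreflight(state) {
  const s = state || {}; // a null/undefined result is treated as unsafe
  const failures = [];

  // Strict comparison: null, empty, or unexpected values all fail the check.
  if (s.powerState !== "PowerState/running") {
    failures.push("power state");
  }
  if (s.provisioningState !== "ProvisioningState/succeeded") {
    failures.push("provisioning state");
  }
  if (s.agentStatus !== "Ready") {
    failures.push("VM agent status");
  }

  // Assumption: [] means "no extensions" and passes; a missing list is unsafe.
  if (!Array.isArray(s.extensionStates)) {
    failures.push("extensions (state unknown)");
  } else if (
    s.extensionStates.some(
      (x) =>
        typeof x !== "string" ||
        x === "" ||
        BUSY_EXTENSION_STATES.has(x.toLowerCase())
    )
  ) {
    failures.push("extensions (operation in progress or unknown)");
  }

  return { safe: failures.length === 0, failures };
}

// Usage: a VM that is mid-update must not receive extension-backed commands.
const result = evaluatePreflight({
  powerState: "PowerState/running",
  provisioningState: "ProvisioningState/updating",
  agentStatus: "Ready",
  extensionStates: ["Succeeded"],
});
console.log(result.safe, result.failures);
```

Returning the list of failed checks, rather than a bare boolean, matches step 2 of the failure procedure: the user should be told which check failed before any non-agent alternative is suggested.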