workflows/vm-troubleshooter/vm-troubleshooter.md
# Azure VM Connectivity Troubleshooting

> Diagnose and resolve Azure VM connectivity failures (RDP/SSH) by identifying symptoms, routing to the right solution, fetching the latest Microsoft documentation, and guiding the user through resolution.

## Quick Reference

| Property | Details |
| ------------- | ----------------------------------------------------------------------------------------------------------------- |
| Best for | RDP/SSH connection failures, NSG/firewall misconfigurations, credential resets, NIC issues |
| Primary tools | Azure CLI, Azure PowerShell, Serial Console, Boot Diagnostics, Run Command |
| Reference | [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) |

## MCP Tools

| Tool | Purpose | Parameters |
| --------------- | ------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `fetch_webpage` | Fetch the latest Microsoft troubleshooting docs at runtime | `urls` (required): array of documentation URLs from the reference file; `query` (optional): the user's symptom, used to extract the relevant content |

## Triggers

Activate this skill when the user mentions:

- "can't connect to my VM" / "can't RDP" / "can't SSH"
- "RDP not working" / "SSH refused" / "connection timed out"
- "black screen" on VM
- "reset VM password" / "forgot password"
- "NSG blocking" / "firewall blocking" / "port 3389"
- "serial console" access
- "internal error" on RDP
- "VM not reachable" / "public IP not working"
- "RDP disconnects" / "session dropped"

---

## Guardrails

- **Default to read-only diagnostics.** Gather evidence before suggesting any fix.
- Do not run extension-backed commands (`az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`) without first passing the [Pre-Flight Safety Checks](#phase-25-pre-flight-safety-checks-before-extension-backed-operations).
- Do not restart, redeploy, deallocate, or delete a VM unless the user explicitly asks for remediation.
- Do not conclude a root cause without quoting the evidence that supports it (e.g., NSG rule output, VM agent status, extension state).
- When multiple issues are found (e.g., NSG + credential), fix the network-layer issue before attempting agent-dependent fixes.

## Evidence Order

Gather diagnostic evidence in this order before suggesting remediation:

1. **VM state:** power state, provisioning state, VM agent health, extension states
2. **Network layer:** public IP, NSG rules (NIC + subnet), effective routes, IP flow verify
3. **Guest OS layer (if the agent is healthy):** service status via Run Command, firewall rules, sshd/TermService config

---

## Workflow

### Phase 1: Determine User Intent

Infer the connectivity issue from the user's message. If the issue is clear, proceed to Phase 2. If it is ambiguous, ask **one** clarifying question:

| Signal in User Message | Inferred Category |
| ------------------------------------------------------------------------- | ------------------ |
| "can't RDP", "RDP timeout", "RDP error", "black screen", "internal error" | Unable to RDP |
| "can't SSH", "SSH refused", "permission denied", "publickey" | Unable to SSH |
| "NSG", "firewall", "port blocked", "no public IP", "NIC disabled" | Network / Firewall |
| "credentials", "password", "wrong password", "access denied" | Credential / Auth |
| "VM agent", "Run Command not working", "Serial Console" | VM Agent / Tools |

If unclear, ask: **"Are you trying to connect via RDP (Windows) or SSH (Linux), and what error message or behavior are you seeing?"**

If the user shares an Azure VM name or resource ID, look the VM up with the azure-resource-lookup skill if it is available.
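When the user pastes a full resource ID rather than a bare VM name, the subscription, resource group, and VM name that the `az vm` commands in this workflow take (`--subscription`, `-g`, `--name`) can be read straight out of it. The following is a minimal JavaScript sketch, assuming the standard `/subscriptions/.../resourceGroups/.../providers/Microsoft.Compute/virtualMachines/...` layout; `parseVmResourceId` and `instanceViewArgs` are hypothetical helpers, not part of the azure-resource-lookup skill or the Azure CLI.

```javascript
// Hypothetical helper: split an Azure VM resource ID into the values that
// `az vm` commands take as --subscription, -g, and --name.
// Assumed layout (segment names compared case-insensitively):
//   /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>
function parseVmResourceId(resourceId) {
  const parts = String(resourceId).trim().split("/").filter((p) => p.length > 0);
  // A VM ID has exactly 8 non-empty segments; anything else is not a VM ID.
  if (parts.length !== 8) return null;
  const [subs, subscription, rgs, resourceGroup, providers, namespace, type, vmName] = parts;
  const isVmId =
    subs.toLowerCase() === "subscriptions" &&
    rgs.toLowerCase() === "resourcegroups" &&
    providers.toLowerCase() === "providers" &&
    namespace.toLowerCase() === "microsoft.compute" &&
    type.toLowerCase() === "virtualmachines";
  return isVmId ? { subscription, resourceGroup, vmName } : null;
}

// Hypothetical helper: build the read-only `az vm get-instance-view` call as an
// argument array (not a shell string), so names never need shell quoting.
function instanceViewArgs({ subscription, resourceGroup, vmName }) {
  return ["vm", "get-instance-view", "--subscription", subscription, "-g", resourceGroup, "--name", vmName];
}

// Example usage with a placeholder (fake) resource ID.
const parsed = parseVmResourceId(
  "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/web01"
);
if (parsed) console.log(["az", ...instanceViewArgs(parsed)].join(" "));
```

Returning an argument array lets a caller hand it to something like Node's `child_process.execFile` without quoting resource group names. Most `az vm` commands also accept the full ID directly through `--ids`, which avoids parsing altogether when only one command is needed.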
If the lookup skill is not available, use the Azure CLI instead.

### Phase 2: Route to Solution

Open [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) and use its routing table to identify the symptom category, then open the matching sub-reference for the full **Symptoms → Solutions** table and Quick Commands.

If you need more detail to narrow the issue to a specific solution row, ask the user. For example:

- "What error message do you see in the RDP dialog?"
- "Does the connection time out, or do you get an error immediately?"
- "Is this a Windows or Linux VM?"

### Phase 2.5: Pre-Flight Safety Checks (Before Extension-Backed Operations)

> ⚠️ **Warning:** This phase is **mandatory** before running any command that depends on the VM agent or extensions. Skipping these checks can deadlock the VM and require manual portal recovery.

**Extension-backed commands include:** `az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`, and any operation that installs or invokes a VM extension.

Run the pre-flight checks from [references/cannot-connect-to-vm.md → Pre-Flight Safety Checks](references/cannot-connect-to-vm.md#pre-flight-safety-checks) and evaluate:

| Check | Required Value | If Failed |
| ----- | -------------- | --------- |
| VM power state | `PowerState/running` | Start the VM first |
| VM provisioning state | `ProvisioningState/succeeded` | Do NOT run extension commands. Wait for the current operation to complete, or use Serial Console / offline repair |
| VM agent status | `Ready` | Do NOT run extension commands. Use Serial Console or offline repair instead |
| Existing extensions | No extensions in `Creating`, `Updating`, or `Deleting` state | Do NOT add new extensions. Wait for completion, remove stuck extensions via the Portal, or use Serial Console |

> 💡 **Tip:** If any check returns `null` or an empty value, or the CLI command itself errors, treat the result as **unsafe**.

**If any check fails:**

1. **Stop.** Do NOT attempt any extension-backed remediation.
2. **Inform the user** which check(s) failed and what the current state is.
3. **Suggest non-agent alternatives:** Serial Console, offline repair VM, or Portal-based actions.
4. If the state appears transient (e.g., the VM just started), wait 30–60 seconds and **re-run the pre-flight checks**. Do not run the extension command until all checks pass.

### Phase 3: Fetch Documentation

Once you've identified the specific solution row, fetch its linked Microsoft documentation URL for the latest troubleshooting guidance:

```javascript
fetch_webpage({
  urls: ["<documentation-url-from-solution-row>"],
  query: "<user's specific symptom or error message>"
})
```

This ensures the user gets current guidance even if Microsoft updates its docs.

### Phase 4: Diagnose and Respond

Combine the fetched documentation with the quick commands from the reference file to give the user a response:

1. **Explain the likely cause** based on their symptom.
2. **Provide the immediate diagnostic/fix commands** from the reference file's Quick Commands section.
3. **Summarize the key resolution steps** from the fetched documentation.
4. **If the user is logged into Azure**, offer to run diagnostic CLI commands to confirm the root cause before applying fixes.
5. **Recommend next steps:** what to verify after the fix, and what to do if it doesn't work.

### Phase 5: Escalation (if needed)

If the symptom doesn't match any solution in the reference file, or the fix doesn't resolve the issue:

1. Check the VM's power and provisioning status: `az vm get-instance-view --name <vm> -g <rg> --query "instanceView.statuses" -o table`. For platform-side issues, also check the VM's **Resource health** page in the Azure portal.
2. Offer to restart the VM (requires user approval): `az vm restart --name <vm> -g <rg>`
3. Offer to redeploy the VM (requires user approval; redeploy moves the VM to a new host, and data on the temporary disk is lost): `az vm redeploy --name <vm> -g <rg>`
4. Fetch the comprehensive guide: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection) or [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)

---

## Error Handling

| Error | Likely Cause | Action |
| -------------------------------------- | ------------------------------- | ---------------------------------------------------------------------------------- |
| `fetch_webpage` fails or returns empty | URL may have changed | Fall back to the quick commands in the reference file; suggest the user check the URL manually |
| CLI command fails with "not found" | Wrong VM name or resource group | Ask the user to verify the VM name and resource group |
| Run Command times out | VM agent not responding | Route to the "VM Agent Not Responding" section in the reference file |
| Serial Console not available | Boot diagnostics not enabled | Run `az vm boot-diagnostics enable --name <vm> -g <rg>` first |
| Password reset fails | VMAccess extension error | Check the reference file for VMAccess alternatives (offline reset, Serial Console) |
| VM stuck in "Updating" after an extension operation | Extension deadlocked the VM agent | Do NOT add more extensions. Remove stuck extensions via the Portal, then restart (with user approval). See [Pre-Flight Safety Checks](references/cannot-connect-to-vm.md#pre-flight-safety-checks) |
| `VMAgentStatusCommunicationError` | Agent not reporting status | Do NOT run extension commands. Use Serial Console or an offline repair VM |

---

## References

- [Cannot Connect to VM: Symptom Router](references/cannot-connect-to-vm.md)
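As a companion to the Phase 2.5 table and its tip, the pre-flight gate can be written as a small function. This is a minimal JavaScript sketch, not part of the skill's tooling: `preFlightFailures` and `safeToRunExtensionCommand` are hypothetical names, and the input is an assumed, pre-flattened object that the caller would fill from CLI output such as the `instanceView.statuses` codes and the VM agent's `displayStatus` returned by `az vm get-instance-view`.

```javascript
// Minimal sketch of the Phase 2.5 pre-flight gate.
// ASSUMPTION: `vm` is a hypothetical, pre-flattened object filled by the caller
// from CLI output; it is NOT the raw `az vm get-instance-view` JSON.
//   { powerState, provisioningState, agentStatus, extensions: [{ name, state }] }
const BUSY_EXTENSION_STATES = new Set(["creating", "updating", "deleting"]);

function preFlightFailures(vm) {
  const failures = [];
  // Missing, null, or unexpected values all count as failures (treated as unsafe).
  if (vm?.powerState !== "PowerState/running") {
    failures.push(`power state is ${vm?.powerState ?? "unknown"}: start the VM first`);
  }
  if (vm?.provisioningState !== "ProvisioningState/succeeded") {
    failures.push(`provisioning state is ${vm?.provisioningState ?? "unknown"}: wait, or use Serial Console / offline repair`);
  }
  if (vm?.agentStatus !== "Ready") {
    failures.push(`VM agent status is ${vm?.agentStatus ?? "unknown"}: use Serial Console or offline repair`);
  }
  if (!Array.isArray(vm?.extensions)) {
    // Extension states could not be read at all, so treat them as unsafe.
    failures.push("extension states unknown: do not add new extensions");
  } else {
    for (const ext of vm.extensions) {
      const state = String(ext?.state ?? "").toLowerCase();
      // No reported state, or an operation still in flight, blocks new extension operations.
      if (state === "" || BUSY_EXTENSION_STATES.has(state)) {
        failures.push(`extension ${ext?.name ?? "<unnamed>"} is ${ext?.state || "unknown"}: do not add new extensions`);
      }
    }
  }
  return failures;
}

// Extension-backed commands (az vm user update, az vm run-command invoke, ...)
// should run only when this returns true.
function safeToRunExtensionCommand(vm) {
  return preFlightFailures(vm).length === 0;
}
```

Returning the list of failures rather than only a boolean gives the agent the text it needs to tell the user which checks failed and what the current state is, as the "If any check fails" steps require.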