VMSS Guide

Determine when to recommend a Virtual Machine Scale Set (VMSS) over a single VM, and which VMSS configuration to suggest.

Note: This reference provides quick guidance but may become stale. Always verify VMSS features, limitations, and orchestration mode capabilities by fetching the latest documentation from:
https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview
https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/orchestration-modes-api-comparison

What Is a VM Scale Set?

A VMSS creates and manages a group of load-balanced, identically configured VM instances. Key capabilities:

Autoscale — automatically add/remove instances based on metrics or schedules
High availability — spread instances across fault domains and Availability Zones
Load balancing — integrate with Azure Load Balancer (L4) or Application Gateway (L7)
Large scale — up to 1,000 instances per scale set (marketplace images)
No extra cost — you pay only for the underlying VM instances, storage, and networking

When to Recommend VMSS vs Single VM

Scenario	Recommend	Reasoning
Stateless web/API behind a load balancer	VMSS	Homogeneous fleet, autoscale on demand
Batch or parallel compute jobs	VMSS	Scale out for jobs, scale to zero when idle
Autoscale needed (CPU, queue depth, schedule)	VMSS	Built-in autoscale rules
Microservices with identical replicas	VMSS	Consistent config, rolling updates
High availability across zones (many instances)	VMSS	Automatic zone distribution
Single long-lived server (jumpbox, domain controller)	VM	No scaling benefit; simpler config
Unique per-instance configuration	VM	Scale sets assume identical instances
Quick proof of concept or dev/test	VM	Faster to stand up, lower complexity

Orchestration Modes

VMSS supports two orchestration modes. Flexible is recommended for all new workloads.

Feature	Flexible (recommended)	Uniform (legacy)
Mix VM sizes in one set	✅ Yes	❌ No
Add existing VMs to set	✅ Yes	❌ No
Availability Zone spread	✅ Automatic	✅ Automatic
Fault domain control	✅ Yes	✅ Yes
Max instances	1,000	1,000
Spot instances	✅ Yes	✅ Yes
Single-instance VMSS	✅ Yes	❌ No
VM model updates	Automatic, Manual, Rolling	Automatic, Manual, Rolling

Warning: Orchestration mode cannot be changed after creation. Always recommend Flexible unless the user has a specific Uniform requirement.

Autoscale Patterns

Pattern	Trigger	Example
Metric-based	CPU, memory, queue length, custom metric	Scale out when avg CPU > 70% for 5 min
Schedule-based	Time of day, day of week	Scale to 10 instances Mon–Fri 8 AM; scale down to 2 at night
Combined	Metric + schedule together	Baseline schedule with metric burst capacity
Predictive	ML-forecasted demand (preview)	Pre-scale before expected traffic spike

Autoscale Best Practices

Set a minimum instance count ≥ 2 for production HA
Use a cool-down period (default 5 min) to avoid flapping
Scale out aggressively, scale in conservatively (asymmetric rules)
Monitor with Azure Monitor autoscale diagnostics

Networking

Component	When to Use
Azure Load Balancer	Layer-4 (TCP/UDP) traffic distribution; most common for backend services
Application Gateway	Layer-7 (HTTP/HTTPS) with TLS termination, URL routing, WAF
No load balancer	Batch/HPC jobs where instances pull work from a queue

Cost Estimation Tips

VMSS itself is free — cost is the sum of per-instance VM pricing
Estimate at min and max instance counts for autoscale budgets
Use Spot instances in VMSS for up to 90% savings on interruptible workloads
Combine with Reservations or Savings Plans on the baseline instance count

Key VMSS Limits

Limit	Value
Max instances per scale set	1,000 (marketplace/gallery images)
Max instances (managed image)	600
Scale sets per subscription per region	2,500
Scale operations concurrency	Up to 1,000 VMs in a single batch

Preparing the source view

Azure Compute Skill

references/vmss-guide.md