Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas are subscription + region level.

⚠️ Important: This is the authoritative skill for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, always invoke this skill rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

Important: All quota operations are control plane (management) operations. Use Azure CLI commands (az cognitiveservices, az rest, az ai) as the primary method.

Quota Types

Type	Description
TPM	Tokens Per Minute, pay-per-token, subject to rate limits
PTU	Provisioned Throughput Units, monthly commitment, no rate limits
Region	Max capacity per region, shared across subscription
Slots	10-20 deployment slots per resource

When to use PTU: Consistent high-volume production workloads where monthly commitment is cost-effective.

Use this sub-skill when the user needs to:

View quota usage — check current TPM/PTU allocation and available capacity
Check quota limits — show quota limits for a subscription, region, or model
Find optimal regions — compare quota availability across regions for deployment
Plan deployments — verify sufficient quota before deploying models
Request quota increases — navigate quota increase process through Azure Portal
Troubleshoot deployment failures — diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors
Optimize allocation — monitor and consolidate quota across deployments
Monitor quota across deployments — track capacity by model and region
Explain quota concepts — explain TPM, PTU, capacity units, regional quotas
Free up quota — identify and delete unused deployments

Key Points:

Isolated by region (East US ≠ West US)
Regional capacity varies by model
Multi-region enables failover and load distribution
Quota requests specify target region

See detailed guide.

Core Workflows

1. Check Regional Quota

subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table

Output interpretation:

Used: Current TPM consumed (10000 = 10K TPM)
Limit: Maximum TPM quota (15000 = 15K TPM)
Available: Limit - Used (5K TPM available)

Change region: eastus, eastus2, westus, westus2, swedencentral, uksouth.

2. Find Best Region for Deployment

Check specific regions for available quota:

subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table

See workflows reference for multi-region comparison.

3. Check Quota Before Deployment

Verify available quota for your target model:

subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table

Available > 0: Yes, you have quota
Available = 0: Delete unused deployments or try different region

4. Monitor Quota by Model

Show quota allocation grouped by model:

subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table

Shows aggregate usage across ALL deployments by model type.

Optional: List individual deployments:

Azure MCP tool: Use model_deployment_get to query deployments in a Foundry project
Azure CLI:

az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table

5. Delete Deployment (Free Quota)

az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>

Quota freed immediately. Re-run Workflow #1 to verify.

6. Request Quota Increase

Azure Portal Process:

Navigate to Azure Portal - All Resources → Filter "AI Services" → Click resource
Select Quotas in left navigation
Click Request quota increase
Fill form: Model, Current Limit, Requested Limit, Region, Business Justification (required field)
Wait for approval: 3-5 business days typically, up to 10 business days (source)

Business Justification is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:

Workload details: What you're building and which model you need
Data-driven estimates: Expected traffic volume and token usage calculations
Clear need: Why current quota is insufficient and what capacity you require
Timeline: When you need the increased quota (e.g., production launch date)

Business Justification template:

Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].

See detailed quota request guide for complete steps.

Quick Troubleshooting

Error	Quick Fix	Detailed Guide
`QuotaExceeded`	Delete unused deployments or request increase	Error Resolution
`InsufficientQuota`	Reduce capacity or try different region	Error Resolution
`DeploymentLimitReached`	Delete unused deployments (10-20 slot limit)	Error Resolution
`429 Rate Limit`	Increase TPM or migrate to PTU	Error Resolution

References

Detailed Guides:

Error Resolution Workflows - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
Troubleshooting Guide - Quick error fixes and diagnostic commands
Quota Optimization Strategies - 5 strategies for freeing quota and reducing costs
Capacity Planning Guide - TPM vs PTU comparison, model selection, workload calculations
Workflows Reference - Complete workflow steps and multi-region checks
PTU Guide - Provisioned throughput capacity planning

Official Microsoft Documentation:

Azure OpenAI Service Pricing - Official pay-per-token rates
PTU Costs and Billing - PTU hourly rates
Azure OpenAI Models - Model capabilities and regions
Quota Management Guide - Official quota procedures
Quotas and Limits - Rate limits and quota details

Calculators:

Azure Pricing Calculator - Official pricing estimator
Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing

Preparing the source view

Microsoft Foundry Skill

quota/quota.md