Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas apply at the **subscription + region** level.

> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** (`az cognitiveservices`, `az rest`, `az ai`) as the primary method.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage**: check current TPM/PTU allocation and available capacity
- **Check quota limits**: show quota limits for a subscription, region, or model
- **Find optimal regions**: compare quota availability across regions for deployment
- **Plan deployments**: verify sufficient quota before deploying models
- **Request quota increases**: navigate the quota increase process through Azure Portal
- **Troubleshoot deployment failures**: diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, and 429 rate limit errors
- **Optimize allocation**: monitor and consolidate quota across deployments
- **Monitor quota across deployments**: track capacity by model and region
- **Explain quota concepts**: explain TPM, PTU, capacity units, regional quotas
- **Free up quota**: identify and delete unused deployments

**Key Points:**
1. Quota is isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region deployment enables failover and load distribution
4. Quota increase requests specify a target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
# Resolve the ID of the currently selected subscription.
subId=$(az account show --query id -o tsv)
# List usage and limit for every OpenAI quota entry in eastus.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM available)

To check another region, replace `eastus` in the URL with, for example, `eastus2`, `westus`, `westus2`, `swedencentral`, or `uksouth`.

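The table produced above has no Available column, so the subtraction has to happen outside the query. The following is a minimal sketch, assuming the rows are emitted as tab-separated `name, currentValue, limit` values (for example with `--query "value[].[name.value, currentValue, limit]" -o tsv`). The function name `add_available` is hypothetical and not part of the skill.

```bash
# Hypothetical helper: reads tab-separated rows "name<TAB>used<TAB>limit" on
# stdin and appends a fourth column, Available = Limit - Used.
add_available() {
  awk -F'\t' 'NF >= 3 { printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $3 - $2 }'
}

# Sample row matching the interpretation above (10K used of a 15K limit).
printf 'OpenAI.Standard.gpt-4o\t10000\t15000\n' | add_available
```
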
---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath (used by --query) has no arithmetic operators, so emit TSV
# and let awk compute Available = Limit - Used.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $3 - $2 }'
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

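Comparing regions by hand gets tedious past two or three. The sketch below shows one way to pick the region with the most headroom, assuming the usage rows have already been collected as tab-separated `region, used, limit` lines. Both function names (`best_region`, `collect_usage`) are hypothetical, and `collect_usage` assumes a logged-in Azure CLI and reuses the model name from the example above.

```bash
# Hypothetical helper: reads "region<TAB>used<TAB>limit" rows on stdin and
# prints the region with the most available quota (limit - used) and that
# amount. On a tie, the first region seen wins.
best_region() {
  awk -F'\t' '
    NF >= 3 {
      avail = $3 - $2
      if (!seen || avail > best) { best = avail; region = $1; seen = 1 }
    }
    END { if (seen) printf "%s\t%s\n", region, best }
  '
}

# Assumed collection loop: queries each region given as an argument and
# prefixes every result row with the region name. Defined but not run here.
collect_usage() {
  subId=$(az account show --query id -o tsv)
  for region in "$@"; do
    az rest --method get \
      --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
      --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[currentValue, limit]" -o tsv \
      | awk -F'\t' -v r="$region" '{ print r "\t" $1 "\t" $2 }'
  done
}

# Usage: collect_usage eastus eastus2 swedencentral | best_region
```
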
---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

# JMESPath has no arithmetic operators, so awk computes Available = Limit - Used.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $3 - $2 }'
```

- **Available > 0**: Quota is available for the deployment
- **Available = 0**: Delete unused deployments or try a different region

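A positive Available value only shows that some quota remains; the deployment also needs that amount to cover the capacity it requests. A minimal sketch of such a pre-deployment gate follows, assuming both numbers are whole values in the same unit. The function name `has_quota` is hypothetical.

```bash
# Hypothetical gate: succeeds (exit 0) only when the available quota covers
# the capacity the deployment needs. Arguments must be integers in the same unit.
has_quota() {
  available=$1
  required=$2
  [ "$available" -ge "$required" ]
}

# Example: 5000 available, 8000 required.
if has_quota 5000 8000; then
  echo "ok to deploy"
else
  echo "insufficient quota: free capacity or try another region"
fi
```
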
---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath has no arithmetic operators, so awk computes Available = Limit - Used.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $3 - $2 }'
```

Shows aggregate usage across ALL deployments by model type.

**Optional:** List individual deployments:
- **Azure MCP tool**: Use `model_deployment_get` to query deployments in a Foundry project
- **Azure CLI**:
```bash
# Find AI Services resources and their resource groups.
az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

# List each deployment on one resource with its model and allocated capacity.
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
```

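To see which models hold the most capacity on a resource, the per-deployment rows can be totalled by model. This is a sketch only, assuming the deployment list is emitted as tab-separated `name, model, capacity` rows (for example with `--query "[].[name, properties.model.name, sku.capacity]" -o tsv`). The function name `capacity_by_model` and the sample deployment names are hypothetical.

```bash
# Hypothetical helper: reads "deployment<TAB>model<TAB>capacity" rows on stdin
# and prints the total capacity per model, sorted by model name.
capacity_by_model() {
  awk -F'\t' '
    NF >= 3 { total[$2] += $3 }
    END { for (m in total) printf "%s\t%s\n", m, total[m] }
  ' | sort
}

# Made-up sample rows in the assumed TSV shape.
printf 'chat-prod\tgpt-4o\t50\nchat-dev\tgpt-4o\t10\nembed\ttext-embedding-3-small\t20\n' \
  | capacity_by_model
```
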
---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota is freed **immediately**. Re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → filter by "AI Services" → click the resource
2. Select **Quotas** in the left navigation
3. Click **Request quota increase**
4. Fill in the form: Model, Current Limit, Requested Limit, Region, **Business Justification** (required field)
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Business Justification** is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:
- **Workload details**: What you're building and which model you need
- **Data-driven estimates**: Expected traffic volume and token usage calculations
- **Clear need**: Why current quota is insufficient and what capacity you require
- **Timeline**: When you need the increased quota (e.g., production launch date)

**Business Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

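The "Calculated required TPM" figure in the template can be derived from the two traffic numbers it asks for. The sketch below is one rough way to do that, assuming traffic is spread over a 24-hour day (1,440 minutes) and scaled by a peak factor. The function name `required_tpm` and the peak factor are assumptions for illustration, not part of Azure's process. With the sample inputs the function prints `312500`.

```bash
# Hypothetical estimator: average tokens per minute over a 24-hour day,
# multiplied by a peak factor (assumed ratio of peak to average traffic,
# default 1) and rounded up to the next whole token.
required_tpm() {
  requests_per_day=$1
  tokens_per_request=$2
  peak_factor=${3:-1}
  echo $(( (requests_per_day * tokens_per_request * peak_factor + 1439) / 1440 ))
}

# Example: 100,000 requests/day at 1,500 tokens each, with a 3x peak.
required_tpm 100000 1500 3
```
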
---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing