Source from repo

Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

Files

154

Skill

n/a

Size

976.2 KB

Entrypoint

SKILL.md

Format

git-repo

Open file

quota/quota.md

# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas are scoped at the **subscription + region** level.

> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** (`az cognitiveservices`, `az rest`, `az ai`) as the primary method.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage**: check current TPM/PTU allocation and available capacity
- **Check quota limits**: show quota limits for a subscription, region, or model
- **Find optimal regions**: compare quota availability across regions for deployment
- **Plan deployments**: verify sufficient quota before deploying models
- **Request quota increases**: navigate the quota increase process through Azure Portal
- **Troubleshoot deployment failures**: diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, and 429 rate limit errors
- **Optimize allocation**: monitor and consolidate quota across deployments
- **Monitor quota across deployments**: track capacity by model and region
- **Explain quota concepts**: explain TPM, PTU, capacity units, regional quotas
- **Free up quota**: identify and delete unused deployments

**Key Points (regional quota):**
1. Quota is isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region deployment enables failover and load distribution
4. Quota requests specify a target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM in this example); not a column in this output, so compute it from the other two

To check a different region, replace `eastus` in the URL, for example with `eastus2`, `westus`, `westus2`, `swedencentral`, or `uksouth`.

---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath (used by --query) has no arithmetic operators, so Available is computed with awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { print $1 "\t" $2 "\t" $3 "\t" ($3 - $2) }'
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

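To compare several regions in one pass, the request above can be wrapped in a loop. This is a minimal sketch, not part of the skill's reference workflows: the helper names `usage_url` and `check_regions` are hypothetical, and `check_regions` assumes a logged-in Azure CLI.

```bash
# Hypothetical helper: build the regional usages URL used by the workflows in this file.
usage_url() {
  local sub_id="$1" region="$2"
  printf 'https://management.azure.com/subscriptions/%s/providers/Microsoft.CognitiveServices/locations/%s/usages?api-version=2023-05-01' "$sub_id" "$region"
}

# Hypothetical helper: print Region, Used, Limit, Available for one quota name per region.
# Nothing is sent to Azure until the function is called.
check_regions() {
  local quota_name="$1"; shift
  local sub_id region
  sub_id=$(az account show --query id -o tsv) || return 1
  for region in "$@"; do
    az rest --method get --url "$(usage_url "$sub_id" "$region")" \
      --query "value[?name.value=='$quota_name'].[currentValue, limit]" -o tsv |
      awk -v r="$region" -F'\t' '{ print r "\t" $1 "\t" $2 "\t" ($2 - $1) }'
  done
}

# Example call (requires az login):
#   check_regions OpenAI.Standard.gpt-4o eastus eastus2 swedencentral
```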
---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

# JMESPath (used by --query) has no arithmetic operators, so Available is computed with awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { print $1 "\t" $2 "\t" $3 "\t" ($3 - $2) }'
```

- **Available > 0**: Quota is available; confirm it covers the capacity you plan to deploy
- **Available = 0**: Delete unused deployments or try a different region

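The check described in the two bullets above can be made explicit by comparing the available quota with the capacity the deployment needs. A minimal sketch: `quota_check` is a hypothetical helper, and its three inputs (Used, Limit, and the capacity you intend to deploy) are assumed to be in the same units.

```bash
# Hypothetical helper: decide whether a planned deployment fits in the remaining quota.
# Arguments: used, limit, needed. Exit status 0 means the deployment fits, 1 means it does not.
quota_check() {
  awk -v used="$1" -v limit="$2" -v needed="$3" 'BEGIN {
    available = limit - used
    if (available >= needed) {
      printf "OK: %d available, %d needed\n", available, needed
      exit 0
    }
    printf "INSUFFICIENT: %d available, %d needed\n", available, needed
    exit 1
  }'
}

# Example with the sample numbers from Workflow 1 (Used 10000, Limit 15000):
#   quota_check 10000 15000 5000   # prints "OK: 5000 available, 5000 needed"
```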
---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath (used by --query) has no arithmetic operators, so Available is computed with awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' 'BEGIN { print "Model\tUsed\tLimit\tAvailable" } { print $1 "\t" $2 "\t" $3 "\t" ($3 - $2) }'
```

This shows aggregate usage across all deployments, by model type.

**Optional:** List individual deployments:
- **Azure MCP tool**: Use `model_deployment_get` to query deployments in a Foundry project
- **Azure CLI**:
```bash
# List AI Services resources, then the deployments on one of them.
az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
```

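To see how much capacity each model consumes on a single resource, the per-deployment rows can be summed by model name. A sketch under the assumption that the rows arrive as tab-separated model name and capacity; `capacity_by_model` is a hypothetical helper, not an Azure CLI feature.

```bash
# Hypothetical helper: sum deployment capacity per model.
# Reads tab-separated "model<TAB>capacity" rows on stdin, prints one total per model.
capacity_by_model() {
  awk -F'\t' '{ total[$1] += $2 } END { for (m in total) print m "\t" total[m] }' |
    LC_ALL=C sort
}

# Example call (placeholders as in the CLI command above):
#   az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
#     --query "[].[properties.model.name, sku.capacity]" -o tsv | capacity_by_model
```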
---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota is freed **immediately**. Re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill in the form: Model, Current Limit, Requested Limit, Region, **Business Justification** (required field)
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Business Justification** is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:
- **Workload details**: What you're building and which model you need
- **Data-driven estimates**: Expected traffic volume and token usage calculations
- **Clear need**: Why current quota is insufficient and what capacity you require
- **Timeline**: When you need the increased quota (e.g., production launch date)

**Business Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

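The "Calculated required TPM" figure in the template can be derived from the traffic estimate. A rough sketch, assuming traffic is spread evenly over the day unless a peak factor is supplied; `required_tpm` and the peak factor are illustrative assumptions, not an official sizing formula.

```bash
# Hypothetical helper: estimate required TPM from daily traffic.
# Arguments: requests per day, tokens per request, optional peak factor (default 1).
required_tpm() {
  local requests_per_day="$1" tokens_per_request="$2" peak_factor="${3:-1}"
  # A day has 1440 minutes; round up so the estimate never undershoots.
  awk -v r="$requests_per_day" -v t="$tokens_per_request" -v p="$peak_factor" 'BEGIN {
    tpm = r * t * p / 1440
    rounded = int(tpm)
    if (rounded < tpm) rounded++
    print rounded
  }'
}

# Example: 144000 requests/day at 500 tokens each is 72,000,000 tokens/day.
#   required_tpm 144000 500      # prints 50000
#   required_tpm 144000 500 3    # prints 150000 (peaks assumed at 3x the average)
```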
See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing
