Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
quota/quota.md
# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level.

> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** (`az cognitiveservices`, `az rest`, `az ai`) as the primary method.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage**: check current TPM/PTU allocation and available capacity
- **Check quota limits**: show quota limits for a subscription, region, or model
- **Find optimal regions**: compare quota availability across regions for deployment
- **Plan deployments**: verify sufficient quota before deploying models
- **Request quota increases**: navigate the quota increase process through Azure Portal
- **Troubleshoot deployment failures**: diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, and 429 rate limit errors
- **Optimize allocation**: monitor and consolidate quota across deployments
- **Monitor quota across deployments**: track capacity by model and region
- **Explain quota concepts**: explain TPM, PTU, capacity units, regional quotas
- **Free up quota**: identify and delete unused deployments

**Key Points:**
1. Quota is isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region deployment enables failover and load distribution
4. Quota requests specify a target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM available)

Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`.

---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath has no arithmetic operators, so Available (Limit - Used) is computed with awk
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' -v OFS='\t' 'BEGIN { print "Model", "Used", "Limit", "Available" } { print $1, $2, $3, ($3 - $2) }'
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

# JMESPath has no arithmetic operators, so Available (Limit - Used) is computed with awk
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' -v OFS='\t' 'BEGIN { print "Model", "Used", "Limit", "Available" } { print $1, $2, $3, ($3 - $2) }'
```

- **Available > 0**: Quota is available (confirm it covers the capacity you plan to deploy)
- **Available = 0**: Delete unused deployments or try a different region

---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath has no arithmetic operators, so Available (Limit - Used) is computed with awk
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv |
  awk -F'\t' -v OFS='\t' 'BEGIN { print "Model", "Used", "Limit", "Available" } { print $1, $2, $3, ($3 - $2) }'
```

Shows aggregate usage across ALL deployments by model type.

**Optional:** List individual deployments:
- **Azure MCP tool**: Use `model_deployment_get` to query deployments in a Foundry project
- **Azure CLI**:
  ```bash
  az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

  az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
    --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
  ```

---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota is freed **immediately**. Re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification** (required field)
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Business Justification** is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:
- **Workload details**: What you're building and which model you need
- **Data-driven estimates**: Expected traffic volume and token usage calculations
- **Clear need**: Why current quota is insufficient and what capacity you require
- **Timeline**: When you need the increased quota (e.g., production launch date)

**Business Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing
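
**Worked example (TPM estimate for a quota request):** The "Calculated required TPM" figure in the Business Justification template can be roughed out from the daily traffic numbers before reaching for a calculator. The sketch below is one way to do that, not an official sizing method: `estimate_tpm` is a hypothetical helper name, and the optional peak multiplier is an assumption for bursty traffic that this guide does not define. It spreads daily tokens evenly over 1440 minutes and rounds up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate required TPM from daily traffic figures.
# Usage: estimate_tpm <requests_per_day> <tokens_per_request> [peak_factor]
# peak_factor is an assumed integer multiplier for burstiness (default 1 = flat average).
estimate_tpm() {
  local requests_per_day=$1
  local tokens_per_request=$2
  local peak_factor=${3:-1}
  local minutes_per_day=1440
  local daily_tokens=$(( requests_per_day * tokens_per_request * peak_factor ))
  # Integer ceiling division so the estimate never undershoots
  echo $(( (daily_tokens + minutes_per_day - 1) / minutes_per_day ))
}

estimate_tpm 100000 1500      # prints 104167 (flat average)
estimate_tpm 100000 1500 3    # prints 312500 (assumed 3x peak)
```

Real traffic is rarely flat, so treat the flat average as a lower bound and compare the result against the **Limit** and **Used** values from Workflow #1 when filling in the template.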