Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
quota/quota.md
# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas apply at the **subscription + region** level.

> ⚠️ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** (`az cognitiveservices`, `az rest`, `az ai`) as the primary method.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage** - check current TPM/PTU allocation and available capacity
- **Check quota limits** - show quota limits for a subscription, region, or model
- **Find optimal regions** - compare quota availability across regions for deployment
- **Plan deployments** - verify sufficient quota before deploying models
- **Request quota increases** - navigate the quota increase process through the Azure Portal
- **Troubleshoot deployment failures** - diagnose `QuotaExceeded`, `InsufficientQuota`, `DeploymentLimitReached`, and 429 rate limit errors
- **Optimize allocation** - monitor and consolidate quota across deployments
- **Monitor quota across deployments** - track capacity by model and region
- **Explain quota concepts** - explain TPM, PTU, capacity units, regional quotas
- **Free up quota** - identify and delete unused deployments

**Key Points:**
1. Quota is isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region deployment enables failover and load distribution
4. Quota requests specify a target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM available in this example); this is not a column in the output, so compute it from the other two

Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`.

---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath in `az --query` has no arithmetic, so Available is computed in awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' '{ printf "%-40s Used=%s Limit=%s Available=%s\n", $1, $2, $3, $3 - $2 }'
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

# JMESPath in `az --query` has no arithmetic, so Available is computed in awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' '{ printf "%-40s Used=%s Limit=%s Available=%s\n", $1, $2, $3, $3 - $2 }'
```

- **Available > 0**: You have quota for the deployment
- **Available = 0**: Delete unused deployments or try a different region

---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
# JMESPath in `az --query` has no arithmetic, so Available is computed in awk.
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].[name.value, currentValue, limit]" -o tsv \
  | awk -F'\t' '{ printf "%-40s Used=%s Limit=%s Available=%s\n", $1, $2, $3, $3 - $2 }'
```

Shows aggregate usage across ALL deployments by model type.

**Optional:** List individual deployments:
- **Azure MCP tool**: Use `model_deployment_get` to query deployments in a Foundry project
- **Azure CLI**:
  ```bash
  # List AI Services resources, then the deployments on one of them.
  az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

  az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
    --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
  ```

---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota is freed **immediately**. Re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification** (required field)
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Business Justification** is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:
- **Workload details**: What you're building and which model you need
- **Data-driven estimates**: Expected traffic volume and token usage calculations
- **Clear need**: Why current quota is insufficient and what capacity you require
- **Timeline**: When you need the increased quota (e.g., production launch date)

**Business Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing
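
**Rough TPM estimate (sketch):**

The Business Justification template in Workflow 6 asks for a calculated TPM figure from requests per day and tokens per request. The function below is a minimal sketch of that arithmetic, not an official sizing method. The name `estimate_tpm` and the `peak_factor` multiplier are hypothetical, and the calculation assumes traffic spread evenly over a 1440-minute day before the peak multiplier is applied. Use the calculators listed above for real capacity planning.

```bash
#!/usr/bin/env bash
# Rough TPM estimate for a quota-increase justification.
# Assumptions (illustrative, not from Microsoft documentation):
#   - traffic is spread evenly over 1440 minutes per day
#   - peak_factor is a user-chosen multiplier for bursts (default 1)

estimate_tpm() {
  local requests_per_day=$1
  local tokens_per_request=$2
  local peak_factor=${3:-1}

  # Total tokens per day, scaled for peak traffic.
  local tokens_per_day=$(( requests_per_day * tokens_per_request * peak_factor ))

  # Integer division by minutes per day, rounded up.
  echo $(( (tokens_per_day + 1439) / 1440 ))
}

# Example: 100,000 requests/day, 1,500 tokens/request, 3x peak factor.
estimate_tpm 100000 1500 3   # prints 312500
```

The result is in tokens per minute and can be dropped into the `[Z TPM]` placeholder of the template. The arithmetic is integer-only, so it rounds up to the next whole token per minute.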