quota/references/ptu-guide.md
# Provisioned Throughput Units (PTU) Guide

**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources)

## Understanding PTU vs Standard TPM

Microsoft Foundry offers two quota types:

### Standard TPM (Tokens Per Minute)
- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota is shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: variable workloads, development, testing, bursty traffic

### Provisioned Throughput Units (PTU)
- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies the monthly commitment

## When to Use PTU

| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |

**Use PTU when:**
- Token usage is consistent and predictable enough that a monthly commitment is cost-effective
- You need guaranteed throughput (no 429 rate limit errors)
- You require consistent latency backed by a performance SLA
- You run high-volume production workloads with stable traffic patterns

**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.

## PTU Capacity Planning

### Official Calculation Methods

> **Agent Instruction:** Only present the official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or references to deprecated calculators (oai.azure.com/portal/calculator).

Calculate PTU requirements using these official methods:

**Method 1: Microsoft Foundry Portal**
1. Navigate to the Microsoft Foundry portal
2. Go to **Operate** → **Quota**
3. Select the **Provisioned throughput unit** tab
4. Click the **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. The calculator returns the exact PTU count needed

**Method 2: Using the Azure REST API**
```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-05-13"
    },
    "workload": {
      "requestPerMin": 100,
      "tokensPerMin": 50000,
      "peakRequestsPerMin": 150
    }
  }'
```
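If you prefer to script the REST call rather than use curl, a minimal Python sketch using only the standard library is below. The subscription ID and bearer token are placeholders you must supply, and the workload numbers simply mirror the curl example; the request is left commented out so the sketch can be read without credentials.

```python
# Sketch: build the same calculateModelCapacity request shown in the curl
# example above. <subscription-id> and <access-token> are placeholders.
import json
import urllib.request

SUBSCRIPTION_ID = "<subscription-id>"
TOKEN = "<access-token>"

url = (
    "https://management.azure.com/subscriptions/"
    f"{SUBSCRIPTION_ID}/providers/Microsoft.CognitiveServices/"
    "calculateModelCapacity?api-version=2024-10-01"
)
payload = {
    "model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-05-13"},
    "workload": {
        "requestPerMin": 100,
        "tokensPerMin": 50000,
        "peakRequestsPerMin": 150,
    },
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request with real credentials and print the raw
# JSON response from the service:
# with urllib.request.urlopen(req) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```

Printing the raw JSON (rather than picking out specific fields) avoids hard-coding assumptions about the response schema; see the Calculate Model Capacity API reference linked at the end of this guide for the exact response shape.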
## Deploy Model with PTU

### Step 1: Calculate PTU Requirements

Use the official capacity calculator methods above to determine the required PTU capacity.

### Step 2: Deploy with PTU

```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name ProvisionedManaged \
  --sku-capacity 100

# Check PTU deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment
```

**Key Differences from Standard TPM:**
- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: measured in PTU units (not K TPM)
- Billing: monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)

## Request PTU Quota Increase

PTU quota is separate from TPM quota and requires specific justification:

1. Navigate to Azure Portal → Foundry resource → **Quotas**
2. Select the **Provisioned throughput unit** tab
3. Identify the model needing a PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill in the form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results in the business justification
   - Explain workload characteristics (volume, latency requirements)
6. Submit and monitor status

**Processing Time:** Typically 3-5 business days (longer than standard quota requests)

**Note:** PTU quota requests typically require stronger business justification because of the commitment involved.

**Alternative:** Deploy to a different region that has available PTU quota.

## Understanding Region and Deployment Quotas

### Region Quota
- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in the same region
- Separate from TPM quota (you have both TPM and PTU quotas)

### Deployment Slots
- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- The deployment count limit is independent of capacity

## External Resources

- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
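As a supplement to the decision guidance in "When to Use PTU", the monthly-commitment comparison can be sketched as simple arithmetic. All prices below are invented placeholders, not real Azure rates, and the sketch deliberately does not estimate a PTU count; get the PTU count from the official capacity calculator and the rates from the pricing pages linked above.

```python
# Rough break-even sketch: pay-as-you-go token cost vs. a PTU commitment.
# Every number here is a made-up placeholder for illustration only.

def monthly_token_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Pay-as-you-go cost for a month of traffic, billed per 1K tokens."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def ptu_monthly_cost(ptu_count: int, hourly_rate_per_ptu: float,
                     hours: float = 730) -> float:
    """PTU commitment cost: billed per unit per hour regardless of usage."""
    return ptu_count * hourly_rate_per_ptu * hours

# Example with placeholder numbers (not real rates):
paygo = monthly_token_cost(tokens_per_month=2_000_000_000,
                           price_per_1k_tokens=0.01)
ptu = ptu_monthly_cost(ptu_count=100, hourly_rate_per_ptu=2.0)
print(f"pay-as-you-go: ${paygo:,.0f}/mo, PTU: ${ptu:,.0f}/mo")
# → pay-as-you-go: $20,000/mo, PTU: $146,000/mo
```

With these placeholder numbers pay-as-you-go wins; PTU becomes attractive only when sustained volume pushes the token bill at or above the commitment cost, which matches the "consistent, high-volume usage" guidance above.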