quota/references/ptu-guide.md
# Provisioned Throughput Units (PTU) Guide

**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources)

## Understanding PTU vs Standard TPM

Microsoft Foundry offers two quota types:

### Standard TPM (Tokens Per Minute)

- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota is shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: variable workloads, development, testing, bursty traffic

### Provisioned Throughput Units (PTU)

- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies the monthly commitment

## When to Use PTU

| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |

**Use PTU when:**

- Token usage is consistent and predictable enough that the monthly commitment is cost-effective
- You need guaranteed throughput (no 429 rate limit errors)
- You require consistent latency with a performance SLA
- You run high-volume production workloads with stable traffic patterns

**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.

## PTU Capacity Planning

### Official Calculation Methods

> **Agent Instruction:** Only present the official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or reference deprecated calculators (oai.azure.com/portal/calculator).

Calculate PTU requirements using these official methods:

**Method 1: Microsoft Foundry Portal**

1. Navigate to the Microsoft Foundry portal
2. Go to **Operate** → **Quota**
3. Select the **Provisioned throughput unit** tab
4. Click the **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. The calculator returns the exact PTU count needed

**Method 2: Using the Azure REST API**

```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-05-13"
    },
    "workload": {
      "requestPerMin": 100,
      "tokensPerMin": 50000,
      "peakRequestsPerMin": 150
    }
  }'
```

## Deploy Model with PTU

### Step 1: Calculate PTU Requirements

Use the official capacity calculator methods above to determine the required PTU capacity.

### Step 2: Deploy with PTU

```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name ProvisionedManaged \
  --sku-capacity 100

# Check PTU deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment
```

**Key Differences from Standard TPM:**

- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: measured in PTU units (not K TPM)
- Billing: monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)

## Request PTU Quota Increase

PTU quota is separate from TPM quota and requires specific justification:

1. Navigate to Azure Portal → Foundry resource → **Quotas**
2. Select the **Provisioned throughput unit** tab
3. Identify the model needing a PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill in the form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results in the business justification
   - Explain workload characteristics (volume, latency requirements)
6. Submit and monitor the request status

**Processing Time:** Typically 3-5 business days (longer than standard quota requests)

**Note:** PTU quota requests typically require stronger business justification because of the monthly commitment involved.

**Alternative:** Deploy to a different region that has available PTU quota.

## Understanding Region and Deployment Quotas

### Region Quota

- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in the same region
- Separate from TPM quota (you have both TPM and PTU quotas)

### Deployment Slots

- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- The deployment count limit is independent of capacity

## External Resources

- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
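As a worked example tying the capacity-planning steps together, the `calculateModelCapacity` request body from the REST API method above can be assembled from named workload variables instead of inline JSON, which makes it easier to rerun with different traffic numbers. This is a minimal sketch, not an official tool: the variable names are illustrative, the endpoint and field names are copied verbatim from the guide's REST call, and `AZ_TOKEN` is assumed to hold a valid ARM bearer token (e.g., from `az account get-access-token`).

```shell
#!/bin/sh
# Sketch: build the calculateModelCapacity request body from workload
# parameters, print it for review, and optionally POST it to ARM.
# All variable names here are illustrative, not part of any Azure tool.

SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-<subscription-id>}"
MODEL_NAME="gpt-4o"
MODEL_VERSION="2024-05-13"
REQUESTS_PER_MIN=100
TOKENS_PER_MIN=50000
PEAK_REQUESTS_PER_MIN=150

# Assemble the JSON payload shown in Method 2 of this guide.
BODY=$(cat <<EOF
{
  "model": {
    "format": "OpenAI",
    "name": "${MODEL_NAME}",
    "version": "${MODEL_VERSION}"
  },
  "workload": {
    "requestPerMin": ${REQUESTS_PER_MIN},
    "tokensPerMin": ${TOKENS_PER_MIN},
    "peakRequestsPerMin": ${PEAK_REQUESTS_PER_MIN}
  }
}
EOF
)

# Print the payload so it can be sanity-checked before sending.
printf '%s\n' "$BODY"

# Uncomment to send (assumes $AZ_TOKEN holds a valid bearer token):
# curl -X POST "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
#   -H "Authorization: Bearer ${AZ_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "${BODY}"
```

The actual `curl` line is left commented out so the script can be reviewed and dry-run without credentials; the response schema should be checked against the Calculate Model Capacity API reference linked under External Resources.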