quota/references/ptu-guide.md
# Provisioned Throughput Units (PTU) Guide

**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources)

## Understanding PTU vs Standard TPM

Microsoft Foundry offers two quota types:

### Standard TPM (Tokens Per Minute)
- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota is shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: variable workloads, development, testing, bursty traffic

### Provisioned Throughput Units (PTU)
- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies the monthly commitment

## When to Use PTU

| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |

**Use PTU when:**
- Token usage is consistent and predictable enough that a monthly commitment is cost-effective
- You need guaranteed throughput (no 429 rate limit errors)
- You require consistent latency backed by a performance SLA
- You run high-volume production workloads with stable traffic patterns

**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.

## PTU Capacity Planning

### Official Calculation Methods

> **Agent Instruction:** Only present the official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or references to deprecated calculators (oai.azure.com/portal/calculator).

Calculate PTU requirements using these official methods:

**Method 1: Microsoft Foundry Portal**
1. Navigate to the Microsoft Foundry portal
2. Go to **Operate** → **Quota**
3. Select the **Provisioned throughput unit** tab
4. Click the **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. The calculator returns the exact PTU count needed

**Method 2: Using the Azure REST API**
```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-05-13"
    },
    "workload": {
      "requestPerMin": 100,
      "tokensPerMin": 50000,
      "peakRequestsPerMin": 150
    }
  }'
```
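If you prefer to script the REST call rather than use curl, a minimal Python sketch using only the standard library is below. The subscription ID and bearer token are placeholders you must supply, and the workload numbers simply mirror the curl example; the request is left commented out so the sketch can be read without credentials.

```python
# Sketch: build the same calculateModelCapacity request shown in the curl
# example above. <subscription-id> and <access-token> are placeholders.
import json
import urllib.request

SUBSCRIPTION_ID = "<subscription-id>"
TOKEN = "<access-token>"

url = (
    "https://management.azure.com/subscriptions/"
    f"{SUBSCRIPTION_ID}/providers/Microsoft.CognitiveServices/"
    "calculateModelCapacity?api-version=2024-10-01"
)
payload = {
    "model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-05-13"},
    "workload": {
        "requestPerMin": 100,
        "tokensPerMin": 50000,
        "peakRequestsPerMin": 150,
    },
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request with real credentials and print the raw
# JSON response from the service:
# with urllib.request.urlopen(req) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```

Printing the raw JSON (rather than picking out specific fields) avoids hard-coding assumptions about the response schema; see the Calculate Model Capacity API reference linked at the end of this guide for the exact response shape.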
## Deploy Model with PTU

### Step 1: Calculate PTU Requirements

Use the official capacity calculator methods above to determine the required PTU capacity.

### Step 2: Deploy with PTU

```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name ProvisionedManaged \
  --sku-capacity 100

# Check PTU deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment
```

**Key Differences from Standard TPM:**
- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: measured in PTU units (not K TPM)
- Billing: monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)

## Request PTU Quota Increase

PTU quota is separate from TPM quota and requires specific justification:

1. Navigate to Azure Portal → Foundry resource → **Quotas**
2. Select the **Provisioned throughput unit** tab
3. Identify the model needing a PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill in the form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results in the business justification
   - Explain workload characteristics (volume, latency requirements)
6. Submit and monitor status

**Processing Time:** Typically 3-5 business days (longer than standard quota requests)

**Note:** PTU quota requests typically require stronger business justification because of the commitment involved.

**Alternative:** Deploy to a different region that has available PTU quota.

## Understanding Region and Deployment Quotas

### Region Quota
- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in the same region
- Separate from TPM quota (you have both TPM and PTU quotas)

### Deployment Slots
- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- The deployment count limit is independent of capacity

## External Resources

- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
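As a supplement to the decision guidance in "When to Use PTU", the monthly-commitment comparison can be sketched as simple arithmetic. All prices below are invented placeholders, not real Azure rates, and the sketch deliberately does not estimate a PTU count; get the PTU count from the official capacity calculator and the rates from the pricing pages linked above.

```python
# Rough break-even sketch: pay-as-you-go token cost vs. a PTU commitment.
# Every number here is a made-up placeholder for illustration only.

def monthly_token_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Pay-as-you-go cost for a month of traffic, billed per 1K tokens."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def ptu_monthly_cost(ptu_count: int, hourly_rate_per_ptu: float,
                     hours: float = 730) -> float:
    """PTU commitment cost: billed per unit per hour regardless of usage."""
    return ptu_count * hourly_rate_per_ptu * hours

# Example with placeholder numbers (not real rates):
paygo = monthly_token_cost(tokens_per_month=2_000_000_000,
                           price_per_1k_tokens=0.01)
ptu = ptu_monthly_cost(ptu_count=100, hourly_rate_per_ptu=2.0)
print(f"pay-as-you-go: ${paygo:,.0f}/mo, PTU: ${ptu:,.0f}/mo")
# → pay-as-you-go: $20,000/mo, PTU: $146,000/mo
```

With these placeholder numbers pay-as-you-go wins; PTU becomes attractive only when sustained volume pushes the token bill at or above the commitment cost, which matches the "consistent, high-volume usage" guidance above.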