quota/references/ptu-guide.md
# Provisioned Throughput Units (PTU) Guide

**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources)

## Understanding PTU vs Standard TPM

Microsoft Foundry offers two quota types:

### Standard TPM (Tokens Per Minute)

- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota is shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: variable workloads, development, testing, bursty traffic

### Provisioned Throughput Units (PTU)

- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies the monthly commitment

## When to Use PTU

| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |

**Use PTU when:**

- Token usage is consistent and predictable enough that the monthly commitment is cost-effective
- You need guaranteed throughput (no 429 rate limit errors)
- You require consistent latency with a performance SLA
- You run high-volume production workloads with stable traffic patterns

**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.

## PTU Capacity Planning

### Official Calculation Methods

> **Agent Instruction:** Only present the official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or reference deprecated calculators (oai.azure.com/portal/calculator).

Calculate PTU requirements using these official methods:

**Method 1: Microsoft Foundry Portal**

1. Navigate to the Microsoft Foundry portal
2. Go to **Operate** → **Quota**
3. Select the **Provisioned throughput unit** tab
4. Click the **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. The calculator returns the exact PTU count needed

**Method 2: Using the Azure REST API**

```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-05-13"
    },
    "workload": {
      "requestPerMin": 100,
      "tokensPerMin": 50000,
      "peakRequestsPerMin": 150
    }
  }'
```

## Deploy Model with PTU

### Step 1: Calculate PTU Requirements

Use the official capacity calculator methods above to determine the required PTU capacity.

### Step 2: Deploy with PTU

```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name ProvisionedManaged \
  --sku-capacity 100

# Check PTU deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment
```

**Key Differences from Standard TPM:**

- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: measured in PTU units (not K TPM)
- Billing: monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)

## Request PTU Quota Increase

PTU quota is separate from TPM quota and requires specific justification:

1. Navigate to Azure Portal → Foundry resource → **Quotas**
2. Select the **Provisioned throughput unit** tab
3. Identify the model needing a PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill in the form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results in the business justification
   - Explain workload characteristics (volume, latency requirements)
6. Submit and monitor the request status

**Processing Time:** Typically 3-5 business days (longer than standard quota requests)

**Note:** PTU quota requests typically require stronger business justification because of the monthly commitment involved.

**Alternative:** Deploy to a different region that has available PTU quota.

## Understanding Region and Deployment Quotas

### Region Quota

- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in the same region
- Separate from TPM quota (you have both TPM and PTU quotas)

### Deployment Slots

- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- The deployment count limit is independent of capacity

## External Resources

- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
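As a worked example tying the capacity-planning steps together, the `calculateModelCapacity` request body from the REST API method above can be assembled from named workload variables instead of inline JSON, which makes it easier to rerun with different traffic numbers. This is a minimal sketch, not an official tool: the variable names are illustrative, the endpoint and field names are copied verbatim from the guide's REST call, and `AZ_TOKEN` is assumed to hold a valid ARM bearer token (e.g., from `az account get-access-token`).

```shell
#!/bin/sh
# Sketch: build the calculateModelCapacity request body from workload
# parameters, print it for review, and optionally POST it to ARM.
# All variable names here are illustrative, not part of any Azure tool.

SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-<subscription-id>}"
MODEL_NAME="gpt-4o"
MODEL_VERSION="2024-05-13"
REQUESTS_PER_MIN=100
TOKENS_PER_MIN=50000
PEAK_REQUESTS_PER_MIN=150

# Assemble the JSON payload shown in Method 2 of this guide.
BODY=$(cat <<EOF
{
  "model": {
    "format": "OpenAI",
    "name": "${MODEL_NAME}",
    "version": "${MODEL_VERSION}"
  },
  "workload": {
    "requestPerMin": ${REQUESTS_PER_MIN},
    "tokensPerMin": ${TOKENS_PER_MIN},
    "peakRequestsPerMin": ${PEAK_REQUESTS_PER_MIN}
  }
}
EOF
)

# Print the payload so it can be sanity-checked before sending.
printf '%s\n' "$BODY"

# Uncomment to send (assumes $AZ_TOKEN holds a valid bearer token):
# curl -X POST "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
#   -H "Authorization: Bearer ${AZ_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "${BODY}"
```

The actual `curl` line is left commented out so the script can be reviewed and dry-run without credentials; the response schema should be checked against the Calculate Model Capacity API reference linked under External Resources.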