# Customize Guides: Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
        ├─ Yes → GlobalStandard
        └─ No → Standard
```

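The decision tree above can be sketched as a small function. This is a minimal illustration, assuming two yes/no inputs; the name `choose_sku` and its parameters are hypothetical and not part of any Azure SDK. It follows the tree only, so it never returns `DataZoneStandard`, which the table lists for data residency requirements.

```python
def choose_sku(needs_guaranteed_throughput: bool, needs_high_availability: bool) -> str:
    """Walk the decision tree and return the suggested SKU name.

    Hypothetical helper: it mirrors the tree in this guide and nothing more.
    """
    if needs_guaranteed_throughput:
        # Reserved capacity, billed in PTUs.
        return "ProvisionedManaged"
    if needs_high_availability:
        return "GlobalStandard"
    return "Standard"


if __name__ == "__main__":
    print(choose_sku(needs_guaranteed_throughput=False, needs_high_availability=True))
```
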
### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

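As a rough illustration, the table above can be encoded as a lookup. The names `TPM_RANGES` and `recommended_tpm_range` are hypothetical; the numbers are copied from the table, with `None` standing in for the open-ended "100K+" upper bound.

```python
from typing import Dict, Optional, Tuple

# (lower bound, upper bound) in tokens per minute, taken from the table above.
# None means the tier has no stated upper bound.
TPM_RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "development/testing": (1_000, 5_000),
    "small production": (5_000, 20_000),
    "medium production": (20_000, 100_000),
    "large production": (100_000, None),
}


def recommended_tpm_range(workload: str) -> Tuple[int, Optional[int]]:
    """Return the recommended TPM range for a workload label (case-insensitive)."""
    key = workload.strip().lower()
    if key not in TPM_RANGES:
        raise ValueError(f"Unknown workload: {workload!r}")
    return TPM_RANGES[key]


if __name__ == "__main__":
    print(recommended_tpm_range("Small Production"))
```
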
**For PTU-based SKUs (ProvisionedManaged):**

Estimate the required PTUs with the PTU calculator, based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with the recommended capacity for your workload
- Monitor usage and adjust capacity as needed
- Enable dynamic quota for flexibility
- Consider spillover for peak loads

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTUs, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```

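The calculator above translates directly into a function. This is a minimal sketch, assuming the coefficients shown in this guide (0.001, 0.002, 0.1); `estimate_ptu` is a hypothetical helper name. The result is a planning estimate only, and rounding to a deployable PTU count is left to the caller.

```python
def estimate_ptu(input_tpm: float, output_tpm: float, requests_per_min: float) -> float:
    """Estimate PTUs using the formula from this guide.

    The weights are the ones quoted above, not values read from any Azure API.
    """
    if min(input_tpm, output_tpm, requests_per_min) < 0:
        raise ValueError("Inputs must be non-negative")
    return input_tpm * 0.001 + output_tpm * 0.002 + requests_per_min * 0.1


if __name__ == "__main__":
    # Same inputs as the worked example: 10,000 input TPM, 5,000 output TPM, 100 requests/min.
    print(estimate_ptu(10_000, 5_000, 100))
```
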
**PTU Deployment:**
```bash
# Create a provisioned deployment; replace the <placeholders> with your own values.
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100  # number of PTUs
```

### Spillover Configuration

**Spillover Workflow:**
1. The primary deployment receives requests
2. When its capacity is reached, requests overflow to the spillover target
3. The spillover target must be the same model or a compatible one
4. Spillover is configured via deployment properties

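To reason about the workflow above before testing it against real deployments, a toy model can help. The sketch below is not how Azure routes traffic; the `Deployment` class, its `capacity` field (requests accepted per interval), and the `route` function are all hypothetical names used only to illustrate "primary first, then overflow".

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    capacity: int      # hypothetical: requests this deployment can accept per interval
    handled: int = 0   # requests accepted so far in the current interval

    def try_accept(self) -> bool:
        """Accept one request if there is capacity left."""
        if self.handled < self.capacity:
            self.handled += 1
            return True
        return False


def route(requests: int, primary: Deployment, spillover: Deployment) -> int:
    """Send each request to the primary first, then to the spillover target.

    Returns the number of requests that neither deployment could accept.
    """
    rejected = 0
    for _ in range(requests):
        if not primary.try_accept() and not spillover.try_accept():
            rejected += 1
    return rejected


if __name__ == "__main__":
    primary = Deployment("primary", capacity=3)
    overflow = Deployment("spillover", capacity=2)
    print(route(6, primary, overflow), primary.handled, overflow.handled)
```

A non-zero return value in this model corresponds to an undersized spillover target for the simulated peak.
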
**Best Practices:**
- Use spillover for peak load handling
- Spillover target should have sufficient capacity
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios