Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services

models/deploy-model/customize/references/customize-guides.md

# Customize Guides: Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

The tree below does not cover data residency: if data must stay within a specific data zone, choose DataZoneStandard instead of GlobalStandard.

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
        ├─ Yes → GlobalStandard
        └─ No → Standard
```
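
The decision tree can be sketched as a small shell helper, which may be handy when scripting deployments. This is only an illustration: the function name `choose_sku` and its `yes`/`no` arguments are hypothetical and not part of the Azure CLI.

```bash
# choose_sku <needs_guaranteed_throughput> <needs_high_availability>
# Both arguments are "yes" or "no"; prints the SKU name from the decision tree.
# Hypothetical helper for illustration, not an Azure CLI feature.
choose_sku() {
  local guaranteed=$1 high_availability=$2
  if [ "$guaranteed" = "yes" ]; then
    echo "ProvisionedManaged"
  elif [ "$high_availability" = "yes" ]; then
    echo "GlobalStandard"
  else
    echo "Standard"
  fi
}

choose_sku no yes   # prints GlobalStandard
```

The result could then be passed to a deployment command, for example `--sku-name "$(choose_sku no yes)"`.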

### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

**For PTU-based SKUs (ProvisionedManaged):**

Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with the recommended capacity for your workload
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads
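
As a starting point for the TPM table, the sketch below maps a workload tier to the lower bound of its recommended range and converts it to a capacity value. It assumes that one capacity unit on TPM-based SKUs corresponds to 1,000 TPM, which should be checked against current Azure quota documentation. The tier names (`dev`, `small`, `medium`, `large`) and both function names are hypothetical.

```bash
# starting_tpm <tier>
# Prints the lower bound of the recommended TPM range for a workload tier.
# Tier names are hypothetical labels for the rows of the capacity table.
starting_tpm() {
  case "$1" in
    dev)    echo 1000 ;;
    small)  echo 5000 ;;
    medium) echo 20000 ;;
    large)  echo 100000 ;;
    *)      echo "unknown workload tier: $1" >&2; return 1 ;;
  esac
}

# tpm_to_capacity_units <tpm>
# Converts TPM to capacity units, rounding up.
# Assumption: 1 capacity unit = 1,000 TPM.
tpm_to_capacity_units() {
  local tpm=$1
  echo $(( (tpm + 999) / 1000 ))
}

tpm_to_capacity_units "$(starting_tpm medium)"   # prints 20
```

Starting at the lower bound and scaling up after monitoring follows the planning tips above; a different starting point within the range is equally reasonable.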

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTUs, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

The formula below is a rough first-pass estimate. Actual PTU requirements depend on the model and version, so confirm the figure with the Azure capacity calculator before committing to a purchase.

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```
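
The estimate above can be reproduced in shell with integer arithmetic. This is a minimal sketch of the document's formula only, not of the Azure capacity calculator; the function name `estimate_ptu` is hypothetical, and rounding up to a whole PTU is an added assumption.

```bash
# estimate_ptu <input_tpm> <output_tpm> <requests_per_min>
# Applies: PTU = input*0.001 + output*0.002 + requests*0.1
# Hypothetical helper; rounding up to a whole PTU is an assumption.
estimate_ptu() {
  local input_tpm=$1 output_tpm=$2 requests_per_min=$3
  # Multiply every term by 1000 so the sum stays in integer arithmetic:
  # (input + 2*output + 100*requests) / 1000
  local scaled=$(( input_tpm + 2 * output_tpm + 100 * requests_per_min ))
  echo $(( (scaled + 999) / 1000 ))
}

estimate_ptu 10000 5000 100   # prints 30
```

Scaling by 1000 avoids floating-point tools such as `bc`, since shell arithmetic only handles integers.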

**PTU Deployment:**
```bash
# Replace each <placeholder> with your own value before running.
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100  # number of PTUs
```

### Spillover Configuration

**Spillover Workflow:**
1. The primary deployment receives requests
2. When its capacity is reached, requests overflow to the spillover target
3. The spillover target must use the same model or a compatible one
4. Spillover is configured through deployment properties
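
The first two workflow steps can be pictured with a simplified routing model. The service performs the actual routing, so this is a conceptual sketch only: the function name `route_request` and the idea of comparing a load figure against a capacity figure are illustrative assumptions, not Azure's implementation.

```bash
# route_request <current_load> <primary_capacity>
# Prints which deployment would serve the next request in this toy model.
# Both values are in the same arbitrary unit (for example, requests in flight).
route_request() {
  local current_load=$1 primary_capacity=$2
  if [ "$current_load" -lt "$primary_capacity" ]; then
    echo "primary"      # primary still has headroom
  else
    echo "spillover"    # capacity reached, overflow to the spillover target
  fi
}

route_request 80 100    # prints primary
route_request 100 100   # prints spillover
```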

**Best Practices:**
- Use spillover for peak load handling
- Give the spillover target enough capacity to absorb the overflow
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for the ProvisionedManaged SKU
- Additional charges apply
- Helps keep performance consistent

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios
