models/deploy-model/customize/references/customize-guides.md
# Customize Guides — Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
        ├─ Yes → GlobalStandard
        └─ No → Standard
```

### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

**For PTU-based SKUs (ProvisionedManaged):**

Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with recommended capacity
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTU units, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```

**PTU Deployment:**
```bash
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100 # PTU units
```

### Spillover Configuration

**Spillover Workflow:**
1. Primary deployment receives requests
2. When capacity reached, requests overflow to spillover target
3. Spillover target must be same model or compatible
4. Configure via deployment properties

**Best Practices:**
- Use spillover for peak load handling
- Spillover target should have sufficient capacity
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios
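
### Example: Scripting the SKU Decision Tree and PTU Estimate

The SKU decision tree and the PTU calculator formula from the sections above are simple enough to script, which can help when comparing several workloads. The following is a minimal sketch, not an official tool: the function names `choose_sku` and `estimate_ptu` are hypothetical, the coefficients are copied directly from this guide's formula rather than from any Azure sizing API, and the decision tree (as written in this guide) does not cover `DataZoneStandard`.

```python
def choose_sku(needs_guaranteed_throughput: bool, needs_high_availability: bool) -> str:
    """Walk this guide's SKU decision tree (hypothetical helper).

    DataZoneStandard is not part of the tree, so data residency
    requirements must be handled separately.
    """
    if needs_guaranteed_throughput:
        return "ProvisionedManaged"
    if needs_high_availability:
        return "GlobalStandard"
    return "Standard"


def estimate_ptu(input_tpm: float, output_tpm: float, requests_per_min: float) -> float:
    """Apply this guide's rough PTU formula (hypothetical helper).

    The 0.001, 0.002, and 0.1 coefficients come from the formula in this
    guide; treat the result as a starting estimate only.
    """
    return (input_tpm * 0.001) + (output_tpm * 0.002) + (requests_per_min * 0.1)


if __name__ == "__main__":
    # Worked example from the PTU Calculator section:
    # 10,000 input TPM, 5,000 output TPM, 100 requests/min.
    sku = choose_sku(needs_guaranteed_throughput=True, needs_high_availability=False)
    ptu = estimate_ptu(10_000, 5_000, 100)
    print(f"SKU: {sku}, estimated PTU: {ptu:.0f}")
```

With the example inputs from the PTU Calculator section, the estimate comes to about 30 PTU, matching the hand calculation. The result could then be passed as `--sku-capacity` in the deployment command, after checking it against the minimum PTU size for the chosen model.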