Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
models/deploy-model/customize/references/customize-guides.md
# Customize Guides — Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
         ├─ Yes → GlobalStandard
         └─ No → Standard
```

### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

**For PTU-based SKUs (ProvisionedManaged):**

Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with the recommended capacity for your workload
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTUs, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```

**PTU Deployment:**
```bash
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100  # PTU units
```

### Spillover Configuration

**Spillover Workflow:**
1. Primary deployment receives requests
2. When capacity is reached, requests overflow to the spillover target
3. Spillover target must be the same model or a compatible one
4. Configure via deployment properties

**Best Practices:**
- Use spillover for peak load handling
- Ensure the spillover target has sufficient capacity
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios
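
### Scripted Sketch: SKU Choice and PTU Estimate

The SKU decision tree and the PTU Calculator formula from this guide can be sketched as two small shell helpers. This is a minimal illustration, not an official tool: the function names `choose_sku` and `estimate_ptu` are hypothetical, the "yes"/"no" argument convention is an assumption, and the coefficients (0.001, 0.002, 0.1) are copied from the PTU Calculator section above rather than from any published sizing formula. Treat the result as a rough starting point for `--sku-capacity` and confirm it before deploying.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring this guide's decision tree and PTU formula.

# choose_sku <guaranteed-throughput: yes|no> <high-availability: yes|no>
# Walks the decision tree from "How to Choose SKU".
choose_sku() {
  local guaranteed_throughput="$1" high_availability="$2"
  if [ "$guaranteed_throughput" = "yes" ]; then
    echo "ProvisionedManaged"
  elif [ "$high_availability" = "yes" ]; then
    echo "GlobalStandard"
  else
    echo "Standard"
  fi
}

# estimate_ptu <input-tpm> <output-tpm> <requests-per-min>
# Applies the guide's formula; the coefficients are this guide's values
# and are an assumption, not an official calculator.
estimate_ptu() {
  local input_tpm="$1" output_tpm="$2" requests_per_min="$3"
  awk -v i="$input_tpm" -v o="$output_tpm" -v r="$requests_per_min" \
    'BEGIN { printf "%.2f\n", (i * 0.001) + (o * 0.002) + (r * 0.1) }'
}
```

Calling `estimate_ptu 10000 5000 100` reproduces the worked example above and prints `30.00`; `choose_sku no yes` prints `GlobalStandard`.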