Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
models/deploy-model/customize/EXAMPLES.md
# customize Examples

## Example 1: Basic Deployment with Defaults

**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.

## Example 2: Production Deployment with Custom Capacity

**Scenario:** Deploy gpt-4o for production with high throughput.
**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.

## Example 3: PTU Deployment for High-Volume Workload

**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)
**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.

## Example 4: Development Deployment with Standard SKU

**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.

## Example 5: Spillover Configuration

**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`
**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.

## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)

**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.

---

## Comparison Matrix

| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
|----------|-------|-----|----------|:---:|:---:|:---:|----------|
| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |
| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |
| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |
| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |
| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |

## Common Patterns

### Dev → Staging → Production

| Stage | Model | SKU | Capacity | Extras |
|-------|-------|-----|----------|--------|
| Dev | gpt-4o-mini | Standard | 1K TPM | - |
| Staging | gpt-4o | GlobalStandard | 10K TPM | - |
| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |

### Cost Optimization

- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
- **Low priority:** gpt-4o-mini, Standard, 5K TPM

---

## Tips and Best Practices

**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.

**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.

**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.

**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.

**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |
| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |
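
**Name collision sketch:** The "Deployment name exists" row mentions auto-generated names such as `gpt-4o-2`. The skill's actual naming logic is not shown in this file, so the following is only a minimal Python sketch of one way such a numeric-suffix scheme could work. The function `unique_deployment_name` and the `existing` set are hypothetical and are not part of the skill or of any Azure API.

```python
def unique_deployment_name(base: str, existing: set[str]) -> str:
    """Return `base` if it is unused, otherwise the first free `base-N` (N >= 2).

    Assumption: collisions are resolved with an increasing numeric suffix,
    matching the `gpt-4o-2` example in the troubleshooting table.
    """
    if base not in existing:
        return base
    suffix = 2
    # Walk upward until a name that is not already taken is found.
    while f"{base}-{suffix}" in existing:
        suffix += 1
    return f"{base}-{suffix}"


# Hypothetical set of deployment names already present on the resource.
existing = {"gpt-4o", "gpt-4o-production"}

print(unique_deployment_name("gpt-4o", existing))       # gpt-4o-2
print(unique_deployment_name("gpt-4o-mini", existing))  # gpt-4o-mini
```

In practice, the set of existing names would come from listing the deployments on the target resource. Passing an explicit custom name, as the table suggests, avoids the suffix altogether.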