Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
models/deploy-model/customize/EXAMPLES.md
1# customize Examples23## Example 1: Basic Deployment with Defaults45**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.6**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled7**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.89## Example 2: Production Deployment with Custom Capacity1011**Scenario:** Deploy gpt-4o for production with high throughput.12**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`13**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.1415## Example 3: PTU Deployment for High-Volume Workload1617**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.18**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled19**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)20**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.2122## Example 4: Development Deployment with Standard SKU2324**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.25**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`26**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.2728## Example 5: Spillover Configuration2930**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.31**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`32**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.3334## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)3536**Scenario:** Deploy claude-sonnet-4-6 with customized settings.37**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)38**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.3940---4142## Comparison Matrix4344| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |45|----------|-------|-----|----------|:---:|:---:|:---:|----------|46| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |47| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |48| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |49| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |50| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |51| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |5253## Common Patterns5455### Dev → Staging → Production5657| Stage | Model | SKU | Capacity | Extras |58|-------|-------|-----|----------|--------|59| Dev | gpt-4o-mini | Standard | 1K TPM | — |60| Staging | gpt-4o | GlobalStandard | 10K TPM | — |61| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |6263### Cost Optimization6465- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing66- **Low priority:** gpt-4o-mini, Standard, 5K TPM6768---6970## Tips and Best Practices7172**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.7374**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.7576**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.7778**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.7980**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.8182---8384## Troubleshooting8586| Problem | Solution |87|---------|----------|88| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |89| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |90| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |91