customize Examples

Example 1: Basic Deployment with Defaults

Scenario: Deploy gpt-4o accepting all defaults for quick setup. Config: gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled Result: Deployment gpt-4o created in ~2-3 min with auto-upgrade enabled.

Example 2: Production Deployment with Custom Capacity

Scenario: Deploy gpt-4o for production with high throughput. Config: gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: gpt-4o-production Result: 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.

Example 3: PTU Deployment for High-Volume Workload

Scenario: Deploy gpt-4o with reserved capacity (PTU) for predictable workload. Config: gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled PTU sizing: 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom) Result: Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.

Example 4: Development Deployment with Standard SKU

Scenario: Deploy gpt-4o-mini for dev/testing with minimal cost. Config: gpt-4o-mini / Standard / 1K TPM / Name: gpt-4o-mini-dev Result: 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.

Example 5: Spillover Configuration

Scenario: Deploy gpt-4o with spillover to handle peak load overflow. Config: gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → gpt-4o-backup Result: Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.

Example 6: Anthropic Model Deployment (claude-sonnet-4-6)

Scenario: Deploy claude-sonnet-4-6 with customized settings. Config: claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering) Result: User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with modelProviderData in ~2 min.

Comparison Matrix

Scenario	Model	SKU	Capacity	Dynamic Quota	Priority	Spillover	Use Case
Ex 1	gpt-4o	GlobalStandard	10K TPM	✓	-	-	Quick setup
Ex 2	gpt-4o	GlobalStandard	50K TPM	✓	-	-	Production
Ex 3	gpt-4o	ProvisionedManaged	200 PTU	-	✓	-	Predictable workload
Ex 4	gpt-4o-mini	Standard	1K TPM	-	-	-	Dev/testing
Ex 5	gpt-4o	GlobalStandard	20K TPM	✓	-	✓	Peak load
Ex 6	claude-sonnet-4-6	GlobalStandard	1 (MaaS)	-	-	-	Anthropic model

Common Patterns

Dev → Staging → Production

Stage	Model	SKU	Capacity	Extras
Dev	gpt-4o-mini	Standard	1K TPM	—
Staging	gpt-4o	GlobalStandard	10K TPM	—
Production	gpt-4o	GlobalStandard	50K TPM	Dynamic Quota + Spillover

Cost Optimization

High priority: gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
Low priority: gpt-4o-mini, Standard, 5K TPM

Tips and Best Practices

Capacity: Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.

SKU Selection: Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.

Cost: Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.

Versions: Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.

Content Filtering: Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.

Troubleshooting

Problem	Solution
`QuotaExceeded`	Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the quota skill to request an increase
Version not available for SKU	Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest
Deployment name exists	Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name

Preparing the source view

Microsoft Foundry Skill

models/deploy-model/customize/EXAMPLES.md