Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services

models/deploy-model/customize/EXAMPLES.md

# customize Examples

## Example 1: Basic Deployment with Defaults

**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.

## Example 2: Production Deployment with Custom Capacity

**Scenario:** Deploy gpt-4o for production with high throughput.
**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.

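For readers who script deployments, the Example 2 configuration might map onto an `az cognitiveservices account deployment create` call roughly as sketched below. This is an illustration, not the skill's own implementation: the account name, resource group, and model version are hypothetical placeholders, and the sketch assumes one SKU capacity unit corresponds to 1K TPM. Dynamic Quota is not covered here.

```python
import shlex


def build_deploy_command(account, resource_group, deployment_name,
                         model_name, model_version, sku_name, tpm):
    """Build the argument list for an Azure CLI deployment-create call."""
    if tpm <= 0 or tpm % 1000 != 0:
        raise ValueError("tpm must be a positive multiple of 1000")
    # Assumption: one SKU capacity unit corresponds to 1K TPM.
    capacity_units = tpm // 1000
    return [
        "az", "cognitiveservices", "account", "deployment", "create",
        "--name", account,
        "--resource-group", resource_group,
        "--deployment-name", deployment_name,
        "--model-name", model_name,
        "--model-version", model_version,
        "--model-format", "OpenAI",
        "--sku-name", sku_name,
        "--sku-capacity", str(capacity_units),
    ]


# Hypothetical account, resource group, and version placeholder.
cmd = build_deploy_command(
    account="my-foundry-account",
    resource_group="my-rg",
    deployment_name="gpt-4o-production",
    model_name="gpt-4o",
    model_version="<model-version>",
    sku_name="GlobalStandard",
    tpm=50_000,
)
print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit and lets the result be passed to `subprocess.run` without a shell.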
## Example 3: PTU Deployment for High-Volume Workload

**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)
**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.

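The headroom step in the PTU sizing line can be written out as a short calculation. The sketch below takes the ~100 PTU estimate as given (the conversion from tokens per minute to PTU is not stated in this file) and applies the 2x multiplier within the 50 to 1000 range from the config. The function name and its error handling are assumptions made for illustration.

```python
import math


def recommend_ptu(estimated_ptu, headroom=2.0, min_ptu=50, max_ptu=1000):
    """Apply a headroom multiplier to a PTU estimate, respecting the allowed range."""
    if estimated_ptu <= 0:
        raise ValueError("estimated_ptu must be positive")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    target = math.ceil(estimated_ptu * headroom)
    if target > max_ptu:
        # Clamping silently would drop the requested headroom, so fail loudly.
        raise ValueError(f"{target} PTU exceeds the maximum of {max_ptu}")
    # Never go below the minimum deployable size.
    return max(min_ptu, target)


recommended = recommend_ptu(100)
print(recommended)
```

With the estimate from Example 3, `recommend_ptu(100)` gives the 200 PTU used in the config.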
## Example 4: Development Deployment with Standard SKU

**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.

## Example 5: Spillover Configuration

**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`
**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.

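As a conceptual model only, the spillover behavior in Example 5 can be thought of as splitting demand at the primary deployment's capacity. The function below is a hypothetical illustration of that split and says nothing about how the service actually routes requests.

```python
def split_load(demand_tpm, primary_capacity_tpm=20_000):
    """Return (primary_tpm, overflow_tpm) for a given token demand per minute."""
    if demand_tpm < 0:
        raise ValueError("demand_tpm must be non-negative")
    # The primary deployment absorbs demand up to its configured capacity.
    primary = min(demand_tpm, primary_capacity_tpm)
    # Anything above that is what would spill over to `gpt-4o-backup`.
    overflow = demand_tpm - primary
    return primary, overflow


for demand in (15_000, 32_000):
    primary, overflow = split_load(demand)
    print(f"demand={demand} primary={primary} overflow={overflow}")
```

A split like this can help estimate how much capacity the backup deployment needs during expected peaks.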
## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)

**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.

---

## Comparison Matrix

| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
|----------|-------|-----|----------|:---:|:---:|:---:|----------|
| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |
| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |
| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |
| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |
| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |

## Common Patterns

### Dev → Staging → Production

| Stage | Model | SKU | Capacity | Extras |
|-------|-------|-----|----------|--------|
| Dev | gpt-4o-mini | Standard | 1K TPM | - |
| Staging | gpt-4o | GlobalStandard | 10K TPM | - |
| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |

### Cost Optimization

- **High-priority workloads:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
- **Low-priority workloads:** gpt-4o-mini, Standard, 5K TPM

---

## Tips and Best Practices

**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.

**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.

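The SKU selection guidance above can be expressed as a small decision function. This is a sketch of the rule of thumb only: the function name, the `stage` values, and the `predictable_load` flag are hypothetical, and real choices also depend on quota, region, and budget.

```python
def choose_sku(stage, predictable_load=False):
    """Map a deployment stage and load pattern to a suggested SKU."""
    if stage == "dev":
        # Development: cheapest pay-per-use option.
        return {"sku": "Standard", "dynamic_quota": False}
    if stage == "production":
        if predictable_load:
            # Steady, high-volume traffic: reserved capacity (PTU).
            return {"sku": "ProvisionedManaged", "dynamic_quota": False}
        # Variable traffic: pay-per-use with dynamic quota.
        return {"sku": "GlobalStandard", "dynamic_quota": True}
    raise ValueError(f"unknown stage: {stage!r}")


print(choose_sku("dev"))
print(choose_sku("production"))
print(choose_sku("production", predictable_load=True))
```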
**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.

**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.

**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try a different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
| Version not available for SKU | List available versions with `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, then use the latest one |
| Deployment name exists | The skill auto-generates a unique name (e.g., `gpt-4o-2`); alternatively, specify a custom name |

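One plausible way to produce a name such as `gpt-4o-2` when the base name is taken is to append the first free numeric suffix. The skill's actual naming logic is not shown in this file, so the function below is an assumed illustration rather than its real behavior.

```python
def unique_deployment_name(base, existing):
    """Return `base` if unused, otherwise `base-N` for the smallest free N >= 2."""
    taken = set(existing)
    if base not in taken:
        return base
    suffix = 2
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"


print(unique_deployment_name("gpt-4o", ["gpt-4o"]))
```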
