Source from repo

Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

microsoftGitHub microsoftOfficialSource repo Original GitHub link Publisher page

Files

151

Skill

n/a

Size

940.9 KB

Entrypoint

SKILL.md

Format

git-repo

Open file

models/deploy-model/customize/EXAMPLES.md

Syntax-highlighted preview of this file as included in the skill package.

Rendered Source

markdown91 linesFree

models/deploy-model/customize/EXAMPLES.md

1# customize Examples
2 
3## Example 1: Basic Deployment with Defaults
4 
5**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
6**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
7**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.
8 
9## Example 2: Production Deployment with Custom Capacity
10 
11**Scenario:** Deploy gpt-4o for production with high throughput.
12**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
13**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.
14 
15## Example 3: PTU Deployment for High-Volume Workload
16 
17**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
18**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
19**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)
20**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.
21 
22## Example 4: Development Deployment with Standard SKU
23 
24**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
25**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
26**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.
27 
28## Example 5: Spillover Configuration
29 
30**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
31**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`
32**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.
33 
34## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)
35 
36**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
37**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
38**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.
39 
40---
41 
42## Comparison Matrix
43 
44| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
45|----------|-------|-----|----------|:---:|:---:|:---:|----------|
46| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |
47| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |
48| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |
49| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
50| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |
51| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |
52 
53## Common Patterns
54 
55### Dev → Staging → Production
56 
57| Stage | Model | SKU | Capacity | Extras |
58|-------|-------|-----|----------|--------|
59| Dev | gpt-4o-mini | Standard | 1K TPM | — |
60| Staging | gpt-4o | GlobalStandard | 10K TPM | — |
61| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |
62 
63### Cost Optimization
64 
65- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
66- **Low priority:** gpt-4o-mini, Standard, 5K TPM
67 
68---
69 
70## Tips and Best Practices
71 
72**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.
73 
74**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.
75 
76**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.
77 
78**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.
79 
80**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.
81 
82---
83 
84## Troubleshooting
85 
86| Problem | Solution |
87|---------|----------|
88| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
89| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |
90| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |
91

Preparing the source view

Microsoft Foundry Skill

models/deploy-model/customize/EXAMPLES.md