Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
finetuning/references/deployment.md
1# Deployment Formats23## Model Format and SKU Mapping45| Base model family | `model.format` | `sku.name` | Endpoint type |6|-------------------|---------------|------------|---------------|7| gpt-4.1-mini | `"OpenAI"` | `"Standard"` | Project |8| gpt-4.1-nano | `"OpenAI"` | `"Standard"` | Project |9| o4-mini (RFT) | `"OpenAI"` | `"Standard"` | Project |10| gpt-oss-20b | `"Microsoft"` | `"GlobalStandard"` | Cognitive Services |11| Ministral-3B | `"Mistral AI"` | `"GlobalStandard"` | Cognitive Services |12| Llama-3.3-70B | `"Meta"` | `"GlobalStandard"` | Cognitive Services |13| Qwen-3-32B | `"Alibaba"` | `"GlobalStandard"` | Cognitive Services |1415**Format strings are case-sensitive.** `"Mistral AI"` works; `"mistral"` does not.1617## Two Endpoint Types1819**Project Endpoint** (OpenAI models): `https://<resource>.services.ai.azure.com/api/projects/<project>/openai/v1/`20- Use `openai.OpenAI(base_url=..., api_key=...)` — NOT `AzureOpenAI`2122**Cognitive Services Endpoint** (OSS models): `https://<resource>.cognitiveservices.azure.com/openai/deployments/<name>/chat/completions?api-version=2025-04-01-preview`23- Use `openai.AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)`2425## CLI Deployment (`az cognitiveservices`)2627The CLI uses **different** format strings than the ARM REST API for OSS models:2829```bash30az cognitiveservices account deployment create \31--name <resource> \32--resource-group <rg> \33--deployment-name <name> \34--model-name <model> \35--model-version "1" \36--model-format "OpenAI-OSS" \37--sku-capacity 100 \38--sku-name "GlobalStandard"39```4041| Base model family | ARM REST `model.format` | CLI `--model-format` |42|-------------------|------------------------|----------------------|43| gpt-4.1-mini/nano | `"OpenAI"` | `"OpenAI"` |44| gpt-oss-20b | `"Microsoft"` | `"OpenAI-OSS"` |45| Ministral-3B | `"Mistral AI"` | `"OpenAI-OSS"` |46| Llama-3.3-70B | `"Meta"` | `"OpenAI-OSS"` |47| Qwen-3-32B | `"Alibaba"` | `"OpenAI-OSS"` |4849> ⚠️ Using `"OpenAI-OSS"` in ARM REST or `"Microsoft"` in CLI will fail with HTTP 500.5051## ARM REST API Deployment5253```54PUT https://management.azure.com/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/deployments/{deploy_name}?api-version=2024-10-0155```5657```json58{59"sku": { "name": "GlobalStandard", "capacity": 100 },60"properties": {61"model": {62"format": "Microsoft",63"name": "gpt-oss-20b.ft-{jobid}-suffix",64"version": "1"65}66}67}68```6970**ARM token:** `az account get-access-token --query accessToken -o tsv` (expires ~60min).7172## Capacity Notes7374- Capacity = tokens-per-minute in thousands. `100` = 100K TPM.75- Set capacity ≥ 100 for eval workloads. At capacity=1, OSS FT models hit "Failed to load LoRA" errors.76- Quota is per-resource. After deleting a deployment, wait 15–20s before creating a new one.77- Deployment names: max 64 chars, alphanumeric + hyphens, unique within resource.7879## Common Deployment Errors8081| Error | Cause | Fix |82|-------|-------|-----|83| HTTP 500, no message | Wrong `model.format` | Check format table above |84| HTTP 409, deployment exists | Name collision | Use unique deployment name |85| HTTP 403 | ARM token expired | Refresh token |86| HTTP 400, "api-version not allowed" | `AzureOpenAI` client on `/v1/` endpoint | Switch to `openai.OpenAI` |87| HTTP 429, quota exceeded | Too many deployments | Delete unused, wait 20s |88| ProvisioningState: Failed | Model not available in region | Try different region |89