finetuning/references/deployment.md
# Deployment Formats

## Model Format and SKU Mapping

| Base model family | `model.format` | `sku.name` | Endpoint type |
|-------------------|---------------|------------|---------------|
| gpt-4.1-mini | `"OpenAI"` | `"Standard"` | Project |
| gpt-4.1-nano | `"OpenAI"` | `"Standard"` | Project |
| o4-mini (RFT) | `"OpenAI"` | `"Standard"` | Project |
| gpt-oss-20b | `"Microsoft"` | `"GlobalStandard"` | Cognitive Services |
| Ministral-3B | `"Mistral AI"` | `"GlobalStandard"` | Cognitive Services |
| Llama-3.3-70B | `"Meta"` | `"GlobalStandard"` | Cognitive Services |
| Qwen-3-32B | `"Alibaba"` | `"GlobalStandard"` | Cognitive Services |

**Format strings are case-sensitive.** `"Mistral AI"` works; `"mistral"` does not.

## Two Endpoint Types

**Project Endpoint** (OpenAI models): `https://<resource>.services.ai.azure.com/api/projects/<project>/openai/v1/`
- Use `openai.OpenAI(base_url=..., api_key=...)`, NOT `AzureOpenAI`

**Cognitive Services Endpoint** (OSS models): `https://<resource>.cognitiveservices.azure.com/openai/deployments/<name>/chat/completions?api-version=2025-04-01-preview`
- Use `openai.AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)`

## CLI Deployment (`az cognitiveservices`)

The CLI uses **different** format strings than the ARM REST API for OSS models:

```bash
az cognitiveservices account deployment create \
  --name <resource> \
  --resource-group <rg> \
  --deployment-name <name> \
  --model-name <model> \
  --model-version "1" \
  --model-format "OpenAI-OSS" \
  --sku-capacity 100 \
  --sku-name "GlobalStandard"
```

| Base model family | ARM REST `model.format` | CLI `--model-format` |
|-------------------|------------------------|----------------------|
| gpt-4.1-mini/nano | `"OpenAI"` | `"OpenAI"` |
| gpt-oss-20b | `"Microsoft"` | `"OpenAI-OSS"` |
| Ministral-3B | `"Mistral AI"` | `"OpenAI-OSS"` |
| Llama-3.3-70B | `"Meta"` | `"OpenAI-OSS"` |
| Qwen-3-32B | `"Alibaba"` | `"OpenAI-OSS"` |

> ⚠️ Using `"OpenAI-OSS"` in ARM REST or `"Microsoft"` in CLI will fail with HTTP 500.

## ARM REST API Deployment

```
PUT https://management.azure.com/subscriptions/{sub_id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/deployments/{deploy_name}?api-version=2024-10-01
```

```json
{
  "sku": { "name": "GlobalStandard", "capacity": 100 },
  "properties": {
    "model": {
      "format": "Microsoft",
      "name": "gpt-oss-20b.ft-{jobid}-suffix",
      "version": "1"
    }
  }
}
```

**ARM token:** `az account get-access-token --query accessToken -o tsv` (expires ~60min).

## Capacity Notes

- Capacity = tokens-per-minute in thousands. `100` = 100K TPM.
- Set capacity ≥ 100 for eval workloads. At capacity=1, OSS FT models hit "Failed to load LoRA" errors.
- Quota is per-resource. After deleting a deployment, wait 15–20s before creating a new one.
- Deployment names: max 64 chars, alphanumeric + hyphens, unique within resource.

## Common Deployment Errors

| Error | Cause | Fix |
|-------|-------|-----|
| HTTP 500, no message | Wrong `model.format` | Check format table above |
| HTTP 409, deployment exists | Name collision | Use unique deployment name |
| HTTP 403 | ARM token expired | Refresh token |
| HTTP 400, "api-version not allowed" | `AzureOpenAI` client on `/v1/` endpoint | Switch to `openai.OpenAI` |
| HTTP 429, quota exceeded | Too many deployments | Delete unused, wait 20s |
| ProvisioningState: Failed | Model not available in region | Try different region |
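
Since most of the errors above trace back to a wrong `model.format` or an invalid deployment name, it may help to build the ARM request from a lookup table rather than by hand. The following is a minimal sketch, not part of the skill itself: `ARM_FORMATS`, `build_arm_request`, and the argument names are hypothetical, and the values are copied from the tables and notes in this reference. It only constructs the URL and JSON body; it sends nothing.

```python
import re

# (model.format, sku.name) pairs copied from the ARM REST format table.
# Strings are case-sensitive, so they are stored exactly as documented.
ARM_FORMATS = {
    "gpt-4.1-mini": ("OpenAI", "Standard"),
    "gpt-4.1-nano": ("OpenAI", "Standard"),
    "o4-mini": ("OpenAI", "Standard"),
    "gpt-oss-20b": ("Microsoft", "GlobalStandard"),
    "Ministral-3B": ("Mistral AI", "GlobalStandard"),
    "Llama-3.3-70B": ("Meta", "GlobalStandard"),
    "Qwen-3-32B": ("Alibaba", "GlobalStandard"),
}

ARM_URL_TEMPLATE = (
    "https://management.azure.com/subscriptions/{sub_id}/resourceGroups/{rg}"
    "/providers/Microsoft.CognitiveServices/accounts/{account}"
    "/deployments/{deploy_name}?api-version=2024-10-01"
)

# Naming rule from the capacity notes: max 64 chars, alphanumeric plus hyphens.
DEPLOYMENT_NAME_RE = re.compile(r"[A-Za-z0-9-]{1,64}")


def build_arm_request(family, model_name, sub_id, rg, account, deploy_name,
                      capacity=100, version="1"):
    """Return (url, body) for the ARM PUT call. Hypothetical helper."""
    if family not in ARM_FORMATS:
        raise ValueError(f"unknown base model family: {family!r}")
    if not DEPLOYMENT_NAME_RE.fullmatch(deploy_name):
        raise ValueError(f"invalid deployment name: {deploy_name!r}")
    model_format, sku_name = ARM_FORMATS[family]
    url = ARM_URL_TEMPLATE.format(
        sub_id=sub_id, rg=rg, account=account, deploy_name=deploy_name
    )
    body = {
        "sku": {"name": sku_name, "capacity": capacity},
        "properties": {
            "model": {
                "format": model_format,
                "name": model_name,
                "version": version,
            }
        },
    }
    return url, body


# Example with placeholder identifiers (all values here are made up).
url, body = build_arm_request(
    "gpt-oss-20b", "gpt-oss-20b.ft-{jobid}-suffix",
    sub_id="<sub_id>", rg="<rg>", account="<account>", deploy_name="eval-oss-20b",
)
```

The returned `body` can be serialized with `json.dumps` and sent as the PUT payload with any HTTP client, using the bearer token from `az account get-access-token`. The default `capacity=100` follows the recommendation for eval workloads.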