Design and configure Azure API Management as an AI Gateway for LLM traffic routing and rate limiting
SKILL.md
---
name: azure-aigateway
description: "Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. WHEN: semantic caching, token limit, content safety, load balancing, AI model governance, MCP rate limiting, jailbreak detection, add Azure OpenAI backend, add AI Foundry model, test AI gateway, LLM policies, configure AI backend, token metrics, AI cost control, convert API to MCP, import OpenAPI to gateway."
license: MIT
metadata:
  author: Microsoft
  version: "0.0.0-placeholder"
compatibility: Requires Azure CLI (az) for configuration and testing
---

# Azure AI Gateway

Configure Azure API Management (APIM) as an AI Gateway for governing AI models, MCP tools, and agents.

> **To deploy APIM**, use the **azure-prepare** skill. See [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance).

## When to Use This Skill

| Category | Triggers |
|----------|----------|
| **Model Governance** | "semantic caching", "token limits", "load balance AI", "track token usage" |
| **Tool Governance** | "rate limit MCP", "protect my tools", "configure my tool", "convert API to MCP" |
| **Agent Governance** | "content safety", "jailbreak detection", "filter harmful content" |
| **Configuration** | "add Azure OpenAI backend", "configure my model", "add AI Foundry model" |
| **Testing** | "test AI gateway", "call OpenAI through gateway" |

---

## Quick Reference

| Policy | Purpose | Details |
|--------|---------|---------|
| `azure-openai-token-limit` | Cost control | [Model Policies](references/policies.md#token-rate-limiting) |
| `azure-openai-semantic-cache-lookup/store` | 60-80% cost savings | [Model Policies](references/policies.md#semantic-caching) |
| `azure-openai-emit-token-metric` | Observability | [Model Policies](references/policies.md#token-metrics) |
| `llm-content-safety` | Safety & compliance | [Agent Policies](references/policies.md#content-safety) |
| `rate-limit-by-key` | MCP/tool protection | [Tool Policies](references/policies.md#request-rate-limiting) |

---

## Get Gateway Details

```bash
# Get gateway URL
az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv

# List backends (AI models)
az apim backend list --service-name <apim-name> --resource-group <rg> \
  --query "[].{id:name, url:url}" -o table

# Get subscription key
az apim subscription keys list \
  --service-name <apim-name> --resource-group <rg> --subscription-id <sub-id>
```

---

## Test AI Endpoint

```bash
GATEWAY_URL=$(az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv)

curl -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```

---

## Common Tasks

### Add AI Backend

See [references/patterns.md](references/patterns.md#pattern-1-add-ai-model-backend) for full steps.

```bash
# Discover AI resources
az cognitiveservices account list --query "[?kind=='OpenAI']" -o table

# Create backend
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-backend --protocol http --url "https://<aoai>.openai.azure.com/openai"

# Grant access (managed identity)
az role assignment create --assignee <apim-principal-id> \
  --role "Cognitive Services User" --scope <aoai-resource-id>
```

### Apply AI Governance Policy

Recommended policy order in `<inbound>`:

1. **Authentication** - Managed identity to backend
2. **Semantic Cache Lookup** - Check cache before calling AI
3. **Token Limits** - Cost control
4. **Content Safety** - Filter harmful content
5. **Backend Selection** - Load balancing
6. **Metrics** - Token usage tracking

See [references/policies.md](references/policies.md#combining-policies) for complete example.

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Token limit 429 | Increase `tokens-per-minute` or add load balancing |
| No cache hits | Lower `score-threshold` to 0.7 |
| Content false positives | Increase category thresholds (5-6) |
| Backend auth 401 | Grant APIM "Cognitive Services User" role |

See [references/troubleshooting.md](references/troubleshooting.md) for details.

---

## References

- [**Detailed Policies**](references/policies.md) - Full policy examples
- [**Configuration Patterns**](references/patterns.md) - Step-by-step patterns
- [**Troubleshooting**](references/troubleshooting.md) - Common issues
- [AI-Gateway Samples](https://github.com/Azure-Samples/AI-Gateway)
- [GenAI Gateway Docs](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities)

## SDK Quick References

- **Content Safety**: [Python](references/sdk/azure-ai-contentsafety-py.md) | [TypeScript](references/sdk/azure-ai-contentsafety-ts.md)
- **API Management**: [Python](references/sdk/azure-mgmt-apimanagement-py.md) | [.NET](references/sdk/azure-mgmt-apimanagement-dotnet.md)
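
## Example: Smoke Test with Retry on 429

The request under Test AI Endpoint and the "Token limit 429" row under Troubleshooting can be combined into a small smoke test. The following is a minimal sketch, assuming bash and curl are available. The function names `build_chat_url`, `backoff_seconds`, and `call_gateway`, the default of three attempts, and the exponential backoff are illustrative assumptions, not part of this skill or of Azure CLI.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and retry policy are hypothetical.

# Build the chat-completions URL for a deployment behind the gateway.
# A trailing slash on the gateway URL is dropped to avoid "//" in the path.
build_chat_url() {
  local gateway_url="${1%/}" deployment="$2" api_version="${3:-2024-02-01}"
  printf '%s/openai/deployments/%s/chat/completions?api-version=%s\n' \
    "$gateway_url" "$deployment" "$api_version"
}

# Exponential backoff in seconds for a 1-based attempt number: 1, 2, 4, ...
backoff_seconds() {
  echo $(( 1 << ($1 - 1) ))
}

# POST a one-message prompt and print the final HTTP status code.
# Retries only while the gateway answers 429 (token or rate limit exceeded).
call_gateway() {
  local url="$1" key="$2" max_attempts="${3:-3}" attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$url" \
      -H "Content-Type: application/json" \
      -H "Ocp-Apim-Subscription-Key: $key" \
      -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}')
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$(backoff_seconds "$attempt")"
    fi
  done
  echo "429"
  return 1
}
```

With `GATEWAY_URL` set as in Test AI Endpoint, `call_gateway "$(build_chat_url "$GATEWAY_URL" <deployment>)" <key>` prints the last HTTP status. If it still returns 429 after the retries, the Troubleshooting table's advice (raise `tokens-per-minute` or add load balancing) is the more likely fix than a longer backoff.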