Configure Azure API Management as an AI Gateway with caching, token limits, and content safety
references/patterns.md
# AI Gateway Configuration Patterns

Step-by-step patterns for configuring Azure API Management as an AI Gateway.

---

## Pattern 1: Add AI Model Backend

Connect Azure OpenAI or AI Foundry models to your APIM instance.

### Prerequisites

- APIM instance deployed (use the **azure-prepare** skill; see the [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance))
- Azure OpenAI or AI Foundry resource provisioned
- System-assigned or user-assigned managed identity enabled on APIM

### Steps

#### 1. Discover AI Resources

```bash
# Find Azure OpenAI resources
az cognitiveservices account list --query "[?kind=='OpenAI'].{name:name, rg:resourceGroup, endpoint:properties.endpoint}" -o table

# Find AI Foundry resources (if using)
az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, rg:resourceGroup}" -o table
```

#### 2. Enable Managed Identity on APIM

```bash
# Enable system-assigned identity
az apim update --name <apim-name> --resource-group <rg> --set identity.type=SystemAssigned

# Get the principal ID for role assignments
PRINCIPAL_ID=$(az apim show --name <apim-name> --resource-group <rg> --query "identity.principalId" -o tsv)
```

#### 3. Grant RBAC Access

```bash
AOAI_ID=$(az cognitiveservices account show --name <aoai-name> --resource-group <rg> --query id -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Cognitive Services User" \
  --scope "$AOAI_ID"
```

#### 4. Create Backend

```bash
az apim backend create \
  --service-name <apim-name> \
  --resource-group <rg> \
  --backend-id openai-backend \
  --protocol http \
  --url "https://<aoai-name>.openai.azure.com/openai"
```
#### 5. Import API (OpenAPI Spec)

```bash
# Import the Azure OpenAI API specification
az apim api import \
  --service-name <apim-name> \
  --resource-group <rg> \
  --api-id azure-openai-api \
  --path "openai" \
  --specification-format OpenApi \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json" \
  --service-url "https://<aoai-name>.openai.azure.com/openai"
```

#### 6. Set Backend Policy

Add managed identity authentication in `<inbound>`:

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---

## Pattern 2: Load Balance Across Multiple AI Backends

Distribute requests across multiple Azure OpenAI instances for higher throughput.

### Steps

#### 1. Create Multiple Backends

```bash
# Primary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-eastus --protocol http \
  --url "https://<aoai-eastus>.openai.azure.com/openai"

# Secondary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-westus --protocol http \
  --url "https://<aoai-westus>.openai.azure.com/openai"
```
#### 2. Create Backend Pool

Use the APIM backend pool feature (preview) or policy-based load balancing. The policy expression below hashes the request ID to pick a backend:

```xml
<inbound>
    <base />
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var hash = Math.Abs(context.RequestId.GetHashCode());
        var index = hash % backends.Length;
        return backends[index];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

#### 3. Add Failover Retry on 429

On a 429 response, retry against the next backend in the list. Place this `<retry>` block in the `<backend>` section so it wraps `<forward-request />`:

```xml
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" delta="5" max-interval="30" first-fast-retry="false">
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var currentIndex = Array.IndexOf(backends, (string)context.Variables["backendUrl"]);
        return backends[(currentIndex + 1) % backends.Length];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <forward-request />
</retry>
```

---

## Pattern 3: Convert API to MCP Tool

Expose an existing API through APIM as an MCP-compatible tool for AI agents.

### Steps

1. **Import API** into APIM using an OpenAPI spec
2. **Add rate limiting** to protect the tool endpoint
3. **Add content safety** to filter harmful inputs
4. **Generate MCP manifest** pointing to the APIM endpoint

```xml
<!-- Rate limit MCP tool calls -->
<inbound>
    <base />
    <rate-limit-by-key calls="10" renewal-period="60"
        counter-key="@(context.Request.Headers.GetValueOrDefault("X-Agent-Id", "anonymous"))" />
</inbound>
```

---

## Pattern 4: Add Streaming Support

Configure APIM to handle Server-Sent Events (SSE) properly for streaming AI responses.

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <set-header name="Content-Type" exists-action="override">
        <value>@(context.Request.Body.As<JObject>(preserveContent: true)["stream"]?.Value<bool>() == true
            ? "text/event-stream" : "application/json")</value>
    </set-header>
</outbound>
```

> **Note**: Semantic caching and token metrics policies are NOT compatible with streaming responses. Use non-streaming requests for cost-control scenarios.

---

## Pattern 5: Multi-Tenant AI Gateway

Isolate tenants with per-tenant token limits and usage tracking.

```xml
<inbound>
    <base />
    <!-- Extract tenant from subscription or header -->
    <set-variable name="tenantId" value="@(context.Subscription.Id)" />

    <!-- Per-tenant token limit -->
    <azure-openai-token-limit
        tokens-per-minute="10000"
        counter-key="@((string)context.Variables["tenantId"])"
        estimate-prompt-tokens="true" />

    <!-- Per-tenant metrics -->
    <azure-openai-emit-token-metric namespace="ai-gateway">
        <dimension name="Tenant" value="@((string)context.Variables["tenantId"])" />
        <dimension name="API" value="@(context.Api.Name)" />
    </azure-openai-emit-token-metric>

    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---
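## Pattern 6: Semantic Caching (Non-Streaming)

The note in Pattern 4 mentions semantic caching; for non-streaming traffic it can cut cost and latency by answering semantically similar prompts from cache. A minimal sketch of the `azure-openai-semantic-cache-lookup` / `azure-openai-semantic-cache-store` pair, assuming an external cache (for example, Azure Cache for Redis) is configured on the APIM instance and an embeddings deployment is registered as a backend; the backend id `embeddings-backend`, the threshold, and the cache duration are illustrative assumptions:

```xml
<inbound>
    <base />
    <!-- Serve a cached completion when a prior prompt is semantically close enough.
         score-threshold and embeddings-backend-id are placeholder values. -->
    <azure-openai-semantic-cache-lookup
        score-threshold="0.05"
        embeddings-backend-id="embeddings-backend"
        embeddings-backend-auth="system-assigned" />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <!-- Cache the completion; duration is in seconds -->
    <azure-openai-semantic-cache-store duration="3600" />
</outbound>
```

Lower `score-threshold` values require closer matches; tune it per workload so the cache does not return mismatched completions.

---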
## Next Steps

- Apply [governance policies](policies.md) to your configured backends
- Review [troubleshooting](troubleshooting.md) for common configuration issues
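## Appendix: Content Safety Policy Sketch

This skill's description also covers content safety, and Pattern 3 lists it as a step without showing a policy. A hedged sketch using the `llm-content-safety` policy, assuming an Azure AI Content Safety resource has been registered as an APIM backend; the backend id `content-safety-backend`, the category list, and the thresholds are illustrative assumptions:

```xml
<inbound>
    <base />
    <!-- Reject prompts that Azure AI Content Safety flags at or above the
         given severity thresholds. Category names and thresholds are placeholders. -->
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
        <categories output-type="EightSeverityLevels">
            <category name="Hate" threshold="4" />
            <category name="Violence" threshold="4" />
        </categories>
    </llm-content-safety>
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

Verify the current attribute names against the APIM policy reference before relying on this fragment; the policy is newer than most of those shown above.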