Configure Azure API Management as an AI Gateway with caching, token limits, and content safety
references/patterns.md
# AI Gateway Configuration Patterns

Step-by-step patterns for configuring Azure API Management as an AI Gateway.

---

## Pattern 1: Add AI Model Backend

Connect Azure OpenAI or AI Foundry models to your APIM instance.

### Prerequisites

- APIM instance deployed (use the **azure-prepare** skill; see the [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance))
- Azure OpenAI or AI Foundry resource provisioned
- System-assigned or user-assigned managed identity enabled on APIM

### Steps

#### 1. Discover AI Resources

```bash
# Find Azure OpenAI resources
az cognitiveservices account list --query "[?kind=='OpenAI'].{name:name, rg:resourceGroup, endpoint:properties.endpoint}" -o table

# Find AI Foundry resources (if using)
az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, rg:resourceGroup}" -o table
```

#### 2. Enable Managed Identity on APIM

```bash
# Enable system-assigned identity
az apim update --name <apim-name> --resource-group <rg> --set identity.type=SystemAssigned

# Get the principal ID for role assignments
PRINCIPAL_ID=$(az apim show --name <apim-name> --resource-group <rg> --query "identity.principalId" -o tsv)
```

#### 3. Grant RBAC Access

```bash
AOAI_ID=$(az cognitiveservices account show --name <aoai-name> --resource-group <rg> --query id -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Cognitive Services User" \
  --scope "$AOAI_ID"
```

#### 4. Create Backend

```bash
az apim backend create \
  --service-name <apim-name> \
  --resource-group <rg> \
  --backend-id openai-backend \
  --protocol http \
  --url "https://<aoai-name>.openai.azure.com/openai"
```
#### 5. Import API (OpenAPI Spec)

```bash
# Import the Azure OpenAI API specification
az apim api import \
  --service-name <apim-name> \
  --resource-group <rg> \
  --api-id azure-openai-api \
  --path "openai" \
  --specification-format OpenApi \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json" \
  --service-url "https://<aoai-name>.openai.azure.com/openai"
```

#### 6. Set Backend Policy

Add managed identity authentication in `<inbound>`:

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---

## Pattern 2: Load Balance Across Multiple AI Backends

Distribute requests across multiple Azure OpenAI instances for higher throughput.

### Steps

#### 1. Create Multiple Backends

```bash
# Primary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-eastus --protocol http \
  --url "https://<aoai-eastus>.openai.azure.com/openai"

# Secondary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-westus --protocol http \
  --url "https://<aoai-westus>.openai.azure.com/openai"
```
#### 2. Create Backend Pool

Use the APIM backend pool feature (preview) or policy-based load balancing. The policy expression below hashes the request ID to pick a backend:

```xml
<inbound>
    <base />
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var hash = Math.Abs(context.RequestId.GetHashCode());
        var index = hash % backends.Length;
        return backends[index];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

#### 3. Add Failover Retry on 429

On a 429 response, retry against the next backend in the list. Place this `<retry>` block in the `<backend>` section so it wraps `<forward-request />`:

```xml
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" delta="5" max-interval="30" first-fast-retry="false">
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var currentIndex = Array.IndexOf(backends, (string)context.Variables["backendUrl"]);
        return backends[(currentIndex + 1) % backends.Length];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <forward-request />
</retry>
```

---

## Pattern 3: Convert API to MCP Tool

Expose an existing API through APIM as an MCP-compatible tool for AI agents.

### Steps

1. **Import API** into APIM using an OpenAPI spec
2. **Add rate limiting** to protect the tool endpoint
3. **Add content safety** to filter harmful inputs
4. **Generate MCP manifest** pointing to the APIM endpoint

```xml
<!-- Rate limit MCP tool calls -->
<inbound>
    <base />
    <rate-limit-by-key calls="10" renewal-period="60"
        counter-key="@(context.Request.Headers.GetValueOrDefault("X-Agent-Id", "anonymous"))" />
</inbound>
```

---

## Pattern 4: Add Streaming Support

Configure APIM to handle Server-Sent Events (SSE) properly for streaming AI responses.

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <set-header name="Content-Type" exists-action="override">
        <value>@(context.Request.Body.As<JObject>(preserveContent: true)["stream"]?.Value<bool>() == true
            ? "text/event-stream" : "application/json")</value>
    </set-header>
</outbound>
```

> **Note**: Semantic caching and token metrics policies are NOT compatible with streaming responses. Use non-streaming requests for cost-control scenarios.

---

## Pattern 5: Multi-Tenant AI Gateway

Isolate tenants with per-tenant token limits and usage tracking.

```xml
<inbound>
    <base />
    <!-- Extract tenant from subscription or header -->
    <set-variable name="tenantId" value="@(context.Subscription.Id)" />

    <!-- Per-tenant token limit -->
    <azure-openai-token-limit
        tokens-per-minute="10000"
        counter-key="@((string)context.Variables["tenantId"])"
        estimate-prompt-tokens="true" />

    <!-- Per-tenant metrics -->
    <azure-openai-emit-token-metric namespace="ai-gateway">
        <dimension name="Tenant" value="@((string)context.Variables["tenantId"])" />
        <dimension name="API" value="@(context.Api.Name)" />
    </azure-openai-emit-token-metric>

    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---
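## Pattern 6: Semantic Caching (Non-Streaming)

The note in Pattern 4 mentions semantic caching; for non-streaming traffic it can cut cost and latency by answering semantically similar prompts from cache. A minimal sketch of the `azure-openai-semantic-cache-lookup` / `azure-openai-semantic-cache-store` pair, assuming an external cache (for example, Azure Cache for Redis) is configured on the APIM instance and an embeddings deployment is registered as a backend; the backend id `embeddings-backend`, the threshold, and the cache duration are illustrative assumptions:

```xml
<inbound>
    <base />
    <!-- Serve a cached completion when a prior prompt is semantically close enough.
         score-threshold and embeddings-backend-id are placeholder values. -->
    <azure-openai-semantic-cache-lookup
        score-threshold="0.05"
        embeddings-backend-id="embeddings-backend"
        embeddings-backend-auth="system-assigned" />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <!-- Cache the completion; duration is in seconds -->
    <azure-openai-semantic-cache-store duration="3600" />
</outbound>
```

Lower `score-threshold` values require closer matches; tune it per workload so the cache does not return mismatched completions.

---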
## Next Steps

- Apply [governance policies](policies.md) to your configured backends
- Review [troubleshooting](troubleshooting.md) for common configuration issues
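## Appendix: Content Safety Policy Sketch

This skill's description also covers content safety, and Pattern 3 lists it as a step without showing a policy. A hedged sketch using the `llm-content-safety` policy, assuming an Azure AI Content Safety resource has been registered as an APIM backend; the backend id `content-safety-backend`, the category list, and the thresholds are illustrative assumptions:

```xml
<inbound>
    <base />
    <!-- Reject prompts that Azure AI Content Safety flags at or above the
         given severity thresholds. Category names and thresholds are placeholders. -->
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
        <categories output-type="EightSeverityLevels">
            <category name="Hate" threshold="4" />
            <category name="Violence" threshold="4" />
        </categories>
    </llm-content-safety>
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

Verify the current attribute names against the APIM policy reference before relying on this fragment; the policy is newer than most of those shown above.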