Azure AI Gateway

Configure Azure API Management as an AI Gateway with caching, token limits, and content safety

Publisher: microsoft (official; source on GitHub)

references/troubleshooting.md

# AI Gateway Troubleshooting

Common issues when using Azure API Management as an AI Gateway.

---

## Authentication Issues

### 401 Unauthorized from Backend

**Symptom**: APIM returns `401` when calling Azure OpenAI.

**Causes & Solutions**:

| Cause | Fix |
|-------|-----|
| Managed identity not enabled on APIM | `az apim update --name <apim> --resource-group <rg> --set identity.type=SystemAssigned` |
| Missing RBAC role | `az role assignment create --assignee <apim-principal-id> --role "Cognitive Services User" --scope <aoai-resource-id>` |
| Wrong auth resource | Ensure `resource="https://cognitiveservices.azure.com"` (not the endpoint URL) |
| RBAC propagation delay | Wait 5-10 minutes after role assignment |

**Diagnostic**:

```bash
# Verify identity is enabled
az apim show --name <apim> --resource-group <rg> --query "identity" -o json

# Check role assignments
AOAI_ID=$(az cognitiveservices account show --name <aoai> --resource-group <rg> --query id -o tsv)
az role assignment list --scope "$AOAI_ID" --query "[?principalType=='ServicePrincipal'].{role:roleDefinitionName, principal:principalId}" -o table
```
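
The `<apim-principal-id>` and `<aoai-resource-id>` placeholders in the table can be looked up rather than copied by hand. The following is a minimal sketch, assuming APIM and Azure OpenAI live in the same resource group (as the diagnostic above does); `grant_aoai_access` is a hypothetical helper name, not an Azure CLI command.

```bash
#!/usr/bin/env bash
# Sketch only: grant_aoai_access is a hypothetical helper, not part of the Azure CLI.
# Assumes the APIM instance and the Azure OpenAI account share one resource group.
grant_aoai_access() {
  local apim="$1" rg="$2" aoai="$3"
  local principal_id aoai_id

  # Object ID of APIM's system-assigned managed identity (empty if not enabled)
  principal_id=$(az apim show --name "$apim" --resource-group "$rg" \
    --query "identity.principalId" -o tsv)
  if [[ -z "$principal_id" ]]; then
    echo "No managed identity on ${apim}; enable it first" >&2
    return 1
  fi

  # Full resource ID of the Azure OpenAI account, used as the role scope
  aoai_id=$(az cognitiveservices account show --name "$aoai" --resource-group "$rg" \
    --query id -o tsv)

  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Cognitive Services User" \
    --scope "$aoai_id"
}

# Usage: grant_aoai_access <apim> <rg> <aoai>
```

The propagation delay from the table still applies after the assignment is created.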

---

## Rate Limiting Issues

### 429 Token Limit Exceeded

**Symptom**: Requests blocked with `429 Too Many Requests` from `azure-openai-token-limit` policy.

**Solutions**:

1. **Increase limit**: Raise `tokens-per-minute` value
2. **Add more backends**: Load balance across regions for higher aggregate TPM
3. **Enable semantic caching**: Reduce actual token consumption by serving cached responses
4. **Switch counter-key**: Use per-user instead of global to prevent one user from exhausting the pool

```xml
<!-- Per-user instead of global -->
<azure-openai-token-limit
    tokens-per-minute="50000"
    counter-key="@(context.Request.Headers.GetValueOrDefault("X-User-Id", context.Subscription.Id))"
    estimate-prompt-tokens="true" />
```

### 429 from Azure OpenAI (Not APIM)

**Symptom**: Backend returns `429` even though APIM token limits are not exceeded.

**Cause**: Azure OpenAI's own TPM quota is exhausted.

**Solutions**:

1. Increase Azure OpenAI deployment TPM quota in the portal
2. Add load balancing across multiple Azure OpenAI instances
3. Use retry with backoff:

```xml
<!-- interval, max-interval, and delta together select exponential backoff; interval alone retries at a fixed rate -->
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" max-interval="60" delta="5">
    <!-- buffer-request-body lets the request body be resent on each retry -->
    <forward-request buffer-request-body="true" />
</retry>
```
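
Callers can apply the same idea on the client side when the gateway itself returns `429`. The following is a minimal sketch; `retry_on_429` is a hypothetical helper, and it assumes the wrapped command prints only an HTTP status code (as `curl -w "%{http_code}"` does).

```bash
#!/usr/bin/env bash
# Sketch only: retry_on_429 is a hypothetical helper.
# Usage: retry_on_429 <max_attempts> <initial_delay_seconds> <command...>
# The command must print an HTTP status code on stdout.
retry_on_429() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$("$@")
    if [[ "$status" != "429" ]]; then
      echo "$status"          # any non-429 result ends the loop
      return 0
    fi
    if ((attempt < max_attempts)); then
      sleep "$delay"
      delay=$((delay * 2))    # double the wait after each throttled attempt
    fi
  done
  echo "$status"              # still throttled after the last attempt
  return 1
}

# Example (placeholders as used elsewhere in this guide):
# retry_on_429 4 2 curl -s -o /dev/null -w "%{http_code}" \
#   "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
#   -H "Ocp-Apim-Subscription-Key: <key>" -H "Content-Type: application/json" \
#   -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
```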

---

## Semantic Caching Issues

### No Cache Hits

**Symptom**: Semantic cache is configured but cache hit rate is 0%.

**Causes & Solutions**:

| Cause | Fix |
|-------|-----|
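| `score-threshold` too strict | Relax the threshold. Per the `azure-openai-semantic-cache-lookup` policy reference, lower values require closer semantic similarity, so raise the value in small steps (for example from 0.05 to 0.2) to allow more matches |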
| Embeddings backend misconfigured | Verify backend URL and auth |
| Redis not configured | Deploy Azure Cache for Redis Enterprise with RediSearch |
| Streaming requests | Semantic caching doesn't work with `"stream": true` |

**Verify caching is working**:

```bash
# Check cache-related headers in response
curl -v -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -d '{"messages": [{"role": "user", "content": "What is Azure?"}], "max_tokens": 100}'

# Look for: x-cache-status header in response
```
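
If no cache header is present, comparing the latency of two identical requests is another rough signal. The following is a minimal sketch; `time_chat_request` and `compare_cache_latency` are hypothetical helpers, and the prompt is interpolated into JSON as-is, so it must not contain double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers.

# Print curl's total request time (seconds) for one chat completion call.
time_chat_request() {
  local url="$1" key="$2" prompt="$3"
  curl -s -o /dev/null -w "%{time_total}" -X POST "$url" \
    -H "Content-Type: application/json" \
    -H "Ocp-Apim-Subscription-Key: ${key}" \
    -d "{\"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}], \"max_tokens\": 100}"
}

# Send the same prompt twice and print both timings side by side.
compare_cache_latency() {
  local first second
  first=$(time_chat_request "$@")
  second=$(time_chat_request "$@")
  echo "first=${first}s second=${second}s"
}

# Usage:
# compare_cache_latency \
#   "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
#   "<key>" "What is Azure?"
```

A markedly lower second timing suggests the response came from the cache, though network variance means it is not proof.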

### Cache Returns Stale Data

**Solution**: Reduce `duration` in `azure-openai-semantic-cache-store`:

```xml
<!-- Shorter TTL for frequently changing knowledge -->
<azure-openai-semantic-cache-store duration="300" />  <!-- 5 minutes -->
```

---

## Content Safety Issues

### False Positives (Legitimate Content Blocked)

**Symptom**: Normal business content is being blocked by content safety policy.

**Solutions**:

1. **Increase thresholds** (less strict):

```xml
<llm-content-safety backend-id="contentsafety-backend">
    <!-- category elements belong inside <categories>; EightSeverityLevels allows thresholds 0-7 -->
    <categories output-type="EightSeverityLevels">
        <category name="Hate" threshold="5" />      <!-- Was 4, now less strict -->
        <category name="Sexual" threshold="5" />
        <category name="SelfHarm" threshold="5" />
        <category name="Violence" threshold="5" />
    </categories>
</llm-content-safety>
```

2. **Log blocked content** for review:

```xml
<on-error>
    <choose>
        <when condition="@(context.LastError.Source == "llm-content-safety")">
            <!-- trace severity accepts verbose, information, or error; the payload goes in <message> -->
            <trace source="content-safety" severity="information">
                <message>@{
                    return new JObject(
                        new JProperty("blocked", true),
                        new JProperty("subscription", context.Subscription.Id),
                        new JProperty("timestamp", DateTime.UtcNow)
                    ).ToString();
                }</message>
            </trace>
            <return-response>
                <set-status code="400" reason="Content Filtered" />
                <set-body>{"error": "Content filtered by safety policy"}</set-body>
            </return-response>
        </when>
    </choose>
</on-error>
```

### Content Safety Backend Error

**Symptom**: `500` error from `llm-content-safety` policy.

**Causes**:

| Cause | Fix |
|-------|-----|
| Content Safety resource not deployed | Deploy Azure AI Content Safety resource |
| Backend URL wrong | Check `contentsafety-backend` URL matches resource endpoint |
| Missing RBAC | Grant APIM "Cognitive Services User" on the Content Safety resource |
| Region mismatch | Content Safety must be in a supported region |

---

## Backend Configuration Issues

### Backend Not Found

**Symptom**: `500` error with "Backend not found" message.

```bash
# Verify backend exists (calls the ARM REST API via az rest; the CLI fills in {subscriptionId})
az rest --method get \
  --url "/subscriptions/{subscriptionId}/resourceGroups/<rg>/providers/Microsoft.ApiManagement/service/<apim>/backends?api-version=2022-08-01" \
  --query "value[].{id:name, url:properties.url}" -o table

# Check backend ID matches policy reference
```

### Timeout on AI Requests

**Symptom**: Requests time out, especially for large context windows or complex prompts.

**Solution**: Set a longer `timeout` on `forward-request` in the `<backend>` section:

```xml
<backend>
    <!-- timeout is in seconds; set it explicitly in case an inherited policy uses a lower value -->
    <forward-request timeout="120" />
</backend>
```

---

## Diagnostic Tools

### APIM Tracing

Enable request tracing for debugging policy flow:

```bash
# Get tracing subscription key ("master" is the ID of the built-in all-access subscription)
az rest --method post \
  --url "/subscriptions/{subscriptionId}/resourceGroups/<rg>/providers/Microsoft.ApiManagement/service/<apim>/subscriptions/master/listSecrets?api-version=2022-08-01" \
  --query primaryKey -o tsv

# Send request with tracing
# Note: newer gateways may instead require a debug token in the Apim-Debug-Authorization header
curl -X POST "${GATEWAY_URL}/..." \
  -H "Ocp-Apim-Trace: true" \
  -H "Ocp-Apim-Subscription-Key: <built-in-key>"
```

### Application Insights

If APIM is connected to Application Insights:

```kql
// Failed AI gateway requests
requests
| where success == false
| where url contains "openai"
| project timestamp, resultCode, duration, url
| order by timestamp desc
| take 20

// Token metrics over time
customMetrics
| where name == "Total Tokens"
| summarize TotalTokens = sum(value) by bin(timestamp, 1h)
| render timechart

// Content safety blocks
traces
| where message contains "content-safety"
| project timestamp, message, customDimensions
| order by timestamp desc
```

### Health Check

Quick validation that the AI Gateway is functioning:

```bash
# 1. Check APIM is running
az apim show --name <apim> --resource-group <rg> --query "provisioningState" -o tsv
# Expected: Succeeded

# 2. Check backends (calls the ARM REST API via az rest; the CLI fills in {subscriptionId})
az rest --method get \
  --url "/subscriptions/{subscriptionId}/resourceGroups/<rg>/providers/Microsoft.ApiManagement/service/<apim>/backends?api-version=2022-08-01" \
  --query "value[].{id:name, url:properties.url}" -o table

# 3. Test endpoint
curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
# Expected: 200
```
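
When step 3 returns something other than `200`, the status code points to a section of this guide. The following is a minimal sketch of that mapping; `triage_status` is a hypothetical helper, and the `400` case assumes the `on-error` handler from the Content Safety section is in place.

```bash
#!/usr/bin/env bash
# Sketch only: triage_status is a hypothetical helper that maps a gateway
# HTTP status code to the section of this guide that covers it.
triage_status() {
  case "$1" in
    200) echo "OK" ;;
    400) echo "Content Safety Issues" ;;   # assumes the on-error handler returns 400
    401) echo "Authentication Issues" ;;
    429) echo "Rate Limiting Issues" ;;
    500) echo "Content Safety Issues or Backend Configuration Issues" ;;
    *)   echo "Diagnostic Tools" ;;        # anything else: start with tracing
  esac
}

# Usage: triage_status "$(curl -s -o /dev/null -w "%{http_code}" ...)"
```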

---

## References

- [APIM Diagnostics](https://learn.microsoft.com/azure/api-management/diagnose-solve-problems)
- [AI Gateway Monitoring](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities#monitoring-and-analytics)
- [APIM Error Handling](https://learn.microsoft.com/azure/api-management/api-management-error-handling-policies)
