AI Gateway Troubleshooting

Common issues when using Azure API Management as an AI Gateway.

Authentication Issues

401 Unauthorized from Backend

Symptom: APIM returns 401 when calling Azure OpenAI.

Causes & Solutions:

Cause	Fix
Managed identity not enabled on APIM	`az apim update --name <apim> --resource-group <rg> --set identity.type=SystemAssigned`
Missing RBAC role	`az role assignment create --assignee <apim-principal-id> --role "Cognitive Services User" --scope <aoai-resource-id>`
Wrong auth resource	Ensure `resource="https://cognitiveservices.azure.com"` (not the endpoint URL)
RBAC propagation delay	Wait 5-10 minutes after role assignment

Diagnostic:

# Verify identity is enabled
az apim show --name <apim> --resource-group <rg> --query "identity" -o json

# Check role assignments
AOAI_ID=$(az cognitiveservices account show --name <aoai> --resource-group <rg> --query id -o tsv)
az role assignment list --scope "$AOAI_ID" --query "[?principalType=='ServicePrincipal'].{role:roleDefinitionName, principal:principalId}" -o table
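
The listing above shows every service principal on the resource, so it can be quicker to filter by the APIM identity itself. A minimal sketch, assuming a system-assigned identity and the role name used in the table above; `has_role` is a hypothetical helper, not an Azure CLI command:

```shell
# Succeeds if the role names on stdin (one per line) include the wanted role.
# $1: role name to look for (defaults to the role named in the table above).
has_role() {
  grep -qxF -- "${1:-Cognitive Services User}"
}

# Usage (placeholders as in the commands above; not executed here):
#   PRINCIPAL_ID=$(az apim show --name <apim> --resource-group <rg> \
#     --query "identity.principalId" -o tsv)
#   az role assignment list --assignee "$PRINCIPAL_ID" --scope "$AOAI_ID" \
#     --include-inherited --query "[].roleDefinitionName" -o tsv \
#     | has_role && echo "role assigned" || echo "role missing"
```

If the role was assigned only a few minutes ago, a "role missing" or 401 result may just reflect the propagation delay noted in the table.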

Rate Limiting Issues

429 Token Limit Exceeded

Symptom: Requests are blocked with 429 Too Many Requests by the azure-openai-token-limit policy.

Solutions:

Increase limit: Raise tokens-per-minute value
Add more backends: Load balance across regions for higher aggregate TPM
Enable semantic caching: Reduce actual token consumption by serving cached responses
Switch counter-key: Use per-user instead of global to prevent one user from exhausting the pool

<!-- Per-user instead of global -->
<azure-openai-token-limit
    tokens-per-minute="50000"
    counter-key="@(context.Request.Headers.GetValueOrDefault("X-User-Id", context.Subscription.Id))"
    estimate-prompt-tokens="true" />

429 from Azure OpenAI (Not APIM)

Symptom: Backend returns 429 even though APIM token limits are not exceeded.

Cause: Azure OpenAI's own TPM quota is exhausted.

Solutions:

Increase Azure OpenAI deployment TPM quota in the portal
Add load balancing across multiple Azure OpenAI instances
Use retry with backoff:

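<!-- interval, max-interval, and delta together select exponential backoff -->
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" max-interval="60" delta="5" first-fast-retry="false">
    <!-- Buffer the body so it can be re-sent on each retry -->
    <forward-request buffer-request-body="true" />
</retry>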
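
The retry policy above runs inside the gateway. Callers can apply the same idea on their side when a 429 still reaches them. The following is a client-side sketch only, not the algorithm APIM uses internally; `backoff_delay` and `post_with_retry` are hypothetical helper names, and the base delay of 10 seconds and cap of 60 seconds are illustrative assumptions:

```shell
# Delay in seconds before retry number $1 (1-based):
# base * 2^(attempt-1), capped at max. Defaults are illustrative.
backoff_delay() {
  local attempt=$1 base=${2:-10} max=${3:-60}
  local delay=$(( base * (1 << (attempt - 1)) ))
  if [ "$delay" -gt "$max" ]; then
    delay=$max
  fi
  printf '%s\n' "$delay"
}

# POST a JSON body and retry on HTTP 429, up to $4 retries (default 3).
# Prints the final HTTP status code.
post_with_retry() {
  local url=$1 key=$2 body=$3 retries=${4:-3} attempt=0 code
  while :; do
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$url" \
      -H "Content-Type: application/json" \
      -H "Ocp-Apim-Subscription-Key: $key" \
      -d "$body")
    if [ "$code" != "429" ] || [ "$attempt" -ge "$retries" ]; then
      printf '%s\n' "$code"
      return
    fi
    attempt=$((attempt + 1))
    sleep "$(backoff_delay "$attempt")"
  done
}
```

With the defaults, the waits before retries 1 to 4 are 10, 20, 40, and 60 seconds.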

Semantic Caching Issues

No Cache Hits

Symptom: Semantic cache is configured but cache hit rate is 0%.

Causes & Solutions:

Cause	Fix
`score-threshold` too strict	In APIM, smaller values require closer semantic similarity, so raise the value (for example from 0.05 to 0.2) to allow more matches
Embeddings backend misconfigured	Verify backend URL and auth
Redis not configured	Deploy Azure Cache for Redis Enterprise with RediSearch
Streaming requests	Semantic caching doesn't work with `"stream": true`

Verify caching is working:

# Check cache-related headers in response
curl -v -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -d '{"messages": [{"role": "user", "content": "What is Azure?"}], "max_tokens": 100}'

# Look for: x-cache-status header in response
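
Scanning verbose curl output by eye is error-prone, so the header check can be scripted. A minimal sketch, assuming the gateway emits the `x-cache-status` header named above (the header name is taken from this guide, not verified against a live gateway); `header_value` is a hypothetical helper:

```shell
# Print the value of one HTTP response header (case-insensitive name match).
# Reads raw response headers on stdin, e.g. from `curl -s -D - -o /dev/null`.
header_value() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v n="$name" '{
    line = $0
    sub(/\r$/, "", line)                 # strip the CR of CRLF line endings
    i = index(line, ":")
    if (i > 0 && tolower(substr(line, 1, i - 1)) == n) {
      v = substr(line, i + 1)
      sub(/^[ \t]+/, "", v)              # drop whitespace after the colon
      print v
      exit
    }
  }'
}

# Usage (same request as above, sent twice; not executed here):
#   curl -s -D - -o /dev/null -X POST "${GATEWAY_URL}/..." \
#     -H "Content-Type: application/json" \
#     -H "Ocp-Apim-Subscription-Key: <key>" \
#     -d '{"messages": [{"role": "user", "content": "What is Azure?"}], "max_tokens": 100}' \
#     | header_value x-cache-status
```

An empty result on the second identical request suggests the response was not served from the cache, or that the header is not emitted in your configuration.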

Cache Returns Stale Data

Solution: Reduce duration in azure-openai-semantic-cache-store:

<!-- Shorter TTL for frequently changing knowledge -->
<azure-openai-semantic-cache-store duration="300" />  <!-- 5 minutes -->

Content Safety Issues

False Positives (Legitimate Content Blocked)

Symptom: Normal business content is being blocked by content safety policy.

Solutions:

Increase thresholds (less strict):

<llm-content-safety backend-id="contentsafety-backend">
    <category name="Hate" threshold="5" />      <!-- Was 4, now less strict -->
    <category name="Sexual" threshold="5" />
    <category name="SelfHarm" threshold="5" />
    <category name="Violence" threshold="5" />
</llm-content-safety>

Log blocked content for review:

<on-error>
    <choose>
        <when condition="@(context.LastError.Source == "llm-content-safety")">
            <!-- severity accepts verbose, information, or error; the text goes in <message> -->
            <trace source="content-safety" severity="information">
                <message>@{
                    return new JObject(
                        new JProperty("blocked", true),
                        new JProperty("subscription", context.Subscription.Id),
                        new JProperty("timestamp", DateTime.UtcNow)
                    ).ToString();
                }</message>
            </trace>
            <return-response>
                <set-status code="400" reason="Content Filtered" />
                <set-body>{"error": "Content filtered by safety policy"}</set-body>
            </return-response>
        </when>
    </choose>
</on-error>

Content Safety Backend Error

Symptom: 500 error from llm-content-safety policy.

Causes:

Cause	Fix
Content Safety resource not deployed	Deploy Azure AI Content Safety resource
Backend URL wrong	Check `contentsafety-backend` URL matches resource endpoint
Missing RBAC	Grant APIM "Cognitive Services User" on the Content Safety resource
Region mismatch	Content Safety must be in a supported region

Backend Configuration Issues

Backend Not Found

Symptom: 500 error with "Backend not found" message.

# Verify backend exists
az apim backend list --service-name <apim> --resource-group <rg> \
  --query "[].{id:name, url:url}" -o table

# Check backend ID matches policy reference
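
The comparison between policy references and existing backends can be scripted. A minimal sketch, assuming the policy XML has been saved locally as `policy.xml` and the backend names as `backends.txt` (both file names and both helper functions are hypothetical):

```shell
# Print each distinct backend-id value referenced in policy XML read from stdin.
policy_backend_ids() {
  grep -o 'backend-id="[^"]*"' | sed 's/^backend-id="//; s/"$//' | sort -u
}

# Print referenced backend IDs that are missing from the list of backends.
# $1: file containing policy XML
# $2: file containing one existing backend name per line
missing_backends() {
  policy_backend_ids < "$1" | while IFS= read -r id; do
    grep -qxF -- "$id" "$2" || printf '%s\n' "$id"
  done
}

# Usage (not executed here):
#   az apim backend list --service-name <apim> --resource-group <rg> \
#     --query "[].name" -o tsv > backends.txt
#   missing_backends policy.xml backends.txt
```

Any ID printed is referenced by the policy but has no matching backend. Note that this only finds literal `backend-id="..."` attributes, not IDs built by policy expressions.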

Timeout on AI Requests

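Symptom: Requests time out, especially for large context windows or complex prompts.

Solution: Check the forward-request timeout in <backend> and raise it if it is too low for large AI requests:

<backend>
    <!-- Timeout in seconds (policy default: 300). Make sure the configured value is long enough for large AI requests -->
    <forward-request timeout="120" />
</backend>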

Diagnostic Tools

APIM Tracing

Enable request tracing for debugging policy flow:

# Get tracing subscription key
az apim subscription list --service-name <apim> --resource-group <rg> \
  --query "[?displayName=='Built-in all-access subscription'].primaryKey" -o tsv

# Send request with tracing
curl -X POST "${GATEWAY_URL}/..." \
  -H "Ocp-Apim-Trace: true" \
  -H "Ocp-Apim-Subscription-Key: <built-in-key>"

Application Insights

If APIM is connected to Application Insights:

// Failed AI gateway requests
requests
| where success == false
| where url contains "openai"
| project timestamp, resultCode, duration, url
| order by timestamp desc
| take 20

// Token metrics over time
customMetrics
| where name == "Total Tokens"
| summarize TotalTokens = sum(value) by bin(timestamp, 1h)
| render timechart

// Content safety blocks
traces
| where message contains "content-safety"
| project timestamp, message, customDimensions
| order by timestamp desc

Health Check

Quick validation that the AI Gateway is functioning:

# 1. Check APIM is running
az apim show --name <apim> --resource-group <rg> --query "provisioningState" -o tsv
# Expected: Succeeded

# 2. Check backends
az apim backend list --service-name <apim> --resource-group <rg> -o table

# 3. Test endpoint
curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
# Expected: 200
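
When step 3 returns something other than 200, the status code points to a section of this guide. A minimal sketch of that mapping; `explain_status` is a hypothetical helper, and the 400 case assumes the on-error handler shown under Content Safety Issues is in place:

```shell
# Map the HTTP status code from the health check to the section to consult.
explain_status() {
  case "$1" in
    200) echo "OK: gateway and backend responded" ;;
    400) echo "See Content Safety Issues: request may have been filtered" ;;
    401) echo "See Authentication Issues: check managed identity and RBAC" ;;
    429) echo "See Rate Limiting Issues: APIM token limit or Azure OpenAI quota" ;;
    500) echo "See Backend Configuration Issues and Content Safety Backend Error" ;;
    000) echo "No HTTP response: check GATEWAY_URL, DNS, and network access" ;;
    *)   echo "Unexpected status $1: use APIM Tracing to inspect the policy flow" ;;
  esac
}

# Usage (not executed here):
#   explain_status "$(curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/..." \
#     -H "Ocp-Apim-Subscription-Key: <key>" \
#     -H "Content-Type: application/json" \
#     -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}')"
```

curl reports `000` when it receives no HTTP response at all, which is why that case is treated as a connectivity problem and not a gateway error.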
