AI Gateway Configuration Patterns

Step-by-step patterns for configuring Azure API Management as an AI Gateway.

Pattern 1: Add AI Model Backend

Connect Azure OpenAI or AI Foundry models to your APIM instance.

Prerequisites

APIM instance deployed (use azure-prepare skill to deploy APIM — see APIM deployment guide)
Azure OpenAI or AI Foundry resource provisioned
System-assigned or user-assigned managed identity enabled on APIM

Steps

#### 1. Discover AI Resources

# Find Azure OpenAI resources
az cognitiveservices account list --query "[?kind=='OpenAI'].{name:name, rg:resourceGroup, endpoint:properties.endpoint}" -o table

# Find AI Foundry resources (if using)
az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, rg:resourceGroup}" -o table

#### 2. Enable Managed Identity on APIM

# Enable system-assigned identity
az apim update --name <apim-name> --resource-group <rg> --set identity.type=SystemAssigned

# Get principal ID
PRINCIPAL_ID=$(az apim show --name <apim-name> --resource-group <rg> --query "identity.principalId" -o tsv)

#### 3. Grant RBAC Access

AOAI_ID=$(az cognitiveservices account show --name <aoai-name> --resource-group <rg> --query id -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Cognitive Services User" \
  --scope "$AOAI_ID"

#### 4. Create Backend

az apim backend create \
  --service-name <apim-name> \
  --resource-group <rg> \
  --backend-id openai-backend \
  --protocol http \
  --url "https://<aoai-name>.openai.azure.com/openai"

#### 5. Import API (OpenAPI Spec)

# Import the Azure OpenAI API specification
az apim api import \
  --service-name <apim-name> \
  --resource-group <rg> \
  --api-id azure-openai-api \
  --path "openai" \
  --specification-format OpenApi \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json" \
  --service-url "https://<aoai-name>.openai.azure.com/openai"

#### 6. Set Backend Policy

Add managed identity authentication in <inbound>:

<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>

Pattern 2: Load Balance Across Multiple AI Backends

Distribute requests across multiple Azure OpenAI instances for higher throughput.

Steps

#### 1. Create Multiple Backends

# Primary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-eastus --protocol http \
  --url "https://<aoai-eastus>.openai.azure.com/openai"

# Secondary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-westus --protocol http \
  --url "https://<aoai-westus>.openai.azure.com/openai"

#### 2. Create Backend Pool

Using APIM backend pool (preview) or policy-based load balancing:

<inbound>
    <base />
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var hash = Math.Abs(context.RequestId.GetHashCode());
        var index = hash % backends.Length;
        return backends[index];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>

#### 3. Add Circuit Breaker (Retry on 429)

<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" delta="5" max-interval="30" first-fast-retry="false">
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var currentIndex = Array.IndexOf(backends, (string)context.Variables["backendUrl"]);
        return backends[(currentIndex + 1) % backends.Length];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <forward-request />
</retry>

Pattern 3: Convert API to MCP Tool

Expose an existing API through APIM as an MCP-compatible tool for AI agents.

Steps

Import API into APIM using OpenAPI spec
Add rate limiting to protect the tool endpoint
Add content safety to filter harmful inputs
Generate MCP manifest pointing to the APIM endpoint

<!-- Rate limit MCP tool calls -->
<inbound>
    <base />
    <rate-limit-by-key calls="10" renewal-period="60"
        counter-key="@(context.Request.Headers.GetValueOrDefault("X-Agent-Id", "anonymous"))" />
</inbound>

Pattern 4: Add Streaming Support

Configure APIM to properly handle Server-Sent Events (SSE) for streaming AI responses.

<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <set-header name="Content-Type" exists-action="override">
        <value>@(context.Request.Body.As<JObject>()["stream"]?.Value<bool>() == true
            ? "text/event-stream" : "application/json")</value>
    </set-header>
</outbound>

Note: Semantic caching and token metrics policies are NOT compatible with streaming responses. Use non-streaming for cost control scenarios.

Pattern 5: Multi-Tenant AI Gateway

Isolate tenants with per-client rate limiting and tracking.

<inbound>
    <base />
    <!-- Extract tenant from subscription or header -->
    <set-variable name="tenantId" value="@(context.Subscription.Id)" />

    <!-- Per-tenant token limit -->
    <azure-openai-token-limit
        tokens-per-minute="10000"
        counter-key="@((string)context.Variables["tenantId"])"
        estimate-prompt-tokens="true" />

    <!-- Per-tenant metrics -->
    <azure-openai-emit-token-metric namespace="ai-gateway">
        <dimension name="Tenant" value="@((string)context.Variables["tenantId"])" />
        <dimension name="API" value="@(context.Api.Name)" />
    </azure-openai-emit-token-metric>

    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>

Next Steps

Apply governance policies to your configured backends
Review troubleshooting for common configuration issues

Preparing the source view

Azure AI Gateway

references/patterns.md

AI Gateway Configuration Patterns

Pattern 1: Add AI Model Backend

Prerequisites

Steps

Pattern 2: Load Balance Across Multiple AI Backends

Steps

Pattern 3: Convert API to MCP Tool

Steps

Pattern 4: Add Streaming Support

Pattern 5: Multi-Tenant AI Gateway

Next Steps