foundry-agent/trace/references/analyze-latency.md
# Analyze Latency — Find and Diagnose Slow Traces

Identify slow agent traces, find bottleneck spans, and correlate with token usage.

## Step 1 — Find Slow Conversations

> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp, duration, success,
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    operation_Id
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    hasErrors = countif(success == false) > 0
    by conversationId, operation_Id
| where totalDuration > 5000
| order by totalDuration desc
| take 50
```

> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.

## Step 2 — Latency Distribution (P50/P95/P99)

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

Present as:

| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
|-----------|-------|----------|----------|----------|----------|-------|

## Step 3 — Bottleneck Breakdown

For a specific slow conversation, break down time spent per span type:

```kql
dependencies
| where operation_Id == "<operation_id>"
| extend operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    avgDuration = avg(duration)
    by operation, name
| order by totalDuration desc
```

Common bottleneck patterns:
- **`chat` spans dominate** → LLM inference is slow (consider a smaller model or caching)
- **`execute_tool` spans dominate** → Tool execution is slow (optimize the tool implementation)
- **`invoke_agent` has long gaps** → Orchestration overhead (check the agent framework)

## Step 4 — Token Usage vs Latency Correlation

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(inputTokens)
| project duration, inputTokens, outputTokens,
    model = tostring(customDimensions["gen_ai.request.model"]),
    operation_Id
| order by duration desc
| take 100
```

High token counts often correlate with high latency. If confirmed, suggest:
- Reduce system prompt length
- Limit the conversation history window
- Use a faster model for simpler queries

## Hosted Agent Variant — Latency

For hosted agents, scope by Foundry agent name via `requests`, then join to `dependencies`:

```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
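The Step 4 query lists raw durations and token counts side by side. To judge the correlation at a glance, the same data can be reduced to a per-token latency ratio. A minimal sketch, assuming the same `customDimensions` keys used above; `msPerOutputToken` is an illustrative column name, not a standard field (`duration` in Application Insights is in milliseconds):

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(outputTokens) and outputTokens > 0
// duration is in ms, so this yields milliseconds per generated token
| extend msPerOutputToken = duration / outputTokens
| summarize
    p50 = percentile(msPerOutputToken, 50),
    p95 = percentile(msPerOutputToken, 95)
    by model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

If ms-per-token is roughly flat across slow and fast calls, latency is dominated by output length (shorten prompts or responses); if it varies widely for the same model, look for queueing or throttling rather than token volume.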