foundry-agent/trace/references/analyze-latency.md
# Analyze Latency — Find and Diagnose Slow Traces

Identify slow agent traces, find bottleneck spans, and correlate with token usage.

## Step 1 — Find Slow Conversations

> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp, duration, success,
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    operation_Id
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    hasErrors = countif(success == false) > 0
    by conversationId, operation_Id
| where totalDuration > 5000
| order by totalDuration desc
| take 50
```

> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.

## Step 2 — Latency Distribution (P50/P95/P99)

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

Present as:

| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
|-----------|-------|----------|----------|----------|----------|-------|

## Step 3 — Bottleneck Breakdown

For a specific slow conversation, break down time spent per span type:

```kql
dependencies
| where operation_Id == "<operation_id>"
| extend operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    avgDuration = avg(duration)
    by operation, name
| order by totalDuration desc
```

Common bottleneck patterns:
- **`chat` spans dominate** → LLM inference is slow (consider a smaller model or caching)
- **`execute_tool` spans dominate** → Tool execution is slow (optimize the tool implementation)
- **`invoke_agent` has long gaps** → Orchestration overhead (check the agent framework)

## Step 4 — Token Usage vs Latency Correlation

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(inputTokens)
| project duration, inputTokens, outputTokens,
    model = tostring(customDimensions["gen_ai.request.model"]),
    operation_Id
| order by duration desc
| take 100
```

High token counts often correlate with high latency. If confirmed, suggest:
- Reduce system prompt length
- Limit the conversation history window
- Use a faster model for simpler queries

## Hosted Agent Variant — Latency

For hosted agents, scope by Foundry agent name via `requests`, then join to `dependencies`:

```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
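The Step 4 query lists raw durations and token counts side by side. To judge the correlation at a glance, the same data can be reduced to a per-token latency ratio. A minimal sketch, assuming the same `customDimensions` keys used above; `msPerOutputToken` is an illustrative column name, not a standard field (`duration` in Application Insights is in milliseconds):

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(outputTokens) and outputTokens > 0
// duration is in ms, so this yields milliseconds per generated token
| extend msPerOutputToken = duration / outputTokens
| summarize
    p50 = percentile(msPerOutputToken, 50),
    p95 = percentile(msPerOutputToken, 95)
    by model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

If ms-per-token is roughly flat across slow and fast calls, latency is dominated by output length (shorten prompts or responses); if it varies widely for the same model, look for queueing or throttling rather than token volume.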