foundry-agent/trace/references/analyze-latency.md
# Analyze Latency — Find and Diagnose Slow Traces

Identify slow agent traces, find bottleneck spans, and correlate with token usage.

## Step 1 — Find Slow Conversations

> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp, duration, success,
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    operation_Id
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    hasErrors = countif(success == false) > 0
    by conversationId, operation_Id
| where totalDuration > 5000
| order by totalDuration desc
| take 50
```

> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.

## Step 2 — Latency Distribution (P50/P95/P99)

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

Present as:

| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
|-----------|-------|----------|----------|----------|----------|-------|

## Step 3 — Bottleneck Breakdown

For a specific slow conversation, break down time spent per span type:

```kql
dependencies
| where operation_Id == "<operation_id>"
| extend operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    avgDuration = avg(duration)
    by operation, name
| order by totalDuration desc
```

Common bottleneck patterns:
- **`chat` spans dominate** → LLM inference is slow (consider a smaller model or caching)
- **`execute_tool` spans dominate** → Tool execution is slow (optimize the tool implementation)
- **`invoke_agent` has long gaps** → Orchestration overhead (check the agent framework)

## Step 4 — Token Usage vs Latency Correlation

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(inputTokens)
| project duration, inputTokens, outputTokens,
    model = tostring(customDimensions["gen_ai.request.model"]),
    operation_Id
| order by duration desc
| take 100
```

High token counts often correlate with high latency. If confirmed, suggest:
- Reduce system prompt length
- Limit the conversation history window
- Use a faster model for simpler queries

## Hosted Agent Variant — Latency

For hosted agents, scope by Foundry agent name via `requests`, then join to `dependencies`:

```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
    by operation = tostring(customDimensions["gen_ai.operation.name"]),
       model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
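The Step 4 hypothesis ("high token counts drive latency") can be sanity-checked offline once the query rows are exported from Log Analytics. A minimal Python sketch, using made-up sample rows in place of real query output — the `rows` values and variable names here are illustrative assumptions, not data from any real workspace:

```python
import statistics

# Hypothetical rows exported from the Step 4 query: (duration_ms, input_tokens)
rows = [(1200, 800), (3400, 2600), (900, 500), (5100, 4100), (2200, 1700)]

durations = [d for d, _ in rows]
tokens = [t for _, t in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(tokens, durations)
print(f"token/latency correlation r = {r:.2f}")
```

A value of `r` near 1 supports the token-count explanation and points at the Step 4 mitigations; a value near 0 suggests looking elsewhere (tool spans or orchestration overhead from Step 3) for the latency source.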