Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
foundry-agent/trace/references/analyze-failures.md
# Analyze Failures — Find and Cluster Failing Traces

Identify failing agent traces, group them by root cause, and produce a prioritized action table.

## Step 1 — Find Failing Traces

> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below.

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    agentName, conversationId, operation_Id, id
| order by timestamp desc
| take 100
```

## Step 2 — Cluster by Error Type

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    errorType = tostring(customDimensions["error.type"]),
    operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    count = count(),
    firstSeen = min(timestamp),
    lastSeen = max(timestamp),
    avgDuration = avg(duration),
    sampleOperationId = take_any(operation_Id)
    by errorType, operation, resultCode
| order by count desc
```

## Step 3 — Prioritized Action Table

Present results as:

| Priority | Error Type | Operation | Count | Result Code | Suggested Action |
|----------|-----------|-----------|-------|-------------|-----------------|
| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout |
| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic |
| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations |
| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions |

**Prioritization:** P0 = highest count or most severe (5xx), then by count × recency.

## Step 4 — Drill Into Specific Failure

When the user selects a cluster, show individual failing traces:

```kql
dependencies
| where timestamp > ago(24h)
| where success == false
| where customDimensions["error.type"] == "<selected_error_type>"
| where customDimensions["gen_ai.operation.name"] == "<selected_operation>"
| project timestamp, name, duration, resultCode,
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation_Id
| order by timestamp desc
| take 20
```

Also check `exceptions` table for stack traces:

```kql
exceptions
| where timestamp > ago(24h)
| where operation_Id in ("<operation_id_1>", "<operation_id_2>")
| project timestamp, type, message, outerMessage, details, operation_Id
| order by timestamp desc
```

Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md).

## Hosted Agent Variant — Failures

For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use a two-step join:

```kql
let reqIds = requests
    | where timestamp > ago(24h)
    | where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
    | distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    conversationId, operation_ParentId, operation_Id
| order by timestamp desc
| take 100
```
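
## Example: Applying the Prioritization Rule

The Step 3 rule ("P0 = highest count or most severe (5xx), then by count × recency") leaves some room for interpretation. The Python sketch below shows one possible reading, applied to rows shaped like the Step 2 output. Everything here is an assumption for illustration, not part of the skill: the `Cluster` type, the `recency_weight` formula (1 / (1 + hours since `lastSeen`)), and the choice to pick P0 as the highest-count 5xx cluster when one exists, falling back to the highest-count cluster otherwise.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Cluster:
    """Hypothetical row shape mirroring the Step 2 summarize output."""
    error_type: str
    operation: str
    result_code: int
    count: int
    last_seen: datetime


def recency_weight(last_seen: datetime, now: datetime) -> float:
    """Assumed weight: 1.0 for a failure seen just now, decaying with age in hours."""
    age_hours = max((now - last_seen).total_seconds() / 3600.0, 0.0)
    return 1.0 / (1.0 + age_hours)


def prioritize(clusters: list[Cluster], now: datetime) -> list[tuple[str, Cluster]]:
    """Label clusters P0, P1, ... under the assumed reading of the Step 3 rule."""
    if not clusters:
        return []

    # P0: prefer a 5xx cluster, breaking ties by raw count.
    top = max(clusters, key=lambda c: (c.result_code >= 500, c.count))

    # Remaining clusters: order by count x recency, highest first.
    rest = sorted(
        (c for c in clusters if c is not top),
        key=lambda c: c.count * recency_weight(c.last_seen, now),
        reverse=True,
    )
    return [(f"P{i}", c) for i, c in enumerate([top, *rest])]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    seen = now - timedelta(hours=1)  # same recency for every cluster in this demo
    clusters = [
        Cluster("rate_limited", "chat", 429, 8, seen),
        Cluster("tool_error", "execute_tool", 500, 3, seen),
        Cluster("timeout", "invoke_agent", 504, 15, seen),
        Cluster("content_filter", "chat", 400, 5, seen),
    ]
    for label, c in prioritize(clusters, now):
        print(label, c.error_type, c.operation, c.count, c.result_code)
```

With equal recency across clusters, this ordering reproduces the sample rows in the Step 3 table. A different decay function or a stricter "all 5xx first" policy would change the ranking, so treat the weights as a starting point to tune.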