Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
foundry-agent/trace/references/analyze-failures.md
# Analyze Failures — Find and Cluster Failing Traces

Identify failing agent traces, group them by root cause, and produce a prioritized action table.

## Step 1 — Find Failing Traces

> ⚠️ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below.

```kql
dependencies
| where timestamp > ago(24h)
// A call counts as failing if it is flagged unsuccessful or returned a 4xx/5xx code
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    agentName, conversationId, operation_Id, id
| order by timestamp desc
| take 100
```

## Step 2 — Cluster by Error Type

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    errorType = tostring(customDimensions["error.type"]),
    operation = tostring(customDimensions["gen_ai.operation.name"])
// One row per (errorType, operation, resultCode) cluster, with a sample trace to drill into
| summarize
    count = count(),
    firstSeen = min(timestamp),
    lastSeen = max(timestamp),
    avgDuration = avg(duration),
    sampleOperationId = take_any(operation_Id)
    by errorType, operation, resultCode
| order by count desc
```

## Step 3 — Prioritized Action Table

Present results as:

| Priority | Error Type | Operation | Count | Result Code | Suggested Action |
|----------|-----------|-----------|-------|-------------|-----------------|
| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout |
| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic |
| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations |
| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions |

**Prioritization:** P0 = highest count or most severe (5xx), then by count × recency.

## Step 4 — Drill Into Specific Failure

When the user selects a cluster, show individual failing traces:

```kql
dependencies
| where timestamp > ago(24h)
// Same failure definition as Steps 1 and 2, so the rows match the cluster counts
| where success == false or toint(resultCode) >= 400
| where customDimensions["error.type"] == "<selected_error_type>"
| where customDimensions["gen_ai.operation.name"] == "<selected_operation>"
| project timestamp, name, duration, resultCode,
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation_Id
| order by timestamp desc
| take 20
```

Also check the `exceptions` table for stack traces:

```kql
exceptions
| where timestamp > ago(24h)
| where operation_Id in ("<operation_id_1>", "<operation_id_2>")
| project timestamp, type, message, outerMessage, details, operation_Id
| order by timestamp desc
```

Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md).

## Hosted Agent Variant — Failures

For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use a two-step join:

```kql
// Step 1: collect the request IDs that belong to the named Foundry agent
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
// Step 2: keep only the failing dependencies whose parent is one of those requests
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    conversationId, operation_ParentId, operation_Id
| order by timestamp desc
| take 100
```
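
The prioritization rule in Step 3 (5xx first, then count × recency) can also be approximated in the query itself, so the clusters arrive already ranked. The following is a minimal sketch, not part of the original skill. It assumes a linear recency weight over the 24-hour window, which the rule above does not specify. The column names `failures`, `isServerError`, `hoursSinceLastSeen`, `recencyWeight`, and `score` are illustrative.

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    errorType = tostring(customDimensions["error.type"]),
    operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    failures = count(),
    lastSeen = max(timestamp)
    by errorType, operation, resultCode
| extend
    isServerError = toint(resultCode) >= 500,
    hoursSinceLastSeen = (now() - lastSeen) / 1h
// Assumed weighting: 1.0 for a failure seen just now, falling linearly to 0 at 24 hours
| extend recencyWeight = max_of(0.0, 1.0 - hoursSinceLastSeen / 24.0)
| extend score = failures * recencyWeight
// 5xx clusters first, then highest count x recency
| order by isServerError desc, score desc
```

Treat the resulting order as a starting point for the P0 to P3 labels. A small 5xx cluster always ranks above a large 4xx cluster under this sketch, which may not match every team's notion of severity.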