foundry-agent/eval-datasets/references/eval-lineage.md
# Eval Lineage — Full Traceability from Production to Deployment

Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. This enables "why was this deployed?" audit queries and compliance reporting.

## Lineage Chain

```
Production Trace (App Insights)
  │ conversationId, responseId
  ▼
Dataset Version (.foundry/datasets/*.jsonl, environment-scoped)
  │ metadata.conversationId, metadata.harvestRule
  ▼
Evaluation Run (evaluation_agent_batch_eval_create)
  │ evaluationId when creating, evalId when querying, evalRunId
  ▼
Comparison (evaluation_comparison_create)
  │ insightId, baselineRunId, treatmentRunIds
  ▼
Deployment Decision (agent_update)
  │ agentVersion
  ▼
Production Trace (cycle repeats)
```

## Lineage Manifest

Track lineage in `.foundry/datasets/manifest.json`:

```json
{
  "datasets": [
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v3.jsonl",
      "version": "v3",
      "tag": "prod",
      "source": "trace-harvest",
      "harvestRule": "error+latency",
      "timeRange": "2025-02-01 to 2025-02-07",
      "exampleCount": 63,
      "createdAt": "2025-02-08T10:00:00Z",
      "evalRuns": [
        {
          "evalId": "eval-group-001",
          "runId": "run-abc-123",
          "agentVersion": "3",
          "date": "2025-02-08T12:00:00Z",
          "status": "completed"
        },
        {
          "evalId": "eval-group-001",
          "runId": "run-def-456",
          "agentVersion": "4",
          "date": "2025-02-10T09:00:00Z",
          "status": "completed"
        }
      ],
      "comparisons": [
        {
          "insightId": "insight-xyz-789",
          "baselineRunId": "run-abc-123",
          "treatmentRunIds": ["run-def-456"],
          "result": "v4 improved on 3/5 metrics",
          "date": "2025-02-10T10:00:00Z"
        }
      ],
      "deployments": [
        {
          "agentVersion": "4",
          "deployedAt": "2025-02-10T14:00:00Z",
          "reason": "v4 improved coherence +25%, relevance +10% vs v3"
        }
      ]
    }
  ]
}
```

## Audit Queries

### "Why was version X deployed?"

1. Read `.foundry/datasets/manifest.json`
2. Find entries where `deployments[].agentVersion == X`
3. Show the comparison that justified the deployment
4. Show the dataset and eval runs that informed the comparison

### "What traces led to this dataset?"

1. Read the dataset JSONL file
2. Extract `metadata.conversationId` from each example
3. Look up each conversation in App Insights using the [trace skill](../../trace/trace.md)

### "What evaluation history does this agent have?"

1. Use **`evaluation_get`** to list all evaluation groups
2. For each group, list runs with `isRequestForRuns=true`
3. Build the timeline from [Eval Trending](eval-trending.md)
4. Show comparisons from **`evaluation_comparison_get`**

### "Did this dataset version catch any regressions?"

1. Find the dataset version in the manifest
2. Check `evalRuns` for runs that used this dataset
3. Check `comparisons` for any regression results
4. Cross-reference with `tag == "regression-<date>"` entries

## Maintaining Lineage

Update `.foundry/datasets/manifest.json` at each step:

| Event | Fields to Update |
|-------|-----------------|
| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` |
| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` |
| Comparison | Append to `comparisons[]` with `insightId`, `result` |
| Deployment | Append to `deployments[]` with `agentVersion`, `reason` |
| Tag change | Update `tag` field |

> 💡 **Tip:** Store the evaluation group identifier as `evalId` in lineage/manifest records, even if the create call used the parameter name `evaluationId`.

## Next Steps

- **View metric trends** → [Eval Trending](eval-trending.md)
- **Check for regressions** → [Eval Regression](eval-regression.md)
- **Harvest new traces** → [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle)
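The audit queries above are simple manifest walks, so they can be scripted. A minimal Python sketch answering "Why was version X deployed?" against the manifest schema shown earlier (the function name `why_deployed` is a hypothetical helper, not part of any Foundry API):

```python
import json


def why_deployed(manifest_path: str, agent_version: str) -> list[dict]:
    """Answer "why was version X deployed?" from a lineage manifest.

    For each dataset, find deployments[] entries for the requested
    agent version, then collect the eval runs for that version and
    the comparisons in which those runs appeared as treatments.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)

    findings = []
    for dataset in manifest["datasets"]:
        for deployment in dataset.get("deployments", []):
            if deployment["agentVersion"] != agent_version:
                continue
            # Eval runs that exercised this agent version on this dataset
            runs = [
                r for r in dataset.get("evalRuns", [])
                if r["agentVersion"] == agent_version
            ]
            run_ids = {r["runId"] for r in runs}
            # Comparisons where one of those runs was a treatment
            comparisons = [
                c for c in dataset.get("comparisons", [])
                if run_ids & set(c.get("treatmentRunIds", []))
            ]
            findings.append({
                "dataset": dataset["name"],
                "deployedAt": deployment["deployedAt"],
                "reason": deployment["reason"],
                "evalRuns": runs,
                "comparisons": comparisons,
            })
    return findings
```

Given the example manifest, `why_deployed(path, "4")` returns one finding carrying the deployment reason, the `run-def-456` eval run, and the `insight-xyz-789` comparison — the full justification chain for the audit.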