foundry-agent/eval-datasets/references/eval-lineage.md
# Eval Lineage — Full Traceability from Production to Deployment

Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. This enables "why was this deployed?" audit queries and compliance reporting.

## Lineage Chain

```
Production Trace (App Insights)
  │ conversationId, responseId
  ▼
Dataset Version (.foundry/datasets/*.jsonl, environment-scoped)
  │ metadata.conversationId, metadata.harvestRule
  ▼
Evaluation Run (evaluation_agent_batch_eval_create)
  │ evaluationId when creating, evalId when querying, evalRunId
  ▼
Comparison (evaluation_comparison_create)
  │ insightId, baselineRunId, treatmentRunIds
  ▼
Deployment Decision (agent_update)
  │ agentVersion
  ▼
Production Trace (cycle repeats)
```

## Lineage Manifest

Track lineage in `.foundry/datasets/manifest.json`:

```json
{
  "datasets": [
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v3.jsonl",
      "version": "v3",
      "tag": "prod",
      "source": "trace-harvest",
      "harvestRule": "error+latency",
      "timeRange": "2025-02-01 to 2025-02-07",
      "exampleCount": 63,
      "createdAt": "2025-02-08T10:00:00Z",
      "evalRuns": [
        {
          "evalId": "eval-group-001",
          "runId": "run-abc-123",
          "agentVersion": "3",
          "date": "2025-02-08T12:00:00Z",
          "status": "completed"
        },
        {
          "evalId": "eval-group-001",
          "runId": "run-def-456",
          "agentVersion": "4",
          "date": "2025-02-10T09:00:00Z",
          "status": "completed"
        }
      ],
      "comparisons": [
        {
          "insightId": "insight-xyz-789",
          "baselineRunId": "run-abc-123",
          "treatmentRunIds": ["run-def-456"],
          "result": "v4 improved on 3/5 metrics",
          "date": "2025-02-10T10:00:00Z"
        }
      ],
      "deployments": [
        {
          "agentVersion": "4",
          "deployedAt": "2025-02-10T14:00:00Z",
          "reason": "v4 improved coherence +25%, relevance +10% vs v3"
        }
      ]
    }
  ]
}
```

## Audit Queries

### "Why was version X deployed?"

1. Read `.foundry/datasets/manifest.json`
2. Find entries where `deployments[].agentVersion == X`
3. Show the comparison that justified the deployment
4. Show the dataset and eval runs that informed the comparison

### "What traces led to this dataset?"

1. Read the dataset JSONL file
2. Extract `metadata.conversationId` from each example
3. Look up each conversation in App Insights using the [trace skill](../../trace/trace.md)

### "What evaluation history does this agent have?"

1. Use **`evaluation_get`** to list all evaluation groups
2. For each group, list runs with `isRequestForRuns=true`
3. Build the timeline from [Eval Trending](eval-trending.md)
4. Show comparisons from **`evaluation_comparison_get`**

### "Did this dataset version catch any regressions?"

1. Find the dataset version in the manifest
2. Check `evalRuns` for runs that used this dataset
3. Check `comparisons` for any regression results
4. Cross-reference with `tag == "regression-<date>"` entries

## Maintaining Lineage

Update `.foundry/datasets/manifest.json` at each step:

| Event | Fields to Update |
|-------|-----------------|
| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` |
| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` |
| Comparison | Append to `comparisons[]` with `insightId`, `result` |
| Deployment | Append to `deployments[]` with `agentVersion`, `reason` |
| Tag change | Update `tag` field |

> 💡 **Tip:** Store the evaluation group identifier as `evalId` in lineage/manifest records, even if the create call used the parameter name `evaluationId`.

## Next Steps

- **View metric trends** → [Eval Trending](eval-trending.md)
- **Check for regressions** → [Eval Regression](eval-regression.md)
- **Harvest new traces** → [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle)
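The audit queries above are simple manifest walks, so they can be scripted. A minimal Python sketch answering "Why was version X deployed?" against the manifest schema shown earlier (the function name `why_deployed` is a hypothetical helper, not part of any Foundry API):

```python
import json


def why_deployed(manifest_path: str, agent_version: str) -> list[dict]:
    """Answer "why was version X deployed?" from a lineage manifest.

    For each dataset, find deployments[] entries for the requested
    agent version, then collect the eval runs for that version and
    the comparisons in which those runs appeared as treatments.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)

    findings = []
    for dataset in manifest["datasets"]:
        for deployment in dataset.get("deployments", []):
            if deployment["agentVersion"] != agent_version:
                continue
            # Eval runs that exercised this agent version on this dataset
            runs = [
                r for r in dataset.get("evalRuns", [])
                if r["agentVersion"] == agent_version
            ]
            run_ids = {r["runId"] for r in runs}
            # Comparisons where one of those runs was a treatment
            comparisons = [
                c for c in dataset.get("comparisons", [])
                if run_ids & set(c.get("treatmentRunIds", []))
            ]
            findings.append({
                "dataset": dataset["name"],
                "deployedAt": deployment["deployedAt"],
                "reason": deployment["reason"],
                "evalRuns": runs,
                "comparisons": comparisons,
            })
    return findings
```

Given the example manifest, `why_deployed(path, "4")` returns one finding carrying the deployment reason, the `run-def-456` eval run, and the `insight-xyz-789` comparison — the full justification chain for the audit.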