foundry-agent/eval-datasets/references/dataset-curation.md
# Dataset Curation — Human-in-the-Loop Review

Review, annotate, and approve harvested trace candidates before including them in evaluation datasets. This adds a human review gate between raw trace extraction and finalized test cases, ensuring dataset quality.

## Workflow Overview

```
Raw Traces (from KQL harvest)
        │
        ▼
[1] Candidate File (unreviewed)
        │
        ▼
[2] Human Review (approve/edit/reject each)
        │
        ▼
[3] Approved Dataset (versioned, ready for eval)
```

## Step 1 — Generate Candidate File

After running a [trace harvest](trace-to-dataset.md), save candidates with a `status` field:

```
.foundry/datasets/<agent-name>-traces-candidates-<date>.jsonl
```

Each line includes a review status:

```json
{"query": "How do I reset my password?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error", "errorType": "TimeoutError", "duration": 12300}}
{"query": "What's the refund policy?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency", "duration": 8700}}
```

## Step 2 — Present for Review

Show candidates in a review table:

| # | Status | Query (preview) | Source | Error | Duration | Eval Score |
|---|--------|-----------------|--------|-------|----------|------------|
| 1 | ⏳ pending | "How do I reset my..." | error harvest | TimeoutError | 12.3s | — |
| 2 | ⏳ pending | "What's the refund..." | latency harvest | — | 8.7s | — |
| 3 | ⏳ pending | "Can you help me..." | low-eval harvest | — | 0.4s | 2.0 |

### Review Actions

For each candidate, the user can:

| Action | Result |
|--------|--------|
| **Approve** | Include in the dataset as-is |
| **Approve + Edit** | Include with a modified query, response, or ground_truth |
| **Add Ground Truth** | Approve and add the expected correct answer |
| **Reject** | Exclude from the dataset |
| **Flag** | Mark for later review |

### Batch Operations

- *"Approve all"* — include all pending candidates
- *"Approve all errors"* — include all candidates from the error harvest
- *"Reject duplicates"* — exclude candidates whose queries are similar to existing dataset entries
- *"Approve #1, #3, #5; reject #2, #4"* — selective approval by number

## Step 3 — Finalize Dataset

After review, filter the approved candidates and save them to a versioned dataset:

1. Read `.foundry/datasets/manifest.json` to find the latest version number
2. Filter candidates where `status == "approved"`
3. Remove the `status` field from the output
4. Save to `.foundry/datasets/<agent-name>-<source>-v<N>.jsonl`
5. Update `.foundry/datasets/manifest.json` with metadata

### Update Candidate Status

Mark the candidate file with final statuses:

```json
{"query": "How do I reset my password?", "status": "approved", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {...}}
{"query": "What's the refund policy?", "status": "rejected", "rejectReason": "duplicate of existing test case", "metadata": {...}}
{"query": "Can you help me...", "status": "approved", "metadata": {...}}
```

> 💡 **Tip:** Keep candidate files as an audit trail. They document what was reviewed, when, and why items were accepted or rejected.

## Quality Checks

Before finalizing, verify dataset quality:

| Check | Criteria |
|-------|----------|
| **No duplicates** | Ensure no query appears in both the new dataset and existing datasets |
| **Balanced categories** | Verify a reasonable distribution across categories (not all edge cases) |
| **Ground truth coverage** | Flag examples without `ground_truth` that may benefit from one |
| **Minimum size** | Warn if the dataset has fewer than 20 examples (may not be statistically meaningful) |
| **Safety coverage** | Ensure safety-related test cases are included if the agent handles sensitive topics |

## Next Steps

- **Version the approved dataset** → [Dataset Versioning](dataset-versioning.md)
- **Organize into splits** → [Dataset Organization](dataset-organization.md)
- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md)
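The five finalization steps in Step 3 can be sketched as a small script. This is a minimal sketch, not the skill's actual implementation: the `finalize_dataset` helper name and the manifest shape (`{"datasets": [{"name": ..., "version": ...}]}`) are assumptions; only the file paths and the approve/strip/version flow come from the steps above.

```python
import json
from pathlib import Path

DATASETS = Path(".foundry/datasets")

def finalize_dataset(candidate_file: str, agent: str, source: str) -> Path:
    """Filter approved candidates into the next versioned dataset file.

    Assumes a manifest of the form {"datasets": [{"name": ..., "version": N}]};
    the real manifest schema may differ.
    """
    manifest_path = DATASETS / "manifest.json"
    manifest = (json.loads(manifest_path.read_text())
                if manifest_path.exists() else {"datasets": []})

    # 1. Find the latest version number for this agent/source pair
    prefix = f"{agent}-{source}"
    versions = [d["version"] for d in manifest["datasets"] if d.get("name") == prefix]
    next_version = max(versions, default=0) + 1

    # 2–3. Keep approved candidates and drop the review-only "status" field
    approved = []
    with open(candidate_file, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record.get("status") == "approved":
                record.pop("status", None)
                approved.append(record)

    # 4. Write the versioned dataset as JSONL
    out_path = DATASETS / f"{prefix}-v{next_version}.jsonl"
    out_path.write_text(
        "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in approved),
        encoding="utf-8",
    )

    # 5. Record the new dataset in the manifest
    manifest["datasets"].append({"name": prefix, "version": next_version,
                                 "file": out_path.name, "size": len(approved)})
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_path
```

Because the candidate file is only read, it survives untouched as the audit trail described above.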
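Three of the quality checks above (duplicates, ground-truth coverage, minimum size) lend themselves to simple automation; category balance and safety coverage need domain judgment and stay with the reviewer. A hedged sketch — the `check_quality` helper is illustrative, and the duplicate check is whitespace-and-case normalization where real curation might use fuzzy or embedding similarity:

```python
def check_quality(new_rows: list[dict], existing_queries: set[str],
                  min_size: int = 20) -> list[str]:
    """Return human-readable warnings for a curated dataset."""
    warnings = []

    def normalize(q: str) -> str:
        # Collapse case and whitespace so trivial variants match
        return " ".join(q.lower().split())

    seen = {normalize(q) for q in existing_queries}
    for i, row in enumerate(new_rows, start=1):
        q = normalize(row["query"])
        if q in seen:
            warnings.append(f"#{i}: duplicate query {row['query']!r}")
        seen.add(q)
        if "ground_truth" not in row:
            warnings.append(f"#{i}: no ground_truth — consider adding one")

    if len(new_rows) < min_size:
        warnings.append(f"only {len(new_rows)} examples (< {min_size}); "
                        "results may not be statistically meaningful")
    return warnings
```

Running it over the approved rows before finalization surfaces problems while the candidate file is still easy to re-review.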