foundry-agent/observe/references/compare-iterate.md
# Steps 8–10 — Re-Evaluate, Compare Versions, Iterate

## Step 8 — Re-Evaluate

Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from the selected agent root's `.foundry/datasets/`) and evaluator bundle from the selected environment/evaluation suite. Update `agentVersion` to the new version. (See the example request sketch at the end of this page.)

> ⚠️ **Parameter switch reminder:** Re-evaluation creation uses `evaluationId`, but follow-up calls to `evaluation_get` and `evaluation_comparison_create` must use `evalId`.

> ⚠️ **Eval-group immutability:** Reuse the same `evaluationId` only when `evaluatorNames` and thresholds are unchanged. If you add/remove evaluators or change thresholds, create a new evaluation group first, then compare runs within that new group.

Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)).

## Step 9 — Compare Versions

> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`.

### Required Parameters for `evaluation_comparison_create`

| Parameter | Required | Description |
|-----------|----------|-------------|
| `insightRequest.displayName` | ✅ | Human-readable name. **Omitting causes BadRequest.** |
| `insightRequest.state` | ✅ | Must be `"NotStarted"` |
| `insightRequest.request.evalId` | ✅ | Eval group ID containing both runs |
| `insightRequest.request.baselineRunId` | ✅ | Run ID of the baseline |
| `insightRequest.request.treatmentRunIds` | ✅ | Array of treatment run IDs |

Use **`evaluation_comparison_create`** with a nested `insightRequest`:

```json
{
  "insightRequest": {
    "displayName": "V1 vs V2 Comparison",
    "state": "NotStarted",
    "request": {
      "type": "EvaluationComparison",
      "evalId": "<eval-group-id>",
      "baselineRunId": "<baseline-run-id>",
      "treatmentRunIds": ["<new-run-id>"]
    }
  }
}
```

> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8), but comparison requests and lookups use `evalId` for that same group identifier. That shared group assumes the evaluator bundle is fixed for all runs in the group.

Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep.

## Step 10 — Iterate or Finish

If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) → **Step 6** (optimize) → **Step 7** (deploy) → **Step 8** (re-evaluate) → **Step 9** (compare).

Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md).
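
## Appendix: Example Re-Evaluation Request (Step 8)

For reference, a minimal sketch of the Step 8 re-evaluation request body. Only `evaluationId` and `agentVersion` are parameter names confirmed above; `dataset` and `evaluators` are hypothetical placeholders for "same local test dataset" and "same evaluator bundle" and may not match the actual `evaluation_agent_batch_eval_create` schema, so check the tool's schema for the exact field names.

```json
{
  "evaluationId": "<same-eval-group-id-as-baseline>",
  "agentVersion": "<new-agent-version>",
  "dataset": "<path-under-.foundry/datasets/>",
  "evaluators": ["<same-evaluator-bundle-as-baseline>"]
}
```

Whatever the exact field names, the intent is the same: reusing the baseline's `evaluationId` places the new run in the same eval group, `agentVersion` points at the version deployed in Step 7, and the dataset and evaluator bundle stay identical to the baseline so the Step 9 comparison is apples-to-apples.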