foundry-agent/observe/references/compare-iterate.md
# Steps 8–10 — Re-Evaluate, Compare Versions, Iterate

## Step 8 — Re-Evaluate

Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from the selected agent root's `.foundry/datasets/`) and evaluator bundle from the selected environment/evaluation suite. Update `agentVersion` to the new version.

> ⚠️ **Parameter switch reminder:** Re-evaluation creation uses `evaluationId`, but follow-up calls to `evaluation_get` and `evaluation_comparison_create` must use `evalId`.

> ⚠️ **Eval-group immutability:** Reuse the same `evaluationId` only when `evaluatorNames` and thresholds are unchanged. If you add/remove evaluators or change thresholds, create a new evaluation group first, then compare runs within that new group.
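A minimal sketch of the re-evaluation request, assuming the tool takes flat arguments. Only `evaluationId` and `agentVersion` are named above; `dataset` and `evaluators` are hypothetical field names standing in for however your environment wires in the test dataset and evaluator bundle:

```json
{
  "evaluationId": "<same-eval-group-id-as-baseline>",
  "agentVersion": "<new-agent-version>",
  "dataset": ".foundry/datasets/<same-test-dataset>.jsonl",
  "evaluators": "<same-evaluator-bundle>"
}
```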
Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)).
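A hedged sketch of the polling lookup — note the switch from `evaluationId` to `evalId`, per the reminder above. Only `evalId` is confirmed by this guide; `runId` is an assumed field for scoping the lookup to the new run:

```json
{
  "evalId": "<same-eval-group-id>",
  "runId": "<new-run-id>"
}
```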
## Step 9 — Compare Versions

> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`.

### Required Parameters for `evaluation_comparison_create`

| Parameter | Required | Description |
|-----------|----------|-------------|
| `insightRequest.displayName` | ✅ | Human-readable name. **Omitting causes BadRequest.** |
| `insightRequest.state` | ✅ | Must be `"NotStarted"` |
| `insightRequest.request.evalId` | ✅ | Eval group ID containing both runs |
| `insightRequest.request.baselineRunId` | ✅ | Run ID of the baseline |
| `insightRequest.request.treatmentRunIds` | ✅ | Array of treatment run IDs |

Use **`evaluation_comparison_create`** with a nested `insightRequest`:

```json
{
  "insightRequest": {
    "displayName": "V1 vs V2 Comparison",
    "state": "NotStarted",
    "request": {
      "type": "EvaluationComparison",
      "evalId": "<eval-group-id>",
      "baselineRunId": "<baseline-run-id>",
      "treatmentRunIds": ["<new-run-id>"]
    }
  }
}
```

> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8), but comparison requests and lookups use `evalId` for that same group identifier. That shared group assumes the evaluator bundle is fixed for all runs in the group.

Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep.
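A sketch of that retrieval call. Only `insightId` is confirmed above; if the tool also requires a group identifier, remember it would be `evalId`, not `evaluationId` — treat any such extra field as an assumption:

```json
{
  "insightId": "<insight-id-from-comparison-create-response>"
}
```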
## Step 10 — Iterate or Finish

If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) → **Step 6** (optimize) → **Step 7** (deploy) → **Step 8** (re-evaluate) → **Step 9** (compare).

Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md).