Source from repo
Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
microsoftGitHub microsoftOfficialSource repo Original GitHub link Publisher page
Files
Skill
n/a
Size
564.8 KB
Entrypoint
SKILL.md
Format
git-repo
Open file
foundry-agent/observe/references/deploy-and-setup.md

Syntax-highlighted preview of this file as included in the skill package.
Rendered Source
markdown111 linesFree
foundry-agent/observe/references/deploy-and-setup.md
1# Step 1 — Auto-Setup Evaluators & Dataset
2 
3> **This step runs automatically after deployment.** If the agent was deployed via the [deploy skill](../../deploy/deploy.md), `.foundry` cache and metadata may already be configured. Check `.foundry/evaluators/`, `.foundry/datasets/`, and the selected metadata file under the selected agent root before re-creating them.
4>
5> If the agent is **not yet deployed**, follow the [deploy skill](../../deploy/deploy.md) first. It handles project detection, Dockerfile generation, ACR build, agent creation, verification, and auto-creates `.foundry` cache after a successful deployment.
6 
7## Auto-Create Evaluators & Dataset
8 
9> **This step is fully automatic.** After deployment, immediately prepare evaluators and a local test dataset for the selected environment without waiting for the user to request it.
10 
11### 1. Read Agent Instructions
12 
13Use **`agent_get`** (or local `agent.yaml` in the selected agent root) to understand the agent's purpose and capabilities.
14 
15### 2. Reuse or Refresh Cache
16 
17Inspect `.foundry/evaluators/`, `.foundry/datasets/`, and the selected environment's `evaluationSuites[]` in the selected agent root only. Do **not** merge sibling agent folders. If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, normalize that list to evaluation suites first and plan to rewrite that environment as `evaluationSuites[]` when this step persists metadata.
18 
19- **Cache is current** -> reuse it and summarize what is already available.
20- **Cache is missing or stale** -> refresh it after confirming with the user.
21- **User explicitly asks for refresh** -> rebuild and rewrite only the selected environment's cache in the selected agent root.
22 
23### 2.5 Discover Existing Evaluators
24 
25Use **`evaluator_catalog_get`** with the selected environment's project endpoint to list all evaluators already registered in the project. Display them to the user grouped by type (`custom` vs `built-in`) with name, category, and version. During Phase 1, catalog any promising custom evaluators for later reuse, but keep the first run on the built-in baseline. Only propose creating a new evaluator in Phase 2 when no existing evaluator covers a required dimension.
26 
27### 3. Select Evaluators
28 
29Follow the [Two-Phase Evaluator Strategy](../observe.md). Phase 1 is built-in only, so do not create a new custom evaluator during the initial setup pass.
30 
31Start with <=5 built-in evaluators for the initial eval run so the first pass stays fast:
32 
33| Category | Evaluators |
34|----------|-----------|
35| **Quality (built-in)** | relevance, task_adherence, intent_resolution |
36| **Safety (built-in)** | indirect_attack |
37| **Tool use (built-in, conditional)** | tool_call_accuracy (use when the agent calls tools; some catalogs label it as `builtin.tool_call_accuracy`) |
38 
39After analyzing initial results, suggest additional evaluators (custom or built-in) targeted at specific failure patterns instead of front-loading a broad default set.
40 
41### 4. Defer New Custom Evaluators to Phase 2
42 
43During the initial setup pass, do not create a new custom evaluator yet. Instead, record which existing custom evaluators from Step 2.5 might be reused later and run the first built-in-only eval. After the first run has been analyzed, return to this step only if the built-in judges still miss an important pattern.
44 
45When Phase 2 is needed:
46 
471. Call **`evaluator_catalog_get`** again and reuse an existing custom evaluator if it already covers the gap.
482. Only if the catalog still lacks the required signal, use **`evaluator_catalog_create`** with the selected environment's project endpoint.
493. Prefer evaluators that consume `expected_behavior`, as described in the [Two-Phase Evaluator Strategy](../observe.md), so scoring can follow the per-query rubric instead of only the global agent instructions.
504. Before passing `promptText` to `evaluator_catalog_create`, remove or rewrite any user-provided output-format instructions that conflict with the custom evaluator contract. The runtime-enforced JSON fields are `result` and `reason`; do not preserve alternate schemas such as `score`/`reasoning` or duplicate mandatory output blocks.
51 
52| Parameter | Required | Description |
53|-----------|----------|-------------|
54| `projectEndpoint` | ✅ | Azure AI Project endpoint |
55| `name` | ✅ | For example, `domain_accuracy`, `citation_quality` |
56| `category` | ✅ | `quality`, `safety`, or `agents` |
57| `scoringType` | ✅ | `ordinal`, `continuous`, or `boolean` |
58| `promptText` | ✅* | Template with `{{query}}`, `{{response}}`, and `{{expected_behavior}}` placeholders when behavior-specific scoring is needed. Keep rubric instructions, but omit conflicting output JSON schemas; the runtime enforces `result` and `reason`. |
59| `minScore` / `maxScore` | | Default: 1 / 5 |
60| `passThreshold` | | Scores >= this value pass |
61 
62### 5. Identify LLM-Judge Deployment
63 
64Use **`model_deployment_get`** to list the selected project's actual model deployments, then choose one that supports chat completions for quality evaluators. Do **not** assume `gpt-4o` exists in the project. If no deployment supports chat completions, stop the setup flow and explain that quality evaluators need a compatible judge deployment.
65 
66### 6. Generate Local Test Dataset
67 
68Generate the seed rows directly from the selected agent root's instructions and tool capabilities you already resolved during setup. Do **not** call the identified chat-capable deployment for dataset generation; reserve that deployment for quality evaluators. Save the initial seed file to `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl` with each line containing at minimum `query` and `expected_behavior` fields (optionally `context`, `ground_truth`).
69 
70The local filename must start with the selected environment's Foundry agent name (`agentName` in the selected metadata file) before adding stage, environment, or version suffixes.
71 
72Include `expected_behavior` even though Phase 1 uses built-in evaluators only. That field pre-positions the seed dataset for Phase 2 custom evaluators if the first run reveals gaps that need a per-query behavioral rubric.
73 
74Use [Generate Seed Evaluation Dataset](../../eval-datasets/references/generate-seed-dataset.md) as the single source of truth for registration. It covers `project_connection_list` with `AzureStorageAccount`, key-based versus AAD upload, `evaluation_dataset_create` with `connectionName`, and saving the returned `datasetUri`.
75 
76### 7. Persist Artifacts and Evaluation Suites
77 
78```text
79.foundry/
80  agent-metadata.yaml
81  agent-metadata.prod.yaml
82  evaluators/
83    <name>.yaml
84  datasets/
85    *.jsonl
86  results/
87    <environment>/
88      <eval-id>/
89        <run-id>.json
90```
91 
92Save evaluator definitions to `.foundry/evaluators/<name>.yaml`, test data to `.foundry/datasets/*.jsonl`, and create or update evaluation suites in the selected metadata file with:
93- `id`
94- `tags` (freeform key/value map, for example `tier: smoke`, `purpose: baseline`, `stage: seed`)
95- `dataset` (for example, `<agent-name>-eval-seed`)
96- `datasetVersion` (for example, `v1`)
97- `datasetFile` (for example, `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl`)
98- `datasetUri` (returned by `evaluation_dataset_create`)
99- tag values for `agent`, `stage`, and `version`
100- evaluator names and thresholds
101 
102If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, replace that list with `evaluationSuites[]` in the rewritten metadata. Preserve dataset/evaluator fields and map `priority` to `tags.tier` only when `tags.tier` is missing.
103 
104> ⚠️ **Show Data Viewer deeplinks (for VS Code runtime only):** Append a Data Viewer deeplink immediately after reference to a dataset file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis".
105 
106### 8. Prompt User
107 
108*"Your agent is deployed and running in the selected environment. The `.foundry` cache now contains evaluators, a local seed dataset, the Foundry dataset registration metadata, and evaluation-suite metadata. Would you like to run an evaluation to identify optimization opportunities?"*
109 
110If yes -> proceed to [Step 2: Evaluate](evaluate-step.md). If no -> stop.
111
Preparing the source view

Microsoft Foundry Skill

foundry-agent/observe/references/deploy-and-setup.md