Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

finetuning/references/dataset-formats.md

# Dataset Formats

## SFT Format (Supervised Fine-Tuning)

Standard chat-completion JSONL. Each line: JSON object with `messages` array.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
```

**Rules:**
- Each line must be valid JSON
- `messages` must contain at least one `user` and one `assistant` message
- `system` message is optional but recommended
- Multi-turn supported: alternate `user`/`assistant`
- Last message must be `assistant` (that's what the model learns)

**Validation checklist:** `.jsonl` extension, valid JSON per line, every example has `messages`, every message has `role` and `content`, no empty `content`.
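The rules and checklist above can be enforced locally before upload. The sketch below is a minimal Python validator that assumes one JSON object per line and checks only the rules listed here (it does not inspect optional features such as tool calls). `validate_sft_line` and `validate_sft_file` are hypothetical helper names, not part of any Foundry SDK.

```python
import json
from pathlib import Path


def validate_sft_line(line: str) -> list[str]:
    """Return the problems found in one SFT JSONL line (an empty list means it passed)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            problems.append(f"message {i}: needs both 'role' and 'content'")
        elif not msg["content"]:
            problems.append(f"message {i}: empty content")

    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles or "assistant" not in roles:
        problems.append("needs at least one 'user' and one 'assistant' message")
    if roles and roles[-1] != "assistant":
        problems.append("last message must be 'assistant'")
    return problems


def validate_sft_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; key 0 holds file-level issues."""
    p = Path(path)
    if p.suffix != ".jsonl":
        return {0: ["file must use the .jsonl extension"]}
    report = {}
    with p.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # A blank line is not valid JSON, so json.loads flags it too.
            problems = validate_sft_line(line)
            if problems:
                report[lineno] = problems
    return report
```

For example, `validate_sft_file("train.jsonl")` returns an empty dict when every line passes.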

## DPO Format (Direct Preference Optimization)

Each line has three top-level fields: `input`, `preferred_output`, `non_preferred_output`.

```jsonl
{"input": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain gravity."}]}, "preferred_output": [{"role": "assistant", "content": "Gravity is a fundamental force that attracts objects with mass toward each other."}], "non_preferred_output": [{"role": "assistant", "content": "Gravity is when stuff falls down."}]}
```

**Rules:**
- `input`: Object with `messages` array (system + user turns). May include `tools` and `parallel_tool_calls`.
- `preferred_output` / `non_preferred_output`: Array of messages (`assistant` or `tool` role only)
- Both must contain at least one `assistant` message
- Exactly two completions are compared per example
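A minimal Python sketch of the DPO rules above, assuming each JSONL line parses to a dict shaped like the example. The helper names are hypothetical, and reporting extra top-level fields is a stricter-than-documented choice you may want to drop.

```python
import json

DPO_FIELDS = {"input", "preferred_output", "non_preferred_output"}


def validate_dpo_record(record: dict) -> list[str]:
    """Check one parsed DPO example against the rules; return the problems found."""
    problems = []
    inp = record.get("input")
    if not isinstance(inp, dict) or not isinstance(inp.get("messages"), list):
        problems.append("'input' must be an object with a 'messages' array")
    for key in ("preferred_output", "non_preferred_output"):
        output = record.get(key)
        if not isinstance(output, list) or not output:
            problems.append(f"'{key}' must be a non-empty array of messages")
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in output]
        if any(role not in ("assistant", "tool") for role in roles):
            problems.append(f"'{key}' may only contain 'assistant' or 'tool' messages")
        if "assistant" not in roles:
            problems.append(f"'{key}' needs at least one 'assistant' message")
    # Stricter choice: anything beyond the three documented fields is reported.
    extra = set(record) - DPO_FIELDS
    if extra:
        problems.append(f"unexpected top-level fields: {sorted(extra)}")
    return problems


def validate_dpo_line(line: str) -> list[str]:
    """Parse one JSONL line, then validate it."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["each line must be a JSON object"]
    return validate_dpo_record(record)
```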

**DPO REST API example:**
```json
{
  "model": "gpt-4.1-mini-2025-04-14",
  "training_file": "file-abc123",
  "method": {
    "type": "dpo",
    "dpo": { "beta": 0.1, "l2_multiplier": 0.1 }
  }
}
```

## RFT Format (Reinforcement Fine-Tuning)

Chat-completion format with key differences from SFT:

```jsonl
{"messages": [{"role": "user", "content": "Write a Python function to reverse a string."}], "reference_code": "def reverse_string(s):\n    return s[::-1]", "expected_output": "olleh"}
```

**Rules:**
- Last message **MUST** be `user` role (the model generates its own response)
- Extra fields alongside `messages` are accessible to the grader via `item.*`
- Both training and validation datasets are **required**
- ⚠️ Do NOT put `assistant` as the last message: unlike SFT, RFT generates its own outputs
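The RFT rules can be checked the same way. This sketch (hypothetical helper names) verifies that each prompt ends on a `user` turn, shows which extra fields a grader would see via `item.*`, and treats an empty training or validation split as an error.

```python
import json


def validate_rft_line(line: str) -> list[str]:
    """Check one RFT JSONL line: a prompt whose last message has the 'user' role."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "user":
        return ["last message must have the 'user' role"]
    return []


def extra_fields(line: str) -> dict:
    """Return the fields stored alongside 'messages' (what a grader reads as item.<name>)."""
    record = json.loads(line)
    return {key: value for key, value in record.items() if key != "messages"}


def check_rft_splits(train_lines: list[str], validation_lines: list[str]) -> list[str]:
    """RFT needs both splits; report an empty split or any invalid line."""
    problems = []
    for name, lines in (("training", train_lines), ("validation", validation_lines)):
        if not lines:
            problems.append(f"{name} dataset is required but empty")
        for lineno, line in enumerate(lines, start=1):
            problems += [f"{name} line {lineno}: {p}" for p in validate_rft_line(line)]
    return problems
```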

**API version**: Python graders require `api-version=2025-04-01-preview` or later.

**Grader types:** `string_check` (exact match), `text_similarity` (fuzzy/BLEU/ROUGE), `python` (custom function), `score_model` (LLM judge), `multi` (weighted combination).
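To build intuition for how these grader types differ, here is a local sketch in plain Python. These are stand-ins, not Foundry's implementations: `difflib`'s similarity ratio substitutes for BLEU/ROUGE, and the `multi` combination is assumed to be a weighted average normalized by total weight. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def exact_match(output: str, reference: str) -> float:
    """Stand-in for string_check: 1.0 on an exact match, else 0.0."""
    return 1.0 if output == reference else 0.0


def fuzzy_similarity(output: str, reference: str) -> float:
    """Stand-in for text_similarity using difflib's ratio (not BLEU or ROUGE), in [0, 1]."""
    return SequenceMatcher(None, output, reference).ratio()


def weighted_combination(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed reading of multi: weighted average of sub-scores, normalized by total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weight for name, weight in weights.items()) / total
```

With weights `{"exact": 0.25, "fuzzy": 0.75}`, sub-scores of 1.0 and 0.5 combine to 0.625. Exact match is strict (a trailing space scores 0.0), which is why a fuzzy component is often blended in.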
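
**Python grader template:**
```python
from difflib import SequenceMatcher


def grade(sample, item):
    """
    sample: dict with 'output_text' (model's generation)
    item: dict with extra fields from JSONL
    Returns: float 0.0–1.0
    """
    output = sample.get("output_text", "")
    reference = item.get("reference_code", "")
    if not output or not reference:
        return 0.0
    # Placeholder metric: textual similarity to the reference (0.0 to 1.0).
    # Replace with task-specific scoring, e.g. executing and testing the code.
    return SequenceMatcher(None, output.strip(), reference.strip()).ratio()
```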
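For the `reference_code` example, a grader can score behavior instead of text. This sketch assumes the model's answer defines a function with the same name as the reference (`reverse_string`), possibly wrapped in a fenced code block; the probe inputs in `PROBES` and the helper names are hypothetical. It runs model output with `exec`, relying on the grader environment's limits (no network, capped memory and runtime) rather than providing any isolation itself.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # built this way so the pattern cannot close a Markdown fence
CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)

# Hypothetical probe inputs for the reverse-string task; choose inputs that fit your task.
PROBES = ["", "a", "hello", "racecar", "Hello, World!"]


def _extract_code(text: str) -> str:
    """Use the first fenced code block if the model wrapped its answer in one."""
    match = CODE_BLOCK.search(text)
    return match.group(1) if match else text


def _function_name(source: str) -> Optional[str]:
    """Name of the first top-level function defined in the reference code."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            return node.name
    return None


def grade(sample, item):
    """Score = fraction of probe inputs where the generated function matches the reference."""
    output = _extract_code(sample.get("output_text", ""))
    reference = item.get("reference_code", "")
    name = _function_name(reference)
    if name is None:
        return 0.0
    ref_ns, out_ns = {}, {}
    try:
        exec(reference, ref_ns)
        exec(output, out_ns)  # untrusted model code
    except Exception:
        return 0.0  # generated code failed to compile or raised while loading
    candidate = out_ns.get(name)
    if not callable(candidate):
        return 0.0  # model did not define a function with the expected name
    passed = 0
    for probe in PROBES:
        try:
            if candidate(probe) == ref_ns[name](probe):
                passed += 1
        except Exception:
            pass  # a crash on one probe counts as a failure for that probe
    return passed / len(PROBES)
```

Scores are fractions of passed probes, so an answer that returns its input unchanged still scores 0.6 here (the empty string, `"a"`, and `"racecar"` are palindromes). Pick probes that separate correct from plausible-but-wrong answers.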

**Python grader constraints:** 256 KB code max, no network, 2 GB memory, 1 GB disk, 2 min timeout.

**Grader field access:**
- `sample.output_text` → model's generation
- `sample.output_json` → structured output (if using `response_format`)
- `item.*` → extra JSONL fields
- Template variables: `{{item.field_name}}`; no spaces inside braces, no array indexing
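The template-variable rules are easy to lint before submitting a grader configuration. This sketch is a local approximation, not Foundry's renderer: it assumes placeholders take the form `{{sample.<name>}}` or `{{item.<name>}}` with a plain identifier after the dot, and it flags anything containing spaces or `[index]` access. The function names are hypothetical.

```python
import re

# Strict form: {{item.field}} or {{sample.field}}, no spaces, no [index] access.
TEMPLATE_VAR = re.compile(r"\{\{(item|sample)\.([A-Za-z_][A-Za-z0-9_]*)\}\}")


def render(template: str, sample: dict, item: dict) -> str:
    """Substitute strict placeholders; a missing field raises KeyError."""
    sources = {"sample": sample, "item": item}

    def substitute(match: re.Match) -> str:
        scope, field = match.group(1), match.group(2)
        return str(sources[scope][field])

    return TEMPLATE_VAR.sub(substitute, template)


def malformed_placeholders(template: str) -> list[str]:
    """List {{...}} spans that do not match the strict pattern (e.g. spaces, [0])."""
    return [m.group(0) for m in re.finditer(r"\{\{.*?\}\}", template)
            if not TEMPLATE_VAR.fullmatch(m.group(0))]
```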

## Converting Between Formats

- **SFT → RFT**: Strip trailing assistant messages (the RFT last message must be `user`) and add the reference fields your grader reads. Use `scripts/convert_dataset.py --format rft`.
- **SFT → DPO**: Keep the existing assistant response as the preferred output and generate rejected responses (run the base model on the same prompts, intentionally degrade good outputs, or use human ranking).
- **DPO → SFT**: Append the `preferred_output` messages to the `input` messages to form the SFT `messages` array.
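The conversions above can be sketched as plain dict transforms. These helpers are hypothetical and are not how `scripts/convert_dataset.py` is implemented; `reference_answer` is an assumed grader field name, and `sft_to_dpo` expects you to supply the rejected response from one of the strategies listed.

```python
import copy


def sft_to_rft(record: dict, reference_field: str = "reference_answer") -> dict:
    """Drop trailing assistant turns so the prompt ends on 'user'; keep the final answer as a grader field."""
    messages = copy.deepcopy(record["messages"])
    answer = None
    while messages and messages[-1]["role"] == "assistant":
        last = messages.pop()
        if answer is None:  # the first popped message is the conversation's final answer
            answer = last["content"]
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("no user turn left to prompt with")
    converted = {"messages": messages}
    if answer is not None:
        converted[reference_field] = answer
    return converted


def sft_to_dpo(record: dict, rejected: str) -> dict:
    """Use the SFT answer as preferred and a caller-supplied response as non-preferred."""
    messages = copy.deepcopy(record["messages"])
    chosen = messages.pop()
    if chosen["role"] != "assistant":
        raise ValueError("SFT record must end with an assistant message")
    return {
        "input": {"messages": messages},
        "preferred_output": [chosen],
        "non_preferred_output": [{"role": "assistant", "content": rejected}],
    }


def dpo_to_sft(record: dict) -> dict:
    """Append the preferred completion to the prompt (input-level 'tools' settings are dropped)."""
    prompt = copy.deepcopy(record["input"]["messages"])
    return {"messages": prompt + copy.deepcopy(record["preferred_output"])}
```

Deep copies keep the source records unchanged, so one SFT dataset can feed several conversions.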