Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services

finetuning/references/dataset-formats.md

# Dataset Formats

## SFT Format (Supervised Fine-Tuning)

Standard chat-completion JSONL. Each line: JSON object with `messages` array.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
```

**Rules:**
- Each line must be valid JSON
- `messages` must contain at least one `user` and one `assistant` message
- `system` message is optional but recommended
- Multi-turn supported: alternate `user`/`assistant`
- Last message must be `assistant` (that's what the model learns)

**Validation checklist:** `.jsonl` extension, valid JSON per line, every example has `messages`, every message has `role` and `content`, no empty `content`.

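The SFT rules and checklist above can be sketched as a per-line check. This is a minimal illustration, not part of the skill package: the function name `validate_sft_line` and the wording of the problem messages are assumptions made for the example.

```python
import json


def validate_sft_line(line: str) -> list[str]:
    """Return a list of problems for one JSONL line (empty list means OK).

    Hypothetical helper: it only encodes the SFT rules listed above.
    """
    try:
        example = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(example, dict):
        return ["line is not a JSON object"]

    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty `messages` array"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            problems.append(f"message {i}: needs `role` and `content`")
        elif not msg["content"]:
            problems.append(f"message {i}: empty `content`")

    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles or "assistant" not in roles:
        problems.append("need at least one `user` and one `assistant` message")
    if roles and roles[-1] != "assistant":
        problems.append("last message must be `assistant`")
    return problems


line = '{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}'
print(validate_sft_line(line))
```

Running it over every line of a `.jsonl` file before upload catches format errors locally; it does not replace the service's own validation.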
## DPO Format (Direct Preference Optimization)

Three top-level fields: `input`, `preferred_output`, `non_preferred_output`.

```jsonl
{"input": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain gravity."}]}, "preferred_output": [{"role": "assistant", "content": "Gravity is a fundamental force that attracts objects with mass toward each other."}], "non_preferred_output": [{"role": "assistant", "content": "Gravity is when stuff falls down."}]}
```

**Rules:**
- `input`: Object with `messages` array (system + user turns). May include `tools` and `parallel_tool_calls`.
- `preferred_output` / `non_preferred_output`: Array of messages (`assistant` or `tool` role only)
- Both must contain at least one `assistant` message
- Exactly two completions compared per example

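A rough structural check for one DPO example, following only the rules listed above. The function name `validate_dpo_example` and its messages are hypothetical; the service may enforce additional constraints not covered here.

```python
import json


def validate_dpo_example(example: dict) -> list[str]:
    """Return a list of problems for one parsed DPO example (empty means OK)."""
    problems = []

    inp = example.get("input")
    if not isinstance(inp, dict) or not isinstance(inp.get("messages"), list):
        problems.append("`input` must be an object with a `messages` array")

    for field in ("preferred_output", "non_preferred_output"):
        output = example.get(field)
        if not isinstance(output, list) or not output:
            problems.append(f"`{field}` must be a non-empty array of messages")
            continue
        roles = [m.get("role") for m in output if isinstance(m, dict)]
        if any(role not in ("assistant", "tool") for role in roles):
            problems.append(f"`{field}`: only `assistant` or `tool` roles allowed")
        if "assistant" not in roles:
            problems.append(f"`{field}`: needs at least one `assistant` message")
    return problems


line = (
    '{"input": {"messages": [{"role": "user", "content": "Explain gravity."}]}, '
    '"preferred_output": [{"role": "assistant", "content": "Gravity attracts masses."}], '
    '"non_preferred_output": [{"role": "assistant", "content": "Stuff falls down."}]}'
)
print(validate_dpo_example(json.loads(line)))
```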
**DPO REST API example:**
```json
{
  "model": "gpt-4.1-mini-2025-04-14",
  "training_file": "file-abc123",
  "method": {
    "type": "dpo",
    "dpo": { "beta": 0.1, "l2_multiplier": 0.1 }
  }
}
```

## RFT Format (Reinforcement Fine-Tuning)

Chat-completion format with key differences from SFT:

```jsonl
{"messages": [{"role": "user", "content": "Write a Python function to reverse a string."}], "reference_code": "def reverse_string(s):\n    return s[::-1]", "expected_output": "olleh"}
```

**Rules:**
- Last message **MUST** be `user` role (model generates its own response)
- Extra fields alongside `messages` are accessible to grader via `item.*`
- Both training and validation datasets are **required**
- ⚠️ Do NOT put `assistant` as last message: unlike SFT, RFT generates its own outputs

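To make the first two RFT rules concrete, the sketch below checks that an example ends on a `user` turn and returns the extra fields a grader would see under `item.*`. The helper name `grader_fields` is an assumption for illustration only.

```python
def grader_fields(example: dict) -> dict:
    """Check the last-message rule and return the fields exposed as `item.*`.

    Hypothetical helper: treats every top-level key except `messages`
    as a grader-visible extra field, as described in the rules above.
    """
    messages = example.get("messages") or []
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("RFT: last message must have role `user`")
    return {key: value for key, value in example.items() if key != "messages"}


example = {
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string."}],
    "reference_code": "def reverse_string(s):\n    return s[::-1]",
    "expected_output": "olleh",
}
print(sorted(grader_fields(example)))
```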
**API version:** Python graders require `api-version=2025-04-01-preview` or later.

**Grader types:** `string_check` (exact match), `text_similarity` (fuzzy/BLEU/ROUGE), `python` (custom function), `score_model` (LLM judge), `multi` (weighted combination).

**Python grader template:**
```python
def grade(sample, item):
    """
    sample: dict with 'output_text' (model's generation)
    item: dict with extra fields from JSONL
    Returns: float 0.0–1.0
    """
    output = sample.get("output_text", "")
    reference = item.get("reference_code", "")
    # Placeholder scoring (exact match after trimming whitespace); replace with your own logic.
    score = 1.0 if reference and output.strip() == reference.strip() else 0.0
    return score
```

**Python grader constraints:** 256KB code max, no network, 2GB memory, 1GB disk, 2min timeout.

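Before uploading a Python grader, it can help to smoke-test it locally. The sketch below is one possible way to do that and is not how the hosted sandbox works: `smoke_test` and `GRADER_SOURCE` are hypothetical names, the example grader is illustrative, and reading "256KB" as 256 × 1024 bytes is an assumption.

```python
GRADER_SOURCE = '''
def grade(sample, item):
    output = sample.get("output_text", "")
    expected = item.get("expected_output", "")
    return 1.0 if expected and expected in output else 0.0
'''

MAX_CODE_BYTES = 256 * 1024  # assumption: "256KB" interpreted as 256 * 1024 bytes


def smoke_test(source: str, sample: dict, item: dict) -> float:
    """Run a grader once locally and check the size and score-range rules."""
    if len(source.encode("utf-8")) > MAX_CODE_BYTES:
        raise ValueError("grader source exceeds the 256KB limit")
    namespace: dict = {}
    exec(source, namespace)  # local check only; no sandboxing is applied here
    score = float(namespace["grade"](sample, item))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside 0.0-1.0")
    return score


print(smoke_test(GRADER_SOURCE, {"output_text": "olleh"}, {"expected_output": "olleh"}))
```

This does not enforce the network, memory, disk, or timeout limits; it only confirms that the grader loads, runs on one record, and returns a score in range.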
**Grader field access:**
- `sample.output_text` → model's generation
- `sample.output_json` → structured output (if using response_format)
- `item.*` → extra JSONL fields
- Template variables: `{{item.field_name}}` (no spaces inside braces, no array indexing)

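The two template-variable restrictions above are easy to violate by habit. A small lint pass, sketched here with the hypothetical helper `lint_template`, flags both cases in a grader prompt or reference string.

```python
import re

# Matches anything written between double braces, e.g. {{item.field_name}}.
TEMPLATE_VAR = re.compile(r"\{\{(.*?)\}\}")


def lint_template(template: str) -> list[str]:
    """Flag template variables that contain whitespace or array indexing."""
    problems = []
    for inner in TEMPLATE_VAR.findall(template):
        variable = "{{" + inner + "}}"
        if re.search(r"\s", inner):
            problems.append(f"{variable}: spaces inside braces")
        if "[" in inner:
            problems.append(f"{variable}: array indexing is not supported")
    return problems


print(lint_template("Expected: {{item.expected_output}}"))
print(lint_template("Expected: {{ item.expected_output }}"))
```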
## Converting Between Formats

- **SFT → RFT**: Strip the trailing assistant message(s) so the last message is `user`, then add grader reference fields. Use `scripts/convert_dataset.py --format rft`.
- **SFT → DPO**: Keep the existing assistant response as `preferred_output` and generate the `non_preferred_output` (run base model on same prompts, intentionally degrade good outputs, or use human ranking).
- **DPO → SFT**: Append the `preferred_output` messages to `input.messages` and drop `non_preferred_output`.
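Two of these conversions are mechanical and can be sketched in a few lines. This is an illustration under the formats described in this file, not the implementation of `scripts/convert_dataset.py` (which is not shown here); the function names `sft_to_rft` and `dpo_to_sft` are hypothetical.

```python
def sft_to_rft(example: dict, extra_fields: dict) -> dict:
    """Drop trailing assistant turns and attach grader reference fields."""
    messages = list(example["messages"])
    while messages and messages[-1]["role"] == "assistant":
        messages.pop()
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("no trailing `user` message left after stripping")
    return {"messages": messages, **extra_fields}


def dpo_to_sft(example: dict) -> dict:
    """Build an SFT example from the input messages plus the preferred output."""
    return {"messages": example["input"]["messages"] + example["preferred_output"]}


sft = {"messages": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]}
print(sft_to_rft(sft, {"expected_output": "4"}))

dpo = {
    "input": {"messages": [{"role": "user", "content": "Explain gravity."}]},
    "preferred_output": [{"role": "assistant", "content": "Gravity attracts masses."}],
    "non_preferred_output": [{"role": "assistant", "content": "Stuff falls down."}],
}
print(dpo_to_sft(dpo)["messages"][-1]["role"])
```

SFT → DPO is left out on purpose: it needs a source of non-preferred responses, which cannot be derived from the SFT file alone.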