Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
finetuning/references/dataset-formats.md
# Dataset Formats

## SFT Format (Supervised Fine-Tuning)

Standard chat-completion JSONL. Each line is a JSON object with a `messages` array.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
```

**Rules:**
- Each line must be valid JSON
- `messages` must contain at least one `user` and one `assistant` message
- `system` message is optional but recommended
- Multi-turn is supported: alternate `user`/`assistant` messages
- Last message must be `assistant` (that's what the model learns)

**Validation checklist:** `.jsonl` extension, valid JSON per line, every example has `messages`, every message has `role` and `content`, no empty `content`.

## DPO Format (Direct Preference Optimization)

Three top-level fields: `input`, `preferred_output`, `non_preferred_output`.

```jsonl
{"input": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain gravity."}]}, "preferred_output": [{"role": "assistant", "content": "Gravity is a fundamental force that attracts objects with mass toward each other."}], "non_preferred_output": [{"role": "assistant", "content": "Gravity is when stuff falls down."}]}
```

**Rules:**
- `input`: Object with `messages` array (system + user turns). May include `tools` and `parallel_tool_calls`.
- `preferred_output` / `non_preferred_output`: Array of messages (`assistant` or `tool` role only)
- Both must contain at least one `assistant` message
- Exactly two completions compared per example

**DPO REST API example:**
```json
{
  "model": "gpt-4.1-mini-2025-04-14",
  "training_file": "file-abc123",
  "method": {
    "type": "dpo",
    "dpo": { "beta": 0.1, "l2_multiplier": 0.1 }
  }
}
```

## RFT Format (Reinforcement Fine-Tuning)

Chat-completion format with key differences from SFT:

```jsonl
{"messages": [{"role": "user", "content": "Write a Python function to reverse a string."}], "reference_code": "def reverse_string(s):\n return s[::-1]", "expected_output": "olleh"}
```

**Rules:**
- Last message **MUST** be `user` role (model generates its own response)
- Extra fields alongside `messages` are accessible to the grader via `item.*`
- Both training and validation datasets are **required**
- ⚠️ Do NOT put `assistant` as the last message: unlike SFT, RFT generates its own outputs

**API version**: Python graders require `api-version=2025-04-01-preview` or later.

**Grader types:** `string_check` (exact match), `text_similarity` (fuzzy/BLEU/ROUGE), `python` (custom function), `score_model` (LLM judge), `multi` (weighted combination).

**Python grader template:**
```python
def grade(sample, item):
    """
    sample: dict with 'output_text' (model's generation)
    item: dict with extra fields from JSONL
    Returns: float from 0.0 to 1.0
    """
    output = sample.get("output_text", "")
    reference = item.get("reference_code", "")
    # Placeholder scoring (exact match after trimming whitespace);
    # replace with task-specific logic.
    score = 1.0 if output.strip() == reference.strip() else 0.0
    return score
```

**Python grader constraints:** 256KB code max, no network, 2GB memory, 1GB disk, 2min timeout.

**Grader field access:**
- `sample.output_text` → model's generation
- `sample.output_json` → structured output (if using response_format)
- `item.*` → extra JSONL fields
- Template variables: `{{item.field_name}}` (no spaces inside braces, no array indexing)

## Converting Between Formats

- **SFT → RFT**: Strip the trailing assistant messages (the last RFT message must be `user`) and add grader reference fields. Use `scripts/convert_dataset.py --format rft`.
- **SFT → DPO**: Generate rejected responses (run the base model on the same prompts, intentionally degrade good outputs, or use human ranking).
- **DPO → SFT**: Append the `preferred_output` messages to `input.messages` to form the SFT `messages` array.
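
The SFT → RFT and DPO → SFT conversions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of `scripts/convert_dataset.py`: the function names `sft_to_rft` and `dpo_to_sft` and the `reference_answer` field are hypothetical, and the sketch assumes each record already passes the validation rules for its source format.

```python
import json


def sft_to_rft(example, reference_field="reference_answer"):
    """Turn one SFT record into an RFT record.

    Trailing assistant messages are removed so the last message is `user`.
    The final assistant reply is kept under `reference_field` (a hypothetical
    field name) so a grader can read it via `item.*`.
    """
    messages = list(example["messages"])  # copy; do not mutate the input
    reference = None
    while messages and messages[-1]["role"] == "assistant":
        popped = messages.pop()
        if reference is None:
            # The first message popped is the last assistant reply.
            reference = popped["content"]
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("RFT requires the last remaining message to be `user`")
    converted = {"messages": messages}
    if reference is not None:
        converted[reference_field] = reference
    return converted


def dpo_to_sft(example):
    """Turn one DPO record into an SFT record.

    The prompt messages from `input` are followed by the preferred completion.
    """
    prompt = list(example["input"]["messages"])
    return {"messages": prompt + list(example["preferred_output"])}


if __name__ == "__main__":
    sft = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "4"},
        ]
    }
    # Emit one JSON object per line, as JSONL expects.
    print(json.dumps(sft_to_rft(sft)))
```

Keeping the stripped assistant reply as an extra field is one design choice among several; which reference fields are actually useful depends on the grader being used.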