Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
finetuning/references/dataset-formats.md
# Dataset Formats

## SFT Format (Supervised Fine-Tuning)

Standard chat-completion JSONL. Each line is a JSON object with a `messages` array.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
```

**Rules:**
- Each line must be valid JSON
- `messages` must contain at least one `user` and one `assistant` message
- `system` message is optional but recommended
- Multi-turn supported: alternate `user`/`assistant`
- Last message must be `assistant` (that's what the model learns)

**Validation checklist:** `.jsonl` extension, valid JSON per line, every example has `messages`, every message has `role` and `content`, no empty `content`.

## DPO Format (Direct Preference Optimization)

Three top-level fields: `input`, `preferred_output`, `non_preferred_output`.

```jsonl
{"input": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain gravity."}]}, "preferred_output": [{"role": "assistant", "content": "Gravity is a fundamental force that attracts objects with mass toward each other."}], "non_preferred_output": [{"role": "assistant", "content": "Gravity is when stuff falls down."}]}
```

**Rules:**
- `input`: Object with `messages` array (system + user turns). May include `tools` and `parallel_tool_calls`.
- `preferred_output` / `non_preferred_output`: Array of messages (`assistant` or `tool` role only)
- Both must contain at least one `assistant` message
- Exactly two completions compared per example

**DPO REST API example:**
```json
{
  "model": "gpt-4.1-mini-2025-04-14",
  "training_file": "file-abc123",
  "method": {
    "type": "dpo",
    "dpo": { "beta": 0.1, "l2_multiplier": 0.1 }
  }
}
```

## RFT Format (Reinforcement Fine-Tuning)

Chat-completion format with key differences from SFT:

```jsonl
{"messages": [{"role": "user", "content": "Write a Python function to reverse a string."}], "reference_code": "def reverse_string(s):\n return s[::-1]", "expected_output": "olleh"}
```

**Rules:**
- Last message **MUST** be `user` role (model generates its own response)
- Extra fields alongside `messages` are accessible to grader via `item.*`
- Both training and validation datasets are **required**
- ⚠️ Do NOT put `assistant` as last message: unlike SFT, RFT generates its own outputs

**API version**: Python graders require `api-version=2025-04-01-preview` or later.

**Grader types:** `string_check` (exact match), `text_similarity` (fuzzy/BLEU/ROUGE), `python` (custom function), `score_model` (LLM judge), `multi` (weighted combination).

**Python grader template:**
```python
def grade(sample, item):
    """
    sample: dict with 'output_text' (model's generation)
    item: dict with extra fields from JSONL
    Returns: float 0.0–1.0
    """
    output = sample.get("output_text", "")
    reference = item.get("reference_code", "")
    # Placeholder scoring (exact match); replace with task-specific logic.
    score = 1.0 if output.strip() == reference.strip() else 0.0
    return score
```

**Python grader constraints:** 256KB code max, no network, 2GB memory, 1GB disk, 2min timeout.

**Grader field access:**
- `sample.output_text` → model's generation
- `sample.output_json` → structured output (if using response_format)
- `item.*` → extra JSONL fields
- Template variables: `{{item.field_name}}` (no spaces inside braces, no array indexing)

## Converting Between Formats

- **SFT → RFT**: Strip assistant messages (RFT last message must be `user`), add grader reference fields. Use `scripts/convert_dataset.py --format rft`.
- **SFT → DPO**: Generate rejected responses (run base model on same prompts, intentionally degrade good outputs, or use human ranking).
- **DPO → SFT**: Take the `input` messages and append the chosen response from `preferred_output` as the final `assistant` turn.
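
The SFT validation checklist and two of the conversions above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `scripts/convert_dataset.py`: the function names (`validate_sft_example`, `sft_to_rft`, `dpo_to_sft`) and the `reference_fields` parameter are hypothetical. It also assumes that "strip assistant messages" means dropping only the trailing non-`user` messages, so earlier turns stay as context.

```python
import json


def validate_sft_example(example: dict) -> list[str]:
    """Return a list of problems in one SFT example (empty list means OK)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not a JSON object")
            continue
        if not msg.get("role"):
            problems.append(f"message {i}: missing 'role'")
        if not msg.get("content"):
            problems.append(f"message {i}: missing or empty 'content'")
    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles:
        problems.append("no 'user' message")
    if not roles or roles[-1] != "assistant":
        problems.append("last message must be 'assistant'")
    return problems


def sft_to_rft(example: dict, reference_fields: dict) -> dict:
    """Drop trailing non-user messages, then attach grader reference fields.

    Assumption: earlier assistant turns are kept as conversation context.
    """
    messages = list(example["messages"])
    while messages and messages[-1]["role"] != "user":
        messages.pop()
    if not messages:
        raise ValueError("example has no 'user' message")
    return {"messages": messages, **reference_fields}


def dpo_to_sft(example: dict) -> dict:
    """Append the preferred completion to the input messages."""
    return {"messages": example["input"]["messages"] + example["preferred_output"]}


if __name__ == "__main__":
    dpo_line = (
        '{"input": {"messages": [{"role": "user", "content": "Explain gravity."}]}, '
        '"preferred_output": [{"role": "assistant", "content": "Gravity attracts masses."}], '
        '"non_preferred_output": [{"role": "assistant", "content": "Stuff falls."}]}'
    )
    sft = dpo_to_sft(json.loads(dpo_line))
    print(validate_sft_example(sft))
    print(json.dumps(sft_to_rft(sft, {"expected_output": "placeholder"})))
```

Running the validator over every line of a `.jsonl` file before upload catches most of the checklist items early; the extension check and per-line JSON parsing are left to the caller.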