finetuning/references/training-types.md
# Training Types: SFT vs DPO vs RFT

## Decision Matrix

| Factor | SFT | DPO | RFT |
|--------|-----|-----|-----|
| **Best for** | Teaching a new skill or format | Aligning preferences/style | Improving reasoning chains |
| **Data needed** | Input–output pairs | Chosen/rejected pairs | Prompts + grading function |
| **Data volume** | 50–5,000 examples | 500–5,000 pairs | 200–2,000 prompts |
| **Effort to prepare data** | Low | High (need contrasting pairs) | Medium (need grader, not outputs) |
| **Risk of regression** | Low | Medium | High (sensitive to grader quality) |
| **Typical improvement** | 5–30% on task metrics | Subtle style/safety shifts | 0–15% on reasoning tasks |
| **Supported models** | Most models | Select models | o4-mini (GPT-5 with access request) |

## When to Use Each

### SFT (Supervised Fine-Tuning)
- You have high-quality input–output pairs
- Task is well-defined (code generation, classification, extraction, summarization)
- You want reliable, repeatable outputs in a specific format or style
- **Key insight**: 300–500 high-quality examples often outperform 1,500+ lower-quality ones

### DPO (Direct Preference Optimization)
- You want to adjust tone, verbosity, safety, or style
- You have examples of "good" and "bad" outputs for the same input
- SFT already works but outputs need refinement
- DPO-specific params: `beta` (default 0.1), `l2_multiplier` (default 0.1)

### RFT (Reinforcement Fine-Tuning)
- Task has objectively verifiable answers (code execution, math, logic)
- You can write a programmatic or LLM-based grader
- You want to improve the model's reasoning, not just its outputs
- **Critical**: RFT is extremely sensitive to grader quality. Train–val gap should be ≤ 0.05.

## Choosing a Path

```
├─ Do you have labeled input–output pairs?
│  ├─ Yes → SFT
│  └─ No
│     ├─ Can you write a grading function? → RFT
│     └─ Can you rank "good" vs "bad" outputs? → DPO
│
After SFT:
├─ Results good enough? → Ship it
├─ Need style refinement? → DPO on top of SFT model
└─ Reasoning needs improvement? → RFT (if model supports it)
```

## Model Compatibility (Azure AI Foundry)

| Model | SFT | DPO | RFT | Vision FT |
|-------|-----|-----|-----|-----------|
| gpt-4.1 | ✅ | ✅ | ❌ | ✅ |
| gpt-4.1-mini | ✅ | ❌ | ❌ | ❌ |
| gpt-4.1-nano | ✅ | ❌ | ❌ | ❌ |
| gpt-4o (2024-08-06) | ✅ | ✅ | ❌ | ✅ |
| gpt-4o-mini | ✅ | ❌ | ❌ | ❌ |
| o4-mini | ❌ | ❌ | ✅ | ❌ |
| gpt-5 | ❌ | ❌ | ✅ ⚠️ | ❌ |
| gpt-oss-20b | ✅ | ❌ | ❌ | ❌ |
| Ministral-3B | ✅ | ❌ | ❌ | ❌ |
| Llama-3.3-70B | ✅ | ❌ | ❌ | ❌ |
| Qwen-3-32B | ✅ | ❌ | ❌ | ❌ |

DPO can be applied on top of an already SFT-fine-tuned model. Vision fine-tuning follows the same SFT workflow but with image data in messages.

> ⚠️ **Feature flags**: GPT-5 RFT and agentic RFT with tool calling require access requests. Contact your Microsoft account team or request access through the Azure AI Foundry portal. o4-mini RFT without tools is generally available.

*Check Azure AI Foundry docs for the latest model availability.*
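
The first branch of the decision path, the compatibility table, and the RFT train–val rule above can be sketched as a small Python helper. This is a minimal illustration, not part of any Azure SDK: the names `SUPPORT`, `choose_training_type`, `models_supporting`, and `train_val_gap_ok` are hypothetical, the table data is a snapshot copied from this file, and the gap check assumes the train–val gap means the absolute difference between a mean training score and a mean validation score.

```python
from typing import List, Optional

# Snapshot of the compatibility table in this file (illustrative only;
# availability changes, so check the Azure AI Foundry docs before relying on it).
SUPPORT = {
    "gpt-4.1": {"SFT", "DPO", "Vision FT"},
    "gpt-4.1-mini": {"SFT"},
    "gpt-4.1-nano": {"SFT"},
    "gpt-4o (2024-08-06)": {"SFT", "DPO", "Vision FT"},
    "gpt-4o-mini": {"SFT"},
    "o4-mini": {"RFT"},
    "gpt-5": {"RFT"},  # access request required, per the feature-flag note
    "gpt-oss-20b": {"SFT"},
    "Ministral-3B": {"SFT"},
    "Llama-3.3-70B": {"SFT"},
    "Qwen-3-32B": {"SFT"},
}


def choose_training_type(has_io_pairs: bool,
                         can_write_grader: bool,
                         can_rank_outputs: bool) -> Optional[str]:
    """Follow the first branch of the decision tree, in the order it lists."""
    if has_io_pairs:
        return "SFT"
    if can_write_grader:
        return "RFT"
    if can_rank_outputs:
        return "DPO"
    return None  # the tree gives no recommendation for this case


def models_supporting(training_type: str) -> List[str]:
    """List models whose table row marks the given training type as supported."""
    return [model for model, types in SUPPORT.items() if training_type in types]


def train_val_gap_ok(train_score: float, val_score: float,
                     max_gap: float = 0.05) -> bool:
    """Assumed reading of the RFT rule: |train - val| must not exceed 0.05."""
    return abs(train_score - val_score) <= max_gap


if __name__ == "__main__":
    print(choose_training_type(False, True, False))
    print(models_supporting("DPO"))
    print(train_val_gap_ok(0.80, 0.76))
```

Keeping the table as plain data makes it easy to update when model availability changes, and the same lookup can gate the "After SFT" follow-ups (for example, checking that the base model supports DPO before planning a DPO pass on top of SFT).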