Source from repo

Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

microsoftGitHub microsoftOfficialSource repo Original GitHub link Publisher page

Files

154

Skill

n/a

Size

976.2 KB

Entrypoint

SKILL.md

Format

git-repo

Open file

finetuning/references/training-types.md

Syntax-highlighted preview of this file as included in the skill package.

Rendered Source

markdown71 linesFree

finetuning/references/training-types.md

1# Training Types: SFT vs DPO vs RFT
2 
3## Decision Matrix
4 
5| Factor | SFT | DPO | RFT |
6|--------|-----|-----|-----|
7| **Best for** | Teaching a new skill or format | Aligning preferences/style | Improving reasoning chains |
8| **Data needed** | Input–output pairs | Chosen/rejected pairs | Prompts + grading function |
9| **Data volume** | 50–5,000 examples | 500–5,000 pairs | 200–2,000 prompts |
10| **Effort to prepare data** | Low | High (need contrasting pairs) | Medium (need grader, not outputs) |
11| **Risk of regression** | Low | Medium | High (sensitive to grader quality) |
12| **Typical improvement** | 5–30% on task metrics | Subtle style/safety shifts | 0–15% on reasoning tasks |
13| **Supported models** | Most models | Select models | o4-mini |
14 
15## When to Use Each
16 
17### SFT (Supervised Fine-Tuning)
18- You have high-quality input–output pairs
19- Task is well-defined (code generation, classification, extraction, summarization)
20- You want reliable, repeatable outputs in a specific format or style
21- **Key insight**: 300–500 high-quality examples often outperforms 1,500+ lower-quality ones
22 
23### DPO (Direct Preference Optimization)
24- You want to adjust tone, verbosity, safety, or style
25- You have examples of "good" and "bad" outputs for the same input
26- SFT already works but outputs need refinement
27- DPO-specific params: `beta` (default 0.1), `l2_multiplier` (default 0.1)
28 
29### RFT (Reinforcement Fine-Tuning)
30- Task has objectively verifiable answers (code execution, math, logic)
31- You can write a programmatic or LLM-based grader
32- You want to improve the model's reasoning, not just its outputs
33- **Critical**: RFT is extremely sensitive to grader quality. Train–val gap should be ≤ 0.05.
34 
35## Choosing a Path
36 
37```
38├─ Do you have labeled input–output pairs?
39│  ├─ Yes → SFT
40│  └─ No
41│     ├─ Can you write a grading function? → RFT
42│     └─ Can you rank "good" vs "bad" outputs? → DPO
43│
44After SFT:
45├─ Results good enough? → Ship it
46├─ Need style refinement? → DPO on top of SFT model
47└─ Reasoning needs improvement? → RFT (if model supports it)
48```
49 
50## Model Compatibility (Azure AI Foundry)
51 
52| Model | SFT | DPO | RFT | Vision FT |
53|-------|-----|-----|-----|-----------|
54| gpt-4.1 | ✅ | ✅ | ❌ | ✅ |
55| gpt-4.1-mini | ✅ | ❌ | ❌ | ❌ |
56| gpt-4.1-nano | ✅ | ❌ | ❌ | ❌ |
57| gpt-4o (2024-08-06) | ✅ | ✅ | ❌ | ✅ |
58| gpt-4o-mini | ✅ | ❌ | ❌ | ❌ |
59| o4-mini | ❌ | ❌ | ✅ | ❌ |
60| gpt-5 | ❌ | ❌ | ✅ ⚠️ | ❌ |
61| gpt-oss-20b | ✅ | ❌ | ❌ | ❌ |
62| Ministral-3B | ✅ | ❌ | ❌ | ❌ |
63| Llama-3.3-70B | ✅ | ❌ | ❌ | ❌ |
64| Qwen-3-32B | ✅ | ❌ | ❌ | ❌ |
65 
66DPO can be applied on top of an already SFT-fine-tuned model. Vision fine-tuning follows the same SFT workflow but with image data in messages.
67 
68> ⚠️ **Feature flags**: GPT-5 RFT and agentic RFT with tool calling require access requests. Contact your Microsoft account team or request access through the Azure AI Foundry portal. o4-mini RFT without tools is generally available.
69 
70*Check Azure AI Foundry docs for the latest model availability.*
71

Preparing the source view

Microsoft Foundry Skill

finetuning/references/training-types.md