Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

- Files: 154
- Skill: n/a
- Size: 976.2 KB
- Entrypoint: SKILL.md
- Format: git-repo

finetuning/workflows/iterative-training.md

# Iterative Training Workflow

Systematically improve a fine-tuned model through successive experiments.

## The Core Loop

```
1. Train with current config
2. Analyze training curves
3. Evaluate on held-out set
4. Diagnose what to change
5. Plan next experiment
→ Better than baseline? → Good enough? → Ship it (or loop back to 4)
```

**Rule**: Change ONE variable per experiment.
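
One way to enforce this rule mechanically is to diff each planned config against the previous run before submitting it. The sketch below is illustrative only: the dictionary keys and the helper name `changed_variables` are hypothetical and not part of the skill.

```python
def changed_variables(previous: dict, planned: dict) -> list[str]:
    """Return the sorted names of settings that differ between two run configs."""
    keys = previous.keys() | planned.keys()
    return sorted(k for k in keys if previous.get(k) != planned.get(k))


# Hypothetical configs for two consecutive runs that differ only in LR.
r1 = {"base_model": "gpt-4.1-mini", "dataset": "v1", "epochs": 2, "lr": 1.0, "batch": "default"}
r2 = {**r1, "lr": 0.5}

diff = changed_variables(r1, r2)
if len(diff) != 1:
    raise ValueError(f"Expected exactly one changed variable, got: {diff}")
print(diff)
```

Running a check like this before each submission keeps the comparison between consecutive runs attributable to a single change.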

## Experiment Tracking

| Run | Base model | Dataset | Epochs | LR | Batch | Best val_loss | Combined eval |
|-----|-----------|---------|--------|-----|-------|--------------|---------------|
| R1 | gpt-4.1-mini | v1 (335 ex) | 2 | 1.0 | default | 0.320 | 8.05 |
| R2 | gpt-4.1-mini | v1 (335 ex) | 2 | 0.5 | default | 0.310 | 9.15 |
| ... | ... | ... | ... | ... | ... | ... | ... |
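
If the log is kept in code rather than edited by hand, a small record type can emit rows in the same layout. This is a minimal sketch: the `Run` class and its field names are hypothetical, and it assumes a higher combined eval score is better, which the table does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    base_model: str
    dataset: str
    epochs: int
    lr: float
    batch: str
    best_val_loss: float
    combined_eval: float

    def to_markdown_row(self) -> str:
        """Format the run as one row of the tracking table."""
        cells = [
            self.name, self.base_model, self.dataset, str(self.epochs),
            str(self.lr), self.batch,
            f"{self.best_val_loss:.3f}", f"{self.combined_eval:.2f}",
        ]
        return "| " + " | ".join(cells) + " |"


runs = [
    Run("R1", "gpt-4.1-mini", "v1 (335 ex)", 2, 1.0, "default", 0.320, 8.05),
    Run("R2", "gpt-4.1-mini", "v1 (335 ex)", 2, 0.5, "default", 0.310, 9.15),
]

# Assumption: higher combined eval is better.
best = max(runs, key=lambda r: r.combined_eval)
for run in runs:
    print(run.to_markdown_row())
print("Best so far:", best.name)
```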

## What to Try (Priority Order)

### Priority 1: Data Quality (highest leverage)
- **Fix inconsistencies**: Contradictory examples confuse the model
- **Add diversity**: Add examples for input types the model fails on
- **Reduce noise**: Remove "correct but not ideal" outputs

### Priority 2: Hyperparameters

See `references/hyperparameters.md` for the full guide.

**Quick sweep strategy:**
1. Baseline: epochs=2, lr=1.0
2. Overfitting → lr=0.5 or epochs=1
3. Underfitting → lr=1.5 or epochs=3
4. Good LR found → try batch_size=16 or 32
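
The sweep above can be sketched as a function that proposes the next config from a diagnosis of the last run. The function name, the diagnosis labels, and the choice to adjust LR before epochs are assumptions made for illustration; the list itself leaves that order open ("lr=0.5 or epochs=1").

```python
BASELINE = {"epochs": 2, "lr": 1.0, "batch_size": "default"}


def next_config(config: dict, diagnosis: str) -> dict:
    """Propose the next run's config, changing exactly one setting."""
    nxt = dict(config)
    if diagnosis == "overfitting":
        # Assumed order: lower the LR first, then cut epochs.
        if config["lr"] > 0.5:
            nxt["lr"] = 0.5
        else:
            nxt["epochs"] = 1
    elif diagnosis == "underfitting":
        # Assumed order: raise the LR first, then add an epoch.
        if config["lr"] < 1.5:
            nxt["lr"] = 1.5
        else:
            nxt["epochs"] = 3
    elif diagnosis == "good_lr":
        # LR is settled, so move on to batch size: 16 first, then 32.
        nxt["batch_size"] = 16 if config["batch_size"] == "default" else 32
    else:
        raise ValueError(f"Unknown diagnosis: {diagnosis!r}")
    return nxt


step1 = next_config(BASELINE, "overfitting")
step2 = next_config(step1, "overfitting")
print(step1)
print(step2)
```

Each call changes a single key, so the proposal stays consistent with the one-variable-per-experiment rule.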

### Priority 3: Base Model

| Model | Best for |
|-------|----------|
| gpt-4.1-mini | Best quality-per-dollar, most tasks |
| gpt-4.1-nano | Fastest inference, simple tasks |
| gpt-oss-20b | Large datasets, lowest absolute loss |
| Ministral-3B | Lightweight, fast inference |
| Qwen-3-32B, Llama-3.3-70B | Multilingual or specialized tasks |

### Priority 4: Training Type
- SFT plateaued + need better reasoning → RFT (if model supports it)
- Need style alignment → DPO
- See `references/training-types.md` before switching

## Diagnostic Decision Tree

```
Training curves healthy (no overfitting)?
├─ Yes
│  ├─ Eval improved? → Refine further
│  └─ Eval same/worse? → Data quality issue: filter or augment
└─ No (overfitting)
   ├─ Earlier checkpoint evals well? → Deploy that checkpoint
   ├─ Not severe → Reduce epochs or lower LR
   └─ Severe (ratio > 2.0)
      ├─ Dataset too small → Add more data
      └─ Dataset large → Lower LR dramatically (0.1-0.3)
```
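
The tree translates directly into a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and `loss_ratio` is assumed to mean validation loss divided by training loss, since the tree does not define the ratio.

```python
def diagnose(
    overfitting: bool,
    eval_improved: bool = False,
    earlier_checkpoint_good: bool = False,
    loss_ratio: float = 1.0,  # assumed: val_loss / train_loss
    dataset_small: bool = False,
) -> str:
    """Walk the decision tree and return the recommended next action."""
    if not overfitting:
        if eval_improved:
            return "Refine further"
        return "Data quality issue: filter or augment"
    if earlier_checkpoint_good:
        return "Deploy that checkpoint"
    if loss_ratio <= 2.0:
        # Overfitting, but not severe.
        return "Reduce epochs or lower LR"
    if dataset_small:
        return "Add more data"
    return "Lower LR dramatically (0.1-0.3)"


print(diagnose(overfitting=True, loss_ratio=2.5, dataset_small=True))
```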

## When to Stop

1. You have beaten the baseline by a meaningful margin (>5%) and the last 3 experiments didn't improve on it
2. Diminishing returns: each experiment improves the eval score by < 0.1 points
3. Model is "good enough" for production
4. Budget exhausted (time or money)
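
These criteria can be checked automatically after each run. The sketch below is one possible reading, not a definitive rule: it assumes eval scores where higher is better, measures each recent run's gain against the best score seen before it, and applies criterion 2 over the same 3-run window as criterion 1. All names and defaults are hypothetical.

```python
from typing import Optional, Sequence


def stop_reason(
    baseline: float,
    scores: Sequence[float],  # eval scores in run order, higher is better (assumed)
    good_enough: Optional[float] = None,
    budget_left: bool = True,
    window: int = 3,
    margin: float = 0.05,
    min_gain: float = 0.1,
) -> Optional[str]:
    """Return why to stop iterating, or None to keep going."""
    if not budget_left:
        return "budget exhausted"
    if not scores:
        return None
    if good_enough is not None and max(scores) >= good_enough:
        return "good enough for production"
    if len(scores) > window:
        earlier_best = max(scores[:-window])
        # Gain of each recent run over the best score seen before it.
        gains = []
        best = earlier_best
        for score in scores[-window:]:
            gains.append(score - best)
            best = max(best, score)
        if earlier_best > baseline * (1 + margin) and all(g <= 0 for g in gains):
            return "beat baseline and plateaued"
        if all(g < min_gain for g in gains):
            return "diminishing returns"
    return None


print(stop_reason(8.0, [8.05, 9.15, 9.10, 9.12, 9.15]))
```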

## Multi-Model Strategy

Run the same dataset through 2-3 base models:
1. **gpt-4.1-mini**: primary candidate
2. **gpt-oss-20b**: large-dataset specialist (500+ examples)
3. **gpt-4.1-nano**: fast inference option

## Common Mistakes

1. Not establishing a baseline first
2. Changing multiple variables at once
3. Overfitting to the eval set (keep a separate final test set)
4. Ignoring training curves (they tell you what to change next)
5. Adding more data without a quality check (lower-quality data often makes things worse)
6. Not cleaning up old deployments (wastes quota and money)
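
For mistake 3, one way to keep a final test set separate is to assign each example to a split by hashing a stable ID, so an example never moves between splits as the dataset grows. This is a minimal sketch: the helper name, the ID scheme, and the 80/10/10 proportions are assumptions, not recommendations from the skill.

```python
import hashlib
from collections import Counter


def assign_split(example_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically map an example ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    if bucket < test_pct:
        return "test"  # touched only for the final comparison
    if bucket < test_pct + val_pct:
        return "val"  # used for per-experiment evaluation
    return "train"


# Hypothetical IDs for a 335-example dataset.
ids = [f"ex-{i}" for i in range(335)]
print(Counter(assign_split(i) for i in ids))
```

Because the assignment depends only on the ID, re-running the split after adding or cleaning examples leaves the existing test set untouched.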