Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services

finetuning/workflows/iterative-training.md

# Iterative Training Workflow

Systematically improve a fine-tuned model through successive experiments.

## The Core Loop

```
1. Train with current config
2. Analyze training curves
3. Evaluate on held-out set
4. Diagnose what to change
5. Plan next experiment
→ Better than baseline? → Good enough? → Ship it (or loop back to 4)
```

**Rule**: Change ONE variable per experiment.

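The one-variable rule is easy to check mechanically before launching a run. The following is a minimal sketch, assuming each run's settings are kept in a plain dictionary; the helper names `changed_fields` and `check_single_change` and the dictionary keys are hypothetical, not part of any Foundry API.

```python
def changed_fields(previous: dict, proposed: dict) -> list[str]:
    """Return the sorted names of settings that differ between two run configs."""
    keys = set(previous) | set(proposed)
    return sorted(k for k in keys if previous.get(k) != proposed.get(k))


def check_single_change(previous: dict, proposed: dict) -> str:
    """Return the single changed setting, or raise if zero or several changed."""
    changed = changed_fields(previous, proposed)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one change, got {changed}")
    return changed[0]


# Hypothetical configs for illustration only.
r1 = {"base_model": "gpt-4.1-mini", "dataset": "v1", "epochs": 2, "lr": 1.0}
r2 = {**r1, "lr": 0.5}
print(check_single_change(r1, r2))
```

Raising on zero changes as well as on several catches accidental reruns of an identical config.
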
## Experiment Tracking

| Run | Base model | Dataset | Epochs | LR | Batch | Best val_loss | Combined eval |
|-----|-----------|---------|--------|-----|-------|--------------|---------------|
| R1 | gpt-4.1-mini | v1 (335 ex) | 2 | 1.0 | default | 0.320 | 8.05 |
| R2 | gpt-4.1-mini | v1 (335 ex) | 2 | 0.5 | default | 0.310 | 9.15 |
| ... | ... | ... | ... | ... | ... | ... | ... |

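A tracking table like this can also be kept as structured records so rows are generated rather than typed by hand. This is a sketch under stated assumptions: the `Run` dataclass and its field names are hypothetical, the two records copy the R1 and R2 rows above, and a higher combined eval is assumed to be better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    base_model: str
    dataset: str
    epochs: int
    lr: float
    batch: str
    best_val_loss: float
    combined_eval: float


RUNS = [
    Run("R1", "gpt-4.1-mini", "v1 (335 ex)", 2, 1.0, "default", 0.320, 8.05),
    Run("R2", "gpt-4.1-mini", "v1 (335 ex)", 2, 0.5, "default", 0.310, 9.15),
]


def best_run(runs):
    """Pick the run with the highest combined eval (assumes higher is better)."""
    return max(runs, key=lambda r: r.combined_eval)


def to_markdown_row(run: Run) -> str:
    """Format one run as a row of the tracking table."""
    return (
        f"| {run.name} | {run.base_model} | {run.dataset} | {run.epochs} "
        f"| {run.lr} | {run.batch} | {run.best_val_loss:.3f} "
        f"| {run.combined_eval:.2f} |"
    )


print(to_markdown_row(best_run(RUNS)))
```
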
## What to Try (Priority Order)

### Priority 1: Data Quality (highest leverage)
- **Fix inconsistencies**: Contradictory examples confuse the model
- **Add diversity**: Add examples for the input types the model fails on
- **Reduce noise**: Remove "correct but not ideal" outputs

### Priority 2: Hyperparameters

See `references/hyperparameters.md` for the full guide.

**Quick sweep strategy:**
1. Baseline: epochs=2, lr=1.0
2. Overfitting → lr=0.5 or epochs=1
3. Underfitting → lr=1.5 or epochs=3
4. Good LR found → try batch_size=16 or 32

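The quick sweep can be written down as a lookup from diagnosis to candidate next runs. The sketch below assumes the baseline from step 1 and uses hypothetical names (`BASELINE`, `SWEEP`, `candidates`, and the diagnosis labels); each candidate changes exactly one setting, in line with the one-variable rule.

```python
# Baseline from the quick sweep strategy; "default" stands in for the
# service-chosen batch size.
BASELINE = {"epochs": 2, "lr": 1.0, "batch_size": "default"}

# Diagnosis -> list of (setting, new value) options from the sweep steps.
SWEEP = {
    "overfitting": [("lr", 0.5), ("epochs", 1)],
    "underfitting": [("lr", 1.5), ("epochs", 3)],
    "good_lr": [("batch_size", 16), ("batch_size", 32)],
}


def candidates(config: dict, diagnosis: str) -> list[dict]:
    """Return next-run configs, each differing from `config` in one setting."""
    return [{**config, key: value} for key, value in SWEEP[diagnosis]]


for option in candidates(BASELINE, "overfitting"):
    print(option)
```

Run one candidate at a time and compare it against the current best before trying the next.
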
### Priority 3: Base Model

| Model | Best for |
|-------|----------|
| gpt-4.1-mini | Best quality-per-dollar, most tasks |
| gpt-4.1-nano | Fastest inference, simple tasks |
| gpt-oss-20b | Large datasets, lowest absolute loss |
| Ministral-3B | Lightweight, fast inference |
| Qwen-3-32B, Llama-3.3-70B | Multilingual or specialized tasks |

### Priority 4: Training Type
- SFT has plateaued and you need better reasoning → RFT (if the model supports it)
- Need style alignment → DPO
- See `references/training-types.md` before switching

## Diagnostic Decision Tree

```
Training curves healthy (no overfitting)?
├─ Yes
│  ├─ Eval improved? → Refine further
│  └─ Eval same/worse? → Data quality issue: filter or augment
└─ No (overfitting)
   ├─ Earlier checkpoint evals well? → Deploy that checkpoint
   ├─ Not severe → Reduce epochs or lower LR
   └─ Severe (ratio > 2.0)
      ├─ Dataset too small → Add more data
      └─ Dataset large → Lower LR dramatically (0.1-0.3)
```

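The tree above translates directly into a small function. This is a sketch, not a definitive implementation: the function name and parameters are hypothetical, `loss_ratio` assumes the "ratio" in the tree means validation loss divided by training loss (the text does not define it), and checking the earlier checkpoint before severity is an ordering assumption.

```python
def diagnose(
    overfitting: bool,
    eval_improved: bool = False,
    earlier_checkpoint_good: bool = False,
    loss_ratio: float = 1.0,  # assumed: val_loss / train_loss
    dataset_small: bool = False,
) -> str:
    """Walk the diagnostic decision tree and return the suggested action."""
    if not overfitting:
        if eval_improved:
            return "refine further"
        return "data quality issue: filter or augment"
    if earlier_checkpoint_good:
        return "deploy earlier checkpoint"
    if loss_ratio <= 2.0:  # not severe
        return "reduce epochs or lower LR"
    if dataset_small:
        return "add more data"
    return "lower LR dramatically (0.1-0.3)"


print(diagnose(overfitting=True, loss_ratio=2.5, dataset_small=True))
```
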
## When to Stop

1. You have beaten the baseline by a meaningful margin (>5%) and the last 3 experiments didn't improve on it
2. Diminishing returns: each experiment improves the eval score by < 0.1 points
3. Model is "good enough" for production
4. Budget exhausted (time or money)

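These stopping criteria can be checked against the list of eval scores after each run. The sketch below is one possible reading, with several assumptions: scores are in run order and higher is better, the baseline score is positive, "good enough" is expressed as an optional `target` score, and "didn't improve" means no recent run beat the best score seen before it. The function name and parameters are hypothetical.

```python
def stop_reason(scores, baseline, target=None, budget_left=True,
                margin=0.05, patience=3, min_gain=0.1):
    """Return why to stop iterating, or None to keep going."""
    if not budget_left:
        return "budget exhausted"
    if not scores:
        return None
    best = max(scores)
    if target is not None and best >= target:
        return "good enough"
    if len(scores) <= patience:
        return None  # too few runs to judge a trend
    # Gain of each of the last `patience` runs over the best score before it.
    start = len(scores) - patience
    gains = [scores[i] - max(scores[:i]) for i in range(start, len(scores))]
    if best > baseline * (1 + margin) and all(g <= 0 for g in gains):
        return "beat baseline and plateaued"
    if all(g < min_gain for g in gains):
        return "diminishing returns"
    return None


# Illustrative scores, not real results.
print(stop_reason([9.0, 8.9, 9.0, 8.8], baseline=8.0))
```
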
## Multi-Model Strategy

Run the same dataset through 2-3 base models:
1. **gpt-4.1-mini**: primary candidate
2. **gpt-oss-20b**: large-dataset specialist (500+ examples)
3. **gpt-4.1-nano**: fast inference option

## Common Mistakes

1. Not establishing a baseline first
2. Changing multiple variables at once
3. Overfitting to the eval set (keep a separate final test set)
4. Ignoring training curves (they tell you what to change next)
5. Adding more data without a quality check (lower-quality data often makes things worse)
6. Not cleaning up old deployments (wastes quota and money)