Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

finetuning/references/hyperparameters.md

# Hyperparameter Guide

## SFT / DPO Core Parameters

| Parameter | What it controls | Default | Typical range |
|-----------|-----------------|---------|---------------|
| **Epochs** | Passes through data | 2 | 1–5 |
| **Learning rate multiplier** | Weight change aggressiveness | 1.0 | 0.1–2.0 |
| **Batch size** | Examples per gradient step | Model-dependent | 4–32 |

### Dataset Size vs Epochs

| Dataset size | Recommended epochs |
|-------------|-------------------|
| < 100 examples | 3–5 |
| 100–500 examples | 2–3 |
| 500–2,000 examples | 1–2 |
| > 2,000 examples | 1 |

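The table above can be expressed as a small lookup helper. This is a minimal sketch: `recommended_epochs` is a hypothetical name, not part of any SDK. Because the table's rows share endpoints (500 appears in two rows), the helper assumes a boundary value of 500 belongs to the 500–2,000 row and 2,000 stays in that row as well.

```python
def recommended_epochs(n_examples: int) -> tuple[int, int]:
    """Return the (min, max) recommended epochs for a dataset size.

    Boundary handling is an assumption: 100 maps to the 100-500 row,
    500 and 2,000 both map to the 500-2,000 row.
    """
    if n_examples < 0:
        raise ValueError("n_examples must be non-negative")
    if n_examples < 100:
        return (3, 5)
    if n_examples < 500:
        return (2, 3)
    if n_examples <= 2000:
        return (1, 2)
    return (1, 1)


low, high = recommended_epochs(350)
print(f"350 examples: try {low}-{high} epochs")
```
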
### Learning Rate Guidelines
- **Higher LR** (1.5–2.0): Large or diverse datasets, or a task very different from pre-training
- **Lower LR** (0.1–0.5): Small datasets (<200 examples), or refining rather than overwriting base behavior
- For 1,000+ examples, an LR of 0.2–0.5 often beats the default of 1.0

### DPO-Specific Parameters
- `beta` (default 0.1): Alignment strength. Lower = more conservative.
- `l2_multiplier` (default 0.1): Regularization to prevent drift from base model.

## HP Sweep Strategy

| Run | Epochs | LR | Why |
|-----|--------|----|-----|
| 1 | 2 | 1.0 | Baseline |
| 2 | 2 | 0.5 | Conservative |
| 3 | 2 | 1.5 | Aggressive |
| 4 | 3 | 1.0 | More training |
| 5 | 1 | 1.0 | Minimal intervention |

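The sweep table can be turned into a list of per-run hyperparameter dictionaries, which is convenient when submitting the runs in a loop. This is a sketch only: `SWEEP` and `sweep_hyperparameters` are hypothetical names, and the keys `n_epochs` and `learning_rate_multiplier` are assumed to match what the fine-tuning endpoint expects for SFT jobs.

```python
# The five runs from the sweep table: (run, epochs, LR multiplier, rationale).
SWEEP = [
    (1, 2, 1.0, "Baseline"),
    (2, 2, 0.5, "Conservative"),
    (3, 2, 1.5, "Aggressive"),
    (4, 3, 1.0, "More training"),
    (5, 1, 1.0, "Minimal intervention"),
]


def sweep_hyperparameters():
    """Build one config dict per sweep run (key names are assumptions)."""
    return [
        {
            "run": run,
            "why": why,
            "hyperparameters": {
                "n_epochs": epochs,
                "learning_rate_multiplier": lr,
            },
        }
        for run, epochs, lr, why in SWEEP
    ]


for cfg in sweep_hyperparameters():
    print(cfg["run"], cfg["why"], cfg["hyperparameters"])
```
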
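## Checkpoint Trick

If a run overfits (for example, val loss rises after epoch 2), deploy the epoch-2 checkpoint directly instead of retraining. Azure saves a checkpoint at each epoch boundary.

```python
# Assumes `client` is an OpenAI-compatible client (e.g. openai.AzureOpenAI)
# and `job_id` is the ID of an existing fine-tuning job.
checkpoints = client.fine_tuning.jobs.checkpoints.list(job_id)
for cp in checkpoints.data:
    # Each checkpoint reports the step it was saved at and its validation loss.
    print(f"Step {cp.step_number}: val_loss={cp.metrics.valid_loss}")
```
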
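Once the checkpoints are listed, picking the one to deploy amounts to taking the minimum validation loss. The helper below is a sketch under that assumption; `best_checkpoint` is a hypothetical name, and the sample numbers are illustrative, not real results.

```python
def best_checkpoint(checkpoints):
    """Return the (step, val_loss) pair with the lowest validation loss.

    `checkpoints` is an iterable of (step_number, valid_loss) pairs.
    Entries whose loss is None (no validation metric reported) are skipped.
    """
    scored = [(step, loss) for step, loss in checkpoints if loss is not None]
    if not scored:
        raise ValueError("no checkpoint has a validation loss")
    return min(scored, key=lambda pair: pair[1])


# Illustrative values only: loss improves, then rises again (overfitting).
example = [(100, 0.92), (200, 0.71), (300, 0.78)]
step, loss = best_checkpoint(example)
print(f"Deploy checkpoint at step {step} (val_loss={loss})")
```

With the listing code above, the input can be built as `[(cp.step_number, cp.metrics.valid_loss) for cp in checkpoints.data]`.
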
## Model-Specific Recommendations

| Model | Recommended Start | Notes |
|-------|------------------|-------|
| gpt-4.1-mini | 2ep, lr=0.5–1.0 | Very capable base; small nudges work |
| gpt-4.1-nano | 2–3ep, lr=1.0–1.5 | Smaller capacity, needs more epochs |
| gpt-oss-20b | 2ep, lr=0.2–0.5 | Lower LR critical; deployment may need capacity=100 |
| o4-mini (RFT) | Grader quality > HPs | Focus on grader, not HP sweep |

## OSS Model Parameters

All OSS models require `trainingType: "globalStandard"` in the API request.

| Model | Recommended Start | Best Found | Notes |
|-------|------------------|------------|-------|
| Ministral-3B | 5ep, lr=1.0 | 10ep, lr=0.5 | Small model, slow convergence |
| gpt-oss-20b | 2ep, lr=0.3 | 2ep, lr=0.3 | lr=1.0 overfits quickly |
| Llama-3.3-70B | 3ep, lr=0.3 | 5ep, lr=0.5 | lr=2.0 causes catastrophic degradation |
| Qwen-3-32B | 3ep, lr=0.3 | 3ep, lr=0.3 | Most fragile; more data can hurt |

**Key patterns**: OSS models need 2–5× more epochs than nano. Lower LR (0.3–0.5) is safer. More data doesn't always help.

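As a rough illustration of the `trainingType` requirement, the sketch below builds a request body for an OSS model using the "Recommended Start" column. Several things here are assumptions: `oss_job_body` is a hypothetical helper, the placement of `trainingType` at the top level of the body is a guess, and the `model`, `training_file`, and `hyperparameters` field names follow the OpenAI-style fine-tuning request shape rather than anything stated in this guide.

```python
# "Recommended Start" values from the table: model -> (epochs, LR multiplier).
OSS_RECOMMENDED_START = {
    "Ministral-3B": (5, 1.0),
    "gpt-oss-20b": (2, 0.3),
    "Llama-3.3-70B": (3, 0.3),
    "Qwen-3-32B": (3, 0.3),
}


def oss_job_body(model: str, training_file: str) -> dict:
    """Build a fine-tuning request body for an OSS model (field layout assumed)."""
    if model not in OSS_RECOMMENDED_START:
        raise KeyError(f"no recommended start for model {model!r}")
    epochs, lr = OSS_RECOMMENDED_START[model]
    return {
        "model": model,
        "training_file": training_file,
        # Required for all OSS models according to this guide.
        "trainingType": "globalStandard",
        "hyperparameters": {
            "n_epochs": epochs,
            "learning_rate_multiplier": lr,
        },
    }


# "file-123" is a placeholder file ID, not a real upload.
print(oss_job_body("gpt-oss-20b", "file-123"))
```
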
## RFT Hyperparameters

| Parameter | Description | Recommended Start |
|-----------|-------------|-------------------|
| `reasoning_effort` | `"low"`, `"medium"`, `"high"` | `"medium"` |
| `compute_multiplier` | Scales rollouts per step | `1.5` |
| `learning_rate_multiplier` | Scales LR | `1.0` |
| `n_epochs` | Data passes | `2–3` |
| `eval_interval` | Eval every N steps | `5` |
| `eval_samples` | Validation examples per eval | `10` |
| `max_episode_steps` | Max tool calls + reasoning steps | `5–10` |
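
The recommended starting values in the RFT table can be collected into a single dictionary. This is a sketch, not an SDK call: `rft_start_config` is a hypothetical helper, and where the table gives a range (`n_epochs`, `max_episode_steps`) the default picks the low end as an arbitrary choice.

```python
VALID_REASONING_EFFORT = {"low", "medium", "high"}


def rft_start_config(reasoning_effort="medium", n_epochs=2, max_episode_steps=5):
    """Return the table's recommended RFT starting hyperparameters as a dict."""
    if reasoning_effort not in VALID_REASONING_EFFORT:
        raise ValueError(
            f"reasoning_effort must be one of {sorted(VALID_REASONING_EFFORT)}"
        )
    return {
        "reasoning_effort": reasoning_effort,
        "compute_multiplier": 1.5,
        "learning_rate_multiplier": 1.0,
        "n_epochs": n_epochs,                    # table recommends 2-3
        "eval_interval": 5,
        "eval_samples": 10,
        "max_episode_steps": max_episode_steps,  # table recommends 5-10
    }


print(rft_start_config())
```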
