Full Pipeline Workflow
End-to-end fine-tuning on Azure AI Foundry in 9 phases.
Prerequisites
- Azure AI Foundry resource with fine-tuning enabled
- Python 3.10+ with openai and requests
- Azure CLI (az) authenticated
- A clear task definition: what should the model do differently after fine-tuning?
Phase 1: Define the Task
Answer before touching data or models:
- What task? (e.g., "translate natural language to Python code")
- What does good output look like? Write 5 examples by hand.
- What does bad output look like? Write 3 anti-examples.
- How will you measure success? Define evaluation dimensions (see references/grader-design.md).
- Which base model? Pick 1-3 candidates from the supported model list.
Phase 2: Prepare the Dataset
Option A: You Have Data
- Convert to SFT JSONL format (see references/dataset-formats.md)
- Split: 80% train, 10% validation, 10% held-out test
- Remove or fix low-quality examples
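The 80/10/10 split above can be sketched as follows. This is a minimal illustration, not the project's own tooling: the function names, the fixed seed, and the in-memory list of examples are assumptions made for the example.

```python
import json
import random


def split_examples(examples, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle deterministically, then cut into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder is the held-out test set
    return train, val, test


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A typical use would load all converted examples into a list, call `split_examples`, and write the three results with `write_jsonl`. Shuffling before cutting avoids a split that follows the order in which the data was collected.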
Option B: Synthetic Data
- Generate using LLM prompts (see workflows/dataset-creation.md)
- Convert to SFT JSONL with scripts/convert_dataset.py
Option C: Hybrid (Seed + Synthetic)
- Use existing data as seed, generate synthetic variations
- Merge, deduplicate, and quality-filter
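One possible shape for the merge, deduplicate, and filter step is sketched below. It assumes each example is a dict with a chat-style `messages` list whose last entry is the target answer. The normalization rule and the `min_chars` filter are placeholders, not a recommended quality bar.

```python
import hashlib
import json


def example_key(example):
    """Hash case- and whitespace-normalized messages so near-identical rows collide."""
    parts = [
        (m["role"], " ".join(m["content"].lower().split()))
        for m in example["messages"]
    ]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


def merge_and_filter(seed_rows, synthetic_rows, min_chars=1):
    """Merge seed and synthetic rows, dropping duplicates and too-short answers."""
    seen, kept = set(), []
    for row in list(seed_rows) + list(synthetic_rows):  # seed rows come first, so they win ties
        answer = row["messages"][-1]["content"].strip()
        if len(answer) < min_chars:
            continue  # placeholder quality filter
        key = example_key(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Listing seed rows first means a hand-written example is kept in preference to a synthetic near-copy of it.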
Checkpoint: You should have training.jsonl, validation.jsonl, and test.jsonl (never used for training).
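Before moving on, a quick sanity check of the three files can catch malformed rows and test-set leakage. The sketch below assumes the chat-style `messages` layout and a small set of allowed roles. The authoritative format is the one described in references/dataset-formats.md, so adjust the checks to match it.

```python
import json

# Assumed role set for illustration; extend it if your format allows other roles.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_line(line):
    """Return a list of problems found in one JSONL line (empty means OK)."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = row.get("messages") if isinstance(row, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, m in enumerate(messages):
        if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role")
        elif not isinstance(m.get("content"), str) or not m["content"].strip():
            problems.append(f"message {i}: empty content")
    if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


def leaked_lines(train_lines, test_lines):
    """Return test lines that also appear verbatim in the training set."""
    train = {line.strip() for line in train_lines}
    return [line for line in test_lines if line.strip() in train]
```

`leaked_lines` only catches exact duplicates. Near-duplicates between training and test data need a fuzzier comparison.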
Phase 3: Establish Baselines
- Deploy base model (or use existing deployment)
- Run it on test.jsonl and score the outputs on the evaluation dimensions from Phase 1
- Record scores: this is the "zero" that every fine-tune must beat
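Recording a baseline usually comes down to averaging per-example grades into one number per evaluation dimension. A minimal sketch follows, assuming each graded example is a dict of dimension name to numeric score. The dimension names in the docstring are placeholders for the ones defined in Phase 1.

```python
from statistics import mean


def summarize_scores(graded):
    """Average each evaluation dimension over all graded test examples.

    `graded` is a list of dicts such as {"correctness": 4, "style": 3}.
    """
    dimensions = sorted({dim for row in graded for dim in row})
    return {
        dim: round(mean(row[dim] for row in graded if dim in row), 3)
        for dim in dimensions
    }
```

Storing the resulting dict alongside the base model name gives later fine-tunes a fixed reference point.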
Phase 4: Choose Training Type
See references/training-types.md for the full decision framework.
| Condition | Training Type |
|---|---|
| Have input-output pairs | SFT |
| Can write a grading function | RFT (reasoning models only) |
| Need style alignment | DPO |
Most projects start with SFT. Move to RFT/DPO only if SFT isn't sufficient.
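The table can be read as a small decision function. The sketch below only restates the three rows above in code. The parameter names are illustrative, and the full framework in references/training-types.md takes precedence.

```python
def suggest_training_types(has_pairs, can_grade, is_reasoning_model, needs_style_alignment):
    """Map the table's conditions to candidate training types, SFT first."""
    suggestions = []
    if has_pairs:
        suggestions.append("SFT")
    if can_grade and is_reasoning_model:  # RFT applies to reasoning models only
        suggestions.append("RFT")
    if needs_style_alignment:
        suggestions.append("DPO")
    return suggestions
```

Returning a list in priority order reflects the advice to start with SFT and move on only if it falls short.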
Phase 5: Upload and Submit Training
Use scripts/submit_training.py or the API directly. See references/hyperparameters.md for starting hyperparameter values.
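Calling the API directly might look roughly like the sketch below. It is not the contents of scripts/submit_training.py. It assumes `client` is an `openai.AzureOpenAI` (or `openai.OpenAI`) instance that the caller has already configured, and the helper name and the optional `n_epochs` argument are illustrative.

```python
def submit_sft_job(client, base_model, training_path, validation_path, n_epochs=None):
    """Upload both files and create a supervised fine-tuning job.

    The client is passed in so this function holds no credentials.
    """
    with open(training_path, "rb") as f:
        train_file = client.files.create(file=f, purpose="fine-tune")
    with open(validation_path, "rb") as f:
        val_file = client.files.create(file=f, purpose="fine-tune")

    kwargs = {}
    if n_epochs is not None:
        # Only override the service default when a value is given.
        kwargs["hyperparameters"] = {"n_epochs": n_epochs}

    job = client.fine_tuning.jobs.create(
        model=base_model,
        training_file=train_file.id,
        validation_file=val_file.id,
        **kwargs,
    )
    return job.id
```

The returned job ID is what the monitoring step needs. Uploaded files may take a moment to finish processing, so a retry around job creation can be worthwhile.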
Foundry CLI alternative (no Python):
azd ai finetuning jobs submit -f ./fine-tune-job.yaml
Phase 6: Monitor and Analyze
- Wait for completion or use scripts/monitor_training.py
- Analyze training curves with scripts/check_training.py
- Read references/training-curves.md to interpret results
- Check for overfitting; consider deploying an earlier checkpoint if detected
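A simple overfitting heuristic is to look for validation loss that rises after its minimum while training loss keeps falling. The sketch below assumes the loss values have already been extracted into parallel lists, one entry per logged step. The `tolerance` threshold is arbitrary, and this is not the logic of scripts/check_training.py.

```python
def best_checkpoint_step(steps, valid_loss):
    """Return the step with the lowest validation loss."""
    best_index = min(range(len(valid_loss)), key=valid_loss.__getitem__)
    return steps[best_index]


def looks_overfit(train_loss, valid_loss, tolerance=0.02):
    """Flag a run whose validation loss rose after its minimum
    while training loss kept falling."""
    best_index = min(range(len(valid_loss)), key=valid_loss.__getitem__)
    valid_rose = valid_loss[-1] > valid_loss[best_index] + tolerance
    train_fell = train_loss[-1] < train_loss[best_index]
    return valid_rose and train_fell
```

When `looks_overfit` returns true, the checkpoint closest to `best_checkpoint_step` is a reasonable candidate to deploy instead of the final one.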
Phase 7: Evaluate Fine-Tuned Model
- Deploy fine-tuned model (see references/deployment.md for format/SKU)
- Compare against baseline and previous experiments
- Delete deployment after evaluation
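The comparison against the baseline can be made explicit with per-dimension deltas. The sketch below assumes both score sets are dicts keyed by the same evaluation dimensions. The requirement that every dimension improve by more than `min_gain` is one possible acceptance rule, not a prescribed one.

```python
def compare_to_baseline(baseline, candidate, min_gain=0.0):
    """Return per-dimension deltas and whether every dimension improved."""
    deltas = {dim: round(candidate[dim] - baseline[dim], 3) for dim in baseline}
    beats_baseline = all(delta > min_gain for delta in deltas.values())
    return deltas, beats_baseline
```

Raising `min_gain` above zero guards against treating evaluation noise as a real improvement.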
Phase 8: Iterate
Follow workflows/iterative-training.md:
- Adjust hyperparameters based on training curves
- Try different data subsets or augmentations
- Test different base models
- Track everything in your leaderboard
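A leaderboard can be as simple as an append-only JSONL file with one record per experiment. The sketch below is one hypothetical layout. The `Experiment` fields and helper names are assumptions, and workflows/iterative-training.md may define a different structure.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:
    name: str
    base_model: str
    dataset: str
    hyperparameters: dict = field(default_factory=dict)
    scores: dict = field(default_factory=dict)


def append_experiment(path, experiment):
    """Append one experiment record to the leaderboard file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(experiment)) + "\n")


def load_leaderboard(path, sort_by):
    """Load all records, best score on `sort_by` first."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return sorted(
        rows,
        key=lambda r: r["scores"].get(sort_by, float("-inf")),
        reverse=True,
    )
```

Recording the dataset version and hyperparameters next to the scores makes it possible to tell which change produced a gain.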
Phase 9: Ship
When the model convincingly beats baseline:
- Deploy with production-appropriate capacity
- Monitor with Application Insights
- Periodically re-evaluate against test set for regression
- Retrain as new data becomes available