Microsoft Foundry Skill

Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry

finetuning/references/vision-fine-tuning.md

# Vision Fine-Tuning

Fine-tune models with image data to customize visual understanding. Vision fine-tuning uses the same chat-completions JSONL format as text supervised fine-tuning (SFT), with image content blocks added to user messages.

## Supported Models

| Model | Version |
|-------|---------|
| gpt-4o | 2024-08-06 |
| gpt-4.1 | 2025-04-14 |

## Image Requirements

| Constraint | Limit |
|-----------|-------|
| Max examples with images per training file | 50,000 |
| Max images per example | 64 |
| Max image file size | 10 MB |
| Supported formats | JPEG, PNG, WEBP |
| Color mode | RGB or RGBA |
| Min examples | 10 |

**Important**: Images can only appear in `user` messages, never in `assistant` responses.

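The size and format limits above can be checked locally before upload. The following is a minimal sketch, assuming the images are local files; `check_image_file` and the constants are hypothetical names, not part of any SDK, and the 10 MB limit is assumed to mean 10 × 1024 × 1024 bytes. It looks only at the file extension and size; verifying the color mode would need an imaging library.

```python
import os

# Limits taken from the table above; the constant and function names are illustrative.
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB, assuming 1 MB = 1024 * 1024 bytes
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_image_file(path: str) -> list[str]:
    """Return the problems found for one local image file (empty list if none)."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if os.path.getsize(path) > MAX_IMAGE_BYTES:
        problems.append("file larger than 10 MB")
    return problems
```

Running this over every image before building the training file surfaces oversized or unsupported files early, rather than after upload validation.
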
## Data Format

Each training example follows the standard SFT `messages` format. Images are included as `image_url` content blocks within user messages.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful AI assistant that describes images."}, {"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.png", "detail": "high"}}]}, {"role": "assistant", "content": "The image shows a cityscape with tall buildings against a blue sky."}]}
```

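Writing these lines by hand is error-prone, so it can help to generate them. The sketch below builds one line in the format shown above; `build_example` and its parameter names are hypothetical, chosen only for illustration.

```python
import json


def build_example(system_prompt, question, image_url, answer, detail="auto"):
    """Return one JSONL line in the chat-completions format shown above."""
    example = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
                ],
            },
            {"role": "assistant", "content": answer},
        ]
    }
    # json.dumps emits a single line, which is what JSONL requires.
    return json.dumps(example, ensure_ascii=False)
```

Each returned string can be written to the training file followed by a newline.
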
### Image Sources

Images can be provided in two ways:

**1. Public URL:**
```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
```

**2. Base64 data URI:**
```json
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
```

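For images that are not publicly hosted, the data URI can be produced from a local file. This is a minimal sketch using only the standard library; `to_data_uri` and the extension-to-MIME mapping are assumptions made for illustration, covering the three supported formats.

```python
import base64
import os

# Illustrative mapping from file extension to MIME type for the supported formats.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in MIME_TYPES:
        raise ValueError(f"unsupported image extension: {extension!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{MIME_TYPES[extension]};base64,{encoded}"
```

The returned string goes in the `url` field of an `image_url` block. Base64 encoding inflates the payload by roughly a third, so training files built this way are larger than URL-based ones.
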
### Detail Control

The `detail` parameter controls image processing fidelity and cost:

| Value | Behavior | Cost |
|-------|----------|------|
| `low` | Downscales to 512×512 pixels | Lower |
| `high` | Full resolution processing | Higher |
| `auto` (default) | Model chooses `low` or `high` based on image size | Depends on the mode chosen |

```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png", "detail": "low"}}
```

Use `low` for tasks where fine visual detail doesn't matter (classification, general description). Use `high` for tasks needing precise detail (OCR, diagram reading, defect detection).

## Content Moderation

Images are screened before training. The following are **automatically excluded**:

- Images containing **people or faces** (face detection only, no identification)
- **CAPTCHAs**
- Content violating Azure usage policies

This screening may add latency to file upload validation.

## Best Practices

- **Diverse examples**: Vary image content, angles, lighting, and resolution
- **Consistent annotations**: Keep assistant response style and detail level uniform
- **Start with `detail: low`**: Cheaper and faster; upgrade to `high` only if results need it
- **Check for excluded images**: After upload, verify that the training example count matches expectations, because some images may be silently skipped due to content moderation
- **Mixed text+image**: You can include both text-only and image examples in the same training file

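To have an expected count to compare against after upload, the training file can be summarized locally first. The sketch below counts image and text-only examples and flags two format problems described in this document: images outside `user` messages and more than 64 images in one example. `summarize` and the constants are hypothetical names; it cannot predict which images content moderation will exclude.

```python
import json

# Limits from the Image Requirements table; the names are illustrative.
MAX_IMAGES_PER_EXAMPLE = 64
MAX_IMAGE_EXAMPLES = 50_000


def summarize(lines):
    """Count image and text-only examples and collect format problems."""
    summary = {"image_examples": 0, "text_only_examples": 0, "problems": []}
    for number, line in enumerate(lines, start=1):
        image_count = 0
        for message in json.loads(line)["messages"]:
            content = message.get("content")
            if not isinstance(content, list):
                continue  # plain-string content carries no images
            images = [b for b in content if b.get("type") == "image_url"]
            image_count += len(images)
            if images and message["role"] != "user":
                summary["problems"].append(f"line {number}: image in {message['role']} message")
        if image_count > MAX_IMAGES_PER_EXAMPLE:
            summary["problems"].append(f"line {number}: {image_count} images in one example")
        key = "image_examples" if image_count else "text_only_examples"
        summary[key] += 1
    if summary["image_examples"] > MAX_IMAGE_EXAMPLES:
        summary["problems"].append("more than 50,000 examples with images")
    return summary
```

It accepts any iterable of JSONL lines, so an open file handle can be passed directly.
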
## Training Workflow

Vision fine-tuning follows the exact same workflow as text SFT:

1. Prepare JSONL with image content blocks
2. Upload training file (validation may take longer due to image screening)
3. Create fine-tuning job with a supported vision model
4. Monitor and evaluate as usual

```python
# Assumes `client` is an already-configured OpenAI-compatible client and
# `val_file` is a validation file uploaded the same way as `train_file`.

# Upload (image validation may take longer)
with open("vision_train.jsonl", "rb") as f:
    train_file = client.files.create(purpose="fine-tune", file=f)
client.files.wait_for_processing(train_file.id)

# Submit: same as text SFT
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-2025-04-14",
    training_file=train_file.id,
    validation_file=val_file.id,
    method={"type": "supervised"},
)
```

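For the monitoring step, a simple polling loop is often enough. This is a sketch only: `wait_for_job` is a hypothetical helper, `client` is assumed to be the same client object used for the upload, and the terminal status strings are assumed to be those of the OpenAI fine-tuning API (`succeeded`, `failed`, `cancelled`).

```python
import time

# Terminal job states, assumed to follow the OpenAI fine-tuning API.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=60):
    """Poll a fine-tuning job until it reaches a terminal state, then return it."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        time.sleep(poll_seconds)
```

Calling `wait_for_job(client, job.id)` after submission blocks until the job finishes. Checking `status` on the returned object distinguishes success from failure.
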
## Troubleshooting

| Issue | Resolution |
|-------|-----------|
| Images skipped silently | Check for people/faces, oversized files, unsupported formats |
| URL not accessible | Ensure URLs are publicly accessible, or use base64 data URIs |
| Exceeds 10 MB | Resize or compress the image |
| Wrong color mode | Convert to RGB or RGBA |
| Low quality results | Try `detail: high`, add more diverse examples, increase dataset size |

## Reference

- [Official docs](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/fine-tuning-vision)
