Microsoft Foundry Skill

Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services

finetuning/references/vision-fine-tuning.md

# Vision Fine-Tuning

Fine-tune models with image data to customize visual understanding. Vision fine-tuning uses the same chat-completions JSONL format as text supervised fine-tuning (SFT), with image content blocks added to user messages.

## Supported Models

| Model | Version |
|-------|---------|
| gpt-4o | 2024-08-06 |
| gpt-4.1 | 2025-04-14 |

## Image Requirements

| Constraint | Limit |
|-----------|-------|
| Max examples with images per training file | 50,000 |
| Max images per example | 64 |
| Max image file size | 10 MB |
| Supported formats | JPEG, PNG, WEBP |
| Color mode | RGB or RGBA |
| Min examples | 10 |

**Important**: Images can only appear in `user` messages, never in `assistant` responses.

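The structural limits above can be checked locally before upload. The following is a minimal sketch, not part of the official tooling: the function names `image_blocks` and `check_training_lines` are hypothetical, and it only covers the limits that are visible in the JSONL itself (example counts, images per example, and image placement). File size, format, and color mode need the image bytes and are not checked here.

```python
import json

MAX_IMAGE_EXAMPLES = 50_000   # max examples with images per training file
MAX_IMAGES_PER_EXAMPLE = 64
MIN_EXAMPLES = 10


def image_blocks(message):
    """Return the image_url content blocks of one chat message."""
    content = message.get("content")
    if not isinstance(content, list):  # plain-string content carries no images
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == "image_url"]


def check_training_lines(lines):
    """Check JSONL lines against the limits; return (problems, image_example_count)."""
    problems = []
    total = 0
    with_images = 0
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        total += 1
        count = 0
        for msg in json.loads(line)["messages"]:
            found = len(image_blocks(msg))
            if found and msg.get("role") != "user":
                problems.append(f"line {n}: image in {msg.get('role')} message")
            count += found
        if count > MAX_IMAGES_PER_EXAMPLE:
            problems.append(f"line {n}: {count} images (max {MAX_IMAGES_PER_EXAMPLE})")
        if count:
            with_images += 1
    if total < MIN_EXAMPLES:
        problems.append(f"only {total} examples (min {MIN_EXAMPLES})")
    if with_images > MAX_IMAGE_EXAMPLES:
        problems.append(f"{with_images} examples with images (max {MAX_IMAGE_EXAMPLES})")
    return problems, with_images
```

Because an open text file iterates line by line, a file handle can be passed directly as `lines`. The returned image-example count is also a useful baseline to compare against what the service reports after upload.
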
## Data Format

Each training example follows the standard SFT `messages` format. Images are included as `image_url` content blocks within user messages.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful AI assistant that describes images."}, {"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.png", "detail": "high"}}]}, {"role": "assistant", "content": "The image shows a cityscape with tall buildings against a blue sky."}]}
```

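Writing these lines by hand is error-prone, so a small builder can help. This is a sketch using only the standard library; `make_example` and `to_jsonl` are hypothetical helper names, and the structure simply mirrors the example line above.

```python
import json


def make_example(system_prompt, question, image_urls, answer, detail=None):
    """Build one training example: text plus image blocks in the user turn."""
    user_content = [{"type": "text", "text": question}]
    for url in image_urls:
        image = {"url": url}
        if detail is not None:  # omit the key to leave the default in place
            image["detail"] = detail
        user_content.append({"type": "image_url", "image_url": image})
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": answer},  # text only, never images
        ]
    }


def to_jsonl(examples):
    """Serialize examples as JSONL: one JSON object per line."""
    return "".join(json.dumps(ex, ensure_ascii=False) + "\n" for ex in examples)
```

`json.dumps` escapes any newlines inside strings, so each example stays on a single line as the JSONL format requires.
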
### Image Sources

Images can be provided in two ways:

**1. Public URL:**
```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
```

**2. Base64 data URI:**
```json
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
```

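For local files, the data URI can be produced with the standard library. The sketch below assumes the MIME type can be inferred from the file extension; `to_data_uri`, `file_to_image_block`, and the `MIME_BY_SUFFIX` table are hypothetical names introduced for illustration.

```python
import base64
from pathlib import Path

# Assumption: the file extension reliably reflects the actual image format.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def to_data_uri(image_bytes, mime_type):
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_image_block(path):
    """Read a local image and return an image_url content block for it."""
    path = Path(path)
    mime_type = MIME_BY_SUFFIX.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"unsupported image extension: {path.suffix!r}")
    return {
        "type": "image_url",
        "image_url": {"url": to_data_uri(path.read_bytes(), mime_type)},
    }
```

Base64 encoding grows the payload by roughly one third, so embedding many large images this way makes the training file considerably bigger than the images themselves.
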
### Detail Control

The `detail` parameter controls image processing fidelity and cost:

| Value | Behavior | Cost |
|-------|----------|------|
| `low` | Downscales to 512×512 pixels | Lower |
| `high` | Full resolution processing | Higher |
| `auto` | Default; the model decides based on image size | Varies |

```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png", "detail": "low"}}
```

Use `low` for tasks where fine visual detail doesn't matter (classification, general description). Use `high` for tasks that need precise detail (OCR, diagram reading, defect detection).

## Content Moderation

Images are screened before training. The following are **automatically excluded**:

- Images containing **people or faces** (the screening detects faces but does not identify individuals)
- **CAPTCHAs**
- Content violating Azure usage policies

This screening may add latency to file upload validation.

## Best Practices

- **Diverse examples**: Vary image content, angles, lighting, and resolution
- **Consistent annotations**: Keep assistant response style and detail level uniform
- **Start with `detail: low`**: Cheaper and faster; upgrade to `high` only if results need it
- **Check for excluded images**: After upload, verify the training count matches expectations, because content moderation may silently skip some images
- **Mixed text+image**: You can include both text-only and image examples in the same training file

## Training Workflow

Vision fine-tuning follows the same workflow as text SFT:

1. Prepare JSONL with image content blocks
2. Upload training file (validation may take longer due to image screening)
3. Create fine-tuning job with a supported vision model
4. Monitor and evaluate as usual

```python
# Assumes `client` is an already-configured OpenAI or AzureOpenAI client.

# Upload (image validation may take longer)
with open("vision_train.jsonl", "rb") as f:
    train_file = client.files.create(purpose="fine-tune", file=f)
client.files.wait_for_processing(train_file.id)

# Upload a validation file the same way (the file name is an assumption)
with open("vision_val.jsonl", "rb") as f:
    val_file = client.files.create(purpose="fine-tune", file=f)
client.files.wait_for_processing(val_file.id)

# Submit, same as text SFT
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-2025-04-14",
    training_file=train_file.id,
    validation_file=val_file.id,
    method={"type": "supervised"},
)
```

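Step 4 of the workflow above (monitoring) can be sketched as a simple polling loop. This assumes the OpenAI Python SDK's `fine_tuning.jobs.retrieve` call and its documented terminal status names; the helper name `wait_for_job`, the 60-second default interval, and the injectable `sleep` parameter are illustrative choices, not part of the SDK.

```python
import time

# Terminal states as named by the OpenAI Python SDK's fine-tuning job object.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=60, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status; return the final job."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)  # injectable so the loop can be exercised without waiting
```

A call such as `wait_for_job(client, job.id)` blocks until the job finishes. In practice a timeout or retry limit is worth adding, since this sketch loops indefinitely while the job is queued or running.
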
## Troubleshooting

| Issue | Resolution |
|-------|-----------|
| Images skipped silently | Check for people/faces, oversized files, unsupported formats |
| URL not accessible | Ensure URLs are publicly accessible, or use base64 data URIs |
| Exceeds 10 MB | Resize or compress the image |
| Wrong color mode | Convert to RGB or RGBA |
| Low quality results | Try `detail: high`, add more diverse examples, increase dataset size |

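Two of the causes in the table, oversized files and unsupported formats, can be caught before upload by inspecting the raw bytes. The sketch below uses only the standard library and identifies formats from their leading magic bytes; the helper names are hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption. It does not check color mode or image content, which would need an imaging library.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB


def sniff_format(data):
    """Identify JPEG, PNG, or WEBP from leading magic bytes; return None otherwise."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    return None


def check_image_bytes(data):
    """Return a list of problems for one image payload (empty list means none found)."""
    problems = []
    if len(data) > MAX_IMAGE_BYTES:
        problems.append(f"{len(data)} bytes exceeds the {MAX_IMAGE_BYTES}-byte limit")
    if sniff_format(data) is None:
        problems.append("unsupported format (expected JPEG, PNG, or WEBP)")
    return problems
```

Sniffing the bytes rather than trusting the extension catches files that were renamed without being converted.
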
## Reference

- [Official docs](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/fine-tuning-vision)