Deploy, evaluate, and manage AI agents end-to-end on Microsoft Azure AI Foundry
finetuning/references/vision-fine-tuning.md
# Vision Fine-Tuning

Fine-tune models with image data to customize visual understanding. Vision fine-tuning uses the same chat-completions JSONL format as text SFT, but user messages can also contain image content blocks.

## Supported Models

| Model | Version |
|-------|---------|
| gpt-4o | 2024-08-06 |
| gpt-4.1 | 2025-04-14 |

## Image Requirements

| Constraint | Limit |
|-----------|-------|
| Max examples with images per training file | 50,000 |
| Max images per example | 64 |
| Max image file size | 10 MB |
| Supported formats | JPEG, PNG, WEBP |
| Color mode | RGB or RGBA |
| Min training examples | 10 |

**Important**: Images can only appear in `user` messages, never in `assistant` responses.

## Data Format

Each training example follows the standard SFT `messages` format. Images are included as `image_url` content blocks within user messages.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful AI assistant that describes images."}, {"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.png", "detail": "high"}}]}, {"role": "assistant", "content": "The image shows a cityscape with tall buildings against a blue sky."}]}
```

### Image Sources

Images can be provided in two ways:

**1. Public URL:**
```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
```

**2. Base64 data URI:**
```json
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
```

### Detail Control

The `detail` parameter trades image-processing fidelity against cost:

| Value | Behavior | Cost |
|-------|----------|------|
| `low` | Downscales the image to 512×512 pixels | Lower |
| `high` | Full-resolution processing | Higher |
| `auto` (default) | Model chooses based on image size | Depends on the model's choice |

```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png", "detail": "low"}}
```

Use `low` for tasks where fine visual detail doesn't matter (classification, general description). Use `high` for tasks that need precise detail (OCR, diagram reading, defect detection).

## Content Moderation

Images are screened before training. The following are **automatically excluded**:

- Images containing **people or faces** (face detection only; individuals are not identified)
- **CAPTCHAs**
- Content violating Azure usage policies

This screening can make file upload validation take longer.

## Best Practices

- **Diverse examples**: Vary image content, angles, lighting, and resolution
- **Consistent annotations**: Keep assistant response style and level of detail uniform
- **Start with `detail: low`**: It is cheaper and faster; switch to `high` only if results need finer detail
- **Check for excluded images**: After upload, verify that the training example count matches expectations, because content moderation may silently skip some images
- **Mixed text+image**: You can include both text-only and image examples in the same training file

## Training Workflow

Vision fine-tuning follows the same workflow as text SFT:

1. Prepare JSONL with image content blocks
2. Upload the training file (validation may take longer due to image screening)
3. Create a fine-tuning job with a supported vision model
4. Monitor and evaluate as usual

```python
# Assumes `client` is an OpenAI/AzureOpenAI client configured as for text SFT.

# Upload training and validation files (image screening can lengthen validation).
# "vision_val.jsonl" is a hypothetical validation file name.
with open("vision_train.jsonl", "rb") as f:
    train_file = client.files.create(purpose="fine-tune", file=f)
with open("vision_val.jsonl", "rb") as f:
    val_file = client.files.create(purpose="fine-tune", file=f)
client.files.wait_for_processing(train_file.id)
client.files.wait_for_processing(val_file.id)

# Submit: same call as text SFT, with a vision-capable base model
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-2025-04-14",
    training_file=train_file.id,
    validation_file=val_file.id,
    method={"type": "supervised"},
)
```

## Troubleshooting

| Issue | Resolution |
|-------|-----------|
| Images skipped silently | Check for people/faces, oversized files, unsupported formats |
| URL not accessible | Ensure URLs are publicly accessible, or use base64 data URIs |
| Exceeds 10 MB | Resize or compress the image |
| Wrong color mode | Convert to RGB or RGBA |
| Low-quality results | Try `detail: high`, add more diverse examples, increase dataset size |

## Reference

- [Official docs](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/fine-tuning-vision)
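The Data Format and Image Sources sections describe each training line as a `messages` list whose user turn mixes `text` and `image_url` blocks. Below is a minimal standard-library sketch, assuming local image files, that encodes an image as a base64 data URI and appends one such example to a JSONL file. The helper names (`to_data_uri`, `make_example`, `append_jsonl`) are hypothetical, the format is inferred from the file extension only (the bytes are not inspected), and `detail` defaults to `low` following the Best Practices advice.

```python
import base64
import json
from pathlib import Path

# Suffix-to-MIME map for the formats listed under Image Requirements (JPEG, PNG, WEBP)
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI (hypothetical helper)."""
    mime = MIME_BY_SUFFIX.get(Path(path).suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {path}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def make_example(image_url: str, prompt: str, answer: str, detail: str = "low") -> dict:
    """Build one SFT example; the image block goes only in the user message."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant that describes images."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
                ],
            },
            {"role": "assistant", "content": answer},
        ]
    }


def append_jsonl(path: str, example: dict) -> None:
    """Append one example to a JSONL file as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")
```

For example, `append_jsonl("vision_train.jsonl", make_example(to_data_uri("cityscape.png"), "Describe this image.", "The image shows a cityscape with tall buildings against a blue sky."))` writes a line with the same shape as the Data Format example, with a data URI in place of the public URL. `make_example` also accepts a public URL string directly.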
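The Image Requirements table and the rule that images may appear only in `user` messages can be checked locally, which catches some problems before the file reaches server-side validation. Below is a minimal standard-library sketch; the function names are hypothetical and it assumes "10 MB" means 10 × 1024 × 1024 bytes. It only measures images embedded as base64 data URIs (public URLs would have to be downloaded first), does not check color mode (that requires decoding the image with an imaging library), and cannot predict which images content moderation will exclude.

```python
import base64
import json
from typing import List, Optional

# Limits from the Image Requirements table; 10 MB is assumed to mean 10 * 1024 * 1024 bytes
MAX_IMAGE_EXAMPLES = 50_000
MAX_IMAGES_PER_EXAMPLE = 64
MAX_IMAGE_BYTES = 10 * 1024 * 1024
MIN_EXAMPLES = 10
ALLOWED_MIME = {"image/jpeg", "image/png", "image/webp"}


def image_blocks(message: dict) -> list:
    """Return the image_url blocks of one message (plain-string content has none)."""
    content = message.get("content")
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == "image_url"]


def check_data_uri(url: str) -> Optional[str]:
    """Return a problem description for a base64 data URI, or None if it passes."""
    header, sep, payload = url.partition(",")
    if not sep or not header.endswith(";base64"):
        return "malformed data URI"
    mime = header[len("data:"):-len(";base64")]
    if mime not in ALLOWED_MIME:
        return f"unsupported image format {mime!r}"
    try:
        size = len(base64.b64decode(payload, validate=True))
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "invalid base64 payload"
    if size > MAX_IMAGE_BYTES:
        return f"image is {size} bytes, over the 10 MB limit"
    return None


def validate_jsonl(path: str) -> List[str]:
    """Check a vision SFT JSONL file against the limits above; return a list of problems."""
    problems: List[str] = []
    n_examples = 0
    n_image_examples = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            n_examples += 1
            try:
                messages = json.loads(line)["messages"]
            except (ValueError, KeyError, TypeError):
                messages = None
            if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
                problems.append(f"line {lineno}: not a valid messages example")
                continue
            n_images = 0
            for msg in messages:
                blocks = image_blocks(msg)
                if blocks and msg.get("role") != "user":
                    problems.append(f"line {lineno}: image in {msg.get('role')!r} message")
                for block in blocks:
                    n_images += 1
                    url = (block.get("image_url") or {}).get("url", "")
                    if url.startswith("data:"):
                        issue = check_data_uri(url)
                        if issue:
                            problems.append(f"line {lineno}: {issue}")
            if n_images > MAX_IMAGES_PER_EXAMPLE:
                problems.append(
                    f"line {lineno}: {n_images} images, over the limit of {MAX_IMAGES_PER_EXAMPLE}"
                )
            if n_images:
                n_image_examples += 1
    if n_examples < MIN_EXAMPLES:
        problems.append(f"only {n_examples} examples; at least {MIN_EXAMPLES} are required")
    if n_image_examples > MAX_IMAGE_EXAMPLES:
        problems.append(f"{n_image_examples} examples with images; the limit is {MAX_IMAGE_EXAMPLES}")
    return problems
```

For example, `for problem in validate_jsonl("vision_train.jsonl"): print(problem)` lists every issue found. An empty list means the file passed these particular checks, not that the service is guaranteed to accept every image.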