Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
finetuning/references/vision-fine-tuning.md
# Vision Fine-Tuning

Fine-tune models with image data to customize visual understanding. Uses the same chat-completions JSONL format as text SFT, but with image content blocks in user messages.

## Supported Models

| Model | Version |
|-------|---------|
| gpt-4o | 2024-08-06 |
| gpt-4.1 | 2025-04-14 |

## Image Requirements

| Constraint | Limit |
|-----------|-------|
| Max examples with images per training file | 50,000 |
| Max images per example | 64 |
| Max image file size | 10 MB |
| Supported formats | JPEG, PNG, WEBP |
| Color mode | RGB or RGBA |
| Min examples | 10 |

**Important**: Images can only appear in `user` messages, never in `assistant` responses.

## Data Format

Each training example follows the standard SFT `messages` format. Images are included as `image_url` content blocks within user messages.

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful AI assistant that describes images."}, {"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.png", "detail": "high"}}]}, {"role": "assistant", "content": "The image shows a cityscape with tall buildings against a blue sky."}]}
```

### Image Sources

Images can be provided in two ways:

**1. Public URL:**
```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
```

**2. Base64 data URI:**
```json
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
```

### Detail Control

The `detail` parameter controls image processing fidelity and cost:

| Value | Behavior | Cost |
|-------|----------|------|
| `low` | Downscales to 512×512 pixels | Lower |
| `high` | Full resolution processing | Higher |
| `auto` (default) | Model decides based on image size | Varies |

```json
{"type": "image_url", "image_url": {"url": "https://example.com/image.png", "detail": "low"}}
```

Use `low` for tasks where fine visual detail doesn't matter (classification, general description). Use `high` for tasks needing precise detail (OCR, diagram reading, defect detection).

## Content Moderation

Images are screened before training. The following are **automatically excluded**:

- Images containing **people or faces** (face detection only, no identification)
- **CAPTCHAs**
- Content violating Azure usage policies

This screening may add latency to file upload validation.

## Best Practices

- **Diverse examples**: Vary image content, angles, lighting, and resolution
- **Consistent annotations**: Keep assistant response style and detail level uniform
- **Start with `detail: low`**: Cheaper and faster; upgrade to `high` only if results need it
- **Check for excluded images**: After upload, verify the training count matches expectations, because some images may be silently skipped due to content moderation
- **Mixed text+image**: You can include both text-only and image examples in the same training file

## Training Workflow

Vision fine-tuning follows the same workflow as text SFT:

1. Prepare JSONL with image content blocks
2. Upload training file (validation may take longer due to image screening)
3. Create fine-tuning job with a supported vision model
4. Monitor and evaluate as usual

```python
# Assumes `client` is an already configured OpenAI client for your Azure resource.
# Upload (image validation may take longer)
train_file = client.files.create(purpose="fine-tune", file=open("vision_train.jsonl", "rb"))
client.files.wait_for_processing(train_file.id)

# Validation file ("vision_val.jsonl" is a placeholder name)
val_file = client.files.create(purpose="fine-tune", file=open("vision_val.jsonl", "rb"))
client.files.wait_for_processing(val_file.id)

# Submit (same as text SFT)
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-2025-04-14",
    training_file=train_file.id,
    validation_file=val_file.id,
    method={"type": "supervised"},
)
```

## Troubleshooting

| Issue | Resolution |
|-------|-----------|
| Images skipped silently | Check for people/faces, oversized files, unsupported formats |
| URL not accessible | Ensure URLs are publicly accessible, or use base64 data URIs |
| Exceeds 10 MB | Resize or compress the image |
| Wrong color mode | Convert to RGB or RGBA |
| Low quality results | Try `detail: high`, add more diverse examples, increase dataset size |

## Reference

- [Official docs](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/fine-tuning-vision)
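
Because images that break the requirements may be skipped silently, it can help to check each JSONL line locally before upload. The following is a minimal sketch, not part of any SDK: `check_example` and the constants are hypothetical names, the 10 MB limit is assumed to mean 10 MiB, and the size and format checks only cover base64 data URIs (images referenced by public URL are not downloaded).

```python
import json

MAX_IMAGES_PER_EXAMPLE = 64          # from the Image Requirements table
MAX_IMAGE_BYTES = 10 * 1024 * 1024   # assumption: "10 MB" read as 10 MiB
ALLOWED_MIME = {"image/jpeg", "image/png", "image/webp"}
ALLOWED_DETAIL = {"low", "high", "auto"}


def check_example(line):
    """Return a list of problems found in one JSONL training example."""
    problems = []
    image_count = 0
    for msg in json.loads(line).get("messages", []):
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content cannot carry image blocks
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "image_url":
                continue
            image_count += 1
            # Images are only allowed in user messages.
            if msg.get("role") != "user":
                problems.append(f"image in {msg.get('role')} message")
            image = block.get("image_url", {})
            if image.get("detail", "auto") not in ALLOWED_DETAIL:
                problems.append(f"unknown detail value: {image.get('detail')}")
            url = image.get("url", "")
            if url.startswith("data:"):
                header, _, payload = url.partition(",")
                mime = header[len("data:"):].split(";")[0]
                if mime not in ALLOWED_MIME:
                    problems.append(f"unsupported format: {mime}")
                # Base64 stores 3 bytes in 4 characters, so this is an estimate.
                if len(payload) * 3 // 4 > MAX_IMAGE_BYTES:
                    problems.append("image larger than 10 MB")
    if image_count > MAX_IMAGES_PER_EXAMPLE:
        problems.append(
            f"{image_count} images exceeds the limit of {MAX_IMAGES_PER_EXAMPLE}"
        )
    return problems
```

One way to use it is to loop over the lines of the training file and print the line number alongside any non-empty result. The check does not inspect color mode or replicate content moderation, so the post-upload training count is still worth verifying.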