Build and deploy AI applications on Azure AI Foundry using Microsoft's model catalog and AI services
# Agentic RFT — Tool Calling

Train reasoning models (o4-mini) for agentic scenarios in which the model invokes external tools during its chain-of-thought reasoning.

> ⚠️ **Access required**: Agentic RFT with tool calling and GPT-5 RFT are behind feature flags. You must request access through the Azure AI Foundry portal or your Microsoft account team. o4-mini RFT without tools is generally available.

## Tool Definition Format

```python
# Each tool points at an HTTP endpoint that the training service calls during rollouts.
# Replace the URL and <your-key> with your own endpoint and credentials.
tools = [
    {
        "name": "search",
        "server_url": "https://your-function-app.azurewebsites.net/api/tools",
        "headers": {
            "Authorization": "Bearer <your-key>"
        }
    },
    {
        "name": "get_by_id",
        "server_url": "https://your-function-app.azurewebsites.net/api/tools",
        "headers": {
            "Authorization": "Bearer <your-key>"
        }
    }
]
```

## Submitting an Agentic RFT Job

```python
# Assumes `client` is an authenticated OpenAI-compatible client for your Azure AI Foundry
# resource, `train` and `valid` are uploaded training/validation files, and `grader`
# is your grader definition.
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",
    training_file=train.id,
    validation_file=valid.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,
            "tools": tools,            # tool definitions from the previous section
            "max_episode_steps": 10,   # max tool calls + reasoning steps per rollout
            "hyperparameters": {
                "eval_interval": 5,
                "eval_samples": 10,
                "compute_multiplier": 1.5,
                "reasoning_effort": "medium"
            }
        }
    }
)
```

## Tool Response Format

Your tool endpoint must return a JSON object of this form:

```json
{
  "type": "function_call_output",
  "call_id": "call_12345xyz",
  "output": "The result of the tool call...",
  "id": "fc_12345xyz"
}
```

## Tool Endpoint Requirements

| Constraint | Limit |
|------------|-------|
| Recommended throughput | 50 QPS |
| Max input payload | 1 MB |
| Max return payload | 1 MB (413 error if exceeded) |
| Timeout | 10 minutes |
| Parallel calls | Supported; your endpoint must handle race conditions |
| Retry on 5xx | 3 attempts, then the rollout is discarded |
| On 4xx | The error is serialized and shown to the model |

**Infrastructure**: Enable Always On, provision sufficient compute (S2 or higher), and run multiple instances. Under-provisioned endpoints can cause jobs to hang during post-training evaluation.

## RFT Hyperparameters

| Parameter | Description | Recommended Start |
|-----------|-------------|-------------------|
| `reasoning_effort` | `"low"`, `"medium"`, or `"high"` | `"medium"` |
| `compute_multiplier` | Scales rollouts per step | `1.5` |
| `learning_rate_multiplier` | Scales the learning rate | `1.0` |
| `n_epochs` | Passes over the training data | `2–3` |
| `eval_interval` | Run an evaluation every N steps | `5` |
| `eval_samples` | Validation examples per evaluation | `10` |
| `max_episode_steps` | Max tool calls plus reasoning steps per rollout | `5–10` |

**Notes:** A higher learning rate increases output verbosity without improving accuracy. A compute multiplier of 1.5 balances rollout quality against training time. The platform may stop training early, before all epochs complete.

## When to Use Agentic RFT

- The model needs to **decide when to call tools**, not just follow instructions
- The task involves **multi-step reasoning** with external data lookups
- The model needs to learn **tool selection**: choosing the right tool for the job
- Standard RFT (without tools) can't capture the agentic behavior
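On the tool-endpoint side, the response shape from Tool Response Format and the 1 MB return limit from Tool Endpoint Requirements can be sketched as a small serialization helper. This is a minimal sketch under stated assumptions, not a reference implementation. The name `build_tool_response`, the `PayloadTooLargeError` class, and the random `fc_` id fallback are hypothetical. It also assumes that `call_id` arrives with the incoming request and that "1 MB" is read conservatively as 10^6 bytes. How your endpoint parses the incoming request is not covered here.

```python
import json
import uuid
from typing import Optional

# Assumption: "1 MB" is read conservatively as 10**6 bytes of serialized JSON.
MAX_RETURN_BYTES = 1_000_000


class PayloadTooLargeError(ValueError):
    """Raised when a serialized tool response would exceed the return-payload limit."""


def build_tool_response(call_id: str, output: str, item_id: Optional[str] = None) -> bytes:
    """Serialize a function_call_output body as UTF-8 JSON bytes.

    Hypothetical helper: if item_id is omitted, a random "fc_"-prefixed id is
    generated. That fallback is an illustrative choice, not a documented requirement.
    """
    body = {
        "type": "function_call_output",
        "call_id": call_id,  # assumed to be taken from the incoming request
        "output": output,
        "id": item_id or f"fc_{uuid.uuid4().hex}",
    }
    encoded = json.dumps(body).encode("utf-8")
    if len(encoded) > MAX_RETURN_BYTES:
        # Fail locally instead of returning a body over the limit, which leads to a 413.
        raise PayloadTooLargeError(
            f"response is {len(encoded)} bytes; limit is {MAX_RETURN_BYTES}"
        )
    return encoded


if __name__ == "__main__":
    payload = build_tool_response(
        "call_12345xyz", "The result of the tool call...", item_id="fc_12345xyz"
    )
    print(payload.decode("utf-8"))
```

The helper raises an error instead of silently truncating. This lets the caller decide how to shrink an oversized result, for example by returning fewer search hits.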
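The recommended starting values from the RFT Hyperparameters table can also be collected into the `method` payload passed to `client.fine_tuning.jobs.create`. The helper `rft_method` and its validation checks are hypothetical. Its key layout copies the job-submission example. Placing `learning_rate_multiplier` and `n_epochs` inside the same `hyperparameters` object is an assumption, so verify it against the current Azure AI Foundry fine-tuning reference.

```python
from typing import Any, Dict, List

# Allowed values taken from the RFT Hyperparameters table.
REASONING_EFFORTS = {"low", "medium", "high"}


def rft_method(
    grader: Dict[str, Any],
    tools: List[Dict[str, Any]],
    *,
    reasoning_effort: str = "medium",
    compute_multiplier: float = 1.5,
    learning_rate_multiplier: float = 1.0,
    n_epochs: int = 2,
    eval_interval: int = 5,
    eval_samples: int = 10,
    max_episode_steps: int = 10,
) -> Dict[str, Any]:
    """Build a hypothetical `method` dict from the table's recommended starting values."""
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
    if max_episode_steps < 1:
        raise ValueError("max_episode_steps must be at least 1")
    return {
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,
            "tools": tools,
            "max_episode_steps": max_episode_steps,
            # Assumption: all tuning knobs share one hyperparameters object.
            "hyperparameters": {
                "reasoning_effort": reasoning_effort,
                "compute_multiplier": compute_multiplier,
                "learning_rate_multiplier": learning_rate_multiplier,
                "n_epochs": n_epochs,
                "eval_interval": eval_interval,
                "eval_samples": eval_samples,
            },
        },
    }
```

Keeping the defaults in one place makes it easy to vary a single knob between experiments, such as `reasoning_effort`, while the rest of the configuration stays fixed.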