# Agentic RFT: Tool Calling

Train reasoning models (such as o4-mini) with reinforcement fine-tuning (RFT) for agentic scenarios, where the model invokes external tools during its chain-of-thought reasoning.

> ⚠️ **Access required**: Agentic RFT with tool calling and GPT-5 RFT are behind feature flags. You must request access through the Azure AI Foundry portal or your Microsoft account team. o4-mini RFT without tools is generally available.

## Tool Definition Format

```python
# Each tool has a name and the HTTPS endpoint that serves it; `headers` carries
# the endpoint's auth. Replace <your-key> and keep real keys out of source control.
tools = [
    {
        "name": "search",
        "server_url": "https://your-function-app.azurewebsites.net/api/tools",
        "headers": {
            "Authorization": "Bearer <your-key>"
        }
    },
    {
        "name": "get_by_id",
        "server_url": "https://your-function-app.azurewebsites.net/api/tools",
        "headers": {
            "Authorization": "Bearer <your-key>"
        }
    }
]
```

## Submitting an Agentic RFT Job

```python
# Assumes `client` is an Azure OpenAI client, `train` and `valid` are uploaded
# training and validation files, `grader` is your grader definition, and
# `tools` is the list defined above.
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",
    training_file=train.id,
    validation_file=valid.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,
            "tools": tools,
            "max_episode_steps": 10,
            "hyperparameters": {
                "eval_interval": 5,
                "eval_samples": 10,
                "compute_multiplier": 1.5,
                "reasoning_effort": "medium"
            }
        }
    }
)
```

## Tool Response Format

Your tool endpoint must return a JSON object of this form:

```json
{
    "type": "function_call_output",
    "call_id": "call_12345xyz",
    "output": "The result of the tool call...",
    "id": "fc_12345xyz"
}
```

## Tool Endpoint Requirements

| Constraint | Limit |
|------------|-------|
| Recommended throughput | 50 QPS |
| Max input payload | 1 MB |
| Max return payload | 1 MB (413 error if exceeded) |
| Timeout | 10 minutes |
| Parallel calls | Supported; handle race conditions |
| Retry on 5xx | 3 attempts, then the rollout is discarded |
| On 4xx | Error is serialized and shown to the model |

**Infrastructure**: Enable Always On, use sufficient compute (S2 or higher), and run multiple instances.
Under-provisioned endpoints can cause jobs to hang during post-training evaluation.

## RFT Hyperparameters

| Parameter | Description | Recommended Start |
|-----------|-------------|-------------------|
| `reasoning_effort` | `"low"`, `"medium"`, or `"high"` | `"medium"` |
| `compute_multiplier` | Scales the number of rollouts per step | `1.5` |
| `learning_rate_multiplier` | Scales the learning rate | `1.0` |
| `n_epochs` | Number of passes over the training data | `2–3` |
| `eval_interval` | Run evaluation every N steps | `5` |
| `eval_samples` | Validation examples per evaluation | `10` |
| `max_episode_steps` | Maximum tool calls plus reasoning steps per rollout | `5–10` |

**Notes:** A higher learning rate increases output verbosity without improving accuracy. A compute multiplier of 1.5 balances rollout quality against training time. The platform may stop early, before all epochs complete.

## When to Use Agentic RFT

- The model needs to **decide when to call tools** (not just follow instructions).
- The task involves **multi-step reasoning** with external data lookups.
- The model needs to learn **tool selection**: choosing the right tool for the job.
- Standard RFT (without tools) can't capture the agentic behavior.
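
The **Tool Response Format** and the 1 MB return limit in **Tool Endpoint Requirements** can be combined in a small helper. The sketch below is one possible approach, not documented platform behavior: it builds the `function_call_output` object and, if the payload would exceed the limit, truncates `output` instead of letting the call fail with a 413. It assumes your handler already has the `call_id` and `id` values for the call; `build_tool_response`, the truncation marker, and reading "1 MB" as 10^6 bytes are all hypothetical choices.

```python
import json

# Assumption: conservative reading of the documented "1 MB" return limit.
MAX_RETURN_BYTES = 1_000_000
TRUNCATION_MARKER = "...[truncated]"  # hypothetical marker appended when output is cut


def build_tool_response(call_id: str, item_id: str, output: str,
                        max_bytes: int = MAX_RETURN_BYTES) -> str:
    """Serialize a function_call_output payload, truncating `output` if it is too large."""

    def serialize(text: str) -> str:
        return json.dumps({
            "type": "function_call_output",
            "call_id": call_id,
            "output": text,
            "id": item_id,
        })

    body = serialize(output)
    if len(body.encode("utf-8")) <= max_bytes:
        return body

    # Binary-search the longest prefix of `output` that still fits once the marker
    # is appended. Serialized size never shrinks as the prefix grows, so the search
    # stays valid even when JSON escaping inflates some characters. (If the ids alone
    # exceed max_bytes, the result will still be too large.)
    lo, hi = 0, len(output)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(serialize(output[:mid] + TRUNCATION_MARKER).encode("utf-8")) <= max_bytes:
            lo = mid
        else:
            hi = mid - 1
    return serialize(output[:lo] + TRUNCATION_MARKER)
```

Truncating keeps the rollout alive and lets the model see that the result was cut short; whether that is preferable to returning an error depends on your task.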
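
The **Tool Endpoint Requirements** table distinguishes 4xx responses (serialized and shown to the model) from 5xx responses (retried, then the rollout is discarded), and warns that calls may arrive in parallel. The sketch below shows one way a tool server might apply that split. Everything in it is hypothetical: the request shape (a tool name plus a JSON arguments object), the `dispatch` function, the `ToolInputError` class, and the in-memory corpus. The HTTP layer is omitted.

```python
import inspect
import json
import threading
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory corpus standing in for your real data source.
_DOCS: Dict[str, str] = {
    "doc-1": "Azure AI Foundry overview",
    "doc-2": "RFT grader guide",
}
_cache: Dict[str, str] = {}
_cache_lock = threading.Lock()  # parallel tool calls may touch the cache at once


class ToolInputError(Exception):
    """A problem caused by the model's arguments; reported as a 4xx."""


def search(query: str) -> str:
    hits = [doc_id for doc_id, text in _DOCS.items() if query.lower() in text.lower()]
    return json.dumps(hits)


def get_by_id(doc_id: str) -> str:
    with _cache_lock:
        cached = _cache.get(doc_id)
    if cached is not None:
        return cached
    if doc_id not in _DOCS:
        raise ToolInputError(f"No document with id {doc_id!r}")
    text = _DOCS[doc_id]  # stand-in for a slow backend lookup
    with _cache_lock:
        # setdefault keeps the first writer's value if two calls race here
        text = _cache.setdefault(doc_id, text)
    return text


TOOLS: Dict[str, Callable[..., str]] = {"search": search, "get_by_id": get_by_id}


def dispatch(name: str, arguments: Dict[str, Any]) -> Tuple[int, str]:
    """Run one tool call and return (HTTP status, output text)."""
    fn = TOOLS.get(name)
    if fn is None:
        return 400, f"Unknown tool: {name}"
    try:
        inspect.signature(fn).bind(**arguments)  # reject missing or unexpected arguments
    except TypeError as exc:
        return 400, f"Bad arguments for {name}: {exc}"
    try:
        return 200, fn(**arguments)
    except ToolInputError as exc:
        return 400, str(exc)  # 4xx: the error text is shown to the model
    except Exception:
        return 500, "Internal error"  # 5xx: retried, then the rollout is discarded
```

The design choice is to return argument mistakes as 4xx so the model sees them and can correct course, and to reserve 5xx for genuine server faults, since repeated 5xx responses cause the rollout to be discarded.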
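
The **RFT Hyperparameters** table can also be captured as a small builder for the `method` argument used in **Submitting an Agentic RFT Job**, with the table's starting values as defaults (`n_epochs` uses the low end of the 2–3 range). `build_rft_method` is a hypothetical helper. The job example above shows only four hyperparameters, so placing `n_epochs` and `learning_rate_multiplier` in the same `hyperparameters` object is an assumption; `max_episode_steps` stays at the `reinforcement` level, as in that example.

```python
from typing import Any, Dict, List

VALID_EFFORTS = ("low", "medium", "high")


def build_rft_method(
    grader: Dict[str, Any],
    tools: List[Dict[str, Any]],
    *,
    reasoning_effort: str = "medium",
    compute_multiplier: float = 1.5,
    learning_rate_multiplier: float = 1.0,
    n_epochs: int = 2,
    eval_interval: int = 5,
    eval_samples: int = 10,
    max_episode_steps: int = 10,
) -> Dict[str, Any]:
    """Build an agentic RFT `method` dict from the table's recommended starting values."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    for name, value in (("n_epochs", n_epochs), ("eval_interval", eval_interval),
                        ("eval_samples", eval_samples), ("max_episode_steps", max_episode_steps)):
        if value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return {
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,
            "tools": tools,
            "max_episode_steps": max_episode_steps,
            "hyperparameters": {
                # Assumption: n_epochs and learning_rate_multiplier sit alongside the rest.
                "n_epochs": n_epochs,
                "learning_rate_multiplier": learning_rate_multiplier,
                "eval_interval": eval_interval,
                "eval_samples": eval_samples,
                "compute_multiplier": compute_multiplier,
                "reasoning_effort": reasoning_effort,
            },
        },
    }
```

You would pass the result as `method=build_rft_method(grader, tools)` to `client.fine_tuning.jobs.create`, overriding individual values (for example `n_epochs=3`) as you tune.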