shared/model-migration.md
# Model Migration Guide

How to move existing code to newer Claude models. Covers breaking changes, deprecated parameters, and drop-in replacements for retired models.

For the latest, authoritative version (with code samples in every supported language), WebFetch the **Migration Guide** URL from `shared/live-sources.md`. Use this file for the consolidated, skill-resident reference; fall back to the live docs whenever a model launch or breaking change may have shifted the picture.

**This file is large.** Use the section names below to jump (or `Grep` this file for the heading text). Read Step 0 and Step 1 first — they apply to every migration. Then read only the per-target section for the model you are migrating to.

| Section | When you need it |
|---|---|
| Step 0: Confirm the migration scope | Always — before any edits |
| Step 1: Classify each file | Always — decides whether to swap, add-alongside, or skip |
| Per-SDK Syntax Reference | Translate the Python examples in this guide to TypeScript / Go / Ruby / Java / C# / PHP |
| Destination Models / Retired Model Replacements | Picking a target model |
| Breaking Changes by Source Model | Migrating to Opus 4.6 / Sonnet 4.6 |
| Migrating to Opus 4.7 | Migrating to Opus 4.7 (breaking changes, silent defaults, behavioral shifts) |
| Opus 4.7 Migration Checklist | The required vs optional items for 4.7, tagged `[BLOCKS]` / `[TUNE]` |
| Verify the Migration | After edits — runtime spot-check |

**TL;DR:** Change the model ID string. If you were using `budget_tokens`, switch to `thinking: {type: "adaptive"}`. If you were using assistant prefills, they 400 on both Opus 4.6 and Sonnet 4.6 — switch to one of the prefill replacements (most often `output_config.format`; see the table in Breaking Changes by Source Model). If you're moving from Sonnet 4.5 to Sonnet 4.6, set `effort` explicitly — 4.6 defaults to `high`.
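As a concrete sketch, here is the TL;DR applied to a request's keyword arguments. The `migrate_kwargs_to_sonnet_4_6` helper is hypothetical — written only to illustrate the transformation; the field names and values are the ones this guide uses:

```python
# Hypothetical helper: the TL;DR as a pure transformation on
# messages.create(...) keyword arguments. Illustrative, not SDK code.
def migrate_kwargs_to_sonnet_4_6(kwargs: dict) -> dict:
    out = dict(kwargs)
    out["model"] = "claude-sonnet-4-6"
    # budget_tokens -> adaptive thinking
    if out.get("thinking", {}).get("type") == "enabled":
        out["thinking"] = {"type": "adaptive"}
    # Sonnet 4.6 defaults effort to "high"; set it explicitly
    out.setdefault("output_config", {}).setdefault("effort", "medium")
    return out

old = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
}
new = migrate_kwargs_to_sonnet_4_6(old)
print(new["thinking"])  # {'type': 'adaptive'}
```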
Remove the `effort-2025-11-24` and `fine-grained-tool-streaming-2025-05-14` beta headers (GA on 4.6); remove `interleaved-thinking-2025-05-14` once you're on adaptive thinking (keep it only while using the transitional `budget_tokens` escape hatch). Then drop back from `client.beta.messages.create` to `client.messages.create`. Dial back any aggressive "CRITICAL: YOU MUST" tool instructions; 4.6 follows the system prompt much more closely.

---

## Step 0: Confirm the migration scope

**Before any Write, Edit, or MultiEdit call, confirm the scope.** If the user's request does not explicitly name a single file, a specific directory, or an explicit file list, **ask first — do not start editing**. This is non-negotiable: even imperative-sounding requests like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.7" leave the scope ambiguous and require a clarifying question. Phrases like "my project", "my code", "my codebase", "the whole thing", "everywhere", or "across the repo" are **ambiguous, not directive** — they tell you *what* to do but not *where*. Ask before doing.

Offer the common scopes explicitly and wait for the answer before touching any file:

1. The entire working directory
2. A specific subdirectory (e.g. `src/`, `app/`, `services/billing/`)
3. A specific file or a list of files

Surface this as a single clarifying question so the user can answer in one turn. **Proceed without asking only when the scope is already unambiguous** — the user named an exact file ("migrate `extract.py` to Sonnet 4.6"), pointed at a specific directory ("migrate everything under `services/billing/` to Opus 4.6"), listed specific files ("update `a.py` and `b.py`"), or already answered the scope question in an earlier turn. If you can answer the question "which files is this change going to touch?" with a precise list from the prompt alone, proceed.
If not, ask.

**Worked example.** If the user says *"Move my project to Opus 4.6. I want adaptive thinking everywhere it makes sense."* you do not know whether "my project" means the whole working directory, just `src/`, just the production code, or something else — the `everywhere` makes the intent clear (update every call site *within scope*) but the scope itself is still not defined. Do not start editing. Respond with:

> Before I start editing, can you confirm the scope? I can migrate:
> 1. Every `.py` file in the working directory
> 2. Just the files under `src/` (production code)
> 3. A specific subdirectory or list of files you name
>
> Which one?

Then wait for the answer. The same applies to *"Migrate to Opus 4.7"* and bare *"Help me upgrade to Sonnet 4.6"* — ask before editing.

**Sizing the scope question (large repos).** Before asking, get a per-directory count so the user can pick concretely:

```sh
rg -l "<old-model-id>" --type-not md | cut -d/ -f1 | sort | uniq -c | sort -rn
```

Present the breakdown in your scope question (e.g. *"Found 217 references across 3 directories: api/ (130), api-go/ (62), routing/ (25). Which to migrate?"*). Also confirm `git status` is clean before surveying — unexpected modifications mean a concurrent process; stop and investigate before proceeding.

---

## Step 1: Classify each file

Not every file that contains the old model ID is a **caller** of the API. Before editing, classify each file into one of these buckets — the right action differs:

| # | Bucket | What it looks like | Action |
|---|---|---|---|
| 1 | **Calls the API/SDK** | `client.messages.create(model=…)`, `anthropic.Anthropic()`, request payloads | Swap the model ID **and** apply the breaking-change checklist for the target version (below).
| 2 | **Defines or serves the model** | Model registries, OpenAPI specs, routing/queue configs, model-policy enums, generated catalogs | The old entry **stays** (the model is still served). Ask whether to (a) add the new model alongside, (b) leave alone, or (c) retire the old model — never blind-replace. **If you can't ask, default to (a): add the new model alongside and flag it** — replacing would de-register a model that's still in production. |
| 3 | **References the ID as an opaque string** | UI fallback constants, capability-gate substring checks, generic test fixtures, label parsers, env defaults | Usually swap the string and verify any parser/regex/substring match handles the new ID — but check the sub-cases below first. |
| 4 | **Suffixed variant ID** | `claude-<model>-<suffix>` like `-fast`, `-1024k`, `-200k`, `[1m]`, dated snapshots | These are deployment/routing identifiers, not the public model ID. **Do not assume a new-model equivalent exists.** Verify in the registry first; if absent, leave the string alone and flag it. |

**Bucket 3 sub-cases — before swapping a string reference, check:**

- **Capability gate** (e.g. `if 'opus-4-6' in model_id:` enables a feature) → **add the new ID alongside**, don't replace. The old model is still served and still has the capability, so replacing would silently disable the feature for any old-model traffic that still flows through. If you know no old-model traffic will hit this gate (single-caller codebase fully migrating), replacing is fine; if unsure, add alongside.
- **Registry-assert test** (e.g. `assert "claude-X" in supported_models`, `test_X_has_N_clusters`) → **add an assertion for the new model alongside; keep the old one.** The old model is still served, so its assertion stays valid — but the registry should also include the new model, so assert that too.
Heuristic: if the test references multiple model versions in a list, it's a registry test; if one model in a struct compared only to itself, it's a generic fixture.
- **Frozen / generated snapshot** → **regenerate**, don't hand-edit.
- **Coupled to a definer** (e.g. an integration test that passes model authorization via a shared `conftest` seed list, or asserts on a billing-tier / rate-limit-group enum or a generated SKU/pricing catalog) → **verify the definer has a new-model entry first.** If not, add a seed entry (reusing the nearest existing tier as a placeholder); if you can't confidently do that, ask the user how to populate the definer. **Do not skip the test.** Swapping without populating the definer will make the test fail at runtime.

When migrating tests specifically: breaking parameters (`temperature`, `top_p`, `budget_tokens`) are usually absent — test fixtures rarely set sampling params on placeholder models. The breaking-change scan is still required, but expect mostly clean results.

**Find intentionally-flagged sync points first.** Many codebases tag spots that must change at every model launch with comment markers like `MODEL LAUNCH`, `KEEP IN SYNC`, `@model-update`, or similar. Grep for whatever convention the repo uses *before* the broad model-ID grep — those markers point at the load-bearing changes.

---

## Per-SDK Syntax Reference

Code examples in this guide are Python. **The same fields exist in every official Anthropic SDK** — Stainless generates all 7 from the same OpenAPI spec, so JSON field names map 1:1 with only case-convention differences.
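As a quick sanity-check of that mapping, a throwaway sketch (the helpers are illustrative, not SDK code) showing how one JSON field name renders in each naming convention:

```python
# Illustrative only: render a JSON field name in each SDK's case convention.
def to_pascal(name: str) -> str:
    return "".join(part.capitalize() for part in name.split("_"))

def to_camel(name: str) -> str:
    pascal = to_pascal(name)
    return pascal[0].lower() + pascal[1:]

print(to_pascal("budget_tokens"))  # BudgetTokens  (Go / C# fields)
print(to_camel("budget_tokens"))   # budgetTokens  (Java builder methods)
```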
Use the rows below to translate the Python examples into your target SDK.

> **Verify type and method names against the SDK source before writing them into customer code.** WebFetch the relevant repository from the SDK source-code table in `shared/live-sources.md` (one row per SDK) and confirm the exact symbol — particularly for typed SDKs (Go, Java, C#) where union/builder names can differ from the JSON shape. Do not guess type names that aren't in the table below or in `<lang>/claude-api/README.md`.

### `thinking` — `budget_tokens` → adaptive

| SDK | Before | After |
|---|---|---|
| Python | `thinking={"type": "enabled", "budget_tokens": N}` | `thinking={"type": "adaptive"}` |
| TypeScript | `thinking: { type: 'enabled', budget_tokens: N }` | `thinking: { type: 'adaptive' }` |
| Go | `Thinking: anthropic.ThinkingConfigParamOfEnabled(N)` | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{}}` |
| Ruby | `thinking: { type: "enabled", budget_tokens: N }` | `thinking: { type: "adaptive" }` |
| Java | `.thinking(ThinkingConfigEnabled.builder().budgetTokens(N).build())` | `.thinking(ThinkingConfigAdaptive.builder().build())` |
| C# | `Thinking = new ThinkingConfigEnabled { BudgetTokens = N }` | `Thinking = new ThinkingConfigAdaptive()` |
| PHP | `thinking: ['type' => 'enabled', 'budget_tokens' => N]` | `thinking: ['type' => 'adaptive']` |

### Sampling parameters — `temperature` / `top_p` / `top_k`

(Remove the field entirely on Opus 4.7; on Claude 4.x keep at most one of `temperature` or `top_p`.)

| SDK | Field(s) to remove |
|---|---|
| Python | `temperature=…`, `top_p=…`, `top_k=…` |
| TypeScript | `temperature: …`, `top_p: …`, `top_k: …` |
| Go | `Temperature: anthropic.Float(…)`, `TopP: anthropic.Float(…)`, `TopK: anthropic.Int(…)` |
| Ruby | `temperature: …`, `top_p: …`, `top_k: …` |
| Java | `.temperature(…)`, `.topP(…)`, `.topK(…)` |
| C# | `Temperature = …`, `TopP = …`, `TopK = …` |
| PHP | `temperature: …`, `topP: …`, `topK: …` |

### Prefill replacement — structured outputs via `output_config.format`

| SDK | Remove (last assistant turn) | Add |
|---|---|---|
| Python | `{"role": "assistant", "content": "…"}` | `output_config={"format": {"type": "json_schema", "schema": SCHEMA}}` |
| TypeScript | `{ role: 'assistant', content: '…' }` | `output_config: { format: { type: 'json_schema', schema: SCHEMA } }` |
| Go | trailing `anthropic.MessageParam{Role: "assistant", …}` | `OutputConfig: anthropic.OutputConfigParam{Format: anthropic.JSONOutputFormatParam{…}}` |
| Ruby | `{ role: "assistant", content: "…" }` | `output_config: { format: { type: "json_schema", schema: SCHEMA } }` |
| Java | trailing `Message.builder().role(ASSISTANT)…` | `.outputConfig(OutputConfig.builder().format(JsonOutputFormat.builder()…build()).build())` |
| C# | trailing `new Message { Role = "assistant", … }` | `OutputConfig = new OutputConfig { Format = new JsonOutputFormat { … } }` |
| PHP | trailing `['role' => 'assistant', 'content' => '…']` | `outputConfig: ['format' => ['type' => 'json_schema', 'schema' => $SCHEMA]]` |

### `thinking.display` — opt back into summarized reasoning (Opus 4.7)

| SDK | Add |
|---|---|
| Python | `thinking={"type": "adaptive", "display": "summarized"}` |
| TypeScript | `thinking: { type: 'adaptive', display: 'summarized' }` |
| Go | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{Display: anthropic.ThinkingConfigAdaptiveDisplaySummarized}}` |
| Ruby | `thinking: { type: "adaptive", display: "summarized" }` (or `display_:` when constructing the model class directly) |
| Java | `.thinking(ThinkingConfigAdaptive.builder().display(ThinkingConfigAdaptive.Display.SUMMARIZED).build())` |
| C# | `Thinking = new ThinkingConfigAdaptive { Display = Display.Summarized }` |
| PHP | `thinking: ['type' => 'adaptive', 'display' => 'summarized']` |

For any field not in these tables, the JSON key in the Python example translates directly: `snake_case` for Python/TypeScript/Ruby, `camelCase` named args for PHP, `PascalCase` struct fields for Go/C#, `camelCase` builder methods for Java.

---

## Explain every change you make

Migration edits often look arbitrary to a user who hasn't read the release notes — a removed `temperature`, a deleted prefill, a rewritten system-prompt sentence. **For each edit, tell the user what you changed and why**, tied to the specific API or behavioral change that motivates it. Do this in your summary as you work, not just at the end.

Be especially explicit about **system-prompt edits**. Users are rightly protective of their prompts, and prompt-tuning changes are judgment calls (not hard API requirements). For any prompt edit:

- Quote the before and after text.
- State the behavioral shift that motivates it (e.g. *"Opus 4.7 calibrates response length to task complexity, so I added an explicit length instruction"*, or *"4.6 follows instructions more literally, so 'CRITICAL: YOU MUST use the search tool' will now overtrigger — softened to 'Use the search tool when…'"*).
- Make clear which prompt edits are **optional tuning** (tone, length, subagent guidance) versus which code edits are **required to avoid a 400** (sampling params, `budget_tokens`, prefills). Never present an optional prompt change as mandatory.

If you're applying several prompt-tuning edits at once, offer them as a short list the user can accept or decline item-by-item rather than silently rewriting their system prompt.

---

## Before You Migrate

1. **Confirm the target model ID.** Use only the exact strings from `shared/models.md` — do not append date suffixes to aliases (`claude-opus-4-6`, not `claude-opus-4-6-20251101`). Guessing an ID will 404.
2. **Check which features your code uses** with this checklist:
   - `thinking: {type: "enabled", budget_tokens: N}` → migrate to adaptive thinking on Opus 4.6 / Sonnet 4.6 (still functional but deprecated)
   - Assistant-turn prefills (`messages` ending with `role: "assistant"`) → must change on Opus 4.6 / Sonnet 4.6 (returns 400)
   - `output_format` parameter on `messages.create()` → must change on all models (deprecated API-wide)
   - `max_tokens > ~16000` → must stream on any model (above ~16K risks SDK HTTP timeouts). When streaming, Sonnet 4.6 / Haiku 4.5 cap at 64K and Opus 4.6 caps at 128K
   - Beta headers `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `interleaved-thinking-2025-05-14` → GA on 4.6, remove them and switch from `client.beta.messages.create` to `client.messages.create`
   - Moving Sonnet 4.5 → Sonnet 4.6 with no `effort` set → 4.6 defaults to `high`, which may change your latency/cost profile
   - System prompts with `CRITICAL`, `MUST`, `If in doubt, use X` language → likely to overtrigger on 4.6 (see Prompt-Behavior Changes)
   - Coming from 3.x / 4.0 / 4.1: also check sampling params (`temperature` + `top_p`), tool versions (`text_editor_20250728`), `refusal` + `model_context_window_exceeded` stop reasons, trailing-newline tool-param handling
3. **Test on a single request first.** Run one call against the new model, inspect the response, then roll out.

---

## Destination Models (recommended targets)

| If you're on… | Migrate to | Why |
|---|---|---|
| Opus 4.6 | `claude-opus-4-7` | Most capable model; adaptive thinking only; high-res vision; see Migrating to Opus 4.7 |
| Opus 4.0 / 4.1 / 4.5 / Opus 3 | `claude-opus-4-6` | Most intelligent 4.x before 4.7; adaptive thinking; 128K output |
| Sonnet 4.0 / 4.5 / 3.7 / 3.5 | `claude-sonnet-4-6` | Best speed / intelligence balance; adaptive thinking; 64K output |
| Haiku 3 / 3.5 | `claude-haiku-4-5` | Fastest and most cost-effective |

Default to the latest Opus for the caller's tier unless they explicitly chose otherwise. If you're moving from Opus 4.5 or older directly to Opus 4.7, apply the 4.6 migration first, then layer the Opus 4.7 changes on top (see Migrating to Opus 4.7 below).

---

## Retired Model Replacements

These models return 404 — update immediately:

| Retired model | Retired | Drop-in replacement |
|---|---|---|
| `claude-3-7-sonnet-20250219` | Feb 19, 2026 | `claude-sonnet-4-6` |
| `claude-3-5-haiku-20241022` | Feb 19, 2026 | `claude-haiku-4-5` |
| `claude-3-opus-20240229` | Jan 5, 2026 | `claude-opus-4-7` |
| `claude-3-5-sonnet-20241022` | Oct 28, 2025 | `claude-sonnet-4-6` |
| `claude-3-5-sonnet-20240620` | Oct 28, 2025 | `claude-sonnet-4-6` |
| `claude-3-sonnet-20240229` | Jul 21, 2025 | `claude-sonnet-4-6` |
| `claude-2.1`, `claude-2.0` | Jul 21, 2025 | `claude-sonnet-4-6` |

## Deprecated Models (retiring soon)

| Model | Retires | Replacement |
|---|---|---|
| `claude-3-haiku-20240307` | Apr 19, 2026 | `claude-haiku-4-5` |
| `claude-opus-4-20250514` | Jun 15, 2026 | `claude-opus-4-7` |
| `claude-sonnet-4-20250514` | Jun 15, 2026 | `claude-sonnet-4-6` |

---

## Breaking Changes by Source Model

### Migrating from Sonnet 4.5 to Sonnet 4.6 (effort default change)

Sonnet 4.5 had no `effort` parameter; Sonnet 4.6 defaults to `high`. If you just switch the model string and do nothing else, you may see noticeably higher latency and token usage. Set `effort` explicitly.

**Recommended starting points:**

| Workload | Start at | Notes |
|---|---|---|
| Chat, classification, content generation | `low` | With `thinking: {"type": "disabled"}` you'll see similar or better performance vs. Sonnet 4.5 no-thinking |
| Most applications (balanced) | `medium` | The default sweet spot for quality vs. cost |
| Agentic coding, tool-heavy workflows | `medium` | Pair with adaptive thinking and a generous `max_tokens` (up to 64K with streaming — Sonnet 4.6's ceiling) |
| Autonomous multi-step agents, long-horizon loops | `high` | Scale down to `medium` if latency/tokens become a concern |
| Computer-use agents | `high` + adaptive | Sonnet 4.6's best computer-use accuracy is on adaptive + high |

For non-thinking chat workloads specifically:

```python
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "disabled"},
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "..."}],
)
```

**When to use Opus 4.6 instead:** hardest and longest-horizon problems — large code migrations, deep research, extended autonomous work. Sonnet 4.6 wins on fast turnaround and cost efficiency.

### Migrating to Opus 4.6 / Sonnet 4.6 (from any older model)

**1.
Manual extended thinking is deprecated — use adaptive thinking.**

`thinking: {type: "enabled", budget_tokens: N}` (manual extended thinking with a fixed token budget) is deprecated on Opus 4.6 and Sonnet 4.6. Replace it with `thinking: {type: "adaptive"}`, which lets Claude decide when and how much to think. Adaptive thinking also enables interleaved thinking automatically (no beta header needed).

```python
# Old (still works on older models, deprecated on 4.6)
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[...]
)

# New (Opus 4.6 / Sonnet 4.6)
response = client.messages.create(
    model="claude-opus-4-6",  # or "claude-sonnet-4-6"
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # optional: low | medium | high | max
    messages=[...]
)
```

Adaptive thinking is the long-term target, and on internal evaluations it outperforms manual extended thinking. Move when you can.

**Transitional escape hatch:** manual extended thinking is still *functional* on Opus 4.6 and Sonnet 4.6 (deprecated, will be removed in a future release). If you need a hard ceiling while migrating — for example, to bound token spend on a runaway workload before you've tuned `effort` — you can keep `budget_tokens` alongside an explicit `effort` value, then remove it in a follow-up. `budget_tokens` must be strictly less than `max_tokens`:

```python
# Transitional only — deprecated, plan to remove
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16384,
    thinking={"type": "enabled", "budget_tokens": 8192},  # must be < max_tokens
    output_config={"effort": "medium"},
    messages=[...],
)
```

If the user asks for a "thinking budget" on 4.6, the preferred answer is `effort` — use `low`, `medium`, `high`, or `max` (Opus-tier only — not Sonnet or Haiku) rather than a token count.

**2.
Effort parameter (Opus 4.5, Opus 4.6, Sonnet 4.6 only).**

Controls thinking depth and overall token spend. Goes inside `output_config`, not top-level. Default is `high`. `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Errors on Sonnet 4.5 and Haiku 4.5.

```python
output_config={"effort": "medium"}  # often the best cost / quality balance
```

**3. Assistant-turn prefills return 400 (Opus 4.6 and Sonnet 4.6).**

Prefilled responses on the final assistant turn are no longer supported on either Opus 4.6 or Sonnet 4.6 — both return a 400. Adding assistant messages *elsewhere* in the conversation (e.g., for few-shot examples) still works. Pick the replacement that matches what the prefill was doing:

| Prefill was used for | Replacement |
|---|---|
| Forcing JSON / YAML / schema output | `output_config.format` with a `json_schema` — see example below |
| Forcing a classification label | Tool with an enum field containing valid labels, or structured outputs |
| Skipping preambles (`Here is the summary:\n`) | System prompt instruction: *"Respond directly without preamble. Do not start with phrases like 'Here is...' or 'Based on...'."* |
| Steering around bad refusals | Usually no longer needed — 4.6 refuses far more appropriately. Plain user-turn prompting is sufficient. |
| Continuing an interrupted response | Move continuation into the user turn: *"Your previous response was interrupted and ended with `[last text]`. Continue from there."* |
| Injecting reminders / context hydration | Inject into the user turn instead. For complex agent harnesses, expose context via a tool call or during compaction.
```python
# Old (fails on Opus 4.6 / Sonnet 4.6) — prefill forcing JSON shape
messages=[
    {"role": "user", "content": "Extract the name."},
    {"role": "assistant", "content": "{\"name\": \""},
]

# New — structured outputs replace the prefill
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    output_config={"format": {"type": "json_schema", "schema": {...}}},
    messages=[{"role": "user", "content": "Extract the name."}],
)
```

**4. Stream for `max_tokens > ~16K` (all models); Opus 4.6 alone reaches 128K.**

Non-streaming requests hit SDK HTTP timeouts at high `max_tokens`, regardless of model — stream for anything above ~16K output. The streamable ceiling differs by model: Sonnet 4.6 and Haiku 4.5 cap at 64K, and Opus 4.6 alone goes up to 128K.

```python
with client.messages.stream(model="claude-opus-4-6", max_tokens=64000, ...) as stream:
    message = stream.get_final_message()
```

**5. Tool-call JSON escaping may differ (Opus 4.6 and Sonnet 4.6).**

Both 4.6 models can produce tool call `input` fields with Unicode or forward-slash escaping. Always parse with `json.loads()` / `JSON.parse()` — never raw-string-match the serialized input.

### All models

**6. `output_format` → `output_config.format` (API-wide).**

The old top-level `output_format` parameter on `messages.create()` is deprecated. Use `output_config.format` instead. This is not 4.6-specific — applies to every model.

---

## Beta Headers to Remove on 4.6

Several beta headers that were required on 4.5 are now GA on 4.6 and should be removed.
Leaving them in is harmless but misleading; removing them also lets you move from `client.beta.messages.create(...)` back to `client.messages.create(...)`.

| Header | Status on 4.6 | Action |
|---|---|---|
| `effort-2025-11-24` | Effort parameter is GA | Remove |
| `fine-grained-tool-streaming-2025-05-14` | GA | Remove |
| `interleaved-thinking-2025-05-14` | Adaptive thinking enables interleaved thinking automatically | Remove when using adaptive thinking; still functional on Sonnet 4.6 *with* manual extended thinking, but that path is deprecated |
| `token-efficient-tools-2025-02-19` | Built in to all Claude 4+ models | Remove (no effect) |
| `output-128k-2025-02-19` | Built in to Claude 4+ models | Remove (no effect) |

Once you remove all of these and finish moving to adaptive thinking, you can switch the SDK call site from the beta namespace back to the regular one:

```python
# Before
response = client.beta.messages.create(
    model="claude-opus-4-5",
    betas=["interleaved-thinking-2025-05-14", "effort-2025-11-24"],
    ...
)

# After
response = client.messages.create(
    model="claude-opus-4-6",
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    ...
)
```

---

## Additional Changes When Coming from 3.x / 4.0 / 4.1 → 4.6

If you're jumping from Opus 4.1, Sonnet 4, Sonnet 3.7, or an older Claude 3.x model directly to 4.6, apply everything above *plus* the items in this section. Users already on Opus 4.5 / Sonnet 4.5 can skip this.

**1. Sampling parameters: `temperature` OR `top_p`, not both.**

Passing both will error on every Claude 4+ model:

```python
# Old (3.x only — errors on 4+)
client.messages.create(temperature=0.7, top_p=0.9, ...)

# New
client.messages.create(temperature=0.7, ...)
# or top_p, not both
```

**2. Update tool versions.**

Legacy tool versions are not supported on 4+. **Both the `type` and the `name` field change** — `text_editor_20250728` and `str_replace_based_edit_tool` are a pair; updating one without the other 400s. Also remove the `undo_edit` command from your text-editor integration:

| Old | New |
|---|---|
| `text_editor_20250124` + `str_replace_editor` | `text_editor_20250728` + `str_replace_based_edit_tool` |
| `code_execution_*` (earlier versions) | `code_execution_20250825` |
| `undo_edit` command | *(no longer supported — delete call sites)* |

```python
# Before
tools = [{"type": "text_editor_20250124", "name": "str_replace_editor"}]

# After — BOTH fields change
tools = [{"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"}]
```

**3. Handle the `refusal` stop reason.**

Claude 4+ can return `stop_reason: "refusal"` on the response. If your code only handles `end_turn` / `tool_use` / `max_tokens`, add a branch:

```python
if response.stop_reason == "refusal":
    # Surface the refusal to the user; do not retry with the same prompt
    ...
```

**4. Handle the `model_context_window_exceeded` stop reason (4.5+).**

Distinct from `max_tokens`: it means the model hit the *context window* limit, not the requested output cap. Handle both:

```python
if response.stop_reason == "model_context_window_exceeded":
    # Context window exhausted — compact or split the conversation
    ...
elif response.stop_reason == "max_tokens":
    # Requested output cap hit — retry with higher max_tokens or stream
    ...
```

**5. Trailing newlines preserved in tool call string parameters (4.5+).**

4.5 and 4.6 preserve trailing newlines that older models stripped.
If your tool implementations do exact string matching against tool-call `input` values (e.g., `if name == "foo"`), verify they still match when the model sends `"foo\n"`. Normalizing with `.rstrip()` on the receiving side is usually the simplest fix.

**6. Haiku: separate rate-limit pool per model generation.**

Haiku 4.5 has its own rate-limit pool, separate from Haiku 3 / 3.5. If you're ramping traffic as you migrate, check your tier's Haiku 4.5 limits at [API rate limits](https://platform.claude.com/docs/en/api/rate-limits) — a quota that comfortably served Haiku 3.5 traffic may need a tier bump for the same volume on 4.5.

---

## Prompt-Behavior Changes (Opus 4.5 / 4.6, Sonnet 4.6)

These don't break your code, but prompts that worked on 4.5-and-earlier may over- or under-trigger on 4.6. Tune as needed.

**1. Aggressive instructions cause overtriggering.** Opus 4.5 and 4.6 follow the system prompt much more closely than earlier models. Prompts written to *overcome* the old reluctance are now too aggressive:

| Before (worked on 4.0 / 4.5) | After (use on 4.6) |
|---|---|
| `CRITICAL: You MUST use this tool when...` | `Use this tool when...` |
| `Default to using [tool]` | `Use [tool] when it would improve X` |
| `If in doubt, use [tool]` | *(delete — no longer needed)* |

If the model is now overtriggering a tool or skill, the fix is almost always to dial back the language, not to add more guardrails.

**2. Overthinking and excessive exploration (Opus 4.6).** At higher `effort` settings, Opus 4.6 explores more before answering. If that burns too many thinking tokens, lower `effort` first (`medium` is often the sweet spot) before adding prose instructions to constrain reasoning.

**3. Overeager subagent spawning (Opus 4.6).** Opus 4.6 has a strong preference for delegating to subagents.
If you see it spawning a subagent for something a direct `grep` or `read` would solve, add guidance: *"Use subagents only for parallel or independent workstreams. For single-file reads or sequential operations, work directly."*

**4. Overengineering (Opus 4.5 / 4.6).** Both models may add extra files, abstractions, or defensive error handling beyond what was asked. If you want minimal changes, prompt for it explicitly: *"Only make changes directly requested. Don't add helpers, abstractions, or error handling for scenarios that can't happen."*

**5. LaTeX math output (Opus 4.6).** Opus 4.6 defaults to LaTeX (`\frac{}{}`, `$...$`) for math and technical content. If you need plain text, instruct it explicitly: *"Format all math as plain text — no LaTeX, no `$`, no `\frac{}{}`. Use `/` for division and `^` for exponents."*

**6. Skipped verbal summaries (4.6 family).** The 4.6 models are more concise and may skip the summary paragraph after a tool call, jumping straight to the next action. If you rely on those summaries for visibility, add: *"After completing a task that involves tool use, provide a brief summary of what you did."*

**7. "Think" as a trigger word (Opus 4.5 with thinking disabled).** When `thinking` is off, Opus 4.5 is particularly sensitive to the word *think* and may reason more than you want. Use `consider`, `evaluate`, or `reason through` instead.

---

## Model-ID Rename Quick Reference

| Old string (migration source) | New string |
|---|---|
| `claude-opus-4-6` | `claude-opus-4-7` |
| `claude-opus-4-5` | `claude-opus-4-7` |
| `claude-opus-4-1` | `claude-opus-4-7` |
| `claude-opus-4-0` | `claude-opus-4-7` |
| `claude-sonnet-4-5` | `claude-sonnet-4-6` |
| `claude-sonnet-4-0` | `claude-sonnet-4-6` |

Older aliases (`claude-opus-4-5`, `claude-sonnet-4-5`, `claude-opus-4-1`, etc.)
are still active and can be pinned if you need time before upgrading — see `shared/models.md` for the full legacy list.

---

## Migration Checklist

Every item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent timeout, or wrong tool selection if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments.

For each file that calls `messages.create()` / equivalent SDK method:

- [ ] **[BLOCKS]** Update the `model=` string to the new alias
- [ ] **[BLOCKS]** Replace `budget_tokens` with `thinking={"type": "adaptive"}` (deprecated on Opus 4.6 / Sonnet 4.6)
- [ ] **[BLOCKS]** Move `format` from top-level `output_format` into `output_config.format`
- [ ] **[BLOCKS]** Remove any assistant-turn prefills if targeting Opus 4.6 or Sonnet 4.6 (see the prefill replacement table)
- [ ] **[BLOCKS]** Switch to streaming if `max_tokens > ~16000` (otherwise SDK HTTP timeout)
- [ ] **[TUNE]** Set `output_config={"effort": "..."}` explicitly — especially when moving Sonnet 4.5 → Sonnet 4.6 (4.6 defaults to `high`)
- [ ] **[TUNE]** Remove GA beta headers: `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `token-efficient-tools-2025-02-19`, `output-128k-2025-02-19`; remove `interleaved-thinking-2025-05-14` once on adaptive thinking
- [ ] **[TUNE]** Switch `client.beta.messages.create(...)` → `client.messages.create(...)` once all betas are removed
- [ ] **[TUNE]** Review system prompt for aggressive tool language (`CRITICAL:`, `MUST`, `If in doubt`) and dial it back

**Extra items when coming from 3.x / 4.0 / 4.1:**

- [ ] **[BLOCKS]** Remove either `temperature` or `top_p` (passing both 400s on Claude 4+)
- [ ] **[BLOCKS]** Update text-editor tool `type` to `text_editor_20250728`
- [ ] **[BLOCKS]** Update text-editor tool `name` to `str_replace_based_edit_tool` — **changing only the `type` and keeping `name: "str_replace_editor"` returns a 400**
- [ ] **[BLOCKS]** Update code-execution tool to `code_execution_20250825`
- [ ] **[BLOCKS]** Delete any `undo_edit` command call sites
- [ ] **[TUNE]** Add handling for `stop_reason == "refusal"`
- [ ] **[TUNE]** Add handling for `stop_reason == "model_context_window_exceeded"` (4.5+)
- [ ] **[TUNE]** Verify tool-param string matching tolerates trailing newlines (preserved on 4.5+)
- [ ] **[TUNE]** If moving to Haiku 4.5: review rate-limit tier (separate pool from Haiku 3.x)

**Verification:**

- [ ] Run one test request and inspect `response.stop_reason`, `response.usage`, and whether tool-use / thinking behavior matches expectations

For cached prompts: the render order and hash inputs did not change, so existing `cache_control` breakpoints keep working. However, **changing the model string invalidates the existing cache** — the first request on the new model will write the cache fresh.

---

## Migrating to Opus 4.7

> **Model ID `claude-opus-4-7` is authoritative as written here.** When the user asks to migrate to Opus 4.7, write `model="claude-opus-4-7"` exactly. Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entry exists in `shared/models.md`.

Claude Opus 4.7 is our most capable generally available model to date. It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks. This section summarizes everything new at launch.
It is layered on top of the 4.6 migration above — if the caller is jumping from Opus 4.5 or older, apply the 4.6 changes first, then apply this section.

**TL;DR for someone already on Opus 4.6:** update the model ID to `claude-opus-4-7`, strip any remaining `budget_tokens` and sampling parameters (both 400 on Opus 4.7), give `max_tokens` extra headroom and re-baseline with `count_tokens()` against the new model, opt back into `thinking.display: "summarized"` if reasoning is surfaced to users, and re-tune `effort` — it matters more on 4.7 than on any prior Opus.

### Breaking changes (will 400 on Opus 4.7)

**Extended thinking removed.**

`thinking: {type: "enabled", budget_tokens: N}` is no longer supported on Claude Opus 4.7 or later models and returns a 400 error. Switch to adaptive thinking (`thinking: {type: "adaptive"}`) and use the effort parameter to control thinking depth. Adaptive thinking is **off by default** on Claude Opus 4.7: requests with no `thinking` field run without thinking, matching Opus 4.6 behavior. Set `thinking: {type: "adaptive"}` explicitly to enable it.

```python
# Before (Opus 4.6)
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=64000,
    thinking={"type": "enabled", "budget_tokens": 32000},
    messages=[{"role": "user", "content": "..."}],
)

# After (Opus 4.7)
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
    messages=[{"role": "user", "content": "..."}],
)
```

If the caller wasn't using extended thinking, no change is required — thinking is off by default, or can be set explicitly with `thinking={"type": "disabled"}`.

Delete `budget_tokens` plumbing entirely.
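For codebases that build request payloads centrally, the swap can be mechanized. A minimal sketch, assuming dict-shaped request kwargs (`to_adaptive_thinking` is an illustrative name, not an SDK helper):

```python
def to_adaptive_thinking(request: dict, effort: str = "high") -> dict:
    """Rewrite a budget_tokens-style request for adaptive thinking (sketch)."""
    out = dict(request)
    thinking = out.get("thinking") or {}
    if thinking.get("type") == "enabled" or "budget_tokens" in thinking:
        out["thinking"] = {"type": "adaptive"}
        # Effort replaces the numeric budget; there is no 1:1 mapping.
        out["output_config"] = {**out.get("output_config", {}), "effort": effort}
    return out
```

Pass the result straight to `client.messages.create(**payload)`; requests that never set `thinking` come back unchanged.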
For the replacement `effort` value, see **Choosing an effort level on Opus 4.7** below — there is no exact 1:1 mapping from `budget_tokens`.

**Sampling parameters removed.**

The `temperature`, `top_p`, and `top_k` parameters are no longer accepted on Claude Opus 4.7. Requests that include them return a 400 error. Remove these fields from your request payloads. Prompting is the recommended way to guide model behavior on Claude Opus 4.7. If you were using `temperature = 0` for determinism, note that it never guaranteed identical outputs on prior models.

```python
# Before — errors on Opus 4.7
client.messages.create(temperature=0.7, top_p=0.9, ...)

# After
client.messages.create(...)  # no sampling params
```

- **If the intent was determinism** — use `effort: "low"` with a tighter prompt.
- **If the intent was creative variance** — the prompt replacement depends on the use case; **ask the user** how they want variance elicited. If you can't ask, add a use-case-appropriate instruction along the lines of *"choose something off-distribution and interesting"* — e.g. for text generation, *"Vary your phrasing and structure across responses"*; for frontend/design, use the propose-4-directions approach under **Design and frontend coding** below.

### Choosing an effort level on Opus 4.7

`budget_tokens` controlled how much to *think*; `effort` controls how much to think *and* act, so there is no exact 1:1 mapping.
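In practice the conversion becomes a per-route judgment call rather than a numeric formula. An illustrative sketch (route names and defaults are assumptions to tune against your own evals):

```python
# Illustrative per-route effort defaults; tune each against your own evals.
EFFORT_BY_ROUTE = {
    "coding_agent": "xhigh",    # coding / agentic loops
    "deep_analysis": "high",    # intelligence-sensitive work
    "summarization": "medium",  # cost-sensitive, trades off some intelligence
    "autocomplete": "low",      # short, scoped, latency-sensitive
}

def effort_for(route: str) -> str:
    # Default to "high", the recommended floor for intelligence-sensitive work.
    return EFFORT_BY_ROUTE.get(route, "high")
```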
**Use `xhigh` for best results in coding and agentic use cases, and a minimum of `high` for most intelligence-sensitive use cases.** Experiment with other levels to further tune token usage and intelligence:

| Level | Use when | Notes |
| --- | --- | --- |
| `max` | Intelligence-demanding tasks worth testing at the ceiling | Can deliver gains in some use cases but may show diminishing returns from increased token usage; can be prone to overthinking |
| `xhigh` | **Most coding and agentic use cases** | The best setting for these; used as the default in Claude Code |
| `high` | Intelligence-sensitive use cases generally | Balances token usage and intelligence; recommended minimum for most intelligence-sensitive work |
| `medium` | Cost-sensitive use cases that need to reduce token usage while trading off intelligence | |
| `low` | Short, scoped tasks and latency-sensitive workloads that are not intelligence-sensitive | |

### Silent default changes (no error, but behavior differs)

**Thinking content omitted by default.**

Thinking blocks still appear in the response stream on Claude Opus 4.7, but their `thinking` field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was to return summarized thinking text. To restore summarized thinking content on Claude Opus 4.7, set `thinking.display` to `"summarized"`. **The block-field name is unchanged** — it is still `block.thinking` on a `thinking`-type block; do not rename it.

**Detect this:** any code that reads `block.thinking` (or equivalent) from a `thinking`-type block and renders it in a UI, log, or trace. **The fix is the request parameter, not the response handling** — add `display: "summarized"` to the `thinking` parameter:

```python
thinking={"type": "adaptive", "display": "summarized"}  # "display" is new on Opus 4.7; values: "omitted" (default) | "summarized"
```

The default is `"omitted"` on Claude Opus 4.7.
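The symptom is reproducible without an API call. A sketch against dict-shaped content blocks (the SDK returns typed objects, but the field names match; `reasoning_text` is an illustrative helper):

```python
def reasoning_text(content_blocks: list[dict]) -> str:
    """Join whatever thinking text a response carried."""
    return "".join(
        block.get("thinking", "")
        for block in content_blocks
        if block.get("type") == "thinking"
    )

# Under the Opus 4.7 default (display="omitted"), the thinking block is
# present but its text is empty, so a UI built on this helper renders nothing:
omitted = [{"type": "thinking", "thinking": ""}, {"type": "text", "text": "..."}]
assert reasoning_text(omitted) == ""
```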
If thinking content was never surfaced anywhere, no change needed. If your product streams reasoning to users, the new default appears as a long pause before output begins; set `display: "summarized"` to restore visible progress during thinking.

**Updated token counting.**

Claude Opus 4.7 and Claude Opus 4.6 count tokens differently. The same input text produces a higher token count on Claude Opus 4.7 than on Claude Opus 4.6, and `/v1/messages/count_tokens` will return a different number of tokens for Claude Opus 4.7 than it did for Claude Opus 4.6. The token efficiency of Claude Opus 4.7 can vary by workload shape. Prompting interventions, `task_budget`, and `effort` can help control costs and ensure appropriate token usage. Keep in mind that these controls may trade off model intelligence. **Update your `max_tokens` parameters to give additional headroom, including compaction triggers.** Claude Opus 4.7 provides a 1M context window at standard API pricing with no long-context premium.

What else to check:

- Client-side token estimators (tiktoken-style approximations) calibrated against 4.6
- Cost calculators that multiply tokens by a fixed per-token rate
- Rate-limit retry thresholds keyed to measured token counts

Re-baseline by re-running `client.messages.count_tokens()` against `claude-opus-4-7` on a representative sample of the caller's prompts. Do not apply a blanket multiplier. For cost-sensitive workloads, consider reducing `effort` by one level (e.g. `high` → `medium`). For agentic loops, consider adopting Task Budgets (below).

### New feature: Task Budgets (beta)

Opus 4.7 introduces **task budgets** — tell Claude how many tokens it has for a full agentic loop (thinking + tool calls + final output). The model sees a running countdown and uses it to prioritize work and wrap up gracefully as the budget is consumed.

This is a **suggestion the model is aware of**, not a hard cap.
It is distinct from `max_tokens`, which remains the enforced per-response limit and is *not* surfaced to the model. Use `task_budget` when you want the model to self-moderate; use `max_tokens` as a hard ceiling to cap usage.

Requires beta header `task-budgets-2026-03-13`:

```python
client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[...],
)
```

Set a generous budget for open-ended agentic tasks and tighten it for latency-sensitive ones. **Minimum `task_budget.total` is 20,000 tokens.** If the budget is too restrictive for the task, the model may complete it less thoroughly, referencing its budget as the constraint. **Do not add `task_budget` during a migration unless you are sure the budget value is right** — if you can run the workload and measure, do so; otherwise ask the user for the value rather than guessing. This is the primary lever for offsetting the token-counting shift on agentic workloads.

### Capability improvements

**High-resolution vision.** Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution is **2576 pixels on the long edge** (up from 1568px on Opus 4.6 and prior). This unlocks gains on vision-heavy workloads, especially computer use and screenshot/artifact/document understanding. Coordinates returned by the model now map 1:1 to actual image pixels, so no scale-factor math is needed.

High-res support is **automatic on Opus 4.7** — no beta header, no client-side opt-in required. The model accepts larger inputs and returns pixel-accurate coordinates out of the box.

**Token cost.** Full-resolution images on Opus 4.7 can use up to ~3× more image tokens than on prior models (up to ~4784 tokens per image, vs. the previous ~1,600-token cap).
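For pipelines that do opt into a pre-upload resize, the target dimensions are pure math. A sketch with no imaging library required (`fit_long_edge` is an illustrative name; 1568 matches the prior long-edge cap):

```python
def fit_long_edge(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Target size with the long edge capped at max_edge px, aspect ratio preserved."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return (width, height)  # already within the cap
    scale = max_edge / long_edge
    return (round(width * scale), round(height * scale))
```

Feed the result to whatever resize call the pipeline already uses (Pillow, sharp, etc.).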
If the extra fidelity isn't needed, downsample client-side before sending to control cost — but **do not add downsampling by default during a migration**. If you're not sure whether the pipeline needs the fidelity, ask the user rather than guessing. Use `count_tokens()` on representative images on Opus 4.7 to re-baseline before reacting to any measured cost shift.

Beyond resolution, Opus 4.7 also improves on low-level perception (pointing, measuring, counting) and natural-image bounding-box localization and detection.

**Knowledge work.** Meaningful gains on tasks where the model visually verifies its own output — `.docx` redlining, `.pptx` editing, and programmatic chart/figure analysis (e.g. pixel-level data transcription via image-processing libraries). If prompts have scaffolding like *"double-check the slide layout before returning"*, try removing it and re-baselining.

**Memory.** Opus 4.7 is better at writing and using file-system-based memory. If an agent maintains a scratchpad, notes file, or structured memory store across turns, that agent should improve at jotting down notes to itself and leveraging its notes in future tasks.

**User-facing progress updates.** Opus 4.7 provides more regular, higher-quality interim updates during long agentic traces. If the system prompt has scaffolding like *"After every 3 tool calls, summarize progress"*, try removing it to avoid excessive user-facing text. If the length or contents of Opus 4.7's updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples.

### Real-time cybersecurity safeguards

Requests that involve prohibited or high-risk topics may lead to refusals.

### Fast Mode: not available on Opus 4.7

Opus 4.7 does not have a Fast Mode variant. **Opus 4.6 Fast remains supported.** Only surface this if the caller's code actually uses a Fast Mode model string (e.g.
`claude-opus-4-6-fast`); if the word "fast" does not appear in the code, say nothing about Fast Mode.

When you see `model="claude-opus-4-6-fast"` (or similar), **the migration edit is**:

```python
# Opus 4.7 has no Fast Mode — keeping on 4.6 Fast (caller's choice to switch to standard Opus 4.7).
model="claude-opus-4-6-fast",
```

That is: leave the model string **unchanged**, add the comment above it, and tell the user their two options — (a) stay on Opus 4.6 Fast, which remains supported, or (b) move latency-tolerant traffic to standard Opus 4.7 for the intelligence gain. Do **not** rewrite the model string to `claude-opus-4-7` yourself; that silently trades latency for intelligence, which is the caller's decision.

### Behavioral shifts (prompt-tunable)

These don't break anything, but prompts tuned for Opus 4.6 may land differently. Opus 4.7 is more steerable than 4.6, so small prompt nudges usually close the gap.

**More literal instruction following.** Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make. The upside of this literalism is precision and less thrash. It generally performs better for API use cases with carefully tuned prompts, structured extraction, and pipelines where you want predictable behavior. A prompt and harness review may be especially helpful when migrating to Claude Opus 4.7.

**Verbosity calibrates to task complexity.** Opus 4.7 scales response length to how complex it judges the task to be, rather than defaulting to a fixed verbosity — shorter answers on simple lookups, much longer on open-ended analysis. If the product depends on a particular length or style, tune the prompt explicitly. To reduce verbosity:

> *"Provide concise, focused responses.
Skip non-essential context, and keep examples minimal."*

If you see specific kinds of over-verbosity (e.g. over-explaining), add instructions targeting those. Positive examples showing the desired level of concision tend to be more effective than negative examples or instructions telling the model what not to do. Do **not** assume existing "be concise" instructions should be removed — test first.

**Tone and writing style.** Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than Opus 4.6's warmer style. As with any new model, prose style on long-form writing may shift. If the product relies on a specific voice, re-evaluate style prompts against the new baseline. If a warmer or more conversational voice is wanted, specify it:

> *"Use a warm, collaborative tone. Acknowledge the user's framing before answering."*

**`effort` matters more than on any prior Opus.** Opus 4.7 respects `effort` levels more strictly, especially at the low end. At `low` and `medium` it scopes work to what was asked rather than going above and beyond — good for latency and cost, but on moderate tasks at `low` there is some risk of under-thinking.

- If shallow reasoning shows up on complex problems, raise `effort` to `high` or `xhigh` rather than prompting around it.
- If `effort` must stay `low` for latency, add targeted guidance: *"This task involves multi-step reasoning. Think carefully through the problem before responding."*
- **At `xhigh` or `max`, set a large `max_tokens`** so the model has room to think and act across tool calls and subagents. Start at 64K and tune from there. (`xhigh` is a new effort level on Opus 4.7, between `high` and `max`.)

Adaptive-thinking triggering is also steerable.
If the model thinks more often than wanted — which can happen with large or complex system prompts — add: *"Thinking adds latency and should only be used when it will meaningfully improve answer quality — typically for problems that require multi-step reasoning. When in doubt, respond directly."*

**Uses tools less often by default.** Opus 4.7 tends to use tools less often than 4.6 and to use reasoning more. This produces better results in most cases, but for products that rely on tools (search/retrieval, function-calling, computer-use steps), it can drop tool-use rate. Two levers:

- **Raise `effort`** — `high` or `xhigh` show substantially more tool usage in agentic search and coding, and are especially useful for knowledge work.
- **Prompt for it** — be explicit in tool descriptions or the system prompt about when and how to use the tool, and encourage the model to err on the side of using it more often:

> *"When the answer depends on information not present in the conversation, you MUST call the `search` tool before answering — do not answer from prior knowledge."*

**Fewer subagents by default.** Opus 4.7 tends to spawn fewer subagents than 4.6. This is steerable — give explicit guidance on when delegation is desirable. For a coding agent, for example:

> *"Do NOT spawn a subagent for work you can complete directly in a single response (e.g. refactoring a function you can already see). Spawn multiple subagents in the same turn when fanning out across items or reading multiple files."*

**Design and frontend coding.** Opus 4.7 has stronger design instincts than 4.6, with a consistent default house style: warm cream/off-white backgrounds (around `#F4F1EA`), serif display type (Georgia, Fraunces, Playfair), italic word-accents, and a terracotta/amber accent.
This reads well for editorial, hospitality, and portfolio briefs, but will feel off for dashboards, dev tools, fintech, healthcare, or enterprise apps — and it appears in slide decks as well as web UIs.

The default is persistent. Generic instructions ("don't use cream," "make it clean and minimal") tend to shift the model to a different fixed palette rather than producing variety. Two approaches work reliably:

1. **Specify a concrete alternative.** The model follows explicit specs precisely — give exact hex values, typefaces, and layout constraints.
2. **Have the model propose options before building.** This breaks the default and gives the user control:

> *"Before building, propose 4 distinct visual directions tailored to this brief (each as: bg hex / accent hex / typeface — one-line rationale). Ask the user to pick one, then implement only that direction."*

If the caller previously relied on `temperature` for design variety, use approach (2) — it produces meaningfully different directions across runs.

Opus 4.7 also requires less frontend-design prompting than previous models to avoid generic "AI slop" aesthetics. Where earlier models needed a lengthy anti-slop snippet, Opus 4.7 generates distinctive, creative frontends with a much shorter nudge. This snippet works well alongside the variety approaches above:

> *"NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white or dark backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character. Use unique fonts, cohesive colors and themes, and animations for effects and micro-interactions."*

**Interactive coding products.** Opus 4.7's token usage and behavior can differ between autonomous, asynchronous coding agents with a single user turn and interactive, synchronous coding agents with multiple user turns.
Specifically, it tends to use more tokens in interactive settings, primarily because it reasons more after user turns. This can improve long-horizon coherence, instruction following, and coding capabilities in long interactive coding sessions, but also comes with more token usage. To maximize both performance and token efficiency in coding products, use `effort: "xhigh"` or `"high"`, add autonomous features (like an auto mode), and reduce the number of human interactions required from users.

When limiting required user interactions, specify the task, intent, and relevant constraints upfront in the first human turn. Well-specified, clear, and accurate task descriptions upfront help maximize autonomy and intelligence while minimizing extra token usage after user turns — because Opus 4.7 is more autonomous than prior models, this usage pattern helps to maximize performance. In contrast, ambiguous or underspecified prompts conveyed progressively over multiple user turns tend to reduce token efficiency and sometimes performance.

**Code review.** Opus 4.7 is meaningfully better at finding bugs than prior models, with both higher recall and precision. However, if a code-review harness was tuned for an earlier model, it may initially show *lower* recall — this is likely a harness effect, not a capability regression. When a review prompt says "only report high-severity issues," "be conservative," or "don't nitpick," Opus 4.7 follows that instruction more faithfully than earlier models did: it investigates just as thoroughly, identifies the bugs, and then declines to report findings it judges to be below the stated bar. Precision rises, but measured recall can fall even though underlying bug-finding has improved.

Recommended prompt language:

> *"Report every issue you find, including ones you are uncertain about or consider low-severity. Do not filter for importance or confidence at this stage — a separate verification step will do that.
Your goal here is coverage: it is better to surface a finding that later gets filtered out than to silently drop a bug. For each finding, include your confidence level and an estimated severity so a downstream filter can rank them."*

This can be used without an actual second step, but moving confidence filtering out of the finding step often helps. If the harness has a separate verification/dedup/ranking stage, tell the model explicitly that its job at the finding stage is coverage, not filtering. If single-pass self-filtering is wanted, be concrete about the bar rather than using qualitative terms like "important" — e.g. *"report any bugs that could cause incorrect behavior, a test failure, or a misleading result; only omit nits like pure style or naming preferences."* Iterate on prompts against a subset of evals to validate recall or F1 gains.

**Computer use.** Computer use works across resolutions up to the new 2576px / 3.75MP maximum. Sending images at **1080p** provides a good balance of performance and cost. For particularly cost-sensitive workloads, **720p** or **1366×768** are lower-cost options with strong performance. Test to find the ideal settings for the use case; experimenting with `effort` can also help tune behavior.

---

## Opus 4.7 Migration Checklist

Every item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent truncation, or empty output if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments — surface them to the user as recommendations.

`[BLOCKS]` items prefixed with **"If…"** or **"At…"** are conditional. Before working through the list, **scan the file** for the conditions: does it surface thinking text to a UI/log? Does it set `output_config.effort` to `"xhigh"` or `"max"`? Is it a security workload? Is it a multi-turn agentic loop?
Apply only the items whose condition matches.

- [ ] **[BLOCKS]** Replace `thinking: {type: "enabled", budget_tokens: N}` with `thinking: {type: "adaptive"}` + `output_config.effort`; delete `budget_tokens` plumbing entirely
- [ ] **[BLOCKS]** Strip `temperature`, `top_p`, `top_k` from request construction
- [ ] **[BLOCKS]** If thinking content is surfaced to users or stored in logs: add `thinking.display: "summarized"` (otherwise the rendered text is empty)
- [ ] **[BLOCKS]** At `output_config.effort` of `xhigh` or `max`: set `max_tokens` ≥ 64000 (otherwise output truncates mid-thought)
- [ ] **[TUNE]** Give `max_tokens` and compaction triggers extra headroom; re-run `count_tokens()` against `claude-opus-4-7` on representative prompts to re-baseline (no blanket multiplier)
- [ ] **[TUNE]** Re-baseline cost and rate-limit dashboards *before* reacting to measured shifts
- [ ] **[TUNE]** Re-evaluate `effort` per route — use `xhigh` for coding/agentic and a minimum of `high` for most intelligence-sensitive work; it matters more on 4.7 than any prior Opus
- [ ] **[TUNE]** Multi-turn agentic loops: adopt the API-native Task Budgets (`output_config.task_budget`, beta `task-budgets-2026-03-13`, minimum 20k tokens) — this is for capping *cumulative* spend across a loop; per-turn depth is `effort`
- [ ] **[TUNE]** Check for ambiguous or underspecified instructions that relied on 4.6 generalizing intent, and update them to be clearer or more precise — 4.7 follows them literally
- [ ] **[TUNE]** Tool-use workloads: add explicit when/how-to-use guidance to tool descriptions (4.7 reaches for tools less often)
- [ ] **[TUNE]** Verbosity: test existing length instructions before changing them — 4.7 calibrates length to task complexity, so tune for the desired output rather than assuming a direction
- [ ] **[TUNE]** Remove forced-progress-update scaffolding (*"after every N tool calls…"*)
- [ ] **[TUNE]** Remove knowledge-work verification scaffolding
(*"double-check the slide layout…"*) and re-baseline
- [ ] **[TUNE]** Add tone instruction if a warmer / more conversational voice is needed; re-evaluate style prompts on writing-heavy routes
- [ ] **[TUNE]** Subagent tool present: add explicit spawn / don't-spawn guidance
- [ ] **[TUNE]** Frontend/design output: specify a concrete palette/typeface, or have the model propose 4 visual directions before building (the default cream/serif house style is persistent)
- [ ] **[TUNE]** Interactive coding products: use `effort: "xhigh"` or `"high"`, add autonomous features (e.g. an auto mode) to reduce human interactions, and specify task/intent/constraints upfront in the first turn
- [ ] **[TUNE]** Code-review harnesses: remove or loosen "only report high-severity" / "be conservative" filters and have the model report every finding with confidence + severity; move filtering to a downstream step (4.7 follows severity filters more literally, which can depress measured recall)
- [ ] **[TUNE]** Vision-heavy pipelines (screenshots, charts, document understanding): leave images at native resolution up to 2576px long edge for the accuracy gain; remove any scale-factor math from coordinate handling (coords are now 1:1 with pixels). No beta header / opt-in needed — high-res is automatic on Opus 4.7.
- [ ] **[TUNE]** Computer-use pipelines: send screenshots at 1080p for a good performance/cost balance (720p or 1366×768 for cost-sensitive workloads); experiment with `effort` to tune behavior
- [ ] **[TUNE]** Cost-sensitive image pipelines: full-res images on 4.7 use up to ~4784 tokens vs ~1,600 on prior models (~3×). Downsampling client-side before upload avoids the increase, but **do not downsample by default** — if you're unsure whether fidelity is needed, ask the user.
Re-baseline with `count_tokens()` on representative images before reacting to cost shifts.

---

## Verify the Migration

After updating, spot-check that the new model is actually being used. Replace `YOUR_TARGET_MODEL` with the model string you migrated to (e.g. `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`) and keep the assertion prefix in sync:

```python
YOUR_TARGET_MODEL = "claude-opus-4-7"  # or "claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5"
response = client.messages.create(model=YOUR_TARGET_MODEL, max_tokens=64, messages=[...])
assert response.model.startswith(YOUR_TARGET_MODEL), response.model
```

For rate-limit headroom changes, pricing, or capability deltas (vision, structured outputs, effort support), query the Models API:

```python
m = client.models.retrieve(YOUR_TARGET_MODEL)
m.max_input_tokens, m.max_tokens
m.capabilities["effort"]["max"]["supported"]
```

See `shared/models.md` for the full capability lookup pattern.
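A request-payload lint can also run in CI with no API call. A minimal sketch whose rule list mirrors the Opus 4.7 checklist above (`lint_opus_4_7_request` is an illustrative name, not an official validator):

```python
def lint_opus_4_7_request(req: dict) -> list[str]:
    """Return human-readable problems that would 400 (or silently misbehave) on Opus 4.7."""
    problems = []
    # Sampling parameters were removed on Opus 4.7.
    for param in ("temperature", "top_p", "top_k"):
        if param in req:
            problems.append(f"remove sampling param {param!r} (400s on Opus 4.7)")
    # budget_tokens-style thinking was removed on Opus 4.7.
    thinking = req.get("thinking") or {}
    if thinking.get("type") == "enabled" or "budget_tokens" in thinking:
        problems.append("replace budget_tokens thinking with {'type': 'adaptive'} + output_config.effort")
    # High effort needs max_tokens headroom or output truncates mid-thought.
    effort = (req.get("output_config") or {}).get("effort")
    if effort in ("xhigh", "max") and req.get("max_tokens", 0) < 64000:
        problems.append("raise max_tokens to >= 64000 at xhigh/max effort")
    return problems
```

Wire it into the test suite and fail the build on a non-empty result.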