Build LLM-powered apps with the Anthropic Claude API or SDK across Python, TypeScript, Java, Go, Ruby, C#, and PHP.
python/claude-api/README.md
# Claude API: Python

## Installation

```bash
pip install anthropic
```

## Client Initialization

```python
import anthropic

# Default: credentials are resolved from the environment:
# ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
# Prefer this for local dev; don't hardcode a key.
client = anthropic.Anthropic()

# Explicit API key (only when you must inject a specific key)
client = anthropic.Anthropic(api_key="your-api-key")

# Async client
async_client = anthropic.AsyncAnthropic()
```

---

## Client Configuration

### Per-request overrides

Use `with_options()` to override client settings for a single call without mutating the client:

```python
client.with_options(timeout=5.0, max_retries=5).messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```

### Timeouts

Default request timeout is 10 minutes. Pass a float (seconds) or an `httpx.Timeout` for granular control. On timeout the SDK raises `anthropic.APITimeoutError` (and retries per `max_retries`).

```python
import httpx

client = anthropic.Anthropic(timeout=20.0)
client = anthropic.Anthropic(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)
```

### Retries

The SDK auto-retries connection errors, 408, 409, 429, and ≥500 with exponential backoff (default 2 retries). Set `max_retries` on the client or via `with_options()`; `max_retries=0` disables retries.

### Async performance (aiohttp backend)

For high-concurrency async workloads, install `anthropic[aiohttp]` and pass `DefaultAioHttpClient` instead of the default httpx backend:

```python
from anthropic import AsyncAnthropic, DefaultAioHttpClient

async with AsyncAnthropic(http_client=DefaultAioHttpClient()) as client:
    ...
```

### Custom HTTP client (proxy, base URL)

Use `DefaultHttpxClient` / `DefaultAsyncHttpxClient` (not raw `httpx.Client`) so the SDK's default timeouts and connection limits are preserved:

```python
from anthropic import Anthropic, DefaultHttpxClient

client = Anthropic(
    base_url="http://my.test.server.example.com:8083",  # or ANTHROPIC_BASE_URL env var
    http_client=DefaultHttpxClient(proxy="http://my.test.proxy.example.com"),
)
```

### Logging

Set `ANTHROPIC_LOG=debug` (or `info`) to enable SDK logging via the standard `logging` module.

---

## Basic Message Request

```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# response.content is a list of content block objects (TextBlock, ThinkingBlock,
# ToolUseBlock, ...). Check .type before accessing .text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

---

## System Prompts

```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    system="You are a helpful coding assistant. Always provide examples in Python.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}]
)
```

### Mid-conversation system messages (beta, model-gated)

For operator instructions that arrive mid-conversation (mode switches, injected state), append `{"role": "system", ...}` to `messages` instead of editing top-level `system`; this preserves the cached prefix and carries operator authority. The system message must follow a user message and cannot be `messages[0]`. Unsupported models return a 400 (`role 'system' is not supported on this model`). See `shared/prompt-caching.md` for when to use this vs. top-level `system`.

```python
response = client.messages.create(
    model=MODEL_ID,  # must support mid-conversation system messages
    max_tokens=16000,
    system=[{"type": "text", "text": STABLE_SYSTEM, "cache_control": {"type": "ephemeral"}}],
    messages=history + [
        {"role": "user", "content": user_message},
        {"role": "system", "content": "Terse mode enabled: keep responses under 40 words."},
    ],
    extra_headers={"anthropic-beta": "mid-conversation-system-2026-04-07"},
)
```

---

## Vision (Images)

### Base64

```python
import base64

with open("image.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {"type": "text", "text": "What's in this image?"}
        ]
    }]
)
```

### URL

```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/image.png"
                }
            },
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

---

## Prompt Caching

Cache large context to reduce costs (up to 90% savings). **Caching is a prefix match**: any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.

### Automatic Caching (Recommended)

Use top-level `cache_control` to automatically cache the last cacheable block in the request; there is no need to annotate individual content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},  # auto-caches the last cacheable block
    system="You are an expert on this large document...",
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Manual Cache Control

For fine-grained control, add `cache_control` to specific content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral"}  # default TTL is 5 minutes
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# With explicit TTL (time-to-live)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1 hour TTL
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Verifying Cache Hits

```python
print(response.usage.cache_creation_input_tokens)  # tokens written to cache (~1.25x cost)
print(response.usage.cache_read_input_tokens)      # tokens served from cache (~0.1x cost)
print(response.usage.input_tokens)                 # uncached tokens (full cost)
```

If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work, such as `datetime.now()` or a UUID in the system prompt, unsorted `json.dumps()`, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.

---

## Extended Thinking

> **Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
> **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024).

```python
# Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # low | medium | high | max
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)

# Access thinking and response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```

---

## Error Handling

```python
import anthropic

try:
    response = client.messages.create(...)
except anthropic.BadRequestError as e:
    print(f"Bad request: {e.message}")
except anthropic.AuthenticationError:
    print("Invalid API key")
except anthropic.PermissionDeniedError:
    print("API key lacks required permissions")
except anthropic.NotFoundError:
    print("Invalid model or endpoint")
except anthropic.RateLimitError as e:
    retry_after = int(e.response.headers.get("retry-after", "60"))
    print(f"Rate limited. Retry after {retry_after}s.")
except anthropic.APIStatusError as e:
    if e.status_code >= 500:
        print(f"Server error ({e.status_code}). Retry later.")
    else:
        print(f"API error: {e.message}")
except anthropic.APIConnectionError:
    print("Network error. Check internet connection.")
```

---

## Response Helpers

Every response object exposes `_request_id` (populated from the `request-id` header); log it when reporting failures to Anthropic. Despite the underscore prefix, this property is public.

```python
message = client.messages.create(...)
print(message._request_id)  # req_018EeWyXxfu5pfWkrYcMdjWG
print(message.to_json())    # serialize the Pydantic model
print(message.to_dict())    # plain dict
```

To access raw headers or other response metadata, use `.with_raw_response`:

```python
raw = client.messages.with_raw_response.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(raw.headers.get("request-id"))
message = raw.parse()  # the Message object messages.create() would have returned
```

---

## Multi-Turn Conversations

The API is stateless, so send the full conversation history each time.

```python
from typing import Optional

import anthropic

class ConversationManager:
    """Manage multi-turn conversations with the Claude API."""

    def __init__(self, client: anthropic.Anthropic, model: str, system: Optional[str] = None):
        self.client = client
        self.model = model
        self.system = system
        self.messages = []

    def send(self, user_message: str, **kwargs) -> str:
        """Send a message and get a response."""
        self.messages.append({"role": "user", "content": user_message})

        # Fill defaults into kwargs so max_tokens is never passed twice,
        # and omit `system` entirely when none was configured.
        kwargs.setdefault("max_tokens", 16000)
        if self.system is not None:
            kwargs.setdefault("system", self.system)

        response = self.client.messages.create(
            model=self.model,
            messages=self.messages,
            **kwargs
        )

        assistant_message = next(
            (b.text for b in response.content if b.type == "text"), ""
        )
        self.messages.append({"role": "assistant", "content": assistant_message})

        return assistant_message

# Usage
conversation = ConversationManager(
    client=anthropic.Anthropic(),
    model="claude-opus-4-8",
    system="You are a helpful assistant."
)

response1 = conversation.send("My name is Alice.")
response2 = conversation.send("What's my name?")  # Claude remembers "Alice"
```

**Rules:**

- Consecutive same-role messages are allowed; the API combines them into a single turn
- First message must be `user`
- `role: "system"` messages are allowed mid-conversation under the `mid-conversation-system-2026-04-07` beta on supporting models (see § Mid-conversation system messages above)

---

### Compaction (long conversations)

> **Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests: append `response.content`, not just the text.

```python
import anthropic

client = anthropic.Anthropic()
messages = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-8",
        max_tokens=16000,
        messages=messages,
        context_management={
            "edits": [{"type": "compact_20260112"}]
        }
    )

    # Append full content; compaction blocks must be preserved
    messages.append({"role": "assistant", "content": response.content})

    # Default to "" so a response without a text block does not raise StopIteration
    return next((block.text for block in response.content if block.type == "text"), "")

# Compaction triggers automatically when context grows large
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
```

---

## Stop Reasons

The `stop_reason` field in the response indicates why the model stopped generating:

| Value | Meaning |
|-------|---------|
| `end_turn` | Claude finished its response naturally |
| `max_tokens` | Hit the `max_tokens` limit; increase it or use streaming |
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Claude wants to call a tool; execute it and continue |
| `pause_turn` | Model paused and can be resumed (agentic flows) |
| `refusal` | Claude refused for safety reasons; check `stop_details` |

### Structured Stop Details

When `stop_reason` is `"refusal"`, the response includes a `stop_details` object with structured information about the refusal:

```python
if response.stop_reason == "refusal" and response.stop_details:
    print(f"Category: {response.stop_details.category}")  # "cyber" | "bio" | None
    print(f"Explanation: {response.stop_details.explanation}")
```

---

## Cost Optimization Strategies

### 1. Use Prompt Caching for Repeated Context

```python
# Automatic caching (simplest; caches the last cacheable block)
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},
    system=large_document_text,  # e.g., 50KB of context
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# First request: full cost
# Subsequent requests: ~90% cheaper for cached portion
```

### 2. Choose the Right Model

```python
# Default to Opus for most tasks
response = client.messages.create(
    model="claude-opus-4-8",  # $5.00/$25.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Use Sonnet for high-volume production workloads
standard_response = client.messages.create(
    model="claude-sonnet-4-6",  # $3.00/$15.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Summarize this document"}]
)

# Use Haiku only for simple, speed-critical tasks
simple_response = client.messages.create(
    model="claude-haiku-4-5",  # $1.00/$5.00 per 1M tokens
    max_tokens=256,
    messages=[{"role": "user", "content": "Classify this as positive or negative"}]
)
```

### 3. Use Token Counting Before Requests

```python
count_response = client.messages.count_tokens(
    model="claude-opus-4-8",
    messages=messages,
    system=system
)

estimated_input_cost = count_response.input_tokens * 0.000005  # $5/1M tokens
print(f"Estimated input cost: ${estimated_input_cost:.4f}")
```

---

## Retry with Exponential Backoff

> **Note:** The Anthropic SDK automatically retries connection errors, 408, 409, 429, and server errors (5xx) with exponential backoff. You can configure this with `max_retries` (default: 2). Only implement custom retry logic if you need behavior beyond what the SDK provides.

```python
import time
import random
import anthropic

def call_with_retry(
    client: anthropic.Anthropic,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs
):
    """Call the API with exponential backoff retry."""
    last_exception = None

    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            last_exception = e
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                last_exception = e
            else:
                raise  # Client errors (4xx except 429) should not be retried

        if attempt == max_retries - 1:
            break  # Out of attempts; don't sleep before raising

        delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
        print(f"Retry {attempt + 1}/{max_retries} after {delay:.1f}s")
        time.sleep(delay)

    raise last_exception
```
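
The delay formula used by `call_with_retry` can be inspected offline, without calling the API. The following is a minimal sketch under stated assumptions: `backoff_delays` is a hypothetical helper (not part of the Anthropic SDK), and its `jitter` parameter is an illustrative addition that makes the schedule deterministic when needed.

```python
import random
from typing import Callable, List


def backoff_delays(
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: Callable[[], float] = lambda: random.uniform(0, 1),
) -> List[float]:
    """Return the wait before each retry: min(base * 2**attempt + jitter, cap).

    Hypothetical helper for illustration; mirrors the formula in call_with_retry.
    """
    return [
        min(base_delay * (2 ** attempt) + jitter(), max_delay)
        for attempt in range(max_retries)
    ]


# With jitter disabled, the delay doubles each attempt until it reaches the cap.
print(backoff_delays(max_retries=8, jitter=lambda: 0.0))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With the defaults shown, the cap of 60 seconds is reached on the seventh attempt, so raising `max_retries` beyond that adds roughly a minute of waiting per extra attempt.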