# Claude API — Python

## Installation

```bash
pip install anthropic
```

## Client Initialization

```python
import anthropic

# Default (uses ANTHROPIC_API_KEY env var)
client = anthropic.Anthropic()

# Explicit API key
client = anthropic.Anthropic(api_key="your-api-key")

# Async client
async_client = anthropic.AsyncAnthropic()
```

---

## Basic Message Request

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# response.content is a list of content block objects (TextBlock, ThinkingBlock,
# ToolUseBlock, ...). Check .type before accessing .text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

---

## System Prompts

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system="You are a helpful coding assistant. Always provide examples in Python.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}]
)
```

---

## Vision (Images)

### Base64

```python
import base64

with open("image.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {"type": "text", "text": "What's in this image?"}
        ]
    }]
)
```

### URL

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/image.png"
                }
            },
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

---

## Prompt Caching

Cache large context to reduce costs (up to 90% savings). **Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.

### Automatic Caching (Recommended)

Use top-level `cache_control` to automatically cache the last cacheable block in the request — no need to annotate individual content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},  # auto-caches the last cacheable block
    system="You are an expert on this large document...",
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Manual Cache Control

For fine-grained control, add `cache_control` to specific content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral"}  # default TTL is 5 minutes
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# With explicit TTL (time-to-live)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1 hour TTL
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Verifying Cache Hits

```python
print(response.usage.cache_creation_input_tokens)  # tokens written to cache (~1.25x cost)
print(response.usage.cache_read_input_tokens)      # tokens served from cache (~0.1x cost)
print(response.usage.input_tokens)                 # uncached tokens (full cost)
```

If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `datetime.now()` or a UUID in the system prompt, unsorted `json.dumps()`, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.
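To address the `json.dumps()` case from that list, serialize any embedded context deterministically so the cached prefix stays byte-identical across requests. A minimal sketch (the payload and helper name are illustrative):

```python
import json

# Illustrative payload embedded in the cached system prompt
reference_data = {"product": "widget", "version": 3}

def stable_dumps(data: dict) -> str:
    """Deterministic serialization: sorted keys, fixed separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))

system_blocks = [{
    "type": "text",
    "text": f"Reference data:\n{stable_dumps(reference_data)}",
    "cache_control": {"type": "ephemeral"},
}]
```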
---

## Extended Thinking

> **Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.7 (the API returns a 400 error if it is sent) and deprecated on Opus 4.6 and Sonnet 4.6.
> **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024); see the sketch after the example below.

```python
# Opus 4.7 / 4.6: adaptive thinking (recommended)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # low | medium | high | max
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)

# Access thinking and response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
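For older models, the same request takes an explicit budget, per the note above. A minimal sketch, assuming a pre-4.6 model with extended thinking support (the model name is a placeholder, not a recommendation):

```python
# Older models: thinking must be enabled explicitly with a budget
# (budget_tokens must be >= 1024 and < max_tokens)
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder for an older thinking-capable model
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)
```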
---

## Error Handling

```python
import anthropic

try:
    response = client.messages.create(...)
except anthropic.BadRequestError as e:
    print(f"Bad request: {e.message}")
except anthropic.AuthenticationError:
    print("Invalid API key")
except anthropic.PermissionDeniedError:
    print("API key lacks required permissions")
except anthropic.NotFoundError:
    print("Invalid model or endpoint")
except anthropic.RateLimitError as e:
    retry_after = int(e.response.headers.get("retry-after", "60"))
    print(f"Rate limited. Retry after {retry_after}s.")
except anthropic.APIStatusError as e:
    if e.status_code >= 500:
        print(f"Server error ({e.status_code}). Retry later.")
    else:
        print(f"API error: {e.message}")
except anthropic.APIConnectionError:
    print("Network error. Check internet connection.")
```

---

## Multi-Turn Conversations

The API is stateless — send the full conversation history each time.

```python
class ConversationManager:
    """Manage multi-turn conversations with the Claude API."""

    def __init__(self, client: anthropic.Anthropic, model: str, system: str | None = None):
        self.client = client
        self.model = model
        self.system = system
        self.messages = []

    def send(self, user_message: str, **kwargs) -> str:
        """Send a message and get a response."""
        self.messages.append({"role": "user", "content": user_message})

        response = self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.pop("max_tokens", 16000),  # pop, so it isn't passed twice via **kwargs
            system=self.system if self.system is not None else anthropic.NOT_GIVEN,
            messages=self.messages,
            **kwargs
        )

        assistant_message = next(
            (b.text for b in response.content if b.type == "text"), ""
        )
        self.messages.append({"role": "assistant", "content": assistant_message})

        return assistant_message

# Usage
conversation = ConversationManager(
    client=anthropic.Anthropic(),
    model="claude-opus-4-7",
    system="You are a helpful assistant."
)

response1 = conversation.send("My name is Alice.")
response2 = conversation.send("What's my name?")  # Claude remembers "Alice"
```

**Rules:**

- Messages must alternate between `user` and `assistant`
- First message must be `user`

---

### Compaction (long conversations)

> **Beta; Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.

```python
import anthropic

client = anthropic.Anthropic()
messages = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-7",
        max_tokens=16000,
        messages=messages,
        context_management={
            "edits": [{"type": "compact_20260112"}]
        }
    )

    # Append full content — compaction blocks must be preserved
    messages.append({"role": "assistant", "content": response.content})

    return next(block.text for block in response.content if block.type == "text")

# Compaction triggers automatically when context grows large
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
```

---

## Stop Reasons

The `stop_reason` field in the response indicates why the model stopped generating:

| Value | Meaning |
|-------|---------|
| `end_turn` | Claude finished its response naturally |
| `max_tokens` | Hit the `max_tokens` limit — increase it or use streaming |
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Claude wants to call a tool — execute it and continue |
| `pause_turn` | Model paused and can be resumed (agentic flows) |
| `refusal` | Claude refused for safety reasons — output may not match your schema |
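A minimal dispatch over these values might look like the following sketch (the `handle_tool_call` helper is hypothetical):

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)

if response.stop_reason == "max_tokens":
    print("Warning: output truncated; raise max_tokens or stream the response")
elif response.stop_reason == "tool_use":
    handle_tool_call(response)  # hypothetical helper: run the tool, send the result back
elif response.stop_reason == "refusal":
    print("Request refused; do not parse the output as structured data")
```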
---

## Cost Optimization Strategies

### 1. Use Prompt Caching for Repeated Context

```python
# Automatic caching (simplest — caches the last cacheable block)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},
    system=large_document_text,  # e.g., 50KB of context
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# First request: full cost
# Subsequent requests: ~90% cheaper for the cached portion
```

### 2. Choose the Right Model

```python
# Default to Opus for most tasks
response = client.messages.create(
    model="claude-opus-4-7",  # $5.00/$25.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Use Sonnet for high-volume production workloads
standard_response = client.messages.create(
    model="claude-sonnet-4-6",  # $3.00/$15.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Summarize this document"}]
)

# Use Haiku only for simple, speed-critical tasks
simple_response = client.messages.create(
    model="claude-haiku-4-5",  # $1.00/$5.00 per 1M tokens
    max_tokens=256,
    messages=[{"role": "user", "content": "Classify this as positive or negative"}]
)
```

### 3. Use Token Counting Before Requests

```python
count_response = client.messages.count_tokens(
    model="claude-opus-4-7",
    messages=messages,
    system=system
)

estimated_input_cost = count_response.input_tokens * 0.000005  # $5/1M tokens
print(f"Estimated input cost: ${estimated_input_cost:.4f}")
```

---

## Retry with Exponential Backoff

> **Note:** The Anthropic SDK automatically retries rate limit (429) and server errors (5xx) with exponential backoff. You can configure this with `max_retries` (default: 2; see the usage sketch below). Only implement custom retry logic if you need behavior beyond what the SDK provides.

```python
import time
import random
import anthropic

def call_with_retry(
    client: anthropic.Anthropic,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs
):
    """Call the API with exponential backoff retry."""
    last_exception = None

    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            last_exception = e
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                last_exception = e
            else:
                raise  # Client errors (4xx except 429) should not be retried

        if attempt < max_retries - 1:  # don't sleep after the final attempt
            delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
            print(f"Retry {attempt + 1}/{max_retries} after {delay:.1f}s")
            time.sleep(delay)

    raise last_exception
```
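Usage of the wrapper above, alongside the SDK-native `max_retries` setting mentioned in the note:

```python
# Custom wrapper defined above
response = call_with_retry(
    client,
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Hello"}]
)

# Or simply raise the SDK's built-in retry count and skip the custom wrapper
client = anthropic.Anthropic(max_retries=5)  # SDK default is 2
```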