Building LLM-Powered Applications with Claude

Build LLM-powered apps with the Anthropic Claude API or SDK across Python, TypeScript, Java, Go, Ruby, C#, and PHP.
python/claude-api/README.md

# Claude API: Python

## Installation

```bash
pip install anthropic
```

## Client Initialization

```python
import anthropic

# Default (uses ANTHROPIC_API_KEY env var)
client = anthropic.Anthropic()

# Explicit API key
client = anthropic.Anthropic(api_key="your-api-key")

# Async client
async_client = anthropic.AsyncAnthropic()
```

---

## Basic Message Request

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# response.content is a list of content block objects (TextBlock, ThinkingBlock,
# ToolUseBlock, ...). Check .type before accessing .text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```
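
Because a response can mix text blocks with other block types, it can help to collect the text in one place. The helper below is a minimal sketch: `extract_text` is a hypothetical name, and the `SimpleNamespace` objects only stand in for the SDK's content blocks so the snippet runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of every block whose type is "text", skipping other block types."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-in blocks (assumption: real blocks expose .type and, for text blocks, .text)
fake_content = [
    SimpleNamespace(type="thinking", thinking="Recall European capitals."),
    SimpleNamespace(type="text", text="The capital of France is Paris."),
]
print(extract_text(fake_content))
```

With a real response, the call would be `extract_text(response.content)`.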

---

## System Prompts

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system="You are a helpful coding assistant. Always provide examples in Python.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}]
)
```

---

## Vision (Images)

### Base64

```python
import base64

with open("image.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {"type": "text", "text": "What's in this image?"}
        ]
    }]
)
```
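
The example above hard-codes `image/png`, so the `media_type` has to be changed by hand for other formats. One possible way to keep the two in sync is a small helper that derives the media type from the file extension. This is only a sketch: `image_block` and `MEDIA_TYPES` are hypothetical names, and the extension table is an assumption about which formats you need.

```python
import base64
from pathlib import Path

# Media types by file extension (assumption: these formats cover your images)
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, raw: bytes) -> dict:
    """Build a base64 image content block whose media_type matches the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.standard_b64encode(raw).decode("utf-8"),
        },
    }


# Three bytes stand in for real file contents read with open(path, "rb")
block = image_block("photo.JPG", b"\xff\xd8\xff")
print(block["source"]["media_type"])
```

The returned dictionary can be placed in the `content` list next to a text block, exactly like the inline dictionary in the example above.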

### URL

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/image.png"
                }
            },
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

---

## Prompt Caching

Cache large context to reduce costs (up to 90% savings). **Caching is a prefix match**: any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.

### Automatic Caching (Recommended)

Use top-level `cache_control` to automatically cache the last cacheable block in the request, with no need to annotate individual content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},  # auto-caches the last cacheable block
    system="You are an expert on this large document...",
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Manual Cache Control

For fine-grained control, add `cache_control` to specific content blocks:

```python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral"}  # default TTL is 5 minutes
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# With explicit TTL (time-to-live)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": "You are an expert on this large document...",
        "cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1 hour TTL
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
```

### Verifying Cache Hits

```python
print(response.usage.cache_creation_input_tokens)  # tokens written to cache (~1.25x cost)
print(response.usage.cache_read_input_tokens)      # tokens served from cache (~0.1x cost)
print(response.usage.input_tokens)                 # uncached tokens (full cost)
```

If `cache_read_input_tokens` stays at zero across repeated requests that should share a prefix, either the prefix is shorter than the model's minimum cacheable length or a silent invalidator is at work: `datetime.now()` or a UUID in the system prompt, unsorted `json.dumps()`, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.
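
To illustrate the unsorted `json.dumps()` case, the sketch below serializes reference data with sorted keys so that two logically identical prompts are also byte-identical, and fingerprints the result so drift can be spotted in logs. The names `stable_system_prompt` and `prefix_fingerprint` are hypothetical, and hashing the prompt is just one possible way to check stability, not something the API requires.

```python
import hashlib
import json


def stable_system_prompt(instructions: str, reference_data: dict) -> str:
    """Serialize reference data deterministically so the cached prefix stays byte-stable."""
    serialized = json.dumps(reference_data, sort_keys=True, separators=(",", ":"))
    return f"{instructions}\n\nReference data:\n{serialized}"


def prefix_fingerprint(system_prompt: str) -> str:
    """Short hash of the prompt; log it per request to detect silent changes."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:16]


# Same data, different key insertion order
a = stable_system_prompt("You are a support agent.", {"plan": "pro", "region": "eu"})
b = stable_system_prompt("You are a support agent.", {"region": "eu", "plan": "pro"})
print(prefix_fingerprint(a) == prefix_fingerprint(b))  # True: key order no longer matters
```

If the fingerprint changes between requests that should share a prefix, something volatile (a timestamp, an ID, reordered data) has leaked into the cached portion.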

---

## Extended Thinking

> **Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.7 (the API returns a 400 error if it is sent) and deprecated on Opus 4.6 and Sonnet 4.6.
>
> **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (`budget_tokens` must be at least 1024 and less than `max_tokens`).

```python
# Opus 4.7 / Opus 4.6 / Sonnet 4.6: adaptive thinking (recommended)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # low | medium | high | max
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)

# Access thinking and response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
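
The model-dependent rules in the note above can be sketched as a small helper that picks the `thinking` parameter and checks the `budget_tokens` constraints before a request is sent. `thinking_config` and `ADAPTIVE_MODELS` are hypothetical names, matching models by ID prefix is an assumption, and `claude-haiku-4-5` is used here only as an example of a model outside the adaptive list.

```python
# Model IDs that the note above lists as using adaptive thinking
ADAPTIVE_MODELS = ("claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6")


def thinking_config(model: str, max_tokens: int, budget_tokens=None) -> dict:
    """Return a `thinking` parameter that follows the rules stated above."""
    if model.startswith(ADAPTIVE_MODELS):
        # Adaptive models: never send budget_tokens (removed or deprecated)
        return {"type": "adaptive"}
    if budget_tokens is None:
        raise ValueError("budget_tokens is required for models without adaptive thinking")
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config("claude-opus-4-7", max_tokens=16000))
print(thinking_config("claude-haiku-4-5", max_tokens=16000, budget_tokens=8000))
```

The returned dictionary would be passed as `thinking=...` to `client.messages.create`.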

---

## Error Handling

```python
import anthropic

client = anthropic.Anthropic()

try:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=16000,
        messages=[{"role": "user", "content": "Hello"}]
    )
# The specific status errors below subclass APIStatusError, so catch them first
except anthropic.BadRequestError as e:
    print(f"Bad request: {e.message}")
except anthropic.AuthenticationError:
    print("Invalid API key")
except anthropic.PermissionDeniedError:
    print("API key lacks required permissions")
except anthropic.NotFoundError:
    print("Invalid model or endpoint")
except anthropic.RateLimitError as e:
    retry_after = int(e.response.headers.get("retry-after", "60"))
    print(f"Rate limited. Retry after {retry_after}s.")
except anthropic.APIStatusError as e:
    if e.status_code >= 500:
        print(f"Server error ({e.status_code}). Retry later.")
    else:
        print(f"API error: {e.message}")
except anthropic.APIConnectionError:
    print("Network error. Check internet connection.")
```
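
The `int(...)` conversion in the rate-limit branch raises `ValueError` if the header is not a whole number. HTTP allows `Retry-After` to carry either a number of seconds or an HTTP date, so a more defensive parser might look like the sketch below. `parse_retry_after` is a hypothetical helper, and whether the API ever sends a non-integer value is an assumption rather than something this guide states.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default: float = 60.0) -> float:
    """Return the wait in seconds from a Retry-After header value, or `default`."""
    if value is None:
        return default
    try:
        # Most common form: a number of seconds
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        # Alternative form: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


print(parse_retry_after("30"))
print(parse_retry_after("not-a-number", default=5.0))
```

In the handler above, this would replace the `int(...)` call: `parse_retry_after(e.response.headers.get("retry-after"))`.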

---

## Multi-Turn Conversations

The API is stateless, so send the full conversation history with each request.

```python
from typing import Optional

import anthropic


class ConversationManager:
    """Manage multi-turn conversations with the Claude API."""

    def __init__(self, client: anthropic.Anthropic, model: str, system: Optional[str] = None):
        self.client = client
        self.model = model
        self.system = system
        self.messages = []

    def send(self, user_message: str, **kwargs) -> str:
        """Send a message and get a response."""
        self.messages.append({"role": "user", "content": user_message})

        # Pop max_tokens so it is not passed twice (explicitly and via **kwargs)
        max_tokens = kwargs.pop("max_tokens", 16000)
        # Only send a system prompt when one was provided
        if self.system is not None:
            kwargs.setdefault("system", self.system)

        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=self.messages,
            **kwargs
        )

        assistant_message = next(
            (b.text for b in response.content if b.type == "text"), ""
        )
        self.messages.append({"role": "assistant", "content": assistant_message})

        return assistant_message

# Usage
conversation = ConversationManager(
    client=anthropic.Anthropic(),
    model="claude-opus-4-7",
    system="You are a helpful assistant."
)

response1 = conversation.send("My name is Alice.")
response2 = conversation.send("What's my name?")  # Claude remembers "Alice"
```

**Rules:**

- Messages must alternate between `user` and `assistant`
- First message must be `user`
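
The two rules above can be checked locally before a request is sent. The function below is a minimal sketch of such a check; `validate_messages` is a hypothetical helper and is not part of the SDK.

```python
def validate_messages(messages: list) -> None:
    """Raise ValueError if the history breaks the ordering rules listed above."""
    if not messages:
        raise ValueError("messages must not be empty")
    if messages[0]["role"] != "user":
        raise ValueError("first message must have role 'user'")
    for index in range(1, len(messages)):
        previous, current = messages[index - 1]["role"], messages[index]["role"]
        if current not in ("user", "assistant"):
            raise ValueError(f"message {index} has unexpected role {current!r}")
        if current == previous:
            raise ValueError(f"messages {index - 1} and {index} both have role {current!r}")


history = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice."},
    {"role": "user", "content": "What's my name?"},
]
validate_messages(history)  # passes silently
```

Calling it at the top of a method like `ConversationManager.send` would surface ordering mistakes as a local `ValueError` instead of an API error.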

---

### Compaction (long conversations)

> **Beta (Opus 4.7, Opus 4.6, and Sonnet 4.6).** When a conversation approaches the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block that you must pass back on subsequent requests, so append `response.content`, not just the text.

```python
import anthropic

client = anthropic.Anthropic()
messages = []

def chat(user_message: str) -> str:
    messages.append({"role": "user", "content": user_message})

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-7",
        max_tokens=16000,
        messages=messages,
        context_management={
            "edits": [{"type": "compact_20260112"}]
        }
    )

    # Append full content: compaction blocks must be preserved
    messages.append({"role": "assistant", "content": response.content})

    # Default to "" so a response without a text block does not raise StopIteration
    return next((block.text for block in response.content if block.type == "text"), "")

# Compaction triggers automatically when context grows large
print(chat("Help me build a Python web scraper"))
print(chat("Add support for JavaScript-rendered pages"))
print(chat("Now add rate limiting and error handling"))
```

---

## Stop Reasons

The `stop_reason` field in the response indicates why the model stopped generating:

| Value | Meaning |
|-------|---------|
| `end_turn` | Claude finished its response naturally |
| `max_tokens` | Hit the `max_tokens` limit; increase it or use streaming |
| `stop_sequence` | Hit a custom stop sequence |
| `tool_use` | Claude wants to call a tool; execute it and continue |
| `pause_turn` | Model paused and can be resumed (agentic flows) |
| `refusal` | Claude refused for safety reasons; output may not match your schema |
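
One way to act on the table above is to map each `stop_reason` to the next step in your application loop. The sketch below is illustrative only: `next_action` and the action labels it returns are hypothetical, and a `SimpleNamespace` stands in for a real response object so the snippet runs offline.

```python
from types import SimpleNamespace


def next_action(response) -> str:
    """Translate a response's stop_reason into an application-level action label."""
    reason = response.stop_reason
    if reason in ("end_turn", "stop_sequence"):
        return "done"        # the output is complete
    if reason == "max_tokens":
        return "truncated"   # raise max_tokens or switch to streaming
    if reason == "tool_use":
        return "run_tools"   # execute the requested tool and continue
    if reason == "pause_turn":
        return "resume"      # send the conversation back to continue the turn
    if reason == "refusal":
        return "refused"     # do not parse the output as structured data
    return "unknown"         # a value not listed in the table


print(next_action(SimpleNamespace(stop_reason="tool_use")))
```

Returning a label instead of acting inline keeps the branching easy to test; the caller decides what each label means.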

---

## Cost Optimization Strategies

### 1. Use Prompt Caching for Repeated Context

```python
# Placeholder: load your own large context here (e.g., 50KB of text)
large_document_text = "..."

# Automatic caching (simplest: caches the last cacheable block)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},
    system=large_document_text,
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

# First request: cache write (~1.25x input cost for the cached portion)
# Subsequent requests: ~90% cheaper for the cached portion
```
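
As a worked example of the savings, the sketch below applies the approximate multipliers listed under Verifying Cache Hits (about 1.25x for cache writes and 0.1x for cache reads) to a made-up workload. `billed_input_tokens` is a hypothetical helper, and the token counts are illustrative, not measured.

```python
# Multipliers relative to the base input price (approximate figures from this guide)
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


def billed_input_tokens(input_tokens: int, cache_creation: int, cache_read: int) -> float:
    """Return input usage expressed in full-price token equivalents."""
    return (
        input_tokens
        + cache_creation * CACHE_WRITE_MULTIPLIER
        + cache_read * CACHE_READ_MULTIPLIER
    )


# Hypothetical workload: a 50,000-token cached document plus a 20-token question
first = billed_input_tokens(input_tokens=20, cache_creation=50_000, cache_read=0)
later = billed_input_tokens(input_tokens=20, cache_creation=0, cache_read=50_000)
uncached = billed_input_tokens(input_tokens=50_020, cache_creation=0, cache_read=0)

print(f"first request:  {first:,.0f} token equivalents")
print(f"later requests: {later:,.0f} token equivalents")
print(f"savings on later requests: {1 - later / uncached:.0%}")
```

Under these assumptions the first request costs somewhat more than an uncached one (the cache write premium), and each later request within the TTL costs roughly a tenth as much, so caching pays off from the second request onward.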

### 2. Choose the Right Model

```python
# Default to Opus for most tasks
response = client.messages.create(
    model="claude-opus-4-7",  # $5.00/$25.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Use Sonnet for high-volume production workloads
standard_response = client.messages.create(
    model="claude-sonnet-4-6",  # $3.00/$15.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Summarize this document"}]
)

# Use Haiku only for simple, speed-critical tasks
simple_response = client.messages.create(
    model="claude-haiku-4-5",  # $1.00/$5.00 per 1M tokens
    max_tokens=256,
    messages=[{"role": "user", "content": "Classify this as positive or negative"}]
)
```
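
To compare the three tiers for a given workload, the per-million-token prices from the comments above can be put in a small table. This is a rough sketch: `PRICES_PER_MTOK` and `estimate_cost` are hypothetical names, the prices are copied from the comments in this guide and may change, and caching discounts are ignored.

```python
# (input, output) price in USD per 1M tokens, copied from the comments above
PRICES_PER_MTOK = {
    "claude-opus-4-7": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, ignoring caching discounts."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 10,000 input tokens and 1,000 output tokens
for model in PRICES_PER_MTOK:
    print(f"{model}: ${estimate_cost(model, 10_000, 1_000):.4f}")
```

For this example request the estimate is $0.0750 on Opus, $0.0450 on Sonnet, and $0.0150 on Haiku, which shows how quickly the tier choice dominates cost at volume.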

### 3. Use Token Counting Before Requests

```python
# Example inputs; replace with the request you intend to send
messages = [{"role": "user", "content": "Explain quantum computing"}]
system = "You are a helpful assistant."

count_response = client.messages.count_tokens(
    model="claude-opus-4-7",
    messages=messages,
    system=system
)

estimated_input_cost = count_response.input_tokens * 0.000005  # $5 per 1M input tokens
print(f"Estimated input cost: ${estimated_input_cost:.4f}")
```

---

## Retry with Exponential Backoff

> **Note:** The Anthropic SDK automatically retries rate limit (429) and server errors (5xx) with exponential backoff. You can configure this with `max_retries` (default: 2). Only implement custom retry logic if you need behavior beyond what the SDK provides.

```python
import time
import random
import anthropic

def call_with_retry(
    client: anthropic.Anthropic,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs
):
    """Call the API with exponential backoff retry."""
    last_exception = None

    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            last_exception = e
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                last_exception = e
            else:
                raise  # Client errors (4xx except 429) should not be retried

        # No attempts left: skip the sleep and raise below
        if attempt == max_retries - 1:
            break

        # Exponential backoff with up to 1s of jitter, capped at max_delay
        delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
        print(f"Attempt {attempt + 1}/{max_retries} failed; retrying in {delay:.1f}s")
        time.sleep(delay)

    if last_exception is None:
        raise ValueError("max_retries must be at least 1")
    raise last_exception
```
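
To see what the backoff formula in `call_with_retry` produces, the sketch below computes the delay schedule with the jitter term left out so the result is deterministic. `backoff_schedule` is a hypothetical helper added for illustration only.

```python
def backoff_schedule(max_retries: int = 5, base_delay: float = 1.0, max_delay: float = 60.0) -> list:
    """Return the capped exponential delays (in seconds) for each attempt, without jitter."""
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]


print(backoff_schedule())               # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_schedule(max_retries=8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With the defaults, the delay doubles on each attempt and is capped at `max_delay` from the seventh attempt onward; the random jitter in `call_with_retry` adds up to one second on top of each value (before the cap) so that many clients do not retry in lockstep.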