Cloudflare Platform Skill

Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/workers-ai/README.md

# Cloudflare Workers AI

Expert guidance for Cloudflare Workers AI - serverless GPU-powered AI inference at the edge.

## Overview

Workers AI provides:
- 50+ pre-trained models (LLMs, embeddings, image generation, speech-to-text, translation)
- Native Workers binding (no external API calls)
- Pay-per-use pricing (neurons consumed per inference)
- OpenAI-compatible REST API
- Streaming support for text generation
- Function calling with compatible models

**Architecture**: Inference runs on Cloudflare's GPU network. Models load on the first request (cold start of 1-3s); subsequent requests are faster.

## Quick Start

```typescript
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env) {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'What is Cloudflare?' }]
    });
    return Response.json(response);
  }
};
```

```bash
# Setup - add binding to wrangler.jsonc
wrangler dev --remote  # Must use --remote for AI
wrangler deploy
```

## Model Selection Decision Tree

### Text Generation (Chat/Completion)

**Quality Priority**:
- **Best quality**: `@cf/meta/llama-3.1-70b-instruct` (expensive, ~2000 neurons)
- **Balanced**: `@cf/meta/llama-3.1-8b-instruct` (good quality, ~200 neurons)
- **Fastest/cheapest**: `@cf/mistral/mistral-7b-instruct-v0.1` (~50 neurons)

**Function Calling**:
- Use `@cf/meta/llama-3.1-8b-instruct` or `@cf/meta/llama-3.1-70b-instruct` (native tool support)

**Code Generation**:
- Use `@cf/deepseek-ai/deepseek-coder-6.7b-instruct` (specialized for code)

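The text-generation choices above can be captured in a small lookup helper. This is only a sketch: the model IDs and neuron figures are copied from the list above, while `Priority`, `ModelChoice`, and `pickTextModel` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: maps a priority to one of the models listed above.
type Priority = 'quality' | 'balanced' | 'speed' | 'code';

interface ModelChoice {
  model: string;
  approxNeurons?: number; // rough per-request figure quoted above, when one is given
}

const TEXT_MODELS: Record<Priority, ModelChoice> = {
  quality: { model: '@cf/meta/llama-3.1-70b-instruct', approxNeurons: 2000 },
  balanced: { model: '@cf/meta/llama-3.1-8b-instruct', approxNeurons: 200 },
  speed: { model: '@cf/mistral/mistral-7b-instruct-v0.1', approxNeurons: 50 },
  code: { model: '@cf/deepseek-ai/deepseek-coder-6.7b-instruct' },
};

function pickTextModel(priority: Priority, needsTools = false): ModelChoice {
  // Only the two Llama 3.1 models are listed above with native tool support,
  // so any other priority falls back to the balanced Llama model.
  if (needsTools && priority !== 'quality') return TEXT_MODELS.balanced;
  return TEXT_MODELS[priority];
}
```

For example, `pickTextModel('speed', true)` falls back to the 8B Llama model, because the Mistral model is not listed above as supporting function calling.
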
### Embeddings (Semantic Search/RAG)

**English text**:
- **Best**: `@cf/baai/bge-large-en-v1.5` (1024 dims, highest quality)
- **Balanced**: `@cf/baai/bge-base-en-v1.5` (768 dims, good quality)
- **Fast**: `@cf/baai/bge-small-en-v1.5` (384 dims, lower quality but fast)

**Multilingual**:
- Use `@hf/sentence-transformers/paraphrase-multilingual-minilm-l12-v2`

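Whichever embedding model is chosen, semantic search comes down to comparing vectors. The sketch below ranks document vectors against a query vector by cosine similarity in plain TypeScript. `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers, not part of Workers AI, and a vector database such as Vectorize would normally do this at scale.

```typescript
// Cosine similarity between two equal-length vectors (returns 0 for a zero vector).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns document indices ordered from most to least similar to the query.
function rankBySimilarity(query: number[], docs: number[][]): number[] {
  return docs
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```

Vectors being compared must come from the same model, since the dimensions differ between them (384, 768, or 1024 for the BGE models above).
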
### Image Generation

- **Stable Diffusion**: `@cf/stabilityai/stable-diffusion-xl-base-1.0` (~10,000 neurons)
- **Portraits**: `@cf/lykon/dreamshaper-8-lcm` (optimized for faces)

### Other Tasks

- **Speech-to-text**: `@cf/openai/whisper`
- **Translation**: `@cf/meta/m2m100-1.2b` (100 languages)
- **Image classification**: `@cf/microsoft/resnet-50`

## SDK Approach Decision Tree

### Native Binding (Recommended)

**When**: Building Workers/Pages with TypeScript  
**Why**: Zero external dependencies, best performance, native types

```typescript
await env.AI.run(model, input);
```

### REST API

**When**: External services, non-Workers environments, testing  
**Why**: Standard HTTP, works anywhere

```bash
curl https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer <API_TOKEN>" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

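The same call can be made from TypeScript outside Workers. The sketch below only assembles the URL and request options that mirror the `curl` command above. `buildRunRequest` and `RunRequest` are hypothetical names, and the explicit `Content-Type` header is an addition that the `curl` example does not set.

```typescript
interface RunRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the REST request for a given model and input.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: unknown
): RunRequest {
  return {
    // Same endpoint shape as the curl example: .../accounts/<ACCOUNT_ID>/ai/run/<model>
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json', // assumption: not present in the curl example
      },
      body: JSON.stringify(input),
    },
  };
}
```

The result can be passed to `fetch(url, init)` in any runtime that provides `fetch`. Keeping the builder free of network calls makes it straightforward to unit test.
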
### Vercel AI SDK Integration

**When**: Using Vercel AI SDK features (streaming UI, tool calling abstractions)  
**Why**: Unified interface across providers

```typescript
import { createOpenAI } from '@ai-sdk/openai';

// Point the OpenAI-compatible provider at the Workers AI endpoint
const workersAI = createOpenAI({
  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
  apiKey: '<API_TOKEN>' // sent as the Bearer token
});

const model = workersAI.chat('model-name');
```

## RAG vs Direct Generation

### Use RAG (Vectorize + Workers AI) When:
- Answering questions about specific documents/data
- Need factual accuracy from known corpus
- Context exceeds model's window (>4K tokens)
- Building knowledge base chat

### Use Direct Generation When:
- Creative writing, brainstorming
- General knowledge questions
- Small context fits in prompt (<4K tokens)
- Cost optimization (RAG adds embedding + vector search costs)

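The context-size criterion above can be turned into a rough check. The sketch below is illustrative only: the 4K threshold comes from the lists above, but the estimate of about 4 characters per token is an assumption rather than a real tokenizer, and `estimateTokens` and `chooseApproach` are hypothetical names. The other criteria (creative tasks, cost) still need human judgment.

```typescript
const CONTEXT_THRESHOLD_TOKENS = 4000; // the ~4K figure from the lists above

// Very rough estimate. Assumes about 4 characters per token, which is a
// heuristic and not the behaviour of any particular model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type Approach = 'rag' | 'direct';

// Picks RAG when the supporting context is unlikely to fit in the prompt.
function chooseApproach(contextText: string): Approach {
  return estimateTokens(contextText) > CONTEXT_THRESHOLD_TOKENS ? 'rag' : 'direct';
}
```
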
## Platform Limits

| Limit | Free Tier | Paid Plans |
|-------|-----------|------------|
| Neurons/day | 10,000 | Pay per use |
| Rate limit | Varies by model | Higher (contact support) |
| Context window | Model dependent (2K-8K) | Same |
| Streaming | ✅ Supported | ✅ Supported |
| Function calling | ✅ Supported (select models) | ✅ Supported |

**Pricing**: Free 10K neurons/day, then pay per neuron consumed (varies by model)

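To get a feel for the free tier, the daily allowance can be divided by a model's approximate per-request cost. This is a back-of-the-envelope sketch: `requestsWithinFreeTier` is a hypothetical helper, and the per-request figures are the rough estimates quoted in the model lists above, so real usage will vary with prompt and output length.

```typescript
const FREE_NEURONS_PER_DAY = 10000; // free-tier allowance from the table above

// Number of whole requests that fit in the daily free allowance.
function requestsWithinFreeTier(neuronsPerRequest: number): number {
  if (neuronsPerRequest <= 0) throw new Error('neuronsPerRequest must be positive');
  return Math.floor(FREE_NEURONS_PER_DAY / neuronsPerRequest);
}

// Using the approximate figures quoted earlier in this document:
const llama70b = requestsWithinFreeTier(2000); // 5 requests/day
const llama8b = requestsWithinFreeTier(200);   // 50 requests/day
const mistral7b = requestsWithinFreeTier(50);  // 200 requests/day
const sdxl = requestsWithinFreeTier(10000);    // 1 image/day
```
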
## Common Tasks

```typescript
// Streaming text generation: with stream: true, run() returns a ReadableStream
// of server-sent events, so hand it to the client instead of iterating over it
const stream = await env.AI.run(model, { messages, stream: true });
const sseResponse = new Response(stream, {
  headers: { 'content-type': 'text/event-stream' }
});

// Embeddings for RAG
const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Query text', 'Document 1', 'Document 2']
});

// Function calling
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'What is the weather?' }],
  tools: [{
    type: 'function',
    function: { name: 'getWeather', parameters: { ... } }
  }]
});
```

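On the consuming side, a streamed response arrives as server-sent events. The sketch below extracts the generated text from already-decoded SSE text. It assumes each event is a `data:` line carrying JSON with a `response` string and that the stream ends with `data: [DONE]`. That format is an assumption here and should be checked against api.md. `extractTokens` is a hypothetical helper.

```typescript
// Pulls the text fragments out of decoded SSE text.
// Assumed event shape: data: {"response":"..."} with a final data: [DONE]
function extractTokens(sseText: string): string[] {
  const tokens: string[] = [];
  for (const line of sseText.split('\n')) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === 'string') tokens.push(parsed.response);
    } catch {
      // Skip lines that are not complete JSON, e.g. an event split across chunks
    }
  }
  return tokens;
}
```

A real client would decode the byte stream with `TextDecoder` and buffer partial lines between chunks before calling a parser like this.
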
## Development Workflow

```bash
# Always use --remote for AI (local doesn't have models)
wrangler dev --remote

# Deploy to production
wrangler deploy

# View model catalog
# https://developers.cloudflare.com/workers-ai/models/
```

## Reading Order

**Start here**: Quick Start above → configuration.md (setup)

**Common tasks**:
- First time setup: configuration.md → Add binding + deploy
- Choose model: Model Selection Decision Tree (above) → api.md
- Build RAG: patterns.md → Vectorize integration
- Optimize costs: Model Selection + gotchas.md (rate limits)
- Debugging: gotchas.md → Common errors

## In This Reference

- [configuration.md](./configuration.md) - wrangler.jsonc setup, TypeScript types, bindings, environment variables
- [api.md](./api.md) - env.AI.run(), streaming, function calling, REST API, response types
- [patterns.md](./patterns.md) - RAG with Vectorize, prompt engineering, batching, error handling, caching
- [gotchas.md](./gotchas.md) - Deprecated @cloudflare/ai package, rate limits, pricing, common errors

## See Also

- [vectorize](../vectorize/) - Vector database for RAG patterns
- [ai-gateway](../ai-gateway/) - Caching, rate limiting, analytics for AI requests
- [workers](../workers/) - Worker runtime and fetch handler patterns