Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/workers-ai/README.md
# Cloudflare Workers AI

Expert guidance for Cloudflare Workers AI - serverless GPU-powered AI inference at the edge.

## Overview

Workers AI provides:
- 50+ pre-trained models (LLMs, embeddings, image generation, speech-to-text, translation)
- Native Workers binding (no external API calls)
- Pay-per-use pricing (neurons consumed per inference)
- OpenAI-compatible REST API
- Streaming support for text generation
- Function calling with compatible models

**Architecture**: Inference runs on Cloudflare's GPU network. Models load on the first request (cold start of 1-3s); subsequent requests are faster.

## Quick Start

```typescript
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env) {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'What is Cloudflare?' }]
    });
    return Response.json(response);
  }
};
```

```bash
# Setup - add binding to wrangler.jsonc
wrangler dev --remote # Must use --remote for AI
wrangler deploy
```

## Model Selection Decision Tree

### Text Generation (Chat/Completion)

**Quality Priority**:
- **Best quality**: `@cf/meta/llama-3.1-70b-instruct` (expensive, ~2000 neurons)
- **Balanced**: `@cf/meta/llama-3.1-8b-instruct` (good quality, ~200 neurons)
- **Fastest/cheapest**: `@cf/mistral/mistral-7b-instruct-v0.1` (~50 neurons)

**Function Calling**:
- Use `@cf/meta/llama-3.1-8b-instruct` or `@cf/meta/llama-3.1-70b-instruct` (native tool support)

**Code Generation**:
- Use `@cf/deepseek-ai/deepseek-coder-6.7b-instruct` (specialized for code)

### Embeddings (Semantic Search/RAG)

**English text**:
- **Best**: `@cf/baai/bge-large-en-v1.5` (1024 dims, highest quality)
- **Balanced**: `@cf/baai/bge-base-en-v1.5` (768 dims, good quality)
- **Fast**: `@cf/baai/bge-small-en-v1.5` (384 dims, lower quality but fast)

**Multilingual**:
- Use `@hf/sentence-transformers/paraphrase-multilingual-minilm-l12-v2`

### Image Generation

- **Stable Diffusion**: `@cf/stabilityai/stable-diffusion-xl-base-1.0` (~10,000 neurons)
- **Portraits**: `@cf/lykon/dreamshaper-8-lcm` (optimized for faces)

### Other Tasks

- **Speech-to-text**: `@cf/openai/whisper`
- **Translation**: `@cf/meta/m2m100-1.2b` (100 languages)
- **Image classification**: `@cf/microsoft/resnet-50`

## SDK Approach Decision Tree

### Native Binding (Recommended)

**When**: Building Workers/Pages with TypeScript
**Why**: Zero external dependencies, best performance, native types

```typescript
await env.AI.run(model, input);
```

### REST API

**When**: External services, non-Workers environments, testing
**Why**: Standard HTTP, works anywhere

```bash
curl https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer <API_TOKEN>" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

### Vercel AI SDK Integration

**When**: Using Vercel AI SDK features (streaming UI, tool calling abstractions)
**Why**: Unified interface across providers

```typescript
import { createOpenAI } from '@ai-sdk/openai';

// Point the OpenAI-compatible provider at the Workers AI endpoint
const workersAI = createOpenAI({
  baseURL: 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1',
  apiKey: '<API_TOKEN>' // sent as the Authorization: Bearer header
});
const model = workersAI('model-name');
```

## RAG vs Direct Generation

### Use RAG (Vectorize + Workers AI) When:
- Answering questions about specific documents/data
- Need factual accuracy from known corpus
- Context exceeds model's window (>4K tokens)
- Building knowledge base chat

### Use Direct Generation When:
- Creative writing, brainstorming
- General knowledge questions
- Small context fits in prompt (<4K tokens)
- Cost optimization (RAG adds embedding + vector search costs)

## Platform Limits

| Limit | Free Tier | Paid Plans |
|-------|-----------|------------|
| Neurons/day | 10,000 | Pay per use |
| Rate limit | Varies by model | Higher (contact support) |
| Context window | Model dependent (2K-8K) | Same |
| Streaming | ✅ Supported | ✅ Supported |
| Function calling | ✅ Supported (select models) | ✅ Supported |

**Pricing**: The first 10K neurons/day are free; beyond that, pay per neuron consumed (cost per request varies by model).

## Common Tasks

```typescript
// Streaming text generation: with stream: true, run() resolves to a
// ReadableStream of server-sent-event bytes, so decode each chunk
const stream = await env.AI.run(model, { messages, stream: true });
const decoder = new TextDecoder();
for await (const chunk of stream) {
  console.log(decoder.decode(chunk, { stream: true })); // raw SSE lines
}

// Embeddings for RAG
const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Query text', 'Document 1', 'Document 2']
});

// Function calling
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'What is the weather?' }],
  tools: [{
    type: 'function',
    function: { name: 'getWeather', parameters: { ... } }
  }]
});
```

## Development Workflow

```bash
# Always use --remote for AI (local doesn't have models)
wrangler dev --remote

# Deploy to production
wrangler deploy

# View model catalog
# https://developers.cloudflare.com/workers-ai/models/
```

## Reading Order

**Start here**: Quick Start above → configuration.md (setup)

**Common tasks**:
- First time setup: configuration.md → Add binding + deploy
- Choose model: Model Selection Decision Tree (above) → api.md
- Build RAG: patterns.md → Vectorize integration
- Optimize costs: Model Selection + gotchas.md (rate limits)
- Debugging: gotchas.md → Common errors

## In This Reference

- [configuration.md](./configuration.md) - wrangler.jsonc setup, TypeScript types, bindings, environment variables
- [api.md](./api.md) - env.AI.run(), streaming, function calling, REST API, response types
- [patterns.md](./patterns.md) - RAG with Vectorize, prompt engineering, batching, error handling, caching
- [gotchas.md](./gotchas.md) - Deprecated @cloudflare/ai package, rate limits, pricing, common errors

## See Also

- [vectorize](../vectorize/) - Vector database for RAG patterns
- [ai-gateway](../ai-gateway/) - Caching, rate limiting, analytics for AI requests
- [workers](../workers/) - Worker runtime and fetch handler patterns
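
## Worked Example: Free-Tier Neuron Budget

The approximate per-request neuron figures in the Model Selection Decision Tree, combined with the 10,000 neurons/day free allowance in Platform Limits, give a rough idea of how many requests fit in a day. The sketch below is only that arithmetic: the neuron costs are this README's estimates rather than measured values, and `requestsPerDay` and the two constants are hypothetical names introduced for illustration, not part of any Cloudflare API.

```typescript
// Rough per-request neuron costs quoted in this README (estimates, not measurements).
const APPROX_NEURONS_PER_REQUEST: Record<string, number> = {
  '@cf/meta/llama-3.1-70b-instruct': 2000,
  '@cf/meta/llama-3.1-8b-instruct': 200,
  '@cf/mistral/mistral-7b-instruct-v0.1': 50,
  '@cf/stabilityai/stable-diffusion-xl-base-1.0': 10000,
};

// Free-tier daily allowance from the Platform Limits table.
const FREE_TIER_NEURONS_PER_DAY = 10000;

// Hypothetical helper: number of whole requests that fit in a daily neuron budget.
function requestsPerDay(
  model: string,
  dailyBudget: number = FREE_TIER_NEURONS_PER_DAY,
): number {
  const cost = APPROX_NEURONS_PER_REQUEST[model];
  if (cost === undefined) {
    throw new Error(`No neuron estimate for model: ${model}`);
  }
  return Math.floor(dailyBudget / cost);
}
```

Under these assumptions, the free tier covers about 50 requests per day to `@cf/meta/llama-3.1-8b-instruct` but only about 5 to the 70B model, which is why the smaller models are the usual starting point when cost matters. Actual consumption depends on prompt and output length, so treat the result as an order-of-magnitude guide.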