Cloudflare Platform Skill

Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-sql/README.md

# Cloudflare R2 SQL Skill Reference

Expert guidance for Cloudflare R2 SQL - serverless distributed query engine for Apache Iceberg tables.

## Reading Order

**New to R2 SQL?** Start here:
1. Read "What is R2 SQL?" and "When to Use" below
2. [configuration.md](configuration.md) - Enable catalog, create tokens
3. [patterns.md](patterns.md) - Wrangler CLI and integration examples
4. [api.md](api.md) - SQL syntax and query reference
5. [gotchas.md](gotchas.md) - Limitations and troubleshooting

**Quick reference?** Jump to:
- [Run a query via Wrangler](patterns.md#wrangler-cli-query)
- [SQL syntax reference](api.md#sql-syntax)
- [ORDER BY limitations](gotchas.md#order-by-limitations)

## What is R2 SQL?

R2 SQL is Cloudflare's **serverless distributed analytics query engine** for querying Apache Iceberg tables in R2 Data Catalog. Features:

- **Serverless** - No clusters to manage, no infrastructure
- **Distributed** - Leverages Cloudflare's global network for parallel execution
- **SQL interface** - Familiar SQL syntax for analytics queries
- **Zero egress fees** - Query from any cloud/region without data transfer costs
- **Open beta** - Free during beta (standard R2 storage costs apply)

### What is Apache Iceberg?

Open table format for large-scale analytics datasets in object storage:
- **ACID transactions** - Safe concurrent reads/writes
- **Metadata optimization** - Fast queries without full table scans
- **Schema evolution** - Add/rename/drop columns without rewrites
- **Partitioning** - Organize data for efficient pruning

## When to Use

**Use R2 SQL for:**
- **Log analytics** - Query application/system logs with WHERE filters and aggregations
- **BI dashboards** - Generate reports from large analytical datasets
- **Fraud detection** - Analyze transaction patterns with GROUP BY/HAVING
- **Multi-cloud analytics** - Query data from any cloud without egress fees
- **Ad-hoc exploration** - Run SQL queries on Iceberg tables via Wrangler CLI

**Don't use R2 SQL for:**
- **Workers/Pages runtime** - R2 SQL has no Workers binding; use the HTTP API from external systems instead
- **Real-time queries (<100ms)** - Optimized for analytical batch queries, not OLTP
- **Complex joins/CTEs** - Limited SQL feature set (currently no JOINs, subqueries, or CTEs)
- **Small datasets (<1GB)** - Setup overhead not justified

## Decision Tree: Need to Query R2 Data?

```
Do you need to query structured data in R2?
├─ YES, data is in Iceberg tables
│  ├─ Need SQL interface? → Use R2 SQL (this reference)
│  ├─ Need Python API? → See r2-data-catalog reference (PyIceberg)
│  └─ Need other engine? → See r2-data-catalog reference (Spark, Trino, etc.)
│
├─ YES, but not in Iceberg format
│  ├─ Streaming data? → Use Pipelines to write to Data Catalog, then R2 SQL
│  └─ Static files? → Use PyIceberg to create Iceberg tables, then R2 SQL
│
└─ NO, just need object storage → Use R2 reference (not R2 SQL)
```
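
The same tree can be written as a small lookup function, which is handy in tooling that routes requests to the right reference. This is a minimal sketch: the function name, parameters, and return strings are hypothetical and only restate the branches above.

```python
def choose_r2_query_path(structured: bool, iceberg: bool = False,
                         interface: str = "sql", streaming: bool = False) -> str:
    """Restate the decision tree; every name here is illustrative."""
    if not structured:
        return "R2 reference (object storage, not R2 SQL)"
    if iceberg:
        if interface == "sql":
            return "R2 SQL"
        if interface == "python":
            return "r2-data-catalog reference (PyIceberg)"
        return "r2-data-catalog reference (Spark, Trino, etc.)"
    if streaming:
        return "Pipelines to Data Catalog, then R2 SQL"
    return "PyIceberg to create Iceberg tables, then R2 SQL"


print(choose_r2_query_path(structured=True, iceberg=True))  # R2 SQL
```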

## Architecture Overview

**Query Planner:**
- Top-down metadata investigation with multi-layer pruning
- Partition-level, column-level, and row-group pruning
- Streaming pipeline - execution starts before planning completes
- Early termination with LIMIT - stops when result complete
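
To make the planner behaviour concrete, here is a toy model of metadata pruning combined with early termination. It is a sketch under stated assumptions, not R2 SQL's implementation: `DataFile`, `prune`, and `scan` are hypothetical names, and a single min/max statistic per file stands in for the multi-layer metadata described above.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class DataFile:
    """Hypothetical per-file metadata: min/max stats for one column."""
    path: str
    min_ts: int
    max_ts: int
    rows: List[int]  # the column values stored in the file


def prune(files: List[DataFile], lo: int, hi: int) -> Iterator[DataFile]:
    """Yield only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    for f in files:
        if f.max_ts >= lo and f.min_ts <= hi:
            yield f


def scan(files: List[DataFile], lo: int, hi: int, limit: int) -> Tuple[List[int], int]:
    """Return (matching rows, files opened), stopping as soon as LIMIT is met."""
    out: List[int] = []
    opened = 0
    # prune() is a generator, so reading starts before pruning has finished.
    for f in prune(files, lo, hi):
        opened += 1
        for ts in f.rows:
            if lo <= ts <= hi:
                out.append(ts)
                if len(out) == limit:
                    return out, opened
    return out, opened


files = [
    DataFile("a.parquet", 0, 99, list(range(0, 100))),
    DataFile("b.parquet", 100, 199, list(range(100, 200))),
    DataFile("c.parquet", 200, 299, list(range(200, 300))),
]
rows, opened = scan(files, lo=150, hi=250, limit=5)
print(rows, opened)  # [150, 151, 152, 153, 154] 1
```

Only `b.parquet` is opened: `a.parquet` is skipped from its statistics alone, and `c.parquet` is never read because the LIMIT is already satisfied.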

**Query Execution:**
- Coordinator distributes work to workers across Cloudflare network
- Workers run Apache DataFusion for parallel query execution
- Parquet column pruning - reads only required columns
- Ranged reads from R2 for efficiency

**Aggregation Strategies:**
- Scatter-gather - simple aggregations (SUM, COUNT, AVG)
- Shuffling - ORDER BY/HAVING on aggregates via hash partitioning
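
The two aggregation strategies can be illustrated with a single-process simulation. This is an illustrative sketch, not Cloudflare's code: the function names are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash the engine actually uses.

```python
import heapq
import zlib
from collections import defaultdict


def partial_agg(values):
    """Worker side: reduce one shard to a mergeable (sum, count) pair."""
    return sum(values), len(values)


def gather(partials):
    """Coordinator side: merge partials; AVG comes from merged SUM and COUNT."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return {"sum": total, "count": count, "avg": total / count if count else None}


def shuffle_group_sum(rows, n_workers, having_min):
    """GROUP BY key HAVING SUM(value) >= having_min ORDER BY SUM(value) DESC."""
    buckets = [defaultdict(float) for _ in range(n_workers)]
    for key, value in rows:
        # Hash partitioning: a given key always lands on the same worker,
        # so that worker sees the complete group and can apply HAVING locally.
        buckets[zlib.crc32(key.encode()) % n_workers][key] += value
    per_worker = [
        sorted(((total, key) for key, total in b.items() if total >= having_min),
               reverse=True)
        for b in buckets
    ]
    # The coordinator only merges runs that are already sorted.
    return [(key, total) for total, key in heapq.merge(*per_worker, reverse=True)]


print(gather([partial_agg([1, 2, 3]), partial_agg([4, 5])]))
# {'sum': 15, 'count': 5, 'avg': 3.0}
rows = [("a", 1), ("b", 5), ("a", 4), ("c", 2), ("b", 1)]
print(shuffle_group_sum(rows, n_workers=3, having_min=3))
# [('b', 6.0), ('a', 5.0)]
```

Scatter-gather works for SUM, COUNT, and AVG because their partial results merge associatively. A HAVING or ORDER BY on an aggregate needs each group's final value, which is why rows are first repartitioned by key.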

## Quick Start

```bash
# 1. Enable R2 Data Catalog on bucket
npx wrangler r2 bucket catalog enable my-bucket

# 2. Create API token (Admin Read & Write)
# Dashboard: R2 → Manage API tokens → Create API token

# 3. Set environment variable (keep the quotes; replace the placeholder with your token)
export WRANGLER_R2_SQL_AUTH_TOKEN="<your-token>"

# 4. Run query
npx wrangler r2 sql query "my-bucket" "SELECT * FROM default.my_table LIMIT 10"
```
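
For scripted use from an external system, the same Wrangler invocation can be wrapped in Python. This is a minimal sketch that assumes Node.js and `npx` are on the PATH. The helper names are hypothetical, while the command and the `WRANGLER_R2_SQL_AUTH_TOKEN` variable are taken from the steps above.

```python
import os
import subprocess
from typing import List


def build_query_command(bucket: str, sql: str) -> List[str]:
    """Build the argv for the Quick Start's `wrangler r2 sql query` call."""
    return ["npx", "wrangler", "r2", "sql", "query", bucket, sql]


def run_query(bucket: str, sql: str, token: str) -> str:
    """Run the query through Wrangler and return its stdout.

    Raises subprocess.CalledProcessError if Wrangler exits with an error.
    """
    env = {**os.environ, "WRANGLER_R2_SQL_AUTH_TOKEN": token}
    result = subprocess.run(
        build_query_command(bucket, sql),
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


print(build_query_command("my-bucket", "SELECT * FROM default.my_table LIMIT 10"))
```

Passing the arguments as a list avoids shell quoting problems when the SQL text itself contains quotes.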

## Important Limitations

**CRITICAL: No Workers Binding**
- R2 SQL cannot be called directly from Workers/Pages code
- For programmatic access, use HTTP API from external systems
- Or query via PyIceberg, Spark, etc. (see r2-data-catalog reference)

**SQL Feature Set:**
- No JOINs, CTEs, subqueries, window functions
- ORDER BY supports aggregation columns (not just partition keys)
- LIMIT max 10,000 (default 500)
- See [gotchas.md](gotchas.md) for complete limitations
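
A client can screen queries against these limits before sending them. The helper below is a naive sketch: the constants come from the list above, but the function names and the regex-based keyword screen are assumptions for illustration. It is not a SQL parser and will produce false positives, for example on keywords inside string literals.

```python
import re
from typing import List, Optional

MAX_LIMIT = 10_000    # "LIMIT max 10,000" from the list above
DEFAULT_LIMIT = 500   # "default 500" from the list above

# Naive keyword screen (an assumption, not how R2 SQL validates queries).
UNSUPPORTED = {
    "JOIN": re.compile(r"\bJOIN\b", re.IGNORECASE),
    "CTE": re.compile(r"^\s*WITH\b", re.IGNORECASE),
    "subquery": re.compile(r"\(\s*SELECT\b", re.IGNORECASE),
    "window function": re.compile(r"\bOVER\s*\(", re.IGNORECASE),
}


def unsupported_features(sql: str) -> List[str]:
    """Return the names of unsupported constructs the query appears to use."""
    return [name for name, pattern in UNSUPPORTED.items() if pattern.search(sql)]


def effective_limit(requested: Optional[int]) -> int:
    """Return the LIMIT that would apply, rejecting out-of-range values client-side."""
    if requested is None:
        return DEFAULT_LIMIT
    if not 1 <= requested <= MAX_LIMIT:
        raise ValueError(f"LIMIT must be between 1 and {MAX_LIMIT}, got {requested}")
    return requested


print(unsupported_features("SELECT a FROM t JOIN u ON t.id = u.id"))  # ['JOIN']
print(effective_limit(None))  # 500
```

Raising on an out-of-range LIMIT is a design choice of this sketch. See [gotchas.md](gotchas.md) for how the service itself handles such queries.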

## In This Reference

- **[configuration.md](configuration.md)** - Enable catalog, create API tokens
- **[api.md](api.md)** - SQL syntax, functions, operators, data types
- **[patterns.md](patterns.md)** - Wrangler CLI, HTTP API, Pipelines, PyIceberg
- **[gotchas.md](gotchas.md)** - Limitations, troubleshooting, performance tips

## See Also

- [r2-data-catalog](../r2-data-catalog/) - PyIceberg, REST API, external engines
- [pipelines](../pipelines/) - Streaming ingestion to Iceberg tables
- [r2](../r2/) - R2 object storage fundamentals
- [Cloudflare R2 SQL Docs](https://developers.cloudflare.com/r2-sql/)
- [R2 SQL Deep Dive Blog](https://blog.cloudflare.com/r2-sql-deep-dive/)