Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
references/r2-sql/README.md
1# Cloudflare R2 SQL Skill Reference23Expert guidance for Cloudflare R2 SQL - serverless distributed query engine for Apache Iceberg tables.45## Reading Order67**New to R2 SQL?** Start here:81. Read "What is R2 SQL?" and "When to Use" below92. [configuration.md](configuration.md) - Enable catalog, create tokens103. [patterns.md](patterns.md) - Wrangler CLI and integration examples114. [api.md](api.md) - SQL syntax and query reference125. [gotchas.md](gotchas.md) - Limitations and troubleshooting1314**Quick reference?** Jump to:15- [Run a query via Wrangler](patterns.md#wrangler-cli-query)16- [SQL syntax reference](api.md#sql-syntax)17- [ORDER BY limitations](gotchas.md#order-by-limitations)1819## What is R2 SQL?2021R2 SQL is Cloudflare's **serverless distributed analytics query engine** for querying Apache Iceberg tables in R2 Data Catalog. Features:2223- **Serverless** - No clusters to manage, no infrastructure24- **Distributed** - Leverages Cloudflare's global network for parallel execution25- **SQL interface** - Familiar SQL syntax for analytics queries26- **Zero egress fees** - Query from any cloud/region without data transfer costs27- **Open beta** - Free during beta (standard R2 storage costs apply)2829### What is Apache Iceberg?3031Open table format for large-scale analytics datasets in object storage:32- **ACID transactions** - Safe concurrent reads/writes33- **Metadata optimization** - Fast queries without full table scans34- **Schema evolution** - Add/rename/drop columns without rewrites35- **Partitioning** - Organize data for efficient pruning3637## When to Use3839**Use R2 SQL for:**40- **Log analytics** - Query application/system logs with WHERE filters and aggregations41- **BI dashboards** - Generate reports from large analytical datasets42- **Fraud detection** - Analyze transaction patterns with GROUP BY/HAVING43- **Multi-cloud analytics** - Query data from any cloud without egress fees44- **Ad-hoc exploration** - Run SQL queries on Iceberg tables via Wrangler CLI4546**Don't use R2 SQL for:**47- **Workers/Pages runtime** - R2 SQL has no Workers binding, use HTTP API from external systems48- **Real-time queries (<100ms)** - Optimized for analytical batch queries, not OLTP49- **Complex joins/CTEs** - Limited SQL feature set (no JOINs, subqueries, CTEs currently)50- **Small datasets (<1GB)** - Setup overhead not justified5152## Decision Tree: Need to Query R2 Data?5354```55Do you need to query structured data in R2?56├─ YES, data is in Iceberg tables57│ ├─ Need SQL interface? → Use R2 SQL (this reference)58│ ├─ Need Python API? → See r2-data-catalog reference (PyIceberg)59│ └─ Need other engine? → See r2-data-catalog reference (Spark, Trino, etc.)60│61├─ YES, but not in Iceberg format62│ ├─ Streaming data? → Use Pipelines to write to Data Catalog, then R2 SQL63│ └─ Static files? → Use PyIceberg to create Iceberg tables, then R2 SQL64│65└─ NO, just need object storage → Use R2 reference (not R2 SQL)66```6768## Architecture Overview6970**Query Planner:**71- Top-down metadata investigation with multi-layer pruning72- Partition-level, column-level, and row-group pruning73- Streaming pipeline - execution starts before planning completes74- Early termination with LIMIT - stops when result complete7576**Query Execution:**77- Coordinator distributes work to workers across Cloudflare network78- Workers run Apache DataFusion for parallel query execution79- Parquet column pruning - reads only required columns80- Ranged reads from R2 for efficiency8182**Aggregation Strategies:**83- Scatter-gather - simple aggregations (SUM, COUNT, AVG)84- Shuffling - ORDER BY/HAVING on aggregates via hash partitioning8586## Quick Start8788```bash89# 1. Enable R2 Data Catalog on bucket90npx wrangler r2 bucket catalog enable my-bucket9192# 2. Create API token (Admin Read & Write)93# Dashboard: R2 → Manage API tokens → Create API token9495# 3. Set environment variable96export WRANGLER_R2_SQL_AUTH_TOKEN=<your-token>9798# 4. Run query99npx wrangler r2 sql query "my-bucket" "SELECT * FROM default.my_table LIMIT 10"100```101102## Important Limitations103104**CRITICAL: No Workers Binding**105- R2 SQL cannot be called directly from Workers/Pages code106- For programmatic access, use HTTP API from external systems107- Or query via PyIceberg, Spark, etc. (see r2-data-catalog reference)108109**SQL Feature Set:**110- No JOINs, CTEs, subqueries, window functions111- ORDER BY supports aggregation columns (not just partition keys)112- LIMIT max 10,000 (default 500)113- See [gotchas.md](gotchas.md) for complete limitations114115## In This Reference116117- **[configuration.md](configuration.md)** - Enable catalog, create API tokens118- **[api.md](api.md)** - SQL syntax, functions, operators, data types119- **[patterns.md](patterns.md)** - Wrangler CLI, HTTP API, Pipelines, PyIceberg120- **[gotchas.md](gotchas.md)** - Limitations, troubleshooting, performance tips121122## See Also123124- [r2-data-catalog](../r2-data-catalog/) - PyIceberg, REST API, external engines125- [pipelines](../pipelines/) - Streaming ingestion to Iceberg tables126- [r2](../r2/) - R2 object storage fundamentals127- [Cloudflare R2 SQL Docs](https://developers.cloudflare.com/r2-sql/)128- [R2 SQL Deep Dive Blog](https://blog.cloudflare.com/r2-sql-deep-dive/)129