Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-sql/patterns.md
# R2 SQL Patterns

Common patterns, use cases, and integration examples for R2 SQL.

## Wrangler CLI Query

```bash
# Basic query
npx wrangler r2 sql query "my-bucket" "SELECT * FROM default.logs LIMIT 10"

# Multi-line query
npx wrangler r2 sql query "my-bucket" "
SELECT status, COUNT(*), AVG(response_time)
FROM logs.http_requests
WHERE timestamp >= '2025-01-01T00:00:00Z'
GROUP BY status
ORDER BY COUNT(*) DESC
LIMIT 100
"

# Use an environment variable
export R2_SQL_WAREHOUSE="my-bucket"
npx wrangler r2 sql query "$R2_SQL_WAREHOUSE" "SELECT * FROM default.logs"
```

## HTTP API Query

For programmatic access from external systems (not Workers - see gotchas.md).

```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/{account_id}/r2/sql/query \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse": "my-bucket",
    "query": "SELECT * FROM default.my_table WHERE status = 200 LIMIT 100"
  }'
```

Response:
```json
{
  "success": true,
  "result": [{"user_id": "user_123", "timestamp": "2025-01-15T10:30:00Z", "status": 200}],
  "errors": []
}
```

## Pipelines Integration

Stream data to Iceberg tables via Pipelines, then query with R2 SQL.

```bash
# Set up a pipeline (select the Data Catalog Table destination)
npx wrangler pipelines setup

# Key settings:
# - Destination: Data Catalog Table
# - Compression: zstd (recommended)
# - Roll file time: 300+ sec (production), 10 sec (dev)

# Send data to the pipeline
curl -X POST https://{stream-id}.ingest.cloudflare.com \
  -H "Content-Type: application/json" \
  -d '[{"user_id": "user_123", "event_type": "purchase", "timestamp": "2025-01-15T10:30:00Z", "amount": 29.99}]'

# Query the ingested data (wait for the roll interval to elapse)
npx wrangler r2 sql query "my-bucket" "
SELECT event_type, COUNT(*), SUM(amount)
FROM default.events
WHERE timestamp >= '2025-01-15T00:00:00Z'
GROUP BY event_type
"
```
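The ingestion step above can also be scripted rather than run through curl. A minimal stdlib-only Python sketch that validates a batch of events and posts it to the ingest endpoint; the `build_batch`/`send_batch` helpers, the required-field set, and the stream URL are illustrative assumptions, not part of the Pipelines API:

```python
import json
import urllib.request

# Assumed event schema for this sketch; adjust to your pipeline's fields.
REQUIRED_FIELDS = {"user_id", "event_type", "timestamp"}

def build_batch(events):
    """Validate events and serialize them as the JSON array the ingest endpoint expects."""
    for event in events:
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            raise ValueError(f"event missing fields: {sorted(missing)}")
    return json.dumps(events).encode("utf-8")

def send_batch(stream_url, events):
    """POST a batch of events to the Pipelines ingest endpoint."""
    request = urllib.request.Request(
        stream_url,
        data=build_batch(events),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Usage (hypothetical stream ID):
# send_batch(
#     "https://0123abcd.ingest.cloudflare.com",
#     [{"user_id": "user_123", "event_type": "purchase",
#       "timestamp": "2025-01-15T10:30:00Z", "amount": 29.99}],
# )
```

Batching several events per request keeps you under per-request overhead; the validation step catches malformed events before they reach the pipeline.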
See [pipelines/patterns.md](../pipelines/patterns.md) for detailed setup.

## PyIceberg Integration

Create and populate Iceberg tables with PyIceberg, then query with R2 SQL.

```python
from pyiceberg.catalog.rest import RestCatalog
import pyarrow as pa
import pandas as pd

# Set up the catalog
catalog = RestCatalog(
    name="my_catalog",
    warehouse="my-bucket",
    uri="https://<account-id>.r2.cloudflarestorage.com/iceberg/my-bucket",
    token="<your-token>",
)
catalog.create_namespace_if_not_exists("analytics")

# Create a table
schema = pa.schema([
    pa.field("user_id", pa.string(), nullable=False),
    pa.field("event_time", pa.timestamp("us", tz="UTC"), nullable=False),
    pa.field("page_views", pa.int64(), nullable=False),
])
table = catalog.create_table(("analytics", "user_metrics"), schema=schema)

# Append data
df = pd.DataFrame({
    "user_id": ["user_1", "user_2"],
    "event_time": pd.to_datetime(["2025-01-15 10:00:00", "2025-01-15 11:00:00"], utc=True),
    "page_views": [10, 25],
})
table.append(pa.Table.from_pandas(df, schema=schema))
```

Query with R2 SQL:
```bash
npx wrangler r2 sql query "my-bucket" "
SELECT user_id, SUM(page_views)
FROM analytics.user_metrics
WHERE event_time >= '2025-01-15T00:00:00Z'
GROUP BY user_id
"
```

See [r2-data-catalog/patterns.md](../r2-data-catalog/patterns.md) for advanced PyIceberg patterns.

## Use Cases

### Log Analytics
```sql
-- Error rate by endpoint
SELECT path, COUNT(*), SUM(CASE WHEN status >= 400 THEN 1 ELSE 0 END) AS errors
FROM logs.http_requests
WHERE timestamp BETWEEN '2025-01-01T00:00:00Z' AND '2025-01-31T23:59:59Z'
GROUP BY path ORDER BY errors DESC LIMIT 20;

-- Response time stats
SELECT method, MIN(response_time_ms), AVG(response_time_ms), MAX(response_time_ms)
FROM logs.http_requests WHERE timestamp >= '2025-01-15T00:00:00Z' GROUP BY method;

-- Traffic by status
SELECT status, COUNT(*) FROM logs.http_requests
WHERE timestamp >= 
'2025-01-15T00:00:00Z' AND method = 'GET'
GROUP BY status ORDER BY COUNT(*) DESC;
```

### Fraud Detection
```sql
-- High-value transactions
SELECT location, COUNT(*), SUM(amount), AVG(amount)
FROM fraud.transactions
WHERE transaction_timestamp >= '2025-01-01T00:00:00Z' AND amount > 1000.0
GROUP BY location ORDER BY SUM(amount) DESC LIMIT 20;

-- Flagged transactions
SELECT merchant_category, COUNT(*), AVG(amount) FROM fraud.transactions
WHERE is_fraud_flag = true AND transaction_timestamp >= '2025-01-01T00:00:00Z'
GROUP BY merchant_category HAVING COUNT(*) > 10 ORDER BY COUNT(*) DESC;
```

### Business Intelligence
```sql
-- Sales by department
SELECT department, SUM(revenue), AVG(revenue), COUNT(*) FROM sales.transactions
WHERE sale_date >= '2024-01-01' GROUP BY department ORDER BY SUM(revenue) DESC LIMIT 10;

-- Product performance
SELECT category, COUNT(DISTINCT product_id), SUM(units_sold), SUM(revenue)
FROM sales.product_sales WHERE sale_date BETWEEN '2024-10-01' AND '2024-12-31'
GROUP BY category ORDER BY SUM(revenue) DESC;
```
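Any of these use-case queries can be run from external systems through the HTTP API shown earlier. A minimal stdlib-only Python sketch; the `extract_rows` helper and the placeholder credentials are illustrative, while the endpoint URL, request body, and response envelope (`success`/`result`/`errors`) follow the HTTP API Query section above:

```python
import json
import urllib.request

API_TEMPLATE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/r2/sql/query"

def extract_rows(payload):
    """Unwrap the API response envelope, raising if the query failed."""
    if not payload.get("success"):
        raise RuntimeError(f"query failed: {payload.get('errors')}")
    return payload["result"]

def r2_sql_query(account_id, token, warehouse, query):
    """Run an R2 SQL query through the HTTP API and return the result rows."""
    body = json.dumps({"warehouse": warehouse, "query": query}).encode("utf-8")
    request = urllib.request.Request(
        API_TEMPLATE.format(account_id=account_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_rows(json.load(response))

# Usage (hypothetical account ID and token):
# rows = r2_sql_query(
#     "abc123", "<your-token>", "my-bucket",
#     "SELECT status, COUNT(*) FROM logs.http_requests "
#     "WHERE timestamp >= '2025-01-15T00:00:00Z' GROUP BY status",
# )
```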
## Connecting External Engines

R2 Data Catalog exposes an Iceberg REST API. Connect Spark, Snowflake, Trino, DuckDB, etc.

```scala
// Apache Spark example
val spark = SparkSession.builder()
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.rest.RESTCatalog")
  .config("spark.sql.catalog.my_catalog.uri", "https://<account-id>.r2.cloudflarestorage.com/iceberg/my-bucket")
  .config("spark.sql.catalog.my_catalog.token", "<token>")
  .getOrCreate()

spark.sql("SELECT * FROM my_catalog.default.my_table LIMIT 10").show()
```

See [r2-data-catalog/patterns.md](../r2-data-catalog/patterns.md) for more engines.

## Performance Optimization

### Partitioning
- **Time-series:** day/hour on timestamp
- **Geographic:** region/country
- **Avoid:** high-cardinality keys (e.g. user_id)

```python
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform

PartitionSpec(PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="day"))
```

### Query Optimization
- **Always use LIMIT** for early termination
- **Filter on partition keys first**
- **Combine multiple filters** for better pruning

```sql
-- Better: multiple filters on the partition key
SELECT * FROM logs.requests
WHERE timestamp >= '2025-01-15T00:00:00Z' AND status = 404 AND method = 'GET' LIMIT 100;
```

### File Organization
- **Pipelines roll interval:** dev 10-30 sec, production 300+ sec
- **Target Parquet file size:** 100-500 MB compressed

## See Also

- [api.md](api.md) - SQL syntax reference
- [gotchas.md](gotchas.md) - Limitations and troubleshooting
- [r2-data-catalog/patterns.md](../r2-data-catalog/patterns.md) - PyIceberg advanced patterns
- [pipelines/patterns.md](../pipelines/patterns.md) - Streaming ingestion patterns
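The two file-organization targets above interact: the roll interval needed to produce files of a given size depends on your ingest rate. A back-of-the-envelope helper; the function and its steady-compressed-ingest-rate assumption are mine, not part of R2 SQL or Pipelines:

```python
def roll_interval_seconds(target_file_mb, ingest_mb_per_sec,
                          min_seconds=10, max_seconds=3600):
    """Estimate the Pipelines roll interval that yields roughly target_file_mb
    per rolled file, assuming a steady compressed ingest rate (MB/s)."""
    if ingest_mb_per_sec <= 0:
        raise ValueError("ingest rate must be positive")
    seconds = target_file_mb / ingest_mb_per_sec
    # Clamp to a sane range: very short rolls create many tiny files,
    # very long rolls delay query visibility.
    return max(min_seconds, min(max_seconds, round(seconds)))

# e.g. targeting 250 MB files at ~0.5 MB/s compressed ingest:
# roll_interval_seconds(250, 0.5) -> 500 seconds, in line with the
# 300+ sec production guidance above.
```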