Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-sql/gotchas.md
# R2 SQL Gotchas

Limitations, troubleshooting, and common pitfalls for R2 SQL.

## Critical Limitations

### No Workers Binding

**Cannot call R2 SQL from Workers/Pages code** - no binding exists.

```typescript
// ❌ This doesn't exist
export default {
  async fetch(request, env) {
    const result = await env.R2_SQL.query("SELECT * FROM table"); // Not possible
    return Response.json(result);
  }
};
```

**Solutions:**
- HTTP API from external systems (not Workers)
- PyIceberg/Spark via r2-data-catalog REST API
- For Workers, use D1 or external databases

### ORDER BY Limitations

Can only order by:
1. **Partition key columns** - Always supported
2. **Aggregation functions** - Supported via shuffle strategy

**Cannot order by** regular non-partition columns.

```sql
-- ✅ Valid: ORDER BY partition key
SELECT * FROM logs.requests ORDER BY timestamp DESC LIMIT 100;

-- ✅ Valid: ORDER BY aggregation
SELECT region, SUM(amount) FROM sales.transactions
GROUP BY region ORDER BY SUM(amount) DESC;

-- ❌ Invalid: ORDER BY non-partition column
SELECT * FROM logs.requests ORDER BY user_id;

-- ❌ Invalid: ORDER BY alias (must repeat function)
SELECT region, SUM(amount) as total FROM sales.transactions
GROUP BY region ORDER BY total; -- Use ORDER BY SUM(amount)
```

Check partition spec: `DESCRIBE namespace.table_name`

## SQL Feature Limitations

| Feature | Supported | Notes |
|---------|-----------|-------|
| SELECT, WHERE, GROUP BY, HAVING | ✅ | Standard support |
| COUNT, SUM, AVG, MIN, MAX | ✅ | Standard aggregations |
| ORDER BY partition/aggregation | ✅ | See above |
| LIMIT | ✅ | Max 10,000 |
| Column aliases | ❌ | No AS alias |
| Expressions in SELECT | ❌ | No col1 + col2 |
| ORDER BY non-partition | ❌ | Fails at runtime |
| JOINs, subqueries, CTEs | ❌ | Denormalize at write time |
| Window functions, UNION | ❌ | Use external engines |
| INSERT/UPDATE/DELETE | ❌ | Use PyIceberg/Pipelines |
| Nested columns, arrays, JSON | ❌ | Flatten at write time |
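Because column aliases and expressions in SELECT are rejected, any renaming or computed columns must be applied to the returned rows in application code. A minimal sketch in Python; `transform_rows` is a hypothetical helper written for illustration, not part of any Cloudflare SDK:

```python
# Emulate "SELECT ... AS alias" and computed columns client-side,
# since R2 SQL supports neither in the query itself.

def transform_rows(rows, rename=None, computed=None):
    """Apply column renames and computed columns to query results.

    rows     -- list of dicts, one per result row
    rename   -- mapping of {original_column: new_name}
    computed -- mapping of {new_column: fn(row)} for derived values
    """
    rename = rename or {}
    computed = computed or {}
    out = []
    for row in rows:
        new_row = {rename.get(k, k): v for k, v in row.items()}
        for name, fn in computed.items():
            new_row[name] = fn(row)  # derive from the original row
        out.append(new_row)
    return out

# Emulating: SELECT region, SUM(amount) AS total ... (alias applied in-app)
rows = [{"region": "us-east", "SUM(amount)": 100}]
renamed = transform_rows(rows, rename={"SUM(amount)": "total"})
```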
**Workarounds:**
- No JOINs: Denormalize data or use Spark/PyIceberg
- No subqueries: Split into multiple queries
- No aliases: Accept generated names, transform in app

## Common Errors

### "Column not found"
**Cause:** Typo, column doesn't exist, or case mismatch
**Solution:** `DESCRIBE namespace.table_name` to check schema

### "Type mismatch"
```sql
-- ❌ Wrong types
WHERE status = '200' -- string instead of integer
WHERE timestamp > '2025-01-01' -- missing time/timezone

-- ✅ Correct types
WHERE status = 200
WHERE timestamp > '2025-01-01T00:00:00Z'
```

### "ORDER BY column not in partition key"
**Cause:** Ordering by non-partition column
**Solution:** Use partition key, aggregation, or remove ORDER BY. Check: `DESCRIBE table`

### "Token authentication failed"
```bash
# Check/set token
echo $WRANGLER_R2_SQL_AUTH_TOKEN
export WRANGLER_R2_SQL_AUTH_TOKEN=<your-token>

# Or .env file
echo "WRANGLER_R2_SQL_AUTH_TOKEN=<your-token>" > .env
```

### "Table not found"
```sql
-- Verify catalog and tables
SHOW DATABASES;
SHOW TABLES IN namespace_name;
```

Enable catalog: `npx wrangler r2 bucket catalog enable <bucket>`

### "LIMIT exceeds maximum"
Max LIMIT is 10,000. For pagination, use WHERE filters with partition keys.

### "No data returned" (unexpected)
**Debug steps:**
1. `SELECT COUNT(*) FROM table` - verify data exists
2. Remove WHERE filters incrementally
3. `SELECT * FROM table LIMIT 10` - inspect actual data/types
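The pagination advice above (bounded WHERE filters on the partition key instead of one oversized LIMIT) can be sketched as a query generator. The table and column names follow the `logs.requests` examples used in this document; the helper itself is illustrative, not a Cloudflare API:

```python
# Split a time range into per-day windows and emit one bounded query per
# window, so each query stays under the 10,000-row LIMIT cap and prunes
# to a single day's partitions.
from datetime import datetime, timedelta

def daily_queries(table, start, end, limit=1000):
    """Yield one SELECT per day in [start, end), bounded by RFC3339 UTC timestamps."""
    day = start
    while day < end:
        nxt = min(day + timedelta(days=1), end)
        yield (
            f"SELECT * FROM {table} "
            f"WHERE timestamp >= '{day:%Y-%m-%dT%H:%M:%SZ}' "
            f"AND timestamp < '{nxt:%Y-%m-%dT%H:%M:%SZ}' "
            f"LIMIT {limit};"
        )
        day = nxt

# Three day-sized queries covering 2025-01-15 through 2025-01-17
queries = list(daily_queries("logs.requests",
                             datetime(2025, 1, 15),
                             datetime(2025, 1, 18)))
```

Each emitted query filters on the partition key first, so it benefits from partition pruning as well as staying within the LIMIT maximum.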
## Performance Issues

### Slow Queries

**Causes:** Too many partitions, large LIMIT, no filters, small files

```sql
-- ❌ Slow: No filters
SELECT * FROM logs.requests LIMIT 10000;

-- ✅ Fast: Filter on partition key
SELECT * FROM logs.requests
WHERE timestamp >= '2025-01-15T00:00:00Z' AND timestamp < '2025-01-16T00:00:00Z'
LIMIT 1000;

-- ✅ Faster: Multiple filters
SELECT * FROM logs.requests
WHERE timestamp >= '2025-01-15T00:00:00Z' AND status = 404 AND method = 'GET'
LIMIT 1000;
```

**File optimization:**
- Target Parquet size: 100-500MB compressed
- Pipelines roll interval: 300+ sec (prod), 10 sec (dev)
- Run compaction to merge small files

### Query Timeout

**Solution:** Add restrictive WHERE filters, reduce time range, query smaller intervals

```sql
-- ❌ Times out: Year-long aggregation
SELECT status, COUNT(*) FROM logs.requests
WHERE timestamp >= '2024-01-01T00:00:00Z' GROUP BY status;

-- ✅ Faster: Month-long aggregation
SELECT status, COUNT(*) FROM logs.requests
WHERE timestamp >= '2025-01-01T00:00:00Z' AND timestamp < '2025-02-01T00:00:00Z'
GROUP BY status;
```

## Best Practices

### Partitioning
- **Time-series:** Partition by day/hour on timestamp
- **Avoid:** High-cardinality keys (user_id), >10,000 partitions

```python
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform

PartitionSpec(PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="day"))
```

### Query Writing
- **Always use LIMIT** for early termination
- **Filter on partition keys first** for pruning
- **Combine filters with AND** for more pruning

```sql
-- Good
WHERE timestamp >= '2025-01-15T00:00:00Z' AND status = 404 AND method = 'GET' LIMIT 100
```

### Type Safety
- Quote strings: `'GET'` not `GET`
- RFC3339 timestamps: `'2025-01-01T00:00:00Z'` not `'2025-01-01'`
- ISO dates: `'2025-01-15'` not `'01/15/2025'`

### Data Organization
- **Pipelines:** Dev `roll_file_time: 10`, Prod `roll_file_time: 300+`
- **Compression:** Use `zstd`
- **Maintenance:** Compaction for small files, expire old snapshots

## Debugging Checklist

1. `npx wrangler r2 bucket catalog enable <bucket>` - Verify catalog
2. `echo $WRANGLER_R2_SQL_AUTH_TOKEN` - Check token
3. `SHOW DATABASES` - List namespaces
4. `SHOW TABLES IN namespace` - List tables
5. `DESCRIBE namespace.table` - Check schema
6. `SELECT COUNT(*) FROM namespace.table` - Verify data
7. `SELECT * FROM namespace.table LIMIT 10` - Test simple query
8. Add filters incrementally

## See Also

- [api.md](api.md) - SQL syntax
- [patterns.md](patterns.md) - Query optimization
- [configuration.md](configuration.md) - Setup
- [Cloudflare R2 SQL Docs](https://developers.cloudflare.com/r2-sql/)