Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-sql/gotchas.md
# R2 SQL Gotchas

Limitations, troubleshooting, and common pitfalls for R2 SQL.

## Critical Limitations

### No Workers Binding

**Cannot call R2 SQL from Workers/Pages code** - no binding exists.

```typescript
// ❌ This doesn't exist
export default {
  async fetch(request, env) {
    const result = await env.R2_SQL.query("SELECT * FROM table"); // Not possible
    return Response.json(result);
  }
};
```

**Solutions:**
- HTTP API from external systems (not Workers)
- PyIceberg/Spark via r2-data-catalog REST API
- For Workers, use D1 or external databases

### ORDER BY Limitations

Can only order by:
1. **Partition key columns** - Always supported
2. **Aggregation functions** - Supported via shuffle strategy

**Cannot order by** regular non-partition columns.

```sql
-- ✅ Valid: ORDER BY partition key
SELECT * FROM logs.requests ORDER BY timestamp DESC LIMIT 100;

-- ✅ Valid: ORDER BY aggregation
SELECT region, SUM(amount) FROM sales.transactions
GROUP BY region ORDER BY SUM(amount) DESC;

-- ❌ Invalid: ORDER BY non-partition column
SELECT * FROM logs.requests ORDER BY user_id;

-- ❌ Invalid: ORDER BY alias (must repeat function)
SELECT region, SUM(amount) as total FROM sales.transactions
GROUP BY region ORDER BY total; -- Use ORDER BY SUM(amount)
```

Check partition spec: `DESCRIBE namespace.table_name`

## SQL Feature Limitations

| Feature | Supported | Notes |
|---------|-----------|-------|
| SELECT, WHERE, GROUP BY, HAVING | ✅ | Standard support |
| COUNT, SUM, AVG, MIN, MAX | ✅ | Standard aggregations |
| ORDER BY partition/aggregation | ✅ | See above |
| LIMIT | ✅ | Max 10,000 |
| Column aliases | ❌ | No AS alias |
| Expressions in SELECT | ❌ | No col1 + col2 |
| ORDER BY non-partition | ❌ | Fails at runtime |
| JOINs, subqueries, CTEs | ❌ | Denormalize at write time |
| Window functions, UNION | ❌ | Use external engines |
| INSERT/UPDATE/DELETE | ❌ | Use PyIceberg/Pipelines |
| Nested columns, arrays, JSON | ❌ | Flatten at write time |
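Because column aliases and expressions in SELECT are rejected, any renaming or computed columns must be applied to the returned rows in application code. A minimal sketch in Python; `transform_rows` is a hypothetical helper written for illustration, not part of any Cloudflare SDK:

```python
# Emulate "SELECT ... AS alias" and computed columns client-side,
# since R2 SQL supports neither in the query itself.

def transform_rows(rows, rename=None, computed=None):
    """Apply column renames and computed columns to query results.

    rows     -- list of dicts, one per result row
    rename   -- mapping of {original_column: new_name}
    computed -- mapping of {new_column: fn(row)} for derived values
    """
    rename = rename or {}
    computed = computed or {}
    out = []
    for row in rows:
        new_row = {rename.get(k, k): v for k, v in row.items()}
        for name, fn in computed.items():
            new_row[name] = fn(row)  # derive from the original row
        out.append(new_row)
    return out

# Emulating: SELECT region, SUM(amount) AS total ... (alias applied in-app)
rows = [{"region": "us-east", "SUM(amount)": 100}]
renamed = transform_rows(rows, rename={"SUM(amount)": "total"})
```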
**Workarounds:**
- No JOINs: Denormalize data or use Spark/PyIceberg
- No subqueries: Split into multiple queries
- No aliases: Accept generated names, transform in app

## Common Errors

### "Column not found"
**Cause:** Typo, column doesn't exist, or case mismatch
**Solution:** `DESCRIBE namespace.table_name` to check schema

### "Type mismatch"
```sql
-- ❌ Wrong types
WHERE status = '200' -- string instead of integer
WHERE timestamp > '2025-01-01' -- missing time/timezone

-- ✅ Correct types
WHERE status = 200
WHERE timestamp > '2025-01-01T00:00:00Z'
```

### "ORDER BY column not in partition key"
**Cause:** Ordering by non-partition column
**Solution:** Use partition key, aggregation, or remove ORDER BY. Check: `DESCRIBE table`

### "Token authentication failed"
```bash
# Check/set token
echo $WRANGLER_R2_SQL_AUTH_TOKEN
export WRANGLER_R2_SQL_AUTH_TOKEN=<your-token>

# Or .env file
echo "WRANGLER_R2_SQL_AUTH_TOKEN=<your-token>" > .env
```

### "Table not found"
```sql
-- Verify catalog and tables
SHOW DATABASES;
SHOW TABLES IN namespace_name;
```

Enable catalog: `npx wrangler r2 bucket catalog enable <bucket>`

### "LIMIT exceeds maximum"
Max LIMIT is 10,000. For pagination, use WHERE filters with partition keys.

### "No data returned" (unexpected)
**Debug steps:**
1. `SELECT COUNT(*) FROM table` - verify data exists
2. Remove WHERE filters incrementally
3. `SELECT * FROM table LIMIT 10` - inspect actual data/types
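The pagination advice above (bounded WHERE filters on the partition key instead of one oversized LIMIT) can be sketched as a query generator. The table and column names follow the `logs.requests` examples used in this document; the helper itself is illustrative, not a Cloudflare API:

```python
# Split a time range into per-day windows and emit one bounded query per
# window, so each query stays under the 10,000-row LIMIT cap and prunes
# to a single day's partitions.
from datetime import datetime, timedelta

def daily_queries(table, start, end, limit=1000):
    """Yield one SELECT per day in [start, end), bounded by RFC3339 UTC timestamps."""
    day = start
    while day < end:
        nxt = min(day + timedelta(days=1), end)
        yield (
            f"SELECT * FROM {table} "
            f"WHERE timestamp >= '{day:%Y-%m-%dT%H:%M:%SZ}' "
            f"AND timestamp < '{nxt:%Y-%m-%dT%H:%M:%SZ}' "
            f"LIMIT {limit};"
        )
        day = nxt

# Three day-sized queries covering 2025-01-15 through 2025-01-17
queries = list(daily_queries("logs.requests",
                             datetime(2025, 1, 15),
                             datetime(2025, 1, 18)))
```

Each emitted query filters on the partition key first, so it benefits from partition pruning as well as staying within the LIMIT maximum.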
## Performance Issues

### Slow Queries

**Causes:** Too many partitions, large LIMIT, no filters, small files

```sql
-- ❌ Slow: No filters
SELECT * FROM logs.requests LIMIT 10000;

-- ✅ Fast: Filter on partition key
SELECT * FROM logs.requests
WHERE timestamp >= '2025-01-15T00:00:00Z' AND timestamp < '2025-01-16T00:00:00Z'
LIMIT 1000;

-- ✅ Faster: Multiple filters
SELECT * FROM logs.requests
WHERE timestamp >= '2025-01-15T00:00:00Z' AND status = 404 AND method = 'GET'
LIMIT 1000;
```

**File optimization:**
- Target Parquet size: 100-500MB compressed
- Pipelines roll interval: 300+ sec (prod), 10 sec (dev)
- Run compaction to merge small files

### Query Timeout

**Solution:** Add restrictive WHERE filters, reduce time range, query smaller intervals

```sql
-- ❌ Times out: Year-long aggregation
SELECT status, COUNT(*) FROM logs.requests
WHERE timestamp >= '2024-01-01T00:00:00Z' GROUP BY status;

-- ✅ Faster: Month-long aggregation
SELECT status, COUNT(*) FROM logs.requests
WHERE timestamp >= '2025-01-01T00:00:00Z' AND timestamp < '2025-02-01T00:00:00Z'
GROUP BY status;
```

## Best Practices

### Partitioning
- **Time-series:** Partition by day/hour on timestamp
- **Avoid:** High-cardinality keys (user_id), >10,000 partitions

```python
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform

PartitionSpec(PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="day"))
```

### Query Writing
- **Always use LIMIT** for early termination
- **Filter on partition keys first** for pruning
- **Combine filters with AND** for more pruning

```sql
-- Good
WHERE timestamp >= '2025-01-15T00:00:00Z' AND status = 404 AND method = 'GET' LIMIT 100
```

### Type Safety
- Quote strings: `'GET'` not `GET`
- RFC3339 timestamps: `'2025-01-01T00:00:00Z'` not `'2025-01-01'`
- ISO dates: `'2025-01-15'` not `'01/15/2025'`

### Data Organization
- **Pipelines:** Dev `roll_file_time: 10`, Prod `roll_file_time: 300+`
- **Compression:** Use `zstd`
- **Maintenance:** Compaction for small files, expire old snapshots

## Debugging Checklist

1. `npx wrangler r2 bucket catalog enable <bucket>` - Verify catalog
2. `echo $WRANGLER_R2_SQL_AUTH_TOKEN` - Check token
3. `SHOW DATABASES` - List namespaces
4. `SHOW TABLES IN namespace` - List tables
5. `DESCRIBE namespace.table` - Check schema
6. `SELECT COUNT(*) FROM namespace.table` - Verify data
7. `SELECT * FROM namespace.table LIMIT 10` - Test simple query
8. Add filters incrementally

## See Also

- [api.md](api.md) - SQL syntax
- [patterns.md](patterns.md) - Query optimization
- [configuration.md](configuration.md) - Setup
- [Cloudflare R2 SQL Docs](https://developers.cloudflare.com/r2-sql/)