Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-data-catalog/gotchas.md
# Gotchas & Troubleshooting

Common problems → causes → solutions.

## Permission Errors

### 401 Unauthorized

**Error:** `"401 Unauthorized"`
**Cause:** Token missing R2 Data Catalog permissions.
**Solution:** Use "Admin Read & Write" token (includes catalog + storage permissions). Test with `catalog.list_namespaces()`.

### 403 Forbidden

**Error:** `"403 Forbidden"` on data files
**Cause:** Token lacks storage permissions.
**Solution:** Token needs both R2 Data Catalog + R2 Storage Bucket Item permissions.

### Token Rotation Issues

**Error:** New token fails after rotation.
**Solution:** Create new token → test in staging → update prod → monitor 24h → revoke old.

## Catalog URI Issues

### 404 Not Found

**Error:** `"404 Catalog not found"`
**Cause:** Catalog not enabled or wrong URI.
**Solution:** Run `wrangler r2 bucket catalog enable <bucket>`. URI must be HTTPS with `/iceberg/` and case-sensitive bucket name.

### Wrong Warehouse

**Error:** Cannot create/load tables.
**Cause:** Warehouse ≠ bucket name.
**Solution:** Set `warehouse="bucket-name"` to match bucket exactly.

## Table and Schema Issues

### Table/Namespace Already Exists

**Error:** `"TableAlreadyExistsError"`
**Solution:** Use try/except to load existing or check first.

### Namespace Not Found

**Error:** Cannot create table.
**Solution:** Create namespace first: `catalog.create_namespace("ns")`

### Schema Evolution Errors

**Error:** `"422 Validation"` on schema update.
**Cause:** Incompatible change (required field, type shrink).
**Solution:** Only add nullable columns, compatible type widening (int→long, float→double).

## Data and Query Issues

### Empty Scan Results

**Error:** Scan returns no data.
**Cause:** Incorrect filter or partition column.
**Solution:** Test without filter first: `table.scan().to_pandas()`. Verify partition column names.

### Slow Queries

**Error:** Performance degrades over time.
**Cause:** Too many small files.
**Solution:** Check file count, compact if >1000 or avg <10MB. See [api.md](api.md#compaction).

### Type Mismatch

**Error:** `"Cannot cast"` on append.
**Cause:** PyArrow types don't match Iceberg schema.
**Solution:** Cast to int64 (Iceberg default), not int32. Check `table.schema()`.

## Compaction Issues

### Compaction Issues

**Problem:** File count unchanged or compaction takes hours.
**Cause:** Target size too large, or table too big for PyIceberg.
**Solution:** Only compact if avg <50MB. For >1TB tables, use Spark. Run during low-traffic periods.

## Maintenance Issues

### Snapshot/Orphan Issues

**Problem:** Expiration fails or orphan cleanup deletes active data.
**Cause:** Too aggressive retention or wrong order.
**Solution:** Always expire snapshots first with `retain_last=10`, then cleanup orphans with 3+ day threshold.

## Concurrency Issues

### Concurrent Write Conflicts

**Problem:** `CommitFailedException` with multiple writers.
**Cause:** Optimistic locking - simultaneous commits.
**Solution:** Add retry with exponential backoff (see [patterns.md](patterns.md#pattern-6-concurrent-writes-with-retry)).

### Stale Metadata

**Problem:** Old schema/data after external update.
**Cause:** Cached metadata.
**Solution:** Reload table: `table = catalog.load_table(("ns", "table"))`

## Performance Optimization

### Performance Tips

**Scans:** Use `row_filter` and `selected_fields` to reduce data scanned.
**Partitions:** 100-1000 optimal. Avoid high cardinality (millions) or low (<10).
**Files:** Keep 100-500MB avg. Compact if <10MB or >10k files.

## Limits

| Resource | Recommended | Impact if Exceeded |
|----------|-------------|-------------------|
| Tables/namespace | <10k | Slow list ops |
| Files/table | <100k | Slow query planning |
| Partitions/table | 100-1k | Metadata overhead |
| Snapshots/table | Expire >7d | Metadata bloat |

## Common Error Messages Reference

| Error Message | Likely Cause | Fix |
|---------------|--------------|-----|
| `401 Unauthorized` | Missing/invalid token | Check token has catalog+storage permissions |
| `403 Forbidden` | Token lacks storage permissions | Add R2 Storage Bucket Item permission |
| `404 Not Found` | Catalog not enabled or wrong URI | Run `wrangler r2 bucket catalog enable` |
| `409 Conflict` | Table/namespace already exists | Use try/except or load existing |
| `422 Unprocessable Entity` | Schema validation failed | Check type compatibility, required fields |
| `CommitFailedException` | Concurrent write conflict | Add retry logic with backoff |
| `NamespaceAlreadyExistsError` | Namespace exists | Use try/except or load existing |
| `NoSuchTableError` | Table doesn't exist | Check namespace+table name, create first |
| `TypeError: Cannot cast` | PyArrow type mismatch | Cast data to match Iceberg schema |

## Debugging Checklist

When things go wrong, check in order:

1. ✅ **Catalog enabled:** `npx wrangler r2 bucket catalog status <bucket>`
2. ✅ **Token permissions:** Both R2 Data Catalog + R2 Storage in dashboard
3. ✅ **Connection test:** `catalog.list_namespaces()` succeeds
4. ✅ **URI format:** HTTPS, includes `/iceberg/`, correct bucket name
5. ✅ **Warehouse name:** Matches bucket name exactly
6. ✅ **Namespace exists:** Create before `create_table()`
7. ✅ **Enable debug logging:** `logging.basicConfig(level=logging.DEBUG)`
8. ✅ **PyIceberg version:** `pip install --upgrade pyiceberg` (≥0.5.0)
9. ✅ **File health:** Compact if >1000 files or avg <10MB
10. ✅ **Snapshot count:** Expire if >100 snapshots

## Enable Debug Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
# Now operations show HTTP requests/responses
```

## Resources

- [Cloudflare Community](https://community.cloudflare.com/c/developers/workers/40)
- [Cloudflare Discord](https://discord.cloudflare.com) - #r2 channel
- [PyIceberg GitHub](https://github.com/apache/iceberg-python/issues)
- [Apache Iceberg Slack](https://iceberg.apache.org/community/)

## Next Steps

- [patterns.md](patterns.md) - Working examples
- [api.md](api.md) - API reference
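
The "retry with exponential backoff" advice for `CommitFailedException` can be sketched roughly as follows. This is a minimal, library-agnostic illustration, not the pattern from patterns.md: the helper name `retry_with_backoff` and its parameters (`max_attempts`, `base_delay`, `max_delay`, `sleep`) are hypothetical, and the defaults are arbitrary starting points rather than recommended values.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last conflict to the caller
            # Delay doubles on each attempt, is capped at max_delay,
            # and gets up to 10% random jitter so writers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))

    raise AssertionError("unreachable")  # the loop always returns or raises


# Hypothetical usage with PyIceberg (assumes `table` and `batch` already exist):
#
#   from pyiceberg.exceptions import CommitFailedException
#
#   def append_batch():
#       table.refresh()      # pick up the latest metadata before retrying
#       table.append(batch)
#
#   retry_with_backoff(append_batch, retry_on=(CommitFailedException,))
```

Passing the sleep function as a parameter keeps the helper easy to test without real waits. Refreshing the table inside the retried operation matters because a retry against stale metadata would likely hit the same conflict again.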