Source from repo
Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-data-catalog/gotchas.md
# R2 Data Catalog Gotchas

Common failure modes and operational behavior. For limits, recommendations, and supported settings, pull `https://developers.cloudflare.com/r2/data-catalog/` and `.../table-maintenance/`.

## Connection / Auth

- **Catalog URI / warehouse mismatch (most common).** Copy both values exactly from `wrangler r2 bucket catalog enable` (Catalog URI `https://catalog.cloudflarestorage.com/{ACCOUNT_ID}/{BUCKET}`, warehouse `{ACCOUNT_ID}_{BUCKET}`). Mismatched values fail to connect.
- **401 Unauthorized**: token lacks Data Catalog R&W. Test with `catalog.list_namespaces()`.
- **403 on data files**: token lacks R2 Storage. Open beta requires **Admin Read & Write on R2 Storage even for read-only** data access.
- **`/config` "Warehouse name missing in query param"**: the Iceberg `/v1/config` route needs `?warehouse={ACCOUNT_ID}_{BUCKET}`. PyIceberg/PySpark add it automatically when you set `warehouse=`.

## Maintenance Behavior (updated)

- **No throughput cap on compaction.** The former 2 GB/hour/table limit is **lifted**: compaction triggers hourly and processes the backlog with no hard cap. Large small-file backlogs still take multiple hourly cycles.
- **Snapshot expiration deletes data files** (since April 2026), not just metadata. Manual `remove_orphan_files` is rarely needed.
- **Compaction requires a stored credential.** `wrangler ... compaction enable` and the dashboard wizard store it automatically; pure-API setups must POST `/credential`.
- Compaction is **Parquet-only**.

## Tables & Schema

- `TableAlreadyExistsError` / `NamespaceAlreadyExistsError` → use `create_*_if_not_exists` / load existing.
- `422 Validation` on schema update → only add nullable columns and widen types (int→long, float→double).
- `TypeError: Cannot cast` on append → PyArrow type ≠ Iceberg schema; cast to int64 (Iceberg default); check `table.schema()`.

## Concurrency

- `CommitFailedException` → optimistic-locking conflict; retry with backoff (see [patterns.md](patterns.md#concurrent-writes-with-retry-pyiceberg)).
- Stale metadata after external writes → reload: `table = catalog.load_table(("ns","tbl"))`.

## PySpark / Iceberg

| Issue | Fix |
|-------|-----|
| Catalog auth fails | Add header `X-Iceberg-Access-Delegation: vended-credentials` |
| `NoAuthWithAWSException` on orphan removal | Supply S3 access/secret keys (vended creds don't work here) |
| Version mismatch | Use Iceberg `1.6.1` |
| Slow first run (~30–60s) | JAR download; cached after |
| Remote signing errors | Set `s3.remote-signing-enabled=false` |

## Nested Namespaces

Control-plane URL separator for nested namespaces is **`%1F`** (Unit Separator), not `/` or `.`: `/namespaces/parent%1Fchild/tables`.

## Debug Checklist

1. `npx wrangler r2 bucket catalog status <bucket>`: enabled?
2. Token has R2 Storage (Admin R&W) + R2 Data Catalog (R&W)?
3. `catalog.list_namespaces()` succeeds?
4. Catalog URI = `catalog.cloudflarestorage.com/{ACCOUNT_ID}/{BUCKET}`, warehouse = `{ACCOUNT_ID}_{BUCKET}`?
5. Namespace created before `create_table`?
6. Compaction enabled + `credential_status: present`?

## See Also

- [configuration.md](configuration.md) · [api.md](api.md) · [patterns.md](patterns.md)
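
## Sketch: Deriving Connection Values

Since the most common failure listed here is a Catalog URI / warehouse mismatch, one way to avoid it is to derive both strings from a single account ID and bucket name instead of typing them twice. The following is a minimal sketch using only the Python standard library. The function names (`catalog_settings`, `config_url`, `namespace_tables_path`) are hypothetical helpers, not part of Wrangler or PyIceberg, and `config_url` assumes the `/v1/config` route is appended directly to the Catalog URI, as Iceberg REST clients typically do. The values printed by `wrangler r2 bucket catalog enable` remain the source of truth.

```python
from urllib.parse import quote

# Host as written in this document; verify against the wrangler output.
CATALOG_HOST = "https://catalog.cloudflarestorage.com"


def catalog_settings(account_id: str, bucket: str) -> dict:
    """Build the Catalog URI and warehouse name from one account/bucket pair."""
    return {
        "uri": f"{CATALOG_HOST}/{account_id}/{bucket}",
        "warehouse": f"{account_id}_{bucket}",
    }


def config_url(account_id: str, bucket: str) -> str:
    """URL for the Iceberg config route, including the required warehouse param.

    Assumption: the route lives at <Catalog URI>/v1/config.
    """
    settings = catalog_settings(account_id, bucket)
    warehouse = quote(settings["warehouse"], safe="")
    return f"{settings['uri']}/v1/config?warehouse={warehouse}"


def namespace_tables_path(*levels: str) -> str:
    """Join nested namespace levels with %1F (Unit Separator), not '/' or '.'."""
    encoded = "%1F".join(quote(level, safe="") for level in levels)
    return f"/namespaces/{encoded}/tables"


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    print(catalog_settings("abc123", "my-bucket"))
    print(config_url("abc123", "my-bucket"))
    print(namespace_tables_path("parent", "child"))
```

Passing the `uri` and `warehouse` values from `catalog_settings` to the client together keeps the two strings in sync, which is the point of debug checklist item 4.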