Cloudflare Platform Skill

Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.

references/r2-data-catalog/gotchas.md

# R2 Data Catalog Gotchas

Common failure modes and operational behavior. For limits, recommendations, and supported settings, pull `https://developers.cloudflare.com/r2/data-catalog/` and `.../table-maintenance/`.

## Connection / Auth

- **Catalog URI / warehouse mismatch (most common).** Copy both values exactly from `wrangler r2 bucket catalog enable` (Catalog URI `https://catalog.cloudflarestorage.com/{ACCOUNT_ID}/{BUCKET}`, warehouse `{ACCOUNT_ID}_{BUCKET}`). Mismatched values fail to connect.
- **401 Unauthorized.** The token lacks Data Catalog R&W. Test with `catalog.list_namespaces()`.
- **403 on data files.** The token lacks R2 Storage. Open beta requires **Admin Read & Write on R2 Storage even for read-only** data access.
- **`/config` "Warehouse name missing in query param".** The Iceberg `/v1/config` route needs `?warehouse={ACCOUNT_ID}_{BUCKET}`. PyIceberg/PySpark add it automatically when you set `warehouse=`.
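
Because the Catalog URI and warehouse follow fixed formats, they can be derived from the account ID and bucket name instead of typed by hand. A minimal sketch, assuming placeholder credentials and assuming the `/v1/config` route is appended to the Catalog URI; the helper names `catalog_settings` and `config_url` are hypothetical:

```python
from urllib.parse import urlencode

CATALOG_HOST = "https://catalog.cloudflarestorage.com"


def catalog_settings(account_id: str, bucket: str, token: str) -> dict:
    """Build the connection values in the formats listed above."""
    return {
        "uri": f"{CATALOG_HOST}/{account_id}/{bucket}",
        "warehouse": f"{account_id}_{bucket}",
        "token": token,
    }


def config_url(settings: dict) -> str:
    """URL of the Iceberg /v1/config route with the required warehouse parameter."""
    query = urlencode({"warehouse": settings["warehouse"]})
    return f"{settings['uri']}/v1/config?{query}"


# Placeholder values; substitute the ones printed by wrangler.
settings = catalog_settings("ACCOUNT_ID", "my-bucket", "API_TOKEN")
print(config_url(settings))

# With PyIceberg installed, the same values can be passed to the REST catalog:
#   from pyiceberg.catalog.rest import RestCatalog
#   catalog = RestCatalog(name="r2", **settings)
#   catalog.list_namespaces()  # a 401 here points at the Data Catalog permission
```

Deriving both values from one pair of inputs keeps the URI and warehouse from drifting apart, which is the mismatch described in the first bullet.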

## Maintenance Behavior (updated)

- **No throughput cap on compaction.** The former 2 GB/hour/table limit is **lifted**: compaction triggers hourly and processes the backlog with no hard cap. A large backlog of small files still takes multiple hourly cycles to clear.
- **Snapshot expiration deletes data files** (since April 2026), not just metadata. Manual `remove_orphan_files` is rarely needed.
- **Compaction requires a stored credential.** `wrangler ... compaction enable` and the dashboard wizard store it automatically; pure-API setups must POST `/credential`.
- Compaction is **Parquet-only**.

## Tables & Schema

- `TableAlreadyExistsError` / `NamespaceAlreadyExistsError` → use `create_*_if_not_exists` or load the existing object.
- `422 Validation` on schema update → only add nullable columns and widen types (int→long, float→double).
- `TypeError: Cannot cast` on append → PyArrow type ≠ Iceberg schema; check `table.schema()` and cast to match (int64 is the Iceberg default).
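
The first and third fixes above can be sketched as two small helpers. This assumes `catalog` is a PyIceberg catalog, `table` is a PyIceberg table, and `arrow_table` is a `pyarrow.Table` whose column names and order already match the Iceberg schema; the helper names `ensure_table` and `append_aligned` are hypothetical:

```python
def ensure_table(catalog, namespace: str, name: str, schema):
    """Create the namespace and table only if missing, then return the table."""
    catalog.create_namespace_if_not_exists(namespace)
    return catalog.create_table_if_not_exists((namespace, name), schema=schema)


def append_aligned(table, arrow_table):
    """Cast an Arrow table to the Iceberg table's Arrow schema, then append it."""
    target = table.schema().as_arrow()  # Iceberg schema expressed as a pyarrow.Schema
    table.append(arrow_table.cast(target))
```

Casting against `table.schema()` rather than a hard-coded type keeps the writer correct when the table schema is later widened.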

## Concurrency

- `CommitFailedException` → optimistic-locking conflict; retry with backoff (see [patterns.md](patterns.md#concurrent-writes-with-retry-pyiceberg)).
- Stale metadata after external writes → reload: `table = catalog.load_table(("ns","tbl"))`.
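
Both points combine naturally: reload the table before each attempt and back off when the commit conflicts. A minimal sketch, not the pattern from patterns.md; the function name `commit_with_retry` and its parameters are hypothetical, and the exception class is passed in so the helper itself has no dependencies:

```python
import random
import time


def commit_with_retry(load_table, write, retry_on, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run write(table) on a freshly loaded table, retrying conflicts with backoff."""
    for attempt in range(attempts):
        table = load_table()  # reload each time so the write sees current metadata
        try:
            return write(table)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the conflict to the caller
            # Exponential backoff plus jitter so competing writers spread out.
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay))


# Intended use with PyIceberg (assumes `catalog` and a pyarrow table `df` exist):
#   from pyiceberg.exceptions import CommitFailedException
#   commit_with_retry(
#       lambda: catalog.load_table(("ns", "tbl")),
#       lambda table: table.append(df),
#       retry_on=CommitFailedException,
#   )
```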

## PySpark / Iceberg

| Issue | Fix |
|-------|-----|
| Catalog auth fails | Add header `X-Iceberg-Access-Delegation: vended-credentials` |
| `NoAuthWithAWSException` on orphan removal | Supply S3 access/secret keys (vended creds don't work here) |
| Version mismatch | Use Iceberg `1.6.1` |
| Slow first run (~30–60s) | JAR download; cached after |
| Remote signing errors | Set `s3.remote-signing-enabled=false` |
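
Several of the fixes in the table are Spark catalog properties, so they can be collected in one place. A sketch under stated assumptions: the catalog name `r2`, the Spark 3.5 / Scala 2.12 runtime artifact, and the helper name `r2_spark_conf` are illustrative choices, not values from this file:

```python
def r2_spark_conf(catalog_uri, warehouse, token, catalog="r2", iceberg_version="1.6.1"):
    """Return Spark config entries applying the fixes from the table above."""
    prefix = f"spark.sql.catalog.{catalog}"
    packages = [
        # Assumed artifacts for Spark 3.5 / Scala 2.12; adjust to your Spark build.
        f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{iceberg_version}",
        f"org.apache.iceberg:iceberg-aws-bundle:{iceberg_version}",
    ]
    return {
        "spark.jars.packages": ",".join(packages),
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": catalog_uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.token": token,
        # Fix for "Catalog auth fails".
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
        # Fix for "Remote signing errors".
        f"{prefix}.s3.remote-signing-enabled": "false",
    }


# Applying it (requires pyspark):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("r2-data-catalog")
#   for key, value in r2_spark_conf(URI, WAREHOUSE, TOKEN).items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```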

## Nested Namespaces

The control-plane URL separator for nested namespaces is **`%1F`** (the URL-encoded ASCII Unit Separator, `0x1F`), not `/` or `.`: `/namespaces/parent%1Fchild/tables`.
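
When building these URLs by hand, joining the levels with the raw Unit Separator character and percent-encoding the result produces the expected path. A minimal sketch; the helper name `namespace_tables_path` is hypothetical:

```python
from urllib.parse import quote

UNIT_SEPARATOR = "\x1f"  # ASCII 0x1F, percent-encoded as %1F


def namespace_tables_path(levels):
    """Build the tables path for a nested namespace given its levels in order."""
    joined = UNIT_SEPARATOR.join(levels)
    # safe="" also encodes "/" so a level name can never split the path.
    return f"/namespaces/{quote(joined, safe='')}/tables"


print(namespace_tables_path(["parent", "child"]))  # /namespaces/parent%1Fchild/tables
```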

## Debug Checklist

1. `npx wrangler r2 bucket catalog status <bucket>`: is the catalog enabled?
2. Token has R2 Storage (Admin R&W) + R2 Data Catalog (R&W)?
3. `catalog.list_namespaces()` succeeds?
4. Catalog URI = `https://catalog.cloudflarestorage.com/{ACCOUNT_ID}/{BUCKET}`, warehouse = `{ACCOUNT_ID}_{BUCKET}`?
5. Namespace created before `create_table`?
6. Compaction enabled + `credential_status: present`?
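
Steps 3 and 5 can be checked from code once a catalog object exists. A rough sketch, assuming `catalog` is a PyIceberg catalog whose `list_namespaces()` returns tuples of namespace levels; the helper name `check_catalog` is hypothetical and it covers only top-level namespaces:

```python
def check_catalog(catalog, namespace: str) -> list:
    """Return human-readable problems found for checklist steps 3 and 5."""
    try:
        namespaces = catalog.list_namespaces()
    except Exception as exc:  # auth or URI/warehouse problems surface here
        return [f"list_namespaces() failed: {exc}"]
    problems = []
    if (namespace,) not in [tuple(ns) for ns in namespaces]:
        problems.append(f"namespace {namespace!r} is missing; create it before create_table")
    return problems
```

An empty list means both checks passed; the remaining steps still need wrangler or the dashboard.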

## See Also

- [configuration.md](configuration.md) · [api.md](api.md) · [patterns.md](patterns.md)