Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/r2-data-catalog/README.md
# Cloudflare R2 Data Catalog Skill Reference

Expert guidance for Cloudflare R2 Data Catalog - Apache Iceberg catalog built into R2 buckets.

## Reading Order

**New to R2 Data Catalog?** Start here:
1. Read "What is R2 Data Catalog?" and "When to Use" below
2. [configuration.md](configuration.md) - Enable catalog, create tokens
3. [patterns.md](patterns.md) - PyIceberg setup and common patterns
4. [api.md](api.md) - REST API reference as needed
5. [gotchas.md](gotchas.md) - Troubleshooting when issues arise

**Quick reference?** Jump to:
- [Enable catalog on bucket](configuration.md#enable-catalog-on-bucket)
- [PyIceberg connection pattern](patterns.md#pyiceberg-connection-pattern)
- [Permission errors](gotchas.md#permission-errors)

## What is R2 Data Catalog?

R2 Data Catalog is a **managed Apache Iceberg REST catalog** built directly into R2 buckets. It provides:

- **Apache Iceberg tables** - ACID transactions, schema evolution, time-travel queries
- **Zero-egress costs** - Query from any cloud/region without data transfer fees
- **Standard REST API** - Works with Spark, PyIceberg, Snowflake, Trino, DuckDB
- **No infrastructure** - Fully managed, no catalog servers to run
- **Public beta** - Available to all R2 subscribers, no extra cost beyond R2 storage

### What is Apache Iceberg?

Open table format for analytics datasets in object storage. Features:
- **ACID transactions** - Safe concurrent reads/writes
- **Metadata optimization** - Fast queries without full scans
- **Schema evolution** - Add/rename/delete columns without rewrites
- **Time-travel** - Query historical snapshots
- **Partitioning** - Organize data for efficient queries

## When to Use

**Use R2 Data Catalog for:**
- **Log analytics** - Store and query application/system logs
- **Data lakes/warehouses** - Analytical datasets queried by multiple engines
- **BI pipelines** - Aggregate data for dashboards and reports
- **Multi-cloud analytics** - Share data across clouds without egress fees
- **Time-series data** - Event streams, metrics, sensor data

**Don't use for:**
- **Transactional workloads** - Use D1 or external database instead
- **Sub-second latency** - Iceberg optimized for batch/analytical queries
- **Small datasets (<1GB)** - Setup overhead not worth it
- **Unstructured data** - Store files directly in R2, not as Iceberg tables

## Architecture

```
┌─────────────────────────────────────────────────┐
│  Query Engines                                  │
│  (PyIceberg, Spark, Trino, Snowflake, DuckDB)   │
└────────────────┬────────────────────────────────┘
                 │
                 │ REST API (OAuth2 token)
                 ▼
┌─────────────────────────────────────────────────┐
│  R2 Data Catalog (Managed Iceberg REST Catalog) │
│  • Namespace/table metadata                     │
│  • Transaction coordination                     │
│  • Snapshot management                          │
└────────────────┬────────────────────────────────┘
                 │
                 │ Vended credentials
                 ▼
┌─────────────────────────────────────────────────┐
│  R2 Bucket Storage                              │
│  • Parquet data files                           │
│  • Metadata files                               │
│  • Manifest files                               │
└─────────────────────────────────────────────────┘
```

**Key concepts:**
- **Catalog URI** - REST endpoint for catalog operations (e.g., `https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket>`)
- **Warehouse** - Logical grouping of tables (typically same as bucket name)
- **Namespace** - Schema/database containing tables (e.g., `logs`, `analytics`)
- **Table** - Iceberg table with schema, data files, snapshots
- **Vended credentials** - Temporary S3 credentials catalog provides for data access

## Limits

| Resource | Limit | Notes |
|----------|-------|-------|
| Namespaces per catalog | No hard limit | Organize tables logically |
| Tables per namespace | <10,000 recommended | Performance degrades beyond this |
| Files per table | <100,000 recommended | Run compaction regularly |
| Snapshots per table | Configurable retention | Expire >7 days old |
| Partitions per table | 100-1,000 optimal | Too many = slow metadata ops |
| Table size | Same as R2 bucket | 10GB-10TB+ common |
| API rate limits | Standard R2 API limits | Shared with R2 storage operations |
| Target file size | 128-512 MB | After compaction |

## Current Status

**Public Beta** (as of Jan 2026)
- Available to all R2 subscribers
- No extra cost beyond standard R2 storage/operations
- Production-ready, but breaking changes possible
- Supports: namespaces, tables, snapshots, compaction, time-travel, table maintenance

## Decision Tree: Is R2 Data Catalog Right For You?

```
Start → Need analytics on object storage data?
  │
  ├─ No → Use R2 directly for object storage
  │
  └─ Yes → Dataset >1GB with structured schema?
           │
           ├─ No → Too small, use R2 + ad-hoc queries
           │
           └─ Yes → Need ACID transactions or schema evolution?
                    │
                    ├─ No → Consider simpler solutions (Parquet on R2)
                    │
                    └─ Yes → Need multi-cloud/multi-tool access?
                             │
                             ├─ No → D1 or external DB may be simpler
                             │
                             └─ Yes → ✅ Use R2 Data Catalog
```

**Quick check:** If you answer "yes" to all:
- Dataset >1GB and growing
- Structured/tabular data (logs, events, metrics)
- Multiple query tools or cloud environments
- Need versioning, schema changes, or concurrent access

→ R2 Data Catalog is a good fit.

## In This Reference

- **[configuration.md](configuration.md)** - Enable catalog, create API tokens, connect clients
- **[api.md](api.md)** - REST endpoints, operations, maintenance
- **[patterns.md](patterns.md)** - PyIceberg examples, common use cases
- **[gotchas.md](gotchas.md)** - Troubleshooting, best practices, limitations

## See Also

- [Cloudflare R2 Data Catalog Docs](https://developers.cloudflare.com/r2/data-catalog/)
- [Apache Iceberg Docs](https://iceberg.apache.org/)
- [PyIceberg Docs](https://py.iceberg.apache.org/)
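
## Example: Decision Tree as Code

The "Decision Tree: Is R2 Data Catalog Right For You?" section can be sketched as a small Python function, which may be handy in a planning script or checklist. This is only an illustration: the function name `recommend_storage` and its parameter names are hypothetical and not part of any Cloudflare API. The returned strings are the leaf recommendations from the tree.

```python
def recommend_storage(
    needs_analytics: bool,
    over_1gb_structured: bool,
    needs_acid_or_schema_evolution: bool,
    needs_multi_tool_access: bool,
) -> str:
    """Walk the decision tree top to bottom and return the first matching leaf.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch; the outcomes mirror the tree in this README.
    """
    if not needs_analytics:
        return "Use R2 directly for object storage"
    if not over_1gb_structured:
        return "Too small, use R2 + ad-hoc queries"
    if not needs_acid_or_schema_evolution:
        return "Consider simpler solutions (Parquet on R2)"
    if not needs_multi_tool_access:
        return "D1 or external DB may be simpler"
    # Every question answered "yes": the catalog is a good fit.
    return "Use R2 Data Catalog"


if __name__ == "__main__":
    # Example: a growing, structured log dataset queried from several engines.
    print(recommend_storage(True, True, True, True))
```

The questions are evaluated in the same order as the tree, so an early "no" short-circuits the later questions.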