Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
`references/r2-data-catalog/api.md`
# API Reference

R2 Data Catalog exposes the standard [Apache Iceberg REST Catalog API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).

## Quick Reference

**Most common operations:**

| Task | PyIceberg Code |
|------|----------------|
| Connect | `RestCatalog(name="r2", warehouse=bucket, uri=uri, token=token)` |
| List namespaces | `catalog.list_namespaces()` |
| Create namespace | `catalog.create_namespace("logs")` |
| Create table | `catalog.create_table(("ns", "table"), schema=schema)` |
| Load table | `catalog.load_table(("ns", "table"))` |
| Append data | `table.append(pyarrow_table)` |
| Query data | `table.scan().to_pandas()` |
| Compact files | `table.rewrite_data_files(target_file_size_bytes=128*1024*1024)` |
| Expire snapshots | `table.expire_snapshots(older_than=timestamp_ms, retain_last=10)` |

## REST Endpoints

Base: `https://<account-id>.r2.cloudflarestorage.com/iceberg/<bucket-name>`

| Operation | Method | Path |
|-----------|--------|------|
| Catalog config | GET | `/v1/config` |
| List namespaces | GET | `/v1/namespaces` |
| Create namespace | POST | `/v1/namespaces` |
| Delete namespace | DELETE | `/v1/namespaces/{ns}` |
| List tables | GET | `/v1/namespaces/{ns}/tables` |
| Create table | POST | `/v1/namespaces/{ns}/tables` |
| Load table | GET | `/v1/namespaces/{ns}/tables/{table}` |
| Update table | POST | `/v1/namespaces/{ns}/tables/{table}` |
| Delete table | DELETE | `/v1/namespaces/{ns}/tables/{table}` |
| Rename table | POST | `/v1/tables/rename` |

**Authentication:** Bearer token in header: `Authorization: Bearer <token>`

## PyIceberg Client API

Most users work through PyIceberg rather than the raw REST endpoints.

### Connection

```python
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="my_catalog",
    warehouse="<bucket-name>",
    uri="<catalog-uri>",
    token="<api-token>",
)
```

### Namespace Operations

```python
from pyiceberg.exceptions import NamespaceAlreadyExistsError

namespaces = catalog.list_namespaces()  # [('default',), ('logs',)]
try:
    catalog.create_namespace("logs", properties={"owner": "team"})
except NamespaceAlreadyExistsError:
    pass  # 409: namespace already exists
catalog.drop_namespace("logs")  # Must be empty
```

### Table Operations

```python
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, IntegerType

schema = Schema(
    NestedField(1, "id", IntegerType(), required=True),
    NestedField(2, "name", StringType(), required=False),
)
table = catalog.create_table(("logs", "app_logs"), schema=schema)
tables = catalog.list_tables("logs")
table = catalog.load_table(("logs", "app_logs"))
catalog.rename_table(("logs", "old"), ("logs", "new"))
```

### Data Operations

```python
import pyarrow as pa

data = pa.table({"id": [1, 2], "name": ["Alice", "Bob"]})
table.append(data)     # Add rows
table.overwrite(data)  # Replace table contents

# Read with filters
scan = table.scan(row_filter="id > 100", selected_fields=["id", "name"])
df = scan.to_pandas()
```

### Schema Evolution

```python
from pyiceberg.types import IntegerType, LongType

with table.update_schema() as update:
    update.add_column("user_id", IntegerType(), doc="User ID")
    update.rename_column("msg", "message")
    update.delete_column("old_field")
    update.update_column("id", field_type=LongType())  # int→long only
```

### Time-Travel

```python
from datetime import datetime, timedelta

# Query a specific snapshot or timestamp
scan = table.scan(snapshot_id=table.snapshots()[-2].snapshot_id)
yesterday_ms = int((datetime.now() - timedelta(days=1)).timestamp() * 1000)
scan = table.scan(as_of_timestamp=yesterday_ms)
```

### Partitioning

```python
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform
from pyiceberg.types import TimestampType

# The day transform's source field (source_id=1) must be a timestamp
schema = Schema(
    NestedField(1, "ts", TimestampType(), required=True),
    NestedField(2, "action", StringType(), required=False),
)
partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="day")
)
table = catalog.create_table(("events", "actions"), schema=schema, partition_spec=partition_spec)
scan = table.scan(row_filter="day = '2026-01-27'")  # Prunes partitions
```

## Table Maintenance

### Compaction

```python
files = table.scan().plan_files()
avg_mb = sum(f.file.file_size_in_bytes for f in files) / len(files) / (1024**2)
print(f"Files: {len(files)}, Avg: {avg_mb:.1f} MB")

table.rewrite_data_files(target_file_size_bytes=128 * 1024 * 1024)
```

**When:** average file size under 10 MB, or more than 1000 files. **Frequency:** daily for high-write tables, weekly for medium-write tables.

### Snapshot Expiration

```python
from datetime import datetime, timedelta

seven_days_ms = int((datetime.now() - timedelta(days=7)).timestamp() * 1000)
table.expire_snapshots(older_than=seven_days_ms, retain_last=10)
```

**Retention:** production 7-30 days, dev 1-7 days, audit 90+ days.

### Orphan Cleanup

```python
three_days_ms = int((datetime.now() - timedelta(days=3)).timestamp() * 1000)
table.delete_orphan_files(older_than=three_days_ms)
```

⚠️ Always expire snapshots first, use a threshold of 3+ days, and run during low-traffic windows.

### Full Maintenance

```python
# Compact → Expire → Cleanup (in order)
if len(table.scan().plan_files()) > 1000:
    table.rewrite_data_files(target_file_size_bytes=128 * 1024 * 1024)
seven_days_ms = int((datetime.now() - timedelta(days=7)).timestamp() * 1000)
table.expire_snapshots(older_than=seven_days_ms, retain_last=10)
three_days_ms = int((datetime.now() - timedelta(days=3)).timestamp() * 1000)
table.delete_orphan_files(older_than=three_days_ms)
```

## Metadata Inspection

```python
table = catalog.load_table(("logs", "app_logs"))
print(table.schema())
print(table.current_snapshot())
print(table.properties)
print(f"Files: {len(table.scan().plan_files())}")
```

## Error Codes

| Code | Meaning | Common Causes |
|------|---------|---------------|
| 401 | Unauthorized | Invalid/missing token |
| 404 | Not Found | Catalog not enabled, namespace/table missing |
| 409 | Conflict | Already exists, concurrent update |
| 422 | Validation | Invalid schema, incompatible type |

See [gotchas.md](gotchas.md) for detailed troubleshooting.