Cloudflare Platform Skill

Comprehensive Cloudflare platform skill covering Workers, D1, R2, KV, AI, Durable Objects, and security.
references/pipelines/patterns.md

# Pipelines Patterns

Code-first patterns. For observability dataset/field schemas and Logpush dataset lists, pull `https://developers.cloudflare.com/pipelines/observability/metrics/` and `https://developers.cloudflare.com/pipelines/streams/logpush/`.

## Fire-and-Forget Producer

```typescript
export default {
  async fetch(req, env, ctx) {
    const event = { event_id: crypto.randomUUID(), event_type: "page_view", timestamp: new Date().toISOString() };
    ctx.waitUntil(env.MY_STREAM.send([event]));  // don't block the response
    return new Response("OK");
  }
};
```

## Client-Side Validation with Zod

Structured streams drop invalid events silently during processing. Validate before sending for immediate feedback.

```typescript
import { z } from "zod";

const EventSchema = z.object({
  event_id: z.string(),
  category: z.enum(["purchase", "view"]),
  amount: z.number().positive().optional(),
});

const validated = EventSchema.parse(rawEvent);  // throws synchronously
await env.MY_STREAM.send([validated]);
```

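When validating a whole batch, calling `parse` on each event aborts at the first failure. One option is to split the batch with `safeParse` and send only the events that pass. The following is a minimal sketch, not part of the original patterns: `partitionEvents`, `SchemaLike`, and `Partitioned` are hypothetical names, and `SchemaLike` is a structural stand-in that a Zod schema should satisfy, so the example needs no imports.

```typescript
// Structural stand-in for a Zod schema (assumption: only safeParse is needed).
interface SchemaLike<T> {
  safeParse(input: unknown): { success: boolean; data?: T; error?: unknown };
}

interface Partitioned<T> {
  valid: T[];
  rejected: { input: unknown; error: unknown }[];
}

// Hypothetical helper: separate events that pass validation from those that fail.
function partitionEvents<T>(schema: SchemaLike<T>, rawEvents: unknown[]): Partitioned<T> {
  const valid: T[] = [];
  const rejected: { input: unknown; error: unknown }[] = [];
  for (const input of rawEvents) {
    const result = schema.safeParse(input);
    if (result.success) {
      valid.push(result.data as T);
    } else {
      rejected.push({ input, error: result.error });
    }
  }
  return { valid, rejected };
}
```

In a Worker this might be used as `const { valid, rejected } = partitionEvents(EventSchema, rawEvents)`, followed by `env.MY_STREAM.send(valid)` when `valid` is non-empty and a log line for each rejected event.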
## Scheduled Collector Worker

```jsonc
// wrangler.jsonc
{
  "name": "collector",
  "pipelines": [{ "stream": "<STREAM_ID>", "binding": "EVENT_STREAM" }],
  "triggers": { "crons": ["*/5 * * * *"] }
}
```

```typescript
export default {
  async scheduled(event, env, ctx) {
    // Fail the run on a bad upstream response instead of parsing an error body.
    const res = await fetch("https://api.example.com/data");
    if (!res.ok) throw new Error(`Upstream fetch failed: ${res.status}`);
    const items = await res.json();
    const events = items.map(i => ({
      event_id: crypto.randomUUID(),
      timestamp: new Date().toISOString(),
      category: i.type, amount: i.value,
    }));
    await env.EVENT_STREAM.send(events);
  },
};
```

## Logpush → Pipelines

Pipelines is a native Logpush destination: ingest Cloudflare logs, transform with SQL, store as Iceberg/Parquet. For the current supported dataset list and field names, pull the Logpush doc above.

```sql
INSERT INTO http_logs_sink
SELECT
  ClientIP,
  EdgeResponseStatus,
  to_timestamp_micros(EdgeStartTimestamp) AS event_time,
  upper(ClientRequestMethod) AS method,
  sha256(ClientIP) AS hashed_ip          -- redact PII at ingest
FROM http_logs_stream
WHERE EdgeResponseStatus >= 400;
```

Configure via Dashboard (**Logpush → Create a job → Pipelines** destination) or API.

## Pipelines + Queues Fan-out

```typescript
await Promise.all([
  env.ANALYTICS_STREAM.send([event]),  // long-term storage + SQL
  env.PROCESS_QUEUE.send(event),       // immediate processing + retries
]);
```

Use Pipelines for long-term storage + SQL; Queues for immediate processing/retries/DLQ; both for fan-out.

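`Promise.all` rejects with the first failure, so the caller cannot tell which destination failed or whether the other send went through. If that distinction matters, a variant built on `Promise.allSettled` can report it. This is a sketch under assumptions: `fanOut`, `StreamLike`, and `QueueLike` are hypothetical names, and the two interfaces are structural stand-ins for the stream and queue bindings rather than their real types.

```typescript
// Structural stand-ins for the bindings (assumption: each exposes an async send).
interface StreamLike {
  send(records: object[]): Promise<void>;
}
interface QueueLike {
  send(message: unknown): Promise<void>;
}

// Hypothetical helper: attempt both sends and return the names of the ones that failed.
async function fanOut(stream: StreamLike, queue: QueueLike, event: object): Promise<string[]> {
  const [streamResult, queueResult] = await Promise.allSettled([
    stream.send([event]), // long-term storage + SQL
    queue.send(event),    // immediate processing + retries
  ]);
  const failed: string[] = [];
  if (streamResult.status === "rejected") failed.push("stream");
  if (queueResult.status === "rejected") failed.push("queue");
  return failed;
}
```

An empty array means both sends succeeded; otherwise the caller can retry or log only the destination that failed.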
## Observability (GraphQL Analytics)

The same R2 API token works here. Endpoint: `https://api.cloudflare.com/client/v4/graphql`. Datasets cover ingestion, processing (incl. `decodeErrors`), delivery, sink writes (`filesWritten`), and user/validation errors; see the metrics doc for the full dataset/field catalog.

```bash
# $ACCOUNT_ID is double-quoted inside the payload so the shell never word-splits it
curl -X POST "https://api.cloudflare.com/client/v4/graphql" \
  -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" \
  -d '{"query": "query { viewer { accounts(filter: {accountTag: \"'"$ACCOUNT_ID"'\"}) { pipelinesIngestionAdaptiveGroups(filter: {pipelineId: \"PIPELINE-UUID-WITH-DASHES\", datetime_geq: \"2026-03-01T00:00:00Z\"}, limit: 10) { sum { ingestedRecords ingestedBytes } dimensions { datetimeHour } } } } }"}'
```

> **Sink/pipeline IDs need dashes for GraphQL** but wrangler may show them without: `b909fe6e544844abbd63f6dcbc81d602` → `b909fe6e-5448-44ab-bd63-f6dcbc81d602`. Metrics take 5–10 min to populate.

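The dash conversion above is the standard 8-4-4-4-12 UUID grouping, so it can be done in code before building a GraphQL filter. A small sketch; `toDashedId` is a hypothetical helper name, not part of wrangler or any Cloudflare API.

```typescript
// Hypothetical helper: format a 32-character hex ID as 8-4-4-4-12.
// Input that already contains dashes is accepted and normalized.
function toDashedId(id: string): string {
  const hex = id.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error(`Expected 32 hex characters, got: ${id}`);
  }
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

For the ID in the note, `toDashedId("b909fe6e544844abbd63f6dcbc81d602")` returns `"b909fe6e-5448-44ab-bd63-f6dcbc81d602"`.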
### Detecting Silent Data Loss

If a sink's bucket is deleted or its token expires, events are accepted but lost. Tell-tale: `recordsWritten > 0` but `filesWritten = 0`. Always verify data lands in R2 within the roll interval and R2 SQL returns expected counts.

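The tell-tale condition is simple enough to encode as an alert check. The sketch below assumes the two counters have already been fetched and summed over a window longer than the roll interval; the `SinkMetrics` shape and the `looksLikeSilentLoss` name are hypothetical, and only the two field names come from the text above.

```typescript
// Hypothetical container for counters already summed over one time window.
interface SinkMetrics {
  recordsWritten: number;
  filesWritten: number;
}

// True when records were accepted for the sink but no files were produced.
function looksLikeSilentLoss(metrics: SinkMetrics): boolean {
  return metrics.recordsWritten > 0 && metrics.filesWritten === 0;
}
```

A window with no traffic (`recordsWritten` of 0) is deliberately not flagged, since an idle sink writes no files either.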
## Schema Evolution (Immutable Pipelines)

Pipelines are immutable once created. To evolve a schema, create a new versioned stream and dual-write:

```bash
npx wrangler pipelines streams create events_v2 --schema-file v2.json
```

```typescript
await Promise.all([env.EVENTS_V1.send([event]), env.EVENTS_V2.send([event])]);
// query across versions with UNION ALL in R2 SQL
```

## End-to-End: Streaming Analytics Dashboard

```
External APIs → Collector Worker (cron) → Pipeline → R2 (Iceberg) → Dashboard Worker → R2 SQL
```

1. Create bucket + enable catalog ([r2-data-catalog](../r2-data-catalog/configuration.md))
2. Create stream + sink + pipeline (here)
3. Collector Worker with cron + stream binding (above)
4. Dashboard Worker querying R2 SQL ([r2-sql/patterns.md](../r2-sql/patterns.md))
5. Enable automatic compaction

## See Also

- [configuration.md](configuration.md) · [api.md](api.md) · [gotchas.md](gotchas.md) · [r2-sql](../r2-sql/)