---
name: baoyu-image-gen
description: "[Deprecated: use baoyu-imagine] AI image generation with OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images."
version: 1.56.4
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-image-gen
    requires:
      anyBins:
        - bun
        - npx
---

# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.

## User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

1. **Prefer built-in user-input tools** exposed by the current agent runtime, e.g., `AskUserQuestion`, `request_user_input`, `clarify`, `ask_user`, or any equivalent.
2. **Fallback**: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
3. **Batching**: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete `AskUserQuestion` references below are examples; substitute the local equivalent in other runtimes.

## Script Directory

`{baseDir}` = this SKILL.md's directory. Main script: `{baseDir}/scripts/main.ts`. Resolve `${BUN_X}`: prefer `bun`; else `npx -y bun`; else suggest `brew install oven-sh/bun/bun`.
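The runner fallback above can be read as a small decision function. This is a minimal sketch, not the skill's own code: `resolveBunX` and the `hasBin` callback are hypothetical names, and how a binary is detected on `PATH` is left to the caller.

```typescript
// Hypothetical helper: picks the command prefix used as ${BUN_X}.
// `hasBin` is an assumed callback that reports whether a binary is available.
function resolveBunX(hasBin: (name: string) => boolean): string | null {
  if (hasBin("bun")) return "bun"; // preferred runner
  if (hasBin("npx")) return "npx -y bun"; // fallback through npx
  return null; // neither found: suggest `brew install oven-sh/bun/bun`
}
```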

## Step 0: Load Preferences ⛔ BLOCKING

This step MUST complete before any image generation: generation is blocked until EXTEND.md exists.

Check these paths in order; first hit wins:

| Path | Scope |
|------|-------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project |
| `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md` | XDG |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

- **Found** → load, parse, apply. If `default_model.[provider]` is null → ask for the model only.
- **Not found** → run first-time setup (`references/config/first-time-setup.md`) using `AskUserQuestion` to collect provider, model, quality, and save location. Save EXTEND.md, then continue. Do not generate images before this completes.
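The lookup order in the table can be sketched as follows. This is an illustration under assumptions: `extendCandidates` and `firstExisting` are hypothetical names, the project path is assumed to be relative to the current working directory, and the file-existence check is passed in by the caller.

```typescript
// Sketch only: candidate EXTEND.md locations in the documented priority order.
type Env = Record<string, string | undefined>;

function extendCandidates(cwd: string, env: Env): string[] {
  const home = env.HOME ?? "";
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back to ~/.config.
  const xdg = env.XDG_CONFIG_HOME ? env.XDG_CONFIG_HOME : `${home}/.config`;
  const rel = "baoyu-skills/baoyu-image-gen/EXTEND.md";
  return [
    `${cwd}/.${rel}`, // project scope (assumed relative to cwd)
    `${xdg}/${rel}`, // XDG scope
    `${home}/.${rel}`, // user home scope
  ];
}

// First hit wins; null means "not found", which triggers first-time setup.
function firstExisting(paths: string[], exists: (p: string) => boolean): string | null {
  for (const p of paths) {
    if (exists(p)) return p;
  }
  return null;
}
```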

**EXTEND.md keys**: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: `references/config/preferences-schema.md`.

## Usage

Minimal working examples; see `references/usage-examples.md` for the full set, including per-provider invocations and batch mode.

```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro

# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, capped by the configured worker cap, which defaults to 10) |
| `--provider google\|openai\|azure\|openrouter\|dashscope\|zai\|minimax\|jimeng\|seedream\|replicate` | Force provider (default: auto-detect) |
| `--model <id>`, `-m` | Model ID; see provider references for defaults and allowed values |
| `--ar <ratio>` | Aspect ratio (`16:9`, `1:1`, `4:3`, …) |
| `--size <WxH>` | Explicit size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--imageApiDialect openai-native\|ratio-metadata` | OpenAI-compatible endpoint dialect; use `ratio-metadata` for gateways that expect aspect-ratio `size` plus `metadata.resolution` |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0 |
| `--n <count>` | Number of images. Replicate requires `--n 1` (single-output save semantics) |
| `--json` | JSON output |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key |
| `ZAI_API_KEY` (alias `BIGMODEL_API_KEY`) | Z.AI API key |
| `MINIMAX_API_KEY` | MiniMax API key |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID`, `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine credentials |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `<PROVIDER>_IMAGE_MODEL` | Per-provider model override (`OPENAI_IMAGE_MODEL`, `GOOGLE_IMAGE_MODEL`, `DASHSCOPE_IMAGE_MODEL`, `ZAI_IMAGE_MODEL`/`BIGMODEL_IMAGE_MODEL`, `MINIMAX_IMAGE_MODEL`, `OPENROUTER_IMAGE_MODEL`, `REPLICATE_IMAGE_MODEL`, `JIMENG_IMAGE_MODEL`, `SEEDREAM_IMAGE_MODEL`) |
| `AZURE_OPENAI_DEPLOYMENT` (alias `AZURE_OPENAI_IMAGE_MODEL`) | Azure default deployment |
| `<PROVIDER>_BASE_URL` | Per-provider endpoint override |
| `AZURE_API_VERSION` | Azure image API version (default `2025-04-01-preview`) |
| `JIMENG_REGION` | Jimeng region (default `cn-north-1`) |
| `OPENAI_IMAGE_API_DIALECT` | `openai-native` \| `ratio-metadata` |
| `OPENROUTER_HTTP_REFERER`, `OPENROUTER_TITLE` | Optional OpenRouter attribution |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Per-provider concurrency (e.g., `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY`) |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Per-provider start-gap |

**Load priority**: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

## Model Resolution

Priority (highest → lowest) applies to every provider:

1. CLI flag `--model <id>`
2. EXTEND.md `default_model.[provider]`
3. Env var `<PROVIDER>_IMAGE_MODEL`
4. Built-in default

For Azure, `--model` / `default_model.azure` is the Azure deployment name. `AZURE_OPENAI_DEPLOYMENT` is the preferred env var; `AZURE_OPENAI_IMAGE_MODEL` is kept as a backward-compatible alias.

EXTEND.md overrides env vars: if EXTEND.md sets `default_model.google: "gemini-3-pro-image-preview"` and the env var sets `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview`, EXTEND.md wins.
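The four-level priority can be expressed as a simple fall-through. The sketch below is illustrative rather than the script's actual implementation; `ModelSources` and `resolveModel` are hypothetical names, and the built-in default is supplied by the caller because it differs per provider.

```typescript
// Sketch of the documented priority: CLI > EXTEND.md > env var > built-in default.
interface ModelSources {
  cliModel?: string; // --model <id>
  extendModel?: string | null; // EXTEND.md default_model.[provider]
  envModel?: string; // <PROVIDER>_IMAGE_MODEL
  builtinDefault: string; // provider-specific, passed in by the caller
}

function resolveModel(s: ModelSources): string {
  // `||` skips undefined, null, and empty strings, so an unset layer falls through.
  return s.cliModel || s.extendModel || s.envModel || s.builtinDefault;
}
```

With `extendModel: "gemini-3-pro-image-preview"` and `envModel: "gemini-3.1-flash-image-preview"`, the function returns the EXTEND.md value, matching the example above.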

**Display model info before each generation**:

- `Using [provider] / [model]`
- `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`

## OpenAI-Compatible Gateway Dialects

`provider=openai` means the auth and routing entrypoint is OpenAI-compatible. It does **not** guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set `default_image_api_dialect` in EXTEND.md, `OPENAI_IMAGE_API_DIALECT`, or `--imageApiDialect`:

- `openai-native`: pixel `size` (`1536x1024`) and native OpenAI quality fields
- `ratio-metadata`: aspect-ratio `size` (`16:9`) plus `metadata.resolution` (`1K|2K|4K`) and `metadata.orientation`

Use `openai-native` for the OpenAI native API or strict clones; try `ratio-metadata` for compatibility gateways in front of Gemini or similar models. Current limitation: `ratio-metadata` applies only to text-to-image; reference-image edits still need `openai-native` or a provider with first-class edit support.
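To make the difference between the two dialects concrete, here is a rough sketch of how the size-related request fields might be assembled. It is an assumption-laden illustration: `sizePayload` and `orientationOf` are hypothetical names, and the `orientation` values (`landscape`, `portrait`, `square`) are guesses, since the accepted values depend on the gateway.

```typescript
type Dialect = "openai-native" | "ratio-metadata";
type Resolution = "1K" | "2K" | "4K";

interface SizePayload {
  size: string;
  metadata?: { resolution: Resolution; orientation: string };
}

// Assumption: orientation is derived from the ratio; real gateways may expect other values.
function orientationOf(ratio: string): string {
  const parts = ratio.split(":");
  const w = Number(parts[0]);
  const h = Number(parts[1]);
  if (!(w > 0 && h > 0)) throw new Error(`Invalid aspect ratio: ${ratio}`);
  if (w === h) return "square";
  return w > h ? "landscape" : "portrait";
}

function sizePayload(dialect: Dialect, pixelSize: string, ratio: string, resolution: Resolution): SizePayload {
  if (dialect === "openai-native") {
    return { size: pixelSize }; // pixel size, e.g. "1536x1024"
  }
  // ratio-metadata: the ratio goes into `size`, the resolution into `metadata`.
  return { size: ratio, metadata: { resolution, orientation: orientationOf(ratio) } };
}
```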

## Provider-Specific Guides

Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:

| Provider | Reference |
|----------|-----------|
| DashScope (Qwen-Image families, custom sizes) | `references/providers/dashscope.md` |
| Z.AI (GLM-Image, cogview-4) | `references/providers/zai.md` |
| MiniMax (image-01, subject-reference) | `references/providers/minimax.md` |
| OpenRouter (multimodal models, `/chat/completions` flow) | `references/providers/openrouter.md` |
| Replicate (nano-banana, Seedream, Wan) | `references/providers/replicate.md` |

## Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax (MiniMax's subject reference is more specialized toward character/portrait consistency)
2. `--provider` specified → use it (if `--ref`, must be google/openai/azure/openrouter/replicate/seedream/minimax)
3. Only one API key present → use that provider
4. Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream
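The selection rules above can be sketched as one function. This is an illustration, not the script's real logic: `selectProvider` and `withKeys` are hypothetical names, and the sketch assumes that auto-selection only considers providers whose credentials are present.

```typescript
type Provider =
  | "google" | "openai" | "azure" | "openrouter" | "dashscope"
  | "zai" | "minimax" | "jimeng" | "seedream" | "replicate";

// Order used when --ref is given and no provider is forced (rule 1).
const REF_ORDER: Provider[] = ["google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax"];
// Default priority when several keys are present (rule 4).
const DEFAULT_ORDER: Provider[] = [
  "google", "openai", "azure", "openrouter", "dashscope",
  "zai", "minimax", "replicate", "jimeng", "seedream",
];

function selectProvider(opts: { provider?: Provider; hasRef: boolean; withKeys: Provider[] }): Provider {
  if (opts.provider) {
    // Rule 2: an explicit provider wins, but must support reference images if --ref is used.
    if (opts.hasRef && REF_ORDER.indexOf(opts.provider) === -1) {
      throw new Error(`--ref is not supported by provider "${opts.provider}"`);
    }
    return opts.provider;
  }
  // Rules 1, 3, 4: first provider in the relevant order that has credentials.
  const order = opts.hasRef ? REF_ORDER : DEFAULT_ORDER;
  for (const candidate of order) {
    if (opts.withKeys.indexOf(candidate) !== -1) return candidate;
  }
  throw new Error("No usable provider: set an API key or pass --provider");
}
```

Rule 3 needs no special case here: with a single key present, the loop can only return that provider.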

## Quality Presets

| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| `normal` | 1K | 1024px | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter `imageSize` can be overridden with `--imageSize 1K|2K|4K`.
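For Google and OpenRouter, the preset-to-`imageSize` mapping and its override can be summarized in a few lines. This is a sketch with hypothetical names (`PRESET_IMAGE_SIZE`, `resolveImageSize`), covering only the `imageSize` column of the table.

```typescript
type Quality = "normal" | "2k";
type ImageSize = "1K" | "2K" | "4K";

// Mirrors the Google/OpenRouter columns of the preset table.
const PRESET_IMAGE_SIZE: Record<Quality, ImageSize> = { normal: "1K", "2k": "2K" };

// An explicit --imageSize wins; otherwise the size follows the quality preset (default `2k`).
function resolveImageSize(quality: Quality = "2k", override?: ImageSize): ImageSize {
  return override ?? PRESET_IMAGE_SIZE[quality];
}
```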

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`.

- Google multimodal: `imageConfig.aspectRatio`
- OpenAI: closest supported size
- OpenRouter: `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, the ratio is inferred
- Replicate: behavior is model-specific: `google/nano-banana*` uses `aspect_ratio`, `bytedance/seedream-*` uses documented Replicate ratios, Wan 2.7 maps `--ar` to a concrete `size`
- MiniMax: official `aspect_ratio` values; if `--size <WxH>` is given without `--ar`, sends `width`/`height` for `image-01`
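One plausible way to infer a ratio from `--size <WxH>` is to pick the nearest supported ratio. The source does not say how the script does this, so the nearest-match rule below is an assumption, and `inferRatio` and `ratioToNumber` are hypothetical names.

```typescript
const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

function ratioToNumber(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

// Assumed rule: choose the supported ratio closest to width/height.
// Distances are compared in log space so wide and tall shapes are treated symmetrically.
function inferRatio(size: string): string {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) throw new Error(`Expected <W>x<H>, got: ${size}`);
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (!(width > 0 && height > 0)) throw new Error(`Width and height must be positive: ${size}`);
  const target = Math.log(width / height);
  let best = "1:1";
  let bestDistance = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const distance = Math.abs(Math.log(ratioToNumber(ratio)) - target);
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}
```

Under this rule `1920x1080` maps to `16:9`, and a 3:2 size such as `1536x1024` falls back to `4:3`, its nearest supported neighbor.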

## Generation Mode

**Default**: sequential. **Batch parallel**: enabled automatically when `--batchfile` contains 2+ pending tasks.

| Situation | Prefer | Why |
|-----------|--------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead, easier debugging |
| Multiple images with saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, predictable throughput |
| Each image still needs its own reasoning / prompt writing / style exploration | Subagents | Work is still exploratory, each needs independent analysis |
| Input is `outline.md` + `prompts/` (e.g. from `baoyu-article-illustrator`) | Batch (use `scripts/build-batch.ts` to assemble the payload) | The outline + prompt files already contain everything needed |

Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.

**Parallel behavior**:

- Default worker count is automatic, capped by the configured worker cap (built-in default: 10)
- Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
- Override with `--jobs <count>`
- Each image gets up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
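The retry and reporting behavior can be sketched as follows. This is a simplified, synchronous illustration with hypothetical names (`runWithRetries`, `summarize`, `TaskResult`); the real script runs tasks in parallel workers with throttling, which this sketch leaves out.

```typescript
interface TaskResult {
  id: string;
  ok: boolean;
  attempts: number;
  error?: string;
}

// Runs one generation task, allowing up to `maxAttempts` tries (3 in the text above).
function runWithRetries(id: string, generate: () => void, maxAttempts = 3): TaskResult {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      generate();
      return { id, ok: true, attempts: attempt };
    } catch (e) {
      lastError = e instanceof Error ? e.message : String(e);
    }
  }
  return { id, ok: false, attempts: maxAttempts, error: lastError };
}

// Builds the final report: success count, failure count, per-image failure reasons.
function summarize(results: TaskResult[]): { succeeded: number; failed: number; reasons: string[] } {
  const failures = results.filter((r) => !r.ok);
  return {
    succeeded: results.length - failures.length,
    failed: failures.length,
    reasons: failures.map((f) => `${f.id}: ${f.error}`),
  };
}
```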

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry, up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint

## References

| File | Content |
|------|---------|
| `references/usage-examples.md` | Extended CLI examples across providers and batch mode |
| `references/providers/dashscope.md` | DashScope families, sizes, limits |
| `references/providers/zai.md` | Z.AI GLM-image / cogview-4 |
| `references/providers/minimax.md` | MiniMax image-01 + subject reference |
| `references/providers/openrouter.md` | OpenRouter multimodal flow |
| `references/providers/replicate.md` | Replicate supported families + guardrails |
| `references/config/preferences-schema.md` | EXTEND.md schema |
| `references/config/first-time-setup.md` | First-time setup flow |

## Extension Support

Custom configurations via EXTEND.md. See Step 0 for paths and schema.