---
name: baoyu-image-gen
description: AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream, Replicate and Agnes APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
version: 2.1.0
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-image-gen
    requires:
      anyBins:
        - bun
        - npx
---

# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包), Replicate and Agnes.

## User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

1. **Prefer built-in user-input tools** exposed by the current agent runtime, e.g., `AskUserQuestion`, `request_user_input`, `clarify`, `ask_user`, or any equivalent.
2. **Fallback**: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
3. **Batching**: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete `AskUserQuestion` references below are examples; substitute the local equivalent in other runtimes.

## Script Directory

`{baseDir}` = this SKILL.md's directory. All `scripts/...` paths below are relative to `{baseDir}`. Main script: `{baseDir}/scripts/main.ts`. Batch payload helper: `{baseDir}/scripts/build-batch.ts`. Resolve `${BUN_X}`: prefer `bun`; else `npx -y bun`; else suggest `brew install oven-sh/bun/bun`.

## Step 0: Load Preferences ⛔ BLOCKING

This step MUST complete before any image generation: generation is blocked until EXTEND.md exists.

Check these paths in order; first hit wins:

| Path | Scope |
|------|-------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project |
| `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md` | XDG |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

- **Found** → load, parse, apply. If `default_model.[provider]` is null → ask model only.
- **Not found** → run first-time setup (`references/config/first-time-setup.md`) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.

Legacy compatibility: if `.baoyu-skills/baoyu-imagine/EXTEND.md` exists and the new path doesn't, the runtime renames it to `baoyu-image-gen`. If both exist, the runtime leaves them alone and uses the new path.

**EXTEND.md keys**: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: `references/config/preferences-schema.md`.

## Usage

Minimum working examples; see `references/usage-examples.md` for the full set including per-provider invocations and batch mode.

### Identity-preserving reference prompts

When the user wants a real person/character/object preserved from reference images, do **not** replace the reference with a long generic description. Prefer short, hard identity-preservation language:

- "Use the person/object in the reference image(s) as the same identity. Do not redesign it or create a similar-looking new subject."
- "Only change scene, clothing, pose, lighting, rendering style, and composition. Keep the face/proportions/hair/key accessories/overall identity from the references."
- If using multiple references, state that they are the same subject and should jointly define identity.

Pitfall: long descriptions like "young East Asian woman, oval face, clear eyes..." can cause the model to synthesize a new person matching the description instead of preserving the referenced person.

```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro

# OpenAI GPT Image 2
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-2

# Codex CLI (uses the logged-in Codex subscription, so no OPENAI_API_KEY is required; requires `codex` on PATH)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider codex-cli --ar 16:9

# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4

# Build a batch file from outline.md + prompts/ (e.g. baoyu-article-illustrator output)
${BUN_X} {baseDir}/scripts/build-batch.ts --outline outline.md --prompts prompts --output batch.json --images-dir attachments
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
```

## Reference-Image Identity Preservation

When the user wants a person/object preserved from reference images:

- Prefer a small curated set of existing source references (usually 2–4) over many images; large multi-megabyte refs can destabilize streaming providers.
- Make the prompt say the references are the same subject and the output must use that identity. Avoid long generic facial-feature descriptions that can cause the model to synthesize a new similar-looking person.
- Do not use newly generated outputs as references unless the user explicitly asks; generated refs compound drift.
- If results become too polished or influencer-like, reduce stylized refs and add explicit anti-beautification constraints (no face slimming, eye enlargement, heavy makeup, commercial travel shoot, over-smoothing).
- If the subject should look younger/older, preserve the face and express age through clothing, posture, scene, and styling; do not ask the model to change facial identity.

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|azure\|openrouter\|dashscope\|zai\|minimax\|jimeng\|seedream\|replicate\|codex-cli\|agnes` | Force provider (default: auto-detect; `codex-cli` is never auto-selected; it must be pinned via CLI or EXTEND.md) |
| `--model <id>`, `-m` | Model ID; see provider references for defaults and allowed values |
| `--ar <ratio>` | Aspect ratio (`16:9`, `1:1`, `4:3`, …) |
| `--size <WxH>` | Explicit size (e.g., `1024x1024`; for `gpt-image-2`, width/height must be multiples of 16, max edge 3840px, ratio no wider than 3:1) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--imageApiDialect openai-native\|ratio-metadata` | OpenAI-compatible endpoint dialect; use `ratio-metadata` for gateways that expect aspect-ratio `size` plus `metadata.resolution` |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0, DashScope `wan2.7-image-pro`/`wan2.7-image`. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0, or any DashScope model outside the `wan2.7-image*` family |
| `--n <count>` | Number of images. Replicate requires `--n 1` (single-output save semantics) |
| `--json` | JSON output |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key |
| `ZAI_API_KEY` (alias `BIGMODEL_API_KEY`) | Z.AI API key |
| `MINIMAX_API_KEY` | MiniMax API key |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID`, `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine credentials |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `<PROVIDER>_IMAGE_MODEL` | Per-provider model override (`OPENAI_IMAGE_MODEL`, `GOOGLE_IMAGE_MODEL`, `DASHSCOPE_IMAGE_MODEL`, `ZAI_IMAGE_MODEL`/`BIGMODEL_IMAGE_MODEL`, `MINIMAX_IMAGE_MODEL`, `OPENROUTER_IMAGE_MODEL`, `REPLICATE_IMAGE_MODEL`, `JIMENG_IMAGE_MODEL`, `SEEDREAM_IMAGE_MODEL`, `AGNES_IMAGE_MODEL`) |
| `AZURE_OPENAI_DEPLOYMENT` (alias `AZURE_OPENAI_IMAGE_MODEL`) | Azure default deployment |
| `<PROVIDER>_BASE_URL` | Per-provider endpoint override |
| `AZURE_API_VERSION` | Azure image API version (default `2025-04-01-preview`) |
| `JIMENG_REGION` | Jimeng region (default `cn-north-1`) |
| `OPENAI_IMAGE_API_DIALECT` | `openai-native` \| `ratio-metadata` |
| `OPENROUTER_HTTP_REFERER`, `OPENROUTER_TITLE` | Optional OpenRouter attribution |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Per-provider concurrency (e.g., `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY`; for codex-cli use `BAOYU_IMAGE_GEN_CODEX_CLI_CONCURRENCY`) |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Per-provider start-gap |
| `BAOYU_CODEX_IMAGEGEN_BIN` | Override the codex-imagegen wrapper path for the `codex-cli` provider (default: bundled `scripts/codex-imagegen/main.ts`; accepts `.ts` or legacy `.sh`/binary) |
| `BAOYU_CODEX_IMAGEGEN_CACHE_DIR` | Enable idempotency cache for the `codex-cli` provider (off by default) |
| `BAOYU_CODEX_IMAGEGEN_TIMEOUT_MS` | Per-attempt `codex exec` timeout for the `codex-cli` provider (default: 300000 ms) |
| `BAOYU_CODEX_IMAGEGEN_RETRIES` | Wrapper-side retry attempts on retryable errors for the `codex-cli` provider (default: 2) |
| `BAOYU_CODEX_IMAGEGEN_LOG_FILE` | Append JSONL diagnostic log for the `codex-cli` provider |

**Load priority**: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

### Codex/ChatGPT OAuth is not an OpenAI API key

`--provider openai --model gpt-image-2` uses the standard OpenAI Images API (`/v1/images/generations` or `/v1/images/edits`) and requires `OPENAI_API_KEY`. A Codex or ChatGPT desktop login is a different entitlement and is not a drop-in replacement for `OPENAI_API_KEY`; do not paste a Codex OAuth token into `OPENAI_API_KEY` or only set `OPENAI_BASE_URL` to a Codex backend.

If the user wants to use their Codex subscription / GPT Image 2 entitlement without an OpenAI API key, route through a Codex-native backend instead of this skill's `openai` provider:

- In Codex runtime: use the native `imagegen` skill/tool.
- In non-Codex runtimes with `codex` CLI installed and logged in: use `baoyu-image-gen --provider codex-cli` (preferred: it gives you the same retry / cache / batch flow as every other provider). The provider spawns the bundled `scripts/codex-imagegen/main.ts`; the same code lives upstream at `packages/baoyu-codex-imagegen/src/main.ts` for standalone callers.
- In Hermes runtimes with a native `image_generate` tool: use that tool as a fallback, and state whether reference images were passed directly or reconstructed from extracted traits.

Do not modify the existing `openai` provider to silently consume Codex OAuth. The first-class Codex-CLI path is the dedicated `codex-cli` provider, which has its own auth (Codex login), route (`codex exec`), request shape, and tests. See `references/codex-oauth-vs-openai-api-key.md`.

## Model Resolution

Priority (highest → lowest) applies to every provider:

1. CLI flag `--model <id>`
2. EXTEND.md `default_model.[provider]`
3. Env var `<PROVIDER>_IMAGE_MODEL`
4. Built-in default

For OpenAI, the built-in default is `gpt-image-2`. `gpt-image-1.5`, `gpt-image-1`, and GPT Image snapshots remain selectable with `--model` or `OPENAI_IMAGE_MODEL`.

For Azure, `--model` / `default_model.azure` is the Azure deployment name. `AZURE_OPENAI_DEPLOYMENT` is the preferred env var; `AZURE_OPENAI_IMAGE_MODEL` is kept as a backward-compatible alias. If your Azure deployment is named after the underlying model, use `gpt-image-2`; otherwise use the exact custom deployment name.

EXTEND.md overrides env vars: if EXTEND.md sets `default_model.google: "gemini-3-pro-image"` and the env var sets `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image`, EXTEND.md wins.

**Display model info before each generation**:

- `Using [provider] / [model]`
- `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`

## OpenAI-Compatible Gateway Dialects

`provider=openai` means the auth and routing entrypoint is OpenAI-compatible. It does **not** guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set `default_image_api_dialect` in EXTEND.md, `OPENAI_IMAGE_API_DIALECT`, or `--imageApiDialect`:

- `openai-native`: pixel `size` (`1536x1024`) and native OpenAI quality fields
- `ratio-metadata`: aspect-ratio `size` (`16:9`) plus `metadata.resolution` (`1K|2K|4K`) and `metadata.orientation`

Use `openai-native` for the OpenAI native API or strict clones; try `ratio-metadata` for compatibility gateways in front of Gemini or similar models. Current limitation: `ratio-metadata` applies only to text-to-image; reference-image edits still need `openai-native` or a provider with first-class edit support.

## Provider-Specific Guides

Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:

| Provider | Reference |
|----------|-----------|
| DashScope (Qwen-Image families, custom sizes) | `references/providers/dashscope.md` |
| Z.AI (GLM-Image, cogview-4) | `references/providers/zai.md` |
| MiniMax (image-01, subject-reference) | `references/providers/minimax.md` |
| OpenRouter (multimodal models, `/chat/completions` flow) | `references/providers/openrouter.md` |
| Replicate (nano-banana, Seedream, Wan) | `references/providers/replicate.md` |
| Codex CLI (wraps bundled `scripts/codex-imagegen/`; Codex login, no `OPENAI_API_KEY`) | `references/providers/codex-cli.md` |
| Agnes (agnes-image-2.1-flash, reference-image support) | `references/providers/agnes.md` |

## Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax → Agnes (MiniMax's subject reference is more specialized toward character/portrait consistency)
2. `--provider` specified → use it (if `--ref`, must be google/openai/azure/openrouter/replicate/seedream/minimax/codex-cli/agnes)
3. Only one API key present → use that provider
4. Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream → Agnes
5. `codex-cli` is **never auto-selected**; set `default_provider: codex-cli` in EXTEND.md or pass `--provider codex-cli`. It spawns `codex exec` via the bundled `scripts/codex-imagegen/main.ts` TS entrypoint (run with `bun`) and uses the user's Codex subscription (no `OPENAI_API_KEY`). Requires `codex` on `PATH` with an active `codex login`.

## Quality Presets

| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| `normal` | 1K | 1024px target | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px target | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter `imageSize` can be overridden with `--imageSize 1K|2K|4K`.

For OpenAI native `gpt-image-2`, `normal` maps to `quality=medium` and a low-latency valid size near the requested aspect ratio; `2k` maps to `quality=high` and 2048px-class sizes such as `2048x2048`, `2048x1152`, or `1152x2048`. Use explicit `--size` for valid custom or 4K outputs, e.g. `3840x2160`.

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`.

- Google multimodal: `imageConfig.aspectRatio`
- OpenAI: `gpt-image-2` uses the closest valid custom size for the requested ratio; older GPT Image and DALL·E models use their closest supported fixed size
- OpenRouter: `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, the ratio is inferred
- Replicate: behavior is model-specific: `google/nano-banana*` uses `aspect_ratio`, `bytedance/seedream-*` uses documented Replicate ratios, Wan 2.7 maps `--ar` to a concrete `size`
- MiniMax: official `aspect_ratio` values; if `--size <WxH>` is given without `--ar`, sends `width`/`height` for `image-01`

## Generation Mode

**Default**: sequential. **Batch parallel**: enabled automatically when `--batchfile` contains 2+ pending tasks.

| Situation | Prefer | Why |
|-----------|--------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead, easier debugging |
| Multiple images with saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, predictable throughput |
| Each image still needs its own reasoning / prompt writing / style exploration | Subagents | Work is still exploratory, each needs independent analysis |
| Input is `outline.md` + `prompts/` (e.g. from `baoyu-article-illustrator`) | Batch; use `{baseDir}/scripts/build-batch.ts` to assemble the payload | The outline + prompt files already contain everything needed |

Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.

**Parallel behavior**:

- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
- Override with `--jobs <count>`
- Each image retries up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint

### Codex image2 fallback

If `--provider openai --model gpt-image-2` fails because `OPENAI_API_KEY` is missing but the current runtime has a native image-generation backend or the repo-level `codex-imagegen` wrapper is available, use that path rather than leaving the user waiting. Be explicit about whether the fallback is true reference-image generation or only a text-prompt reconstruction from extracted visual traits. See `references/codex-image2-fallback.md`.

## References

| File | Content |
|------|---------|
| `references/usage-examples.md` | Extended CLI examples across providers and batch mode |
| `references/codex-oauth-vs-openai-api-key.md` | Why Codex/ChatGPT OAuth image2 entitlement is not usable through baoyu-image-gen's standard OpenAI API-key provider |
| `references/codex-image2-fallback.md` | Practical fallback behavior when OpenAI API credentials are absent but Codex/native image generation is available |
| `references/providers/dashscope.md` | DashScope families, sizes, limits |
| `references/providers/zai.md` | Z.AI GLM-image / cogview-4 |
| `references/providers/minimax.md` | MiniMax image-01 + subject reference |
| `references/providers/openrouter.md` | OpenRouter multimodal flow |
| `references/providers/replicate.md` | Replicate supported families + guardrails |
| `references/providers/agnes.md` | Agnes (agnes-image-2.1-flash) sizing, refs, and limits |
| `references/config/preferences-schema.md` | EXTEND.md schema |
| `references/config/first-time-setup.md` | First-time setup flow |

## Extension Support

Custom configurations via EXTEND.md. See Step 0 for paths and schema.
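
The Step 0 lookup order and the Model Resolution priority can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the code in `scripts/main.ts`: the helper names (`extendMdCandidates`, `firstHit`, `resolveModel`), the POSIX-style path joining, and the choice to treat null or empty values as unset are all hypothetical.

```typescript
// Hypothetical helpers for illustration; names and signatures are NOT from scripts/main.ts.

/** Candidate EXTEND.md locations in lookup order: project, XDG, user home. */
function extendMdCandidates(cwd: string, home: string, xdgConfigHome?: string): string[] {
  const rel = "baoyu-skills/baoyu-image-gen/EXTEND.md";
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: fall back when unset or empty.
  const xdg = xdgConfigHome && xdgConfigHome.length > 0 ? xdgConfigHome : `${home}/.config`;
  return [
    `${cwd}/.${rel}`, // Project
    `${xdg}/${rel}`, // XDG
    `${home}/.${rel}`, // User home
  ];
}

/** "First hit wins": the first candidate for which `exists` returns true, else undefined. */
function firstHit(paths: string[], exists: (p: string) => boolean): string | undefined {
  return paths.find(exists);
}

interface ModelSources {
  cliModel?: string; // --model <id>
  extendModel?: string | null; // EXTEND.md default_model.[provider]
  envModel?: string; // <PROVIDER>_IMAGE_MODEL
  builtinDefault: string; // provider's built-in default
}

/** Priority: CLI flag > EXTEND.md > env var > built-in default. */
function resolveModel(s: ModelSources): string {
  const ordered = [s.cliModel, s.extendModel, s.envModel];
  // Assumption: null, undefined, and blank strings all count as "not set".
  const hit = ordered.find((v) => typeof v === "string" && v.trim() !== "");
  return hit ?? s.builtinDefault;
}

// Example from the Model Resolution section: EXTEND.md overrides the env var.
const model = resolveModel({
  extendModel: "gemini-3-pro-image",
  envModel: "gemini-3.1-flash-image",
  builtinDefault: "placeholder-default", // placeholder, not a real model ID
});
console.log(model); // prints "gemini-3-pro-image"
```

Passing the existence check in as a predicate keeps the lookup free of filesystem calls, so the ordering can be exercised without touching disk; a real caller would presumably supply something like `fs.existsSync`.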