Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
Fetch any URL via Chrome CDP and convert the rendered page to clean markdown with YouTube transcript support.
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
SKILL.md
1---2name: baoyu-url-to-markdown3description: Fetch any URL and convert to markdown using baoyu-fetch CLI (Chrome CDP with site-specific adapters). Built-in adapters for X/Twitter, YouTube transcripts, Hacker News threads, and generic pages via Defuddle. Handles login/CAPTCHA via interaction wait modes. Use when user wants to save a webpage as markdown.4version: 1.61.05metadata:6openclaw:7homepage: https://github.com/JimLiu/baoyu-skills#baoyu-url-to-markdown8requires:9anyBins:10- bun11---1213# URL to Markdown1415Fetches any URL via `baoyu-fetch` CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.1617## User Input Tools1819When this skill prompts the user, follow this tool-selection rule (priority order):20211. **Prefer built-in user-input tools** exposed by the current agent runtime — e.g., `AskUserQuestion`, `request_user_input`, `clarify`, `ask_user`, or any equivalent.222. **Fallback**: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.233. **Batching**: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.2425Concrete `AskUserQuestion` references below are examples — substitute the local equivalent in other runtimes.2627## CLI Setup2829**Important**: The CLI source is vendored in `{baseDir}/scripts/lib`. `scripts/package.json` installs only third-party runtime dependencies.3031**Agent Execution Instructions**:321. Determine this SKILL.md file's directory path as `{baseDir}`332. Resolve `${BUN}` runtime: if `bun` installed → `bun`; else suggest installing Bun343. If `{baseDir}/scripts/node_modules` does not exist, run `${BUN} install --cwd {baseDir}/scripts`354. `${READER}` = `{baseDir}/scripts/baoyu-fetch`365. Replace all `${READER}` in this document with the resolved value3738## Preferences (EXTEND.md)3940Check EXTEND.md in priority order — the first one found wins:4142| Priority | Path | Scope |43|----------|------|-------|44| 1 | `.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | Project |45| 2 | `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | XDG |46| 3 | `$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md` | User home |4748| Result | Action |49|--------|--------|50| Found | Read, parse, apply settings |51| Not found | **MUST** run first-time setup (see below) — do NOT silently create defaults |5253**EXTEND.md supports**: download media by default, default output directory.5455### First-Time Setup ⛔ BLOCKING5657When EXTEND.md is not found, you **MUST** use `AskUserQuestion` to gather preferences before creating EXTEND.md. **NEVER** create EXTEND.md with silent defaults. Generation is BLOCKED until setup completes. Batch all three questions into a single call:5859- **Q1 — Media** (header "Media"): "How to handle images and videos in pages?"60- "Ask each time (Recommended)" — Prompt after each save61- "Always download" — Download to local `imgs/` and `videos/`62- "Never download" — Keep remote URLs63- **Q2 — Output** (header "Output"): "Default output directory?"64- "url-to-markdown (Recommended)" — Save to `./url-to-markdown/{domain}/{slug}.md`65- User may pick "Other" and type a custom path66- **Q3 — Save** (header "Save"): "Where to save preferences?"67- "User (Recommended)" — `~/.baoyu-skills/` (all projects)68- "Project" — `.baoyu-skills/` (this project only)6970After answers, write EXTEND.md, confirm "Preferences saved to [path]", then continue.7172Full template: [references/config/first-time-setup.md](references/config/first-time-setup.md).7374### Supported Keys7576| Key | Default | Values | Description |77|-----|---------|--------|-------------|78| `download_media` | `ask` | `ask` / `1` / `0` | `ask` = prompt each time, `1` = always, `0` = never |79| `default_output_dir` | empty | path or empty | Default output directory (empty = `./url-to-markdown/`) |8081**EXTEND.md → CLI mapping**:8283| EXTEND.md key | CLI argument | Notes |84|---------------|-------------|-------|85| `download_media: 1` | `--download-media` | Requires `--output` to be set |86| `default_output_dir: ./posts/` | Agent constructs `--output ./posts/{domain}/{slug}.md` | Agent generates path, not a direct flag |8788**Value priority**: CLI arguments → EXTEND.md → skill defaults.8990## Usage9192```bash93# Default: headless capture, markdown to stdout94${READER} <url>9596# Save to file97${READER} <url> --output article.md9899# Save with media download100${READER} <url> --output article.md --download-media101102# Wait for interaction (login/CAPTCHA) — auto-detect and continue103${READER} <url> --wait-for interaction --output article.md104105# Wait for interaction — manual control (Enter to continue)106${READER} <url> --wait-for force --output article.md107108# JSON output109${READER} <url> --format json --output article.json110111# Force specific adapter112${READER} <url> --adapter youtube --output transcript.md113```114115## Options116117| Option | Description |118|--------|-------------|119| `<url>` | URL to fetch |120| `--output <path>` | Output file path (default: stdout) |121| `--format <type>` | Output format: `markdown` (default) or `json` |122| `--json` | Shorthand for `--format json` |123| `--adapter <name>` | Force adapter: `x`, `youtube`, `hn`, or `generic` (default: auto-detect) |124| `--headless` | Force headless Chrome (no visible window) |125| `--wait-for <mode>` | Interaction wait mode: `none` (default), `interaction`, or `force` |126| `--wait-for-interaction` | Alias for `--wait-for interaction` |127| `--wait-for-login` | Alias for `--wait-for interaction` |128| `--timeout <ms>` | Page load timeout (default: 30000) |129| `--interaction-timeout <ms>` | Login/CAPTCHA wait timeout (default: 600000 = 10 min) |130| `--interaction-poll-interval <ms>` | Poll interval for interaction checks (default: 1500) |131| `--download-media` | Download images/videos to local `imgs/` and `videos/`, rewrite markdown links. Requires `--output` |132| `--media-dir <dir>` | Base directory for downloaded media (default: same as `--output` directory) |133| `--cdp-url <url>` | Reuse existing Chrome DevTools Protocol endpoint |134| `--browser-path <path>` | Custom Chrome/Chromium binary path |135| `--chrome-profile-dir <path>` | Chrome user data directory (default: `BAOYU_CHROME_PROFILE_DIR` env or `./baoyu-skills/chrome-profile`) |136| `--debug-dir <dir>` | Write debug artifacts (document.json, markdown.md, page.html, network.json) |137138## Agent Quality Gate139140**CRITICAL**: treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without failing the CLI.141142After every headless run, inspect the saved markdown. See [references/quality-gate.md](references/quality-gate.md) for the full checklist, recovery workflow, and capture-mode table. Read it whenever a run looks suspicious or the user asks about login/CAPTCHA handling.143144## Output Path Generation145146The agent must construct the output file path — `baoyu-fetch` does not auto-generate paths.147148**Algorithm**:1491. Determine base directory from EXTEND.md `default_output_dir` or default `./url-to-markdown/`1502. Extract domain from URL (e.g., `example.com`)1513. Generate slug from URL path or page title (kebab-case, 2-6 words)1524. Construct: `{base_dir}/{domain}/{slug}/{slug}.md` — each URL gets its own directory so media files stay isolated1535. Conflict resolution: append timestamp `{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md`154155Pass the constructed path to `--output`. Media files (`--download-media`) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.156157## Adapters & Media158159See [references/adapters.md](references/adapters.md) for the adapter catalog (X, YouTube, Hacker News, generic), per-adapter notes, the media download flow (`ask` / always / never), and the JSON output schema. Read it before answering adapter-specific questions or handling media prompts.160161## Environment Variables162163| Variable | Description |164|----------|-------------|165| `BAOYU_CHROME_PROFILE_DIR` | Chrome user data directory (can also use `--chrome-profile-dir`) |166167**Troubleshooting**: Chrome not found → use `--browser-path`. Timeout → increase `--timeout`. Login/CAPTCHA → `--wait-for interaction`. Debug → `--debug-dir` to inspect captured HTML and network logs.168169## Extension Support170171Custom configurations via EXTEND.md. See **Preferences** section above for paths and supported keys.172