SKILL.md
---
name: tavily-extract
description: |
  Extract clean markdown or text content from specific URLs via the Tavily CLI. Use this skill when the user has one or more URLs and wants their content, says "extract", "grab the content from", "pull the text from", "get the page at", "read this webpage", or needs clean text from web pages. Handles JavaScript-rendered pages, returns LLM-optimized markdown, and supports query-focused chunking for targeted extraction. Can process up to 20 URLs in a single call.
allowed-tools: Bash(tvly *)
---

# tavily extract

Extract clean markdown or text content from one or more URLs.

## Before running any command

If `tvly` is not found on PATH, install it first:

```bash
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```

Do not skip this step or fall back to other tools.

See [tavily-cli](../tavily-cli/SKILL.md) for alternative install methods and auth options.

## When to use

- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the [workflow](../tavily-cli/SKILL.md): search → **extract** → map → crawl → research

## Quick start

```bash
# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md
```

## Options

| Option | Description |
|--------|-------------|
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5, requires `--query`) |
| `--extract-depth` | `basic` (default) or `advanced` (for JS pages) |
| `--format` | `markdown` (default) or `text` |
| `--include-images` | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o, --output` | Save output to file |
| `--json` | Structured JSON output |

## Extract depth

| Depth | When to use |
|-------|-------------|
| `basic` | Simple pages, fast — try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |

## Tips

- **Max 20 URLs per request** — batch larger lists into multiple calls.
- **Use `--query` + `--chunks-per-source`** to get only relevant content instead of full pages.
- **Try `basic` first**, fall back to `advanced` if content is missing.
- **Set `--timeout`** for slow pages (up to 60s).
- If search results already contain the content you need (via `--include-raw-content`), skip the extract step.

## See also

- [tavily-search](../tavily-search/SKILL.md) — find pages when you don't have a URL
- [tavily-crawl](../tavily-crawl/SKILL.md) — extract content from many pages on a site
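## Batching example

The tips above mention splitting lists larger than 20 URLs into multiple calls. A minimal sketch of that batching with `xargs -n 20`; the `batch20` helper and `urls.txt` are hypothetical, not part of the CLI:

```bash
#!/usr/bin/env sh
# Run a command repeatedly with at most 20 arguments per invocation —
# matching the 20-URLs-per-request limit of `tvly extract`.
batch20() { xargs -n 20 "$@"; }

# Real use (requires tvly on PATH and a urls.txt with one URL per line):
#   batch20 tvly extract --json < urls.txt > results.jsonl

# Demonstrate the batching with a stand-in command:
printf '%s\n' u1 u2 u3 | batch20 echo CALL
```

With three inputs and a limit of 20, `echo CALL` runs once and receives all three; with 45 URLs, `tvly extract` would run three times (20 + 20 + 5).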