SKILL.md
---
name: tavily-extract
description: |
  Extract clean markdown or text content from specific URLs via the Tavily CLI. Use this skill when the user has one or more URLs and wants their content, says "extract", "grab the content from", "pull the text from", "get the page at", "read this webpage", or needs clean text from web pages. Handles JavaScript-rendered pages, returns LLM-optimized markdown, and supports query-focused chunking for targeted extraction. Can process up to 20 URLs in a single call.
allowed-tools: Bash(tvly *)
---

# tavily extract

Extract clean markdown or text content from one or more URLs.

## Before running any command

If `tvly` is not found on PATH, install it first:

```bash
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```

Do not skip this step or fall back to other tools.

See [tavily-cli](../tavily-cli/SKILL.md) for alternative install methods and auth options.

## When to use

- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the [workflow](../tavily-cli/SKILL.md): search → **extract** → map → crawl → research

## Quick start

```bash
# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md
```

## Options

| Option | Description |
|--------|-------------|
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5, requires `--query`) |
| `--extract-depth` | `basic` (default) or `advanced` (for JS pages) |
| `--format` | `markdown` (default) or `text` |
| `--include-images` | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o, --output` | Save output to file |
| `--json` | Structured JSON output |

## Extract depth

| Depth | When to use |
|-------|-------------|
| `basic` | Simple pages, fast — try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |

## Tips

- **Max 20 URLs per request** — batch larger lists into multiple calls.
- **Use `--query` + `--chunks-per-source`** to get only relevant content instead of full pages.
- **Try `basic` first**, fall back to `advanced` if content is missing.
- **Set `--timeout`** for slow pages (up to 60s).
- If search results already contain the content you need (via `--include-raw-content`), skip the extract step.

## See also

- [tavily-search](../tavily-search/SKILL.md) — find pages when you don't have a URL
- [tavily-crawl](../tavily-crawl/SKILL.md) — extract content from many pages on a site
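## Batching example

The tips above mention splitting lists larger than 20 URLs into multiple calls. A minimal sketch of that batching with `xargs -n 20`; the `batch20` helper and `urls.txt` are hypothetical, not part of the CLI:

```bash
#!/usr/bin/env sh
# Run a command repeatedly with at most 20 arguments per invocation —
# matching the 20-URLs-per-request limit of `tvly extract`.
batch20() { xargs -n 20 "$@"; }

# Real use (requires tvly on PATH and a urls.txt with one URL per line):
#   batch20 tvly extract --json < urls.txt > results.jsonl

# Demonstrate the batching with a stand-in command:
printf '%s\n' u1 u2 u3 | batch20 echo CALL
```

With three inputs and a limit of 20, `echo CALL` runs once and receives all three; with 45 URLs, `tvly extract` would run three times (20 + 20 + 5).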