Bulk-extract content from multiple pages of a website via Tavily CLI with depth, breadth, path filters, and markdown output.
SKILL.md
---
name: tavily-crawl
description: |
  Crawl websites and extract content from multiple pages via the Tavily CLI. Use this skill when the user wants to crawl a site, download documentation, extract an entire docs section, bulk-extract pages, save a site as local markdown files, or says "crawl", "get all the pages", "download the docs", "extract everything under /docs", "bulk extract", or needs content from many pages on the same domain. Supports depth/breadth control, path filtering, semantic instructions, and saving each page as a local markdown file.
allowed-tools: Bash(tvly *)
---

# tavily crawl

Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.

## Before running any command

If `tvly` is not found on PATH, install it first:

```bash
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```

Do not skip this step or fall back to other tools.

See [tavily-cli](../tavily-cli/SKILL.md) for alternative install methods and auth options.

## When to use

- You need content from many pages on a site (e.g., all `/docs/`)
- You want to download documentation for offline use
- Step 4 in the [workflow](../tavily-cli/SKILL.md): search → extract → map → **crawl** → research

## Quick start

```bash
# Basic crawl
tvly crawl "https://docs.example.com" --json

# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/

# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json

# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json

# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json
```

## Options

| Option | Description |
|--------|-------------|
| `--max-depth` | Levels deep (1-5, default: 1) |
| `--max-breadth` | Links per page (default: 20) |
| `--limit` | Total pages cap (default: 50) |
| `--instructions` | Natural language guidance for semantic focus |
| `--chunks-per-source` | Chunks per page (1-5, requires `--instructions`) |
| `--extract-depth` | `basic` (default) or `advanced` |
| `--format` | `markdown` (default) or `text` |
| `--select-paths` | Comma-separated regex patterns to include |
| `--exclude-paths` | Comma-separated regex patterns to exclude |
| `--select-domains` | Comma-separated regex for domains to include |
| `--exclude-domains` | Comma-separated regex for domains to exclude |
| `--allow-external / --no-external` | Include external links (default: allow) |
| `--include-images` | Include images |
| `--timeout` | Max wait (10-150 seconds) |
| `-o, --output` | Save JSON output to file |
| `--output-dir` | Save each page as a .md file in directory |
| `--json` | Structured JSON output |

## Crawl for context vs. data collection

**For agentic use** (feeding results to an LLM):

Always use `--instructions` + `--chunks-per-source`. Returns only relevant chunks instead of full pages — prevents context explosion.

```bash
tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json
```

**For data collection** (saving to files):

Use `--output-dir` without `--chunks-per-source` to get full pages as markdown files.

```bash
tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/
```

## Tips

- **Start conservative** — `--max-depth 1`, `--limit 20` — and scale up.
- **Use `--select-paths`** to focus on the section you need.
- **Use map first** to understand site structure before a full crawl.
- **Always set `--limit`** to prevent runaway crawls.

## See also

- [tavily-map](../tavily-map/SKILL.md) — discover URLs before deciding to crawl
- [tavily-extract](../tavily-extract/SKILL.md) — extract individual pages
- [tavily-search](../tavily-search/SKILL.md) — find pages when you don't have a URL
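After a data-collection crawl with `--output-dir`, a common next step is to index what was saved. The sketch below is illustrative, not part of the skill: the crawl command is shown commented out because it needs an installed, authenticated `tvly`, and the two files it creates are hypothetical stand-ins for pages a real crawl would write.

```bash
#!/usr/bin/env bash
# A real run would populate ./docs/ like this (requires tvly on PATH + login):
#   tvly crawl "https://docs.example.com" --max-depth 1 --limit 20 --output-dir ./docs/
set -eu

# Illustrative stand-ins for crawled pages (names are hypothetical):
mkdir -p ./docs
printf '# Auth Guide\n\nToken setup.\n' > ./docs/auth.md
printf '# Quickstart\n\nHello.\n' > ./docs/quickstart.md

# Print a simple index: one line per saved page with its first heading.
for f in ./docs/*.md; do
  title=$(grep -m1 '^# ' "$f" | sed 's/^# //')
  printf '%s\t%s\n' "$f" "${title:-<untitled>}"
done
```

Because `--format markdown` is the default, the first `# ` heading of each saved page is usually the page title, which makes a flat directory of crawled files easy to skim.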