Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
examples/llm-as-judge-skills/tools/research/read-url.md
# Read URL Tool

## Purpose

Extract and parse content from a given URL. Returns structured text content with metadata about the source.

## Tool Definition

```typescript
import { tool } from "ai";
import { z } from "zod";

export const readUrl = tool({
  description: `Read and extract content from a URL.
Returns the main text content, stripped of navigation and ads.
Use after webSearch to get full content from relevant results.`,

  parameters: z.object({
    url: z.string().url()
      .describe("The URL to read"),

    contentType: z.enum(["auto", "article", "documentation", "paper", "code"]).default("auto")
      .describe("Hint for content type to optimize extraction"),

    maxLength: z.number().min(1000).max(50000).default(10000)
      .describe("Maximum characters to return"),

    extractSections: z.boolean().default(true)
      .describe("Whether to identify and label sections"),

    includeMetadata: z.boolean().default(true)
      .describe("Include author, date, and other metadata")
  }),

  execute: async (input) => {
    return extractUrlContent(input);
  }
});
```

## Input Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to read |
| contentType | enum | No | Content type hint |
| maxLength | number | No | Max chars (default: 10000) |
| extractSections | boolean | No | Label sections |
| includeMetadata | boolean | No | Include metadata |

## Output Schema

```typescript
interface ReadUrlResult {
  success: boolean;

  url: string;
  title: string;

  content: {
    full: string;
    sections?: {
      heading: string;
      level: number; // h1=1, h2=2, etc.
      content: string;
    }[];
  };

  metadata?: {
    author?: string;
    publishedDate?: string;
    lastModified?: string;
    description?: string;
    keywords?: string[];
    source: string;
  };

  stats: {
    totalCharacters: number;
    truncated: boolean;
    sectionsFound: number;
  };

  error?: {
    code: string;
    message: string;
  };
}
```

## Usage Example

```typescript
const content = await readUrl.execute({
  url: "https://eugeneyan.com/writing/llm-evaluators/",
  contentType: "article",
  maxLength: 15000,
  extractSections: true,
  includeMetadata: true
});

// Result:
// {
//   success: true,
//   url: "https://eugeneyan.com/writing/llm-evaluators/",
//   title: "Evaluating the Effectiveness of LLM-Evaluators",
//   content: {
//     full: "LLM-evaluators, also known as LLM-as-a-Judge...",
//     sections: [
//       {
//         heading: "Key considerations before adopting an LLM-evaluator",
//         level: 2,
//         content: "Before reviewing the literature..."
//       },
//       ...
//     ]
//   },
//   metadata: {
//     author: "Eugene Yan",
//     publishedDate: "2024-06-15",
//     source: "eugeneyan.com"
//   },
//   stats: {
//     totalCharacters: 15000,
//     truncated: true,
//     sectionsFound: 8
//   }
// }
```

## Content Type Handling

| Type | Optimization |
|------|-------------|
| article | Prioritize main content, skip sidebars |
| documentation | Preserve code blocks, keep structure |
| paper | Extract abstract, sections, references |
| code | Preserve formatting, syntax highlighting |
| auto | Detect type from content |

## Error Handling

```typescript
const errorCodes = {
  "URL_NOT_FOUND": "Page does not exist (404)",
  "ACCESS_DENIED": "Page requires authentication (401/403)",
  "TIMEOUT": "Request timed out",
  "BLOCKED": "Access blocked by robots.txt or rate limit",
  "INVALID_CONTENT": "Content could not be parsed",
  "UNSUPPORTED_TYPE": "Content type not supported (e.g., binary)"
};
```

## Implementation Notes

1. **Respect robots.txt**: Check and honor robots.txt directives
2. **Rate Limiting**: Don't hammer the same domain
3. **User Agent**: Use appropriate user agent string
4. **Timeouts**: Set reasonable timeouts (10-30s)
5. **JavaScript Rendering**: Consider headless browser for JS-heavy sites
6. **Caching**: Cache content for repeated reads
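
The document does not show how `extractUrlContent` is implemented, so the following is only a minimal sketch of three helpers such an implementation might use: mapping HTTP statuses to the documented error codes, truncating text to `maxLength`, and spacing out requests per domain. The names `statusToErrorCode`, `truncateContent`, and `DomainRateLimiter` are hypothetical. Two behaviors are assumptions rather than documented facts: status 429 is reported as `BLOCKED`, and `totalCharacters` counts the returned text (as the usage example suggests, where it equals `maxLength`).

```typescript
// Hypothetical helpers; none of these names come from the original tool.

type ReadUrlErrorCode =
  | "URL_NOT_FOUND"
  | "ACCESS_DENIED"
  | "TIMEOUT"
  | "BLOCKED"
  | "INVALID_CONTENT"
  | "UNSUPPORTED_TYPE";

// Map an HTTP status to one of the documented error codes.
// Assumption: 429 (Too Many Requests) is reported as BLOCKED.
function statusToErrorCode(status: number): ReadUrlErrorCode | undefined {
  if (status === 404) return "URL_NOT_FOUND";
  if (status === 401 || status === 403) return "ACCESS_DENIED";
  if (status === 429) return "BLOCKED";
  return undefined; // other statuses are left to the caller
}

// Truncate extracted text to maxLength and build the matching stats fields.
// Assumption: totalCharacters counts the returned text, not the original page.
function truncateContent(text: string, maxLength: number) {
  const truncated = text.length > maxLength;
  const full = truncated ? text.slice(0, maxLength) : text;
  return { full, stats: { totalCharacters: full.length, truncated } };
}

// Enforce a minimum interval between requests to the same host.
// The clock is passed in so the logic can be exercised without waiting.
class DomainRateLimiter {
  private lastRequest = new Map<string, number>();
  private minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Milliseconds to wait before the next request to this URL's host.
  msUntilAllowed(url: string, now: number = Date.now()): number {
    const last = this.lastRequest.get(new URL(url).hostname);
    if (last === undefined) return 0;
    return Math.max(0, last + this.minIntervalMs - now);
  }

  // Record that a request to this URL's host was sent at `now`.
  record(url: string, now: number = Date.now()): void {
    this.lastRequest.set(new URL(url).hostname, now);
  }
}
```

A caller would check `msUntilAllowed` before each fetch, call `record` once the request is sent, and pass the extracted text through `truncateContent` to fill `content.full` and the `stats` fields.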