researcher/benchmarks/router/README.md
# Router Benchmark (Stage 2)

Tests whether the activation-scenario descriptions in v2.2.0 skill frontmatter are good enough to route the right skill to a given prompt.

See `researcher/benchmarks/PLAN.md` for full methodology.

## Files

- `prompts.jsonl`: ground-truth prompts. Each line has `prompt_id`, `prompt`, `expected_primary_skill`, optional `acceptable_secondary_skills` and `rejected_skills`, and a `reason`.
- `routing-prompt.md`: the template given to the LLM. Uses `{{SKILL_BLOCK}}`, `{{USER_PROMPT}}`, `{{SKILL_COUNT}}` placeholders.
- `results/<date>-<seed>/`: per-run JSON outputs (gitignored).

## Running

From the SDK runner:

```bash
cd researcher/benchmarks/sdk-runner
npm install
npm run router:dry-run                     # see the plan and cost forecast
npm run router:run -- --max-budget-usd 5   # execute (after exporting CURSOR_API_KEY)
```

## Ground truth

Initial fixtures are 50 prompts covering:

- Single-skill positive controls (one per skill, 15 cases)
- Adversarial boundary pairs from the v2.2.0 boundary-confusion list (15 cases across 5 pairs x 3 variants)
- Combined-skill prompts where multiple are acceptable (10 cases)
- Negative controls where no skill should fit well (5 cases)
- Subtle activation cases that should still resolve (5 cases)

Expand to 100 by adding prompts as new boundary confusions surface in the wild.
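
As a rough illustration of the fixture format listed under Files, the sketch below parses one `prompts.jsonl` line and grades a routed skill against it. It is not the SDK runner's code: the helper names `parse_fixture` and `grade`, the grade labels, the skill names `skill-a` through `skill-d`, and the sample record are all hypothetical, and the real scoring rules are whatever `researcher/benchmarks/PLAN.md` specifies. It also assumes `expected_primary_skill` is a plain string, which may not hold for the negative controls.

```python
import json

# Field names come from the README; which ones are required is an assumption.
REQUIRED_FIELDS = ("prompt_id", "prompt", "expected_primary_skill", "reason")
OPTIONAL_LIST_FIELDS = ("acceptable_secondary_skills", "rejected_skills")


def parse_fixture(line: str) -> dict:
    """Parse one JSONL line and check that the expected fields are present."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Default the optional lists to empty so callers need no None checks.
    for name in OPTIONAL_LIST_FIELDS:
        record.setdefault(name, [])
    return record


def grade(record: dict, routed_skill: str) -> str:
    """Hypothetical grading rule: primary > secondary > rejected > miss."""
    if routed_skill == record["expected_primary_skill"]:
        return "primary"
    if routed_skill in record["acceptable_secondary_skills"]:
        return "secondary"
    if routed_skill in record["rejected_skills"]:
        return "rejected"
    return "miss"


# Made-up sample record; real fixtures live in prompts.jsonl.
sample_line = json.dumps({
    "prompt_id": "example-001",
    "prompt": "An illustrative user prompt.",
    "expected_primary_skill": "skill-a",
    "acceptable_secondary_skills": ["skill-b"],
    "rejected_skills": ["skill-c"],
    "reason": "Illustrative only.",
})

record = parse_fixture(sample_line)
for routed in ("skill-a", "skill-b", "skill-c", "skill-d"):
    print(routed, "->", grade(record, routed))
```

Keeping "rejected" separate from "miss" is a design choice for this sketch: routing to a skill the fixture explicitly rules out is arguably a worse failure than routing to an unrelated one, so it may be worth counting separately.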