Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.

Source repo: muratcankoylan (GitHub)

researcher/benchmarks/router/README.md

# Router Benchmark (Stage 2)

Tests whether the activation-scenario descriptions in the v2.2.0 skill frontmatter are precise enough for a router to select the right skill for a given prompt.

See `researcher/benchmarks/PLAN.md` for the full methodology.

## Files

- `prompts.jsonl`: ground-truth prompts. Each line is a JSON object with `prompt_id`, `prompt`, `expected_primary_skill`, optional `acceptable_secondary_skills` and `rejected_skills`, and a `reason`.
- `routing-prompt.md`: the template given to the LLM. Uses the `{{SKILL_BLOCK}}`, `{{USER_PROMPT}}`, and `{{SKILL_COUNT}}` placeholders.
- `results/<date>-<seed>/`: per-run JSON outputs (gitignored).
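
To make the two file formats concrete, here is a minimal TypeScript sketch of how one `prompts.jsonl` line could be parsed and how the placeholders in `routing-prompt.md` could be filled. The field and placeholder names come from the list above. Everything else is an assumption for illustration and is not taken from the SDK runner: the `PromptFixture` type, the `parsePromptLine` and `renderRoutingPrompt` helpers, and the guess that both skill lists are optional string arrays.

```typescript
// Hypothetical shape of one prompts.jsonl line (field names from the README).
interface PromptFixture {
  prompt_id: string;
  prompt: string;
  expected_primary_skill: string;
  acceptable_secondary_skills?: string[]; // assumed: optional string array
  rejected_skills?: string[];             // assumed: optional string array
  reason: string;
}

// Parse one JSONL line and verify that the required string fields exist.
function parsePromptLine(line: string): PromptFixture {
  const raw = JSON.parse(line) as Record<string, unknown>;
  for (const key of ["prompt_id", "prompt", "expected_primary_skill", "reason"]) {
    if (typeof raw[key] !== "string") {
      throw new Error(`prompts.jsonl: missing or non-string field "${key}"`);
    }
  }
  return raw as unknown as PromptFixture;
}

// Fill the three template placeholders in a single pass, so that text
// substituted for one placeholder is never re-scanned for another.
function renderRoutingPrompt(
  template: string,
  skillBlock: string,
  userPrompt: string,
  skillCount: number,
): string {
  const values: Record<string, string> = {
    SKILL_BLOCK: skillBlock,
    USER_PROMPT: userPrompt,
    SKILL_COUNT: String(skillCount),
  };
  return template.replace(
    /\{\{(SKILL_BLOCK|USER_PROMPT|SKILL_COUNT)\}\}/g,
    (_match: string, name: string) => values[name],
  );
}
```

Substituting in one pass matters here because `{{USER_PROMPT}}` carries untrusted text: a user prompt that happens to contain `{{SKILL_BLOCK}}` is left as literal text instead of being expanded.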

## Running

From the SDK runner:

```bash
cd researcher/benchmarks/sdk-runner
npm install
npm run router:dry-run                       # see the plan and cost forecast
npm run router:run -- --max-budget-usd 5     # execute (after exporting CURSOR_API_KEY)
```

## Ground truth

The initial fixture set has 50 prompts covering:

- Single-skill positive controls (one per skill, 15 cases)
- Adversarial boundary pairs from the v2.2.0 boundary-confusion list (15 cases across 5 pairs x 3 variants)
- Combined-skill prompts where multiple skills are acceptable (10 cases)
- Negative controls where no skill should fit well (5 cases)
- Subtle activation cases that should still resolve (5 cases)
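
One way to read these categories is as different expected outcomes when a router's choice is compared against a fixture. The sketch below is an assumption about how such a comparison might be scored, not the benchmark's actual scorer (see `researcher/benchmarks/PLAN.md` for that). The `Verdict` labels, the `scoreRouting` helper, and the precedence order (primary, then secondary, then rejected, then miss) are hypothetical. It also checks that the category sizes listed above sum to the 50 initial fixtures.

```typescript
// Hypothetical outcome labels for a single routed prompt.
type Verdict = "primary" | "secondary" | "rejected" | "miss";

// Only the fixture fields needed for scoring (names from prompts.jsonl).
interface Expectation {
  expected_primary_skill: string;
  acceptable_secondary_skills?: string[];
  rejected_skills?: string[];
}

// Classify the router's chosen skill against one fixture.
function scoreRouting(chosenSkill: string, fixture: Expectation): Verdict {
  if (chosenSkill === fixture.expected_primary_skill) return "primary";
  if (fixture.acceptable_secondary_skills?.includes(chosenSkill)) return "secondary";
  if (fixture.rejected_skills?.includes(chosenSkill)) return "rejected";
  return "miss"; // chosen skill is not mentioned by the fixture at all
}

// Category sizes as listed in the README.
const fixtureCounts = {
  singleSkillPositive: 15,
  adversarialBoundary: 5 * 3, // 5 pairs x 3 variants
  combinedSkill: 10,
  negativeControl: 5,
  subtleActivation: 5,
};
const totalFixtures = Object.values(fixtureCounts).reduce((sum, n) => sum + n, 0);
console.log(totalFixtures); // 50
```

Keeping "rejected" separate from "miss" lets a report distinguish a known boundary confusion from an unanticipated wrong pick, which is the kind of case the adversarial pairs are meant to expose.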

Expand the set to 100 prompts by adding cases as new boundary confusions surface in real usage.
