Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/benchmarks/router/results-published/README.md
# Published Router Benchmark Results

Each `<date>.md` file in this directory is a committed snapshot of a router benchmark sweep. Raw per-run JSON outputs live under `researcher/benchmarks/router/results/<date>-<seed>/` and are gitignored; only the curated summary published here is tracked in the repo.

Every report includes:

- Run metadata (timestamp, repo commit, fixture SHA, seed, model list, replications).
- Executive summary calling out the actually meaningful findings.
- Per-model leaderboard with bootstrap 95% CIs.
- Per-skill confusion matrix.
- Hardest-prompts breakdown.
- Reproduction command.

When a benchmark exposes a routing failure, follow up by editing the activation description of the failing skill, rerunning the benchmark, and comparing the new report against the previous one to show the delta.

History across runs is also appended to `researcher/reports/router-history.jsonl` (gitignored) by the runner.
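This README does not say how the leaderboard's bootstrap 95% CIs are computed. As a rough illustration only, the sketch below assumes a percentile bootstrap over per-prompt correct/incorrect routing outcomes for a single model. The function name `bootstrap_ci` and its parameters are hypothetical and are not taken from the runner.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 routing outcomes.

    Hypothetical helper: the real runner may use a different method.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample n outcomes with replacement and record the accuracy.
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Illustrative data only: 80 correct routes out of 100 prompts.
    outcomes = [1] * 80 + [0] * 20
    low, high = bootstrap_ci(outcomes)
    print(f"accuracy = {sum(outcomes) / len(outcomes):.2f}, "
          f"95% CI = [{low:.2f}, {high:.2f}]")
```

When comparing a new report against the previous one, overlapping intervals for the same model suggest the delta may be within resampling noise, so a change to a skill's activation description is easier to credit when the intervals separate.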