Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/benchmarks/README.md
# Researcher Benchmarks

Benchmarks test whether the research-to-skill harness resists common failure modes. Deterministic checks always run first. Model-judged evaluation can be added later as advisory evidence, but it must not override deterministic failures.

Benchmark results may be appended to `researcher/reports/benchmark-history.jsonl` for longitudinal tracking.
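
The ordering rule above can be illustrated with a minimal sketch. It is not the harness's actual implementation: `BenchmarkResult`, `run_benchmark`, `append_history`, and the record fields are hypothetical names chosen for illustration. The sketch assumes each deterministic check is a plain function returning `True` or `False`, and that the optional judge returns a numeric score.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class BenchmarkResult:
    """Hypothetical record shape; the real history schema may differ."""
    name: str
    passed: bool
    deterministic_failures: List[str]
    advisory_score: Optional[float] = None


def run_benchmark(
    name: str,
    candidate: str,
    checks: Dict[str, Callable[[str], bool]],
    judge: Optional[Callable[[str], float]] = None,
) -> BenchmarkResult:
    # Deterministic checks run first and alone decide pass/fail.
    failures = [label for label, check in checks.items() if not check(candidate)]
    # The judge score is recorded as advisory evidence only;
    # it is never consulted when computing `passed`.
    advisory = judge(candidate) if judge is not None else None
    return BenchmarkResult(name, not failures, failures, advisory)


def append_history(result: BenchmarkResult, path) -> None:
    # Append one JSON object per line so the file stays valid JSONL.
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(result)) + "\n")
```

In this sketch, a candidate that fails any deterministic check is reported as failed even if the judge scores it highly. Passing `researcher/reports/benchmark-history.jsonl` as `path` would add one line per run, which keeps the history easy to scan over time.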