Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/rubrics/pairwise-skill-revision.md
# Pairwise Skill Revision Rubric

Use this rubric when comparing two candidate revisions of the same skill or two competing new-skill drafts.

## Preconditions

- Both candidates use the same source evidence.
- Both candidates target the same activation scenario.
- Both candidates pass deterministic structure checks.
- Neither candidate changes the rubric used to compare it.

## Dimensions

Score each candidate independently from 0 to 2, then compare.

| Dimension | Weight | Score 2 |
| --- | --- | --- |
| Behavioral Improvement | 35% | The candidate gives future agents clearer actions, decisions, or recovery paths |
| Evidence Fidelity | 20% | Claims are grounded in retrieved sources and do not overstate evidence |
| Activation Clarity | 15% | The description and When to Activate section route the skill cleanly |
| Corpus Fit | 15% | The candidate avoids duplication and respects related skill boundaries |
| Simplicity | 15% | The candidate achieves the improvement with fewer concepts, fewer lines, and less maintenance burden |

## Tie Breakers

If totals are within 0.1:

1. Prefer the simpler candidate.
2. Prefer the candidate with fewer volatile claims in `SKILL.md`.
3. Prefer the candidate that updates an existing skill over adding a new one.
4. Prefer the candidate with clearer gotchas.
5. Route to human review if the tie remains.

## Required Output

```yaml
candidate_a:
  path: ""
  weighted_total: 0.0
  strengths: []
  risks: []
candidate_b:
  path: ""
  weighted_total: 0.0
  strengths: []
  risks: []
winner: "A | B | tie | human_review"
tie_breaker_used: ""
decision_rationale: ""
```

## Failure Modes

1. **Verbose candidate wins by judge bias**: Penalize irrelevant detail under Simplicity.
2. **Evidence drift**: Reject claims that do not map back to retrieved sources.
3. **False novelty**: Compare against existing skills before scoring either candidate.
4. **Prompt-only comparison**: Run deterministic structure checks before rubric scoring.
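
## Scoring Sketch

The weighted totals and the 0.1 tie threshold described above can be computed mechanically. The following is a minimal sketch, not part of the rubric itself. The weights come from the Dimensions table. The dictionary keys, the function names `weighted_total` and `compare`, and the choice to treat a gap of exactly 0.1 as a tie are assumptions made for illustration.

```python
from typing import Dict

# Weights copied from the Dimensions table; the key names are hypothetical.
WEIGHTS: Dict[str, float] = {
    "behavioral_improvement": 0.35,
    "evidence_fidelity": 0.20,
    "activation_clarity": 0.15,
    "corpus_fit": 0.15,
    "simplicity": 0.15,
}

# "If totals are within 0.1": assumed here to include a gap of exactly 0.1.
TIE_MARGIN = 0.1


def weighted_total(scores: Dict[str, float]) -> float:
    """Return the weighted sum of per-dimension scores, each in the range 0 to 2."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, score in scores.items():
        if not 0 <= score <= 2:
            raise ValueError(f"{name} score {score} is outside the 0 to 2 range")
    # Round to suppress floating-point noise in the reported total.
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 4)


def compare(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> Dict[str, object]:
    """Score both candidates and pick a winner, or report a tie within the margin."""
    total_a = weighted_total(scores_a)
    total_b = weighted_total(scores_b)
    # The small epsilon keeps a gap of exactly 0.1 from failing on float error.
    if abs(total_a - total_b) <= TIE_MARGIN + 1e-9:
        winner = "tie"
    else:
        winner = "A" if total_a > total_b else "B"
    return {
        "candidate_a": {"weighted_total": total_a},
        "candidate_b": {"weighted_total": total_b},
        "winner": winner,
    }


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from any real comparison.
    a = dict.fromkeys(WEIGHTS, 2) | {"simplicity": 1}
    b = dict.fromkeys(WEIGHTS, 2) | {"behavioral_improvement": 1}
    print(compare(a, b))
```

A `"tie"` result here only signals that the totals fall within the margin. The ordered tie breakers and the human-review fallback still have to be applied by the reviewer, and the remaining Required Output fields (`path`, `strengths`, `risks`, `tie_breaker_used`, `decision_rationale`) are left for the reviewer to fill in.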