Pairwise Skill Revision Rubric

Use this rubric when comparing two candidate revisions of the same skill or two competing new-skill drafts.

Preconditions

Both candidates use the same source evidence.
Both candidates target the same activation scenario.
Both candidates pass deterministic structure checks.
Neither candidate changes the rubric used to compare it.

Dimensions

Score each candidate independently from 0 to 2, then compare.

Dimension	Weight	Score 2
Behavioral Improvement	35%	The candidate gives future agents clearer actions, decisions, or recovery paths
Evidence Fidelity	20%	Claims are grounded in retrieved sources and do not overstate evidence
Activation Clarity	15%	The description and When to Activate section route the skill cleanly
Corpus Fit	15%	The candidate avoids duplication and respects related skill boundaries
Simplicity	15%	The candidate achieves the improvement with fewer concepts, fewer lines, and less maintenance burden

Tie Breakers

If totals are within 0.1:

Prefer the simpler candidate.
Prefer the candidate with fewer volatile claims in SKILL.md.
Prefer the candidate that updates an existing skill over adding a new one.
Prefer the candidate with clearer gotchas.
Route to human review if the tie remains.

Required Output

candidate_a:
  path: ""
  weighted_total: 0.0
  strengths: []
  risks: []
candidate_b:
  path: ""
  weighted_total: 0.0
  strengths: []
  risks: []
winner: "A | B | tie | human_review"
tie_breaker_used: ""
decision_rationale: ""

Failure Modes

Verbose candidate wins by judge bias: Penalize irrelevant detail under Simplicity.
Evidence drift: Reject claims that do not map back to retrieved sources.
False novelty: Compare against existing skills before scoring either candidate.
Prompt-only comparison: Run deterministic structure checks before rubric scoring.

Preparing the source view

Agent Skills for Context Engineering

researcher/rubrics/pairwise-skill-revision.md

Pairwise Skill Revision Rubric

Preconditions

Dimensions

Tie Breakers

Required Output

Failure Modes