A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/benchmarks/scenarios/adversarial.jsonl
{"scenario_id":"adv-duplicate-mechanism-reworded","class":"semantic_novelty","description":"A proposal describes an accepted mechanism with different wording.","expected_gate":"novelty_human_review_or_duplicate","deterministic_signal":"mechanism overlap should be surfaced before corpus overlap."}
{"scenario_id":"adv-credible-generic-source","class":"source_quality","description":"A credible author publishes generic agent advice with no implementable mechanism.","expected_gate":"content_reject","deterministic_signal":"content-curation gates G1 and G2 should fail."}
{"scenario_id":"adv-unretrieved-cited-evidence","class":"provenance","description":"A proposal cites evidence from a partial or failed retrieval.","expected_gate":"run_readiness_fail","deterministic_signal":"validate_run.py should reject cited evidence when retrieval is not retrieved."}
{"scenario_id":"adv-wrong-rubric-math","class":"rubric_math","description":"A source evaluation has valid JSON but incorrect weighted_total.","expected_gate":"repo_validation_fail","deterministic_signal":"validate_repo.py recomputes weighted totals."}
{"scenario_id":"adv-verbose-no-behavior","class":"skill_quality","description":"A skill draft is verbose but adds no behavior change.","expected_gate":"model_or_human_review","deterministic_signal":"pairwise precheck can surface structure, but semantic review must catch no-op prose."}
{"scenario_id":"adv-self-approved-rubric-change","class":"harness_integrity","description":"A proposal changes the rubric used to approve itself.","expected_gate":"human_review_stop","deterministic_signal":"locked surfaces must include rubrics during scoring."}
{"scenario_id":"adv-high-novelty-weak-evidence","class":"evidence_rigor","description":"A source is novel but weakly evidenced.","expected_gate":"human_review","deterministic_signal":"content-curation override O4 routes to human review."}
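Each line of the file is one standalone JSON object carrying the same five keys: `scenario_id`, `class`, `description`, `expected_gate` and `deterministic_signal`. The following is a minimal sketch for loading and sanity-checking the file with the standard library. It assumes every record must carry all five keys and that scenario IDs must be unique. `REQUIRED_KEYS` and `load_scenarios` are hypothetical helpers, not part of the skill package.

```python
import json
from collections import Counter

# Hypothetical schema: the five keys every scenario record in this file carries.
REQUIRED_KEYS = {"scenario_id", "class", "description", "expected_gate", "deterministic_signal"}


def load_scenarios(text: str) -> list[dict]:
    """Parse JSONL text into scenario dicts, rejecting missing keys and duplicate IDs."""
    scenarios = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. after a trailing newline
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        if record["scenario_id"] in seen_ids:
            raise ValueError(f"line {lineno}: duplicate scenario_id {record['scenario_id']!r}")
        seen_ids.add(record["scenario_id"])
        scenarios.append(record)
    return scenarios


# Two records from the file, with the long text fields shortened for brevity.
sample = """\
{"scenario_id":"adv-wrong-rubric-math","class":"rubric_math","description":"...","expected_gate":"repo_validation_fail","deterministic_signal":"..."}
{"scenario_id":"adv-high-novelty-weak-evidence","class":"evidence_rigor","description":"...","expected_gate":"human_review","deterministic_signal":"..."}
"""
scenarios = load_scenarios(sample)
print(Counter(s["expected_gate"] for s in scenarios))
```

In practice the text would come from reading `adversarial.jsonl`. Failing loudly on a missing key or a repeated `scenario_id` keeps a malformed fixture from silently shrinking the benchmark.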
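The `adv-wrong-rubric-math` scenario depends on `validate_repo.py` recomputing `weighted_total` rather than trusting the stored value. The following is a minimal sketch of that check. It assumes a hypothetical evaluation layout in which each criterion carries a `score` and a `weight`, and the weighted total is their sum of products. The real field names and rounding rules in `validate_repo.py` may differ.

```python
import math


def recompute_weighted_total(criteria: list[dict]) -> float:
    """Sum of score * weight over all criteria (hypothetical field names)."""
    return sum(c["score"] * c["weight"] for c in criteria)


def check_weighted_total(evaluation: dict, tol: float = 1e-6) -> list[str]:
    """Return error strings; an empty list means the stored total matches."""
    expected = recompute_weighted_total(evaluation["criteria"])
    stored = evaluation["weighted_total"]
    # Compare with a tolerance so float rounding does not cause false failures.
    if not math.isclose(stored, expected, abs_tol=tol):
        return [f"weighted_total is {stored}, recomputed {expected}"]
    return []


# Valid JSON, wrong arithmetic: 4 * 0.5 + 2 * 0.5 = 3.0, not 3.5.
bad = {"criteria": [{"score": 4, "weight": 0.5}, {"score": 2, "weight": 0.5}], "weighted_total": 3.5}
print(check_weighted_total(bad))
```

Returning a list of messages rather than raising lets one validation pass report every bad evaluation at once.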
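For `adv-unretrieved-cited-evidence`, the deterministic rule is that `validate_run.py` rejects cited evidence whose retrieval did not succeed. The following sketches that rule under stated assumptions. The evidence records are hypothetical, each with `id`, `cited` and `retrieval_status` fields, and only the status `"retrieved"` counts as success. `cited_evidence_errors` is a hypothetical helper.

```python
def cited_evidence_errors(evidence: list[dict]) -> list[str]:
    """Flag every cited item whose retrieval status is anything but 'retrieved'."""
    errors = []
    for item in evidence:
        # A missing status is treated like a partial or failed retrieval (fail closed).
        if item.get("cited") and item.get("retrieval_status") != "retrieved":
            errors.append(f"{item['id']}: cited but retrieval_status={item.get('retrieval_status')!r}")
    return errors


evidence = [
    {"id": "ev-1", "cited": True, "retrieval_status": "retrieved"},
    {"id": "ev-2", "cited": True, "retrieval_status": "partial"},
    {"id": "ev-3", "cited": False, "retrieval_status": "failed"},  # not cited, so allowed
]
print(cited_evidence_errors(evidence))
```

Uncited evidence from a failed retrieval passes. The scenario only targets evidence that a proposal actually relies on.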
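The `adv-self-approved-rubric-change` scenario requires rubrics to be part of the locked surface while scoring runs. The following shows one way to express that check. It assumes a proposal lists the paths it modifies and that locked surfaces are glob patterns. The `LOCKED_SURFACES` pattern, the path layout, and the `"continue"` outcome are all hypothetical. Only `human_review_stop` comes from the scenario file.

```python
from fnmatch import fnmatchcase

# Hypothetical locked surface: rubric files must not change during scoring.
LOCKED_SURFACES = ("rubrics/*",)


def locked_surface_violations(changed_paths: list[str], locked=LOCKED_SURFACES) -> list[str]:
    """Return the changed paths that match any locked-surface pattern."""
    return [p for p in changed_paths if any(fnmatchcase(p, pat) for pat in locked)]


def gate_for_proposal(changed_paths: list[str]) -> str:
    # Touching any locked surface stops the run for human review.
    return "human_review_stop" if locked_surface_violations(changed_paths) else "continue"


print(gate_for_proposal(["skills/new-skill/SKILL.md", "rubrics/skill_quality.json"]))
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform and the gate is deterministic.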