Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/README.md
# Researcher Operating System

This directory defines the repo-native workflow for turning external research into skill changes. It is intentionally file-based so agents can inspect, resume, and audit work without requiring a hosted scheduler.

## Mission

Maintain this repository as the source of truth for context engineering and harness engineering by continuously:

1. Discovering credible papers, engineering posts, benchmark reports, and lab notes.
2. Evaluating sources against explicit rubrics.
3. Extracting implementable mechanisms, not generic takeaways.
4. Mapping mechanisms to new skills, existing skill updates, or reference-only notes.
5. Preparing reviewable PRs after gates pass.

Agents may prepare branches and PR content after passing gates, but humans decide what merges. No workflow in this directory authorizes auto-merge.

## Lifecycle

```text
discover -> triage -> evaluate -> extract -> map -> draft -> validate -> prepare-pr -> human-merge
```

| Stage | Output |
| --- | --- |
| Discover | Candidate source with URL, author, date, and why it matters |
| Triage | Source class and exclusion check from `source-registry.md` |
| Evaluate | JSON matching `templates/source-evaluation.json` |
| Extract | Mechanisms, artifacts, evidence, and failure modes |
| Map | Skill proposal using `templates/skill-proposal.md` |
| Draft | Skill or reference changes in normal repo structure |
| Validate | Rubric scores plus deterministic structure checks |
| Prepare PR | PR-ready summary, test plan, and unresolved review notes |

## Directory Map

- `source-registry.md` - source classes, priorities, and rejection rules.
- `mechanisms/registry.jsonl` - accepted mechanisms used for novelty and skill-delta checks.
- `mechanisms/ledgers/` - append-only accepted and rejected mechanism promotion events.
- `claims/index.jsonl` - provenance for volatile, numeric, or benchmark claims.
- `corpus/index.json` - machine-readable map of skills, mechanisms, claims, and activation scenarios.
- `benchmarks/` - adversarial scenarios and goldens for the researcher harness.
- `rubrics/content-curation.md` - gates for accepting external content.
- `rubrics/skill-change.md` - gates for changing skills.
- `rubrics/harness-change.md` - gates for changing research or evaluation harnesses.
- `rubrics/pairwise-skill-revision.md` - comparison rubric for competing skill drafts.
- `templates/source-evaluation.json` - machine-readable evaluation shape.
- `templates/skill-proposal.md` - source-to-skill delta proposal format.
- `templates/mechanism-proposal.jsonl` - run-local mechanism promotion proposal format.
- `templates/research-thread.md` - durable thread log for long-running agents.
- `runbooks/autonomous-research-loop.md` - operating loop for autonomous researchers.
- `runbooks/pr-readiness.md` - pre-PR checklist.
- `scripts/validate_repo.py` - deterministic repository and harness validator.
- `scripts/validate_run.py` - publish-readiness validator for a single research run.
- `scripts/research_loop.py` - creates durable run directories and validation reports.
- `scripts/novelty_check.py` - checks proposal overlap against existing skills and prior runs.
- `scripts/compare_skill_revisions.py` - deterministic pre-check for pairwise skill revisions.
- `scripts/check_activation_cases.py` - deterministic activation-boundary regression checks.
- `scripts/run_benchmarks.py` - deterministic benchmark harness with optional history recording.

## Governance Rules

1. Keep rubrics harder to change than outputs. A source cannot relax the rubric used to admit it.
2. Cite only retrieved sources. If a source failed to load, record the failure and do not cite it as evidence.
3. Separate source quality from skill quality. A strong paper may still produce no actionable skill delta.
4. Prefer updating existing skills over adding new ones unless the activation scenario, mechanism, and operating procedure are distinct.
5. Require human review when evidence is anecdotal, source claims are volatile, or a skill change affects repo-wide guidance.
6. Keep all generated skill changes aligned with `template/SKILL.md`, the 500-line cap, and manifest sync rules.

## Local Commands

```bash
python researcher/scripts/validate_repo.py
python researcher/scripts/validate_run.py --run-dir researcher/runs/<run-id>
python researcher/scripts/research_loop.py init --title "Source title" --url "https://example.com/source"
python researcher/scripts/novelty_check.py --file researcher/fixtures/skill-proposals/harness-engineering-proposal.md
python researcher/scripts/compare_skill_revisions.py skills/evaluation/SKILL.md skills/advanced-evaluation/SKILL.md
python researcher/scripts/check_activation_cases.py
python researcher/scripts/run_benchmarks.py
```

## Current Published Research Skills

The first published skill from this operating system is `harness-engineering`. Skill evolution remains internal to this directory until the process has enough examples and validation data to justify a standalone published skill.
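
## Illustrative Sketch: Lifecycle Gating

As a rough illustration of how the lifecycle order and the no-auto-merge rule in this README fit together, the sketch below models the stages as an ordered list and refuses to let a non-human actor advance a run into `human-merge`. It is not taken from `scripts/research_loop.py` or any other script in this directory. The stage names come from the lifecycle diagram, while `STAGES`, `next_stage`, and the `actor` parameter are hypothetical names introduced only for this example.

```python
# Hypothetical sketch, not the repository's implementation.
# Stage names mirror the lifecycle diagram in this README; everything else
# (STAGES, next_stage, the "actor" argument) is an illustrative assumption.
STAGES = [
    "discover",
    "triage",
    "evaluate",
    "extract",
    "map",
    "draft",
    "validate",
    "prepare-pr",
    "human-merge",
]


def next_stage(current: str, actor: str = "agent") -> str:
    """Return the stage after `current`, blocking agent-driven merges."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if index == len(STAGES) - 1:
        raise ValueError(f"{current!r} is the final stage")
    following = STAGES[index + 1]
    # Agents may prepare PRs, but only a human may move a run to the merge stage.
    if following == "human-merge" and actor != "human":
        raise PermissionError("only a human may advance a run to human-merge")
    return following
```

Under these assumptions, `next_stage("validate")` yields `"prepare-pr"` for any actor, while `next_stage("prepare-pr")` raises `PermissionError` unless called with `actor="human"`. Keeping the merge decision behind an explicit actor check is one way to make the "humans decide what merges" rule hard to bypass by accident.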