Researcher Operating System
This directory defines the repo-native workflow for turning external research into skill changes. It is intentionally file-based so agents can inspect, resume, and audit work without requiring a hosted scheduler.
Mission
Maintain this repository as the source of truth for context engineering and harness engineering by continuously:
- Discovering credible papers, engineering posts, benchmark reports, and lab notes.
- Evaluating sources against explicit rubrics.
- Extracting implementable mechanisms, not generic takeaways.
- Mapping mechanisms to new skills, existing skill updates, or reference-only notes.
- Preparing reviewable PRs after gates pass.
Agents may prepare branches and PR content after passing gates, but humans decide what merges. No workflow in this directory authorizes auto-merge.
Lifecycle
discover -> triage -> evaluate -> extract -> map -> draft -> validate -> prepare-pr -> human-merge| Stage | Output |
|---|---|
| Discover | Candidate source with URL, author, date, and why it matters |
| Triage | Source class and exclusion check from source-registry.md |
| Evaluate | JSON matching templates/source-evaluation.json |
| Extract | Mechanisms, artifacts, evidence, and failure modes |
| Map | Skill proposal using templates/skill-proposal.md |
| Draft | Skill or reference changes in normal repo structure |
| Validate | Rubric scores plus deterministic structure checks |
| Prepare PR | PR-ready summary, test plan, and unresolved review notes |
Directory Map
source-registry.md- source classes, priorities, and rejection rules.mechanisms/registry.jsonl- accepted mechanisms used for novelty and skill-delta checks.mechanisms/ledgers/- append-only accepted and rejected mechanism promotion events.claims/index.jsonl- provenance for volatile, numeric, or benchmark claims.corpus/index.json- machine-readable map of skills, mechanisms, claims, and activation scenarios.benchmarks/- adversarial scenarios and goldens for the researcher harness.rubrics/content-curation.md- gates for accepting external content.rubrics/skill-change.md- gates for changing skills.rubrics/harness-change.md- gates for changing research or evaluation harnesses.rubrics/pairwise-skill-revision.md- comparison rubric for competing skill drafts.templates/source-evaluation.json- machine-readable evaluation shape.templates/skill-proposal.md- source-to-skill delta proposal format.templates/mechanism-proposal.jsonl- run-local mechanism promotion proposal format.templates/research-thread.md- durable thread log for long-running agents.runbooks/autonomous-research-loop.md- operating loop for autonomous researchers.runbooks/pr-readiness.md- pre-PR checklist.scripts/validate_repo.py- deterministic repository and harness validator.scripts/validate_run.py- publish-readiness validator for a single research run.scripts/research_loop.py- creates durable run directories and validation reports.scripts/novelty_check.py- checks proposal overlap against existing skills and prior runs.scripts/compare_skill_revisions.py- deterministic pre-check for pairwise skill revisions.scripts/check_activation_cases.py- deterministic activation-boundary regression checks.scripts/run_benchmarks.py- deterministic benchmark harness with optional history recording.
Governance Rules
- Keep rubrics harder to change than outputs. A source cannot relax the rubric used to admit it.
- Cite only retrieved sources. If a source failed to load, record the failure and do not cite it as evidence.
- Separate source quality from skill quality. A strong paper may still produce no actionable skill delta.
- Prefer updating existing skills over adding new ones unless the activation scenario, mechanism, and operating procedure are distinct.
- Require human review when evidence is anecdotal, source claims are volatile, or a skill change affects repo-wide guidance.
- Keep all generated skill changes aligned with
template/SKILL.md, the 500-line cap, and manifest sync rules.
Local Commands
python researcher/scripts/validate_repo.py
python researcher/scripts/validate_run.py --run-dir researcher/runs/<run-id>
python researcher/scripts/research_loop.py init --title "Source title" --url "https://example.com/source"
python researcher/scripts/novelty_check.py --file researcher/fixtures/skill-proposals/harness-engineering-proposal.md
python researcher/scripts/compare_skill_revisions.py skills/evaluation/SKILL.md skills/advanced-evaluation/SKILL.md
python researcher/scripts/check_activation_cases.py
python researcher/scripts/run_benchmarks.pyCurrent Published Research Skills
The first published skill from this operating system is harness-engineering. Skill evolution remains internal to this directory until the process has enough examples and validation data to justify a standalone published skill.