Autonomous Research Loop
This runbook defines how an agent should operate when asked to find research and turn it into repo changes.
Setup
- Create a run ID with `python researcher/scripts/research_loop.py init --title "..." --url "..."`.
- Read `../source-registry.md` and select source classes for the mission.
- Read `../mechanisms/registry.jsonl` to understand accepted mechanisms before claiming novelty.
- Read the relevant rubrics before evaluating anything.
- Declare locked surfaces: rubrics, manifests, mechanism registry, and merge policy are not editable during scoring.
- Declare editable surfaces: evaluations, proposals, drafts, run-local mechanism proposals, and append-only logs.
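The locked and editable surfaces can be enforced with a deny-by-default path check. The following is a minimal sketch under stated assumptions: the directory names in both tuples are hypothetical, since this runbook names the surface categories but not their paths, and `is_editable` is an illustrative helper, not part of `research_loop.py`.

```python
from pathlib import PurePosixPath

# Hypothetical paths: the runbook lists surface categories, not directory names.
LOCKED_PREFIXES = ("rubrics/", "manifests/", "mechanisms/registry.jsonl", "merge-policy.md")
EDITABLE_PREFIXES = ("evaluations/", "proposals/", "drafts/", "mechanism-proposals/", "logs/")


def is_editable(relative_path: str) -> bool:
    """Return True only for paths on an editable surface; everything else is denied."""
    candidate = PurePosixPath(relative_path)
    # Refuse absolute paths and parent traversal so a path cannot escape its surface.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    path = "/".join(candidate.parts)
    if any(path.startswith(prefix) for prefix in LOCKED_PREFIXES):
        return False
    return any(path.startswith(prefix) for prefix in EDITABLE_PREFIXES)
```

Denying by default means a path that matches neither list, such as a file at the repository root, is treated as locked until someone declares it editable.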
Loop
Repeat until the source queue is empty or the human stops the run:
- Discover candidates from the source registry.
- Fetch primary sources whenever available and record them with `research_loop.py retrieve`.
- Record retrieval status before evaluating.
- Apply `../rubrics/content-curation.md`.
- Reject failed gates immediately and log why.
- For approved or reviewed sources, extract mechanisms and artifacts into the proposal.
- Apply `../rubrics/skill-change.md` or `../rubrics/harness-change.md`.
- Draft a proposal with `../templates/skill-proposal.md` and any mechanism proposals with `../templates/mechanism-proposal.jsonl`.
- Run `python researcher/scripts/research_loop.py novelty --run-dir <run>` before changing published skills; registry overlap is the primary duplicate signal.
- If multiple drafts compete, apply `../rubrics/pairwise-skill-revision.md` and run `compare_skill_revisions.py`.
- If the proposal passes, prepare repo changes in normal repo structure.
- Run deterministic repo and run-readiness validation and record results.
- Prepare PR summary and test plan, but do not merge.
- Close the run with `accepted`, `rejected`, `reference-only`, or `abandoned` rationale.
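The closing step allows exactly four outcomes and expects a rationale with each. One way to make that hard to get wrong is to validate both at the point of closure. This is a sketch only: `RunOutcome`, `RunClosure`, and `close_run` are hypothetical names, and how `research_loop.py` actually records closure is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class RunOutcome(str, Enum):
    """The four closing outcomes named in the loop."""
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    REFERENCE_ONLY = "reference-only"
    ABANDONED = "abandoned"


@dataclass(frozen=True)
class RunClosure:
    run_id: str
    outcome: RunOutcome
    rationale: str

    def __post_init__(self) -> None:
        # A closure without a stated reason is rejected outright.
        if not self.rationale.strip():
            raise ValueError("a closing rationale is required")


def close_run(run_id: str, outcome: str, rationale: str) -> RunClosure:
    # RunOutcome(outcome) raises ValueError for any value outside the four above.
    return RunClosure(run_id, RunOutcome(outcome), rationale)
```

For example, `close_run("run-001", "reference-only", "Useful background, no behavior change.")` succeeds, while an outcome such as `"merged"` or an empty rationale raises `ValueError`.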
Novelty And Refresh Rules
- Before drafting a new skill, compare against accepted mechanisms and existing skill boundaries.
- Use `novelty_check.py` as a fast mechanism-overlap gate, then apply human or LLM judgment for semantic novelty.
- For long-running runs, refresh upstream sources before finalizing a proposal.
- Preserve rejected ideas so future agents do not rediscover the same failed path.
- Require a pruning pass when a proposal adds multiple rules or concepts. Remove any piece that does not change behavior.
- Store raw source exports under the run's `sources/evidence/raw/` directory, never at the repository root.
- Promote accepted or candidate mechanisms only through `research_loop.py promote-mechanisms` after run readiness and recorded human review.
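To illustrate what a fast mechanism-overlap gate might look like, the sketch below scores a proposal against a JSONL registry using Jaccard similarity over keywords. It is not the implementation of `novelty_check.py`: the `id` and `keywords` fields, the similarity measure, and the `0.5` threshold are all assumptions chosen for the example.

```python
import json


def load_registry(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_mechanisms(proposal_keywords, registry, threshold=0.5):
    """Return (mechanism id, score) pairs at or above the threshold, highest first."""
    proposed = {k.lower() for k in proposal_keywords}
    hits = []
    for entry in registry:
        # Assumed schema: each registry entry has "id" and "keywords".
        score = jaccard(proposed, {k.lower() for k in entry["keywords"]})
        if score >= threshold:
            hits.append((entry["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A keyword gate like this only catches lexical overlap. A mechanism described in different words passes it, which is why the semantic judgment step still follows.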
Failure Handling
| Failure | Action |
|---|---|
| Source fetch fails | Retry once with an alternate URL, then record the retrieval status as partial or failed |
| JSON evaluation invalid | Save raw output and route to human review |
| Evidence weak but relevant | Route to human review, do not publish automatically |
| Skill draft exceeds 500 lines | Move detail to references or reject the draft |
| Manifest sync uncertain | Stop and request human review before PR |
| Conflicting sources | Record both claims and prefer no published change until resolved |
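The first row of the table, retry once with an alternate URL, can be sketched as below. The fetcher is passed in as a callable so the example does not assume any particular HTTP library. `fetch_with_fallback` and the shape of the returned record are hypothetical. The sketch distinguishes only `retrieved` from `failed`; deciding when a result counts as `partial` (for example, truncated content) is left to the caller.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    primary_url: str,
    alternate_url: Optional[str],
    fetch: Callable[[str], str],
) -> dict:
    """Try the primary URL, then the alternate at most once, and report what happened."""
    errors = []
    for url in (primary_url, alternate_url):
        if url is None:
            continue
        try:
            body = fetch(url)
        except Exception as exc:  # broad on purpose: any fetch failure counts
            errors.append({"url": url, "error": str(exc)})
            continue
        return {
            "status": "retrieved",
            "url": url,
            "used_alternate": url != primary_url,
            "body": body,
            "errors": errors,
        }
    # Both attempts failed (or no alternate was given): no further retries.
    return {"status": "failed", "url": None, "used_alternate": False, "body": None, "errors": errors}
```

Keeping the error list in the record even on success preserves the evidence that the primary source was unavailable at retrieval time.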
PR Preparation Policy
Agents may prepare PRs only after:
- The content rubric and the applicable skill or harness rubric pass.
- Deterministic checks pass.
- Every source cited in the change was retrieved.
- The PR body includes unresolved risks.
- The PR states that merge requires human approval.
The user rule remains binding: do not push anything to GitHub without explicit approval.
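The preconditions above lend themselves to a single check that returns every unmet condition, not just the first. The following is a minimal sketch: `PrReadiness` and `pr_blockers` are hypothetical names, and the two phrases searched for in the PR body are assumed markers, since the policy requires the content but not any exact wording.

```python
from dataclasses import dataclass


@dataclass
class PrReadiness:
    rubrics_passed: bool
    deterministic_checks_passed: bool
    cited_sources: set[str]
    retrieved_sources: set[str]
    pr_body: str


def pr_blockers(state: PrReadiness) -> list[str]:
    """Return the reasons a PR must not be prepared yet; an empty list means ready."""
    found = []
    if not state.rubrics_passed:
        found.append("rubrics have not passed")
    if not state.deterministic_checks_passed:
        found.append("deterministic checks have not passed")
    # Every cited source must appear among the retrieved ones.
    missing = sorted(state.cited_sources - state.retrieved_sources)
    if missing:
        found.append("cited sources not retrieved: " + ", ".join(missing))
    body = state.pr_body.lower()
    # Assumed markers; adapt to whatever wording the PR template actually uses.
    if "unresolved risks" not in body:
        found.append("PR body does not list unresolved risks")
    if "merge requires human approval" not in body:
        found.append("PR body does not state that merge requires human approval")
    return found
```

An empty result means only that the PR may be prepared. It does not authorize pushing or merging.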
Handover
Before context compaction, interruption, or model handoff, update the run thread with:
- Current best candidate.
- Evaluations completed and their file paths.
- Rejected candidates and reasons.
- Open risks.
- Next action.
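The five handover items can be captured as one structured record so that none is silently omitted. This is a sketch under assumptions: `HandoverNote` and `handover_line` are hypothetical, and serializing to a single JSON line is one possible format for the run thread, not a format this runbook prescribes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HandoverNote:
    current_best_candidate: str
    evaluations: dict[str, str]          # evaluation name -> file path
    rejected_candidates: dict[str, str]  # candidate -> reason for rejection
    open_risks: list[str]
    next_action: str


def handover_line(note: HandoverNote) -> str:
    """Serialize the note as one JSON line suitable for an append-only log."""
    # A handover without a next action leaves the successor with nothing to do.
    if not note.next_action.strip():
        raise ValueError("next_action must not be empty")
    return json.dumps(asdict(note), sort_keys=True)
```

Because every field is required by the dataclass constructor, leaving one out fails at construction time, before the note is ever written.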