Source Registry
Use this registry to decide what an autonomous researcher should monitor, what evidence is admissible, and what should be rejected before spending evaluation tokens.
Priority Sources
| Tier | Source Class | Examples | Use |
|---|---|---|---|
| 1 | Peer-reviewed papers and major preprints | arXiv, OpenReview, conference proceedings | New mechanisms, benchmark results, ablations |
| 1 | AI lab engineering and research posts | OpenAI, Anthropic, DeepMind, Google Research, Meta, Microsoft, Cohere, Mistral, xAI | Production patterns, model behavior, agent architecture |
| 1 | Reproducible public code and benchmarks | GitHub repos, benchmark harnesses, leaderboards with logs | Harness design, validation methodology, implementation patterns |
| 2 | Infrastructure and agent product teams | Cursor, Vercel, LangChain, Cognition, Ramp, Prime Intellect, Modal, Browserbase | Operational lessons and system design patterns |
| 2 | Recognized practitioner deep dives | Maintainers, researchers, benchmark authors with public track record | Field reports and failure modes |
| 3 | Newsletters, summaries, podcasts, videos | Technical summaries with source links | Discovery leads only; evaluate primary sources before accepting |
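The tier table can be read as a triage rule: Tier 1 and Tier 2 classes may be evaluated as evidence, while Tier 3 items only point toward primary sources. The following is a minimal sketch of that reading. The short class keys and the `triage` function are hypothetical names chosen for illustration and are not defined by this registry.

```python
# Hypothetical short keys for the source classes listed in the tier table.
TIER_BY_SOURCE_CLASS = {
    "paper_or_preprint": 1,
    "lab_post": 1,
    "public_code_or_benchmark": 1,
    "infra_or_agent_product_team": 2,
    "practitioner_deep_dive": 2,
    "newsletter_or_summary": 3,
}


def triage(source_class: str) -> str:
    """Return how a source of the given class should be handled."""
    tier = TIER_BY_SOURCE_CLASS.get(source_class)
    if tier is None:
        # Not in the table: needs manual classification first.
        return "unclassified"
    if tier == 3:
        # Tier 3 is a discovery lead only; follow it to the primary source.
        return "discovery_lead"
    return "evidence_candidate"


print(triage("newsletter_or_summary"))
print(triage("paper_or_preprint"))
```

An `evidence_candidate` result only means the source may proceed to evaluation; it still has to pass the exclusion rules.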
Exclusion Rules
Reject or defer sources that match any of these patterns:
- Anonymous or unverifiable author with no primary evidence.
- Vendor marketing with no mechanism, artifact, metric, or reproducible claim.
- Basic tutorials that restate prompt engineering or RAG fundamentals.
- Claims based only on screenshots, demos, or private anecdotes without enough detail to implement.
- Content whose main insight is already covered in the repo without new evidence, failure modes, or implementation detail.
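One way to apply the patterns above before spending evaluation tokens is a cheap pre-filter that returns every rule a source matches. This is a sketch under assumptions: the `ScreeningFlags` fields and the reason strings are hypothetical, and the flags are assumed to be set by a reviewer or an upstream classifier, which the registry does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreeningFlags:
    """Hypothetical per-source flags, one group per exclusion pattern."""
    author_verifiable: bool
    has_primary_evidence: bool
    is_vendor_marketing: bool
    has_mechanism_artifact_or_metric: bool
    is_basic_tutorial: bool
    only_screenshots_demos_or_anecdotes: bool
    adds_beyond_repo: bool


def exclusion_reasons(flags: ScreeningFlags) -> List[str]:
    """Return the exclusion patterns a source matches; empty means it passes."""
    reasons = []
    if not flags.author_verifiable and not flags.has_primary_evidence:
        reasons.append("unverifiable_author_without_primary_evidence")
    if flags.is_vendor_marketing and not flags.has_mechanism_artifact_or_metric:
        reasons.append("marketing_without_substance")
    if flags.is_basic_tutorial:
        reasons.append("basic_tutorial")
    if flags.only_screenshots_demos_or_anecdotes:
        reasons.append("not_implementable")
    if not flags.adds_beyond_repo:
        reasons.append("already_covered_in_repo")
    return reasons


candidate = ScreeningFlags(
    author_verifiable=True,
    has_primary_evidence=True,
    is_vendor_marketing=True,
    has_mechanism_artifact_or_metric=False,
    is_basic_tutorial=False,
    only_screenshots_demos_or_anecdotes=False,
    adds_beyond_repo=True,
)
print(exclusion_reasons(candidate))
```

Returning all matched reasons, instead of stopping at the first, keeps the reject-or-defer decision auditable.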
Monitoring Queries
Use these query families when running web or paper discovery:
- context engineering agent systems tool design evaluation memory compression
- harness engineering AI agents eval harness agent loop scratchpad
- autonomous research agent self improving agents experiment loop
- LLM agent evaluation rubric source quality citation accuracy
- agent memory durable scratchpad file system state
- AlphaEvolve FunSearch autoresearch autonomous experimentation
- OpenAI Anthropic Cohere DeepMind agent engineering blog
Source Metadata
Every candidate source must record:
url: ""
title: ""
author_or_org: ""
published_at: ""
source_type: "paper | engineering_blog | documentation | benchmark | code | talk | other"
retrieval_status: "retrieved | partial | failed"
primary_or_secondary: "primary | secondary"
candidate_reason: ""
Refresh Cadence
- Weekly: lab blogs, arXiv/OpenReview, public benchmark repos, and active engineering blogs.
- Monthly: older source revalidation for volatile claims, especially model-specific thresholds and benchmark numbers.
- Before PR: re-fetch every cited source and confirm the evidence still supports the proposed skill change.
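The required metadata fields and the weekly and monthly cadences can be checked mechanically. The sketch below mirrors the field names and allowed values listed under Source Metadata. Everything else is an assumption made for illustration: the `CandidateSource`, `validation_errors`, and `is_due` names, the choice to treat an empty string as a missing field, and the 7-day and 30-day intervals, since the registry does not fix day counts.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta
from typing import List

# Allowed values copied from the Source Metadata template.
ALLOWED = {
    "source_type": {"paper", "engineering_blog", "documentation",
                    "benchmark", "code", "talk", "other"},
    "retrieval_status": {"retrieved", "partial", "failed"},
    "primary_or_secondary": {"primary", "secondary"},
}

# Assumed intervals; the registry only says "weekly" and "monthly".
CADENCE_DAYS = {"weekly": 7, "monthly": 30}


@dataclass
class CandidateSource:
    url: str
    title: str
    author_or_org: str
    published_at: str
    source_type: str
    retrieval_status: str
    primary_or_secondary: str
    candidate_reason: str


def validation_errors(src: CandidateSource) -> List[str]:
    """List empty fields and values outside the allowed sets, in field order."""
    errors = []
    for f in fields(src):
        value = getattr(src, f.name).strip()
        if not value:
            errors.append(f"{f.name} is empty")
        elif f.name in ALLOWED and value not in ALLOWED[f.name]:
            errors.append(f"{f.name} has unknown value {value!r}")
    return errors


def is_due(last_checked: date, cadence: str, today: date) -> bool:
    """True once the assumed interval for the cadence has elapsed."""
    return today - last_checked >= timedelta(days=CADENCE_DAYS[cadence])


src = CandidateSource(
    url="https://example.org/post",  # placeholder URL
    title="Example post",
    author_or_org="Example Org",
    published_at="2024-01-01",
    source_type="blog",              # not an allowed value
    retrieval_status="retrieved",
    primary_or_secondary="primary",
    candidate_reason="",
)
print(validation_errors(src))
print(is_due(date(2024, 1, 1), "weekly", date(2024, 1, 8)))
```

The "Before PR" rule is not modeled here because it is triggered by an event, not by elapsed time.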
Acceptance Biases To Avoid
- Do not accept a weak artifact because the organization is famous.
- Do not reject negative or failed experiments if they reveal a practical failure mode.
- Do not overvalue long reports. Length is not evidence; the target is a high density of implementable mechanisms.
- Do not accept benchmark claims without checking evaluation setup, baselines, and limitations.
- Do not treat secondary summaries as sources of truth when primary sources are available.