A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/mechanisms/registry.jsonl
{"mechanism_id":"locked-editable-surfaces","owning_skill":"harness-engineering","status":"accepted","activation_scenario":"An autonomous agent is allowed to change one artifact while being scored by an external evaluator or rubric.","behavior_change":"Classify surfaces as locked, editable, append-only, or human-controlled before the loop starts; prevent the agent from using changed evaluators to approve its own work.","evidence":["https://github.com/karpathy/autoresearch/blob/master/program.md","researcher/rubrics/harness-change.md"],"failure_modes":["metric gaming","self-approved rubric changes","unreviewable harness mutation"]}
{"mechanism_id":"durable-research-thread","owning_skill":"filesystem-context","status":"accepted","activation_scenario":"A research or implementation loop may run past a context window, be resumed by another agent, or require auditability.","behavior_change":"Create a run directory with THREAD.md, source queue, evaluations, proposals, reports, and append-only logs before executing the loop.","evidence":["https://www.primeintellect.ai/auto-nanogpt","researcher/templates/research-thread.md"],"failure_modes":["lost state after compaction","repeated failed ideas","monitoring without evidence"]}
{"mechanism_id":"deterministic-first-validation","owning_skill":"evaluation","status":"accepted","activation_scenario":"A skill, source evaluation, manifest, or run artifact can be checked without model judgment.","behavior_change":"Run deterministic checks for structure, schemas, line caps, manifest sync, retrieval status, and rubric math before invoking LLM judges.","evidence":["researcher/scripts/validate_repo.py","researcher/rubrics/content-curation.md"],"failure_modes":["LLM judge laundering invalid artifacts","schema drift","APPROVE decisions with partial evidence"]}
{"mechanism_id":"structured-novelty-gate","owning_skill":"harness-engineering","status":"accepted","activation_scenario":"A proposed skill change may duplicate existing corpus guidance or prior run artifacts.","behavior_change":"Compare mechanism summary, activation scenario, behavior change, and failure modes against the mechanism registry before drafting or publishing a skill change.","evidence":["researcher/scripts/novelty_check.py","researcher/rubrics/pairwise-skill-revision.md"],"failure_modes":["false novelty","keyword-based duplicate detection","proposal boilerplate blocking useful changes"]}
{"mechanism_id":"pairwise-skill-revision","owning_skill":"advanced-evaluation","status":"accepted","activation_scenario":"Two candidate skill revisions target the same source evidence or activation scenario.","behavior_change":"Run deterministic structure checks, then compare candidates with a pairwise rubric using behavior improvement, evidence fidelity, activation clarity, corpus fit, and simplicity tie-breakers.","evidence":["researcher/rubrics/pairwise-skill-revision.md","researcher/scripts/compare_skill_revisions.py"],"failure_modes":["verbose candidate wins by length bias","uncalibrated skill revision choice","duplicate skill boundary"]}
{"mechanism_id":"world-state-grounded-bdi-chain","owning_skill":"bdi-mental-states","status":"accepted","activation_scenario":"An agent must explain why an external RDF world state became a belief, desire, intention, and plan.","behavior_change":"Model world states, beliefs, desires, intentions, justifications, validity intervals, and plans as linked entities before reasoning or projecting results back to RDF.","evidence":["skills/bdi-mental-states/SKILL.md"],"failure_modes":["ungrounded beliefs","opaque intention selection","stale mental states"]}
{"mechanism_id":"anchored-iterative-summary","owning_skill":"context-compression","status":"accepted","activation_scenario":"A long-running agent session needs compaction without losing files, decisions, risks, and next actions.","behavior_change":"Maintain a structured summary and merge only newly truncated content into existing sections instead of regenerating the whole summary each time.","evidence":["skills/context-compression/SKILL.md","claim-context-compression-factory-benchmark"],"failure_modes":["summary drift","lost artifact trail","re-fetching previously known files"]}
{"mechanism_id":"context-poisoning-circuit-breaker","owning_skill":"context-degradation","status":"accepted","activation_scenario":"A false claim, tool error, or hallucinated fact enters context and starts recurring in later reasoning.","behavior_change":"Identify the poisoning point, truncate or restart before it, reload verified context only, and record rejected provenance rather than layering corrections over the bad context.","evidence":["skills/context-degradation/SKILL.md","docs/claude_research.md"],"failure_modes":["correction stacking","persistent hallucination","tool misalignment"]}
{"mechanism_id":"progressive-disclosure-loading","owning_skill":"context-fundamentals","status":"accepted","activation_scenario":"A corpus, tool catalog, or documentation set is too large to load fully by default.","behavior_change":"Load names, descriptions, or indexes first; load full artifacts only when activation conditions match the current task.","evidence":["skills/context-fundamentals/SKILL.md","template/SKILL.md"],"failure_modes":["context stuffing","adjacent-skill overactivation","lost signal density"]}
{"mechanism_id":"retrievable-observation-masking","owning_skill":"context-optimization","status":"accepted","activation_scenario":"Verbose tool outputs dominate the active context but may need to be retrieved later.","behavior_change":"Replace resolved outputs with compact references and summaries while preserving full content in a retrievable external store.","evidence":["skills/context-optimization/SKILL.md","claim-context-optimization-tool-output-dominance"],"failure_modes":["masked active errors","unretrievable references","stale summaries treated as ground truth"]}
{"mechanism_id":"prebuilt-warm-sandbox-pool","owning_skill":"hosted-agents","status":"accepted","activation_scenario":"Hosted agents must start background coding sessions without cold-start setup dominating user-perceived latency.","behavior_change":"Pre-build environment images, keep warm sandboxes available, sync branch deltas at session start, and snapshot session state before teardown.","evidence":["skills/hosted-agents/SKILL.md"],"failure_modes":["cold-start abandonment","stale images","lost sandbox work"]}
{"mechanism_id":"task-guided-kv-cache-compaction","owning_skill":"latent-briefing","status":"accepted","activation_scenario":"A worker needs task-relevant orchestrator trajectory state without replaying the full trajectory as text.","behavior_change":"Score trajectory KV positions using the worker task prompt, retain a shared global token mask, and tune compaction against task accuracy and token savings.","evidence":["skills/latent-briefing/SKILL.md","claim-latent-briefing-public-results"],"failure_modes":["inaccessible KV tensors","cross-model latent mismatch","accuracy cliff from aggressive thresholds"]}
{"mechanism_id":"shallowest-viable-memory-layer","owning_skill":"memory-systems","status":"accepted","activation_scenario":"An agent needs persistent memory but the required retrieval semantics are not yet proven.","behavior_change":"Start with the simplest persistent layer that satisfies retrieval needs, then add vector, graph, temporal, or hybrid memory only when measured retrieval quality requires it.","evidence":["skills/memory-systems/SKILL.md","claim-memory-locomo-filesystem-baseline"],"failure_modes":["premature graph complexity","memory-context mismatch","stale fact poisoning"]}
{"mechanism_id":"context-isolation-agent-partitioning","owning_skill":"multi-agent-patterns","status":"accepted","activation_scenario":"A task exceeds one agent's useful context or decomposes into independent subtasks.","behavior_change":"Partition work across agents with isolated contexts, explicit handoffs, validation checkpoints, and measured coordination overhead.","evidence":["skills/multi-agent-patterns/SKILL.md","claim-multi-agent-token-multiplier"],"failure_modes":["supervisor bottleneck","agent sprawl","telephone-game summarization"]}
{"mechanism_id":"filesystem-state-machine-pipeline","owning_skill":"project-development","status":"accepted","activation_scenario":"An LLM project processes many items through repeatable stages and needs resumability, debugging, and selective reruns.","behavior_change":"Represent stage completion with files in per-item directories so acquire, prepare, process, parse, and render stages are idempotent and cacheable.","evidence":["skills/project-development/SKILL.md"],"failure_modes":["monolithic reruns","uninspectable intermediate state","duplicate processing"]}
{"mechanism_id":"tool-contract-description","owning_skill":"tool-design","status":"accepted","activation_scenario":"An agent must choose and call tools from descriptions without human documentation lookup.","behavior_change":"Write tool descriptions that specify purpose, activation conditions, parameters, return shape, and actionable recovery errors as an executable contract.","evidence":["skills/tool-design/SKILL.md"],"failure_modes":["wrong tool selection","malformed calls","unrecoverable errors"]}
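The `deterministic-first-validation` entry asks for schema checks before any LLM judge sees an artifact. The following is a minimal sketch of such a check for this registry. It assumes Python 3.9+ and assumes that every record must carry the seven fields used in the file, with non-empty `evidence` and `failure_modes` lists. The `ALLOWED_STATUSES` vocabulary and the function names `validate_record` and `validate_registry` are hypothetical. This is not the contents of `researcher/scripts/validate_repo.py`.

```python
import json
from typing import Any

# Field names mirror the registry records; treating all of them as
# required is an assumption of this sketch.
REQUIRED_STRING_FIELDS = (
    "mechanism_id",
    "owning_skill",
    "status",
    "activation_scenario",
    "behavior_change",
)
REQUIRED_LIST_FIELDS = ("evidence", "failure_modes")
# Hypothetical status vocabulary; only "accepted" appears in the file.
ALLOWED_STATUSES = {"proposed", "accepted", "rejected"}


def validate_record(record: Any) -> list[str]:
    """Return schema problems for one parsed registry record (empty if valid)."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    errors = []
    for field in REQUIRED_STRING_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field}: expected non-empty string")
    for field in REQUIRED_LIST_FIELDS:
        value = record.get(field)
        if not isinstance(value, list) or not value:
            errors.append(f"{field}: expected non-empty list")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            errors.append(f"{field}: every entry must be a non-empty string")
    status = record.get("status")
    if isinstance(status, str) and status.strip() and status not in ALLOWED_STATUSES:
        errors.append(f"status: unknown value {status!r}")
    return errors


def validate_registry(text: str) -> list[str]:
    """Validate JSONL text line by line; each problem is prefixed with its line number."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        problems.extend(f"line {lineno}: {error}" for error in validate_record(record))
        # Duplicate IDs are a cross-record check that a per-record schema cannot catch.
        mechanism_id = record.get("mechanism_id") if isinstance(record, dict) else None
        if isinstance(mechanism_id, str):
            if mechanism_id in seen_ids:
                problems.append(f"line {lineno}: duplicate mechanism_id {mechanism_id!r}")
            seen_ids.add(mechanism_id)
    return problems
```

From the repository root, `validate_registry(Path("researcher/mechanisms/registry.jsonl").read_text())` (with `from pathlib import Path`) should return an empty list for the sixteen records above under these assumptions. Because every problem carries its line number, a malformed record can be fixed before it reaches a rubric or a judge.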