A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Agent Skills for Context Engineering: an open collection of 15 Agent Skills teaching context engineering and harness engineering principles for production AI agent systems. Skills are platform-agnostic and work with Claude Code, Cursor, GitHub Copilot, or any Open Plugins-conformant tool. Version 2.2.0 ships a file-based researcher operating system with deterministic gates and a continuous loop.

Context engineering is the discipline of curating everything that enters a model's context window (system prompts, tool definitions, retrieved documents, message history, tool outputs) to maximize signal within a limited attention budget.

## Repository Structure

- `skills/` - 15 skill directories, each containing a `SKILL.md` with YAML frontmatter (`name`, `description`) and optional `references/` and `scripts/` subdirectories
- `examples/` - 5 complete demonstration projects (digital-brain-skill, llm-as-judge-skills, book-sft-pipeline, x-to-book-system, interleaved-thinking)
- `docs/` - Research materials and reference documentation
- `researcher/` - File-based research-to-skill operating system: rubrics, mechanism registry, claim provenance, corpus index, run state machine, adversarial benchmarks, continuous loop, launchd service definitions
- `template/SKILL.md` - Canonical skill template (use when creating new skills)
- `SKILL.md` (root) - Collection-level metadata and skill map
- `.claude-plugin/marketplace.json` - Claude Code marketplace manifest (single bundled plugin, v2.2.0)
- `.plugin/plugin.json` - Open Plugins format manifest (v2.2.0)

## Build & Test Commands

There is no top-level build system.
Repo-level gates and per-project tooling are listed below.

### Top-level deterministic gates (run on every PR via CI)

```
python3 researcher/scripts/validate_repo.py --strict # corpus structure, manifests, rubric math, mechanism registry, claims, corpus index, activation cases, benchmark scenarios, run artifacts
python3 researcher/scripts/skill_health.py --strict --no-history # deterministic skill-body quality gate
python3 researcher/scripts/run_benchmarks.py # adversarial benchmark harness + repo + activation gates
python3 researcher/scripts/check_activation_cases.py # skill-boundary regression fixtures
```

### Per-run readiness (active runs only)

```
python3 researcher/scripts/validate_run.py --run-dir researcher/runs/<run-id>
```

### Continuous loop (manual or launchd)

```
python3 researcher/scripts/loop_discover.py
python3 researcher/scripts/loop_step.py --allow-fetch
python3 researcher/scripts/loop_daily.py
python3 researcher/scripts/loop_status.py

researcher/orchestration/launchd/install.sh # macOS daemon
researcher/orchestration/launchd/uninstall.sh
```

### Example projects

#### examples/llm-as-judge-skills (TypeScript, Node >= 18)
```
cd examples/llm-as-judge-skills
npm install
npm run build # tsc
npm test # vitest (19 tests)
npm run lint # eslint
npm run format # prettier
npm run typecheck # tsc --noEmit
```

#### examples/interleaved-thinking (Python >= 3.10)
```
cd examples/interleaved-thinking
pip install -e ".[dev]"
pytest # pytest + pytest-asyncio
ruff check . # linting (100 char line length)
```

#### examples/digital-brain-skill (Node.js)
```
cd examples/digital-brain-skill
npm run setup
npm run weekly-review
npm run content-ideas
npm run stale-contacts
```

## Skill Authoring Rules

When creating or editing skills:

1. **SKILL.md must stay under 500 lines**: move detailed content to the `references/` directory
2. **YAML frontmatter is required**: it must include `name` and `description` fields
3. **Folder naming**: lowercase with hyphens (e.g., `context-fundamentals`)
4. **Write in third person**: descriptions are injected into system prompts, and an inconsistent point of view causes discovery issues
5. **Platform-agnostic**: no vendor-locked examples or platform-specific tool names without abstraction
6. **Token-conscious**: challenge each paragraph and assume an advanced audience
7. **Body standard**: include `When to Activate`, `Core Concepts`, `Practical Guidance`, `Examples`, `Guidelines`, `Gotchas`, `Integration`, and `References`
8. **Explicit boundaries**: every `When to Activate` section needs positive triggers plus a `Do not activate` block routing adjacent work to the right skill
9. **Include a Gotchas section**: experience-derived failure modes are the highest-signal content in any skill
10. **Update root README.md** when adding new skills
11. **Update marketplace/plugin manifests** when adding skills (`.claude-plugin/marketplace.json`, `.plugin/plugin.json`)
12. **Update the corpus index** (`researcher/corpus/index.json`) to map the new skill to activation scenarios, mechanism IDs, and claim IDs
13. **Update mechanisms and claims**: add registry entries for reusable behavior changes and `claim-*` provenance for numeric, benchmark, volatile, or vendor-performance claims
14. **Run `validate_repo.py --strict`, `skill_health.py --strict --no-history`, `check_activation_cases.py`, and `run_benchmarks.py`** before committing skill changes

## Researcher OS Rules

When working through the researcher operating system:

1. **Initialize runs via `research_loop.py init`**: it creates `run-state.json`, a queue entry, a thread log, a source evaluation scaffold, and a mechanism proposal template
2. **Advance state explicitly**: use the `retrieve`, `evaluate`, `propose`, `novelty`, `validate-run`, `pr-ready`, and `close` subcommands; do not edit `run-state.json` by hand
3. **Promote mechanisms only after run readiness**: `research_loop.py promote-mechanisms` requires `--reviewed-by` and a passing run-readiness check
4. **Add claim provenance** to `researcher/claims/index.jsonl` for any numeric, benchmark, or volatile claim added to a skill
5. **Never invoke paid LLMs from the continuous loop**: HTTP retrieval is stdlib-only, and judge adapters are explicitly out of scope until they are budget-gated
6. **Never commit runtime queue/report files**: `.gitignore` covers `researcher/queue/*.jsonl`, `researcher/reports/{logs,snapshots,loop-events.jsonl,loop-failures.jsonl,status.md,parked-review.md}`, and `researcher/runs/*/` except the seed run

## Plugin Architecture

All 15 skills are distributed as a single plugin (`context-engineering`) in the marketplace manifest. This avoids cache duplication: Claude Code caches each plugin's `source` directory separately, so multiple plugins pointing to `source: "./"` would each cache a full copy of the repo.

Progressive disclosure pattern: only skill names and descriptions load at startup; full content loads on activation.

## Key Design Principles

- **Context quality over quantity**: attention scarcity and lost-in-the-middle behavior mean more context is not always better
- **Sub-agents isolate context**: they exist to manage the attention budget, not to simulate org roles
- **Skills reference each other**: use plain-text skill names (not links) in Integration sections to avoid cross-directory reference issues
- **Examples use Python pseudocode**: conceptual demonstrations that work across environments, not production-ready implementations
- **Deterministic first, model-judged second**: structure, schema, rubric math, manifest sync, retrieval status, and registry shape must pass before any LLM judge is invoked
- **Human-controlled merge**: agents may prepare PRs and pass gates, but push and merge always require explicit human approval
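The mechanical parts of the Skill Authoring Rules (folder naming, the 500-line cap, the required `name` and `description` frontmatter fields, the body-standard headings, and a `Do not activate` block) lend themselves to a deterministic check. The sketch below is a hypothetical illustration, not the logic of `validate_repo.py` or `skill_health.py`. It assumes flat `key: value` frontmatter and `##`/`###` section headings; `check_skill` and `parse_frontmatter` are illustrative names.

```python
import re

# Section headings required by the body standard (Skill Authoring Rules, item 7).
REQUIRED_SECTIONS = [
    "When to Activate", "Core Concepts", "Practical Guidance", "Examples",
    "Guidelines", "Gotchas", "Integration", "References",
]
MAX_LINES = 500  # SKILL.md must stay *under* this many lines
FOLDER_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")  # lowercase-with-hyphens
HEADING_RE = re.compile(r"^#{2,3}\s+(.+?)\s*$", re.MULTILINE)


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` frontmatter block delimited by `---` lines.

    Assumption: frontmatter is flat (no nesting or multi-line values); a real
    checker would use a YAML parser instead.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing `---`: treat as missing frontmatter


def check_skill(folder: str, text: str) -> list[str]:
    """Return rule violations for one skill folder name and its SKILL.md text."""
    problems: list[str] = []
    if not FOLDER_RE.match(folder):
        problems.append(f"folder {folder!r} is not lowercase-with-hyphens")
    n_lines = len(text.splitlines())
    if n_lines >= MAX_LINES:
        problems.append(f"SKILL.md has {n_lines} lines; keep it under {MAX_LINES}")
    meta = parse_frontmatter(text)
    for key in ("name", "description"):
        if not meta.get(key):
            problems.append(f"frontmatter is missing {key!r}")
    headings = set(HEADING_RE.findall(text))
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section {section!r}")
    # Crude proxy for item 8: the text must contain a "Do not activate" block.
    if "Do not activate" not in text:
        problems.append("no 'Do not activate' block")
    return problems
```

Returning a list of messages instead of raising on the first failure lets one pass report every problem in a skill.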
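The Researcher OS Rules describe runs that advance only through explicit subcommands, with mechanism promotion gated on a reviewer and a passing readiness check. A minimal sketch of that idea follows. It assumes, which the rules above do not state, that the subcommands form a strict linear sequence in the order listed, and that run state is a dict with hypothetical `stage` and `history` keys. `advance` and `can_promote_mechanisms` are illustrative names, not part of `research_loop.py`.

```python
from typing import Optional

# Assumption: the subcommands named in the Researcher OS Rules form a strict
# linear sequence in the order listed. The real research_loop.py may differ.
STAGES = ["init", "retrieve", "evaluate", "propose", "novelty",
          "validate-run", "pr-ready", "close"]


class InvalidTransition(Exception):
    """Raised when a run tries to skip, repeat, or invent a stage."""


def advance(state: dict, target: str) -> dict:
    """Return a new state moved exactly one stage forward to `target`."""
    if target not in STAGES:
        raise InvalidTransition(f"unknown stage {target!r}")
    current = state["stage"]
    if STAGES.index(target) != STAGES.index(current) + 1:
        raise InvalidTransition(f"cannot move from {current!r} to {target!r}")
    # Build a new dict instead of mutating, mirroring "no hand edits to state".
    return {**state, "stage": target, "history": [*state["history"], target]}


def can_promote_mechanisms(readiness_passed: bool, reviewed_by: Optional[str]) -> bool:
    """Promotion needs a passing run-readiness check *and* a named reviewer."""
    return readiness_passed and bool(reviewed_by and reviewed_by.strip())
```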
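Claim provenance lives in `researcher/claims/index.jsonl`, one JSON record per line. The record schema is not documented here, so the `id`, `kind`, `skill`, `claim`, and `source` fields below are hypothetical. Only the `claim-*` ID prefix and the claim kinds (numeric, benchmark, volatile, vendor-performance) come from the rules above. The sketch appends records and rejects malformed or duplicate IDs.

```python
import json
import tempfile
from pathlib import Path

# Claim kinds that need provenance, per Skill Authoring Rules item 13.
CLAIM_KINDS = {"numeric", "benchmark", "volatile", "vendor-performance"}


def load_claims(index_path: Path) -> list[dict]:
    """Read every JSON record from a JSONL file (empty list if it doesn't exist)."""
    if not index_path.exists():
        return []
    with index_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def append_claim(index_path: Path, claim_id: str, kind: str, skill: str,
                 text: str, source: str) -> dict:
    """Append one provenance record; the field names here are hypothetical."""
    if not claim_id.startswith("claim-"):
        raise ValueError("claim IDs use the 'claim-' prefix")
    if kind not in CLAIM_KINDS:
        raise ValueError(f"unknown claim kind {kind!r}")
    if any(c["id"] == claim_id for c in load_claims(index_path)):
        raise ValueError(f"duplicate claim ID {claim_id!r}")
    record = {"id": claim_id, "kind": kind, "skill": skill,
              "claim": text, "source": source}
    index_path.parent.mkdir(parents=True, exist_ok=True)
    # Append-only JSONL: one record per line, existing lines never rewritten.
    with index_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


if __name__ == "__main__":
    # Demo against a throwaway directory, not the real researcher/ tree.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "claims" / "index.jsonl"
        append_claim(path, "claim-demo-001", "numeric", "context-fundamentals",
                     "An illustrative numeric claim.", "https://example.org/source")
        print(load_claims(path))
```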
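The progressive disclosure pattern from Plugin Architecture can be sketched as a registry that exposes only names and descriptions at startup and fetches a full body the first time a skill activates. `SkillRegistry` and its `load_body` callback are hypothetical; they do not describe how Claude Code or any other host actually implements skill loading.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillRegistry:
    """Startup index holding only names/descriptions; bodies load on demand."""

    load_body: Callable[[str], str]  # hypothetical: fetches a full SKILL.md by name
    descriptions: dict[str, str] = field(default_factory=dict)
    bodies: dict[str, str] = field(default_factory=dict)  # cache of activated skills

    def startup_context(self) -> str:
        """What enters the context window at startup: one short line per skill."""
        return "\n".join(
            f"- {name}: {desc}" for name, desc in sorted(self.descriptions.items())
        )

    def activate(self, name: str) -> str:
        """Load a skill's full body once, on first activation, and cache it."""
        if name not in self.descriptions:
            raise KeyError(name)
        if name not in self.bodies:
            self.bodies[name] = self.load_body(name)
        return self.bodies[name]
```

The startup cost scales with the number of descriptions, not with the total size of all skill bodies, which is the point of the pattern.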
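The "deterministic first, model-judged second" principle amounts to a short-circuiting gate: every cheap, deterministic check runs, and the model judge is called only when all of them pass. In this sketch the checks and the `judge` callable are hypothetical stand-ins (no real LLM is invoked), and `run_gates` is an illustrative name. Collecting every failure, rather than stopping at the first, is a design choice so that one run reports all structural problems.

```python
from typing import Callable, Optional

# A deterministic check returns None on pass or an error message on failure.
Check = Callable[[], Optional[str]]


def run_gates(checks: dict[str, Check], judge: Callable[[], float]) -> dict:
    """Run all deterministic checks; call the judge only if every one passes."""
    failures: dict[str, str] = {}
    for name, check in checks.items():
        message = check()
        if message is not None:
            failures[name] = message
    if failures:
        # Short-circuit: no model call (and no model cost) on a broken change.
        return {"passed": False, "failures": failures, "judge_score": None}
    return {"passed": True, "failures": {}, "judge_score": judge()}
```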