A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
docs/skills-improvement-analysis.md
# Skills Improvement Analysis: Lessons from Anthropic's "Building Claude Code" Article

*Analysis date: 2026-03-17*
*Source: "Lessons from Building Claude Code: How We Use Skills" — Anthropic Team*

---

## What We're Already Doing Well

**Description field as trigger conditions** — 100% compliance. Every SKILL.md uses the "use when X" format the article recommends.

**Progressive disclosure via filesystem** — Our 3-level hierarchy (SKILL.md → references/ → scripts/) is textbook progressive disclosure. The article calls this out as a best practice.

**Composable scripts** — 12/13 skills include Python scripts with callable classes and functions.

**Not stating the obvious** — Skills focus on pushing Claude beyond defaults (e.g., U-shaped attention curves, observation masking, KV-cache tricks).

---

## The Big Gaps (Ordered by Impact)

### 1. Skills are knowledge-first, not action-first

The Anthropic team's 9 skill categories are overwhelmingly **operational** — verification, scaffolding, automation, runbooks, deployment. Our 13 skills are overwhelmingly **conceptual** — teaching Claude about context engineering principles.

The article says the most powerful thing you can give Claude is **code it can compose at runtime**, not knowledge it reads and internalizes. Our `scripts/` directories contain reference implementations (demonstration code), not composable helper libraries Claude would actually import and use during a task.

**The shift**: Our skills teach Claude *about* context engineering. The article suggests skills should help Claude *do* context engineering.

### 2. ~~No Gotchas sections (69% of skills)~~ — RESOLVED

> **Status**: Fixed in commit c847b20. All 13 skills now have standardized Gotchas sections (5-9 gotchas each). Template updated with canonical Gotchas section.

~~The article is unambiguous: *"The highest-signal content in any skill is the Gotchas section."* Only 4 of 13 skills had one. The root cause was our `template/SKILL.md` didn't include a Gotchas section — so new skills never got one by default.~~

### 3. No on-demand hooks

The article highlights on-demand hooks as a differentiator. Examples like `/careful` (blocks destructive commands) and `/freeze` (blocks edits outside a directory) show how hooks transform a knowledge skill into a guardrail. None of our skills use this.

For a context engineering marketplace, natural fits include:
- `/budget` — warns when context usage exceeds a threshold
- `/trace` — logs every tool call with token counts for post-hoc analysis
- `/compress` — auto-triggers compression when conversation gets long

### 4. No setup/config pattern

The article recommends a `config.json` pattern for skills needing user context. None of our skills use this. For example, `memory-systems` could ask which framework the user is using and store that preference.

### 5. No measurement infrastructure

The article describes using `PreToolUse` hooks to track which skills are popular and which are undertriggering. We have no way to know if skills are actually being activated correctly.

### 6. No `${CLAUDE_PLUGIN_DATA}` usage

The article emphasizes persistent data storage so skills can learn over time. Our skills are stateless — they forget everything between sessions.

---

## Strategic Recommendations

### Tier 1: Quick wins (high impact, low effort)

**A. Add Gotchas to template and all 9 missing skills**

Update `template/SKILL.md` to include a `## Gotchas` section. Then add gotchas to the 9 skills that lack them. These should capture real failure modes, not theoretical ones. Examples:
- `context-compression`: "Don't compress tool definitions — models need exact schemas"
- `multi-agent-patterns`: "Sub-agents sharing context via message passing doubles token cost vs. filesystem coordination"
- `context-optimization`: "Prefix caching breaks when system prompts change between turns"

**B. Add a marketplace curation flow**

Add a `sandbox/` directory for experimental skills. Update CONTRIBUTING.md to describe sandbox → traction → marketplace flow.

**C. Update SKILL.md template with article best practices**

Add sections for: Gotchas, Setup Requirements, Related Scripts, Storage Expectations.

### Tier 2: Structural enhancements (medium effort, high differentiation)

**D. Create 2-3 operational skills to complement knowledge skills**

| Proposed Skill | Category | What It Does |
|---|---|---|
| `context-debugger` | Runbook | Symptom → investigation → diagnosis for context failures |
| `agent-scaffolding` | Code Scaffolding | Generates boilerplate for new agent projects |
| `skill-creator` | Code Scaffolding | Meta-skill that helps create new skills following conventions |

**E. Make scripts composable, not demonstrative**

Transform scripts from "here's how you'd implement this" to "import this and use it":

```python
# Before (reference): Shows how compaction works
class ContextCompactor:
    """Example implementation..."""

# After (composable): Claude actually uses this
def compact_observation(output: str, max_tokens: int = 500) -> str:
    """Compact a tool observation to fit within token budget."""
```

**F. Add on-demand hooks to 2-3 skills**

Start with:
- `context-optimization` → hook that warns on large tool outputs
- `evaluation` → hook that auto-evaluates Claude's output quality
- `context-compression` → hook that monitors conversation length

### Tier 3: Ecosystem maturity (higher effort, long-term value)

**G. Add a usage measurement skill** — `PreToolUse` hook logging skill activations.

**H. Add config.json setup** to framework-dependent skills (memory-systems, multi-agent-patterns).

**I. Create a "skill composition" example** — showing how skills invoke each other.

**J. Add persistent learning via `${CLAUDE_PLUGIN_DATA}`** — skills that get better over time.

---

## The Meta-Insight

Our repository is currently a **textbook** — it teaches Claude how to think about context engineering. The Anthropic article reveals that the most impactful skills at Anthropic are **toolboxes** — they give Claude things to do, not things to know.

The strongest version of this repo is both: **knowledge skills that also include operational capabilities**. The knowledge foundation is what got us cited in academic papers. Layering actionable tooling on top (gotchas, hooks, composable scripts, persistent state) would make the skills dramatically more useful in practice.

---

## Audit Summary Table

| Criterion | Status | Score | Notes |
|-----------|--------|-------|-------|
| Gotchas Sections | CRITICAL GAP | 31% (4/13) | Highest-signal content per article |
| Description Format | PERFECT | 100% (13/13) | Trigger-condition format |
| Composable Scripts | STRONG | 92% (12/13) | Present but reference-grade |
| On-Demand Hooks | NOT IMPLEMENTED | 0% (0/13) | High differentiation opportunity |
| Config/Setup Pattern | NOT IMPLEMENTED | 0% (0/13) | Needed for framework-dependent skills |
| Persistent Storage | MINIMAL | 23% (3/13) | No `${CLAUDE_PLUGIN_DATA}` usage |
| Progressive Disclosure | COMPREHENSIVE | 100% (13/13) | SKILL.md → references/ → scripts/ |
| Templates/Assets | COMPREHENSIVE | 100% (13/13) | All have reference docs |

**Overall compliance: 65%** — Closing the Gotchas gap alone raises this to ~85%.

---

## Anthropic's 9 Skill Categories vs. Our Coverage

| Category | Coverage | Our Skills |
|----------|----------|------------|
| Library & API Reference | Moderate | memory-systems, tool-design |
| Product Verification | Moderate | evaluation, advanced-evaluation |
| Data Fetching & Analysis | Light | (interleaved-thinking example only) |
| Business Process & Automation | Light | (digital-brain example only) |
| Code Scaffolding & Templates | Light | project-development |
| Code Quality & Review | Moderate | evaluation, advanced-evaluation |
| CI/CD & Deployment | Light | hosted-agents |
| Runbooks | Light | context-degradation |
| Infrastructure Operations | Light | hosted-agents |
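
---

Recommendation E above contrasts reference code with composable helpers. As a rough illustration only, `compact_observation` could be filled in along the following lines. This is a sketch, not the repository's implementation: the 4-characters-per-token estimate, the `estimate_tokens` helper, the `chars_per_token` parameter, and the keep-head-and-tail strategy are all assumptions made for the example.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate. ASSUMPTION: about 4 characters per token."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def compact_observation(output: str, max_tokens: int = 500,
                        chars_per_token: int = 4) -> str:
    """Compact a tool observation to fit within a token budget.

    Hypothetical strategy: keep the head and tail of the output and
    replace the middle with a marker stating how much was dropped.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    budget_chars = max_tokens * chars_per_token
    if len(output) <= budget_chars:
        return output  # already within budget, return unchanged

    marker = "\n[... {} characters omitted ...]\n"
    # Reserve space for the marker using the largest count it could show.
    reserve = len(marker.format(len(output)))
    keep = budget_chars - reserve
    if keep <= 0:
        # Budget too small for a marker: fall back to plain truncation.
        return output[:budget_chars]

    head = keep // 2
    tail = keep - head  # always >= 1 because keep >= 1
    omitted = len(output) - head - tail
    return output[:head] + marker.format(omitted) + output[-tail:]


if __name__ == "__main__":
    noisy = "line of tool output\n" * 2000
    compacted = compact_observation(noisy, max_tokens=100)
    print(estimate_tokens(noisy), "->", estimate_tokens(compacted))
```

A caller would swap in a real tokenizer for `estimate_tokens` where exact budgets matter. Keeping both ends of the output is one choice among several, picked here because errors and summaries in command output often appear near the end.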