Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
Source repo: muratcankoylan (GitHub)
docs/skills-improvement-analysis.md
# Skills Improvement Analysis: Lessons from Anthropic's "Building Claude Code" Article

*Analysis date: 2026-03-17*
*Source: "Lessons from Building Claude Code: How We Use Skills" — Anthropic Team*

---

## What We're Already Doing Well

**Description field as trigger conditions** — 100% compliance. Every SKILL.md uses the "use when X" format the article recommends.

**Progressive disclosure via filesystem** — Our 3-level hierarchy (SKILL.md → references/ → scripts/) is textbook progressive disclosure. The article calls this out as a best practice.

**Composable scripts** — 12/13 skills include Python scripts with callable classes and functions.

**Not stating the obvious** — Skills focus on pushing Claude beyond defaults (e.g., U-shaped attention curves, observation masking, KV-cache tricks).

---

## The Big Gaps (Ordered by Impact)

### 1. Skills are knowledge-first, not action-first

The Anthropic team's 9 skill categories are overwhelmingly **operational** — verification, scaffolding, automation, runbooks, deployment. Our 13 skills are overwhelmingly **conceptual** — teaching Claude about context engineering principles.

The article says the most powerful thing you can give Claude is **code it can compose at runtime**, not knowledge it reads and internalizes. Our `scripts/` directories contain reference implementations (demonstration code), not composable helper libraries Claude would actually import and use during a task.

**The shift**: Our skills teach Claude *about* context engineering. The article suggests skills should help Claude *do* context engineering.

### 2. ~~No Gotchas sections (69% of skills)~~ — RESOLVED

> **Status**: Fixed in commit c847b20. All 13 skills now have standardized Gotchas sections (5-9 gotchas each). Template updated with canonical Gotchas section.

~~The article is unambiguous: *"The highest-signal content in any skill is the Gotchas section."* Only 4 of 13 skills had one. The root cause was that our `template/SKILL.md` didn't include a Gotchas section, so new skills never got one by default.~~

### 3. No on-demand hooks

The article highlights on-demand hooks as a differentiator. Examples like `/careful` (blocks destructive commands) and `/freeze` (blocks edits outside a directory) show how hooks transform a knowledge skill into a guardrail. None of our skills use this.

For a context engineering marketplace, natural fits include:
- `/budget` — warns when context usage exceeds a threshold
- `/trace` — logs every tool call with token counts for post-hoc analysis
- `/compress` — auto-triggers compression when conversation gets long

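The core check behind a `/budget`-style hook could be as small as the sketch below. It assumes the caller already knows the current token count and the context limit. The function name, its parameters, and the 0.8 default threshold are hypothetical and do not come from the article or from any existing hook API.

```python
from typing import Optional


def budget_warning(used_tokens: int, context_limit: int, threshold: float = 0.8) -> Optional[str]:
    """Return a warning once usage crosses the threshold, else None.

    All names and the 0.8 default are illustrative assumptions.
    """
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    ratio = used_tokens / context_limit
    if ratio < threshold:
        return None
    return f"Context usage at {ratio:.0%} of {context_limit} tokens (threshold {threshold:.0%})"
```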
### 4. No setup/config pattern

The article recommends a `config.json` pattern for skills needing user context. None of our skills use this. For example, `memory-systems` could ask which framework the user is using and store that preference.

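A minimal sketch of what such a setup step might look like, assuming the preference lives in a `config.json` file inside the skill's own directory. The helper name, the directory layout, and the `framework` key used in the usage note are hypothetical.

```python
import json
from pathlib import Path


def load_or_init_config(skill_dir: Path, defaults: dict) -> dict:
    """Load config.json from skill_dir, creating it from defaults on first run.

    Stored values override defaults; keys missing from the file fall back
    to the defaults. The file location is an assumption for illustration.
    """
    config_path = skill_dir / "config.json"
    if config_path.exists():
        stored = json.loads(config_path.read_text(encoding="utf-8"))
        return {**defaults, **stored}
    skill_dir.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(defaults, indent=2), encoding="utf-8")
    return dict(defaults)
```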
### 5. No measurement infrastructure

The article describes using `PreToolUse` hooks to track which skills are popular and which are undertriggering. We have no way to know if skills are actually being activated correctly.

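One way to approach this, sketched under assumptions: a hook calls `record_activation` each time a skill fires, and a later report reads the log back. How a `PreToolUse` hook would obtain the skill name is deliberately left out. The function names and the JSON Lines log format are hypothetical.

```python
import json
import time
from collections import Counter
from pathlib import Path


def record_activation(skill_name: str, log_path: Path) -> None:
    """Append one activation record as a JSON line (format is an assumption)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"skill": skill_name, "ts": time.time()}) + "\n")


def activation_counts(log_path: Path) -> Counter:
    """Count activations per skill; skills absent from the result never fired."""
    counts: Counter = Counter()
    if not log_path.exists():
        return counts
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            counts[json.loads(line)["skill"]] += 1
    return counts
```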
### 6. No `${CLAUDE_PLUGIN_DATA}` usage

The article emphasizes persistent data storage so skills can learn over time. Our skills are stateless — they forget everything between sessions.

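A rough sketch of cross-session state follows. It assumes `${CLAUDE_PLUGIN_DATA}` is exposed to scripts as an environment variable holding a writable directory. The helper names, the `state.json` filename, and the fallback directory are hypothetical.

```python
import json
import os
from pathlib import Path


def state_file(skill_name: str, fallback: str = ".skill-data") -> Path:
    """Resolve a per-skill state file under the plugin data directory.

    Assumes CLAUDE_PLUGIN_DATA is an environment variable pointing at a
    writable directory; falls back to a local folder when it is unset.
    """
    base = Path(os.environ.get("CLAUDE_PLUGIN_DATA", fallback))
    return base / skill_name / "state.json"


def load_state(path: Path) -> dict:
    """Return saved state, or an empty dict on the first session."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Persist state so the next session can pick it up."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```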
---

## Strategic Recommendations

### Tier 1: Quick wins (high impact, low effort)

**A. Add Gotchas to template and all 9 missing skills**

Update `template/SKILL.md` to include a `## Gotchas` section. Then add gotchas to the 9 skills that lack them. These should capture real failure modes, not theoretical ones. Examples:
- `context-compression`: "Don't compress tool definitions — models need exact schemas"
- `multi-agent-patterns`: "Sub-agents sharing context via message passing doubles token cost vs. filesystem coordination"
- `context-optimization`: "Prefix caching breaks when system prompts change between turns"

**B. Add a marketplace curation flow**

Add a `sandbox/` directory for experimental skills. Update CONTRIBUTING.md to describe sandbox → traction → marketplace flow.

**C. Update SKILL.md template with article best practices**

Add sections for: Gotchas, Setup Requirements, Related Scripts, Storage Expectations.

### Tier 2: Structural enhancements (medium effort, high differentiation)

**D. Create 2-3 operational skills to complement knowledge skills**

| Proposed Skill | Category | What It Does |
|---|---|---|
| `context-debugger` | Runbook | Symptom → investigation → diagnosis for context failures |
| `agent-scaffolding` | Code Scaffolding | Generates boilerplate for new agent projects |
| `skill-creator` | Code Scaffolding | Meta-skill that helps create new skills following conventions |

**E. Make scripts composable, not demonstrative**

Transform scripts from "here's how you'd implement this" to "import this and use it":

```python
# Before (reference): Shows how compaction works
class ContextCompactor:
    """Example implementation..."""

# After (composable): Claude actually uses this
def compact_observation(output: str, max_tokens: int = 500) -> str:
    """Compact a tool observation to fit within token budget."""
```

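To make the "composable" variant concrete, here is one possible body for `compact_observation`. It is a sketch, not the repository's implementation. It approximates tokens by whitespace-separated words, which is an assumption, and keeps the head and tail of the output around a hypothetical `<omitted:N>` marker.

```python
def compact_observation(output: str, max_tokens: int = 500) -> str:
    """Compact a tool observation to fit within a rough token budget.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real tokenizer would give different counts.
    """
    if max_tokens < 2:
        raise ValueError("max_tokens must be at least 2")
    words = output.split()
    if len(words) <= max_tokens:
        return output  # already within budget, return unchanged
    head = max_tokens // 2
    tail = max_tokens - head - 1  # reserve one slot for the omission marker
    omitted = len(words) - head - tail
    marker = f"<omitted:{omitted}>"
    kept = words[:head] + [marker] + (words[-tail:] if tail else [])
    return " ".join(kept)
```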
**F. Add on-demand hooks to 2-3 skills**

Start with:
- `context-optimization` → hook that warns on large tool outputs
- `evaluation` → hook that auto-evaluates Claude's output quality
- `context-compression` → hook that monitors conversation length

### Tier 3: Ecosystem maturity (higher effort, long-term value)

**G. Add a usage measurement skill** — `PreToolUse` hook logging skill activations.

**H. Add config.json setup** to framework-dependent skills (memory-systems, multi-agent-patterns).

**I. Create a "skill composition" example** — showing how skills invoke each other.

**J. Add persistent learning via `${CLAUDE_PLUGIN_DATA}`** — skills that get better over time.

---

## The Meta-Insight

Our repository is currently a **textbook** — it teaches Claude how to think about context engineering. The Anthropic article reveals that the most impactful skills at Anthropic are **toolboxes** — they give Claude things to do, not things to know.

The strongest version of this repo is both: **knowledge skills that also include operational capabilities**. The knowledge foundation is what got us cited in academic papers. Layering actionable tooling on top (gotchas, hooks, composable scripts, persistent state) would make the skills dramatically more useful in practice.

---

## Audit Summary Table

| Criterion | Status | Score | Notes |
|-----------|--------|-------|-------|
| Gotchas Sections | CRITICAL GAP | 31% (4/13) | Highest-signal content per article |
| Description Format | PERFECT | 100% (13/13) | Trigger-condition format |
| Composable Scripts | STRONG | 92% (12/13) | Present but reference-grade |
| On-Demand Hooks | NOT IMPLEMENTED | 0% (0/13) | High differentiation opportunity |
| Config/Setup Pattern | NOT IMPLEMENTED | 0% (0/13) | Needed for framework-dependent skills |
| Persistent Storage | MINIMAL | 23% (3/13) | No `${CLAUDE_PLUGIN_DATA}` usage |
| Progressive Disclosure | COMPREHENSIVE | 100% (13/13) | SKILL.md → references/ → scripts/ |
| Templates/Assets | COMPREHENSIVE | 100% (13/13) | All have reference docs |

**Overall compliance: 65%** — Closing the Gotchas gap alone raises this to ~85%.

---

## Anthropic's 9 Skill Categories vs. Our Coverage

| Category | Coverage | Our Skills |
|----------|----------|------------|
| Library & API Reference | Moderate | memory-systems, tool-design |
| Product Verification | Moderate | evaluation, advanced-evaluation |
| Data Fetching & Analysis | Light | (interleaved-thinking example only) |
| Business Process & Automation | Light | (digital-brain example only) |
| Code Scaffolding & Templates | Light | project-development |
| Code Quality & Review | Moderate | evaluation, advanced-evaluation |
| CI/CD & Deployment | Light | hosted-agents |
| Runbooks | Light | context-degradation |
| Infrastructure Operations | Light | hosted-agents |