Writing Skills

Creates and validates agent skills using Test-Driven Development — write test scenarios, baseline behavior, then the skill itself.

Publisher: obra (GitHub)

Files

- Skill: n/a
- Size: 100.7 KB
- Entrypoint: SKILL.md
- Format: git-repo

---
name: writing-skills
description: Use when creating new skills, editing existing skills, or verifying skills work before deployment
---

# Writing Skills

## Overview

**Writing skills IS Test-Driven Development applied to process documentation.**

**Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex).**

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.

**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.

**Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. That document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.

## What is a Skill?

A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.

**Skills are:** Reusable techniques, patterns, tools, reference guides

**Skills are NOT:** Narratives about how you solved a problem once

## TDD Mapping for Skills

| TDD Concept | Skill Creation |
|-------------|----------------|
| **Test case** | Pressure scenario with subagent |
| **Production code** | Skill document (SKILL.md) |
| **Test fails (RED)** | Agent violates rule without skill (baseline) |
| **Test passes (GREEN)** | Agent complies with skill present |
| **Refactor** | Close loopholes while maintaining compliance |
| **Write test first** | Run baseline scenario BEFORE writing skill |
| **Watch it fail** | Document exact rationalizations agent uses |
| **Minimal code** | Write skill addressing those specific violations |
| **Watch it pass** | Verify agent now complies |
| **Refactor cycle** | Find new rationalizations → plug → re-verify |

The entire skill creation process follows RED-GREEN-REFACTOR.

## When to Create a Skill

**Create when:**
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit

**Don't create for:**
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
- Mechanical constraints (if it's enforceable with regex/validation, automate it and save documentation for judgment calls)

## Skill Types

### Technique
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)

### Pattern
Way of thinking about problems (flatten-with-flags, test-invariants)

### Reference
API docs, syntax guides, tool documentation (office docs)

## Directory Structure

```
skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed
```

**Flat namespace** - all skills in one searchable namespace

**Separate files for:**
1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
2. **Reusable tools** - Scripts, utilities, templates

**Keep inline:**
- Principles and concepts
- Code patterns (< 50 lines)
- Everything else

## SKILL.md Structure

**Frontmatter (YAML):**
- Two required fields: `name` and `description` (see [agentskills.io/specification](https://agentskills.io/specification) for all supported fields)
- Max 1024 characters total
- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
- `description`: Third-person, describes ONLY when to use (NOT what it does)
  - Start with "Use when..." to focus on triggering conditions
  - Include specific symptoms, situations, and contexts
  - **NEVER summarize the skill's process or workflow** (see CSO section for why)
  - Keep under 500 characters if possible

```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results
```

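The frontmatter rules above are mechanical, so they can be checked by a script rather than by eye. The following is a minimal sketch, not part of the skill package: `check_frontmatter` is a hypothetical helper, it assumes the frontmatter is flat `key: value` lines between two `---` delimiters (no nested YAML), and it assumes the 1024-character limit is measured over the text between the delimiters.

```python
import re

NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")  # letters, numbers, hyphens only
MAX_FRONTMATTER_CHARS = 1024  # assumption: measured between the --- lines


def check_frontmatter(skill_text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md frontmatter block."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening --- line"]
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return ["missing closing --- line"]

    body = lines[1:end]
    fields = {}
    for line in body:
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not key: value pairs
            fields[key.strip()] = value.strip()

    problems = []
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing required field: {required}")
    name = fields.get("name", "")
    if name and not NAME_PATTERN.match(name):
        problems.append("name must use letters, numbers, and hyphens only")
    if len("\n".join(body)) > MAX_FRONTMATTER_CHARS:
        problems.append("frontmatter exceeds 1024 characters")
    return problems
```

An empty list means the structural rules pass. Whether the description reads as a trigger rather than a workflow summary still needs human or agent judgment.
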
## Claude Search Optimization (CSO)

**Critical for discovery:** Future Claude needs to FIND your skill

### 1. Rich Description Field

**Purpose:** Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

**Format:** Start with "Use when..." to focus on triggering conditions

**CRITICAL: Description = When to Use, NOT What the Skill Does**

The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

**Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

**The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.

```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```

**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- **NEVER summarize the skill's process or workflow**

```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```

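Some of the description rules above can be caught with a simple lint before any subagent testing. This is a rough sketch under stated assumptions: `lint_description` is a hypothetical name, the first-person check is a small word list rather than real grammar analysis, and the 500-character threshold comes from the frontmatter guidance earlier in this document. It cannot detect a workflow summary; that remains a judgment call.

```python
import re

# Deliberately small word list; an assumption of this sketch, not an exhaustive check.
FIRST_PERSON = re.compile(r"\b(?:I|[Ww]e|[Mm]y|[Oo]ur)\b")


def lint_description(description: str) -> list[str]:
    """Return warnings for a skill description; an empty list means no findings."""
    warnings = []
    if not description.startswith("Use when"):
        warnings.append('does not start with "Use when"')
    if FIRST_PERSON.search(description):
        warnings.append("uses first person; write in third person")
    if len(description) > 500:
        warnings.append("longer than 500 characters")
    return warnings
```

Run against the examples above, the "I can help you..." description produces two warnings and the race-conditions description produces none.
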
### 2. Keyword Coverage

Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types

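One way to keep keyword coverage honest is to list the terms you expect an agent to search for and check which ones never appear in the skill. The sketch below is illustrative only: `missing_keywords` is a hypothetical helper and it uses plain case-insensitive substring matching, which is an assumption about how matching should work.

```python
def missing_keywords(skill_text: str, expected_terms: list[str]) -> list[str]:
    """Return expected search terms that never appear in the skill text."""
    haystack = skill_text.lower()
    return [term for term in expected_terms if term.lower() not in haystack]
```

For example, a skill that mentions "flaky" and "Hook timed out" but never "ENOTEMPTY" would report only the last term as missing.
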
### 3. Descriptive Naming

**Use active voice, verb-first:**
- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`

### 4. Token Efficiency (Critical)

**Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

**Target word counts:**
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)

**Techniques:**

**Move details to tool help:**
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```

**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```

**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern

**Verification:**
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```

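The same word-count check can be scripted so it runs alongside other validation. This is a small sketch, assuming the three targets above are strict upper limits; the category keys and the `within_budget` helper are names invented for this example.

```python
# Word limits from the targets above; the category keys are labels chosen for this sketch.
WORD_LIMITS = {"getting-started": 150, "frequently-loaded": 200, "other": 500}


def count_words(text: str) -> int:
    """Count whitespace-separated words, approximating `wc -w`."""
    return len(text.split())


def within_budget(text: str, category: str) -> bool:
    """True when the text stays strictly under the limit for its category."""
    return count_words(text) < WORD_LIMITS[category]
```

Reading the file with `pathlib.Path("skills/path/SKILL.md").read_text()` and passing the result to `within_budget` gives a pass/fail answer instead of a raw count.
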
**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`

**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking

### 5. Cross-Referencing Other Skills

**When writing documentation that references other skills:**

Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)

**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.

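Because `@` references are easy to leave in by accident, a quick scan can list them for review. The following is a sketch, not an official check: `find_force_loaded_files` is a hypothetical helper, and the pattern assumes a force-loading reference looks like `@` followed by a path that ends in a file extension. Email addresses are skipped because the `@` there is preceded by a word character.

```python
import re

# "@" not preceded by a word character, then a path ending in ".ext".
AT_REFERENCE = re.compile(r"(?<!\w)@[\w./-]*\.\w+")


def find_force_loaded_files(skill_text: str) -> list[str]:
    """Return every @-prefixed file reference found in the skill text."""
    return AT_REFERENCE.findall(skill_text)
```

Each hit is a candidate to rewrite as a plain skill name with a `REQUIRED SUB-SKILL` or `REQUIRED BACKGROUND` marker.
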
## Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```

**Use flowcharts ONLY for:**
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions

**Never use flowcharts for:**
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)

See @graphviz-conventions.dot for graphviz style rules.

**Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:
```bash
./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
```

## Code Examples

**One excellent example beats many mediocre ones**

Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python

**Good example:**
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)

**Don't:**
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples

You're good at porting - one great example is enough.

## File Organization

### Self-Contained Skill
```
defense-in-depth/
  SKILL.md    # Everything inline
```
When: All content fits, no heavy reference needed

### Skill with Reusable Tool
```
condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt
```
When: Tool is reusable code, not just narrative

### Skill with Heavy Reference
```
pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools
```
When: Reference material too large for inline

## The Iron Law (Same as TDD)

```
NO SKILL WITHOUT A FAILING TEST FIRST
```

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.

**No exceptions:**
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete

**REQUIRED BACKGROUND:** The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.

## Testing All Skill Types

Different skill types need different test approaches:

### Discipline-Enforcing Skills (rules/requirements)

**Examples:** TDD, verification-before-completion, designing-before-coding

**Test with:**
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters

**Success criteria:** Agent follows rule under maximum pressure

### Technique Skills (how-to guides)

**Examples:** condition-based-waiting, root-cause-tracing, defensive-programming

**Test with:**
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?

**Success criteria:** Agent successfully applies technique to new scenario

### Pattern Skills (mental models)

**Examples:** reducing-complexity, information-hiding concepts

**Test with:**
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?

**Success criteria:** Agent correctly identifies when/how to apply pattern

### Reference Skills (documentation/APIs)

**Examples:** API documentation, command references, library guides

**Test with:**
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?

**Success criteria:** Agent finds and correctly applies reference information

## Common Rationalizations for Skipping Testing

| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |

**All of these mean: Test before deploying. No exceptions.**

## Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

### Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>

<Good>
```markdown
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>

### Address "Spirit vs Letter" Arguments

Add foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```

This cuts off the entire class of "I'm following the spirit" rationalizations.

### Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```

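If excuses are recorded verbatim during baseline runs, the table can be generated rather than hand-maintained. This is a minimal sketch under assumptions: `build_rationalization_table` is a hypothetical helper, excuses arrive as `(excuse, reality)` pairs, and a repeated excuse (compared case-insensitively) keeps only its first counter.

```python
def build_rationalization_table(pairs: list[tuple[str, str]]) -> str:
    """Render (excuse, reality) pairs as a markdown table, skipping repeats."""
    rows = ["| Excuse | Reality |", "|--------|---------|"]
    seen = set()
    for excuse, reality in pairs:
        key = excuse.strip().lower()
        if key in seen:
            continue  # keep the first counter recorded for a repeated excuse
        seen.add(key)
        # Escape pipes so a cell cannot break the table layout.
        cells = [
            text.replace("|", "\\|")
            for text in (f'"{excuse.strip()}"', reality.strip())
        ]
        rows.append(f"| {cells[0]} | {cells[1]} |")
    return "\n".join(rows)
```

The output pastes directly into the skill, so each test iteration only needs to append new pairs.
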
### Create Red Flags List

Make it easy for agents to self-check when rationalizing:

```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```

### Update CSO for Violation Symptoms

Add to description: symptoms of when you're ABOUT to violate the rule:

```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```

## RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

### RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

### GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

### REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

**Testing methodology:** See @testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques

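The cycle above can be tracked with a small record per scenario run. The sketch below is only an illustration of the bookkeeping, not the testing methodology itself: `ScenarioRun`, `next_step`, and the returned labels are all hypothetical names, and how a run is judged as compliant is left to whoever reviews the subagent transcript.

```python
from dataclasses import dataclass


@dataclass
class ScenarioRun:
    scenario: str              # hypothetical scenario label, e.g. "deadline"
    complied: bool             # did the agent follow the rule?
    rationalization: str = ""  # verbatim excuse, if the agent violated the rule


def next_step(baseline: list[ScenarioRun], with_skill: list[ScenarioRun]) -> str:
    """Decide where you are in the cycle from baseline and with-skill runs."""
    if all(run.complied for run in baseline):
        # No observed failure (or no baseline at all): nothing to write against yet.
        return "RED_NOT_OBSERVED"
    if not with_skill:
        return "WRITE_SKILL"
    if any(not run.complied for run in with_skill):
        return "CLOSE_LOOPHOLES"
    return "VERIFIED"
```

The first branch encodes the Iron Law: without at least one recorded baseline failure, the function never advances to writing the skill.
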
## Anti-Patterns

### ❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..."
**Why bad:** Too specific, not reusable

### ❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go
**Why bad:** Mediocre quality, maintenance burden

### ❌ Code in Flowcharts
```dot
step1 [label="import fs"];
step2 [label="read file"];
```
**Why bad:** Can't copy-paste, hard to read

### ❌ Generic Labels
helper1, helper2, step3, pattern4
**Why bad:** Labels should have semantic meaning

## STOP: Before Moving to Next Skill

**After writing ANY skill, you MUST STOP and complete the deployment process.**

**Do NOT:**
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"

**The deployment checklist below is MANDATORY for EACH skill.**

Deploying untested skills = deploying untested code. It's a violation of quality standards.

## Skill Creation Checklist (TDD Adapted)

**IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.**

**RED Phase - Write Failing Test:**
- [ ] Create pressure scenarios (3+ combined pressures for discipline skills)
- [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim
- [ ] Identify patterns in rationalizations/failures

**GREEN Phase - Write Minimal Skill:**
- [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars)
- [ ] YAML frontmatter with required `name` and `description` fields (max 1024 chars; see [spec](https://agentskills.io/specification))
- [ ] Description starts with "Use when..." and includes specific triggers/symptoms
- [ ] Description written in third person
- [ ] Keywords throughout for search (errors, symptoms, tools)
- [ ] Clear overview with core principle
- [ ] Address specific baseline failures identified in RED
- [ ] Code inline OR link to separate file
- [ ] One excellent example (not multi-language)
- [ ] Run scenarios WITH skill - verify agents now comply

**REFACTOR Phase - Close Loopholes:**
- [ ] Identify NEW rationalizations from testing
- [ ] Add explicit counters (if discipline skill)
- [ ] Build rationalization table from all test iterations
- [ ] Create red flags list
- [ ] Re-test until bulletproof

**Quality Checks:**
- [ ] Small flowchart only if decision non-obvious
- [ ] Quick reference table
- [ ] Common mistakes section
- [ ] No narrative storytelling
- [ ] Supporting files only for tools or heavy reference

**Deployment:**
- [ ] Commit skill to git and push to your fork (if configured)
- [ ] Consider contributing back via PR (if broadly useful)

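To turn the checklist into one todo per item, the open checkboxes can be pulled out of the markdown first. This is a sketch only: `unchecked_items` is a hypothetical helper that extracts the text of each `- [ ]` line, and it does not call TodoWrite or any other tool.

```python
import re

# Matches "- [ ] text" (open) and "- [x] text" (done) checklist lines.
CHECKBOX = re.compile(r"^- \[( |x)\] (.+)$")


def unchecked_items(checklist_markdown: str) -> list[str]:
    """Return the text of every checklist item that is still open."""
    items = []
    for line in checklist_markdown.splitlines():
        match = CHECKBOX.match(line.strip())
        if match and match.group(1) == " ":
            items.append(match.group(2))
    return items
```

Each returned string can then become the content of a single todo, which keeps the one-todo-per-item rule easy to follow.
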
## Discovery Workflow

How future Claude finds your skill:

1. **Encounters problem** ("tests are flaky")
2. **Finds SKILL** (description matches)
3. **Scans overview** (is this relevant?)
4. **Reads patterns** (quick reference table)
5. **Loads example** (only when implementing)

**Optimize for this flow** - put searchable terms early and often.

## The Bottom Line

**Creating skills IS TDD for process documentation.**

Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.

Marketplace

Source from repo

Writing Skills

Creates and validates agent skills using Test-Driven Development — write test scenarios, baseline behavior, then the skill itself.

obraGitHub obraSource repo Original GitHub link Publisher page

Files

Skill

n/a

Size

100.7 KB

Entrypoint

SKILL.md

Format

git-repo

Open file

SKILL.md

Syntax-highlighted preview of this file as included in the skill package.

Rendered Source

markdown656 linesEntrypointFree

SKILL.md

1---
2name: writing-skills
3description: Use when creating new skills, editing existing skills, or verifying skills work before deployment
4---
5 
6# Writing Skills
7 
8## Overview
9 
10**Writing skills IS Test-Driven Development applied to process documentation.**
11 
12**Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex)** 
13 
14You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
15 
16**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
17 
18**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
19 
20**Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
21 
22## What is a Skill?
23 
24A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
25 
26**Skills are:** Reusable techniques, patterns, tools, reference guides
27 
28**Skills are NOT:** Narratives about how you solved a problem once
29 
30## TDD Mapping for Skills
31 
32| TDD Concept | Skill Creation |
33|-------------|----------------|
34| **Test case** | Pressure scenario with subagent |
35| **Production code** | Skill document (SKILL.md) |
36| **Test fails (RED)** | Agent violates rule without skill (baseline) |
37| **Test passes (GREEN)** | Agent complies with skill present |
38| **Refactor** | Close loopholes while maintaining compliance |
39| **Write test first** | Run baseline scenario BEFORE writing skill |
40| **Watch it fail** | Document exact rationalizations agent uses |
41| **Minimal code** | Write skill addressing those specific violations |
42| **Watch it pass** | Verify agent now complies |
43| **Refactor cycle** | Find new rationalizations → plug → re-verify |
44 
45The entire skill creation process follows RED-GREEN-REFACTOR.
46 
47## When to Create a Skill
48 
49**Create when:**
50- Technique wasn't intuitively obvious to you
51- You'd reference this again across projects
52- Pattern applies broadly (not project-specific)
53- Others would benefit
54 
55**Don't create for:**
56- One-off solutions
57- Standard practices well-documented elsewhere
58- Project-specific conventions (put in CLAUDE.md)
59- Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls)
60 
61## Skill Types
62 
63### Technique
64Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
65 
66### Pattern
67Way of thinking about problems (flatten-with-flags, test-invariants)
68 
69### Reference
70API docs, syntax guides, tool documentation (office docs)
71 
72## Directory Structure
73 
74 
75```
76skills/
77  skill-name/
78    SKILL.md              # Main reference (required)
79    supporting-file.*     # Only if needed
80```
81 
82**Flat namespace** - all skills in one searchable namespace
83 
84**Separate files for:**
851. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
862. **Reusable tools** - Scripts, utilities, templates
87 
88**Keep inline:**
89- Principles and concepts
90- Code patterns (< 50 lines)
91- Everything else
92 
93## SKILL.md Structure
94 
95**Frontmatter (YAML):**
96- Two required fields: `name` and `description` (see [agentskills.io/specification](https://agentskills.io/specification) for all supported fields)
97- Max 1024 characters total
98- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
99- `description`: Third-person, describes ONLY when to use (NOT what it does)
100  - Start with "Use when..." to focus on triggering conditions
101  - Include specific symptoms, situations, and contexts
102  - **NEVER summarize the skill's process or workflow** (see CSO section for why)
103  - Keep under 500 characters if possible
104 
105```markdown
106---
107name: Skill-Name-With-Hyphens
108description: Use when [specific triggering conditions and symptoms]
109---
110 
111# Skill Name
112 
113## Overview
114What is this? Core principle in 1-2 sentences.
115 
116## When to Use
117[Small inline flowchart IF decision non-obvious]
118 
119Bullet list with SYMPTOMS and use cases
120When NOT to use
121 
122## Core Pattern (for techniques/patterns)
123Before/after code comparison
124 
125## Quick Reference
126Table or bullets for scanning common operations
127 
128## Implementation
129Inline code for simple patterns
130Link to file for heavy reference or reusable tools
131 
132## Common Mistakes
133What goes wrong + fixes
134 
135## Real-World Impact (optional)
136Concrete results
137```


## Claude Search Optimization (CSO)

**Critical for discovery:** Future Claude needs to FIND your skill

### 1. Rich Description Field

**Purpose:** Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

**Format:** Start with "Use when..." to focus on triggering conditions

**CRITICAL: Description = When to Use, NOT What the Skill Does**

The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

**Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

**The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.

```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```

**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- **NEVER summarize the skill's process or workflow**

```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```
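
The description rules above lend themselves to a rough automated check. The sketch below is only a heuristic, not part of any official tooling: `check_description` and both word lists are hypothetical, and a regex cannot truly tell whether a description summarizes workflow, so treat a hit as a prompt to re-read the description.

```python
import re

# Hypothetical heuristics: these word lists are illustrative assumptions.
FIRST_PERSON = re.compile(r"\b(?:I|[Ww]e|[Mm]y|[Oo]ur)\b")
WORKFLOW_HINTS = re.compile(r"\b(?:then|first|next|step|dispatch\w*)\b", re.IGNORECASE)


def check_description(description: str) -> list[str]:
    """Return a list of likely problems in a skill description."""
    text = description.strip()
    problems = []
    if not text.startswith("Use when"):
        problems.append('does not start with "Use when"')
    if FIRST_PERSON.search(text):
        problems.append("uses first person")
    if WORKFLOW_HINTS.search(text):
        problems.append("may summarize workflow")
    return problems
```

Run against the examples above, the good descriptions come back with no problems, while the first-person and workflow-summary ones are flagged.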

### 2. Keyword Coverage

Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
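
One way to check keyword coverage is to list the terms you expect Claude to search for and see which ones never appear in the skill. This is a minimal sketch; `missing_keywords` is a hypothetical helper, and plain substring matching is an assumption about what counts as "covered".

```python
def missing_keywords(skill_text: str, expected_terms: list[str]) -> list[str]:
    """Return the expected search terms that never appear in the skill text."""
    haystack = skill_text.lower()
    # Case-insensitive substring match; order of expected_terms is preserved.
    return [term for term in expected_terms if term.lower() not in haystack]
```

Anything returned is a candidate to add to the description or body, provided it genuinely describes a symptom the skill addresses.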

### 3. Descriptive Naming

**Use active voice, verb-first:**
- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`

### 4. Token Efficiency (Critical)

**Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

**Target word counts:**
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)

**Techniques:**

**Move details to tool help:**
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```

**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```

**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from the command
- Don't include multiple examples of the same pattern

**Verification:**
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```
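
The same check can be scripted when several skills need verifying at once. The sketch below assumes the three budgets listed under "Target word counts"; the category keys and function names are hypothetical, and splitting on whitespace only approximates `wc -w`.

```python
# Budgets from the target word counts above; the category keys are hypothetical labels.
WORD_BUDGETS = {"getting-started": 150, "frequently-loaded": 200, "other": 500}


def count_words(text: str) -> int:
    """Count whitespace-separated words, roughly what `wc -w` reports."""
    return len(text.split())


def within_budget(text: str, category: str) -> bool:
    """True if the text is strictly under the budget for its category."""
    return count_words(text) < WORD_BUDGETS[category]
```

For example, `within_budget(open("SKILL.md").read(), "other")` would report whether a skill stays under 500 words.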

**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`

**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking
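
A name check is easy to sketch. The checklist later in this document restricts names to letters, numbers, and hyphens; requiring lowercase is an extra assumption made here, and both function names are hypothetical. The gerund test is only a hint, since good insight-based names such as `condition-based-waiting` do not start with an -ing word.

```python
import re

# Assumption: lowercase only. The checklist itself just says letters, numbers, hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """True if the name is hyphen-separated lowercase alphanumeric words."""
    return NAME_PATTERN.fullmatch(name) is not None


def starts_with_gerund(name: str) -> bool:
    """Rough heuristic: the first hyphen-separated word ends in -ing."""
    return name.split("-")[0].endswith("ing")
```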

### 5. Cross-Referencing Other Skills

**When writing documentation that references other skills:**

Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)

**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.
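
To catch stray `@` links before deployment, a skill file can be scanned for path-like tokens that follow an `@`. This is a sketch under assumptions: `find_force_loads` is a hypothetical helper, and the regex only recognizes references that end in a file extension and skips `@` inside email-style tokens.

```python
import re

# "@" not preceded by a word character, followed by a path that ends in an extension.
AT_LINK = re.compile(r"(?<![\w@])@([\w./-]+\.\w+)")


def find_force_loads(markdown: str) -> list[str]:
    """Return the file paths referenced with @ syntax in the given text."""
    return AT_LINK.findall(markdown)
```

An empty result means no `@path.ext` references were found; it does not prove the cross-references carry the required markers.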

## Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```

**Use flowcharts ONLY for:**
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions

**Never use flowcharts for:**
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)

See graphviz-conventions.dot for graphviz style rules.

**Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:
```bash
./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
```
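
The "labels without semantic meaning" rule can be spot-checked by pulling the `dot` blocks out of a skill and looking for generic node names. The sketch below is not how `render-graphs.js` works (its internals are not shown here); `generic_labels` and the list of generic prefixes are hypothetical.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block
DOT_BLOCK = re.compile(FENCE + r"dot\n(.*?)" + FENCE, re.DOTALL)
# Assumed generic prefixes, taken from the examples in this document.
GENERIC_LABEL = re.compile(r"\b(?:step|helper|pattern)\d+\b")


def generic_labels(markdown: str) -> list[str]:
    """Return generic node names (step1, helper2, ...) found inside dot blocks."""
    found = []
    for block in DOT_BLOCK.findall(markdown):
        found.extend(GENERIC_LABEL.findall(block))
    return found
```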

## Code Examples

**One excellent example beats many mediocre ones**

Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python

**Good example:**
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)

**Don't:**
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples

You're good at porting - one great example is enough.

## File Organization

### Self-Contained Skill
```
defense-in-depth/
  SKILL.md    # Everything inline
```
When: All content fits, no heavy reference needed

### Skill with Reusable Tool
```
condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt
```
When: Tool is reusable code, not just narrative

### Skill with Heavy Reference
```
pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools
```
When: Reference material too large for inline

## The Iron Law (Same as TDD)

```
NO SKILL WITHOUT A FAILING TEST FIRST
```

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.

**No exceptions:**
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete

**REQUIRED BACKGROUND:** The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.

## Testing All Skill Types

Different skill types need different test approaches:

### Discipline-Enforcing Skills (rules/requirements)

**Examples:** TDD, verification-before-completion, designing-before-coding

**Test with:**
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters

**Success criteria:** Agent follows rule under maximum pressure

### Technique Skills (how-to guides)

**Examples:** condition-based-waiting, root-cause-tracing, defensive-programming

**Test with:**
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?

**Success criteria:** Agent successfully applies technique to new scenario

### Pattern Skills (mental models)

**Examples:** reducing-complexity, information-hiding concepts

**Test with:**
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?

**Success criteria:** Agent correctly identifies when/how to apply pattern

### Reference Skills (documentation/APIs)

**Examples:** API documentation, command references, library guides

**Test with:**
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?

**Success criteria:** Agent finds and correctly applies reference information
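
If you track test plans in a script, the four skill types above map naturally onto a lookup table. This is just one possible encoding; the dictionary keys and `plan_tests` are hypothetical names, and the entries restate the "Test with" lists rather than add anything new.

```python
# Test kinds per skill type, restated from the sections above.
TEST_PLANS = {
    "discipline": ["academic questions", "pressure scenarios", "combined pressures"],
    "technique": ["application scenarios", "variation scenarios", "missing information tests"],
    "pattern": ["recognition scenarios", "application scenarios", "counter-examples"],
    "reference": ["retrieval scenarios", "application scenarios", "gap testing"],
}


def plan_tests(skill_type: str) -> list[str]:
    """Return the kinds of test to run for a skill type, or raise on an unknown type."""
    try:
        return TEST_PLANS[skill_type]
    except KeyError:
        raise ValueError(f"unknown skill type: {skill_type!r}") from None
```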

## Common Rationalizations for Skipping Testing

| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |

**All of these mean: Test before deploying. No exceptions.**

## Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

### Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>

<Good>
```markdown
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>

### Address "Spirit vs Letter" Arguments

Add a foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```

This cuts off an entire class of "I'm following the spirit" rationalizations.

### Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```
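
When excuses are collected across several test runs, the table can be generated rather than hand-edited. A minimal sketch, assuming the excuses are stored as (excuse, reality) pairs; `rationalization_table` is a hypothetical helper, not part of the skill's tooling.

```python
def rationalization_table(pairs: list[tuple[str, str]]) -> str:
    """Render (excuse, reality) pairs as a markdown table in the format shown above."""

    def cell(text: str) -> str:
        # Escape pipes so a quoted excuse cannot break the table layout.
        return text.replace("|", "\\|")

    lines = ["| Excuse | Reality |", "|--------|---------|"]
    for excuse, reality in pairs:
        lines.append(f'| "{cell(excuse)}" | {cell(reality)} |')
    return "\n".join(lines)
```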

### Create Red Flags List

Make it easy for agents to self-check when rationalizing:

```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```
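
During testing, the same red-flag phrases can be searched for in a subagent's transcript. This sketch uses exact, case-insensitive phrase matching, so it will miss paraphrases; `find_red_flags` is a hypothetical helper and the phrase list simply copies the quoted items above.

```python
from typing import Sequence

# Quoted phrases from the red flags list above.
RED_FLAGS = (
    "I already manually tested it",
    "Tests after achieve the same purpose",
    "It's about spirit not ritual",
    "This is different because",
)


def find_red_flags(transcript: str, flags: Sequence[str] = RED_FLAGS) -> list[str]:
    """Return the red-flag phrases that appear verbatim (ignoring case) in a transcript."""
    lowered = transcript.lower()
    return [flag for flag in flags if flag.lower() in lowered]
```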

### Update CSO for Violation Symptoms

Add to description: symptoms of when you're ABOUT to violate the rule:

```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```

## RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

### RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

### GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

### REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

**Testing methodology:** See testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques
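
The cycle can be pictured as a small harness. This is a sketch only: how a scenario is actually run against a subagent is outside this document, so `run_scenario` is a hypothetical callable you would supply, taking a scenario and either the skill text or `None` and returning whether the agent complied.

```python
from typing import Callable, Optional

# Hypothetical runner: (scenario, skill_text or None) -> True if the agent complied.
Runner = Callable[[str, Optional[str]], bool]


def red_green(scenarios: list[str], skill_text: str, run_scenario: Runner) -> dict[str, list[str]]:
    """Run every scenario without the skill (RED), then with it (GREEN)."""
    baseline_failures = [s for s in scenarios if not run_scenario(s, None)]
    if not baseline_failures:
        # Iron Law: without an observed failure there is nothing for the skill to fix.
        raise RuntimeError("no baseline failure observed; write harder scenarios first")
    still_failing = [s for s in scenarios if not run_scenario(s, skill_text)]
    return {"baseline_failures": baseline_failures, "still_failing": still_failing}
```

Anything left in `still_failing` is input for the REFACTOR step: add an explicit counter for that rationalization and run the harness again.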

## Anti-Patterns

### ❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..."
**Why bad:** Too specific, not reusable

### ❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go
**Why bad:** Mediocre quality, maintenance burden

### ❌ Code in Flowcharts
```dot
step1 [label="import fs"];
step2 [label="read file"];
```
**Why bad:** Can't copy-paste, hard to read

### ❌ Generic Labels
helper1, helper2, step3, pattern4
**Why bad:** Labels should have semantic meaning

## STOP: Before Moving to Next Skill

**After writing ANY skill, you MUST STOP and complete the deployment process.**

**Do NOT:**
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"

**The deployment checklist below is MANDATORY for EACH skill.**

Deploying untested skills = deploying untested code. It's a violation of quality standards.

## Skill Creation Checklist (TDD Adapted)

**IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.**

**RED Phase - Write Failing Test:**
- [ ] Create pressure scenarios (3+ combined pressures for discipline skills)
- [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim
- [ ] Identify patterns in rationalizations/failures

**GREEN Phase - Write Minimal Skill:**
- [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars)
- [ ] YAML frontmatter with required `name` and `description` fields (max 1024 chars; see [spec](https://agentskills.io/specification))
- [ ] Description starts with "Use when..." and includes specific triggers/symptoms
- [ ] Description written in third person
- [ ] Keywords throughout for search (errors, symptoms, tools)
- [ ] Clear overview with core principle
- [ ] Address specific baseline failures identified in RED
- [ ] Code inline OR link to separate file
- [ ] One excellent example (not multi-language)
- [ ] Run scenarios WITH skill - verify agents now comply

**REFACTOR Phase - Close Loopholes:**
- [ ] Identify NEW rationalizations from testing
- [ ] Add explicit counters (if discipline skill)
- [ ] Build rationalization table from all test iterations
- [ ] Create red flags list
- [ ] Re-test until bulletproof

**Quality Checks:**
- [ ] Small flowchart only if decision non-obvious
- [ ] Quick reference table
- [ ] Common mistakes section
- [ ] No narrative storytelling
- [ ] Supporting files only for tools or heavy reference

**Deployment:**
- [ ] Commit skill to git and push to your fork (if configured)
- [ ] Consider contributing back via PR (if broadly useful)
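
The frontmatter items in the checklist are mechanical enough to script. The sketch below is deliberately minimal and makes two assumptions: it parses only flat `key: value` lines rather than full YAML, and it reads the 1024-character limit as applying to the description, which you should confirm against the linked spec. Both function names are hypothetical.

```python
def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between the first two `---` lines.

    No nested YAML, quoting, or multi-line values are handled.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing --- never found, so treat as no frontmatter


def frontmatter_problems(skill_md: str) -> list[str]:
    """Return missing required fields and an over-long description, if any."""
    fields = parse_frontmatter(skill_md)
    problems = [f"missing {key}" for key in ("name", "description") if not fields.get(key)]
    # Assumption: the 1024-character limit applies to the description field.
    if len(fields.get("description", "")) > 1024:
        problems.append("description longer than 1024 characters")
    return problems
```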

## Discovery Workflow

How future Claude finds your skill:

1. **Encounters problem** ("tests are flaky")
2. **Finds SKILL** (description matches)
3. **Scans overview** (is this relevant?)
4. **Reads patterns** (quick reference table)
5. **Loads example** (only when implementing)

**Optimize for this flow** - put searchable terms early and often.

## The Bottom Line

**Creating skills IS TDD for process documentation.**

Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.