Creates and validates agent skills using Test-Driven Development — write test scenarios, baseline behavior, then the skill itself.
SKILL.md
---
name: writing-skills
description: Use when creating new skills, editing existing skills, or verifying skills work before deployment
---

# Writing Skills

## Overview

**Writing skills IS Test-Driven Development applied to process documentation.**

**Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex).**

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.

**REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.

**Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.

## What is a Skill?

A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.

**Skills are:** Reusable techniques, patterns, tools, reference guides

**Skills are NOT:** Narratives about how you solved a problem once

## TDD Mapping for Skills

| TDD Concept | Skill Creation |
|-------------|----------------|
| **Test case** | Pressure scenario with subagent |
| **Production code** | Skill document (SKILL.md) |
| **Test fails (RED)** | Agent violates rule without skill (baseline) |
| **Test passes (GREEN)** | Agent complies with skill present |
| **Refactor** | Close loopholes while maintaining compliance |
| **Write test first** | Run baseline scenario BEFORE writing skill |
| **Watch it fail** | Document exact rationalizations agent uses |
| **Minimal code** | Write skill addressing those specific violations |
| **Watch it pass** | Verify agent now complies |
| **Refactor cycle** | Find new rationalizations → plug → re-verify |

The entire skill creation process follows RED-GREEN-REFACTOR.

## When to Create a Skill

**Create when:**
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit

**Don't create for:**
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
- Mechanical constraints (if it's enforceable with regex/validation, automate it; save documentation for judgment calls)

## Skill Types

### Technique
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)

### Pattern
Way of thinking about problems (flatten-with-flags, test-invariants)

### Reference
API docs, syntax guides, tool documentation (office docs)

## Directory Structure


```
skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed
```

**Flat namespace** - all skills in one searchable namespace

**Separate files for:**
1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
2. **Reusable tools** - Scripts, utilities, templates

**Keep inline:**
- Principles and concepts
- Code patterns (< 50 lines)
- Everything else

## SKILL.md Structure

**Frontmatter (YAML):**
- Two required fields: `name` and `description` (see [agentskills.io/specification](https://agentskills.io/specification) for all supported fields)
- Max 1024 characters total
- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
- `description`: Third-person, describes ONLY when to use (NOT what it does)
  - Start with "Use when..." to focus on triggering conditions
  - Include specific symptoms, situations, and contexts
  - **NEVER summarize the skill's process or workflow** (see CSO section for why)
  - Keep under 500 characters if possible

```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results
```


## Claude Search Optimization (CSO)

**Critical for discovery:** Future Claude needs to FIND your skill.

### 1. Rich Description Field

**Purpose:** Claude reads the description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

**Format:** Start with "Use when..." to focus on triggering conditions

**CRITICAL: Description = When to Use, NOT What the Skill Does**

The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

**Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

**The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.

```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```

**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
- **NEVER summarize the skill's process or workflow**

```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects
```

### 2. Keyword Coverage

Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types

### 3. Descriptive Naming

**Use active voice, verb-first:**
- ✅ `creating-skills` not `skill-creation`
- ✅ `condition-based-waiting` not `async-test-helpers`

### 4. Token Efficiency (Critical)

**Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

**Target word counts:**
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)

**Techniques:**

**Move details to tool help:**
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```

**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```

**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern

**Verification:**
```bash
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```

**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`

**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking

### 5. Cross-Referencing Other Skills

**When writing documentation that references other skills:**

Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)

**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them.

## Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```

**Use flowcharts ONLY for:**
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions

**Never use flowcharts for:**
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)

See @graphviz-conventions.dot for graphviz style rules.

**Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG:
```bash
./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
```

## Code Examples

**One excellent example beats many mediocre ones**

Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python

**Good example:**
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)

**Don't:**
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples

You're good at porting - one great example is enough.

## File Organization

### Self-Contained Skill
```
defense-in-depth/
  SKILL.md    # Everything inline
```
When: All content fits, no heavy reference needed

### Skill with Reusable Tool
```
condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt
```
When: Tool is reusable code, not just narrative

### Skill with Heavy Reference
```
pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools
```
When: Reference material too large for inline

## The Iron Law (Same as TDD)

```
NO SKILL WITHOUT A FAILING TEST FIRST
```

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.

**No exceptions:**
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete

**REQUIRED BACKGROUND:** The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.

## Testing All Skill Types

Different skill types need different test approaches:

### Discipline-Enforcing Skills (rules/requirements)

**Examples:** TDD, verification-before-completion, designing-before-coding

**Test with:**
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters

**Success criteria:** Agent follows rule under maximum pressure

### Technique Skills (how-to guides)

**Examples:** condition-based-waiting, root-cause-tracing, defensive-programming

**Test with:**
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?

**Success criteria:** Agent successfully applies technique to new scenario

### Pattern Skills (mental models)

**Examples:** reducing-complexity, information-hiding concepts

**Test with:**
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?

**Success criteria:** Agent correctly identifies when/how to apply pattern

### Reference Skills (documentation/APIs)

**Examples:** API documentation, command references, library guides

**Test with:**
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?

**Success criteria:** Agent finds and correctly applies reference information

## Common Rationalizations for Skipping Testing

| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |

**All of these mean: Test before deploying. No exceptions.**

## Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

### Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>

<Good>
```markdown
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>

### Address "Spirit vs Letter" Arguments

Add a foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```

This cuts off an entire class of "I'm following the spirit" rationalizations.

### Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```

### Create Red Flags List

Make it easy for agents to self-check when rationalizing:

```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```

### Update CSO for Violation Symptoms

Add to the description the symptoms of when you're ABOUT to violate the rule:

```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```

## RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

### RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

### GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

### REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

**Testing methodology:** See @testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques

## Anti-Patterns

### ❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..."
**Why bad:** Too specific, not reusable

### ❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go
**Why bad:** Mediocre quality, maintenance burden

### ❌ Code in Flowcharts
```dot
step1 [label="import fs"];
step2 [label="read file"];
```
**Why bad:** Can't copy-paste, hard to read

### ❌ Generic Labels
helper1, helper2, step3, pattern4
**Why bad:** Labels should have semantic meaning

## STOP: Before Moving to Next Skill

**After writing ANY skill, you MUST STOP and complete the deployment process.**

**Do NOT:**
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"

**The deployment checklist below is MANDATORY for EACH skill.**

Deploying untested skills = deploying untested code. It's a violation of quality standards.

## Skill Creation Checklist (TDD Adapted)

**IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.**

**RED Phase - Write Failing Test:**
- [ ] Create pressure scenarios (3+ combined pressures for discipline skills)
- [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim
- [ ] Identify patterns in rationalizations/failures

**GREEN Phase - Write Minimal Skill:**
- [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars)
- [ ] YAML frontmatter with required `name` and `description` fields (max 1024 chars; see [spec](https://agentskills.io/specification))
- [ ] Description starts with "Use when..." and includes specific triggers/symptoms
- [ ] Description written in third person
- [ ] Keywords throughout for search (errors, symptoms, tools)
- [ ] Clear overview with core principle
- [ ] Address specific baseline failures identified in RED
- [ ] Code inline OR link to separate file
- [ ] One excellent example (not multi-language)
- [ ] Run scenarios WITH skill - verify agents now comply

**REFACTOR Phase - Close Loopholes:**
- [ ] Identify NEW rationalizations from testing
- [ ] Add explicit counters (if discipline skill)
- [ ] Build rationalization table from all test iterations
- [ ] Create red flags list
- [ ] Re-test until bulletproof

**Quality Checks:**
- [ ] Small flowchart only if decision non-obvious
- [ ] Quick reference table
- [ ] Common mistakes section
- [ ] No narrative storytelling
- [ ] Supporting files only for tools or heavy reference

**Deployment:**
- [ ] Commit skill to git and push to your fork (if configured)
- [ ] Consider contributing back via PR (if broadly useful)

## Discovery Workflow

How future Claude finds your skill:

1. **Encounters problem** ("tests are flaky")
2. **Finds SKILL** (description matches)
3. **Scans overview** (is this relevant?)
4. **Reads patterns** (quick reference table)
5. **Loads example** (only when implementing)

**Optimize for this flow** - put searchable terms early and often.

## The Bottom Line

**Creating skills IS TDD for process documentation.**

Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
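
The mechanical parts of the checklist (name characters, the 1024-character frontmatter limit, the "Use when..." prefix, the 500-word target) are the kind of constraint this skill says to automate rather than document. A minimal sketch of such a check follows. Everything in it is an assumption for illustration: `check_skill` and `split_frontmatter` are hypothetical names, not part of any skills tooling; the frontmatter is parsed naively as `key: value` lines rather than with a YAML parser; and the word count covers only the body, not the whole file as `wc -w` would.

```python
import re

# Hypothetical rule from the checklist: letters, numbers, and hyphens only.
NAME_RE = re.compile(r"^[A-Za-z0-9-]+$")


def split_frontmatter(text):
    """Return (frontmatter, body) for a SKILL.md string, or (None, text) if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text  # opening marker without a closing one


def check_skill(text, max_words=500):
    """Return a list of problems; an empty list means the mechanical checks pass."""
    frontmatter, body = split_frontmatter(text)
    if frontmatter is None:
        return ["missing YAML frontmatter"]

    # Naive "key: value" parsing (assumption); a real tool would use a YAML parser.
    fields = {}
    for line in frontmatter.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name must use only letters, numbers, and hyphens")
    if not fields.get("description", "").startswith("Use when"):
        problems.append('description should start with "Use when"')
    if len(frontmatter) > 1024:
        problems.append("frontmatter exceeds 1024 characters")
    if len(body.split()) > max_words:
        problems.append(f"body exceeds {max_words} words")
    return problems
```

Running `check_skill` on the text of a draft SKILL.md returns the list of mechanical violations, if any. It does not replace the RED-GREEN-REFACTOR testing above: it only catches what a regex can catch, and judgment calls such as whether the description leaks a workflow summary still need scenario testing.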