Create, read, edit, and manipulate Word (.docx) documents with formatting, tables, and tracked changes
SKILL.md
---
name: docx
description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation."
license: Proprietary. LICENSE.txt has complete terms
---

# DOCX creation, editing, and analysis

## Overview

A .docx file is a ZIP archive containing XML files.

## Quick Reference

| Task | Approach |
|------|----------|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js` - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

### Converting .doc to .docx

Legacy `.doc` files must be converted before editing:

```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```

### Reading Content

```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

### Converting to Images

```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```

### Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

```bash
python scripts/accept_changes.py input.docx output.docx
```

---

## Creating New Documents

Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`

### Setup
```javascript
const fs = require('fs'); // needed for fs.writeFileSync / fs.readFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
  Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
  InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
  PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
  TabStopType, TabStopPosition, Column, SectionType,
  TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
  VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```

### Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
```

### Page Size

```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]
```

**Common page sizes (DXA units, 1440 DXA = 1 inch):**

| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|---------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |

**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
```javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```

### Styles (Override Built-in Headings)

Use Arial as the default font (universally supported). Keep titles black for readability.

```javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});
```

### Lists (NEVER use unicode bullets)

```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })       // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```

### Tables

**CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.

```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        }),
        // ...one TableCell per column; the second 4680-wide cell is omitted here
      ]
    })
  ]
})
```

**Table width calculation:**

Always use `WidthType.DXA` — `WidthType.PERCENTAGE` breaks in Google Docs.

```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
```

**Width rules:**
- **Always use `WidthType.DXA`** — never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Cell `width` must match corresponding `columnWidth`
- Cell `margins` are internal padding - they reduce content area, not add to cell width
- For full-width tables: use content width (page width minus left and right margins)

### Images

```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})
```

### Page Breaks

```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```

### Hyperlinks

```javascript
// External link
new Paragraph({
  children: [new ExternalHyperlink({
    children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
    link: "https://example.com",
  })]
})

// Internal link (bookmark + reference)
// 1. Create bookmark at destination
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [
  new Bookmark({ id: "chapter1", children: [new TextRun("Chapter 1")] }),
]})
// 2. Link to it
new Paragraph({ children: [new InternalHyperlink({
  children: [new TextRun({ text: "See Chapter 1", style: "Hyperlink" })],
  anchor: "chapter1",
})]})
```

### Footnotes

```javascript
const doc = new Document({
  footnotes: {
    1: { children: [new Paragraph("Source: Annual Report 2024")] },
    2: { children: [new Paragraph("See appendix for methodology")] },
  },
  sections: [{
    children: [new Paragraph({
      children: [
        new TextRun("Revenue grew 15%"),
        new FootnoteReferenceRun(1),
        new TextRun(" using adjusted metrics"),
        new FootnoteReferenceRun(2),
      ],
    })]
  }]
});
```

### Tab Stops

```javascript
// Right-align text on same line (e.g., date opposite a title)
new Paragraph({
  children: [
    new TextRun("Company Name"),
    new TextRun("\tJanuary 2025"),
  ],
  tabStops: [{ type: TabStopType.RIGHT, position: TabStopPosition.MAX }],
})

// Dot leader (e.g., TOC-style)
new Paragraph({
  children: [
    new TextRun("Introduction"),
    new TextRun({ children: [
      new PositionalTab({
        alignment: PositionalTabAlignment.RIGHT,
        relativeTo: PositionalTabRelativeTo.MARGIN,
        leader: PositionalTabLeader.DOT,
      }),
      "3",
    ]}),
  ],
})
```

### Multi-Column Layouts

```javascript
// Equal-width columns
sections: [{
  properties: {
    column: {
      count: 2,         // number of columns
      space: 720,       // gap between columns in DXA (720 = 0.5 inch)
      equalWidth: true,
      separate: true,   // vertical line between columns
    },
  },
  children: [/* content flows naturally across columns */]
}]

// Custom-width columns (equalWidth must be false)
sections: [{
  properties: {
    column: {
      equalWidth: false,
      children: [
        new Column({ width: 5400, space: 720 }),
        new Column({ width: 3240 }),
      ],
    },
  },
  children: [/* content */]
}]
```

Force a column break with a new section using `type: SectionType.NEXT_COLUMN`.

### Table of Contents

```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```

### Headers/Footers

```javascript
sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]
```

### Critical Rules for docx-js

- **Set page size explicitly** - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- **Landscape: pass portrait dimensions** - docx-js swaps width/height internally; pass short edge as `width`, long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- **Never use `\n`** - use separate Paragraph elements
- **Never use unicode bullets** - use `LevelFormat.BULLET` with numbering config
- **PageBreak must be in Paragraph** - standalone creates invalid XML
- **ImageRun requires `type`** - always specify png/jpg/etc
- **Always set table `width` with DXA** - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- **Tables need dual widths** - `columnWidths` array AND cell `width`, both must match
- **Table width = sum of columnWidths** - for DXA, ensure they add up exactly
- **Always add cell margins** - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- **Use `ShadingType.CLEAR`** - never SOLID for table shading
- **Never use tables as dividers/rules** - cells have minimum height and render as empty boxes (including in headers/footers); use `border: { bottom: { style: BorderStyle.SINGLE, size: 6, color: "2E75B6", space: 1 } }` on a Paragraph instead. For two-column footers, use tab stops (see Tab Stops section), not tables
- **TOC requires HeadingLevel only** - no custom styles on heading paragraphs
- **Override built-in styles** - use exact IDs: "Heading1", "Heading2", etc.
- **Include `outlineLevel`** - required for TOC (0 for H1, 1 for H2, etc.)

---

## Editing Existing Documents

**Follow all 3 steps in order.**

### Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
```
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.

### Step 2: Edit XML

Edit files in `unpacked/word/`. See XML Reference below for patterns.

**Use "Claude" as the author** for tracked changes and comments, unless the user explicitly requests use of a different name.

**Use the Edit tool directly for string replacement. Do not write Python scripts.** Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

**CRITICAL: Use smart quotes for new content.** When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
```xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
```
| Entity | Character |
|--------|-----------|
| `&#x2018;` | ‘ (left single) |
| `&#x2019;` | ’ (right single / apostrophe) |
| `&#x201C;` | “ (left double) |
| `&#x201D;` | ” (right double) |

**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
```
Then add markers to document.xml (see Comments in XML Reference).

### Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.

**Auto-repair will fix:**
- `durableId` >= 0x7FFFFFFF (regenerates valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace

**Auto-repair won't fix:**
- Malformed XML, invalid element nesting, missing relationships, schema violations

### Common Pitfalls

- **Replace entire `<w:r>` elements**: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- **Preserve `<w:rPr>` formatting**: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.

---

## XML Reference

### Schema Compliance

- **Element order in `<w:pPr>`**: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- **Whitespace**: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- **RSIDs**: Must be 8-digit hex (e.g., `00AB1234`)

### Tracked Changes

**Insertion:**
```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```

**Deletion:**
```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```

**Inside `<w:del>`**: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.

**Minimal edits** - only mark what changes:
```xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t xml:space="preserve">The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t xml:space="preserve"> days.</w:t></w:r>
```

**Deleting entire paragraphs/list items** - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:
```xml
<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr> <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>
```
Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.

**Rejecting another author's insertion** - nest deletion inside their insertion:
```xml
<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>
```

**Restoring another author's deletion** - add insertion after (don't modify their deletion):
```xml
<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```

### Comments

After running `comment.py` (see Step 2), add markers to document.xml. For replies, use the `--parent` flag and nest markers inside the parent's.

**CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.**

```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t xml:space="preserve"> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```

### Images

1. Add image file to `word/media/`
2. Add relationship to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```
3. Add content type to `[Content_Types].xml`:
```xml
<Default Extension="png" ContentType="image/png"/>
```
4. Reference in document.xml:
```xml
<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
```

---

## Dependencies

- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images
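
The width rules under Page Size and Tables reduce to DXA arithmetic. Here is a minimal sketch in plain Node.js with no docx-js import. The helpers `inchesToDxa`, `contentWidthDxa`, and `splitColumns` are hypothetical names, not docx-js APIs. They compute the content width from the page size and margins, then split it into integer column widths that sum exactly to the table width. Any rounding remainder goes to the last column.

```javascript
// Hypothetical helpers for DXA width math (1440 DXA = 1 inch); not part of docx-js.
const DXA_PER_INCH = 1440;

function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Usable width between the left and right page margins.
function contentWidthDxa(pageWidth, marginLeft, marginRight) {
  return pageWidth - marginLeft - marginRight;
}

// Split `total` DXA into integer widths proportional to `ratios`.
// Each share is rounded down; the leftover DXA go to the last column
// so the widths always sum to `total` exactly.
function splitColumns(total, ratios) {
  const ratioSum = ratios.reduce((a, b) => a + b, 0);
  const widths = ratios.map((r) => Math.floor((total * r) / ratioSum));
  const assigned = widths.reduce((a, b) => a + b, 0);
  widths[widths.length - 1] += total - assigned;
  return widths;
}

const letter = contentWidthDxa(inchesToDxa(8.5), inchesToDxa(1), inchesToDxa(1));
console.log(letter);                       // 9360
console.log(splitColumns(letter, [3, 1])); // [ 7020, 2340 ]

const a4 = contentWidthDxa(11906, 1440, 1440); // 9026
console.log(splitColumns(a4, [1, 1, 1]));      // [ 3008, 3008, 3010 ]
```

The same total appears in the custom-width example under Multi-Column Layouts: 5400 + 720 (gap) + 3240 = 9,360 DXA, which matches the US Letter content width.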
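
Step 2 asks for smart-quote entities in new text, and `comment.py` expects pre-escaped XML. Below is a minimal sketch of producing such text. It assumes a simple heuristic: a straight quote counts as opening when it follows the start of the text, whitespace, or an opening bracket, and otherwise as closing, which also covers apostrophes. `toWordXmlText` and `wT` are hypothetical helper names and are not part of the skill's scripts.

```javascript
// Hypothetical helper: plain text -> XML-safe text with smart-quote references.
// Heuristic: a quote "opens" at the start of the text or after whitespace or an
// opening bracket; otherwise it closes (which also covers apostrophes).
function toWordXmlText(text) {
  const opens = (prev) => prev === undefined || /[\s([{]/.test(prev);
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const prev = i > 0 ? text[i - 1] : undefined;
    if (ch === "&") out += "&amp;";
    else if (ch === "<") out += "&lt;";
    else if (ch === ">") out += "&gt;";
    else if (ch === "'") out += opens(prev) ? "&#x2018;" : "&#x2019;";
    else if (ch === '"') out += opens(prev) ? "&#x201C;" : "&#x201D;";
    else out += ch;
  }
  return out;
}

// Wrap in <w:t>, adding xml:space="preserve" when there is leading/trailing whitespace.
function wT(text) {
  const preserve = /^\s|\s$/.test(text) ? ' xml:space="preserve"' : "";
  return `<w:t${preserve}>${toWordXmlText(text)}</w:t>`;
}

console.log(wT(`Here's a quote: "Hello"`));
// <w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
console.log(toWordXmlText("R&D's <draft>"));
// R&amp;D&#x2019;s &lt;draft&gt;
```

The heuristic misreads a leading apostrophe. For example, `'tis` gets a left single quote. Review the output before pasting it in with the Edit tool.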
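
The minimal-edit pattern under Tracked Changes has a fixed shape. Unchanged text stays in ordinary runs. Only the changed span goes into sibling `<w:del>` and `<w:ins>` elements, and each new run carries a copy of the original `<w:rPr>`. The sketch below writes that shape as a JavaScript function only to make the structure explicit. `trackedReplace` and its option names are hypothetical, IDs must be unique within the document, and the skill's own guidance is still to apply the resulting XML with the Edit tool rather than a script.

```javascript
// Hypothetical helper that emits the minimal-edit tracked-change shape.
// `rPr` is the original run's <w:rPr>...</w:rPr> copied verbatim (or "").
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
const attr = (s) => escapeXml(s).replace(/"/g, "&quot;");

// <w:t> / <w:delText> with xml:space="preserve" when the text has edge whitespace.
function textEl(tag, s) {
  const preserve = /^\s|\s$/.test(s) ? ' xml:space="preserve"' : "";
  return `<${tag}${preserve}>${escapeXml(s)}</${tag}>`;
}

const run = (rPr, inner) => `<w:r>${rPr}${inner}</w:r>`;

// `date` should be an ISO timestamp such as "2025-01-01T00:00:00Z".
function trackedReplace({ before = "", oldText, newText, after = "", rPr = "",
                          author = "Claude", date, delId, insId }) {
  const open = (tag, id) =>
    `<${tag} w:id="${attr(id)}" w:author="${attr(author)}" w:date="${attr(date)}">`;
  const parts = [];
  if (before) parts.push(run(rPr, textEl("w:t", before))); // unchanged prefix
  parts.push(open("w:del", delId) + run(rPr, textEl("w:delText", oldText)) + "</w:del>");
  parts.push(open("w:ins", insId) + run(rPr, textEl("w:t", newText)) + "</w:ins>");
  if (after) parts.push(run(rPr, textEl("w:t", after))); // unchanged suffix
  return parts.join("\n");
}

console.log(trackedReplace({
  before: "The term is ", oldText: "30", newText: "60", after: " days.",
  date: "2025-01-01T00:00:00Z", delId: 1, insId: 2,
}));
```

For the "30 days" example under Tracked Changes, this prints the same four elements shown there, on one line each. The runs with leading or trailing spaces also get `xml:space="preserve"`.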
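
For raw-XML images, `<wp:extent>` takes sizes in EMUs. There are 914,400 EMUs per inch, so 635 EMUs per DXA. Here is a minimal sketch, assuming you already know the image's pixel dimensions. `imageExtentEmu` is a hypothetical helper that scales an image to a target width in inches while keeping its aspect ratio.

```javascript
// Hypothetical helper: <wp:extent> size in EMUs (914400 EMU = 1 inch) for an image
// scaled to a target width, keeping the source pixel aspect ratio.
const EMU_PER_INCH = 914400;
const EMU_PER_DXA = EMU_PER_INCH / 1440; // 635

function imageExtentEmu(pxWidth, pxHeight, targetWidthInches) {
  const cx = Math.round(targetWidthInches * EMU_PER_INCH);
  const cy = Math.round((cx * pxHeight) / pxWidth);
  return { cx, cy };
}

// Example: an 800x600 px image at the full US Letter content width (9360 DXA = 6.5").
const widthInches = (9360 * EMU_PER_DXA) / EMU_PER_INCH; // 6.5
const { cx, cy } = imageExtentEmu(800, 600, widthInches);
console.log(`<wp:extent cx="${cx}" cy="${cy}"/>`);
// <wp:extent cx="5943600" cy="4457700"/>
```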